Download excel file from web?

  • patf@well.com

    Download excel file from web?

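    Hi - I'm an experienced programmer, but this is my first Python program.

    This URL retrieves an Excel spreadsheet containing that day's
    MSCI stock index returns:

    http://www.mscibarra.com/webapp/indexperf/excel?priceLevel=0&scope=0&currency=15&style=C&size=36&market=1897&asOf=Jul+25%2C+2008&export=Excel_IEIPerfRegional

    I want to write Python to download and save the file.

    So far I've arrived at this: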

    # import pdb
    import urllib2
    from win32com.client import Dispatch

    xlApp = Dispatch("Excel.Application")

    # test 1
    # xlApp.Workbooks.Add()
    # xlApp.ActiveSheet.Cells(1,1).Value = 'A'
    # xlApp.ActiveWorkbook.ActiveSheet.Cells(2,1).Value = 'B'
    # xlBook = xlApp.ActiveWorkbook
    # xlBook.SaveAs(Filename='C:\\test.xls')


    # pdb.set_trace()
    response = urllib2.urlopen('http://www.mscibarra.com/webapp/indexperf/excel?priceLevel=0&scope=0&currency=15&style=C&size=36&market=1897&asOf=Jul+25%2C+2008&export=Excel_IEIPerfRegional')
    # test 2 - returns check = False
    check_for_data = urllib2.Request('http://www.mscibarra.com/webapp/indexperf/excel?priceLevel=0&scope=0&currency=15&style=C&size=36&market=1897&asOf=Jul+25%2C+2008&export=Excel_IEIPerfRegional').has_data()

    xlApp = response.fp
    print(response.fp.name)
    print(xlApp.name)
    xlApp.write
    xlApp.Close
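The long query string in the URL above can be rebuilt from its parameters instead of being pasted by hand. A minimal Python 3 sketch (Python 3 split urllib2 into urllib.request and urllib.parse); the helper name build_msci_url and its keyword defaults are hypothetical, the parameter names and values are copied from the URL in the post, and nothing here assumes the endpoint still answers:

```python
from urllib.parse import urlencode

# Endpoint copied from the post; whether it still serves files is not assumed.
BASE_URL = "http://www.mscibarra.com/webapp/indexperf/excel"


def build_msci_url(as_of, market=1897, currency=15, size=36):
    """Rebuild the post's export URL for a given date string (hypothetical helper).

    Parameter names and default values are copied from the URL in the post;
    their meanings are not documented there.
    """
    params = {
        "priceLevel": 0,
        "scope": 0,
        "currency": currency,
        "style": "C",
        "size": size,
        "market": market,
        "asOf": as_of,  # urlencode turns "Jul 25, 2008" into "Jul+25%2C+2008"
        "export": "Excel_IEIPerfRegional",
    }
    return BASE_URL + "?" + urlencode(params)


print(build_msci_url("Jul 25, 2008"))
```

urlencode quotes values with quote_plus by default, so spaces become "+" and the comma becomes "%2C", which reproduces the hand-typed query string exactly.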
  • patf@well.com

    #2
    Re: Download excel file from web?

    On Jul 28, 3:00 pm, "p...@well.com" <p...@well.com> wrote:
    > Want to write python to download and save the file.
    Whoops, hit Send when I wanted Preview. Looks like the HTML [quote] tag
    doesn't work from groups.google.com (nice).

    Anyway, in test 1 above, I worked out how to instantiate an Excel
    object, put some stuff in it, and save it to disk.

    So, in theory, I'm retrieving my Excel spreadsheet with

    response = urllib2.urlopen()

    Except what then do I do with this?

    Well, for one thing, I read some of the urllib2 documentation and found
    the Request class, which has a has_data() method. It returns False.
    Hmm, that's not encouraging.
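For the record, has_data() describes the request, not the response: it reports whether the Request carries a body to send (POST data), so False is exactly what a plain GET should return. Python 3 removed has_data() in 3.4; the equivalent check is whether Request.data is None. A small sketch (constructing a Request does not contact the server; the POST payload is a made-up example):

```python
from urllib.request import Request

URL = "http://www.mscibarra.com/webapp/indexperf/excel"  # query string omitted

# A plain GET: no request body, so there is no data to "have".
get_req = Request(URL)
# Passing data= attaches a body and turns the request into a POST
# (hypothetical payload, for illustration only).
post_req = Request(URL, data=b"priceLevel=0")

print(get_req.data is None, get_req.get_method())    # True GET
print(post_req.data is None, post_req.get_method())  # False POST
```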

    I suppose the trick is to understand what urllib2.urlopen is returning
    to me, rummage around in there, and hopefully find my Excel file.

    I use pdb to debug. This is interesting:

    (Pdb) dir(response)
    ['__doc__', '__init__', '__iter__', '__module__', '__repr__', 'close',
    'code', 'fileno', 'fp', 'geturl', 'headers', 'info', 'msg', 'next',
    'read', 'readline', 'readlines', 'url']
    (Pdb)

    I suppose the members named __*__ are methods, and the names without
    the underscores are attributes (variables)?

    Or maybe this isn't the right direction at all (maybe there are much
    better modules for this stuff). I'd be happy to learn if that's the
    case (and if that gets the job done for me).

    pat


    • Diez B. Roggisch

      #3
      Re: Download excel file from web?

      patf@well.com wrote:
      > I suppose the members named __*__ are methods, and the names without
      > the underscores are attributes (variables)?
      No, these are the names of all attributes and methods. read is a method,
      for example.
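Diez's point can be checked directly: dir() lists every name, dunder names include both methods (__init__) and plain data (__doc__), and callable() is what actually separates methods from data attributes. A minimal Python 3 sketch; split_members is a hypothetical helper, and io.BytesIO stands in for the response object (an assumption; any file-like object behaves the same way here):

```python
import io


def split_members(obj):
    """Split the public names from dir(obj) into methods and data attributes."""
    methods, attributes = [], []
    for name in dir(obj):
        if name.startswith("_"):
            continue  # skip dunder/private names such as __doc__ and __init__
        if callable(getattr(obj, name)):
            methods.append(name)
        else:
            attributes.append(name)
    return methods, attributes


# io.BytesIO stands in for the object urlopen() returns.
methods, attributes = split_members(io.BytesIO(b"data"))
print("read" in methods)       # True
print("closed" in attributes)  # True
```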
      > Or maybe this isn't at all the right direction to take (maybe there
      > are much better modules to do this stuff). Would be happy to learn if
      > that's the case (and if that gets the job done for me).

      The docs (http://docs.python.org/lib/module-urllib2.html) are pretty
      clear on this:

      """
      This function returns a file-like object with two additional methods:
      """


      And then for file-like objects:

      """
      read([size])
      Read at most size bytes from the file (less if the read hits EOF
      before obtaining size bytes). If the size argument is negative or
      omitted, read all data until EOF is reached. The bytes are returned as a
      string object. An empty string is returned when EOF is encountered
      immediately. (For certain files, like ttys, it makes sense to continue
      reading after an EOF is hit.) Note that this method may call the
      underlying C function fread() more than once in an effort to acquire as
      close to size bytes as possible. Also note that when in non-blocking
      mode, less data than what was requested may be returned, even if no size
      parameter was given.
      """

      Diez
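The read() semantics quoted above look like this in practice. io.BytesIO stands in for the urlopen() result (an assumption for the sketch; both are file-like objects), and in Python 3 read() returns bytes rather than str:

```python
import io

# io.BytesIO stands in for the object urlopen() returns.
stream = io.BytesIO(b"0123456789ABCDEF")

first = stream.read(10)  # at most 10 bytes
rest = stream.read()     # no size argument: everything up to EOF
at_eof = stream.read()   # already at EOF: an empty bytes object

print(first, rest, at_eof)  # b'0123456789' b'ABCDEF' b''
```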


      • patf@well.com

        #4
        Re: Download excel file from web?

        On Jul 28, 3:29 pm, "Diez B. Roggisch" <de...@nospam.web.de> wrote:
        > No, these are the names of all attributes and methods. read is a method,
        > for example.

        Right - I got it backwards.

        Just stumbled upon .read:

        response = urllib2.urlopen('http://www.mscibarra.com/webapp/indexperf/excel?priceLevel=0&scope=0&currency=15&style=C&size=36&market=1897&asOf=Jul+25%2C+2008&export=Excel_IEIPerfRegional').read

        Now the question is: what to do with this? I'll look at the
        documentation that you pointed to.

        thanx - pat
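One catch in the line above: `.read` without parentheses only looks up the method; nothing is read until it is called as `.read()`. The same goes for `xlApp.write` in the first post (and Python names are case-sensitive, so `Close` is not `close`). A minimal Python 3 sketch with io.BytesIO standing in for the response:

```python
import io

# Stand-in for urllib2.urlopen(...) / urllib.request.urlopen(...).
response = io.BytesIO(b"spreadsheet bytes")

method = response.read  # only looks the method up; no data is read
data = response.read()  # the parentheses call it and return the bytes

print(callable(method), type(data).__name__)  # True bytes
print(data)                                   # b'spreadsheet bytes'
```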


        • patf@well.com

          #5
          Re: Download excel file from web?

          On Jul 28, 3:33 pm, "p...@well.com" <p...@well.com> wrote:
          > Now the question is: what to do with this?  I'll look at the
          > documentation that you point to.

          Or rather (next iteration):

          response = urllib2.urlopen('http://www.mscibarra.com/webapp/indexperf/excel?priceLevel=0&scope=0&currency=15&style=C&size=36&market=1897&asOf=Jul+25%2C+2008&export=Excel_IEIPerfRegional').read(1000000)

          The file is generally around 26 KB, so specifying 1,000,000 seems
          like a reasonable first approximation.
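A fixed size such as read(1000000) works only while the file stays under it; anything larger is silently truncated, and read() with no argument already reads to EOF. If the number is meant as a safety limit, one option (a sketch, not something proposed in the thread; read_capped and ResponseTooLarge are hypothetical names) is to read in chunks and fail loudly instead:

```python
import io


class ResponseTooLarge(Exception):
    """Raised when a body grows past the caller's limit (hypothetical name)."""


def read_capped(fileobj, limit, chunk_size=8192):
    """Read fileobj to EOF, raising ResponseTooLarge past `limit` bytes.

    Unlike a single fileobj.read(limit), this never returns a silently
    truncated body, and it loops because one read() call may return fewer
    bytes than requested.
    """
    chunks = []
    total = 0
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:  # an empty bytes object means EOF
            break
        total += len(chunk)
        if total > limit:
            raise ResponseTooLarge(f"body exceeds {limit} bytes")
        chunks.append(chunk)
    return b"".join(chunks)


# io.BytesIO stands in for a ~26 KB response body.
body = read_capped(io.BytesIO(b"x" * 26_000), limit=1_000_000)
print(len(body))  # 26000
```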

          And then when I do:

          print(response)

          I get a whole lot of garbage (and some non-garbage), so I know I'm
          onto something.

          When I read the .read documentation further, it says that read()
          returns the data as a string object. Now, how do I convince Python
          that the string object is in fact an Excel file, and save it to disk?

          pat


          • Guilherme Polo

            #6
            Re: Download excel file from web?

            On Mon, Jul 28, 2008 at 7:43 PM, patf@well.com <patf@well.com> wrote:
            > When I read the .read documentation further, it says that read() has
            > returned the data as a string object. Now - how do I convince Python
            > that the string object is in fact an excel file - and save it to disk?

            You don't need to convince Python; just write it to a file.
            More reading for you: http://docs.python.org/tut/node9.html


            --
            -- Guilherme H. Polo Goncalves
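Writing the data out is straightforward, with one trap worth knowing: open the file in binary mode ("wb"). In Python 2 on Windows, text mode ("w") translates every "\n" byte into "\r\n" on write, which corrupts binary formats such as .xls; Python 3 refuses to write bytes to a text-mode file at all. A minimal Python 3 sketch (save_bytes and download are hypothetical helper names; download streams the response to disk with shutil.copyfileobj rather than holding the whole body in memory):

```python
import shutil
from urllib.request import urlopen


def save_bytes(data, path):
    """Write raw bytes to path unchanged."""
    # "wb" = write, binary: no newline translation and no text encoding.
    # Leaving the with-block closes the file, which also flushes it to disk.
    with open(path, "wb") as f:
        f.write(data)


def download(url, path):
    """Stream the body at url into path without loading it all into memory."""
    with urlopen(url) as response, open(path, "wb") as f:
        shutil.copyfileobj(response, f)
```

For example, download(url, r"c:\msci.xls") with the URL from the thread; whether that server still responds is not assumed here.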


            • patf@well.com

              #7
              Re: Download excel file from web?

              On Jul 28, 3:52 pm, "Guilherme Polo" <ggp...@gmail.c omwrote:[QUOTE]
              On Mon, Jul 28, 2008 at 7:43 PM, p...@well.com <p...@well.comw rote:
              On Jul 28, 3:33 pm, "p...@well. com" <p...@well.comw rote:
              On Jul 28, 3:29 pm, "Diez B. Roggisch" <de...@nospam.w eb.dewrote:
              >
              p...@well.com schrieb:
              >
              On Jul 28, 3:00 pm, "p...@well. com" <p...@well.comw rote:
              >Hi - experienced programmer but this is my first Python program.
              >
              >This URL will retrieve an excel spreadsheet containing (that day's)
              >msci stock index returns.
              >>
              >Want to write python to download and save the file.
              >
              >So far I've arrived at this:
              >
              >
              ># import pdb
              >import urllib2
              >from win32com.client import Dispatch
              >
              >xlApp = Dispatch("Excel .Application")
              >
              ># test 1
              ># xlApp.Workbooks .Add()
              ># xlApp.ActiveShe et.Cells(1,1).V alue = 'A'
              ># xlApp.ActiveWor kbook.ActiveShe et.Cells(2,1).V alue = 'B'
              ># xlBook = xlApp.ActiveWor kbook
              ># xlBook.SaveAs(F ilename='C:\\te st.xls')
              >
              ># pdb.set_trace()
              >response = urllib2.urlopen ('http://www.mscibarra.c om/webapp/indexperf/
              >excel?
              >priceLevel=0&s cope=0&currency =15&style=C&siz e=36&market=189 7&asOf=Jul
              >+25%2C+2008&ex port=Excel_IEIP erfRegional')
              ># test 2 - returns check = False
              >check_for_da ta = urllib2.Request ('http://www.mscibarra.c om/webapp/
              >indexperf/excel?
              >priceLevel=0&s cope=0&currency =15&style=C&siz e=36&market=189 7&asOf=Jul
              >+25%2C+2008&ex port=Excel_IEIP erfRegional').h as_data()
              >
              >xlApp = response.fp
              >print(response .fp.name)
              >print(xlApp.na me)
              >xlApp.write
              >xlApp.Close
              >
              >
              Woops hit Send when I wanted Preview.  Looks like the html
              tag
              doesn't work from groups.google.c om (nice).
              >
              Anway, in test 1 above, I determined how to instantiate an excel
              object; put some stuff in it; then save to disk.
              >
              So, in theory, I'm retrieving my excel spreadsheet with
              >
              response = urllib2.urlopen ()
              >
              Except what then do I do with this?
              >
              Well for one read some of the urllib2 documentation and found the
              Request class with the method has_data() on it.  It returns False.
              Hmm that's not encouraging.
              >
              I supposed the trick to understand what urllib2.urlopen is returning
              to me; rummage around in there; and hopefully find my excel file.
              >
              I use pdb to debug.  This is interesting:
              >
              (Pdb) dir(response)
              ['__doc__', '__init__', '__iter__', '__module__', '__repr__', 'close',
              'code', '
              fileno', 'fp', 'geturl', 'headers', 'info', 'msg', 'next', 'read',
              'readline', '
              readlines', 'url']
              (Pdb)
              >
              I suppose the members with __*_ are methods; and the names withoutthe
              underbars are attributes (variables) (?).
              >
              No, these are the names of all attributes and methods. read is a method,
              for example.
              >
              right - I got it backwards.
              >
              Or maybe this isn't at all the right direction to take (maybe there
              are much better modules to do this stuff).  Would be happy to learn if
              that's the case (and if that gets the job done for me).
              >
              The docs (http://docs.python.org/lib/module-urllib2.html) are pretty
              clear on this:
              >
              """
              This function returns a file-like object with two additional methods:
              """
              >
              And then for file-like objects:
              >>
              """
              read(   [size])
                   Read at most size bytes from the file (less if the read hits EOF
              before obtaining size bytes). If the size argument is negative or
              omitted, read all data until EOF is reached. The bytes are returned as a
              string object. An empty string is returned when EOF is encountered
              immediately. (For certain files, like ttys, it makes sense to continue
              reading after an EOF is hit.) Note that this method may call the
              underlying C function fread() more than once in an effort to acquireas
              close to size bytes as possible. Also note that when in non-blocking
              mode, less data than what was requested may be returned, even if no size
              parameter was given.
              """
              >
              Diez
              >
              Just stumbled upon .read:
              >
              response = urllib2.urlopen('http://www.mscibarra.com/webapp/indexperf/excel?priceLevel=0&scope=0&currency=15&style=C&size=36&market=1897&asOf=Jul+25%2C+2008&export=Excel_IEIPerfRegional').read
              >
              Now the question is: what to do with this?  I'll look at the
              documentation that you point to.
              >
              thanx - pat
              >
              Or rather (next iteration):
              >
              response = urllib2.urlopen('http://www.mscibarra.com/webapp/indexperf/excel?priceLevel=0&scope=0&currency=15&style=C&size=36&market=1897&asOf=Jul+25%2C+2008&export=Excel_IEIPerfRegional').read(1000000)
              >
              The file is generally something like 26 KB so specifying 1,000,000
              seems like a good idea (first approximation).
              >
              And then when I do:
              >
              print(response)
              >
              I get a whole lot of garbage (and some non-garbage), so I know I'm
              onto something.
              >
              When I read the .read documentation further, it says that read() has
              returned the data as a string object.  Now - how do I convince Python
              that the string object is in fact an excel file - and save it to disk?
              >
              You don't need to convince Python, just write it to a file.
              More reading for you: http://docs.python.org/tut/node9.html
              >>
              --
              -- Guilherme H. Polo Goncalves
              OK:

              response = urllib2.urlopen('http://www.mscibarra.com/webapp/indexperf/excel?priceLevel=0&scope=0&currency=15&style=C&size=36&market=1897&asOf=Jul+25%2C+2008&export=Excel_IEIPerfRegional').read(1000000)
              # print(response)
              f = open("c:\\msci.xls",'w')
              f.write(response)

              OK this makes the file, and there's a c:\msci.xls in place and it's
              about the right size. But whether I make the second param to open 'w'
              or 'wb', when I try to open msci.xls from the Windows file explorer,
              excel tells me that the file is corrupted.

              pat
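Why 'w' can corrupt the download: on Windows, a file opened in text mode ('w') translates every newline byte (0x0A) into the two bytes 0x0D 0x0A, and binary data such as an .xls file usually contains many 0x0A bytes. The sketch below is written for Python 3 and uses a made-up 8-byte payload rather than the real spreadsheet. It imitates Windows text mode on any platform by passing newline='\r\n' explicitly; latin-1 is used only so that each byte maps to exactly one character and the newline translation is the sole change.

```python
import os
import tempfile

# Hypothetical payload: a few binary bytes including 0x0A (b"\n"),
# standing in for the downloaded spreadsheet.
payload = b"\xd0\xcf\x11\xe0\x0a\x00\x0a\xff"


def save_binary(data, path):
    """Write bytes unchanged: 'wb' performs no newline translation."""
    with open(path, "wb") as f:
        f.write(data)


def save_as_text_windows_style(data, path):
    """Simulate Windows text mode: every '\\n' is written as '\\r\\n'.

    latin-1 maps each byte 0-255 to exactly one character, so the only
    change introduced here is the newline translation.
    """
    with open(path, "w", encoding="latin-1", newline="\r\n") as f:
        f.write(data.decode("latin-1"))


def read_back(path):
    with open(path, "rb") as f:
        return f.read()


tmpdir = tempfile.mkdtemp()
good = os.path.join(tmpdir, "good.xls")
bad = os.path.join(tmpdir, "bad.xls")

save_binary(payload, good)
save_as_text_windows_style(payload, bad)

print(read_back(good) == payload)         # True: identical bytes
print(len(payload), len(read_back(bad)))  # 8 10: two extra b"\r" bytes
```

In Python 3 the mistake is harder to make: writing a bytes object to a file opened with 'w' raises TypeError, so binary downloads have to use 'wb'.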


              • patf@well.com

                #8
                Re: Download excel file from web?

                On Jul 28, 4:04 pm, "p...@well.com" <p...@well.com> wrote:
                OK:
                >
                response = urllib2.urlopen('http://www.mscibarra.com/webapp/indexperf/excel?priceLevel=0&scope=0&currency=15&style=C&size=36&market=1897&asOf=Jul+25%2C+2008&export=Excel_IEIPerfRegional').read(1000000)
                # print(response)
                f = open("c:\\msci.xls",'w')
                f.write(response)
                >
                OK this makes the file, and there's a c:\msci.xls in place and it's
                about the right size. But whether I make the second param to open 'w'
                or 'wb', when I try to open msci.xls from the Windows file explorer,
                excel tells me that the file is corrupted.
                >
                pat
                Nope - must have been stumbling over my own feet.

                'wb' _is_ necessary (as I would expect).

                So it works:

                # pdb.set_trace()
                response = urllib2.urlopen('http://www.mscibarra.com/webapp/indexperf/excel?priceLevel=0&scope=0&currency=15&style=C&size=36&market=1897&asOf=Jul+25%2C+2008&export=Excel_IEIPerfRegional').read(1000000)
                # print(response)
                f = open("c:\\msci.xls",'wb')
                f.write(response)
                f.flush()
                f.close()

                I know the f.flush and f.close are redundant - in the sense that both
                flush the contents to disk. So I can probably just take out the
                f.flush.

                Thanx for the help.

                pat
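A related Python detail: f.flush and f.close written without parentheses only look up the methods and never call them, so the file stays open until the file object is garbage-collected (typically when the script ends). Calling f.close() already flushes, so a separate f.flush() just before it is redundant. A with block closes the file automatically, even if an exception is raised. A minimal Python 3 sketch; the temporary path stands in for c:\msci.xls and the three bytes are made up.

```python
import os
import tempfile


def save_bytes(data, path):
    """Write data to path in binary mode.

    The with-statement calls f.close() automatically (which also flushes),
    even if f.write() raises, so no explicit flush/close is needed.
    """
    with open(path, "wb") as f:
        f.write(data)


path = os.path.join(tempfile.mkdtemp(), "msci.xls")  # hypothetical file name

# Pitfall: referencing the method without () does nothing.
f = open(path, "wb")
f.write(b"abc")
f.close          # only evaluates to the bound method; the file is NOT closed
print(f.closed)  # False
f.close()        # actually flushes and closes the file
print(f.closed)  # True

save_bytes(b"\x00\x01\x02", path)
with open(path, "rb") as f:
    print(f.read())  # b'\x00\x01\x02'
```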


                • Guilherme Polo

                  #9
                  Re: Download excel file from web?

                  On Mon, Jul 28, 2008 at 8:04 PM, patf@well.com <patf@well.com> wrote:
                  OK:
                  >
                  response = urllib2.urlopen('http://www.mscibarra.com/webapp/indexperf/excel?priceLevel=0&scope=0&currency=15&style=C&size=36&market=1897&asOf=Jul+25%2C+2008&export=Excel_IEIPerfRegional').read(1000000)
                  # print(response)
                  f = open("c:\\msci.xls",'w')
                  f.write(response)
                  I would initially change that to:

                  response = urllib2.urlopen('http://www.mscibarra.com/webapp/indexperf/excel?priceLevel=0&scope=0&currency=15&style=C&size=36&market=1897&asOf=Jul+25%2C+2008&export=Excel_IEIPerfRegional')

                  f = open("c:\\msci.xls", "wb")
                  for line in response:
                      f.write(line)
                  f.close()

                  and then..
                  >
                  OK this makes the file, and there's a c:\msci.xls in place and it's
                  about the right size. But whether I make the second param to open 'w'
                  or 'wb', when I try to open msci.xls from the Windows file explorer,
                  excel tells me that the file is corrupted.
                  try it.


                  --
                  -- Guilherme H. Polo Goncalves
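Iterating over the response, as suggested above, is safe for binary data: iteration splits on b"\n" bytes, and writing the pieces back in order reproduces the original exactly. The catch is that a binary file's "lines" can be any length. A common alternative, sketched here in Python 3, reads fixed-size chunks until read() returns an empty bytes object, which is the EOF signal described in the read() documentation quoted earlier in the thread. io.BytesIO stands in for the HTTP response so the sketch runs offline, and the 16 KB chunk size is an arbitrary choice. The standard library's shutil.copyfileobj(src, dst) performs essentially the same loop.

```python
import io
import os
import tempfile


def copy_in_chunks(src, dst, chunk_size=16 * 1024):
    """Copy a binary file-like object to another, chunk by chunk.

    src.read(n) returns at most n bytes and b"" at EOF, so loop until empty.
    Returns the total number of bytes copied.
    """
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # b"" means end of stream
            break
        dst.write(chunk)
        total += len(chunk)
    return total


# Stand-in for the HTTP response: 40,000 bytes of fake binary data.
fake_response = io.BytesIO(bytes(range(256)) * 156 + b"\n" * 64)
path = os.path.join(tempfile.mkdtemp(), "msci.xls")  # hypothetical file name

with open(path, "wb") as out:
    n = copy_in_chunks(fake_response, out)

print(n)  # 40000
```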


                  • patf@well.com

                    #10
                    Re: Download excel file from web?

                    On Jul 28, 4:20 pm, "Guilherme Polo" <ggp...@gmail.com> wrote:
                    On Mon, Jul 28, 2008 at 8:04 PM, p...@well.com <p...@well.com> wrote:
                    OK:
                    >
                    response = urllib2.urlopen('http://www.mscibarra.com/webapp/indexperf/excel?priceLevel=0&scope=0&currency=15&style=C&size=36&market=1897&asOf=Jul+25%2C+2008&export=Excel_IEIPerfRegional').read(1000000)
                    # print(response)
                    f = open("c:\\msci.xls",'w')
                    f.write(response)
                    >
                    I would initially change that to:
                    >
                    response = urllib2.urlopen('http://www.mscibarra.com/webapp/indexperf/excel?priceLevel=0&scope=0&...')
                    >
                    f = open("c:\\msci.xls", "wb")
                    for line in response:
                        f.write(line)
                    f.close()
                    >
                    and then..
                    >
                    >
                    >
                    OK this makes the file, and there's a c:\msci.xls in place and it's
                    about the right size. But whether I make the second param to open 'w'
                    or 'wb', when I try to open msci.xls from the Windows file explorer,
                    excel tells me that the file is corrupted.
                    >
                    try it.
                    >
                    >
                    >>
                    --
                    -- Guilherme H. Polo Goncalves
                    A simple f.write(response) does work (click on a single row in Excel
                    and you get a single row).

                    But I can see that what you recommend, Guilherme, is probably safer -
                    thanx.

                    pat
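Putting the thread's conclusions together (binary mode, no guessed read size, streaming, and reliable closing), a Python 3 version might look like the sketch below. urllib.request.urlopen is the Python 3 counterpart of urllib2.urlopen; the function name download and the file names are illustrative only. To keep it runnable without network access, the demo fetches a local file through a file:// URL, which urlopen also accepts; for the MSCI export one would pass the http:// URL from the thread instead. The first 8 bytes of the fake file mimic the OLE2 header that legacy .xls files begin with.

```python
import os
import pathlib
import shutil
import tempfile
import urllib.request


def download(url, dest_path, chunk_size=64 * 1024):
    """Save the resource at url to dest_path, byte for byte.

    Both the response and the output file are context managers, so they
    are closed even if an error occurs part-way through.
    Returns the size of the saved file in bytes.
    """
    with urllib.request.urlopen(url) as response, open(dest_path, "wb") as out:
        shutil.copyfileobj(response, out, chunk_size)
    return os.path.getsize(dest_path)


# Offline demo: serve a local file through a file:// URL.
tmpdir = tempfile.mkdtemp()
source = pathlib.Path(tmpdir, "source.xls")
source.write_bytes(b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1" + b"\n\r\x00" * 1000)

size = download(source.as_uri(), os.path.join(tmpdir, "msci.xls"))
print(size)  # 3008
```

Compared with a single read(1000000), this does not silently truncate files larger than the guessed size, and memory use stays bounded by chunk_size.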


                    • MRAB

                      #11
                      Re: Download excel file from web?

                      On Jul 29, 12:41 am, "p...@well.com" <p...@well.com> wrote:
                      On Jul 28, 4:20 pm, "Guilherme Polo" <ggp...@gmail.com> wrote:
                      >print(response .fp.name)
                      >print(xlApp.na me)
                      >xlApp.write
                      >xlApp.Close
                      >
                      >
                      Woops hit Send when I wanted Preview.  Looks like the html
                      tag
                      doesn't work from groups.google.c om (nice).
                      >
                      Anway, in test 1 above, I determined how to instantiate an excel
                      object; put some stuff in it; then save to disk.
                      >
                      So, in theory, I'm retrieving my excel spreadsheet with
                      >
                      response = urllib2.urlopen ()
                      >
                      Except what then do I do with this?
                      >
                      Well for one read some of the urllib2 documentation and foundthe
                      Request class with the method has_data() on it.  It returnsFalse.
                      Hmm that's not encouraging.
                      >
                      I supposed the trick to understand what urllib2.urlopen is returning
                      to me; rummage around in there; and hopefully find my excel file.
                      >
                      I use pdb to debug.  This is interesting:
                      >
                      (Pdb) dir(response)
                      ['__doc__', '__init__', '__iter__', '__module__', '__repr__','clo se',
                      'code', '
                      fileno', 'fp', 'geturl', 'headers', 'info', 'msg', 'next', 'read',
                      'readline', '
                      readlines', 'url']
                      (Pdb)
                      >
                      I suppose the members with __*_ are methods; and the names without the
                      underbars are attributes (variables) (?).
                      >
                      No, these are the names of all attributes and methods. read is a method,
                      for example.
                      >
                      >right - I got it backwards.
                      >
                      Or maybe this isn't at all the right direction to take (maybethere
                      are much better modules to do this stuff).  Would be happy to learn if
                      that's the case (and if that gets the job done for me).
                      >
                      The docs (http://docs.python.org/lib/module-urllib2.html) are pretty
                      clear on this:
                      >
                      """
                      This function returns a file-like object with two additional methods:
                      """
                      >
                      And then for file-like objects:
                      >>
                      """
                      read(   [size])
                           Read at most size bytes from the file (less if the read hits EOF
                      before obtaining size bytes). If the size argument is negative or
                      omitted, read all data until EOF is reached. The bytes are returned as a
                      string object. An empty string is returned when EOF is encountered
                      immediately. (For certain files, like ttys, it makes sense to continue
                      reading after an EOF is hit.) Note that this method may call the
                      underlying C function fread() more than once in an effort to acquire as
                      close to size bytes as possible. Also note that when in non-blocking
                      mode, less data than what was requested may be returned, even if no size
                      parameter was given.
                      """
                      >
                      Diez
                      >
                      >Just stumbled upon .read:
                      >
                      >response = urllib2.urlopen ('http://www.mscibarra.c om/webapp/indexperf/
                      >excel?
                      >priceLevel=0&s cope=0&currency =15&style=C&siz e=36&market=189 7&asOf=Jul
                      >+25%2C+2008&ex port=Excel_IEIP erfRegional').r ead
                      >
                      >Now the question is: what to do with this?  I'll look at the
                      >documentatio n that you point to.
                      >
                      >thanx - pat
                      >
                      Or rather (next iteration):
                      >
                      response = urllib2.urlopen ('http://www.mscibarra.c om/webapp/indexperf/
                      excel?
                      priceLevel=0&sc ope=0&currency= 15&style=C&size =36&market=1897 &asOf=Jul
                      +25%2C+2008&exp ort=Excel_IEIPe rfRegional').re ad(1000000)
                      >
                      The file is generally something like 26 KB so specifying 1,000,000
                      seems like a good idea (first approximation).
                      >
                      And then when I do:
                      >
                      print(response)
                      >
                      I get a whole lot of garbage (and some non-garbage), so I know I'm
                      onto something.
                      >
                      When I read the .read documentation further, it says that read() has
                      returned the data as a string object.  Now - how do I convince Python
                      that the string object is in fact an excel file - and save it to disk?
                      >
                      >You don't need to convince Python, just write it to a file.
                      >More reading for you:http://docs.python.org/tut/node9.html
                      >>
                      >--
                      >-- Guilherme H. Polo Goncalves
                      >
                      OK:
                      >
                      response = urllib2.urlopen('http://www.mscibarra.com/webapp/indexperf/excel?priceLevel=0&scope=0&currency=15&style=C&size=36&market=1897&asOf=Jul+25%2C+2008&export=Excel_IEIPerfRegional').read(1000000)
                      # print(response)
                      f = open("c:\\msci.xls", 'w')
                      f.write(response)
                      >
                      I would initially change that to:
                      >
                      response = urllib2.urlopen('http://www.mscibarra.com/webapp/indexperf/excel?priceLevel=0&scope=0&...')
                      >
                      f = open("c:\\msci.xls", "wb")
                      for line in response:
                          f.write(line)
                      f.close()
                      >
                      and then..
                      >
                      OK this makes the file, and there's a c:\msci.xls in place and it's
                      about the right size. But whether I make the second param to open 'w'
                      or 'wb', when I try to open msci.xls from the Windows file explorer,
                      excel tells me that the file is corrupted.
                      >
                      try it.
                      >>
                      --
                      -- Guilherme H. Polo Goncalves
                      >
                      A simple f.write(response) does work (click on a single row in Excel
                      and you get a single row).
                      >
                      But I can see that what you recommend, Guilherme, is probably safer -
                      thanx.
                      >
                      pat
                      If response contains a string then:

                      for line in response:
                          f.write(line)

                      will actually be writing the string one character at a time!
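MRAB's point is easy to check. A small Python 3 sketch (in the thread's Python 2, str plays the role that bytes plays here); the data value is a hypothetical stand-in for the download, and io.BytesIO stands in for the file-like object returned by urlopen:

```python
import io

data = b"row1\nrow2\n"  # hypothetical downloaded content

# Iterating a str yields one-character strings, so a loop writes char by char.
print([ch for ch in "ab\n"])                  # ['a', 'b', '\n']

# Iterating bytes in Python 3 yields integers, which a binary file rejects.
print([byte for byte in data[:3]])            # [114, 111, 119]

# Iterating a file-like object yields lines, split after each b"\n".
print([line for line in io.BytesIO(data)])    # [b'row1\n', b'row2\n']

# Writing the whole string in one call avoids per-item writes entirely.
out = io.BytesIO()
out.write(data)
print(out.getvalue() == data)                 # True
```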


                      • patf@well.com

                        #12
                        Re: Download excel file from web?

                        On Jul 28, 5:39 pm, MRAB <goo...@mrabarnett.plus.com> wrote:
                        If response contains a string then:
                        >
                        for line in response:
                            f.write(line)
                        >
                        will actually be writing the string one character at a time!
                        Hmm. In this case, response was a string object (that's what
                        urllib2.urlopen().read() returns).

                        My concern was with line-ending characters (delimiters). I was
                        thinking that if the string object doesn't contain line-ending
                        delimiters, then maybe the for loop was better. But that raises the
                        question of how

                        for line in response:

                        recognizes lines (as defined by line-ending delimiters) in the first
                        place.

                        pat
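On the question of how "for line in response" finds lines: for a binary file-like object, iteration simply splits after every b"\n" byte, regardless of what the data means, and the last piece may not end with one. Binary spreadsheet data therefore comes back in arbitrary-length pieces, but concatenating them reproduces the input exactly, which is why the loop is still lossless when the output file is opened with 'wb'. A Python 3 sketch, with io.BytesIO as an assumed stand-in for the HTTP response and a hypothetical payload:

```python
import io

# Hypothetical binary payload: "\r\n", NUL bytes, and no trailing newline.
payload = b"\x00\x01\r\nabc\n\xff\xfe\x00tail"

pieces = list(io.BytesIO(payload))
print(pieces)  # [b'\x00\x01\r\n', b'abc\n', b'\xff\xfe\x00tail']

# Every piece except possibly the last ends with b"\n" ...
print(all(p.endswith(b"\n") for p in pieces[:-1]))  # True
# ... and joining the pieces gives back the original bytes unchanged.
print(b"".join(pieces) == payload)                  # True
```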


                        • Guilherme Polo

                          #13
                          Re: Download excel file from web?

                          On Mon, Jul 28, 2008 at 9:39 PM, MRAB <google@mrabarnett.plus.com> wrote:
                          On Jul 29, 12:41 am, "p...@well. com" <p...@well.comw rote:
                          >On Jul 28, 4:20 pm, "Guilherme Polo" <ggp...@gmail.c omwrote:
                          >>
                          >>
                          >>
                          On Mon, Jul 28, 2008 at 8:04 PM, p...@well.com <p...@well.comw rote:
                          On Jul 28, 3:52 pm, "Guilherme Polo" <ggp...@gmail.c omwrote:
                          >On Mon, Jul 28, 2008 at 7:43 PM, p...@well.com <p...@well.comw rote:
                          On Jul 28, 3:33 pm, "p...@well. com" <p...@well.comw rote:
                          >On Jul 28, 3:29 pm, "Diez B. Roggisch" <de...@nospam.w eb.dewrote:
                          >>
                          p...@well.com schrieb:
                          >>
                          On Jul 28, 3:00 pm, "p...@well. com" <p...@well.comw rote:
                          >Hi - experienced programmer but this is my first Python program.
                          >>
                          >This URL will retrieve an excel spreadsheet containing (that day's)
                          >msci stock index returns.
                          >>>>
                          >Want to write python to download and save the file.
                          >>
                          >So far I've arrived at this:
                          >>
                          >
                          ># import pdb
                          >import urllib2
                          >from win32com.client import Dispatch
                          >>
                          >xlApp = Dispatch("Excel .Application")
                          >>
                          ># test 1
                          ># xlApp.Workbooks .Add()
                          ># xlApp.ActiveShe et.Cells(1,1).V alue = 'A'
                          ># xlApp.ActiveWor kbook.ActiveShe et.Cells(2,1).V alue = 'B'
                          ># xlBook = xlApp.ActiveWor kbook
                          ># xlBook.SaveAs(F ilename='C:\\te st.xls')
                          >>
                          ># pdb.set_trace()
                          >response = urllib2.urlopen ('http://www.mscibarra.c om/webapp/indexperf/
                          >excel?
                          >priceLevel=0&s cope=0&currency =15&style=C&siz e=36&market=189 7&asOf=Jul
                          >+25%2C+2008&ex port=Excel_IEIP erfRegional')
                          ># test 2 - returns check = False
                          >check_for_da ta = urllib2.Request ('http://www.mscibarra.c om/webapp/
                          >indexperf/excel?
                          >priceLevel=0&s cope=0&currency =15&style=C&siz e=36&market=189 7&asOf=Jul
                          >+25%2C+2008&ex port=Excel_IEIP erfRegional').h as_data()
                          >>
                          >xlApp = response.fp
                          >print(response .fp.name)
                          >print(xlApp.na me)
                          >xlApp.write
                          >xlApp.Close
                          >
                          >>
                          Woops hit Send when I wanted Preview. Looks like the html
                          tag
                          doesn't work from groups.google.c om (nice).
                          >>
                          Anway, in test 1 above, I determined how to instantiate an excel
                          object; put some stuff in it; then save to disk.
                          >>
                          So, in theory, I'm retrieving my excel spreadsheet with
                          >>
                          response = urllib2.urlopen ()
                          >>
                          Except what then do I do with this?
                          >>
                          Well for one read some of the urllib2 documentation and found the
                          Request class with the method has_data() on it. It returns False.
                          Hmm that's not encouraging.
                          >>
                          I supposed the trick to understand what urllib2.urlopen is returning
                          to me; rummage around in there; and hopefully find my excel file.
                          >>
                          I use pdb to debug. This is interesting:
                          >>
                          (Pdb) dir(response)
                          ['__doc__', '__init__', '__iter__', '__module__', '__repr__', 'close',
                          'code', '
                          fileno', 'fp', 'geturl', 'headers', 'info', 'msg', 'next', 'read',
                          'readline', '
                          readlines', 'url']
                          (Pdb)
                          >>
                          I suppose the members with __*_ are methods; and the names without the
                          underbars are attributes (variables) (?).
                          >>
                          No, these are the names of all attributes and methods. read is a method,
                          for example.
                          >>
                          >right - I got it backwards.
                          >>
                          Or maybe this isn't at all the right direction to take (maybe there
                          are much better modules to do this stuff). Would be happy to learn if
                          that's the case (and if that gets the job done for me).
                          >>
                          The docs (http://docs.python.org/lib/module-urllib2.html) are pretty
                          clear on this:
                          >>
                          """
                          This function returns a file-like object with two additional methods:
                          """
                          >>
                          And then for file-like objects:
                          >>>>
                          """
                          read( [size])
                          Read at most size bytes from the file (less if the read hits EOF
                          before obtaining size bytes). If the size argument is negative or
                          omitted, read all data until EOF is reached. The bytes are returned as a
                          string object. An empty string is returned when EOF is encountered
                          immediately. (For certain files, like ttys, it makes sense to continue
                          reading after an EOF is hit.) Note that this method may call the
                          underlying C function fread() more than once in an effort to acquire as
                          close to size bytes as possible. Also note that when in non-blocking
                          mode, less data than what was requested may be returned, even if no size
                          parameter was given.
                          """
                          >>
                          Diez
                          >>
                          >Just stumbled upon .read:
                          >>
                          >response = urllib2.urlopen ('http://www.mscibarra.c om/webapp/indexperf/
                          >excel?
                          >priceLevel=0&s cope=0&currency =15&style=C&siz e=36&market=189 7&asOf=Jul
                          >+25%2C+2008&ex port=Excel_IEIP erfRegional').r ead
                          >>
                          >Now the question is: what to do with this? I'll look at the
                          >documentatio n that you point to.
                          >>
                          >thanx - pat
                          >>
                          Or rather (next iteration):
                          >>
                          response = urllib2.urlopen ('http://www.mscibarra.c om/webapp/indexperf/
                          excel?
                          priceLevel=0&sc ope=0&currency= 15&style=C&size =36&market=1897 &asOf=Jul
                          +25%2C+2008&exp ort=Excel_IEIPe rfRegional').re ad(1000000)
                          >>
                          The file is generally something like 26 KB so specifying 1,000,000
                          seems like a good idea (first approximation).
                          >>
                          And then when I do:
                          >>
                          print(response)
                          >>
                          I get a whole lot of garbage (and some non-garbage), so I know I'm
                          onto something.
                          >>
                          When I read the .read documentation further, it says that read() has
                          returned the data as a string object. Now - how do I convince Python
                          that the string object is in fact an excel file - and save it to disk?
                          >>
                          >You don't need to convince Python, just write it to a file.
                          >More reading for you:http://docs.python.org/tut/node9.html
                          >>>>
                          >--
                          >-- Guilherme H. Polo Goncalves
                          >>
                          OK:
                          >>
                          response = urllib2.urlopen('http://www.mscibarra.com/webapp/indexperf/excel?priceLevel=0&scope=0&currency=15&style=C&size=36&market=1897&asOf=Jul+25%2C+2008&export=Excel_IEIPerfRegional').read(1000000)
                          # print(response)
                          f = open("c:\\msci.xls",'w')
                          f.write(response)
                          >>
                          I would initially change that to:
                          >>
                          response = urllib2.urlopen('http://www.mscibarra.com/webapp/indexperf/excel?priceLevel=0&scope=0&...')
                          >>
                          f = open("c:\\msci.xls", "wb")
                          for line in response:
                              f.write(line)
                          f.close()
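Iterating over the response, as in the loop above, splits binary data wherever a b"\n" byte happens to occur. That is harmless, because the pieces are written back in order. A different technique, not the one suggested in the thread, is to read fixed-size chunks with read(n) until it returns an empty bytes object (EOF). The sketch below assumes Python 3 and uses io.BytesIO as a stand-in for the urlopen() response, since both are file-like objects with read(). copy_in_chunks and the 16 KB chunk size are hypothetical choices.

```python
import io


def copy_in_chunks(src, dst, chunk_size=16 * 1024):
    """Copy everything from file-like src to file-like dst; return byte count."""
    total = 0
    while True:
        chunk = src.read(chunk_size)  # at most chunk_size bytes
        if not chunk:                 # b"" means EOF was reached
            break
        dst.write(chunk)
        total += len(chunk)
    return total


# Usage with in-memory stand-ins for the HTTP response and the output file.
response = io.BytesIO(bytes(range(256)) * 200)  # 51,200 bytes of test data
out = io.BytesIO()
copied = copy_in_chunks(response, out)
assert copied == 51200
assert out.getvalue() == bytes(range(256)) * 200
```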
                          >>
                          and then..
                          >>
                          OK this makes the file, and there's a c:\msci.xls in place and it's
                          about the right size. But whether I make the second param to open 'w'
                          or 'wb', when I try to open msci.xls from the Windows file explorer,
                          excel tells me that the file is corrupted.
                          >>
                          try it.
                          >>>>
                          --
                          -- Guilherme H. Polo Goncalves
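Why the 'w' vs 'wb' choice matters: in Python 2 on Windows, 'w' opens a text-mode file, and every '\n' written becomes '\r\n'. Any 0x0A byte inside binary data therefore gains an extra 0x0D, so the saved file no longer matches what the server sent. Python 3 refuses to write bytes to a text-mode file at all, so this sketch only simulates the old behaviour. It decodes with latin-1, which maps every byte value to exactly one character, and forces newline="\r\n". The helper text_mode_roundtrip is hypothetical. Separately, a file that is still open (for example, in an interactive session) may not yet have its buffered data flushed to disk, which is another way to end up with a "corrupted" download.

```python
import os
import tempfile


def text_mode_roundtrip(data: bytes) -> bytes:
    """Simulate writing binary data through a Windows-style text-mode file."""
    path = os.path.join(tempfile.mkdtemp(), "sim.xls")
    # latin-1 maps bytes 0-255 one-to-one to characters, so only the
    # newline translation (\n -> \r\n) changes anything.
    with open(path, "w", encoding="latin-1", newline="\r\n") as f:
        f.write(data.decode("latin-1"))
    with open(path, "rb") as f:
        return f.read()


original = b"\x00\x0a\x01\x0a\xff"  # binary data that happens to contain 0x0A
mangled = text_mode_roundtrip(original)
print(len(original), len(mangled))  # 5 7
print(mangled == original)          # False
```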
                          >>
                          >A simple f.write(response) does work (click on a single row in Excel
                          >and you get a single row).
                          >>
                          >But I can see that what you recommend Guilherme is probably safer -
                          >thanx.
                          >>
                          >pat
                          >
                          If response contains a string then:
                          >
                          Did you notice I removed the read(...) part?
                          for line in response:
                              f.write(line)
                          >
                          will actually be writing the string one character at a time!
                          --

                          >


                          --
                          -- Guilherme H. Polo Goncalves
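MRAB's point, sketched in Python 3 with made-up data. io.BytesIO stands in for the response object. Iterating a file-like object yields lines. Iterating the string returned by read() yields one element per character, so the write loop makes one write() call per character. One caveat for Python 3: read() there returns bytes, and iterating bytes yields integers, so f.write(line) would raise an error rather than just run slowly. The Python 2 str behaviour is modelled with a Python 3 str.

```python
import io

data = b"row1\nrow2\n"            # stand-in for the downloaded bytes
response = io.BytesIO(data)       # stand-in for the urlopen() response

# Iterating a file-like object yields lines (chunks ending in b"\n").
print(list(response))             # [b'row1\n', b'row2\n']

# Iterating a Python 2 str (modelled here with a Python 3 str) yields
# one-character strings, so a write loop over it writes one char per call.
print(list("ab\nc"))              # ['a', 'b', '\n', 'c']

# Iterating Python 3 bytes yields integers, so f.write(line) would fail.
print(list(b"ab"))                # [97, 98]
```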


                          • patf@well.com

                            #14
                            Re: Download excel file from web?

                            On Jul 28, 6:05 pm, "Guilherme Polo" <ggp...@gmail.com> wrote:
                            On Mon, Jul 28, 2008 at 9:39 PM, MRAB <goo...@mrabarnett.plus.com> wrote:
                            On Jul 29, 12:41 am, "p...@well.com" <p...@well.com> wrote:
                            On Jul 28, 4:20 pm, "Guilherme Polo" <ggp...@gmail.com> wrote:
                            >
                            On Mon, Jul 28, 2008 at 8:04 PM, p...@well.com <p...@well.com> wrote:
                            On Jul 28, 3:52 pm, "Guilherme Polo" <ggp...@gmail.com> wrote:
                            >On Mon, Jul 28, 2008 at 7:43 PM, p...@well.com <p...@well.com> wrote:
                            On Jul 28, 3:33 pm, "p...@well.com" <p...@well.com> wrote:
                            >On Jul 28, 3:29 pm, "Diez B. Roggisch" <de...@nospam.web.de> wrote:
                            >
                            p...@well.com wrote:
                            >
                            On Jul 28, 3:00 pm, "p...@well.com" <p...@well.com> wrote:
                            >Hi - experienced programmer but this is my first Python program.
                            >
                            >This URL will retrieve an excel spreadsheet containing (that day's)
                            >msci stock index returns.
                            >>
                            >Want to write python to download and save the file.
                            >
                            >So far I've arrived at this:
                            >
                            >
                            ># import pdb
                            >import urllib2
                            >from win32com.client import Dispatch
                            >
                            >xlApp = Dispatch("Excel.Application")
                            >
                            ># test 1
                            ># xlApp.Workbooks.Add()
                            ># xlApp.ActiveSheet.Cells(1,1).Value = 'A'
                            ># xlApp.ActiveWorkbook.ActiveSheet.Cells(2,1).Value = 'B'
                            ># xlBook = xlApp.ActiveWorkbook
                            ># xlBook.SaveAs(Filename='C:\\test.xls')
                            >
                            ># pdb.set_trace()
                            >response = urllib2.urlopen('http://www.mscibarra.com/webapp/indexperf/excel?priceLevel=0&scope=0&currency=15&style=C&size=36&market=1897&asOf=Jul+25%2C+2008&export=Excel_IEIPerfRegional')
                            ># test 2 - returns check = False
                            >check_for_data = urllib2.Request('http://www.mscibarra.com/webapp/indexperf/excel?priceLevel=0&scope=0&currency=15&style=C&size=36&market=1897&asOf=Jul+25%2C+2008&export=Excel_IEIPerfRegional').has_data()
                            >
                            >xlApp = response.fp
                            >print(response.fp.name)
                            >print(xlApp.name)
                            >xlApp.write
                            >xlApp.Close
                            >
                            >
                            Whoops, hit Send when I wanted Preview. Looks like the html
                            tag doesn't work from groups.google.com (nice).
                            >
                            Anyway, in test 1 above, I determined how to instantiate an Excel
                            object, put some stuff in it, then save it to disk.
                            >
                            So, in theory, I'm retrieving my excel spreadsheet with
                            >
                            response = urllib2.urlopen()
                            >
                            Except what then do I do with this?
                            >
                            Well, for one, I read some of the urllib2 documentation and found the
                            Request class with the method has_data() on it. It returns False.
                            Hmm, that's not encouraging.
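For what it's worth, has_data() describes the request, not the response. In Python 2's urllib2 it reports whether the Request carries a body to send (POST data), so False is expected for a plain GET and says nothing about what the server returns. Python 3.4 removed has_data(), so this sketch checks the equivalent Request.data attribute instead. The example.com URL is a placeholder, and nothing goes over the network because the Request objects are only built, never opened.

```python
from urllib.request import Request

# A plain GET: no request body, hence has_data() was False in urllib2.
get_req = Request("http://www.example.com/report.xls")
print(get_req.data is None, get_req.get_method())    # True GET

# Supplying a body turns it into a POST; this is the case has_data() detects.
post_req = Request("http://www.example.com/report.xls", data=b"a=1")
print(post_req.data is None, post_req.get_method())  # False POST
```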
                            >
                            I suppose the trick is to understand what urllib2.urlopen is returning
                            to me, rummage around in there, and hopefully find my Excel file.
                            >
                            I use pdb to debug.  This is interesting:
                            >
                            (Pdb) dir(response)
                            ['__doc__', '__init__', '__iter__', '__module__', '__repr__', 'close',
                            'code', 'fileno', 'fp', 'geturl', 'headers', 'info', 'msg', 'next', 'read',
                            'readline', 'readlines', 'url']
                            (Pdb)
                            >
                            I suppose the members with __*__ are methods, and the names without the
                            underbars are attributes (variables) (?).
                            >
                            No, these are the names of all attributes and methods. read is a method,
                            for example.
                            >
                            >right - I got it backwards.
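A quick way to tell the two kinds of names apart, in a script or at the pdb prompt, is callable(). Methods are callable and plain data attributes usually are not. In the dir() listing above, for example, read and geturl are methods while code, fp, headers and url are data. This sketch assumes Python 3 and uses io.BytesIO as a stand-in for the urlopen() response, so the exact names it finds differ from that listing.

```python
import io

obj = io.BytesIO(b"data")   # stand-in for a urlopen() response

# Split the public names reported by dir() into methods and plain attributes.
names = [n for n in dir(obj) if not n.startswith("_")]
methods = [n for n in names if callable(getattr(obj, n))]
attributes = [n for n in names if not callable(getattr(obj, n))]

print("read" in methods)       # True
print("closed" in attributes)  # True
```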
                            >
                            Or maybe this isn't at all the right direction to take (maybe there
                            are much better modules to do this stuff).  Would be happy to learn if
                            that's the case (and if that gets the job done for me).
                            >
                            The docs (http://docs.python.org/lib/module-urllib2.html) are pretty
                            clear on this:
                            >
                            """
                            This function returns a file-like object with two additional methods:
                            """
                            >
                            And then for file-like objects:
                            >>
                            """
                            read([size])
                                 Read at most size bytes from the file (less if the read hits EOF
                            before obtaining size bytes). If the size argument is negative or
                            omitted, read all data until EOF is reached. The bytes are returned as a
                            string object. An empty string is returned when EOF is encountered
                            immediately. (For certain files, like ttys, it makes sense to continue
                            reading after an EOF is hit.) Note that this method may call the
                            underlying C function fread() more than once in an effort to acquire as
                            close to size bytes as possible. Also note that when in non-blocking
                            mode, less data than what was requested may be returned, even if no size
                            parameter was given.
                            """
                            >
                            Diez
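The read([size]) rules quoted above, illustrated in Python 3. io.BytesIO stands in for the response, and read() returns bytes rather than a str. The size argument is only an upper bound: asking for more than exists returns what is there, an empty result signals EOF, and omitting the size (or passing a negative one) reads to the end, so no size guess is needed.

```python
import io

f = io.BytesIO(b"x" * 26_000)   # roughly the size of the spreadsheet

first = f.read(1_000_000)       # asks for more than exists: returns what is there
print(len(first))               # 26000
print(f.read(10))               # b'' -> empty bytes means EOF

f.seek(0)
print(len(f.read()))            # 26000: no size argument reads until EOF
print(len(f.read(-1)))          # 0: negative size also means "to EOF", already there
```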
                            >
                            >Just stumbled upon .read:
                            >
                            >response = urllib2.urlopen('http://www.mscibarra.com/webapp/indexperf/excel?priceLevel=0&scope=0&currency=15&style=C&size=36&market=1897&asOf=Jul+25%2C+2008&export=Excel_IEIPerfRegional').read
                            >
                            >Now the question is: what to do with this?  I'll look at the
                            >documentation that you point to.
                            >
                            >thanx - pat
                            >
                            Or rather (next iteration):
                            >
                            response = urllib2.urlopen('http://www.mscibarra.com/webapp/indexperf/excel?priceLevel=0&scope=0&currency=15&style=C&size=36&market=1897&asOf=Jul+25%2C+2008&export=Excel_IEIPerfRegional').read(1000000)
                            >
                            The file is generally something like 26 KB so specifying 1,000,000
                            seems like a good idea (first approximation).
                            >
                            And then when I do:
                            >
                            print(response)
                            >
                            I get a whole lot of garbage (and some non-garbage), so I know I'm
                            onto something.
                            >
                            When I read the .read documentation further, it says that read() has
                            returned the data as a string object.  Now - how do I convince Python
                            that the string object is in fact an excel file - and save it to disk?
                            >
                            >You don't need to convince Python, just write it to a file.
                            >More reading for you: http://docs.python.org/tut/node9.html
                            >>
                            >--
                            >-- Guilherme H. Polo Goncalves
                            >
                            OK:
                            >
                            response = urllib2.urlopen('http://www.mscibarra.com/webapp/indexperf/excel?priceLevel=0&scope=0&currency=15&style=C&size=36&market=1897&asOf=Jul+25%2C+2008&export=Excel_IEIPerfRegional').read(1000000)
                            # print(response)
                            f = open("c:\\msci.xls",'w')
                            f.write(response)
                            >
                            I would initially change that to:
                            >
                            response = urllib2.urlopen('http://www.mscibarra.com/webapp/indexperf/excel?priceLevel=0&scope=0&...')
                            >
                            f = open("c:\\msci.xls", "wb")
                            for line in response:
                                f.write(line)
                            f.close()
                            >
                            and then..
                            >
                            OK this makes the file, and there's a c:\msci.xls in place and it's
                            about the right size. But whether I make the second param to open 'w'
                            or 'wb', when I try to open msci.xls from the Windows file explorer,
                            excel tells me that the file is corrupted.
                            >
                            try it.
                            >>
                            --
                            -- Guilherme H. Polo Goncalves
                            >
                            A simple f.write(response) does work (click on a single row in Excel
                            and you get a single row).
                            >
                            But I can see that what you recommend Guilherme is probably safer -
                            thanx.
                            >
                            pat
                            >
                            If response contains a string then:
                            >
                            Did you notice I removed the read(...) part?
                            >
                            for line in response:
                               f.write(line)
                            >
                            will actually be writing the string one character at a time!
                            --
                            http://mail.python.org/mailman/listinfo/python-list
                            >
                            --
                            -- Guilherme H. Polo Goncalves
                            Actually, no, I didn't, Guilherme (although I'll take it out now).

                            Would leaving the .read() in urllib2.urlopen().read() imply, as MRAB
                            seems to indicate, that the following for loop would act byte-by-byte?
                            And if so, how?

                            Even with the .read() in, it was very fast. But it looks like it
                            won't hurt (and very possibly helps) to take it out.
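Putting the thread's advice together as a minimal Python 3 sketch (urllib.request replaces urllib2 there): open the URL and stream the response straight into a file opened with "wb". Nothing is read into one big string first, and both the response and the file are closed when the with-block ends. download is a hypothetical helper name, and shutil.copyfileobj performs the chunked read/write loop. The usage example reads from a local file:// URL so it runs without network access. With the real MSCI URL the call would look the same, though that URL may no longer be served.

```python
import shutil
import tempfile
import urllib.request
from pathlib import Path


def download(url: str, dest: str) -> int:
    """Save the resource at url to dest byte-for-byte; return its size."""
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)  # copies in chunks until EOF
    return Path(dest).stat().st_size


# Usage: serve a local file through a file:// URL instead of the web.
tmp = Path(tempfile.mkdtemp())
source = tmp / "source.xls"
source.write_bytes(b"\x00\x0a\r\n" * 1000)
size = download(source.as_uri(), str(tmp / "msci.xls"))
print(size)  # 4000
```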

                            pat
