Re: Download excel file from web?
On Tue, Jul 29, 2008 at 1:47 AM, patf@well.com <patf@well.com> wrote:
On Jul 28, 6:05 pm, "Guilherme Polo" <ggp...@gmail.com> wrote:
.read() returns a string, so yes.
The point in removing the .read(xxxxx) is that you no longer need to
guess how long the file is in order to read it entirely.
--
-- Guilherme H. Polo Goncalves
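A minimal sketch of that idea in Python 3, where urllib2.urlopen became urllib.request.urlopen. It assumes the server sends the file body directly; `save_response` and `download` are hypothetical helper names, and io.BytesIO stands in for the HTTP response so the demo runs offline. shutil.copyfileobj keeps reading fixed-size chunks until read() returns an empty bytes object, so the file's length never has to be guessed and the whole file is never held in memory at once.

```python
import io
import os
import shutil
import tempfile
import urllib.request


def save_response(response, path, chunk_size=64 * 1024):
    """Copy any file-like object to `path` in binary mode (hypothetical helper).

    shutil.copyfileobj calls response.read(chunk_size) repeatedly and stops
    when it gets an empty bytes object, i.e. at EOF, so no size guess is needed.
    """
    with open(path, "wb") as out:
        shutil.copyfileobj(response, out, chunk_size)


def download(url, path):
    """Python 3 counterpart of urllib2.urlopen(url) plus a streamed save."""
    with urllib.request.urlopen(url) as response:
        save_response(response, path)


# Offline demo: io.BytesIO stands in for the HTTP response, so nothing is fetched.
payload = bytes(range(256)) * 100          # 25,600 bytes, close to the ~26 KB file
demo_path = os.path.join(tempfile.mkdtemp(), "msci.xls")
save_response(io.BytesIO(payload), demo_path, chunk_size=1000)

with open(demo_path, "rb") as fh:
    saved = fh.read()
```

Against the real site this would be `download(url, r"c:\msci.xls")`, with `url` being the spreadsheet URL quoted in the thread.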
>On Mon, Jul 28, 2008 at 9:39 PM, MRAB <goo...@mrabarnett.plus.com> wrote:
>
Actually no, I didn't, Guilherme (although I'll take it out now).
>
Would leaving the .read() in urllib2.urlopen().read() imply, as MRAB would
seem to indicate, that the following for loop would act byte-by-byte?
And if so, how?
On Jul 29, 12:41 am, "p...@well.com" <p...@well.com> wrote:
>On Jul 28, 4:20 pm, "Guilherme Polo" <ggp...@gmail.com> wrote:
On Mon, Jul 28, 2008 at 8:04 PM, p...@well.com <p...@well.com> wrote:
On Jul 28, 3:52 pm, "Guilherme Polo" <ggp...@gmail.com> wrote:
>On Mon, Jul 28, 2008 at 7:43 PM, p...@well.com <p...@well.com> wrote:
On Jul 28, 3:33 pm, "p...@well.com" <p...@well.com> wrote:
>On Jul 28, 3:29 pm, "Diez B. Roggisch" <de...@nospam.web.de> wrote:
p...@well.com wrote:
On Jul 28, 3:00 pm, "p...@well.com" <p...@well.com> wrote:
>Hi - experienced programmer but this is my first Python program.
>This URL will retrieve an excel spreadsheet containing (that day's)
>msci stock index returns.
>Want to write python to download and save the file.
>So far I've arrived at this:
>
># import pdb
>import urllib2
>from win32com.client import Dispatch
>xlApp = Dispatch("Excel.Application")
># test 1
># xlApp.Workbooks.Add()
># xlApp.ActiveSheet.Cells(1,1).Value = 'A'
># xlApp.ActiveWorkbook.ActiveSheet.Cells(2,1).Value = 'B'
># xlBook = xlApp.ActiveWorkbook
># xlBook.SaveAs(Filename='C:\\test.xls')
># pdb.set_trace()
>response = urllib2.urlopen('http://www.mscibarra.com/webapp/indexperf/excel?priceLevel=0&scope=0&currency=15&style=C&size=36&market=1897&asOf=Jul+25%2C+2008&export=Excel_IEIPerfRegional')
># test 2 - returns check = False
>check_for_data = urllib2.Request('http://www.mscibarra.com/webapp/indexperf/excel?priceLevel=0&scope=0&currency=15&style=C&size=36&market=1897&asOf=Jul+25%2C+2008&export=Excel_IEIPerfRegional').has_data()
>xlApp = response.fp
>print(response.fp.name)
>print(xlApp.name)
>xlApp.write
>xlApp.Close
>
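Separately from the download question, the last two lines of that script only look attributes up: without parentheses, `xlApp.write` and `xlApp.Close` evaluate to bound methods (or raise AttributeError if no such attribute exists) and nothing is called. Attribute names are also case-sensitive, and Python file objects spell it `close`. A small check, using io.BytesIO purely as a stand-in file object:

```python
import io

buf = io.BytesIO(b"some bytes")

buf.close                     # just a lookup: yields the bound method, which is discarded
open_after_lookup = not buf.closed

buf.close()                   # the parentheses are what actually call it
closed_after_call = buf.closed

# Names are case-sensitive: file-like objects have close, not Close.
has_capital_close = hasattr(buf, "Close")
```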
Whoops, hit Send when I wanted Preview. Looks like the html
tag doesn't work from groups.google.com (nice).
Anyway, in test 1 above, I determined how to instantiate an excel
object, put some stuff in it, then save it to disk.
So, in theory, I'm retrieving my excel spreadsheet with
response = urllib2.urlopen()
But then what do I do with this?
Well, for one, I read some of the urllib2 documentation and found the
Request class with the method has_data() on it. It returns False.
Hmm that's not encouraging.
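has_data() describes the request being sent, not the response: in Python 2 it reports whether a body (POST data) is attached, so for a plain GET like this one it is False regardless of what the server would return. Python 3 dropped has_data() in favor of the Request.data attribute. A sketch with a placeholder URL (no request is actually sent):

```python
import urllib.request

url = "http://www.example.com/report?export=Excel"   # placeholder, not the MSCI URL

get_req = urllib.request.Request(url)                       # no body -> GET
post_req = urllib.request.Request(url, data=b"scope=0")     # body -> POST

# Python 2's req.has_data() corresponds to `req.data is not None` here.
get_has_body = get_req.data is not None
post_has_body = post_req.data is not None

get_method = get_req.get_method()
post_method = post_req.get_method()
```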
I suppose the trick is to understand what urllib2.urlopen is returning
to me, rummage around in there, and hopefully find my excel file.
I use pdb to debug. This is interesting:
(Pdb) dir(response)
['__doc__', '__init__', '__iter__', '__module__', '__repr__', 'close',
'code', 'fileno', 'fp', 'geturl', 'headers', 'info', 'msg', 'next',
'read', 'readline', 'readlines', 'url']
(Pdb)
I suppose the members with __*__ are methods, and the names without the
underbars are attributes (variables) (?).
No, these are the names of all attributes and methods. read is a method,
for example.
>right - I got it backwards.
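The distinction can be checked directly: dir() lists every name, and callable() separates methods from plain data attributes. The double-underscore names are Python's special names and can be either kind (`__init__` is a method, `__doc__` is not). A sketch with io.BytesIO standing in for the urlopen() result, since both are file-like objects:

```python
import io

# io.BytesIO stands in for the object urlopen() returns; both are file-like.
obj = io.BytesIO(b"data")

public_names = [name for name in dir(obj) if not name.startswith("_")]
methods = sorted(n for n in public_names if callable(getattr(obj, n)))
data_attributes = sorted(n for n in public_names if not callable(getattr(obj, n)))

init_is_method = callable(obj.__init__)
doc_is_method = callable(obj.__doc__)
```

On the real urllib2 response the same split would put `read`, `readline`, `geturl` and `info` among the methods and `code`, `msg`, `url`, `headers` and `fp` among the data attributes.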
Or maybe this isn't at all the right direction to take (maybe there
are much better modules to do this stuff). Would be happy to learn if
that's the case (and if that gets the job done for me).
The docs (http://docs.python.org/lib/module-urllib2.html) are pretty
clear on this:
"""
This function returns a file-like object with two additional methods:
"""
And then for file-like objects:
"""
read([size])
Read at most size bytes from the file (less if the read hits EOF
before obtaining size bytes). If the size argument is negative or
omitted, read all data until EOF is reached. The bytes are returned as a
string object. An empty string is returned when EOF is encountered
immediately. (For certain files, like ttys, it makes sense to continue
reading after an EOF is hit.) Note that this method may call the
underlying C function fread() more than once in an effort to acquire as
close to size bytes as possible. Also note that when in non-blocking
mode, less data than what was requested may be returned, even if no size
parameter was given.
"""
Diez
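The behavior quoted above is easy to see with an in-memory file; io.BytesIO stands in here for the downloaded data, and the sizes are chosen to match the roughly 26 KB file mentioned in the thread:

```python
import io

# A 26,000-byte in-memory file stands in for the ~26 KB spreadsheet.
f = io.BytesIO(b"x" * 26000)

first = f.read(10000)      # "at most size bytes": 10,000 are available, so exactly 10,000
rest = f.read(1000000)     # asks for more than remains: gets the other 16,000 and hits EOF
at_eof = f.read(10)        # EOF already reached: an empty bytes object

f.seek(0)
everything = f.read()      # size omitted: read all data until EOF
```

The second call has the same shape as the read(1000000) used in the thread: it returns everything only while the file stays smaller than the number guessed.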
>Just stumbled upon .read:
>response = urllib2.urlopen('http://www.mscibarra.com/webapp/indexperf/excel?priceLevel=0&scope=0&currency=15&style=C&size=36&market=1897&asOf=Jul+25%2C+2008&export=Excel_IEIPerfRegional').read
>Now the question is: what to do with this? I'll look at the
>documentation that you point to.
>thanx - pat
Or rather (next iteration):
response = urllib2.urlopen('http://www.mscibarra.com/webapp/indexperf/excel?priceLevel=0&scope=0&currency=15&style=C&size=36&market=1897&asOf=Jul+25%2C+2008&export=Excel_IEIPerfRegional').read(1000000)
The file is generally something like 26 KB, so specifying 1,000,000
seems like a good idea (first approximation).
And then when I do:
print(response)
I get a whole lot of garbage (and some non-garbage), so I know I'm
onto something.
When I read the .read documentation further, it says that read() has
returned the data as a string object. Now - how do I convince Python
that the string object is in fact an excel file - and save it to disk?
>You don't need to convince Python, just write it to a file.
>More reading for you: http://docs.python.org/tut/node9.html
>--
>-- Guilherme H. Polo Goncalves
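Put in code terms: the string returned by read() already is the file's bytes, so it only needs to be written out unchanged in binary mode. What can be worth checking is what the server actually sent. A heuristic sketch, assuming the export is meant to be a legacy binary .xls: such files are OLE2 compound documents that begin with the 8-byte signature D0 CF 11 E0 A1 B1 1A E1, .xlsx files are ZIP archives starting with `PK\x03\x04`, and an HTML page starts with markup. `describe_payload` is a hypothetical helper:

```python
# Legacy binary .xls workbooks are OLE2 compound files and begin with this signature.
OLE2_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")
ZIP_SIGNATURE = b"PK\x03\x04"          # .xlsx (and any other ZIP container)


def describe_payload(data):
    """Guess what a downloaded byte string contains (heuristic, hypothetical helper)."""
    if data.startswith(OLE2_SIGNATURE):
        return "binary .xls (OLE2)"
    if data.startswith(ZIP_SIGNATURE):
        return "ZIP container (e.g. .xlsx)"
    if data.lstrip().startswith(b"<"):
        return "markup (HTML/XML), not a binary workbook"
    return "unknown"


xls_like = OLE2_SIGNATURE + bytes(504)                 # fake header plus zero padding
html_like = b"  <html><body>Error</body></html>"
```

Run on the downloaded data before saving, a check like this separates a real workbook from an error page that merely has an .xls name.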
OK:
response = urllib2.urlopen('http://www.mscibarra.com/webapp/indexperf/excel?priceLevel=0&scope=0&currency=15&style=C&size=36&market=1897&asOf=Jul+25%2C+2008&export=Excel_IEIPerfRegional').read(1000000)
# print(response)
f = open("c:\\msci.xls",'w')
f.write(response)
I would initially change that to:
response = urllib2.urlopen('http://www.mscibarra.com/webapp/indexperf/excel?priceLevel=0&scope=0&...')
f = open("c:\\msci.xls", "wb")
for line in response:
    f.write(line)
f.close()
and then..
OK this makes the file, and there's a c:\msci.xls in place and it's
about the right size. But whether I make the second param to open 'w'
or 'wb', when I try to open msci.xls from the Windows file explorer,
excel tells me that the file is corrupted.
try it.
--
-- Guilherme H. Polo Goncalves
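On the 'w' versus 'wb' question: on Windows, Python 2's text mode ('w') translates every "\n" byte into "\r\n" on write, which silently corrupts binary data such as a spreadsheet, while 'wb' writes the bytes untouched. Since the error is reported with both modes, this is one thing to rule out rather than the whole explanation. Python 3 makes the split explicit (text-mode files only accept str), and its `newline` argument lets the same translation be reproduced on any platform:

```python
import os
import tempfile

payload = b"\xd0\xcf\x11\xe0\nbinary\ndata"      # arbitrary bytes that happen to contain \n
path = os.path.join(tempfile.mkdtemp(), "demo.bin")

# Binary mode: bytes go to disk exactly as given.
with open(path, "wb") as f:
    f.write(payload)
with open(path, "rb") as f:
    binary_roundtrip = f.read()

# Text mode with Windows-style newline translation, forced with newline="\r\n"
# so the effect shows up on any OS. latin-1 maps each byte 0-255 to one character.
with open(path, "w", encoding="latin-1", newline="\r\n") as f:
    f.write(payload.decode("latin-1"))
with open(path, "rb") as f:
    text_roundtrip = f.read()
```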
>A simple f.write(response) does work (click on a single row in Excel
>and you get a single row).
>But I can see that what you recommend Guilherme is probably safer -
>thanx.
>pat
>Did you notice I removed the read(...) part ?
If response contains a string then:
for line in response:
    f.write(line)
will actually be writing the string one character at a time!
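How many times that loop runs depends on what it iterates over. A file-like object (what urlopen() returns) yields one line per step; a string yields one item per character. In Python 2 each item is a one-character string, so the output is still correct, just written in tiny pieces. In Python 3, iterating a bytes object yields integers, so the same loop fails outright when those reach a binary file's write(). A sketch with io.BytesIO standing in for the response:

```python
import io

data = b"row 1\nrow 2\nrow 3\n"

# Iterating a file-like object: one line per step (3 steps here).
line_items = list(io.BytesIO(data))

# Iterating the bytes themselves: one item per byte (18 steps), and in Python 3
# each item is an int rather than a one-character string as in Python 2.
byte_items = list(data)

out = io.BytesIO()
for line in io.BytesIO(data):
    out.write(line)            # writes whole lines

# Writing one of those ints to a binary stream is rejected.
try:
    io.BytesIO().write(data[0])
    int_write_ok = True
except TypeError:
    int_write_ok = False
```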
--
>http://mail.python.org/mailman/listinfo/python-list
>--
>-- Guilherme H. Polo Goncalves