User Profile

erbrose · Sep 10 '10, 08:49 PM

Thanks, that does make more sense now!

erbrose · Sep 10 '10, 08:19 PM

Thanks so much...
Now, seeing as I don't really see something like this in the documentation, could you explain this line a bit more?
datetime.datetime(*time.strptime(s, "%Y-%m-%d %H:%M:%S")[:6])

I understand calling datetime.datetime, but why the * before time? And what exactly does the [:6] pertain to?
Thanks again for your help!!!
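
For anyone with the same question, here is a minimal sketch that splits that one-liner into steps. The sample timestamp and the variable names are made up for illustration; only the format string comes from the line above.

```python
import datetime
import time

s = "2010-09-10 20:19:00"  # hypothetical sample timestamp

# time.strptime returns a struct_time, a tuple-like object with nine fields:
# (year, month, day, hour, minute, second, weekday, yearday, isdst)
parsed = time.strptime(s, "%Y-%m-%d %H:%M:%S")

# [:6] keeps only the first six fields, which are exactly the
# positional arguments datetime.datetime expects
first_six = parsed[:6]

# * unpacks the tuple, so this call is the same as
# datetime.datetime(2010, 9, 10, 20, 19, 0)
dt = datetime.datetime(*first_six)

# On Python 2.5 and later, datetime can also parse the string directly
same = datetime.datetime.strptime(s, "%Y-%m-%d %H:%M:%S")
```

Without the *, datetime.datetime would receive a single tuple argument instead of six separate integers and raise a TypeError.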

erbrose · Aug 3 '10, 04:46 PM

wow, thanks dwblas!
That dramatically increased the search speed! Still a lot to learn with Python..
Very much appreciated!
Eric

erbrose · Jul 26 '10, 04:35 PM

Alright, well, ignore this post.. Just figured out that if I print the final values after the loop, it works.

Code:

#!/usr/bin/python
import sys
TmpArr = []
OutArr = []
i = int(0)
j = int(0)
id = ""
VAL1 = int(0)
VAL2 = int(0)
TOTAL = int(0)

for line in sys.stdin:
    j += 1
    try:            
        line = line.strip()
        TmpArr

...

erbrose · Jun 28 '10, 08:59 PM

Alright, after a few more web searches.. found it, so thought I would share in case other folks ever need to do this.

Code:

Imports System
Imports System.IO
Imports System.Net

Dim uriWebSite As New Uri("http://ourserver...com:port#/filetos...es/actual_file")
Dim WReq As WebRequest = System.Net.WebRequest.Create(uriWebSite)
Dim wResp As WebResponse = WReq.GetResponse()
Dim

...

erbrose · May 14 '10, 06:29 PM

thanks... that one took me a while to figure out too! :)

erbrose · May 14 '10, 03:51 PM

Oh... I found my error...
In line 15 I was appending all of a to d, then joining c and d into tuple e. What I should have done is just join c and a into e...
My output is now correct... thank you so much!

erbrose · May 14 '10, 03:45 PM

Sorry,
I didn't mean to imply that I wasn't willing or able to try. I definitely worked on the tuple method yesterday and it still did not sort correctly. The code that I originally posted was that method... well, at least that's what I thought that method was.. where e was the tuple.

Code:

a = []
d = []
i = 0
c = []


line = "0001 , 4 , 0.34 , 3 , 15 , 25.3"
a.append(line.split(','))

...

erbrose · May 14 '10, 02:16 PM

Thanks for the reply...
Still not sure how that would help me, as it's still sorting by all the values and not just select values/columns in my multi-dimensional list.

erbrose · May 3 '10, 07:03 PM

This is what I've been doing... I'm pretty new to Python too, though...

Code:

TmpArr = []

for line in openfile:
    #strips line
    line = line.strip()
    TmpArr.append(line.split('\t'))

Now you have a multidimensional list (TmpArr)... you can sort by columns by doing something like this:

Code:

TmpArr.sort(key=lambda a:(a[0]))

if say you wanted to sort by authorlist
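
To sort on more than one column, the key can return a tuple. A rough sketch of the idea follows; the sample rows and the column meanings (author, year, title) are invented for illustration, and the key argument and sorted() need Python 2.4 or later.

```python
from operator import itemgetter

# Hypothetical tab-separated rows: author, year, title
lines = [
    "Smith\t2009\tParsing files",
    "Jones\t2010\tSorting lists",
    "Smith\t2007\tReading CSVs",
]
rows = [line.strip().split("\t") for line in lines]

# Sort by column 0 only; rows with equal authors keep their original order
by_author = sorted(rows, key=itemgetter(0))

# Sort by column 0, then by column 1 compared as a number
by_author_year = sorted(rows, key=lambda r: (r[0], int(r[1])))
```

Converting the year with int() matters because the split values are strings, and strings compare character by character rather than numerically.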

erbrose · Apr 29 '10, 03:10 PM

Thanks!
Alright.. well, the only reason I check for line == 0 (or in this case line == "") is the actual end of file.. there will be no NULL lines from the map input. I am still having to pretty much duplicate the code, as you see. The code is all over the place too, as I'm still in debug mode... but it is working properly with a csv file as the input.. I am calculating the average, standard deviation, median, min and max values too. Will...

erbrose · Apr 28 '10, 07:37 PM

I am messing around with just running reducer.py with a txt file, and am able to process the whole file by adding this:

Code:

while True:
    line = reader.readline()
    if len(line) != 0:
        <my code here>
    else:
        <repeat my code here>

Seems slightly wrong to have to repeat all my code in the if and the else, but it works... I am not able to get it to work using sys.stdin......
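
One common way to avoid the duplication is to loop over the input directly and flush the last group once after the loop ends. The sketch below is only an illustration: the function name reduce_lines, the key<TAB>value line format, and the summing are all assumptions, not the actual reducer from this thread.

```python
def reduce_lines(lines):
    """Sum values per key from 'key<TAB>value' lines that arrive sorted by key."""
    results = []
    current_key = None
    total = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines instead of treating them as end of file
        key, value = line.split("\t")
        if key != current_key:
            # A new key starts: emit the finished group, then reset
            if current_key is not None:
                results.append((current_key, total))
            current_key = key
            total = 0
        total += int(value)
    # The loop ends at end of input, so the last group is emitted once, here
    if current_key is not None:
        results.append((current_key, total))
    return results


# Hypothetical sample input, already sorted by key
sample = ["a\t1\n", "a\t2\n", "b\t5\n"]
totals = reduce_lines(sample)
```

Because a file object yields one line per iteration, passing sys.stdin as the lines argument should work the same way as the list here, with no explicit end-of-file check.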

erbrose · Apr 26 '10, 09:19 PM

one last post (on this subject anyhow!) just wanted to let you know that I was able to complete my first Map/Reduce job on Hadoop with Python! Thanks again for all your help!

erbrose · Apr 26 '10, 02:34 PM

Hey all!
Well, I am now able to run this code on my sample csv file (3 million rows) on my desktop. It completes in under 1 minute, which is great! Still having issues on the Hadoop end, but I think that problem is not for this forum. Would still appreciate any suggestions or improvements on the code itself, as I'm still very much a newbie!

Code:

import time
t1 = time.clock()
TmpArr = []
Unique = []
SortArr = []

...

dealing with datetime while parsing a csv

best way to match values in two tables...

looping through values error....

reading a file from a url to array

How to sort a multi-dimensional list in python 2.3

parsing a file