User Profile

erbrose
Last Activity: Apr 10 '12, 07:17 PM
Joined: Oct 12 '06

  • erbrose
    replied to dealing with datetime while parsing a csv
    Thanks, that does make more sense now!

  • erbrose
    replied to dealing with datetime while parsing a csv
    Thanks so much...
    Since I don't really see anything like this in the documentation, could you explain this line a bit more?
    datetime.datetime(*time.strptime(s, "%Y-%m-%d %H:%M:%S")[:6])

    I understand calling datetime.datetime, but why the * before time? And what exactly does the [:6] refer to?
    Thanks again for your help!!!
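A short illustration of what that line does (Python 3 shown; `s` is assumed to hold a timestamp string like the one in the original question). `time.strptime()` returns a `time.struct_time`, which behaves like a 9-item tuple. Slicing it with `[:6]` keeps year, month, day, hour, minute and second, and the `*` unpacks those six numbers into separate positional arguments for `datetime.datetime()`.

```python
import datetime
import time

s = "2010-01-15 23:15:30"

# struct_time acts like a tuple: (year, month, day, hour, min, sec, wday, yday, isdst)
# [:6] keeps only the first six fields
parts = time.strptime(s, "%Y-%m-%d %H:%M:%S")[:6]
print(parts)  # (2010, 1, 15, 23, 15, 30)

# The * spreads the 6-tuple into six positional arguments,
# i.e. this is the same as datetime.datetime(2010, 1, 15, 23, 15, 30)
dt = datetime.datetime(*parts)

# Since Python 2.5, datetime.datetime.strptime() does the same in one call
dt2 = datetime.datetime.strptime(s, "%Y-%m-%d %H:%M:%S")
print(dt == dt2)  # True
```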

  • erbrose
    started a topic dealing with datetime while parsing a csv

    dealing with datetime while parsing a csv

    Hello all!
    I am parsing a CSV file, and one of the fields is a datetime field that looks something like this:
    2010-01-15 23:15:30
    year-month-day hour24:minute:second

    As I loop through this CSV I am going to need to do some time arithmetic. My question is: how do I turn that field from a string in my list into a datetime object?
    Code:
    from datetime import datetime
    TmpArr = []
    reader = open("c:/test.csv", 'r')
    ...
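A minimal sketch of the conversion plus some time arithmetic. It assumes the timestamp is the first column of each row. The column position, the `parse_rows` helper and the sample data are hypothetical, since the question doesn't show the rest of the file layout.

```python
import csv
import io
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"  # year-month-day hour24:minute:second

def parse_rows(f):
    """Read CSV rows and replace column 0 (assumed timestamp) with a datetime."""
    rows = []
    for row in csv.reader(f):
        row[0] = datetime.strptime(row[0], FMT)
        rows.append(row)
    return rows

# Hypothetical sample data standing in for c:/test.csv
sample = io.StringIO("2010-01-15 23:15:30,a\n2010-01-16 01:45:00,b\n")
rows = parse_rows(sample)

# Subtracting two datetimes gives a timedelta
gap = rows[1][0] - rows[0][0]
print(gap)                  # 2:29:30
print(gap.total_seconds())  # 8970.0

# Adding a timedelta shifts a datetime
print(rows[0][0] + timedelta(minutes=45))  # 2010-01-16 00:00:30
```

With a real file on Python 3 you would pass `open("c:/test.csv", newline="")` to `parse_rows` instead of the `StringIO` sample.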

  • erbrose
    replied to best way to match values in two tables...
    Wow, thanks dwblas!
    That dramatically increased the search speed! Still a lot to learn with Python...
    Very much appreciated!
    Eric

  • erbrose
    started a topic best way to match values in two tables...

    best way to match values in two tables...

    Hey all,
    Sorry, the subject should have said
    "best way to match values in TWO tables".
    I have two tables that I need to match based on a unique ID present in both tables. I'm running this process using Hadoop Streaming with Python, so the actual code is a bit different (i.e., I'm using CSV files to debug locally). I've tried a couple of different methods, and neither is quite fast enough... ha!
    First was like this, using the...
    Last edited by bvdet; Aug 4 '10, 12:13 PM. Reason: corrected title
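The methods tried are cut off in the post, and this is not necessarily what dwblas suggested. Still, a common way to speed up matching two tables on a unique ID is to load one table into a dictionary keyed on the ID and then stream the other table past it, so each match is a hash lookup rather than a scan of the whole first table. A sketch, assuming the ID is column 0 in both tables and unique within the first one (`join_on_id` and the sample rows are hypothetical).

```python
def join_on_id(left_rows, right_rows):
    """Inner-join two lists of rows on column 0 (assumed unique in left_rows)."""
    # Build an index of the first table: ID -> remaining columns
    index = {}
    for row in left_rows:
        index[row[0]] = row[1:]

    matched = []
    for row in right_rows:
        # Dictionary lookup instead of looping over every row of left_rows
        extra = index.get(row[0])
        if extra is not None:
            matched.append(row + extra)
    return matched

left = [["41", "x"], ["46", "y"]]
right = [["46", "0.5"], ["99", "0.7"], ["41", "0.1"]]
print(join_on_id(left, right))  # [['46', '0.5', 'y'], ['41', '0.1', 'x']]
```

Rows of the second table with no match (here ID `99`) are simply skipped.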

  • erbrose
    replied to looping through values error....
    Alright, well, ignore this post... I just figured out that if I print the final values after the loop, it works.

    Code:
    #!/usr/bin/python
    import sys
    TmpArr = []
    OutArr = []
    i = 0
    j = 0
    id = ""
    VAL1 = 0
    VAL2 = 0
    TOTAL = 0
    
    for line in sys.stdin:
        j += 1
        try:
            line = line.strip()
            TmpArr
    ...
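Printing the final totals after the loop is the usual pattern for a streaming reducer. Totals are emitted whenever the ID changes, but the last ID never sees a change, so it has to be flushed once the input runs out. A sketch, assuming input lines of the form `ID,VAL1,VAL2` already sorted by ID; the tab-separated output format and the name `reduce_lines` are hypothetical.

```python
def reduce_lines(lines):
    """Sum VAL1 and VAL2 per ID; input must already be sorted (grouped) by ID."""
    out = []
    current_id = None
    val1_total = val2_total = 0

    for line in lines:
        line = line.strip()
        if not line:
            continue
        rec_id, val1, val2 = line.split(",")
        if rec_id != current_id:
            # ID changed: emit the totals for the previous ID, if there was one
            if current_id is not None:
                out.append("%s\t%d\t%d" % (current_id, val1_total, val2_total))
            current_id = rec_id
            val1_total = val2_total = 0
        val1_total += int(val1)
        val2_total += int(val2)

    # The last ID never triggers a change, so flush it after the loop
    if current_id is not None:
        out.append("%s\t%d\t%d" % (current_id, val1_total, val2_total))
    return out

sample = ["41,0,1", "41,1,0", "41,1,0", "46,0,1", "46,0,1", "46,1,0", "46,1,0"]
for result in reduce_lines(sample):
    print(result)
```

In the Hadoop job itself you would pass `sys.stdin` instead of `sample` and print each result.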

  • erbrose
    started a topic looping through values error....

    looping through values error....

    Hey all.
    I do not understand what is wrong with my script and would love some help. First off, the examples in my script are based on running a MapReduce job in Hadoop, and the part I am struggling with is the reduce step. My basic input is something like this:
    ID,VAL1,VAL2
    41,0,1
    41,1,0
    41,1,0
    46,0,1
    46,0,1
    46,1,0
    46,1,0
    Basically, I need to loop through each line and check to see if the ID from...
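Because Hadoop hands the reducer its input sorted by key, `itertools.groupby` can do the "has the ID changed?" bookkeeping, including the final group, without a manual flush after the loop. A sketch, assuming `ID,VAL1,VAL2` lines sorted by ID as in the sample above. `totals_by_id` is a hypothetical name, and returning a dict is just one possible output choice.

```python
from itertools import groupby

def totals_by_id(lines):
    """Return {ID: (sum of VAL1, sum of VAL2)} for lines already sorted by ID."""
    records = (line.strip().split(",") for line in lines if line.strip())
    totals = {}
    # groupby yields one (key, group) pair per run of consecutive equal IDs
    for rec_id, group in groupby(records, key=lambda rec: rec[0]):
        val1_total = val2_total = 0
        for _, val1, val2 in group:
            val1_total += int(val1)
            val2_total += int(val2)
        totals[rec_id] = (val1_total, val2_total)
    return totals

sample = ["41,0,1", "41,1,0", "41,1,0", "46,0,1", "46,0,1", "46,1,0", "46,1,0"]
print(totals_by_id(sample))  # {'41': (2, 1), '46': (2, 2)}
```

`groupby` only groups consecutive runs, so this relies on the input really being sorted. An ID that showed up in two separate runs would have its first total overwritten.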

  • Alright, after a bit more web searching I found it, so I thought I would share in case other folks ever need to do this.
    Code:
    Imports System
    Imports System.IO
    Imports System.Net
    
    Dim uriWebSite As New Uri("http://ourserver...com:port#/filetos...es/actual_file")
    Dim WReq As WebRequest = System.Net.WebRequest.Create(uriWebSite)
    Dim wResp As WebResponse = WReq.GetResponse()
    Dim
    ...
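For comparison, a rough Python 3 counterpart of the VB.NET snippet above: `urllib.request.urlopen()` fetches the URL, and `csv.reader` splits each line into a list, giving a list of rows (an "array of arrays"). The URL in the usage comment is a placeholder, and UTF-8 encoding is an assumption.

```python
import csv
import io
import urllib.request

def read_csv_url(url, encoding="utf-8"):
    """Download a CSV file from url and return it as a list of row lists."""
    with urllib.request.urlopen(url) as resp:
        text = resp.read().decode(encoding)
    # csv.reader handles quoting; each row comes back as a list of strings
    return list(csv.reader(io.StringIO(text)))

# Usage (placeholder URL):
# rows = read_csv_url("http://example.com/output.csv")
```

This reads the whole file into memory at once. For very large outputs you would iterate over the response line by line instead.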

  • erbrose
    started a topic reading a file from a url to array

    reading a file from a url to array

    Hey All,
    I have been trying to research how to get a file (basically a CSV file) from a URL into an array, and my search has turned up some examples, but not quite what I am looking for. I would appreciate any direction and/or examples from folks who have done this before.
    Basically, we have a cloud computing setup (Hadoop), and I am trying to get at the output of some of our processes.
    The file is something like
    ...

  • Thanks... that one took me a while to figure out too! :)

  • Oh... I found my error.
    In line 15 I was appending all of a to d, then joining c and d into tuple e. What I should have done is just join c and a into e.
    My output is now correct... thank you so much!

  • Sorry,
    I didn't mean to imply that I wasn't willing or able to try. I definitely worked on the tuple method yesterday, and it still did not sort correctly. The code that I originally posted was that method... well, at least that's what I thought that method was, where e was the tuple:
    Code:
    a = []
    d = []
    i = 0
    c = []
    
    
    line = "0001 , 4 , 0.34 , 3 , 15 , 25.3"
    a.append(line.split(','))
    ...

  • Thanks for the reply...
    I'm still not sure how that would help me, as it's still sorting by all the values and not just by selected values/columns in my multidimensional list.

  • erbrose
    started a topic How to sort a multi-dimensional list in python 2.3

    How to sort a multi-dimensional list in python 2.3

    Yes, I know... Python 2.3... Unfortunately our development servers run Red Hat, that is the default version installed on them, and apparently upgrading can cause system failures (I'm trying to figure out a workaround to run multiple versions of Python, but I'm not there yet). In the meantime, I need to see if anyone can help me with sorting a multidimensional list by certain elements within that list. I've read about Schwartzian Transform in Python? but...
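Without `key=` (both `key=` and `sorted()` arrived in Python 2.4), the Schwartzian transform, also called decorate-sort-undecorate, does the job: build a tuple whose leading items are the columns to sort on, sort the tuples, then strip the decoration off again. A sketch; the choice of columns (column 2, then column 0) and the conversion of those fields to numbers are assumptions for illustration, and the sample lines just copy the format of the line in the post. The function itself sticks to features Python 2.3 already has.

```python
def sort_by_columns(rows, columns):
    """Sort rows (lists of strings) by the given column indexes, compared as numbers."""
    # Decorate: (sort key, original position, row). The position keeps the order
    # stable for ties and means the rows themselves never get compared.
    decorated = []
    for pos in range(len(rows)):
        row = rows[pos]
        key = tuple([float(row[c]) for c in columns])
        decorated.append((key, pos, row))
    # Sort: tuples compare item by item, so the key columns decide the order
    decorated.sort()
    # Undecorate: keep only the original rows
    return [item[2] for item in decorated]

lines = ["0001 , 4 , 0.34 , 3 , 15 , 25.3",
         "0002 , 1 , 0.90 , 7 , 10 , 12.0",
         "0003 , 9 , 0.34 , 1 , 20 , 30.1"]
# Split on commas and strip the surrounding spaces from each field
rows = [[field.strip() for field in line.split(",")] for line in lines]

for row in sort_by_columns(rows, [2, 0]):
    print(row)
```

Converting to numbers matters here: compared as strings, `"10"` would sort before `"9"`.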

  • This is what I've been doing... I'm pretty new to Python too, though...
    Code:
    TmpArr = []
    
    # openfile is an already-opened text file with tab-separated columns
    for line in openfile:
        # strip the trailing newline, then split the line into columns on tabs
        line = line.strip()
        TmpArr.append(line.split('\t'))

    Now you have a multidimensional list (TmpArr)... You can sort by columns by doing something like this:

    Code:
    TmpArr.sort(key=lambda a: a[0])
    This sorts by column 0, if, say, you wanted to sort by authorlist.
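A few variations on that `sort(key=...)` call. `operator.itemgetter` can pick one or more columns, and converting a column inside the key function makes numbers sort numerically rather than as text. The `[author, pages]` layout and the values are made up for illustration.

```python
from operator import itemgetter

# Hypothetical rows after splitting on tabs: [author, pages]
TmpArr = [["smith", "120"], ["jones", "95"], ["smith", "1000"]]

# As text, "1000" < "120" < "95", so string sorting gives the wrong numeric order
by_text = sorted(TmpArr, key=itemgetter(1))
print(by_text)    # [['smith', '1000'], ['smith', '120'], ['jones', '95']]

# Convert inside the key to sort numerically
by_number = sorted(TmpArr, key=lambda row: int(row[1]))
print(by_number)  # [['jones', '95'], ['smith', '120'], ['smith', '1000']]

# Several columns: author first, then pages as a number
TmpArr.sort(key=lambda row: (row[0], int(row[1])))
print(TmpArr)     # [['jones', '95'], ['smith', '120'], ['smith', '1000']]

# itemgetter can also take several columns at once (both compared as strings here)
TmpArr.sort(key=itemgetter(0, 1))
```

Like `key=` itself, `sorted()` needs Python 2.4 or later.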

  • erbrose
    replied to parsing a file
    Thanks!
    Alright... well, the only reason I check for len(line) == 0 (or in this case line == "") is to catch the actual end of file; there will be no NULL (empty) lines in the map input. I still have to pretty much duplicate the code, as you can see. The code is all over the place too, since I'm still in debug mode... but it works properly with a CSV file as the input. I am calculating the average, standard deviation, median, min and max values too. Will...
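For the summary numbers, Python 3.4 and later ship a `statistics` module that covers most of them. A sketch with made-up values. `statistics.stdev` is the sample standard deviation (divides by n - 1) and `statistics.pstdev` the population one (divides by n). Which of the two the original code computes isn't shown, so both are listed.

```python
import statistics

values = [4.0, 8.0, 15.0, 16.0, 23.0, 42.0]  # made-up sample values

summary = {
    "mean": statistics.mean(values),
    "median": statistics.median(values),  # averages the middle two for an even count
    "stdev": statistics.stdev(values),    # sample standard deviation (n - 1)
    "pstdev": statistics.pstdev(values),  # population standard deviation (n)
    "min": min(values),
    "max": max(values),
}
print(summary)
```

In a reducer you would collect the values for one key into a list like `values` and compute the summary when that key's group ends.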

  • erbrose
    replied to parsing a file
    I am messing around with just running reducer.py against a txt file, and I am able to process the whole file by adding this:
    Code:
    while True:
        line = reader.readline()
        if len(line) != 0:
            <my code here>
        else:
            # readline() returns "" only at end of file
            <repeat my code here>
            break  # stop looping once the end of file is reached
    It seems slightly wrong to have to repeat all my code in both the if and the else, but it works... I am not able to get it to work using sys.stdin, though.
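One way to let the same script read either a local test file or Hadoop's standard input is the standard-library `fileinput` module. `fileinput.input()` iterates over the files named on the command line, or over `sys.stdin` when none are given, and a plain `for` loop ends cleanly at end of input. That means the end-of-input work can be a single call after the loop instead of a copy in an `else` branch. A sketch; `process` and `finish` are hypothetical placeholders for the per-line work and for whatever has to happen once after the last line.

```python
import fileinput

def process(line, state):
    """Hypothetical per-line work: here, just count non-blank lines."""
    if line.strip():
        state["count"] += 1

def finish(state):
    """Hypothetical end-of-input step, run exactly once after the loop."""
    return state["count"]

def run(files=None):
    state = {"count": 0}
    # files=None means: use sys.argv[1:] if any are given, otherwise read sys.stdin
    with fileinput.input(files=files) as stream:
        for line in stream:
            process(line, state)
    return finish(state)

if __name__ == "__main__":
    print(run())
```

Run locally as `python reducer.py test.txt` or `cat test.txt | python reducer.py`; under Hadoop Streaming no filename is passed, so it reads stdin.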

  • erbrose
    started a topic parsing a file

    parsing a file

    Hey all.
    (Hopefully) a quick question here. I am processing data using Hadoop Streaming Map/Reduce. The map.py is straightforward: it basically takes the input data (in the form of sys.stdin), loads it into a list, sorts that list, then... well, I'm not exactly sure what Hadoop does with that, but I'm pretty sure it creates a temporary file, much like a CSV, in memory.

    Code:
    for line in sys.stdin:
        <append into a list then sort>
    ...
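In Hadoop Streaming the mapper usually doesn't need to sort anything itself. Each line it writes to standard output is treated as a key/value pair (by default the text before the first tab is the key), and the framework sorts the map output by key before feeding it to the reducer's standard input. A sketch of a mapper that emits one tab-separated record per comma-separated input line, plus a local stand-in for the sort step. Treating column 0 as the key and the name `map_lines` are assumptions.

```python
def map_lines(lines):
    """Turn comma-separated input lines into 'key<TAB>value' records."""
    out = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        fields = line.split(",")
        # Assumed layout: column 0 is the key, the remaining columns are the value
        out.append(fields[0] + "\t" + ",".join(fields[1:]))
    return out

# In the real map.py you would print each record to stdout:
#     for record in map_lines(sys.stdin): print(record)
# Locally, sorting the records approximates Hadoop's sort by key
# before the reducer sees them.
sample = ["46,0,1", "41,1,0", "46,1,0", "41,0,1"]
for record in sorted(map_lines(sample)):
    print(record)
```

The same idea works from a shell for local debugging: `cat input.csv | python map.py | sort | python reducer.py`.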

  • erbrose
    replied to python and hadoop
    One last post (on this subject anyhow!): I just wanted to let you know that I was able to complete my first Map/Reduce job on Hadoop with Python! Thanks again for all your help!

  • erbrose
    replied to python and hadoop
    Hey all!
    Well, I am now able to run this code on my sample CSV file (3 million rows) on my desktop. It completes in under 1 minute, which is great! I'm still having issues on the Hadoop end, but I think that problem is not for this forum. I would still appreciate any suggestions or improvements on the code itself, as I'm still very much a newbie!

    Code:
    import time
    t1 = time.clock()  # Python 2; time.clock() was removed in Python 3.8, use time.perf_counter() there
    TmpArr = []
    Unique = []
    SortArr = []
    ...
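The rest of that script is cut off, so this is only a general suggestion. If `Unique` is filled by checking `if x not in Unique` against a list, every check scans the whole list, whereas a `set` makes each membership check roughly constant time. On Python 3, `time.perf_counter()` takes over from `time.clock()` for timing. A sketch with made-up IDs; `unique_in_order` is a hypothetical helper.

```python
import time

def unique_in_order(values):
    """Return the distinct values, keeping first-seen order."""
    seen = set()
    result = []
    for v in values:
        if v not in seen:  # set lookup instead of scanning a list
            seen.add(v)
            result.append(v)
    return result

t1 = time.perf_counter()
ids = unique_in_order(["41", "46", "41", "52", "46"])
elapsed = time.perf_counter() - t1
print(ids)  # ['41', '46', '52']
print("took %.6f seconds" % elapsed)
```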