looping through values error....

  • erbrose
    New Member
    • Oct 2006
    • 58

    looping through values error....

    Hey all.
    I do not understand what is wrong with my script and would love some help. The example below is based on running a MapReduce job in Hadoop, and the part I am struggling with is the reduce step. My basic input looks something like this:
    ID--VAL1--VAL2
    41,0,1
    41,1,0
    41,1,0
    46,0,1
    46,0,1
    46,1,0
    46,1,0
    Basically I need to loop through each line and check whether its ID equals the ID from the previous line. If it does, I keep a running SUM of both VAL1 and VAL2, and once the ID changes I also output a total of VAL1 + VAL2 for that ID.
    So the output should look something like this:
    ID--VAL1_SUM--VAL2_SUM--TOTAL
    41,2,1,3
    46,2,2,4


    The script I have does exactly that, but only for the first set of IDs (41). It always leaves out the last set of IDs (46), so I am getting only this:
    ID--VAL1_SUM--VAL2_SUM--TOTAL
    41,2,1,3

    Anyhow, any help would be appreciated.
    Code:
    #!/usr/bin/python
    import sys
    TmpArr = []
    OutArr = []
    i = int(0)
    j = int(0)
    id = ""
    VAL1 = int(0)
    VAL2 = int(0)
    TOTAL = int(0)
    
    for line in sys.stdin:
        j += 1
        try:            
            line = line.strip()
            TmpArr = line.split(',')
            
            if i == 0:
                #first loop always adds the line to OutArr too
                OutArr = line.split(',')
                id = OutArr[0]
                VAL1 = VAL1 + int(OutArr[1])
                VAL2 = VAL2 + int(OutArr[2])
                i += 1
            else:
                #now check if the new line id = previous line..
                if TmpArr[0] == OutArr[0]:
                    id = TmpArr[0]
                    VAL1 = VAL1 + int(TmpArr[1])
                    VAL2 = VAL2 + int(TmpArr[2])
                    OutArr = line.split(',')
                    i += 1
                else:
                    #if the new line id != previous line.. then print sums...
                    TOTAL = VAL1 + VAL2
                    print( id + ',' + str(VAL1) + ',' + str(VAL2) + ',' + str(TOTAL) )
                    OutArr = line.split(',')
                    id = OutArr[0]
                    VAL1 = int(OutArr[1])
                    VAL2 = int(OutArr[2])
    
        except ValueError:
            break
    Cheers,
    Eric


    PS: I am stuck using Python 2.3 on RedHat servers.
  • erbrose
    New Member
    • Oct 2006
    • 58

    #2
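    Alright, well, ignore this post. I just figured out that if I print the final values after the loop, it works. The loop only prints a group when it sees the next ID, so the last group never gets printed inside the loop.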

    Code:
    #!/usr/bin/python
    import sys
    TmpArr = []
    OutArr = []
    i = int(0)
    j = int(0)
    id = ""
    VAL1 = int(0)
    VAL2 = int(0)
    TOTAL = int(0)
    
    for line in sys.stdin:
        j += 1
        try:            
            line = line.strip()
            TmpArr = line.split(',')
            
            if i == 0:
                #first loop always adds the line to OutArr too
                OutArr = line.split(',')
                id = OutArr[0]
                VAL1 = VAL1 + int(OutArr[1])
                VAL2 = VAL2 + int(OutArr[2])
                i += 1
            else:
                #now check if the new line id = previous line..
                if TmpArr[0] == OutArr[0]:
                    id = TmpArr[0]
                    VAL1 = VAL1 + int(TmpArr[1])
                    VAL2 = VAL2 + int(TmpArr[2])
                    OutArr = line.split(',')
                    i += 1
                else:
                    #if the new line id != previous line.. then print sums...
                    TOTAL = VAL1 + VAL2
                    print( id + ',' + str(VAL1) + ',' + str(VAL2) + ',' + str(TOTAL) )
                    OutArr = line.split(',')
                    id = OutArr[0]
                    VAL1 = int(OutArr[1])
                    VAL2 = int(OutArr[2])
    
        except ValueError:
            break
    TOTAL = VAL1 + VAL2
    print( id + ',' + str(VAL1) + ',' + str(VAL2) + ',' + str(TOTAL) )
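
    For anyone who finds this later, here is a more compact sketch of the same idea: emit a group when the ID changes, then flush the last group once the input is exhausted. The names reduce_sorted and sample are my own and purely illustrative, and it assumes the input is already sorted by ID (as it is when it arrives at a Hadoop streaming reducer) and that every non-blank line has exactly three comma-separated fields. It avoids itertools.groupby, which only arrived in Python 2.4, so it should be usable on 2.3 as well, though I have not tested it there.

```python
def reduce_sorted(lines):
    """Yield 'id,val1_sum,val2_sum,total' for each run of equal IDs.

    Assumes `lines` is already sorted by ID.
    """
    current_id = None
    val1 = 0
    val2 = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        parts = line.split(',')
        key = parts[0]
        # A new ID means the previous group is complete, so emit it.
        if current_id is not None and key != current_id:
            yield '%s,%d,%d,%d' % (current_id, val1, val2, val1 + val2)
            val1 = 0
            val2 = 0
        current_id = key
        val1 += int(parts[1])
        val2 += int(parts[2])
    # Flush the final group: no later ID exists to trigger the emit above.
    if current_id is not None:
        yield '%s,%d,%d,%d' % (current_id, val1, val2, val1 + val2)


# Sample input taken from the first post.
sample = ['41,0,1', '41,1,0', '41,1,0',
          '46,0,1', '46,0,1', '46,1,0', '46,1,0']
rows = list(reduce_sorted(sample))
for row in rows:
    print(row)
```

    With the sample data this prints 41,2,1,3 and 46,2,2,4. In the actual reducer you would pass sys.stdin in place of sample.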
