Hey all.
(Hopefully) a quick question here. I am processing data using Hadoop Streaming Map/Reduce. The map.py is straightforward: it takes the input data (from sys.stdin), loads it into a list, sorts that list, and prints it. I'm not exactly sure what Hadoop does with that output, but I'm pretty sure it creates a temporary file, much like a CSV in memory.
Code:
for line in sys.stdin:
    <append into a list then sort>
for m in TmpArr:
    print m
I then have a reduce.py that reads each line of that mysterious Hadoop temp file from sys.stdin, like this:
Code:
for line in sys.stdin:
    <load into temporary list and do some stuff>
My issue is that as soon as line is empty, the process ends, even if there is still data to process. Is there an error-checking way to fix this with stdin?
An example would be this. My table looks like this:
ID---VAL
01--20
01--22
01--25
02--10
02--15
02--17
03--5
03--7
My output SHOULD look like this:
ID---AVG---COUNT
01--22.3--3
02--14.0--3
03--6.0--2
But it's coming out like this:
ID---AVG---COUNT
01--22.3--3
02--14.0--3
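For reference, here is a minimal sketch of a reducer of this shape. It is not the actual reduce.py from this post. It assumes Python 3, input lines of the form ID<TAB>VAL that arrive already grouped by ID, and the names reduce_stream and main are made up for illustration. Two things it tries to show: iterating over sys.stdin only stops at end of input, so a blank line can simply be skipped, and the last group has to be emitted after the loop because no key change follows it. Whether a missing final flush is what drops group 03 here is only a guess.

```python
import sys


def reduce_stream(lines):
    """Yield (key, average, count) for each run of identical keys.

    Assumption: every non-blank line is "ID<TAB>VAL" and lines with the
    same ID are adjacent (sorted input).
    """
    current_key = None
    total = 0.0
    count = 0
    for line in lines:
        line = line.strip()
        if not line:
            # A blank line is not end of input; skip it and keep reading.
            continue
        key, value = line.split("\t", 1)
        if key != current_key:
            # The key changed, so the previous group (if any) is complete.
            if current_key is not None:
                yield current_key, total / count, count
            current_key, total, count = key, 0.0, 0
        total += float(value)
        count += 1
    # Flush the final group: nothing inside the loop emits it.
    if current_key is not None:
        yield current_key, total / count, count


def main(stream=sys.stdin):
    # Hypothetical entry point: print ID, average, and count, tab-separated.
    for key, avg, count in reduce_stream(stream):
        print("%s\t%.1f\t%d" % (key, avg, count))
```

To use it as a streaming reducer, call main() at the bottom of the script. Writing the grouping as a generator keeps only one group's running total in memory rather than loading every line into a list first.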
Sorry this is so long-winded, and thanks for any input. Also, I could post my whole code if needed, but it's a bit long-winded too!
Cheers,
Eric