Reading Java byte[] data stream over standard input

  • sapsi

    Reading Java byte[] data stream over standard input

    Hello,
    I am using Hadoop Streaming with a BinaryInputStream. It basically
    sends a stream of bytes (the Java type is: private byte[] bytes) to my
    Python program.

    I have done a test like this:
    import sys
    while 1:
        x = sys.stdin.read(100)
        if x:
            print x
        else:
            break

    Now, the incoming data is binary (though mine is actually plain ASCII
    text), but the output is not what I expect. For example, I expect:

    all/86000/114.310.151.209 .60370-121.110.5.176.1 13\n62485.9718
    118.010.241.12 60370 128.210.5.176

    However, I get a 1 before "all" and a 4 just after the \n and before the 6.

    My question is: how do I read binary data (Java's byte stream) from
    stdin? Or is this actually what I'm getting?

    Thanks
    Sapsi
  • sapsi

    #2
    Re: Reading Java byte[] data stream over standard input

    I should also mention that, for some reason, several binary values
    pop up in between. This behavior (for the input stream) is not
    expected.

    > Now, the incoming data is binary (though mine is actually plain ASCII
    > text), but the output is not what I expect. For example, I expect:
    >
    > all/86000/114.310.151.209 .60370-121.110.5.176.1 13\n62485.9718
    > 118.010.241.12 60370 128.210.5.176
    >
    > However, I get a 1 before "all" and a 4 just after the \n and before the 6.
    >
    > My question is: how do I read binary data (Java's byte stream) from
    > stdin? Or is this actually what I'm getting?
    >
    > Thanks
    > Sapsi

    • Marc 'BlackJack' Rintsch

      #3
      Re: Reading Java byte[] data stream over standard input

      On Sun, 18 May 2008 22:11:33 -0700, sapsi wrote:
      > I am using Hadoop Streaming with a BinaryInputStream. It basically
      > sends a stream of bytes (the Java type is: private byte[] bytes) to my
      > Python program.
      >
      > I have done a test like this:
      > import sys
      > while 1:
      >     x = sys.stdin.read(100)
      >     if x:
      >         print x
      >     else:
      >         break
      >
      > Now, the incoming data is binary (though mine is actually plain ASCII
      > text), but the output is not what I expect. For example, I expect:
      >
      > all/86000/114.310.151.209 .60370-121.110.5.176.1 13\n62485.9718
      > 118.010.241.12 60370 128.210.5.176
      >
      > However, I get a 1 before "all" and a 4 just after the \n and before the 6.
      >
      > My question is: how do I read binary data (Java's byte stream) from
      > stdin? Or is this actually what I'm getting?

      If there's extra data in `x` then it was sent to stdin. Maybe there's
      some extra information like string length, Java type information, or
      checksums encoded in that data!?
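To make the "string length" idea concrete, here is a minimal Python 3 sketch of reading records that each carry a length prefix. It assumes, purely hypothetically, a 4-byte big-endian signed length before every record (the bytes `DataOutputStream.writeInt()` would write); nothing in this thread confirms that Hadoop frames its data this way, and the function name `iter_length_prefixed` is illustrative.

```python
import io
import struct
from typing import BinaryIO, Iterator, Union


def iter_length_prefixed(source: Union[bytes, BinaryIO]) -> Iterator[bytes]:
    """Yield record payloads from a stream in which every record is preceded
    by a 4-byte big-endian signed length, i.e. what Java's
    DataOutputStream.writeInt(len) followed by write(bytes) would produce.

    HYPOTHETICAL framing for illustration; check what the sender really
    writes (e.g. with repr()) before relying on it.
    """
    if isinstance(source, (bytes, bytearray)):
        source = io.BytesIO(source)  # convenience: accept in-memory data
    while True:
        header = source.read(4)
        if not header:
            return  # clean end of stream between records
        if len(header) < 4:
            raise EOFError("stream ended inside a length prefix")
        (length,) = struct.unpack(">i", header)  # big-endian int32
        if length < 0:
            raise ValueError(f"negative record length {length}")
        payload = source.read(length)
        if len(payload) < length:
            raise EOFError("stream ended inside a record")
        yield payload
```

On standard input you would pass `sys.stdin.buffer`; being a buffered reader, its `read(n)` only returns fewer than `n` bytes at end of input, which is what the truncation checks rely on.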

      Ciao,
      Marc 'BlackJack' Rintsch

      • sapsi

        #4
        Re: Reading Java byte[] data stream over standard input

        Yes, that could be the case. Browsing through Hadoop's source, I see
        that stdin in the above code is reading from a piped Java
        DataOutputStream. I read about a library on the net, Javadata.py, that
        reads this format, but it has disappeared.
        What is involved in reading from a DataOutputStream?

        Thank you
        Sapsi

        • Marc 'BlackJack' Rintsch

          #5
          Re: Reading Java byte[] data stream over standard input

          On Mon, 19 May 2008 00:14:25 -0700, sapsi wrote:
          > Yes, that could be the case. Browsing through Hadoop's source, I see
          > that stdin in the above code is reading from a piped Java
          > DataOutputStream. I read about a library on the net, Javadata.py,
          > that reads this format, but it has disappeared.
          > What is involved in reading from a DataOutputStream?

          According to the Java docs for `DataInput` and `DataOutput`, it is
          quite simple. Most methods just seem to write the necessary bytes for
          the primitive types, except `writeUTF()`, which prefixes the string
          data with a 2-byte length.

          So if it is not strings you are writing, then Hadoop seems to throw
          some extra information into the stream.
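Following the `DataInput`/`DataOutput` layout described above, a minimal Python 3 sketch of a reader built on `struct`: every multi-byte value is big-endian, and `writeUTF()` writes a 2-byte unsigned byte count followed by Java's "modified UTF-8". The class and method names (`DataInputReader`, `read_int`, `read_utf`, ...) are hypothetical, and only a handful of the `DataInput` methods are covered.

```python
import io
import struct
from typing import Any, BinaryIO


class DataInputReader:
    """Sketch of a reader for the big-endian layout written by Java's
    java.io.DataOutputStream. Only a few DataInput methods are covered."""

    def __init__(self, stream: BinaryIO) -> None:
        self._stream = stream

    @classmethod
    def from_bytes(cls, data: bytes) -> "DataInputReader":
        # Convenience constructor for in-memory data.
        return cls(io.BytesIO(data))

    def _read_exact(self, n: int) -> bytes:
        data = self._stream.read(n)
        if len(data) != n:
            raise EOFError(f"expected {n} bytes, got {len(data)}")
        return data

    def _unpack(self, fmt: str) -> Any:
        # '>' selects big-endian byte order with standard sizes and no
        # padding, which matches DataOutput.
        return struct.unpack(fmt, self._read_exact(struct.calcsize(fmt)))[0]

    def read_boolean(self) -> bool:
        return self._unpack(">?")  # 1 byte; any non-zero value is true

    def read_int(self) -> int:
        return self._unpack(">i")  # 4 bytes, two's complement

    def read_long(self) -> int:
        return self._unpack(">q")  # 8 bytes, two's complement

    def read_double(self) -> float:
        return self._unpack(">d")  # 8-byte IEEE 754

    def read_utf(self) -> str:
        # writeUTF(): 2-byte unsigned byte count, then "modified UTF-8".
        length = self._unpack(">H")
        raw = self._read_exact(length)
        # Modified UTF-8 writes U+0000 as C0 80 and characters outside the
        # BMP as two 3-byte surrogates; undo both before decoding.
        raw = raw.replace(b"\xc0\x80", b"\x00")
        text = raw.decode("utf-8", "surrogatepass")
        # Round-trip through UTF-16 to join surrogate pairs into one character.
        return text.encode("utf-16-le", "surrogatepass").decode("utf-16-le")
```

With a streaming job you would wrap standard input, e.g. `DataInputReader(sys.stdin.buffer)`, and call the `read_*` methods in the same order the Java side called the matching `write*` methods; the format carries no type tags of its own, so the reader has to know that order in advance.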

          Ciao,
          Marc 'BlackJack' Rintsch

          • Giles Brown

            #6
            Re: Reading Java byte[] data stream over standard input

            On 19 May, 06:11, sapsi <saptarshi.g...@gmail.com> wrote:
            > Hello,
            > I am using Hadoop Streaming with a BinaryInputStream. It basically
            > sends a stream of bytes (the Java type is: private byte[] bytes) to my
            > Python program.
            >
            > I have done a test like this:
            > import sys
            > while 1:
            >     x = sys.stdin.read(100)
            >     if x:
            >         print x
            >     else:
            >         break
            >
            > Now, the incoming data is binary (though mine is actually plain ASCII
            > text), but the output is not what I expect. For example, I expect:
            >
            > all/86000/114.310.151.209 .60370-121.110.5.176.1 13\n62485.9718
            > 118.010.241.12 60370 128.210.5.176
            >
            > However, I get a 1 before "all" and a 4 just after the \n and before the 6.
            >
            > My question is: how do I read binary data (Java's byte stream) from
            > stdin? Or is this actually what I'm getting?
            >
            > Thanks
            > Sapsi

            In the past I've used xdrlib from the standard library to send
            binary data to a Java applet that reads it with a DataInputStream.
            I'd expect it to work in the reverse direction too, so I suggest you
            have a look at that.
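XDR (RFC 4506), which xdrlib implements, encodes 32-bit integers and doubles as big-endian values, the same byte layout that `DataOutputStream.writeInt()` and `writeDouble()` produce, so xdrlib's `Unpacker` should cope with those. Strings are where the formats part ways: XDR uses a 4-byte length and pads the data to a multiple of 4 bytes, whereas `writeUTF()` uses a 2-byte length and no padding. xdrlib was deprecated in Python 3.11 and removed in 3.13, so this Python 3 sketch shows the layouts with `struct` instead; the function names are hypothetical.

```python
import struct


def pack_int32(value: int) -> bytes:
    # XDR's int and DataOutputStream.writeInt() share one layout:
    # 4 bytes, big-endian, two's complement.
    return struct.pack(">i", value)


def xdr_string(data: bytes) -> bytes:
    """XDR string/opaque: 4-byte unsigned big-endian length, the data,
    then zero bytes padding the data up to a multiple of 4."""
    padding = (4 - len(data) % 4) % 4
    return struct.pack(">I", len(data)) + data + b"\x00" * padding


def java_write_utf_ascii(text: str) -> bytes:
    """DataOutputStream.writeUTF() for ASCII text without NUL characters:
    2-byte unsigned big-endian length, then the bytes, no padding.
    (Other text needs Java's modified UTF-8, which this sketch skips.)"""
    if "\x00" in text:
        raise ValueError("writeUTF encodes NUL as 2 bytes; not handled here")
    data = text.encode("ascii")
    if len(data) > 0xFFFF:
        raise ValueError("writeUTF limits encoded strings to 65535 bytes")
    return struct.pack(">H", len(data)) + data
```

So XDR string unpacking would misread a `writeUTF()` string: it would take the 2-byte length plus the first two characters as one 4-byte length. For ints and doubles the two formats agree byte for byte.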

            Giles

            • John Machin

              #7
              Re: Reading Java byte[] data stream over standard input

              sapsi wrote:
              > I should also mention that, for some reason, several binary values
              > pop up in between. This behavior (for the input stream) is not
              > expected.
              >
              >> Now, the incoming data is binary (though mine is actually plain ASCII
              >> text), but the output is not what I expect. For example, I expect:
              >>
              >> all/86000/114.310.151.209 .60370-121.110.5.176.1 13\n62485.9718
              >> 118.010.241.12 60370 128.210.5.176
              >>
              >> However, I get a 1 before "all" and a 4 just after the \n and before the 6.
              >>
              >> My question is: how do I read binary data (Java's byte stream) from
              >> stdin? Or is this actually what I'm getting?

              Consider changing "print x" to "print repr(x)". That gives you a
              better chance of understanding what the extra, unexpected
              popping-in bytes are.
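On Python 3 the same `repr()` trick has to read from `sys.stdin.buffer`, because `sys.stdin` is a text stream there and decodes the bytes before you see them. A minimal sketch (the function name `chunk_reprs` is illustrative) that yields the `repr()` of each raw chunk, so non-printable bytes show up as escapes such as `\x01` or `\n` instead of being rendered invisibly by the terminal:

```python
import io
from typing import BinaryIO, Iterator, Union


def chunk_reprs(source: Union[bytes, BinaryIO], chunk_size: int = 100) -> Iterator[str]:
    r"""Read raw bytes in fixed-size chunks and yield repr() of each chunk.

    Non-printable bytes show up as escapes such as \x01 or \n, which makes
    unexpected framing bytes easy to spot. A bytes object is accepted for
    convenience (it is wrapped in io.BytesIO).
    """
    if isinstance(source, (bytes, bytearray)):
        source = io.BytesIO(source)
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # b"" means end of stream
            break
        yield repr(chunk)
```

In the streaming script itself: `for line in chunk_reprs(sys.stdin.buffer): print(line)` (after `import sys`). Because it is a generator, each chunk is printed as it arrives rather than after all input has been read.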
