Simple question: Why does Java write bytecode "high byte first"

  • blazedaces
    Contributor
    • May 2007
    • 284

    I spent all of yesterday (and will probably spend most of today) working with bytecode that Java wrote and MATLAB read. I had never worked with bytecode in great detail before, so this was quite a challenge. All in all, the experience was interesting and, because it was harder than it should have been, more educational.

    Still, I've been told that almost every other program reads "low byte first" and, thinking about it, people writing byte-code by hand usually write "low byte first" too. I don't know if that's true, but either way, is there a specific reason Sun decided to read/write byte-code this way? Perhaps for encryption?

    For those unfamiliar with byte-code, let's look at an example: if your number is, say, 3 and it's two bytes long (we'll say signed, since we're talking Java here), then it would be written (in what I think is normal, "low byte first") like this:

    Code:
    00000011 00000000
     [byte 1]    [byte 2]
    But Java would write it "high byte first":
    Code:
    00000000 00000011
     [byte 1]    [byte 2]
    Just curious. If the answer is simply, "because it does, why are apples red?" then so be it, I just want to know.

    Thank you,

    -blazed
  • JosAH
    Recognized Expert MVP
    • Mar 2007
    • 11453

    #2
    High byte first, or 'big endian', is the internationally agreed way to transport
    multi-byte numbers over network wires etc. It's only Intel and a few others that
    use low byte first, or 'little endian'. The Java virtual machine was a bit inspired
    by Sun's SPARC processor, which also uses big endian. When the byte code gets in
    the claws of the JIT compiler (that turns the whole shebang into native machine
    code), the big endian numbers are swapped around to little endian numbers where
    the hardware calls for it.

    kind regards,

    Jos
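
A minimal Java sketch of the two byte orders discussed in this thread, for the 16-bit value 3. The class and method names are made up for illustration; `DataOutputStream.writeShort` is the standard library call, and it is documented to write the high byte first.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class EndianDemo {
    // Lets DataOutputStream lay out a 16-bit value: high byte first.
    static byte[] highByteFirst(short value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeShort(value);
        } catch (IOException e) {
            // Not expected for an in-memory stream.
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Builds the low-byte-first layout by hand with a mask and a shift.
    static byte[] lowByteFirst(short value) {
        return new byte[] { (byte) (value & 0xFF), (byte) ((value >> 8) & 0xFF) };
    }

    // Formats one byte as eight binary digits, zero padded on the left.
    static String bits(byte b) {
        return String.format("%8s", Integer.toBinaryString(b & 0xFF)).replace(' ', '0');
    }

    public static void main(String[] args) {
        byte[] be = highByteFirst((short) 3);
        byte[] le = lowByteFirst((short) 3);
        System.out.println("high byte first: " + bits(be[0]) + " " + bits(be[1]));
        System.out.println("low byte first:  " + bits(le[0]) + " " + bits(le[1]));
    }
}
```

Running it prints `00000000 00000011` for the high-byte-first layout and `00000011 00000000` for the low-byte-first one.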

    • blazedaces
      Contributor
      • May 2007
      • 284

      #3
      Very interesting. I had never known what 'big endian' or 'little endian' meant, but the words kept popping up as I was trying to find the answer to my problem. I kept assuming that my long, int, short, double, etc. to unsigned byte converters were working incorrectly. Further testing brought me to the right conclusion.
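
For readers writing similar converters, here is a small sketch of byte swapping with the standard `reverseBytes` methods. The class `ByteSwap` and its helper names are hypothetical, and it is only an assumption that the converters mentioned above did something comparable. The double helpers deliberately pass the swapped value around as a `long`, since a byte-swapped bit pattern is generally not a meaningful double.

```java
final class ByteSwap {
    // Returns the 64-bit pattern of a double with its bytes reversed.
    static long swappedBits(double value) {
        return Long.reverseBytes(Double.doubleToRawLongBits(value));
    }

    // Rebuilds a double from a byte-reversed 64-bit pattern.
    static double fromSwappedBits(long bits) {
        return Double.longBitsToDouble(Long.reverseBytes(bits));
    }

    // Converts a signed Java byte (-128..127) to its unsigned value (0..255).
    static int unsigned(byte b) {
        return b & 0xFF;
    }

    public static void main(String[] args) {
        short s = 0x0102;
        int i = 0x01020304;
        long l = 0x0102030405060708L;
        System.out.printf("%04x%n", Short.reverseBytes(s));    // 0201
        System.out.printf("%08x%n", Integer.reverseBytes(i));  // 04030201
        System.out.printf("%016x%n", Long.reverseBytes(l));    // 0807060504030201
        System.out.println(unsigned((byte) 0xF0));             // 240
        System.out.println(fromSwappedBits(swappedBits(1.5))); // 1.5
    }
}
```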

      This makes sense, but now I'm wondering why MATLAB defaults to a little endian read format when you don't specify the "machineformat" (its name for the big/little endian option). I would suggest changing the program that reads/writes all these files through MATLAB, but it's too widely used; I doubt they (my employers) would consider it. I'm just a lowly co-op.

      Thank you again for the helpful information,

      -blazed

      Edit: I read further in the help section and found that it does the following by default: "Numeric format of the machine on which MATLAB is running (the default)" - what it calls 'Native' or for short 'n'.
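
When the reader uses its machine's native order, one option on the Java side is to pick the byte order explicitly instead of relying on the big endian default. The sketch below uses `java.nio.ByteBuffer` and `ByteOrder`; the class and method names are made up for illustration, and whether changing the writer is practical in the setup described above is not something this thread settles.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class ExplicitOrder {
    // Encodes one double in the requested byte order.
    static byte[] encode(double value, ByteOrder order) {
        return ByteBuffer.allocate(Double.BYTES).order(order).putDouble(value).array();
    }

    // Decodes one double, assuming the bytes were written in the given order.
    static double decode(byte[] bytes, ByteOrder order) {
        return ByteBuffer.wrap(bytes).order(order).getDouble();
    }

    public static void main(String[] args) {
        // The order of the machine running this JVM, comparable to a "native" setting.
        System.out.println("native order here: " + ByteOrder.nativeOrder());

        byte[] le = encode(1.0, ByteOrder.LITTLE_ENDIAN);
        System.out.println(decode(le, ByteOrder.LITTLE_ENDIAN)); // 1.0
        System.out.println(decode(le, ByteOrder.BIG_ENDIAN));    // not 1.0: wrong order
    }
}
```

Reading the same bytes with the wrong order does not fail; it silently yields a different number, which is why a mismatch can look like a broken converter.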

      • JosAH
        Recognized Expert MVP
        • Mar 2007
        • 11453

        #4
        The terms come from Jonathan Swift's Gulliver's Travels, where two factions are
        in a severe fight about which end of a boiled egg should be opened to eat it:
        the big end or the little end.

        Of course those good old Vaxes had a 'from the middle' format as well.

        kind regards,

        Jos ;-)
