file read, binary or text mode

  • Guyon Morée

    file read, binary or text mode

    what is the difference?

    if I open a text file in binary (rb) mode, it doesn't matter... the read()
    output is the same.



  • Peter Hansen

    #2
    Re: file read, binary or text mode

    Guyon Morée wrote:
    > what is the difference?
    >
    > if I open a text file in binary (rb) mode, it doesn't matter... the read()
    > output is the same.

    If you are on Linux that's the case... or under other
    conditions. Maybe describing your platform and showing
    an example of what you're trying to do would be helpful.

    -Peter


    • Grant Edwards

      #3
      Re: file read, binary or text mode

      On 2004-09-24, Guyon Morée <gumuz@NO_looze_SPAM.net> wrote:

      > what is the difference?

      42?

      > if I open a text file in binary (rb) mode, it doesn't matter... the read()
      > output is the same.

      OK...

      --
      Grant Edwards                   grante at visi.com
      Yow! They collapsed... like nuns in the street... they had no teenappeal!


      • Askari

        #4
        Re: file read, binary or text mode

        "Guyon Morée" <gumuz@NO_looze_SPAM.net> wrote in
        news:41540121$0$3891$4d4ebb8e@news.nl.uu.net:

        > what is the difference?
        >
        > if I open a text file in binary (rb) mode, it doesn't matter... the
        > read() output is the same.

        "rb" and "r" give the same result on a text file if it contains only ASCII
        characters (8-bit), but not if it contains Unicode characters (16-bit).
        In short, if you are sure that your file is ONLY text, use "r"; otherwise,
        always use "rb". Also, "r" doesn't read control characters other than "\n",
        "\t", etc.


        • Guyon Morée

          #5
          Re: file read, binary or text mode

          OK, I have Huffman encoding code.

          It was actually built for text, but because Python can also read a binary
          file as a string, it applies equally well :)

          But I was just wondering whether it causes any problems if I use a
          text-mode read for binary files and vice versa.

          If I understand correctly now, using binary mode is _always_ safe, right?


          "Peter Hansen" <peter@engcorp.com> wrote in message
          news:MaqdnVoFl5tkmsncRVn-uQ@powergate.ca...
          > Guyon Morée wrote:
          > > what is the difference?
          > >
          > > if I open a text file in binary (rb) mode, it doesn't matter... the read()
          > > output is the same.
          >
          > If you are on Linux that's the case... or under other
          > conditions. Maybe describing your platform and showing
          > an example of what you're trying to do would be helpful.
          >
          > -Peter



          • Roel Schroeven

            #6
            Re: file read, binary or text mode

            Guyon Morée wrote:
            > what is the difference?

            On Unix/Linux, none.

            On Windows, binary mode is just that while text mode translates "\r\n"
            (or "\n\r", I always forget) to "\n" on input and vice-versa on output.

            I don't know about other platforms.

            > if I open a text file in binary (rb) mode, it doesn't matter... the read()
            > output is the same.

            Depends on your platform, and the format of the text file (Unix, Windows
            or other platform style line endings).

            --
            "Codito ergo sum"
            Roel Schroeven


            • Grant Edwards

              #7
              Re: file read, binary or text mode

              On 2004-09-24, Guyon Morée <gumuz@NO_looze_SPAM.net> wrote:

              > ok, i have huffman encoding code.

              You should open the file in binary.

              > this is actually built for text,

              All of the Huffman encoding implementations I've seen output
              binary, but I'll take your word for it.

              > but because python can also
              > read a binary file as a string, this applies equally well :)

              If the file contains printable text with cr/nl, nl, or cr line
              endings, then open it in text mode. Otherwise open it in
              binary mode.

              > but, i was just wondering if this gives any problems if I use
              > text-mode read for the binary files and vice versa.

              Yes, it will give you problems.

              > If I understand correctly now, using binary mode is _always_ safe, right?

              No.

              If it's text, open it in text mode. That way the line endings
              are handled properly.

              --
              Grant Edwards                   grante at visi.com
              Yow! I think I'll do BOTH if I can get RESIDUALS!!


              • Peter Hansen

                #8
                Re: file read, binary or text mode

                Guyon Morée wrote:
                > ok, i have huffman encoding code.
                >
                > this is actually built for text, but because python can also read a binary
                > file as a string, this applies equally well :)
                >
                > but, i was just wondering if this gives any problems if I use text-mode read
                > for the binary files and vice versa.
                >
                > If I understand correctly now, using binary mode is _always_ safe, right?

                You're not helping a whole lot here. What platform are you using?
                I'll assume from the headers in your message that it's Windows.
                If that's true, then forget about text and binary and ASCII for
                a moment, and just consider this.

                If you open a file on Windows using "r" or "rt" or the default (which
                is "r"), then when you read the file any occurrences of the byte
                sequence 13 followed by 10 (that is, CR LF or \r\n or whatever you want
                to call it) will be replaced as the file is read by just the 10, or the
                LF, or the \n, or whatever you want to call it.

                If you use "rb" instead of just "r" or the default, then this
                translation will not occur and you will retrieve all bytes in
                the file just as they are stored there.
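
A minimal sketch of the two read modes described above, assuming Python 3 rather than the Python 2 of this thread. In Python 3 the CR LF to LF translation is done by Python's own I/O layer and happens on every platform, not only on Windows. The file name `demo_crlf.txt` is made up for the example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo_crlf.txt")  # hypothetical file name

    # Store two lines terminated by CR LF (bytes 13, 10).
    with open(path, "wb") as f:
        f.write(b"one\r\ntwo\r\n")

    # Binary mode: the bytes come back exactly as stored.
    with open(path, "rb") as f:
        raw = f.read()

    # Text mode with default newline handling: CR LF becomes LF.
    with open(path, "r", encoding="ascii") as f:
        text = f.read()

print(repr(raw))   # b'one\r\ntwo\r\n'
print(repr(text))  # 'one\ntwo\n'
```

The byte count differs between the two reads (10 versus 8 here), which is exactly what matters for code that works on raw bytes.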

                It's up to you to pick the behaviour you need. Saying it's
                "huffman encoding code" doesn't really help, since that doesn't
                refer to any universal standard representation of data. It
                seems likely that it's binary (i.e. the translation provided by
                not using "rb" is undesirable), but nobody here knows where you
                got that file or what it contains.

                And in case that doesn't answer the questions above: (1) yes,
                it can definitely give problems reading text files as binary
                and vice versa, and (2) binary mode applies whenever "b" is
                used on Windows, and not otherwise, so if you save a file without
                using "wb" you will get the same translation as above but in
                the reverse direction (LF or \n gets turned into CR LF or \r\n
                on output).
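
A minimal sketch of the output-side translation, assuming Python 3. Passing `newline="\r\n"` requests Windows-style output explicitly, so the result is the same on any platform; on Windows the default text mode behaves this way by itself. The file name is hypothetical.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo_out.txt")  # hypothetical file name

    # Text-mode write: every "\n" is written as CR LF.
    with open(path, "w", encoding="ascii", newline="\r\n") as f:
        f.write("one\ntwo\n")

    # Read the stored bytes back untranslated.
    with open(path, "rb") as f:
        stored = f.read()

print(repr(stored))  # b'one\r\ntwo\r\n'
```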

                -Peter


                • Roel Schroeven

                  #9
                  Re: file read, binary or text mode

                  Guyon Morée wrote:

                  > ok, i have huffman encoding code.
                  >
                  > this is actually built for text, but because python can also read a binary
                  > file as a string, this applies equally well :)
                  >
                  > but, i was just wondering if this gives any problems if I use text-mode read
                  > for the binary files and vice versa.
                  >
                  > If I understand correctly now, using binary mode is _always_ safe, right?

                  It's safe in the sense that everything goes out exactly as it came in.
                  For example, gzip uses binary mode even when compressing text files. The
                  files may be text, but gzip doesn't care about that. It doesn't care
                  about words, sentences and line endings, but it does care about
                  representing exactly the bytes that are in the file.

                  Editors, diff, wc, ... use text mode.
                  cp, tar, gzip, ... use binary mode.
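
To illustrate the "goes out exactly as it came in" point, here is a small sketch, assuming Python 3. It uses the standard `zlib` module as a stand-in for any byte-oriented compressor (it is not the Huffman code discussed in this thread), and the file name and sample bytes are invented.

```python
import os
import tempfile
import zlib

# Sample content mixing text, both line-ending styles, and odd bytes.
original = b"text line\r\nmore text\n\x00\x1a\xff binary tail"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo_mixed.dat")  # hypothetical file name

    with open(path, "wb") as f:
        f.write(original)

    # Binary read: no newline translation, no special end-of-file byte.
    with open(path, "rb") as f:
        data = f.read()

packed = zlib.compress(data)
restored = zlib.decompress(packed)

print(data == original)      # True
print(restored == original)  # True
```

Because the compressor only ever sees and emits bytes, a lossless round trip is possible whether or not the input happens to be text.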

                  --
                  "Codito ergo sum"
                  Roel Schroeven


                  • Terry Reedy

                    #10
                    Re: file read, binary or text mode


                    "Askari" <askari@addressNonValide.com> wrote in message
                    news:Xns956E4CDA892D7askariaddressNonVali@207.35.177.135...
                    > "Guyon Morée" <gumuz@NO_looze_SPAM.net> wrote in
                    > news:41540121$0$3891$4d4ebb8e@news.nl.uu.net:
                    >
                    > "rb" and "r" give the same result on a text file if it contains only ASCII
                    > characters (8-bit), but not if it contains Unicode characters (16-bit).
                    > In short, if you are sure that your file is ONLY text, use "r"; otherwise,
                    > always use "rb". Also, "r" doesn't read control characters other than "\n",
                    > "\t", etc.

                    Newbies, ignore this confusion.

                    On Windows, text mode autoconverts \r\n to \n on input and vice versa on
                    output. I believe that that is all the difference. Period.

                    Terry J. Reedy




                    • Ralf Schmitt

                      #11
                      Re: file read, binary or text mode

                      "Terry Reedy" <tjreedy@udel.edu> writes:

                      > Newbies, ignore this confusion.
                      >
                      > On Windows, text mode autoconverts \r\n to \n on input and vice versa on
                      > output. I believe that that is all the difference. Period.

                      That's not quite the case. As always windows sucks big time:

                      $ cat bla.py
                      open("b.txt", "w").write("bla\x1a")
                      print len(open("b.txt", "rb").read())
                      open("b.txt", "a+")
                      print len(open("b.txt", "rb").read())

                      ralf@CRACK ~
                      $ python bla.py
                      4
                      3


                      The last character gets stripped if it's 0x1a when opening a file for
                      appending in text mode. I remember this from a posting on the metakit
                      mailing list. The poor guy corrupted his databases while he wanted to
                      check for write access:


                      - Ralf

                      --
                      brainbot technologies ag
                      boppstrasse 64 . 55118 mainz . germany
                      fon +49 6131 211639-1 . fax +49 6131 211639-2
                      http://brainbot.com/ mailto:ralf@brainbot.com


                      • Peter Hansen

                        #12
                        Re: file read, binary or text mode

                        Ralf Schmitt wrote:
                        > "Terry Reedy" <tjreedy@udel.edu> writes:
                        >> On Windows, text mode autoconverts \r\n to \n on input and vice versa on
                        >> output. I believe that that is all the difference. Period.
                        >
                        > That's not quite the case. As always windows sucks big time:
                        [snip example with ^Z]
                        > The last character gets stripped if it's 0x1a when opening a file for
                        > appending in text mode.

                        Good point. Note for the picky: it doesn't just get stripped... it
                        *is* the last character, even if there's data following. Or to
                        be blunt, ^Z (byte value 26) is treated as EOF on Windows when not
                        using binary mode to read files.

                        I suspect Terry and others (including me) overlooked this because
                        ^Z is pretty much obsolete, and since few applications *write*
                        ^Z as the last character of text files any more, almost nobody
                        bothers to remember that text mode is slightly more complicated
                        than just the CR LF to LF conversion and back.
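
A short sketch, assuming Python 3, showing that binary mode treats byte 26 as ordinary data and keeps reading past it. Only the binary-mode side is demonstrated here; what a text-mode read does with ^Z depends on the platform and runtime. The file name and contents are invented.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo_ctrl_z.bin")  # hypothetical file name

    # A ^Z (0x1a) in the middle of the data, followed by more bytes.
    with open(path, "wb") as f:
        f.write(b"bla\x1amore")

    with open(path, "rb") as f:
        data = f.read()

print(len(data))             # 8
print(data.index(b"\x1a"))   # 3
```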

                        -Peter


                        • Grant Edwards

                          #13
                          Re: file read, binary or text mode

                          On 2004-09-24, Peter Hansen <peter@engcorp.com> wrote:

                          > Good point. Note for the picky: it doesn't just get stripped... it
                          > *is* the last character, even if there's data following. Or to
                          > be blunt, ^Z (byte value 26) is treated as EOF on Windows when not
                          > using binary mode to read files.

                          <history>

                          That's because CP/M allocated file space in blocks and only
                          kept track of the length of the file in blocks. It was common
                          practice to mark the end of the "real" data in a text file with
                          a ^Z (IIRC, this was done by the application writing to the
                          file). Otherwise, you had no way of knowing _where_ in that
                          last block the data actually ended.

                          The original MS/PC-DOS was basically a CP/M clone.

                          I presume CP/M copied that behavior from RSX-11 or RT-11, but
                          that's just an educated guess.

                          </history>

                          --
                          Grant Edwards                   grante at visi.com
                          Yow! My mind is making ashtrays in Dayton...


                          • Roel Schroeven

                            #14
                            Re: file read, binary or text mode

                            Terry Reedy wrote:

                            > "Askari" <askari@addressNonValide.com> wrote in message
                            > news:Xns956E4CDA892D7askariaddressNonVali@207.35.177.135...
                            >
                            >> "Guyon Morée" <gumuz@NO_looze_SPAM.net> wrote in
                            >> news:41540121$0$3891$4d4ebb8e@news.nl.uu.net:
                            >>
                            >> "rb" and "r" give the same result on a text file if it contains only ASCII
                            >> characters (8-bit), but not if it contains Unicode characters (16-bit).
                            >> In short, if you are sure that your file is ONLY text, use "r"; otherwise,
                            >> always use "rb". Also, "r" doesn't read control characters other than "\n",
                            >> "\t", etc.
                            >
                            > Newbies, ignore this confusion.
                            >
                            > On Windows, text mode autoconverts \r\n to \n on input and vice versa on
                            > output. I believe that that is all the difference. Period.

                            It's the main difference, but not the only thing. From the MSDN
                            documentation on fopen:

                            "t

                            Open in text (translated) mode. In this mode, CTRL+Z is interpreted as
                            an end-of-file character on input. In files opened for reading/writing
                            with "a+", fopen checks for a CTRL+Z at the end of the file and removes
                            it, if possible. This is done because using fseek and ftell to move
                            within a file that ends with a CTRL+Z, may cause fseek to behave
                            improperly near the end of the file.

                            Also, in text mode, carriage return–linefeed combinations are translated
                            into single linefeeds on input, and linefeed characters are translated
                            to carriage return–linefeed combinations on output. When a Unicode
                            stream-I/O function operates in text mode (the default), the source or
                            destination stream is assumed to be a sequence of multibyte characters.
                            Therefore, the Unicode stream-input functions convert multibyte
                            characters to wide characters (as if by a call to the mbtowc function).
                            For the same reason, the Unicode stream-output functions convert wide
                            characters to multibyte characters (as if by a call to the wctomb
                            function)."

                            So there's
                            - the line endings translation
                            - the issue of CTRL-Z as end of file that gets stripped (CTRL-Z is
                            decimal 26 or hex 1a, consistent with Ralf's mail)
                            - the Unicode issue, which I frankly don't understand
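
On the Unicode point, the MSDN text is about the C runtime's wide-character functions, which the sketch below does not reproduce. As a loosely related illustration, assuming Python 3 and a UTF-8 encoded file (both assumptions, not from the thread), binary mode yields raw bytes while text mode yields decoded characters, so the two reads can differ in length.

```python
import os
import tempfile

name = "Mor\u00e9e"  # five characters; the e-acute takes two bytes in UTF-8

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo_utf8.txt")  # hypothetical file name

    with open(path, "wb") as f:
        f.write(name.encode("utf-8"))

    # Binary mode: a bytes object, one item per stored byte.
    with open(path, "rb") as f:
        raw = f.read()

    # Text mode: a str, one item per decoded character.
    with open(path, "r", encoding="utf-8") as f:
        text = f.read()

print(len(raw), len(text))  # 6 5
```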

                            --
                            "Codito ergo sum"
                            Roel Schroeven


                            • Alan G Isaac

                              #15
                              Re: file read, binary or text mode

                              "Roel Schroeven" <rschroev_nospam_ml@fastmail.fm> wrote in message
                              news:OjW4d.255917$OR1.13371520@phobos.telenet-ops.be...
                              > It's safe in the sense that everything goes out exactly as it came in.
                              > For example, gzip uses binary mode even when compressing text files. The
                              > files may be text, but gzip doesn't care about that. It doesn't care
                              > about words, sentences and line endings, but it does care about
                              > representing exactly the bytes that are in the file.

                              I think the following is the same question from another angle.
                              I have a .zip archive of compressed files that
                              I want to decompress. Using the zipfile module,
                              I tried

                                  import zipfile

                                  z = zipfile.ZipFile('local.zip')
                                  for zname in z.namelist():
                                      localtxtfile = 'c:/puthere/' + zname
                                      f = open(localtxtfile, 'w')  # text mode, as first tried
                                      f.write(z.read(zname))
                                      f.close()

                              The original files were all plain text,
                              created on an unspecified platform.
                              The files I decompressed this way contained
                              *two successive* carriage returns
                              (ASCII 13) at the end of each line.
                              If I change 'w' to 'wb' I get only one
                              carriage return at the end of each line.

                              Why is this extra carriage return added?
                              My original guess was that using 'w' instead
                              of 'wb' would be the right choice, since the
                              platform for the original files is unspecified
                              and the original files are known to be plain text.

                              Thanks,
                              Alan Isaac
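
One plausible explanation, which the thread itself does not confirm, is that the archived files already had CR LF line endings, so text-mode output on Windows turned each LF into CR LF and produced CR CR LF. A minimal sketch of that effect, assuming Python 3 and using `newline="\r\n"` to imitate Windows text mode on any platform (the file name and sample text are invented):

```python
import os
import tempfile

# Assumed content of an archived file: lines already end in CR LF.
payload = "line one\r\nline two\r\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo_double_cr.txt")  # hypothetical file name

    # Text-mode write with Windows-style translation: "\n" -> "\r\n".
    # The "\r" that was already there is left in place.
    with open(path, "w", encoding="ascii", newline="\r\n") as f:
        f.write(payload)

    with open(path, "rb") as f:
        stored = f.read()

print(repr(stored))  # b'line one\r\r\nline two\r\r\n'
```

Writing the same data in binary mode would store it unchanged, which matches the single carriage return seen with 'wb'.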


