UTF-8 file reading and writing for PHP

  • HaggMan

    UTF-8 file reading and writing for PHP

    I'm creating a page that:
    - accepts user input in whatever language
    - saves that input to a file
    - reads the file and displays the original input

    The following code successfully writes the user input to a file (when I
    open the file, it's in the correct font), but I can't get PHP to read
    the file and display the correct characters.

    HTML --------------- Form
    <FORM name=saveform method=post action="wiki.php">
    File:
    <TEXTAREA name=thetext rows=20 cols=30></TEXTAREA>
    <INPUT type=submit>
    <INPUT type=hidden name=action value="save">
    </FORM>

    PHP --------------- Sticks the data in a file
    $message = $_REQUEST['thetext'];
    echo $message; // This displays the correct stuff
    $filename = "tmp/tmp.txt";
    $fr = fopen($filename , "wb+");
    // adding header
    fwrite($fr, pack("CCC",0xef ,0xbb,0xbf));
    fputs($fr, $message);
    fclose($fr);

    PHP --------------- Read the data from the file
    $thefile = file($filename) ;
    array_shift($thefile); // To get rid of the BOM
    $ret = "";
    foreach ($thefile as $i => $line) {
    $line = rtrim(utf8_decode($line));
    $ret .= $line;
    }
    echo $ret; // This _doesn't_ display the correct stuff

  • Bent Stigsen

    #2
    Re: UTF-8 file reading and writing for PHP

    HaggMan wrote:
    > I'm creating a page that:
    > - accepts user input in whatever language
    > - saves that input to a file
    > - reads the file and displays the original input
    >
    > The following code successfully writes the user input to a file (when I
    > open the file, it's in the correct font), but I can't get PHP to read
    > the file and display the correct characters.
    [snip]
    > PHP --------------- Sticks the data in a file
    > $message = $_REQUEST['thetext'];
    > echo $message; // This displays the correct stuff
    > $filename = "tmp/tmp.txt";
    > $fr = fopen($filename , "wb+");
    > // adding header
    > fwrite($fr, pack("CCC",0xef ,0xbb,0xbf));

    Is it safe to assume the data is UTF-8?

    If you just discard the byte order mark (BOM) later, there's little
    reason to add it (if there ever was one).

    > fputs($fr, $message);
    > fclose($fr);
    >
    > PHP --------------- Read the data from the file
    > $thefile = file($filename) ;
    > array_shift($thefile); // To get rid of the BOM

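    The BOM is only 3 bytes, but file() returns an array of lines, each
    terminated by a newline, so array_shift() throws away the entire first
    line of text, not just the BOM.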
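A minimal sketch of removing only the BOM, assuming the file was written exactly as in the question (the three bytes EF BB BF followed by UTF-8 text). The helper name stripUtf8Bom() is hypothetical; the idea is to work on the raw bytes as one string instead of splitting them into lines first.

```php
<?php
// Sketch (assumption: the file holds BOM + UTF-8 text, as in the question).
// Remove only the 3 BOM bytes and leave everything after them untouched.
function stripUtf8Bom(string $text): string
{
    $bom = "\xEF\xBB\xBF"; // the UTF-8 byte order mark
    if (strncmp($text, $bom, 3) === 0) {
        return (string) substr($text, 3);
    }
    return $text;
}

// Example: a BOM followed by two lines of text.
$raw = "\xEF\xBB\xBF" . "first line\nsecond line\n";
$lines = explode("\n", rtrim(stripUtf8Bom($raw), "\n"));
echo count($lines), "\n"; // 2: the first line survives
echo $lines[0], "\n";     // first line
```

With the file from the question this would be called as stripUtf8Bom(file_get_contents($filename)); the result can then be split into lines if that is still needed.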
    > $ret = "";
    > foreach ($thefile as $i => $line) {
    > $line = rtrim(utf8_decode($line));

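    I am not sure what to make of this. If you expect the browser to send
    data in UTF-8, then I would assume you serve your pages in UTF-8, so
    why convert the text to ISO-8859-1? That is what utf8_decode() does, and
    any character ISO-8859-1 cannot hold (Arabic, for example) becomes "?".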
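To illustrate why utf8_decode() yields "?????" for Arabic, here is a small sketch. The function markNonLatin1() is hypothetical and is not a replacement for utf8_decode(); it only marks, inside the UTF-8 string, the characters above U+00FF that ISO-8859-1 cannot represent and that utf8_decode() replaces with "?". (utf8_decode() itself is deprecated as of PHP 8.2.)

```php
<?php
// Sketch: NOT utf8_decode() itself. It shows, on a UTF-8 string, which
// characters utf8_decode() cannot represent: anything above U+00FF.
function markNonLatin1(string $utf8): ?string
{
    // /u makes PCRE read pattern and subject as UTF-8; preg_replace()
    // returns null if $utf8 is not valid UTF-8.
    return preg_replace('/[^\x{00}-\x{FF}]/u', '?', $utf8);
}

echo markNonLatin1("café"), "\n"; // café (é is U+00E9, inside Latin-1)
echo markNonLatin1("سلام"), "\n"; // ???? (Arabic is outside Latin-1)
```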
    > $ret .= $line;
    > }
    > echo $ret; // This _doesn't_ display the correct stuff

    Start with the simple file_put_contents() and file_get_contents().
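A sketch of that simpler approach, assuming the submitted text is already UTF-8: write the bytes unchanged, read them back unchanged, and skip both the BOM and any utf8_encode()/utf8_decode() call. saveText() and loadText() are hypothetical helper names, and a temporary file stands in for tmp/tmp.txt.

```php
<?php
// Sketch, assuming the submitted text is already UTF-8: store the bytes
// as-is and read them back as-is. No BOM, no encoding conversion.
function saveText(string $filename, string $text): void
{
    if (file_put_contents($filename, $text) === false) {
        throw new RuntimeException("Could not write $filename");
    }
}

function loadText(string $filename): string
{
    $text = file_get_contents($filename);
    if ($text === false) {
        throw new RuntimeException("Could not read $filename");
    }
    return $text;
}

// Example with a temporary file standing in for tmp/tmp.txt.
$filename = tempnam(sys_get_temp_dir(), 'utf8');
$message = "مرحبا\nhello"; // stands in for $_REQUEST['thetext']
saveText($filename, $message);
$roundTrip = loadText($filename);
unlink($filename);
var_dump($roundTrip === $message); // bool(true)
```

When echoing the result into an HTML page, htmlspecialchars($text, ENT_QUOTES, 'UTF-8') escapes markup without altering the UTF-8 characters.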


    /Bent


    • HaggMan

      #3
      Re: UTF-8 file reading and writing for PHP

      Thanks for the reply...

      My goal is to allow user input in UTF-8, in Arabic script, for example.
      I then save what they input to a file. Then I'd like to retrieve and
      print out the original stuff that they wrote.

      I've tried various combinations of utf8_encode() and utf8_decode(), and
      also no conversion at all, and every time the result is just ????? or
      other weird characters.


      • Bent Stigsen

        #4
        Re: UTF-8 file reading and writing for PHP

        HaggMan wrote:
        > Thanks for the reply...
        >
        > My goal is to allow user input in UTF-8, in Arabic script, for example.
        > I then save what they input to a file. Then I'd like to retrieve and
        > print out the original stuff that they wrote.
        >
        > I've tried various variations of utf8_encode() and utf8_decode() and
        > even without them, and every time, the resulting stuff is just ????? or
        > other weird characters.
        [snip]

        If the data you receive is UTF-8 and the page you send out is
        UTF-8, then you don't need to convert anything.

        Make sure what you receive really is UTF-8 (e.g. with
        mb_detect_encoding($thetext)).

        Also make sure the browser is told the page is UTF-8 (check the HTTP
        headers, meta tags, and XML declaration).
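A sketch of both checks, under the assumption that the mbstring extension may not be available: preg_match('//u', ...) is swapped in here for mb_detect_encoding() as a validity test, because with the u modifier PCRE rejects malformed UTF-8. isValidUtf8() and renderPage() are hypothetical helpers.

```php
<?php
// Sketch: preg_match('//u', ...) stands in for mb_detect_encoding() here,
// so no mbstring extension is needed. With /u, preg_match() returns false
// (not 0 or 1) when the subject is malformed UTF-8.
function isValidUtf8(string $text): bool
{
    return preg_match('//u', $text) === 1;
}

// Builds a page that declares UTF-8 in a meta tag. In a real request,
// also call header('Content-Type: text/html; charset=UTF-8') before output.
function renderPage(string $text): string
{
    // Escape <, >, & and quotes without touching the UTF-8 characters.
    $safe = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
    return '<html><head><meta http-equiv="Content-Type" '
        . 'content="text/html; charset=UTF-8"></head>'
        . "<body>$safe</body></html>";
}

$input = "سلام <b>"; // stands in for $_REQUEST['thetext']
if (isValidUtf8($input)) {
    echo renderPage($input), "\n";
} else {
    echo "Input is not valid UTF-8\n";
}
```

Adding accept-charset="UTF-8" to the FORM element also asks the browser to submit the textarea contents as UTF-8.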


        /Bent


        • R. Rajesh Jeba Anbiah

          #5
          Re: UTF-8 file reading and writing for PHP

          HaggMan wrote:
          > I'm creating a page that:
          > - accepts user input in whatever language
          > - saves that input to a file
          > - reads the file and displays the original input
          >
          > The following code successfully writes the user input to a file (when I
          > open the file, it's in the correct font), but I can't get PHP to read
          > the file and display the correct characters.
          <snip>

          Save your (processing) PHP script in UTF-8.
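A small sketch of why the script's own encoding matters, assuming this file is saved as UTF-8: string literals are then stored as the same UTF-8 bytes the browser submits, so byte-oriented functions such as str_replace() and strlen() operate on matching bytes. The sample input is a hypothetical stand-in for $_REQUEST['thetext'].

```php
<?php
// Sketch, assuming this script file itself is saved as UTF-8: the Arabic
// literals below are then the same UTF-8 bytes a browser would submit.
$needle = "سلام";        // 4 characters
$input  = "سلام عليكم";  // stands in for $_REQUEST['thetext']

// strlen() counts bytes: each of these Arabic letters is 2 bytes in UTF-8.
echo strlen($needle), "\n";  // 8
echo bin2hex($needle), "\n"; // d8b3d984d8a7d985

// Byte-wise replacement matches here because both strings are UTF-8.
echo str_replace($needle, "مرحبا", $input), "\n"; // مرحبا عليكم
```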

          --
          <?php echo 'Just another PHP saint'; ?>
          Email: rrjanbiah-at-Y!com Blog: http://rajeshanbiah.blogspot.com/


          • HaggMan

            #6
            Re: UTF-8 file reading and writing for PHP

            Thank you, both of you, for your help... I finally figured out what
            caused it:

            The part I didn't mention (go figure, I thought it was harmless) is
            that I was doing some str_replaces (with regex) on the UTF-8 text, and
            that is what messed up the output. When I accept the input as is,
            save it to the file, and echo it back unchanged, it comes out
            exactly as I typed it (I'm testing with Arabic).

            So I guess a follow-up question would be: how do I parse UTF-8
            text with regular expressions? I'll do some research tomorrow.

            Thank you again!

            HaggMan



            • R. Rajesh Jeba Anbiah

              #7
              Re: UTF-8 file reading and writing for PHP

              HaggMan wrote:
              <snip>
              > So I guess a followup question would be: How do I parse through UTF8
              > stuff with regexpressions? I'll do some research tomorrow.

              You may use mb_ereg() and the other mb string functions
              <http://in2.php.net/mbstring>

              You may also use hexadecimal representation with PCRE functions such
              as preg_match() <1112468085.876654.134690@o13g2000cwo.googlegroups.com>
              ( http://groups.google.com/group/comp....4b602f9b5a78b? ),
              but it will be more cumbersome.
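A sketch of the PCRE route using the u (UTF-8) pattern modifier and \x{...} code point escapes. This is offered as an alternative to the mb_ereg functions mentioned above, not as the exact technique in the linked post; the Arabic block U+0600 to U+06FF is just an example range.

```php
<?php
// Sketch: the /u modifier makes PCRE treat pattern and subject as UTF-8,
// so matching works per character instead of per byte. Without /u,
// PCRE works on bytes and rejects \x{...} values above 0xFF.
$text = "Hello سلام world مرحبا";

// Find runs of characters in the Arabic block U+0600..U+06FF.
preg_match_all('/[\x{0600}-\x{06FF}]+/u', $text, $matches);
print_r($matches[0]); // the two Arabic words

// Replace each Arabic run with a placeholder.
echo preg_replace('/[\x{0600}-\x{06FF}]+/u', '[ar]', $text), "\n";
// Hello [ar] world [ar]

// "." matches one byte without /u, one character with /u.
echo preg_match_all('/./', "سلام"), "\n";  // 8 (bytes)
echo preg_match_all('/./u', "سلام"), "\n"; // 4 (characters)
```

Without the u modifier, "." and similar constructs match single bytes, which is one way a regex replacement can cut a multi-byte UTF-8 character in half and garble the output.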
