Character conversion

  • Rudolf Horbas

    Character conversion

    Hi all,

    in a string from a form post, I sometimes get characters like &#337;
    (õ) as raw data (not masked). ("ő", if your mail program
    supports this).

    It appears that Mozilla converts the input automatically to &#337;, so I
    end up with &#337; in my database (MSIE does not convert it).

    My problem is that I need to export these data to a csv-file, where
    &#337; is no good.

    I've tried to use preg_replace_callback() with chr() to convert the
    values back when exporting, but I get inconsistent results for this one
    character (others seem to work):

    ------------ TEST ------------

    <?php
    // Callback: turn the number captured from a &#NNN; entity into a character.
    function replaceChars( $matches )
    {
        return chr( $matches[1] );
    }
    ?>
    <form method="POST" action="#">
    <input type="text" name="test">
    </form>

    <pre>
    raw: <?=$_POST['test'];?>

    htmlentities(): <?=htmlentities( $_POST['test'] );?>

    chr: <?=preg_replace_callback( "/&#([0-9]+);/",
                                   'replaceChars',
                                   $_POST['test'] );
    ?>
    &amp;#337;: &#337;

    chr(337): <?=chr(337);?>
    </pre>

    ------------ /TEST ------------

    Am I making a dumb mistake? Is the value in &#(int); not the value for
    the chr()-argument?

    Thanks,
    Rudi
  • Pedro Graca

    #2
    Re: Character conversion

    Rudolf Horbas wrote:
    > <form method="POST" action="#">

    Try this:

    <form method="POST" action="#" accept-charset="iso-8859-1">


    info @ http://www.w3.org/TR/html4/interact/forms.html

    --
    USENET would be a better place if everybody read: : mail address :
    http://www.catb.org/~esr/faqs/smart-questions.html : is valid for :
    http://www.netmeister.org/news/learn2quote2.html : "text/plain" :
    http://www.expita.com/nomime.html : to 10K bytes :

    • Rudolf Horbas

      #3
      Re: Character conversion

      Pedro Graca wrote:

      > Try this:
      >
      > <form method="POST" action="#" accept-charset="iso-8859-1">

      Thanks Pedro -- this fixed my problem of the data getting in.

      For the export part (and out of curiosity):

      Why isn't this working as expected?:

      <?php
      function replaceChars( $matches )
      {
          return chr( $matches[1] );
      }
      echo preg_replace_callback( "/&#([0-9]+);/",
                                  'replaceChars',
                                  "&#337;" );
      ?>

      ... which of course sums up in <?=chr(337)?>

      <?=chr(123)?> == &#123;

      but

      <?=chr(337)?> != &#337;

      Rudi

      • Pedro Graca

        #4
        Re: Character conversion

        Rudolf Horbas wrote:
        > <?=chr(123)?> == &#123;
        >
        > but
        >
        > <?=chr(337)?> != &#337;


        chr(x) is the same as chr( x % 256 )

        so chr(123) is chr(123) :)

        but chr(337) is chr(81) :(
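
        A minimal sketch of the wrap-around described above. It assumes a PHP
        version where out-of-range arguments to chr() are reduced to a single
        byte; the helper name chrWrapped() is made up for illustration.

        ```php
        <?php
        // Hypothetical helper: makes the byte wrap-around of chr() explicit.
        function chrWrapped(int $codePoint): string
        {
            // chr() returns exactly one byte, so only the value modulo 256 survives.
            return chr($codePoint % 256);
        }

        var_dump(chrWrapped(123) === chr(123)); // same byte: "{"
        var_dump(chrWrapped(337) === chr(337)); // both give chr(81), i.e. "Q"
        var_dump(337 % 256);                    // int(81)
        ```

        So the entity number 337 never reaches the output as ő; the result is
        the letter "Q".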


        Maybe iconv functions would help you
        http://www.php.net/iconv

        • Daniel Tryba

          #5
          Re: Character conversion

          Rudolf Horbas <rhorbas@gmx.net> wrote:
          >> <form method="POST" action="#" accept-charset="iso-8859-1">
          >
          > Thanks Pedro -- this fixed my problem of the data getting in.

          In what character set are the pages being served? If the encoding sent by the
          server is UTF-8, a client's response should be UTF-8 too.

          > ... which of course sums up in <?=chr(337)?>
          >
          > <?=chr(123)?> == &#123;
          >
          > but
          >
          > <?=chr(337)?> != &#337;


          <q>
          Description
          string chr ( int ascii)

          Returns a one-character string containing the character specified by
          ascii.
          </q>

          ő is the Unicode character at code point 337. The first 256 characters
          of Unicode are identical to iso-8859-1 (which includes ASCII), so only
          values up to 255 fit into the single byte that chr() returns.

          --

          Daniel Tryba
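
          To illustrate the difference between a byte and a code point, here is
          a sketch that decodes numeric entities into UTF-8 with
          html_entity_decode() instead of chr(). The function name
          entitiesToUtf8() is hypothetical, and the sketch assumes the export
          target can handle UTF-8.

          ```php
          <?php
          // Hypothetical helper: decode &#NNN; style entities into UTF-8 text.
          function entitiesToUtf8(string $text): string
          {
              // html_entity_decode() understands numeric entities above 255,
              // which a single byte from chr() cannot represent.
              return html_entity_decode($text, ENT_QUOTES | ENT_HTML401, 'UTF-8');
          }

          $decoded = entitiesToUtf8('&#337;');
          echo bin2hex($decoded), "\n"; // c591, the two-byte UTF-8 form of U+0151
          ```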

          • Rudolf Horbas

            #6
            Re: Character conversion

            Pedro Graca wrote:
            > chr(x) is the same as chr( x % 256 )
            >
            > so chr(123) is chr(123) :)
            >
            > but chr(337) is chr(81) :(

            That was the dumb mistake I was afraid I was making :-)

            I get it: chr() does not serve my purpose here ...

            > Maybe iconv functions would help you
            > http://www.php.net/iconv

            No, that's _way_ too much hassle (to install on our server).

            I only get a couple of Hungarians who sign up for a congress, and only a
            few of them bring in these special chars (ő, Ő, ű,
            Ű) from the ISO 8859-2 charset:
            (http://en.wikipedia.org/wiki/Hungari...Writing_system)

            I'm doing a dumb str_replace() on them to õ, Õ, û, Û, which are very
            similar (as mentioned in the Wikipedia article).

            Thanks Pedro (and Daniel) -- another lesson learned.

            Rudi
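
            The replacement described above could look roughly like this. It is
            only a sketch: it assumes the stored data contains the decimal
            entities &#337;, &#336;, &#369; and &#368;, and that the CSV is
            written as ISO-8859-1. The function name approximateHungarian() is
            made up.

            ```php
            <?php
            // Hypothetical helper: swap the four Hungarian entities for their
            // ISO-8859-1 look-alikes (õ, Õ, û, Û), given here as raw bytes.
            function approximateHungarian(string $text): string
            {
                $map = [
                    '&#337;' => "\xF5", // ő -> õ
                    '&#336;' => "\xD5", // Ő -> Õ
                    '&#369;' => "\xFB", // ű -> û
                    '&#368;' => "\xDB", // Ű -> Û
                ];
                return str_replace(array_keys($map), array_values($map), $text);
            }

            echo bin2hex(approximateHungarian('Erd&#337;s')), "\n"; // 457264f573
            ```

            The same byte values stand for ő, Ő, ű and Ű in ISO-8859-2, which is
            why the substitutes look so similar.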

            • Daniel Tryba

              #7
              Re: Character conversion

              Rudolf Horbas <rhorbas@gmx.net> wrote:
              >> Maybe iconv functions would help you
              >> http://www.php.net/iconv
              >
              > No, that's _way_ too much hassle (to install on our server).

              Maybe you should take a look at PHP's multibyte string support,


              After a colleague found this beauty, we decided to use it to switch the
              whole website to UTF-8. This solved some headaches for the application
              (yet another webmail thingy) concerning encodings of the content. For
              example, you can "mix" multiple character sets together, so we can now
              show some nice Korean spam mail in the correct font and at the same time
              display an ad for spam filters using the euro symbol :)

              --

              Daniel Tryba
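
              As a sketch of how the multibyte functions could help with the CSV
              export discussed earlier in the thread: assuming the site stores
              UTF-8 and the recipient expects ISO-8859-2, mb_convert_encoding()
              can do the conversion. The function name toLatin2() is
              hypothetical, and the mbstring extension has to be enabled.

              ```php
              <?php
              // Hypothetical helper: convert UTF-8 text to ISO-8859-2 for export.
              function toLatin2(string $utf8): string
              {
                  return mb_convert_encoding($utf8, 'ISO-8859-2', 'UTF-8');
              }

              // "\xC5\x91" is ő in UTF-8; in ISO-8859-2 it is the single byte 0xF5.
              echo bin2hex(toLatin2("\xC5\x91")), "\n"; // f5
              ```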
