htmlentities does not translate German "umlaute"

  • Robert Zierhofer

    htmlentities does not translate German "umlaute"

    Hi all,

    I currently face a problem with htmlentities and german "umlaute".
    After moving my scripts to a new box (from Linux to FreeBSD) I found
    that htmlentities() is not working anymore.
    The BSD server (FreeBSD 5.1.2) runs PHP 4.3.9 and Apache 2, the same
    versions the Linux server runs (or ran).

    I also tried passing ISO-8859-1 as the third (charset) parameter to
    htmlentities(), but without any effect.

    Any suggestions how to solve this mysterious misery?

    Thx
    Rob
  • Michael Fesser

    #2
    Re: htmlentities does not translate German "umlaute"

    .oO(Robert Zierhofer)

    >I currently face a problem with htmlentities and german "umlaute".

    Another question: Why do you want to translate them?

    Micha


    • Robert Zierhofer

      #3
      Re: htmlentities does not translate German "umlaute"

      Michael Fesser wrote:
      > .oO(Robert Zierhofer)
      >
      >>I currently face a problem with htmlentities and german "umlaute".
      >
      > Another question: Why do you want to translate them?
      >
      > Micha
      Well, I would say that the HTML equivalents are a bit more reliable in
      terms of browser display than ä, ö and ü's.
      Wouldn't you agree :)


      • Michael Fesser

        #4
        Re: htmlentities does not translate German "umlaute"

        .oO(Robert Zierhofer)

        >Michael Fesser wrote:
        >
        >> Another question: Why do you want to translate them?
        >
        >Well, I would say that the HTML equivalents are a bit more reliable in
        >terms of browser display than ä, ö and ü's.
        >Wouldn't you agree :)

        No. I deliver my documents as ISO-8859-1 (Latin-1), which contains
        umlauts and other "special" chars. All browsers I have available on my
        machines are able to handle that. And if you deliver your documents as
        UTF-8 you don't really have to care anymore.

        I don't think using entities is really necessary, except for <, >,
        " and & sometimes. That's why I've never used htmlentities(),
        but htmlspecialchars().

        Micha
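
For readers following along, here is a minimal sketch of the difference Micha describes, assuming Latin-1 input. The sample string is invented for illustration, and the umlaut is written as a byte escape so the result does not depend on how the script file itself is saved.

```php
<?php
// "Krämer & <Frau>" as ISO-8859-1 bytes; 0xE4 is the Latin-1 byte for "ä".
$latin1 = "Kr\xE4mer & <Frau>";

// htmlspecialchars() only escapes the markup-significant characters.
$special = htmlspecialchars($latin1, ENT_NOQUOTES, 'ISO-8859-1');

// htmlentities() also replaces every character that has a named entity.
$entities = htmlentities($latin1, ENT_NOQUOTES, 'ISO-8859-1');

// The umlaut byte survives htmlspecialchars() untouched.
var_dump($special === "Kr\xE4mer &amp; &lt;Frau&gt;");

echo $entities, "\n"; // Kr&auml;mer &amp; &lt;Frau&gt;
```

Either output is valid in a page served as ISO-8859-1; the entity form simply does not depend on the declared encoding.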


        • Daniel Tryba

          #5
          Re: htmlentities does not translate German "umlaute"

          Robert Zierhofer <rob@starbugg.de> wrote:
          > Well, I would say that the HTML equivalents are a bit more reliable in
          > terms of browser display than ä, ö and ü's.
          > Wouldn't you agree :)

          No, the document encoding of HTML is Unicode. The iso-8859-1 characters
          are part of that character set; to be more precise, the first 256
          characters of Unicode are identical to iso-8859-1 (us-ascii plus the
          Latin-1 additions).


          • Robert Zierhofer

            #6
            Re: htmlentities does not translate German "umlaute"

            Michael Fesser wrote:
            > .oO(Robert Zierhofer)
            >
            >>Michael Fesser wrote:
            >>
            >>>Another question: Why do you want to translate them?
            >>
            >>Well, I would say that the HTML equivalents are a bit more reliable in
            >>terms of browser display than ä, ö and ü's.
            >>Wouldn't you agree :)
            >
            > No. I deliver my documents as ISO-8859-1 (Latin-1), which contains
            > umlauts and other "special" chars. All browsers I have available on my
            > machines are able to handle that. And if you deliver your documents as
            > UTF-8 you don't really have to care anymore.
            >
            > I don't think using entities is really necessary, except for <, >,
            > " and & sometimes. That's why I've never used htmlentities(),
            > but htmlspecialchars().
            >
            > Micha
            Micha,

            I also deliver my documents in ISO-8859-1. Do you use Windows?
            As I do not and on all of my Browsers umlauts are not properly displayed.

            Greetings
            Rob


            • Michael Fesser

              #7
              Re: htmlentities does not translate German "umlaute"

              .oO(Robert Zierhofer)

              >I also deliver my documents in ISO-8859-1. Do you use Windows?

              Yep, Win2k most of the time, but I'm also using Linux from time to time.

              >As I do not and on all of my Browsers umlauts are not properly displayed.

              OK, what browsers on what OS? Does it happen in general or only on
              particular websites? Can you give an example URL, which uses no entities
              and does not look correctly on your system?

              Micha


              • Robert Zierhofer

                #8
                Re: htmlentities does not translate German "umlaute"

                Michael Fesser wrote:
                > .oO(Robert Zierhofer)
                >
                >>I also deliver my documents in ISO-8859-1. Do you use Windows?
                >
                > Yep, Win2k most of the time, but I'm also using Linux from time to time.
                >
                >>As I do not and on all of my Browsers umlauts are not properly displayed.
                >
                > OK, what browsers on what OS? Does it happen in general or only on
                > particular websites? Can you give an example URL, which uses no entities
                > and does not look correctly on your system?
                >
                > Micha
                Ok,
                OS -> Mac OS X
                Browsers -> Safari, Firefox, Mozilla
                Yep, it happens in general... I think, at least :)
                No, let me correct myself: it does not happen in general.
                But I cannot name the exceptions. And as you know from your maths
                class, one exception is enough to prove that a theory is wrong.
                My site, the one with the htmlentities problem, is not reachable yet
                without editing your hosts file.
                But if you want to do so, use

                213.203.227.121 kingstoncorner.de

                In my browsers the example phrase looks like this:

                Hallo & <Frau> & KrŠmer, hŠtten Sie šffentliches GetŸmmel vermeiden kšnnen?

                This is what I used:

                $str = "Hallo & <Frau> & Krämer, hätten Sie öffentliches Getümmel
                vermeiden können?";
                $encoded = htmlspecialchars($str, ENT_NOQUOTES, "iso-8859-1");
                print $encoded;

                Regs
                Rob
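
One possible reading of the garbled sample above, offered as a guess and not confirmed anywhere in the thread: the letters Š, š and Ÿ are what the bytes 0x8A, 0x9A and 0x9F look like when read as Windows-1252 (which browsers commonly substitute for "iso-8859-1"), and those same bytes are ä, ö and ü in the classic Mac OS Roman encoding. The sketch below hand-maps just those three bytes, so the two lookup tables are illustrative fragments and not complete encodings. It assumes the script file itself is saved as UTF-8.

```php
<?php
// Hypothetical reconstruction: the same three bytes under two encodings.
$asMacRoman    = ["\x8A" => 'ä', "\x9A" => 'ö', "\x9F" => 'ü'];
$asWindows1252 = ["\x8A" => 'Š', "\x9A" => 'š', "\x9F" => 'Ÿ'];

// Part of the example phrase, with the umlauts stored as Mac Roman bytes.
$bytes = "Kr\x8Amer, \x9Affentliches Get\x9Fmmel";

echo strtr($bytes, $asMacRoman), "\n";    // Krämer, öffentliches Getümmel
echo strtr($bytes, $asWindows1252), "\n"; // KrŠmer, šffentliches GetŸmmel

// Read as ISO-8859-1, these bytes are control codes without named entities,
// so htmlentities() leaves the string unchanged.
var_dump(htmlentities($bytes, ENT_NOQUOTES, 'ISO-8859-1') === $bytes);
```

If that guess is right, the functions behave the same on both servers and the difference lies in how the script file was saved or transferred; checking the file with a hex dump would settle it.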


                • Pedro Graca

                  #9
                  Re: htmlentities does not translate German "umlaute"

                  Robert Zierhofer wrote:
                  > This is what I used:
                  >
                  > $str = "Hallo & <Frau> & Krämer, hätten Sie öffentliches Getümmel
                  > vermeiden können?";
                  > $encoded = htmlspecialchars($str, ENT_NOQUOTES, "iso-8859-1");
                  > print $encoded;

                  Running php 4.3.9 on a Debian GNU/Linux system.

                  php$ cat umlaut.php
                  <?php
                  $str = "Hallo & <Frau> & Krämer, hätten Sie öffentliches Getümmel"
                  . " vermeiden können?";

                  echo '1: ', $str, "\n\n";
                  echo '2: ', htmlspecialchars($str, ENT_NOQUOTES, "iso-8859-1"), "\n\n";
                  echo '3: ', htmlentities($str), "\n\n";
                  ?>


                  php$ php umlaut.php
                  1: Hallo & <Frau> & Krämer, hätten Sie öffentliches Getümmel vermeiden
                  können?

                  2: Hallo &amp; &lt;Frau&gt; &amp; Krämer, hätten Sie öffentliches
                  Getümmel vermeiden können?

                  3: Hallo &amp; &lt;Frau&gt; &amp; Kr&auml;mer, h&auml;tten Sie
                  &ouml;ffentliches Get&uuml;mmel vermeiden k&ouml;nnen?


                  --
                  Mail to my "From:" address is readable by all at http://www.dodgeit.com/
                  == ** ## !! ------------------------------------------------ !! ## ** ==
                  TEXT-ONLY mail to the whole "Reply-To:" address ("My Name" <my@address>)
                  may bypass my spam filter. If it does, I may reply from another address!


                  • Robert Zierhofer

                    #10
                    Re: htmlentities does not translate German "umlaute"

                    Pedro Graca wrote:
                    > Robert Zierhofer wrote:
                    >
                    >>This is what I used:
                    >>
                    >> $str = "Hallo & <Frau> & Krämer, hätten Sie öffentliches Getümmel
                    >>vermeiden können?";
                    >> $encoded = htmlspecialchars($str, ENT_NOQUOTES, "iso-8859-1");
                    >> print $encoded;
                    >
                    > Running php 4.3.9 on a Debian GNU/Linux system.
                    >
                    > php$ cat umlaut.php
                    > <?php
                    > $str = "Hallo & <Frau> & Krämer, hätten Sie öffentliches Getümmel"
                    > . " vermeiden können?";
                    >
                    > echo '1: ', $str, "\n\n";
                    > echo '2: ', htmlspecialchars($str, ENT_NOQUOTES, "iso-8859-1"), "\n\n";
                    > echo '3: ', htmlentities($str), "\n\n";
                    > ?>
                    >
                    > php$ php umlaut.php
                    > 1: Hallo & <Frau> & Krämer, hätten Sie öffentliches Getümmel vermeiden
                    > können?
                    >
                    > 2: Hallo &amp; &lt;Frau&gt; &amp; Krämer, hätten Sie öffentliches
                    > Getümmel vermeiden können?
                    >
                    > 3: Hallo &amp; &lt;Frau&gt; &amp; Kr&auml;mer, h&auml;tten Sie
                    > &ouml;ffentliches Get&uuml;mmel vermeiden k&ouml;nnen?
                    >
                    Hi Pedro,

                    so it really looks like a FreeBSD issue :(
                    Because this is exactly how htmlspecialchars() and
                    htmlentities() behaved on my old Linux box.

                    Thx for trying though.
                    Do you have any idea what could be the bug in that specific case?
                    Regs
                    Rob


                    • Pedro Graca

                      #11
                      Re: htmlentities does not translate German "umlaute"

                      Robert Zierhofer wrote:
                      > Do you have any idea what could be the bug in that specific case?

                      No ... I didn't check php bugs database :)

                      Try this


                      <?php
                      $tab = get_html_translation_table(HTML_ENTITIES);
                      $tab['¨'] = '&uml;';
                      $tab['Ä'] = '&Auml;';
                      $tab['Ë'] = '&Euml;';
                      $tab['Ï'] = '&Iuml;';
                      $tab['Ö'] = '&Ouml;';
                      $tab['Ü'] = '&Uuml;';
                      $tab['ä'] = '&auml;';
                      $tab['ë'] = '&euml;';
                      $tab['ï'] = '&iuml;';
                      $tab['ö'] = '&ouml;';
                      $tab['ü'] = '&uuml;';
                      $tab['ÿ'] = '&yuml;';

                      $str = "Hallo & <Frau> & Krämer, hätten Sie öffentliches Getümmel"
                      . " vermeiden können?";
                      echo strtr($str, $tab), "\n";
                      ?>
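
A variant of the workaround above, as a sketch only. It assumes a PHP version in which get_html_translation_table() accepts an encoding as its third argument, which, as far as I know, the PHP 4.3.9 discussed in this thread did not. Asking for the ISO-8859-1 table explicitly and writing the test bytes as escapes removes any dependence on how the script file is encoded; the sample string is invented for illustration.

```php
<?php
// Request the Latin-1 table explicitly; its keys are then single bytes.
$tab = get_html_translation_table(HTML_ENTITIES, ENT_NOQUOTES, 'ISO-8859-1');

// Latin-1 bytes: 0xE4 = ä, 0xF6 = ö, 0xFC = ü.
$str = "Kr\xE4mer, k\xF6nnen, Get\xFCmmel & <Frau>";

// Inspect one table entry by its byte value.
echo bin2hex("\xE4"), ' => ', $tab["\xE4"], "\n"; // e4 => &auml;

// Apply the whole table in one pass, as in the workaround above.
$out = strtr($str, $tab);
echo $out, "\n";
```

If the table already contains the umlaut entries, the manual additions are redundant, which in turn suggests the input bytes are not the ones the table expects.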


                      • Andy Hassall

                        #12
                        Re: htmlentities does not translate German "umlaute"

                        On 08 Dec 2004 17:40:23 GMT, Daniel Tryba <spam@tryba.invalid> wrote:

                        >Robert Zierhofer <rob@starbugg.de> wrote:
                        >> Well, I would say that the HTML equivalents are a bit more reliable in
                        >> terms of browser display than ä, ö and ü's.
                        >> Wouldn't you agree :)
                        >
                        >No, the document encoding of HTML is Unicode.

                        Don't you mean the document _character set_ of HTML is Unicode (or even more
                        precisely ISO10646)? The character encoding can then be any encoding that
                        represents a subset of Unicode.

                        Actually according to HTML 4.0.1 sec. 5.2.2 it seems the encoding doesn't even
                        have to be a subset:

                        "Note. If, for a specific application, it becomes necessary to refer to
                        characters outside [ISO10646], characters should be assigned to a private zone
                        to avoid conflicts with present or future versions of the standard. This is
                        highly discouraged, however, for reasons of portability."

                        --
                        Andy Hassall / <andy@andyh.co.uk> / <http://www.andyh.co.uk>
                        <http://www.andyhsoftware.co.uk/space> Space: disk usage analysis tool


                        • Robert Zierhofer

                          #13
                          Re: htmlentities does not translate German "umlaute"

                          Pedro Graca wrote:
                          > Robert Zierhofer wrote:
                          >
                          >>Do you have any idea what could be the bug in that specific case?
                          >
                          > No ... I didn't check php bugs database :)
                          >
                          > Try this
                          >
                          > <?php
                          > $tab = get_html_translation_table(HTML_ENTITIES);
                          > $tab['¨'] = '&uml;';
                          > $tab['Ä'] = '&Auml;';
                          > $tab['Ë'] = '&Euml;';
                          > $tab['Ï'] = '&Iuml;';
                          > $tab['Ö'] = '&Ouml;';
                          > $tab['Ü'] = '&Uuml;';
                          > $tab['ä'] = '&auml;';
                          > $tab['ë'] = '&euml;';
                          > $tab['ï'] = '&iuml;';
                          > $tab['ö'] = '&ouml;';
                          > $tab['ü'] = '&uuml;';
                          > $tab['ÿ'] = '&yuml;';
                          >
                          > $str = "Hallo & <Frau> & Krämer, hätten Sie öffentliches Getümmel"
                          > . " vermeiden können?";
                          > echo strtr($str, $tab), "\n";
                          > ?>
                          >
                          Hi Pedro,
                          nice workaround!
                          It works fine, though I will continue to look for a general solution
                          to the problem.
                          Thank you very much for your help.
                          Regs
                          Rob


                          • Daniel Tryba

                            #14
                            Re: htmlentities does not translate German "umlaute"

                            Andy Hassall <andy@andyh.co.uk> wrote:
                            >>No, the document encoding of HTML is Unicode.
                            >
                            > Don't you mean the document _character set_ of HTML is Unicode (or even more
                            > precisely ISO10646)?

                            It's the same to me.

                            > The character encoding can then be any encoding that represents a
                            > subset of Unicode.

                            Character encoding, called charset in the HTTP/1.1 RFC
                            <q src='http://www.ietf.org/rfc/rfc2616.txt'>
                            Note: This use of the term "character set" is more commonly
                            referred to as a "character encoding." However, since HTTP and
                            MIME share the same registry, it is important that the
                            terminology also be shared.
                            </q>

                            is at the root of all the confusion over the terms document encoding
                            and character encoding.

                            HTML uses Unicode internally; the documents get transferred, e.g. by HTTP,
                            in a specific encoding, mostly for efficiency (why send multiple bytes
                            per character when 1 byte suffices if you only use a specific subset
                            like iso-8859-1?). And to make things even worse, two HTTP gateways
                            may choose to re-encode the byte stream (e.g. to make it 7-bit clean).
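
To make the distinction concrete, a small sketch (not from the thread): the entity &auml; names one character of the document character set, while the number of bytes on the wire depends on the encoding chosen for transfer. The variable names are illustrative.

```php
<?php
// One character reference, decoded into two different byte encodings.
$latin1 = html_entity_decode('&auml;', ENT_NOQUOTES, 'ISO-8859-1');
$utf8   = html_entity_decode('&auml;', ENT_NOQUOTES, 'UTF-8');

// Same character (U+00E4), different byte sequences.
echo bin2hex($latin1), ' length ', strlen($latin1), "\n"; // e4 length 1
echo bin2hex($utf8), ' length ', strlen($utf8), "\n";     // c3a4 length 2
```

The single byte 0xE4 equals the Unicode code point of the character, which is the sense in which ISO-8859-1 coincides with the first 256 characters of Unicode.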

                            > Actually according to HTML 4.0.1 sec. 5.2.2 it seems the encoding doesn't even
                            > have to be a subset:
                            >
                            > "Note. If, for a specific application, it becomes necessary to refer to
                            > characters outside [ISO10646], characters should be assigned to a private zone
                            > to avoid conflicts with present or future versions of the standard. This is
                            > highly discouraged, however, for reasons of portability."

                            So if you have a character that isn't in Unicode, you'll have to add it
                            to Unicode (the user-defined private zones) to make it work, so it's
                            Unicode again :)
