character sets

  • WindAndWaves

    character sets

    Hi Folk


    Here I am writing my first PHP / MySQL site, almost ready, and now this... character sets...

    The encoding that I use on my webpage is:

    <META HTTP-EQUIV="content-type" CONTENT="text/html; charset=UTF-8">

    When people enter new data I use

    $newvalue = htmlentities($_POST["newvalue"], ENT_QUOTES);

    I then SQL this into my table and next I display the value

    e.g. <DIV CLASS="content" >'.$newvalue. '</DIV>

    All of this works fine, BUT funny characters that may have been entered through the form (e.g. Word-style quotation marks,
    e-accent-grave, etc.) are taking on a whole new life. I put in an e with an accent and it changed into a Chinese character.

    I tried to run

    $link = mysql_connect($host, $username, $password);
    $charset = mysql_character_set_name($link);
    printf ("character set is %s\n", $charset);

    but that only gave me an error.

    I searched on google, but many of the notes are in other languages.... ;-)

    Does anyone have any hints in English?

    TIA

    - Nicolaas







  • NC

    #2
    Re: character sets

    WindAndWaves wrote:
    >
    > The encoding that I use on my webpage is:
    > <META HTTP-EQUIV="content-type" CONTENT="text/html; charset=UTF-8">
    >
    > When people enter new data I use
    > $newvalue = htmlentities($_POST["newvalue"], ENT_QUOTES);
    >
    > I then SQL this into my table and next I display the value
    > e.g. <DIV CLASS="content" >'.$newvalue. '</DIV>
    >
    > All of this works fine, BUT, funny characters that may have been
    > entered through the form (e.g. Word-Style quotation marks,
    > e-accent-grave, etc..) are taking on a whole new life.
    > I put in an e with an accent and it changed into a chinese
    > character.

    You have two options to fix this:

    1. Convert your strings from UTF-8 into, say, ISO-8859-1,
    before storing them in the database:

    $string = iconv('UTF-8', 'ISO-8859-1', $string);

    You will need your PHP installation to be compiled
    with iconv extension to do that.

    2. Set your MySQL server's character set to UTF-8.

    First, check if you currently have UTF-8 support.
    Run this query:

    SHOW VARIABLES;

    find the `character_sets` variable in the output and
    verify that `utf8` is listed among the character sets
    currently supported. If there's no support for UTF-8,
    install or configure it (see MySQL documentation for
    details).

    If and when you have UTF-8 support, you can set
    UTF-8 as the default character set for your database:

    ALTER DATABASE db_name
    DEFAULT CHARACTER SET utf8;

    Alternatively, you can change character set setting
    on a per-connection basis by sending this query:

    SET NAMES 'utf8';

    first thing after establishing a connection to the
    database.

    Cheers,
    NC


    • Andy Hassall

      #3
      Re: character sets

      On 1 Feb 2005 13:40:47 -0800, "NC" <nc@iname.com> wrote:

      >You have two options to fix this:
      >
      >1. Convert your strings from UTF-8 into, say, ISO-8859-1,
      >before storing them in the database:
      >
      >$string = iconv('UTF-8', 'ISO-8859-1', $string);

      That's a lossy conversion though, so be careful.
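
To see what "lossy" means in practice, here is a minimal sketch of a UTF-8 to ISO-8859-1 round trip. It deliberately swaps in mb_convert_encoding() for iconv(), because iconv()'s handling of unconvertible characters differs between platforms. It assumes the mbstring extension is loaded; the function name latin1_round_trip is made up for the illustration.

```php
<?php
// Sketch: which strings survive UTF-8 -> ISO-8859-1 -> UTF-8?
// Assumes the mbstring extension is available.

// Hypothetical helper: convert to ISO-8859-1 and straight back again.
function latin1_round_trip(string $utf8): string
{
    $latin1 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');
    return mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');
}

$samples = [
    "caf\xC3\xA9",                     // "cafe" with e-acute: exists in ISO-8859-1
    "\xE2\x80\x9Cquoted\xE2\x80\x9D",  // Word-style curly quotes: not in ISO-8859-1
    "\xE6\x97\xA5\xE6\x9C\xAC",        // two Japanese characters: not in ISO-8859-1
];

foreach ($samples as $sample) {
    $back = latin1_round_trip($sample);
    // Anything ISO-8859-1 cannot represent is replaced and cannot be recovered.
    printf("%s -> %s (%s)\n", $sample, $back, $back === $sample ? 'intact' : 'changed');
}
```

Accented Western European letters come back intact; curly quotes and Japanese text do not, which matters if the site is meant to carry Japanese pages.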

      >You will need your PHP installation to be compiled
      >with iconv extension to do that.
      >
      >2. Set your MySQL server's character set to UTF-8.
      >
      >First, check if you currently have UTF-8 support.
      >Run this query:
      >
      >SHOW VARIABLES;
      >
      >find the `character_sets` variable in the output and
      >verify that `utf8` is listed among the character sets
      >currently supported. If there's no support for UTF-8,
      >install or configure it (see MySQL documentation for
      >details).
      >
      >If and when you have UTF-8 support, you can set
      >UTF-8 as the default character set for your database:
      >
      >ALTER DATABASE db_name
      >DEFAULT CHARACTER SET utf8;
      >
      >Alternatively, you can change character set setting
      >on a per-connection basis by sending this query:
      >
      >SET NAMES 'utf8';
      >
      >first thing after establishing a connection to the
      >database.

      If you don't mind any length functions returning the wrong values (i.e.
      returning byte length not character length), you could probably even get away
      with storing UTF-8 in MySQL without setting anything - provided it doesn't
      attempt to do any character set conversions, just stores strings as-is.

      But basically you have to be very careful when working with character set
      encodings, since you've got to know what you're dealing with at each step, and
      whether any function's going to try and interpret the encoded bytes into a
      character, or just pass it on.
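
The byte length versus character length point can be shown with a short sketch. It assumes the mbstring extension is loaded; byte_and_char_length is a name invented for the example.

```php
<?php
// Sketch: a byte-oriented function and a character-aware one disagree
// as soon as a UTF-8 string contains a multi-byte character.

// Hypothetical helper returning both measurements for a UTF-8 string.
function byte_and_char_length(string $utf8): array
{
    return [
        'bytes' => strlen($utf8),              // counts bytes
        'chars' => mb_strlen($utf8, 'UTF-8'),  // counts characters
    ];
}

// The French word for "summer": e-acute, t, e-acute.
// Each e-acute takes two bytes in UTF-8.
$lengths = byte_and_char_length("\xC3\xA9t\xC3\xA9");
printf("bytes: %d, characters: %d\n", $lengths['bytes'], $lengths['chars']);
```

The same mismatch applies to anything that counts or cuts by bytes, such as a column width or a substring operation that is unaware of the encoding.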

      --
      Andy Hassall / <andy@andyh.co.uk> / <http://www.andyh.co.uk>
      <http://www.andyhsoftware.co.uk/space> Space: disk usage analysis tool


      • Andy Hassall

        #4
        Re: character sets

        On Tue, 1 Feb 2005 21:02:15 +1300, "WindAndWaves" <access@ngaru.com> wrote:

        >Here I am writing my first php / mysql site, almost ready, and now this... charactersets...
        >
        >The encoding that I use on my webpage is:
        >
        ><META HTTP-EQUIV="content-type" CONTENT="text/html; charset=UTF-8">

        Send a proper character set header; using <meta> for content type and
        encodings is generally for situations where HTTP headers don't exist, e.g.
        reading off a filesystem.

        header("Content-type: text/html; charset=utf-8");

        http://uk2.php.net/header

        >When people enter new data I use
        >
        >$newvalue = htmlentities($_POST["newvalue"], ENT_QUOTES);

        That defaults to ISO-8859-1; if you pass it UTF-8 without setting the third
        parameter, you'll corrupt your data.

        http://uk2.php.net/htmlentities
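
A minimal sketch of the difference the third parameter makes, with the charset passed explicitly in both calls so the result does not depend on the PHP version (from PHP 5.4 on the default became UTF-8). The helper name escape_for_html is invented for the illustration.

```php
<?php
// Sketch: the same UTF-8 input escaped with the wrong and the right charset.

// Hypothetical wrapper that always names the charset explicitly.
function escape_for_html(string $text, string $charset = 'UTF-8'): string
{
    return htmlentities($text, ENT_QUOTES, $charset);
}

$input = "\xC3\xA9"; // e-acute, encoded as two bytes in UTF-8

// Treated as ISO-8859-1, each byte becomes its own entity: data corrupted.
echo escape_for_html($input, 'ISO-8859-1'), "\n"; // &Atilde;&copy;

// Treated as UTF-8, the two bytes are recognised as one character.
echo escape_for_html($input, 'UTF-8'), "\n";      // &eacute;
```

If the page is served as UTF-8 anyway, htmlspecialchars() with the same charset argument is an alternative that escapes only the markup characters and leaves accented letters as they are.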


        >I then SQL this into my table and next I display the value
        >
        >e.g. <DIV CLASS="content" >'.$newvalue. '</DIV>
        >
        >All of this works fine, BUT, funny characters that may have been entered through the form (e.g. Word-Style quotation marks,
        >e-accent-grave, etc..)

        These characters all exist in UTF-8 - as does almost every character.

        >are taking on a whole new life. I put in an e with an accent and it changed into a chinese character.

        Can you give a short self-contained example demonstrating it?

        >I tried to run
        >
        >$link = mysql_connect($host, $username, $password);
        >$charset = mysql_character_set_name($link);
        >printf ("character set is %s\n", $charset);
        >
        >but that only gave me an error.

        According to the manual there's no such function. There's
        mysqli_character_set_name, from the new PHP5 mysqli extension - but not in the
        old mysql extension.
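
For anyone who does have the mysqli extension, a rough sketch of the same check might look like the following. The function name connection_charset_report and the connection details in the usage comment are placeholders, not anything from the original posts, and the usage lines are commented out because they need a reachable MySQL server.

```php
<?php
// Sketch only: set the connection character set through mysqli and
// report what the connection is actually using afterwards.

// Hypothetical helper; $charset defaults to the 'utf8' name used in this thread.
// Servers from MySQL 5.5.3 on also offer 'utf8mb4'.
function connection_charset_report(mysqli $link, string $charset = 'utf8'): string
{
    // set_charset() is the API counterpart of sending SET NAMES yourself.
    if (!$link->set_charset($charset)) {
        throw new RuntimeException('Could not switch character set: ' . $link->error);
    }

    // character_set_name() is the mysqli method behind mysqli_character_set_name().
    return sprintf('character set is %s', $link->character_set_name());
}

// Usage (placeholder credentials):
// $link = new mysqli('localhost', 'username', 'password', 'db_name');
// echo connection_charset_report($link), "\n";
```

Doing this once, right after connecting, keeps every later query on the same character set.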

        --
        Andy Hassall / <andy@andyh.co.uk> / <http://www.andyh.co.uk>
        <http://www.andyhsoftware.co.uk/space> Space: disk usage analysis tool


        • WindAndWaves

          #5
          Re: character sets


          Hi Andy and NC

          I have since discovered that a lot of functions are not supported by my provider. Namely, UTF-8 is not supported in MySQL, and PHP does
          not support, for example, the conversion functions that you mention above.

          I think I will have to stick with a pretty plain type of character set and make the Japanese pages by hand.

          Thank you for your helpful answers.

          - Nicolaas



          • Tony Marston

            #6
            Re: character sets


            "WindAndWaves" <access@ngaru.com> wrote in message
            news:OCjMd.14506$mo2.1127368@news.xtra.co.nz...
            >
            > Hi Andy and NC
            >
            > I since discovered that a lot of functions are not supported by my
            > provider. Namely, UTF-8 is not supported in MySQL

            Wrong. MySQL 4.1 supports various character sets. Take a look at
            http://dev.mysql.com/doc/mysql/en/charset.html

            > and PHP does
            > not support, for example, the conversion functions that you mention above.

            Wrong again. Take a look at the multi-byte string conversion functions at
            http://www.php.net/manual/en/ref.mbstring.php


            --
            Tony Marston

            This is Tony Marston's web site, containing personal information plus pages devoted to the Uniface 4GL development language, XML and XSL, PHP and MySQL, and a bit of COBOL






            • WindAndWaves

              #7
              Re: character sets


              "Tony Marston" <tony@NOSPAM.demon.co.uk> wrote in message
              [...........]
              > Wrong. MySQL 4.1 supports various character sets. Take a look at
              > http://dev.mysql.com/doc/mysql/en/charset.html

              This is what my ISP dude said:

              "My understanding is unicode support is only in version 4.1 and above, and we have no servers
              running 4.1 yet. This may change in the future. All our newer servers are running 4.0x."



              > > and PHP does
              > > not support, for example, the conversion functions that you mention above.
              >
              > Wrong again. Take a look at the multi-byte string conversion functions at
              > http://www.php.net/manual/en/ref.mbstring.php
              >

              And the same sort of thing seems to apply to PHP, although I am running
              PHP Version 4.3.4. PHP simply does not recognise the functions.

              It seems like I need to have my own dedicated server, and this will cost...

              Thanks for your answer.

              - Nicolaas



              • Tony Marston

                #8
                Re: character sets


                "WindAndWaves" <access@ngaru.com> wrote in message
                news:RIzMd.14716$mo2.1149620@news.xtra.co.nz...
                >
                >> > and PHP does
                >> > not support, for example, the conversion functions that you mention
                >> > above.
                >>
                >> Wrong again. Take a look at the multi-byte string conversion functions at
                >> http://www.php.net/manual/en/ref.mbstring.php
                >>
                >
                > and the same sort of thing seems to apply to PHP, although I am running
                > PHP Version 4.3.4. PHP simply does not recognise the functions.

                This is an optional extension, so it must be explicitly enabled when your
                version of PHP is built. A proper ISP would be prepared to configure this
                option for you.
                Also, a proper ISP would be running PHP 4.3.10, not 4.3.4.
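
Before asking the ISP, it may be worth checking which of the conversion extensions the installed PHP build actually has. A small sketch, with conversion_support as an invented helper name:

```php
<?php
// Sketch: report whether the optional conversion extensions are loaded.

// Hypothetical helper; returns extension name => loaded (true/false).
function conversion_support(): array
{
    return [
        'mbstring' => extension_loaded('mbstring'),
        'iconv'    => extension_loaded('iconv'),
    ];
}

foreach (conversion_support() as $name => $loaded) {
    printf("%-8s %s\n", $name, $loaded ? 'available' : 'missing');
}
```

A "missing" line means the functions from that extension will be reported as undefined, which matches the "PHP simply does not recognise the functions" symptom.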

                --
                Tony Marston





