How to upload form data containing special characters correctly?

  • Wim Cossement

    How to upload form data containing special characters correctly?

    Hello,

    I was wondering if there are a few good pages and/or examples on how to
    process form data correctly for putting it in a MySQL DB.

    Since I'm not used to using PHP a lot, I already found out that
    addslashes() can be used to escape some characters, but I'm having some
    more problems with, for instance, ä, å and µ (since the text is scientific).
    Now some people also throw in htmlspecialchars() to convert those to
    HTML entities, but some nest htmlspecialchars() in addslashes() and
    others do the opposite.

    Is there a good and error proof way of ensuring that what one puts in a
    textarea gets stored and can be retrieved safe and sound?

    Thanks in advance,

    Wimmy

    --
    Being owned by someone used to be called slavery.
    Now it's called commitment.
  • Ac1d^

    #2
    Re: How to upload form data containing special characters correctly?

    Try this:

    - $input_string = 'some text with special characters';
    - $input_string = base64_encode($input_string);
    - write $input_string to the database,
    - read it back from the database into $output_string,
    - $output_string = base64_decode($output_string);

    Hope it helps.
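
A minimal sketch of the round trip described in the steps above, with the database step left out. The sample text and the variable `$stored_string` are illustrative assumptions. Base64 sidesteps the character-set question rather than solving it: the stored value is roughly one third larger and can no longer be searched or sorted with SQL.

```php
<?php
// Round-trip sketch: Base64 turns arbitrary bytes into plain ASCII and back.
$input_string = "µ-opioid receptor, Ångström, ä";      // sample text, assumed to be UTF-8

$stored_string = base64_encode($input_string);         // what would be written to the database
$output_string = base64_decode($stored_string, true);  // strict mode: false on invalid input

echo $stored_string, "\n";
echo $output_string === $input_string ? "round trip OK\n" : "round trip FAILED\n";
```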


    • Kimmo Laine

      #3
      Re: How to upload form data containing special characters correctly?

      "Wim Cossement" <wcosseme@nospam.bcol.be> wrote in message
      news:edgrbk$n4s$1@snic.vub.ac.be...
      > Hello,
      >
      > I was wondering if there are a few good pages and/or examples on how to
      > process form data correctly for putting it in a MySQL DB.
      >
      > Since I'm not used to using PHP a lot, I already found out that
      > addslashes() can be used to escape some characters, but I'm having some more
      > problems with, for instance, ä, å and µ (since the text is scientific).
      > Now some people also throw in htmlspecialchars() to convert those to HTML
      > entities, but some nest htmlspecialchars() in addslashes() and others do
      > the opposite.
      >
      > Is there a good and error proof way of ensuring that what one puts in a
      > textarea gets stored and can be retrieved safe and sound?
      >

      Use Unicode for everything. Set your database to UTF-8, save the
      pages as UTF-8, and tell the browsers in every way imaginable that you
      are serving the content as UTF-8. It is not exactly an easy process, but I
      recommend you try it.
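
A small sketch of one piece of the "UTF-8 everywhere" advice above: checking that submitted text really is UTF-8 before storing it. The helper names `is_valid_utf8()` and `read_utf8_field()` and the field name `abstract` are hypothetical. The header and connection-charset lines are shown as comments because they need a web server and an open `mysqli` connection, and `utf8mb4` assumes a reasonably recent MySQL.

```php
<?php
// Returns true when $text is well-formed UTF-8. With the /u modifier PCRE
// validates the subject; preg_match() returns false (not 0) when it is invalid.
function is_valid_utf8(string $text): bool
{
    return preg_match('//u', $text) === 1;
}

// Hypothetical helper: fetch a form field and reject anything that is not UTF-8.
function read_utf8_field(array $post, string $name): ?string
{
    $value = $post[$name] ?? null;
    return (is_string($value) && is_valid_utf8($value)) ? $value : null;
}

// In a real page these would come first ($db is an assumed mysqli object):
//   header('Content-Type: text/html; charset=utf-8');
//   $db->set_charset('utf8mb4');

var_dump(read_utf8_field(['abstract' => "5 µm at 20 Å"], 'abstract')); // the string itself
var_dump(read_utf8_field(['abstract' => "5 \xB5m"], 'abstract'));      // NULL: 0xB5 alone is Latin-1, not UTF-8
```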

      --
      "A programmer is an organism that turns caffeine into code" - lpk
      http://outolempi.net/ahdistus/ - A randomly updated webcomic
      spam@outolempi.net || Gedoon-S @ IRCnet || rot13(xvzzb@bhgbyrzcv.arg)



      • Jerry Stuckle

        #4
        Re: How to upload form data containing special characters correctly?

        Wim Cossement wrote:
        > Hello,
        >
        > I was wondering if there are a few good pages and/or examples on how to
        > process form data correctly for putting it in a MySQL DB.
        >
        > Since I'm not used to using PHP a lot, I already found out that
        > addslashes() can be used to escape some characters, but I'm having some
        > more problems with, for instance, ä, å and µ (since the text is scientific).
        > Now some people also throw in htmlspecialchars() to convert those to
        > HTML entities, but some nest htmlspecialchars() in addslashes() and
        > others do the opposite.
        >
        > Is there a good and error proof way of ensuring that what one puts in a
        > textarea gets stored and can be retrieved safe and sound?
        >
        > Thanks in advance,
        >
        > Wimmy

        You'll need to select the correct character set for MySQL. It might be
        utf-8, as some have suggested, but you might find another charset more
        applicable. See the MySQL docs and the comp.databases.mysql newsgroup for
        more info on MySQL topics.

        Also, rather than use addslashes() you should use
        mysql_real_escape_string() to escape your characters.

        You shouldn't use htmlspecialchars() for storing data into the database;
        that's a display issue, not a storage issue. You should only use it
        when displaying data (if necessary).

        And also ensure you're using the correct character set on your HTML page
        to display the data.
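
To illustrate the storage-versus-display split described above, here is a minimal sketch that keeps the text raw until the moment it is written into HTML. The sample string stands in for a value read back from the database; `ENT_QUOTES` and the explicit `'UTF-8'` argument are choices made for this example.

```php
<?php
// The value exactly as it would come back from the database: raw, unescaped text.
$stored = 'Growth at 5 µm <i>in vitro</i> & "in vivo"';

// Escape only at the moment the text is written into HTML.
// ENT_QUOTES also converts single quotes; the third argument names the charset.
$safe_html = htmlspecialchars($stored, ENT_QUOTES, 'UTF-8');

echo '<p>', $safe_html, "</p>\n";
```

Because the database holds the unescaped text, other consumers (a report tool, an ODBC client) see the original characters rather than HTML entities.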

        --
        ==================
        Remove the "x" from my email address
        Jerry Stuckle
        JDS Computer Training Corp.
        jstucklex@attglobal.net
        ==================


        • Wim Cossement

          #5
          Re: How to upload form data containing special characters correctly?

          Jerry Stuckle wrote:
          > You'll need to select the correct character set for MySQL. It might be
          > utf-8, as some have suggested, but you might find another charset more
          > applicable. See the MySQL doc and comp.databases.mysql newsgroup for
          > more info on mysql topics.

          Well, I've been hearing for a while that UTF-8 is the best for all that
          stuff, so tables and DBs are all in utf8_general_ci (does anyone know
          the difference between that and utf8_bin, and what's utf8_unicode_ci
          doing in that list?)

          > Also, rather than use addslashes() you should use
          > mysql_real_escape_string() to escape your characters.

          Some like the other better, there are still discussions going on... :-)

          > You shouldn't use htmlspecialchars() for storing data into the database;
          > that's a display issue, not a storage issue. You should only use it
          > when displaying data (if necessary).

          The fact is that the data does not really need to be displayed in a
          webpage, this is just for uploading. I'd rather use OpenOffice with
          MyODBC to edit the data when needed and use a report to display it.

          > And also ensure you're using the correct character set on your html page
          > to display the data.

          I guess this is the case.
          The header contains <meta http-equiv="content-type"
          content="application/xhtml+xml; charset=utf-8" />

          Now I'm going to try this and I'll let you know the outcome.

          Thanks a bunch,

          Wimmy


          • Gleep

            #6
            Re: How to upload form data containing special characters correctly?

            On Mon, 04 Sep 2006 11:24:04 +0200, Wim Cossement <wcosseme@nospam.bcol.be> wrote:
            >Hello,
            >
            >I was wondering if there are a few good pages and/or examples on how to
            >process form data correctly for putting it in a MySQL DB.
            >
            >Since I'm not used to using PHP a lot, I already found out that
            >addslashes() can be used to escape some characters, but I'm having some
            >more problems with, for instance, ä, å and µ (since the text is scientific).
            >Now some people also throw in htmlspecialchars() to convert those to
            >HTML entities, but some nest htmlspecialchars() in addslashes() and
            >others do the opposite.
            >
            >Is there a good and error proof way of ensuring that what one puts in a
            >textarea gets stored and can be retrieved safe and sound?
            >
            >Thanks in advance,
            >
            >Wimmy


            I found some user comments in the PHP manual under htmlspecialchars();
            I think these might help.

            Also, if you need to save special characters, I suggest turning off magic quotes at runtime
            with set_magic_quotes_runtime(0); which suppresses the backslashes they normally add.

            After inspecting the non-native encoding problem, I noticed that, for example, if the encoding is
            Cyrillic and I write Latin characters that are not part of the encoding (æ, the ae-ligature,
            for example), the browser will send the named entity instead, &aelig; in this case.
            Therefore, the only way I see to display multilingual text that is encoded with entities is:
            <?php
            echo str_replace('&amp;', '&', htmlspecialchars($txt));
            ?>
            The regex for numeric entities will skip the Latin-1 textual entities.

            A sample function, if anybody wants to turn HTML entities (and special characters) back into
            plain characters (e.g. "&egrave;", "&lt;", etc.):
            function html2specialchars($str){
                // Flip the character => entity table so it maps entity => character.
                $trans_table = array_flip(get_html_translation_table(HTML_ENTITIES));
                return strtr($str, $trans_table);
            }

            Quite often, on HTML pages that are not encoded as UTF-8, when people write in a non-native
            encoding, some browsers (Internet Explorer for sure) will send the characters from the other
            charset as HTML entities, such as &#1073; for the small Russian 'б'.
            htmlspecialchars() will then mangle this entity into &amp;#1073;, since it changes every & to &amp;
            What I usually do is either turn &amp; back into & so the correct characters appear in the
            output, or use a regex to turn the escaped numeric entities back into their original form:
            <?php
            // treat this as pseudo-code, it hasn't been tested...
            $result = preg_replace('/&amp;#(x[a-f0-9]+|[0-9]+);/i', '&#$1;', $source);
            ?>

            Why &#39;? The HTML and XML DTDs proposed &apos; for this.
            See http://www.w3.org/TR/html/dtds.html#...ial_characters
            So better use this:
            $text = htmlspecialchars($text, ENT_QUOTES);
            $text = preg_replace('/&#0*39;/', '&apos;', $text);


            • Jerry Stuckle

              #7
              Re: How to upload form data containing special characters correctly?

              Wim Cossement wrote:
              > Jerry Stuckle wrote:
              >
              >> You'll need to select the correct character set for MySQL. It might
              >> be utf-8, as some have suggested, but you might find another charset
              >> more applicable. See the MySQL doc and comp.databases.mysql newsgroup
              >> for more info on mysql topics.
              >
              > Well, I've been hearing for a while that UTF-8 is the best for all that
              > stuff, so tables and DBs are all in utf8_general_ci (does anyone know
              > the difference between that and utf8_bin, and what's utf8_unicode_ci
              > doing in that list?)

              That's some people's opinion. And remember, they are opinions. Some
              people know what they're talking about, and some don't. Take anything
              you get on the internet (including this) with a grain of salt.

              Personally, I use the character set which matches my data. This may or
              may not be utf-8.

              >> Also, rather than use addslashes() you should use
              >> mysql_real_escape_string() to escape your characters.
              >
              > Some like the other better, there are still discussions going on... :-)

              Not much discussion. addslashes() is a PHP function which escapes
              certain characters. mysql_real_escape_string() is a MySQL function to
              escape the characters necessary to place the data in a MySQL database
              using the current charset.

              mysql_real_escape_string() needs no special processing when reading the
              data out - the data is exactly as it was before mysql_real_escape_string()
              was called. That is not the case for addslashes().

              >> You shouldn't use htmlspecialchars() for storing data into the
              >> database; that's a display issue, not a storage issue. You should
              >> only use it when displaying data (if necessary).
              >
              > The fact is that the data does not really need to be displayed in a
              > webpage, this is just for uploading. I'd rather use OpenOffice with
              > MyODBC to edit the data when needed and use a report to display it.

              That's fine. So don't use htmlspecialchars() at all then.

              >> And also ensure you're using the correct character set on your html
              >> page to display the data.
              >
              > I guess this is the case.
              > The header contains <meta http-equiv="content-type"
              > content="application/xhtml+xml; charset=utf-8" />
              >
              > Now I'm going to try this and I'll let you know the outcome.
              >
              > Thanks a bunch,
              >
              > Wimmy
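
For readers on current PHP: the `mysql_*` functions discussed in this post were removed in PHP 7, so the sketch below uses a different technique from the one in the thread, a `mysqli` prepared statement. It is an assumption-laden sketch rather than a drop-in solution. The function `store_abstract()`, the table `abstracts` and the column `body` are hypothetical names, and `$db` is assumed to be an already open connection with its charset set.

```php
<?php
// Sketch only: the table `abstracts` and the column `body` are hypothetical.
// $db is assumed to be an open mysqli connection, for example:
//   $db = new mysqli('localhost', 'user', 'password', 'mydb');
//   $db->set_charset('utf8mb4');
function store_abstract(mysqli $db, string $text): bool
{
    // The placeholder keeps the data out of the SQL text, so no manual escaping is needed.
    $stmt = $db->prepare('INSERT INTO abstracts (body) VALUES (?)');
    if ($stmt === false) {
        return false; // only reached when mysqli is not configured to throw exceptions
    }
    $stmt->bind_param('s', $text); // 's' = bind as string
    $ok = $stmt->execute();
    $stmt->close();
    return $ok;
}
```

As with `mysql_real_escape_string()`, the text read back later is exactly what was passed in, with no backslashes to strip.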

              --
              ==================
              Remove the "x" from my email address
              Jerry Stuckle
              JDS Computer Training Corp.
              jstucklex@attglobal.net
              ==================


              • Wim Cossement

                #8
                Re: How to upload form data containing special characters correctly?

                Hi again,

                I must say I've tried all the suggested options but I still can't do a
                proper upload.

                There is one textarea where users must put in text about their subject
                (more or less 2 formatted pages in a PDF/DOC document), so most (not to
                say all) of them cut 'n' paste it from Acrobat/Word/OpenOffice into their
                browser.

                Most of them contain double quotes that are not escaped by addslashes() or
                htmlspecialchars(). I've copied a few myself: "bla" "bla" "bla"

                If I add an entry by hand in phpMyAdmin for instance and one field
                contains these characters they are stored and displayed OK.
                When I store the resulting page and look at it in vi those quoted bla's
                are displayed as â~@~\blaâ~@~]

                How do I get rid of those, since Thunderbird wants to convert the
                message to UTF-8?

                Is there a way to limit or convert the encoding used in a textarea?
                Or is this more HTML related?

                Regards,

                Wimmy


                • Jerry Stuckle

                  #9
                  Re: How to upload form data containing special characters correctly?

                  Wim Cossement wrote:
                  > Hi again,
                  >
                  > I must say I've tried all the suggested options but I still can't do a
                  > proper upload.
                  >
                  > There is one textarea where users must put in text about their subject
                  > (more or less 2 formatted pages in a PDF/DOC document), so most (not to
                  > say all) of them cut 'n' paste it from Acrobat/Word/OpenOffice into their
                  > browser.
                  >
                  > Most of them contain double quotes that are not escaped by addslashes() or
                  > htmlspecialchars(). I've copied a few myself: "bla" "bla" "bla"
                  >
                  > If I add an entry by hand in phpMyAdmin for instance and one field
                  > contains these characters they are stored and displayed OK.
                  > When I store the resulting page and look at it in vi those quoted bla's
                  > are displayed as â~@~\blaâ~@~]
                  >
                  > How do I get rid of those, since Thunderbird wants to convert the
                  > message to UTF-8?
                  >
                  > Is there a way to limit or convert the encoding used in a textarea?
                  > Or is this more HTML related?
                  >
                  > Regards,
                  >
                  > Wimmy

                  Well, what Thunderbird does is completely client side and has nothing to
                  do with PHP. What charset do you have defined for the page?

                  And if they are cutting and pasting from a Word document or a PDF,
                  chances are the document itself has the special characters. For
                  instance, Word can use different characters for left and right double
                  quotes, depending on the version and release.

                  Nothing in PHP or MySQL would handle such characters; you'll have to
                  handle them yourself, e.g. with str_replace().
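
A minimal sketch of the str_replace() approach suggested above, assuming the submitted text is UTF-8. The function name `normalize_word_punctuation()` is hypothetical, and the mapping is illustrative, not a complete table of what Word or a PDF viewer can produce.

```php
<?php
// Map common "smart" punctuation (as UTF-8) to plain ASCII equivalents.
function normalize_word_punctuation(string $text): string
{
    $map = [
        "\u{201C}" => '"',   // left double quotation mark
        "\u{201D}" => '"',   // right double quotation mark
        "\u{201E}" => '"',   // low double quotation mark
        "\u{2018}" => "'",   // left single quotation mark
        "\u{2019}" => "'",   // right single quotation mark
        "\u{2013}" => '-',   // en dash
        "\u{2026}" => '...', // horizontal ellipsis
    ];
    return str_replace(array_keys($map), array_values($map), $text);
}

echo normalize_word_punctuation("\u{201C}bla\u{201D} \u{2018}bla\u{2019}"), "\n";
```

Replacing these characters is optional: if the page, the connection and the table all use UTF-8, the curly quotes can also be stored as they are.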

                  --
                  ==================
                  Remove the "x" from my email address
                  Jerry Stuckle
                  JDS Computer Training Corp.
                  jstucklex@attglobal.net
                  ==================


                  • Wim Cossement

                    #10
                    Re: How to upload form data containing special characters correctly?

                    Jerry Stuckle wrote:
                    > Well, what Thunderbird does is completely client side and has nothing to
                    > do with PHP. What charset do you have defined for the page?

                    The header contains <meta http-equiv="content-type"
                    content="application/xhtml+xml; charset=utf-8" />, so it should be UTF-8.

                    > And if they are cutting and pasting from a Word document or a PDF,
                    > chances are the document itself has the special characters. For
                    > instance, Word can use different characters for left and right double
                    > quotes, depending on the version and release.

                    Well, when I save the text with those weird things in a text file with
                    UTF-8 encoding they are still there when I open it, so it must be a
                    character that exists in this character set.
                    But how do I determine which one it is specifically?

                    I've put an example here in case someone knows how to do it:

                    > Nothing in PHP or MySQL would handle such characters; you'll have to
                    > handle them yourself, e.g. with str_replace().

                    Then I might be able to replace it, who knows...

                    Many cheers to the one that can do it!

                    Wimmy


                    • Jerry Stuckle

                      #11
                      Re: How to upload form data containing special characters correctly?

                      Wim Cossement wrote:
                      > Jerry Stuckle wrote:
                      >
                      >> Well, what Thunderbird does is completely client side and has nothing
                      >> to do with PHP. What charset do you have defined for the page?
                      >
                      > The header contains <meta http-equiv="content-type"
                      > content="application/xhtml+xml; charset=utf-8" />, so it should be UTF-8.
                      >
                      >> And if they are cutting and pasting from a Word document or a PDF,
                      >> chances are the document itself has the special characters. For
                      >> instance, Word can use different characters for left and right double
                      >> quotes, depending on the version and release.
                      >
                      > Well, when I save the text with those weird things in a text file with
                      > UTF-8 encoding they are still there when I open it, so it must be a
                      > character that exists in this character set.
                      > But how do I determine which one it is specifically?
                      >
                      > I've put an example here in case someone knows how to do it:
                      >
                      >> Nothing in PHP or MySQL would handle such characters; you'll have to
                      >> handle them yourself, e.g. with str_replace().
                      >
                      > Then I might be able to replace it, who knows...
                      >
                      > Many cheers to the one that can do it!
                      >
                      > Wimmy

                      Hi, Wimmy,

                      That's going to be difficult. They're valid characters in utf-8, but
                      who knows what they mean in Word or a PDF. They could be bullets,
                      left/right double quotes or a number of other special characters.

                      I don't have a conversion table available - there probably is one
                      somewhere on the net (maybe someone else can give some hints). I did
                      try a couple of Google searches and found some editors which accept Word
                      documents, but that's all. I didn't spend a lot of time on it, though.

                      Otherwise, you might get them to email you the Word doc they're using
                      and you can try to figure out what each character means and replace it.
                      It might take a few tries to get all the characters, but it shouldn't
                      be that hard.

                      Sorry I can't be of more help.
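
One way to find out which characters a pasted text contains is to print each character's bytes and Unicode code point, which can then be looked up in the Unicode charts. The sketch below assumes the input is UTF-8, and the function name `describe_utf8()` is hypothetical. The sequence that vi shows as â~@~\ is the bytes E2 80 9C, which is U+201C (left double quotation mark); â~@~] is E2 80 9D, U+201D (right double quotation mark).

```php
<?php
// List each character of a UTF-8 string with its bytes (hex) and code point.
function describe_utf8(string $text): array
{
    // Split into characters; with /u this returns false if $text is not valid UTF-8.
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        return [];
    }
    $rows = [];
    foreach ($chars as $ch) {
        $bytes = array_values(unpack('C*', $ch));
        $n = count($bytes);
        // Lead byte: keep only the payload bits (all 7 for ASCII, fewer for longer sequences).
        $cp = ($n === 1) ? $bytes[0] : ($bytes[0] & (0xFF >> ($n + 1)));
        // Each continuation byte contributes its low 6 bits.
        for ($i = 1; $i < $n; $i++) {
            $cp = ($cp << 6) | ($bytes[$i] & 0x3F);
        }
        $rows[] = sprintf('%s  U+%04X', bin2hex($ch), $cp);
    }
    return $rows;
}

print_r(describe_utf8("\u{201C}bla\u{201D}"));
```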

                      --
                      ==================
                      Remove the "x" from my email address
                      Jerry Stuckle
                      JDS Computer Training Corp.
                      jstucklex@attglobal.net
                      ==================


                      • Petr Vileta

                        #12
                        Re: How to upload form data containing special characters correctly?

                        "Jerry Stuckle" <jstucklex@attglobal.net> wrote in
                        news:9ZKdnfZ35fvOJ2DZnZ2dnUVZ_vadnZ2d@comcast.com...
                        > Hi, Wimmy,
                        >
                        > [...]
                        > I don't have a conversion table available - there probably is one

                        Maybe this can help you
                        http://www.unicode.org/

                        --

                        Petr Vileta, Czech republic
                        (My server rejects all messages from Yahoo and Hotmail. Send me your mail
                        from another non-spammer site please.)



                        • Jerry Stuckle

                          #13
                          Re: How to upload form data containing special characters correctly?

                          Petr Vileta wrote:
                          > "Jerry Stuckle" <jstucklex@attglobal.net> wrote in
                          > news:9ZKdnfZ35fvOJ2DZnZ2dnUVZ_vadnZ2d@comcast.com...
                          >
                          >> Hi, Wimmy,
                          >>
                          > [...]
                          >
                          >> I don't have a conversion table available - there probably is one
                          >
                          > Maybe this can help you
                          > http://www.unicode.org/
                          >
                          Maybe I'm missing something, but I don't see anywhere on that site where
                          they indicate the special characters used by MS Word or PDF's.

                          --
                          ==================
                          Remove the "x" from my email address
                          Jerry Stuckle
                          JDS Computer Training Corp.
                          jstucklex@attglobal.net
                          ==================


                          • musiccomposition@gmail.com

                            #14
                            Re: How to upload form data containing special characters correctly?

                            mysqli_real_escape_string() or mysql_real_escape_string() should escape
                            all the characters that would affect MySQL.

                            Wim Cossement wrote:
                            > Hello,
                            >
                            > I was wondering if there are a few good pages and/or examples on how to
                            > process form data correctly for putting it in a MySQL DB.
                            >
                            > Since I'm not used to using PHP a lot, I already found out that
                            > addslashes() can be used to escape some characters, but I'm having some
                            > more problems with, for instance, ä, å and µ (since the text is scientific).
                            > Now some people also throw in htmlspecialchars() to convert those to
                            > HTML entities, but some nest htmlspecialchars() in addslashes() and
                            > others do the opposite.
                            >
                            > Is there a good and error proof way of ensuring that what one puts in a
                            > textarea gets stored and can be retrieved safe and sound?
                            >
                            > Thanks in advance,
                            >
                            > Wimmy
                            >
                            > --
                            > Being owned by someone used to be called slavery.
                            > Now it's called commitment.


                            • Petr Vileta

                              #15
                              Re: How to upload form data containing special characters correctly?

                              "Jerry Stuckle" <jstucklex@attglobal.net> wrote in
                              news:YdWdnfmtCP_8i2PZnZ2dnUVZ_sSdnZ2d@comcast.com...
                              > Petr Vileta wrote:
                              >> "Jerry Stuckle" <jstucklex@attglobal.net> wrote in
                              >> news:9ZKdnfZ35fvOJ2DZnZ2dnUVZ_vadnZ2d@comcast.com...
                              >>
                              >>> Hi, Wimmy,
                              >>>
                              >> [...]
                              >>
                              >>> I don't have a conversion table available - there probably is one
                              >>
                              >> Maybe this can help you
                              >> http://www.unicode.org/
                              >
                              > Maybe I'm missing something, but I don't see anywhere on that site where
                              > they indicate the special characters used by MS Word or PDF's.
                              >
                              If I remember right you wrote in some previous message this

                              <cite>
                              And if they are cutting and pasting from a Word document or a PDF,
                              chances are the document itself has the special characters. For
                              instance, Word can use different characters for left and right double
                              quotes, depending on the version and releases.
                              </cite>

                              As far as I know, all browsers (except Lynx) convert characters from the
                              current system code page to the encoding of the current web page (defined by
                              the <meta> tag). If you define your web page as UTF-8, everything the user
                              cuts and pastes must be converted to UTF-8 by the browser. UTF-8 covers all
                              the characters of windows-1250, windows-1252, koi8-r, kanji and other
                              "exotic" code pages.
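
As a small illustration of the point above, the curly double quotes that Word produces are the single bytes 0x93 and 0x94 in windows-1252 and become three-byte sequences in UTF-8. This sketch performs that conversion by hand with iconv(). It assumes the iconv extension is available and that the platform's iconv knows the name `CP1252`.

```php
<?php
// "\x93" and "\x94" are the curly double quotes in windows-1252 (CP1252).
$cp1252 = "\x93bla\x94";

// Convert to UTF-8; iconv() returns false if the input cannot be converted.
$utf8 = iconv('CP1252', 'UTF-8', $cp1252);

echo bin2hex($cp1252), "\n"; // 93626c6194
echo bin2hex($utf8), "\n";   // e2809c626c61e2809d
```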

                              --

                              Petr Vileta, Czech republic
                              (My server rejects all messages from Yahoo and Hotmail. Send me your mail
                              from another non-spammer site please.)

