Getting html entities into the database

  • glueater
    New Member
    • May 2010
    • 2

    Getting html entities into the database

    Hi guys, I'm having an issue and I'm not sure what is going on here. Any help is appreciated.

    I want to accept user input of the various HTML entities in their raw form (i.e. &hearts;) and store it in the database to output elsewhere. Here's what I'm doing in my code:

    Code:
    $caption = htmlentities(mysql_real_escape_string($_POST['t']),ENT_QUOTES,'UTF-8');
    mysql_query("UPDATE Table SET caption='$caption' WHERE UserID = '$userID'");
    Let's say the post value is: Some text goes here that isn't being transferred correctly. I &hearts; PHP

    Here's what is getting written to the database:

    Some text goes here that isn\'t being transferred correctly. I

    (sorry I had to break up the ' entity there to get it to show up here -- it is getting into the DB properly, though)

    Notice the &hearts; is not getting written, along with anything after it... why is this? What am I missing to get this in there?
    Last edited by Dormilich; May 7 '10, 05:53 AM. Reason: edited the entity code for better display
  • Atli
    Recognized Expert Expert
    • Nov 2006
    • 5062

    #2
    Hey.

    There doesn't seem to be anything wrong with that. I tried it over here and it inserted the text as expected. After issuing the two lines you pasted I had this in my database:
    Some text goes here that isn't being transferred correctly. I ♥ PHP

    What do you get if you print the query before you execute it? Like:
    [code=php]$sql = "UPDATE Table SET caption='$caption' WHERE UserID = '$userID'";
    echo "<pre>$sql</pre>";
    mysql_query($sql);[/code]


    One other thing, while I'm at it xD
    Unless you have a good reason not to, it's generally best to insert the data into the database in a neutral format; meaning that you should not encode special chars as HTML chars before inserting them into the database. You should only use functions like htmlentities when the data is on the way out, just before you print it into a HTML page.
    [code=php]<?php
    // On the way in...
    $caption = mysql_real_escape_string($_POST['p']);
    mysql_query("UPDATE `table` SET `caption` = '{$caption}' WHERE stuff='other stuff'");

    // On the way out...
    $result = mysql_query("SELECT `caption` FROM `table`");
    while($row = mysql_fetch_assoc($result)) {
        echo htmlentities($row['caption'], ENT_QUOTES, 'UTF-8');
    }
    ?>[/code]

    This allows the data to be used for multiple purposes, without having to be decoded by applications that don't simply plan on printing it as HTML. Like:
    [code=php]<?php
    // Count the number of chars total used for captions...
    $result = mysql_query("SELECT `caption` FROM `table`");
    $total = 0;
    while($row = mysql_fetch_assoc($result)) {
        $total += strlen($row['caption']);
    }
    ?>[/code]
    If you encode the text before inserting the data, the $total there would be larger than the text displayed.
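To make the length point concrete, here is a small sketch that needs no database. The sample caption is made up for illustration; the functions are the standard htmlentities(), html_entity_decode() and strlen(), and the "\u{...}" escape assumes PHP 7 or later.

```php
<?php
// Sample caption, made up for illustration. "\u{2665}" is the heart character.
$raw = "I \u{2665} PHP";

// What would end up in the database if the text were encoded on the way in.
$encoded = htmlentities($raw, ENT_QUOTES, 'UTF-8');

echo $encoded, "\n";          // I &hearts; PHP
echo strlen($raw), "\n";      // 9  (bytes: the heart takes 3 bytes in UTF-8)
echo strlen($encoded), "\n";  // 14 (the 8-byte entity replaces those 3 bytes)

// Decoding gets back to the neutral form, but every consumer would have to do it.
$decoded = html_entity_decode($encoded, ENT_QUOTES, 'UTF-8');
var_dump($decoded === $raw);  // bool(true)
```

Note that strlen() counts bytes, so neither number is the visible character count; the point is only that the encoded copy no longer matches the text as displayed.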
    Last edited by Dormilich; May 7 '10, 05:54 AM. Reason: edited the entity code for better display

    • glueater
      New Member
      • May 2010
      • 2

      #3
      The problem is that a user can enter, for example, ≠ straight from the keyboard, but can also type it in HTML entity form as &ne;. I want to accept both inputs and store them exactly as entered. The html_entity_decode() function works fine for formatting them when they are pulled out of the DB, but I can't seem to get them IN.

      The script is taking the form post from an AJAX script (an edit in place thing).

      When I enter the same thing right into a variable, it gets into the database just fine, ie:

      $caption = "Some text here &hearts;";

      gets put in exactly.

      However, the same thing entered from the form puts "Some text here" in the database. Weird. Even weirder is that the form keeps the &hearts; and draws the visual figure. If I re-submit the form, THEN the &hearts; is written into the database (with my current code):
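One possible explanation, offered as a guess since the AJAX code isn't shown: if the script builds the POST body by hand without URL-encoding the value, the & in &hearts; is read as a parameter separator and everything from that point on is cut off. That would fit both symptoms, including the second submit working once the field holds the drawn character instead of the typed entity. The sketch below uses parse_str() to mimic how PHP fills $_POST; the caption text and the parameter name t are taken from the posts above, and the unencoded request body is an assumption.

```php
<?php
// Caption as typed into the edit-in-place form (sample text).
$input = "Some text here &hearts; PHP";

// Request body built without encoding (assumed to be what the AJAX script does).
parse_str("t=" . $input, $unencoded);

// Same body with the value URL-encoded first.
parse_str("t=" . rawurlencode($input), $encoded);

var_dump($unencoded['t']); // the value stops at the ampersand
var_dump($encoded['t']);   // the full caption, entity included
```

If this is the cause, the fix belongs on the JavaScript side, where encodeURIComponent() plays the role rawurlencode() plays here.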

      • oasisfleeting
        New Member
        • Jul 2010
        • 5

        #4
        Having a similar issue with html entities

        I'm attempting to insert special chars in a database table that is charset utf8.

        I am getting the titles from
        $markup = file_get_contents('http://sfbay.craigslist.org/ads/');

        Let's say, for example, this is the title I get from that page and I want to store it in the database...
        _(¯`•★•´¯)_BLONDE_(¯`•★•´¯)_w4m - w4m

        When I perform the insert, only the first two characters, "_(", get inserted and nothing else makes it into the title column. I don't understand why.

        I've tried converting the string with iconv and transliterating the result, but that only makes PHP drop the characters that can't be transliterated.
        $subject = iconv('UTF-8', 'ISO-8859-1//TRANSLIT//IGNORE', $subject);

        Any help would be appreciated

        • chazzy69
          New Member
          • Sep 2007
          • 196

          #5
          Not sure if this is applicable, but my understanding of the mysql_real_escape_string() function is that it removes illegal characters from a string to be inserted into an SQL database; this is done to stop SQL injections.

          Anyway, I think that some of your characters are getting stripped by that function. I noticed you said it stopped at "_(" when the whole line is _(¯`•★•´¯)_BLONDE_(¯`•★•´¯)_w4m - w4m; this is because the ' character is illegal.
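A note on the escaping function: as far as I know, mysql_real_escape_string() does not remove anything. It puts a backslash in front of a handful of characters (quotes, backslash, NUL, newline, carriage return, Ctrl-Z) so the SQL parser reads them as data. The sketch below uses addslashes() purely as a stand-in because it needs no database connection; it covers a smaller set of characters and is not a safe replacement. Note also that the title contains a backtick rather than an apostrophe, and a backtick is not one of the escaped characters.

```php
<?php
// addslashes() is a connection-free stand-in here, for illustration only.
$text  = "isn't";
$title = '_(¯`•★•´¯)_BLONDE';

$escapedText  = addslashes($text);
$escapedTitle = addslashes($title);

var_dump($escapedText);             // a backslash is added before the quote, nothing is removed
var_dump($escapedTitle === $title); // bool(true): the title has no quote or backslash to escape
```

So escaping alone should not explain a title that stops after "_(".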

          • oasisfleeting
            New Member
            • Jul 2010
            • 5

            #6
            Originally posted by chazzy69
            Not sure if this is applicable, but my understanding of the mysql_real_escape_string() function is that it removes illegal characters from a string to be inserted into an SQL database; this is done to stop SQL injections.

            Anyway, I think that some of your characters are getting stripped by that function. I noticed you said it stopped at "_(" when the whole line is _(¯`•★•´¯)_BLONDE_(¯`•★•´¯)_w4m - w4m; this is because the ' character is illegal.

            So do you have any suggestions on how I would go about inserting the illegal characters into my database table? Or some way of encoding them so that they can be inserted?

            • chazzy69
              New Member
              • Sep 2007
              • 196

              #7
              Apparently this function
              Code:
              htmlspecialchars();
              converts some special characters, so you may be able to use it to achieve what you want.

              Other than converting them, I know of no other way to achieve what you are looking for.

              • oasisfleeting
                New Member
                • Jul 2010
                • 5

                #8
                Originally posted by chazzy69
                Apparently this function
                Code:
                htmlspecialchars();
                converts some special characters, so you may be able to use it to achieve what you want.

                Other than converting them, I know of no other way to achieve what you are looking for.

                If file_get_contents($url); returns this line:
                •●••°__Relaxing Asian Style Massage ......Yota - w4m -
                and I use htmlentities(), I'm still unable to insert that line into the database. The resulting string is
                •●••°__Relaxing Asian Style Massage ......Yota - w4m
                The first character is illegal and shows up as a � gremlin when I view the string in UTF-8 encoding. When I switch to ISO-8859-1 encoding, the character displays fine. The database table is using charset utf8. I tried switching the table to latin1, but the illegal character still will not insert into the database. The insert doesn't fail when it hits one of these special characters, but it doesn't finish the insert either, and no errors are reported.

                I need some kind of transliteration library or something. Any other thoughts?
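The symptoms described here (a � when viewed as UTF-8, correct display as ISO-8859-1, text cut off at the first special character) would fit a page that arrives in a single-byte encoding rather than UTF-8. As far as I know, MySQL in non-strict mode cuts a value off at the first byte that is invalid for the column's character set and only raises a warning. A minimal sketch of checking and converting before the insert follows; the helper name to_utf8() is hypothetical, and Windows-1252 as the source encoding is an assumption that should be checked against the page's Content-Type header or meta tag.

```php
<?php
// Return the string as UTF-8. $assumedCharset is a guess, not something the thread confirms.
function to_utf8($text, $assumedCharset = 'Windows-1252')
{
    // preg_match() with the /u modifier fails on invalid UTF-8, so it doubles as a validity check.
    if (preg_match('//u', $text) === 1) {
        return $text; // already valid UTF-8, leave it alone
    }
    return iconv($assumedCharset, 'UTF-8', $text);
}

// "\xAF" is the macron and "\x95" the bullet in Windows-1252; neither byte is valid UTF-8 on its own.
$title = "_(\xAF`\x95";
$fixed = to_utf8($title);

var_dump(preg_match('//u', $fixed) === 1); // bool(true)
var_dump(bin2hex($fixed));                 // hex dump of the converted bytes
```

It may also be worth making sure the connection itself talks UTF-8, for example with mysql_set_charset('utf8') after connecting, so that correctly converted text is not reinterpreted on the way in.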

                • oasisfleeting
                  New Member
                  • Jul 2010
                  • 5

                  #9
                  These are the strings returned from file_get_contents():

                  >>>GORGEOUS***&***SEXY<<< - w4m - (Outcalls) pic

                  Talented Asian Male - (excelsior / outer mission)

                  ♛----EAST INDIAN BARBIE----♛ w4m - w4m - (fremont / union city / newark) pic

                  .•*¨¨*•-:¦:-•*NeW* GoRgEoUS *MiXED *FUN•-:¦:-•*¨¨*•. - pic

                  •°!!•° ItALiAn •°!!•° HoTtIe •°!!•° bIg BoOtY •°!!•° eXoTiC - w4m - (palo alto) pic

                  %%%% YOUNG ACTRACTIVE LADY EXCELLENT MASSAGE %%%%%%%% - (san jose west)

                  100hh!!!*miXeD maMi*150H!!! - w4m - (san rafael) pic

                  ^^korean ^^ zulia ^^ - w4m - (sunnyvale) pic

                  ~~~~Give yourself a well deserved break and enjoy a relaxing massage. - w4m - (san rafael) pic

                  •☆•—————♥ LOOKING FOR THE BEST•☆•—————♥ - w4m - (concord / pleasant hill / martinez) pic

                  Lovely Asian Masseuses Here For You! - w4m - (san rafael) pic

                  But then here is what is actually stored in the database. (see dbshot.png) Notice how the fulltext column is blank on the entries that start with special html characters.
                  Attached Files

                  • oasisfleeting
                    New Member
                    • Jul 2010
                    • 5

                    #10
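                    This conversion function appears to work with all special character encodings.
                    Code:
                        function convert_charset($item)
                        {
                            // Serialized arrays: convert every element, then re-serialize.
                            // The @ hides the notice unserialize() raises for plain strings.
                            $unserialize = @unserialize($item);
                            if (is_array($unserialize))
                            {
                                foreach ($unserialize as $key => $value)
                                {
                                    // windows-1256 is the source charset used in this post;
                                    // substitute the page's actual encoding if it differs.
                                    $unserialize[$key] = @iconv('windows-1256', 'UTF-8', $value);
                                }
                                return serialize($unserialize);
                            }

                            // Plain string: convert it directly.
                            return @iconv('windows-1256', 'UTF-8', $item);
                        }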
