strlen issues - no mb_strlen available

  • RayDube
    New Member
    • Dec 2007
    • 15

    I'm not sure if this is an encoding issue or not, but for some reason strlen is not giving me the correct length of a string stored in a mysql database.

    I have two strings, one containing letters and spaces, and another containing letters, spaces and underscores.

    Even if both strings are exactly the same length, the reported length given by strlen is incorrect.

    For testing I used these two strings:

    This is a long Phrase

    and blanked out some of the letters with underscores to get:
    Code:
    ___s __ _ ____ ______
    As far as I can tell, they are the same length. However, strlen reports the first phrase as having a length of 21, and the second as having a length of 25.

    Now, there are 4 spaces in the second phrase, and this is coincidentally how far off the length is...

    Any thoughts?

    Ray
    Last edited by Atli; Jun 8 '09, 08:56 PM. Reason: Added [code] tags... HTML and spaces don't mix well.
  • Dormilich
    Recognized Expert Expert
    • Aug 2008
    • 8694

    #2
    well, the string you posted has 21 characters (including the spaces)


    • Atli
      Recognized Expert Expert
      • Nov 2006
      • 5062

      #3
      Hi.

      So the second one, the one with all the underscores, which by my count is exactly 21 characters, is being reported as 25 characters long?

      Eliminating the obvious first:
      Are you sure there aren't white-spaces trailing or leading the string?
      Have you tried trim?
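
      One way to check what is really in the string is to dump its bytes. The sketch below is only an illustration: describeBytes is a hypothetical helper, and the UTF-8 non-breaking space is an assumed example of an invisible character, not a diagnosis of the string in question.

      ```php
      <?php
      // Hypothetical helper (not from this thread): show every byte of a
      // string as two hex digits, so invisible characters become visible.
      function describeBytes($s)
      {
          // bin2hex() yields two hex digits per byte; split them into pairs.
          return implode(' ', str_split(bin2hex($s), 2));
      }

      $plain = "a b";          // ordinary space, single byte 0x20
      $nbsp  = "a\xC2\xA0b";   // UTF-8 non-breaking space, bytes 0xC2 0xA0

      echo strlen($plain), ': ', describeBytes($plain), "\n"; // 3: 61 20 62
      echo strlen($nbsp), ': ', describeBytes($nbsp), "\n";   // 4: 61 c2 a0 62

      // trim() with its default character list does not strip these bytes.
      echo strlen(trim("\xC2\xA0a")), "\n"; // 3
      ```

      If the dump of the underscore string shows anything other than 20 for each space, that would explain the extra length.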

      Moving on to the less obvious:
      Is the string being stored in a Unicode field?
      I'm no expert on the internals of Unicode strings, but according to what I know (or at least think I know), UTF-8 encoded characters can take between 1 and 4 bytes.
      PHP5 treats strings as plain sequences of bytes, with no notion of character encoding.

      Which would suggest to me that if a Unicode string, containing 21 Unicode characters, four of which required 2 bytes to be stored (or some other mix that adds up to 25 bytes), were to be read into a PHP string, then PHP's string length function, which actually just counts the bytes, would report it being 25 characters long.

      Sounds plausible, right?

      If that is the case, you could try:
      [code=php]strlen(utf8_decode($unicodeString))[/code]
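
      To illustrate the difference between bytes and characters without the mbstring extension, here is a minimal sketch. utf8Length is a hypothetical helper, it assumes the input is valid UTF-8, and "café" is just a made-up sample.

      ```php
      <?php
      // Hypothetical helper: count UTF-8 characters without mbstring by
      // skipping continuation bytes (bit pattern 10xxxxxx).
      // Assumes the input is valid UTF-8.
      function utf8Length($s)
      {
          $count = 0;
          for ($i = 0, $n = strlen($s); $i < $n; $i++) {
              if ((ord($s[$i]) & 0xC0) !== 0x80) {
                  $count++;
              }
          }
          return $count;
      }

      $word  = "caf\xC3\xA9";             // "café": 4 characters, 5 bytes
      $blank = "___s __ _ ____ ______";   // the plain ASCII string from post #1

      echo strlen($word), "\n";       // 5 (bytes)
      echo utf8Length($word), "\n";   // 4 (characters)
      echo strlen($blank), "\n";      // 21
      echo utf8Length($blank), "\n";  // 21
      ```

      For a pure ASCII string both counts agree, so a result of 25 would mean the stored string contains bytes beyond the 21 visible characters. Note that utf8_decode only handles characters that exist in ISO-8859-1, and it is deprecated as of PHP 8.2.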


      • RayDube
        New Member
        • Dec 2007
        • 15

        #4
        Thanks for the thoughtful replies.

        Yes, trim, ltrim and rtrim were all tried without success.

        utf8_decode was also tried, again, without success...

        And, as a matter of fact, after restarting the machine with the mb_ functions enabled, I still got the same result (so not a multibyte thing).

        It's still got me puzzled, but in the meantime, I've switched to using dashes "-" instead of underscores "_" to get a similar effect.

        For whatever reason, dashes are counted correctly, but with underscores it seems to be counting the spaces twice... odd behaviour, in my mind anyway.

        So my hangman game is working fine now, excellent exercise for the brain, but now it hurts... :)

        I should also note that these were stored in a mysql table, as varchar(250), if that has any impact... frankly I like the look with underscores better than with the dashes, if anyone has a solution that will help me fix this...

        Ray


        • Atli
          Recognized Expert Expert
          • Nov 2006
          • 5062

          #5
          That's odd.
          I tried making a UTF8 table on my test server and fetching the data via PHP. It always counts both dashes and underscores correctly in my tests.

          Could you post the exact structure of your table?
          The output of the SHOW CREATE TABLE command would be best. (It includes all the charset info.)

          And perhaps the code that fetches the data from the database?
          Most importantly, the extension you use to connect to MySQL and the method you use to query the data.
