problem with mb_detect_encoding

  • Marcus

    problem with mb_detect_encoding

    I am trying to determine if data entered in a $_POST variable in a form
    contains all ASCII (0 - 127) characters or not. To do this I am using
    mb_detect_encoding(). I am running into problems with non-English
    characters, however - for example, I translated the word 'test' into
    Russian and got 'испытание'. If I feed this into the function as:

    $_POST['var'] = 'испытание'; // from form
    echo mb_detect_encoding($_POST['var']);

    it returns ASCII. After thinking about it and running some tests I
    figured out that it is doing this because PHP is feeding
    mb_detect_encoding the string after it is converted to its html
    representation, i.e. instead of 'испытание' mb_detect_encoding() is
    getting
    '&#1080;&#1089;&#1087;&#1099;&#1090;&#1072;&#1085;&#1080;&#1077;'.
    Obviously all of these characters are ASCII, and as far as I can tell
    this is what's happening.

    Is there a way that I can tell if data entered is ASCII or not BEFORE it
    is converted? With the example above, I would want this test to fail
    (not return ASCII). Thanks in advance.
  • Tim Hunt

    #2
    Re: problem with mb_detect_encoding


    Marcus wrote:
    > I am trying to determine if data entered in a $_POST variable in a form
    > contains all ASCII (0 - 127) characters or not. To do this I am using
    > mb_detect_encoding(). I am running into problems with non-English
    > characters, however - for example, I translated the word 'test' into
    > Russian and got 'испытание'. If I feed this into the function as:
    >
    > $_POST['var'] = 'испытание'; // from form
    > echo mb_detect_encoding($_POST['var']);
    >
    > it returns ASCII. After thinking about it and running some tests I
    > figured out that it is doing this because PHP is feeding
    > mb_detect_encoding the string after it is converted to its html
    > representation, i.e. instead of 'испытание' mb_detect_encoding() is
    > getting
    > '&#1080;&#1089;&#1087;&#1099;&#1090;&#1072;&#1085;&#1080;&#1077;'.
    > Obviously all of these characters are ASCII, and as far as I can tell
    > this is what's happening.
    >
    > Is there a way that I can tell if data entered is ASCII or not BEFORE it
    > is converted? With the example above, I would want this test to fail
    > (not return ASCII). Thanks in advance.

    The form data is converted into HTML entities on the client side before
    PHP receives it (browsers do this when the page's character set cannot
    represent what was typed). Convert the entities back into a string
    using html_entity_decode().
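
    A minimal sketch of that decoding step, assuming the browser sent the
    Russian word as decimal numeric entities and that UTF-8 is the
    encoding you want back. The variable names are only illustrative.

```php
<?php
// Hypothetical form value: the entity form of the Russian word.
$raw = '&#1080;&#1089;&#1087;&#1099;&#1090;&#1072;&#1085;&#1080;&#1077;';

// Decode the entities into real characters, encoded as UTF-8.
// Passing the encoding explicitly avoids relying on the PHP default.
$decoded = html_entity_decode($raw, ENT_QUOTES, 'UTF-8');

echo $decoded, "\n"; // prints: испытание
```

    After decoding, the string contains multibyte characters again, so an
    ASCII test on it can fail as intended.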

    Even then mb_detect_encoding() might not work; the user notes in the
    PHP manual aren't encouraging, anyway. Someone there gave a regular
    expression for detecting UTF-8 that can be adapted.

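    I think preg_match( '/[^\x09\x0A\x0D\x20-\x7E]/xs',
    html_entity_decode($_POST['var'], ENT_QUOTES, 'UTF-8') ) will work. It
    returns 1 when the decoded input contains anything other than tab, LF,
    CR or printable ASCII, and 0 otherwise.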
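
    Wrapped up as a function, the check might look like the sketch below.
    The name is_plain_ascii() is made up for this example and is not a PHP
    built-in; it assumes the input is either raw text or text with HTML
    entities in it.

```php
<?php
/**
 * Hypothetical helper: true when $value, once its HTML entities are
 * decoded, holds only tab, LF, CR or printable ASCII (0x20-0x7E).
 */
function is_plain_ascii(string $value): bool
{
    // Turn entities such as &#1080; back into real UTF-8 characters.
    $decoded = html_entity_decode($value, ENT_QUOTES, 'UTF-8');

    // Without the /u modifier the pattern is matched byte by byte, so
    // any UTF-8 lead or continuation byte (>= 0x80) hits the negated class.
    return preg_match('/[^\x09\x0A\x0D\x20-\x7E]/', $decoded) === 0;
}

var_dump(is_plain_ascii('test'));                   // bool(true)
var_dump(is_plain_ascii('&#1080;&#1089;&#1087;'));  // bool(false)
var_dump(is_plain_ascii('испытание'));              // bool(false)
```

    One thing to keep in mind: a user who literally types an entity such
    as &#1080; into the field is also reported as non-ASCII, because the
    server cannot tell that apart from a browser-generated entity.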

    Tim Hunt
