chinese and arrays

  • Kobi Lurie

    chinese and arrays


    Hello all,
    I'm trying to make a simple, beginner-level script that uses only
    built-in functions.

    It uses these functions:
    file_get_contents to read the text,
    substr to take pieces of it, which go into an array,
    then array_count_values,
    and finally a sort by value.

    The text is Chinese, and after it is put into an array (or maybe even
    already in file_get_contents) I think it is no longer Chinese but has
    been converted somehow.

    Does anybody know how to deal with this?
    Do I need to convert it first, or do something else?

    I echo the results to the screen, but I can also write them to a file.
    Either way, the output doesn't look like Chinese.
    Any help is appreciated. Thanks in advance, Kobi.
    You can email me directly.
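    The script described above might look roughly like this. It is a minimal sketch, not Kobi's actual code: it assumes the file is UTF-8, counts single characters, and reads a hypothetical file named chinese.txt. It splits the text with preg_split and the /u modifier instead of substr, because substr cuts on bytes rather than characters.

```php
<?php
// Sketch: count how often each character occurs and sort by frequency.
// Assumptions: UTF-8 text; 'chinese.txt' is a hypothetical file name.

function countCharacters(string $text): array
{
    // With the /u modifier, preg_split splits on UTF-8 character
    // boundaries, unlike substr, which works on bytes.
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        // preg_split returns false when the input is not valid UTF-8.
        return [];
    }
    // Map each character to its number of occurrences.
    $counts = array_count_values($chars);
    // Sort by count, highest first, keeping the character keys.
    arsort($counts);
    return $counts;
}

if (is_readable('chinese.txt')) {
    foreach (countCharacters(file_get_contents('chinese.txt')) as $char => $n) {
        echo $char, ': ', $n, "\n";
    }
}
```

    Spaces and line breaks are counted as characters too; filter them out of $chars first if they should not appear in the result.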
  • Henk Verhoeven

    #2
    Re: chinese and arrays

    Hi Kobi,

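    I do not know how Chinese is represented in the bytes of your file, but
    I guess file_get_contents and substr work on the bytes, not on the
    Chinese characters. So with substr you need to use byte offsets: a
    Chinese character usually takes two bytes in Big5 or GB2312 and three
    in UTF-8, and if substr cuts in the middle of one, the result no longer
    looks like Chinese.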
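    A small sketch of the byte problem, assuming the text is stored as UTF-8 (in Big5 or GB2312 each character would take two bytes instead of three). The helper isValidUtf8 is a hypothetical name introduced only for this example.

```php
<?php
// Sketch: substr counts bytes, not characters.
// Assumption: this string is UTF-8, where 中 and 文 take 3 bytes each.

$text = "中文";                     // two characters

$firstByte = substr($text, 0, 1);   // only the first byte of 中
$firstChar = substr($text, 0, 3);   // byte offsets 0..2: all of 中

// preg_match with /u returns false (not 0 or 1) when the subject is not
// valid UTF-8, so it shows whether a piece was cut mid-character.
function isValidUtf8(string $s): bool
{
    return preg_match('//u', $s) === 1;
}

echo strlen($text), "\n";                                  // 6
echo isValidUtf8($firstByte) ? "valid\n" : "broken\n";     // broken
echo isValidUtf8($firstChar) ? "valid\n" : "broken\n";     // valid
```

    If the mbstring extension is enabled, mb_strlen and mb_substr take an encoding argument and count characters instead of bytes.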

    Once you have the correct substrings, putting them into an array should
    not change anything.

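    Sorting by value should work, because after array_count_values the
    values are plain integer counts. Sorting the Chinese strings themselves
    (the keys) is another matter: I guess that will not give a meaningful
    order except with uksort and a custom comparison function. To write a
    comparison function you need to know how to compare the bytes that
    represent your characters.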
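    A sketch of both kinds of sorting on the output of array_count_values. The sample characters are made up, and strcmp is only a placeholder comparison: it orders the keys by their raw bytes (for UTF-8 that is Unicode code point order), not the order a Chinese dictionary would use.

```php
<?php
// Sketch: two ways to sort the output of array_count_values.
// The sample characters are made up for illustration.

$counts = array_count_values(['文', '中', '文', '字', '文', '中']);

// Sort by value (the counts), highest first. The values are integers,
// so the language of the keys does not matter; arsort keeps the keys.
arsort($counts);

// Sort by key (the Chinese strings) with a custom comparison function.
// strcmp is a placeholder: it compares raw bytes, which for UTF-8 means
// Unicode code point order, not dictionary order.
$byKey = $counts;
uksort($byKey, function (string $a, string $b): int {
    return strcmp($a, $b);
});

print_r($counts);   // order: 文 (3), 中 (2), 字 (1)
print_r($byKey);    // order: 中 (2), 字 (1), 文 (3)
```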

    I hope someone reacts and tells me I am wrong, that there is a locale
    setting for Chinese and that it actually works properly (you may try it
    with the strcoll function in your string comparison function). But I am
    not optimistic, given the mess I got myself into with European numbers,
    dates, automatic type conversion and MySQL. My solution was to use US
    locale settings, US numbers and dates in literals, and to code the
    conversions myself in the user interface. But I admit that may be
    substantially more work with Chinese than with Dutch...
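    A sketch of the strcoll idea, assuming the server has a Chinese locale installed. The locale names zh_CN.UTF-8 and zh_TW.UTF-8 are assumptions (names differ between systems), and sortChineseKeys is a hypothetical helper. Whether strcoll then gives a sensible order for Chinese depends on the system's C library, so check the result before relying on it.

```php
<?php
// Sketch only: locale names and collation rules depend on the system.
// 'zh_CN.UTF-8' and 'zh_TW.UTF-8' are assumed locale names, and
// sortChineseKeys is a hypothetical helper.

function sortChineseKeys(array $counts): array
{
    // Passing "0" only queries the current setting without changing it.
    $previous = setlocale(LC_COLLATE, '0');

    // setlocale tries each name in turn and returns false if none exists.
    $locale = setlocale(LC_COLLATE, 'zh_CN.UTF-8', 'zh_TW.UTF-8');

    // Use locale-aware strcoll if a Chinese locale was set, otherwise
    // fall back to plain byte comparison with strcmp.
    $compare = ($locale !== false) ? 'strcoll' : 'strcmp';

    uksort($counts, function ($a, $b) use ($compare): int {
        return $compare((string) $a, (string) $b);
    });

    // setlocale affects the whole process, so put the old value back.
    if ($previous !== false) {
        setlocale(LC_COLLATE, $previous);
    }
    return $counts;
}

$sorted = sortChineseKeys(['文' => 3, '中' => 2, '字' => 1]);
foreach ($sorted as $char => $n) {
    echo $char, ': ', $n, "\n";
}
```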

    For what it is worth, here is a link to the setlocale function in the
    manual: http://www.php.net/manual/en/function.setlocale.php (sorry for
    the English version: there were three kinds of Chinese, and
    http://www.php.net/manual/zh/function.setlocale.php does not look very
    Chinese anyhow).

    Greetings,

    Henk Verhoeven,



    • Henk Verhoeven

      #3
      Re: chinese and arrays

      Kobi,

      I came across another function that may be relevant to your problem:

      htmlentities ( string string [, int quote_style [, string charset]])

      The third parameter, charset, can be set to (among other values):
      BIG5 Traditional Chinese, mainly used in Taiwan.
      GB2312 Simplified Chinese, national standard character set.
      BIG5-HKSCS Big5 with Hong Kong extensions, Traditional Chinese.

      see http://www.php.net/manual/en/function.htmlentities.php

      Probably your file contains Chinese text in one of these encodings,
      and the browser has to display it as HTML in that same encoding.
      htmlentities can prepare the text for that, provided you pass the
      right charset parameter: it then treats each multi-byte Chinese
      character as one character instead of escaping its individual bytes.
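      A sketch of what the charset parameter changes, assuming Big5 input; the bytes "\xA4\xA4" are the Big5 encoding of 中. As far as I know, for multi-byte charsets other than UTF-8, htmlentities only does the basic substitutions that htmlspecialchars does (and raises a notice saying so), so the sketch calls htmlspecialchars for the Big5 case.

```php
<?php
// Sketch: the charset parameter decides how the bytes are read.
// Assumption: the text is Big5, where "\xA4\xA4" encodes the character 中.

$big5 = "\xA4\xA4 & <";

// Read as ISO-8859-1, each byte 0xA4 is taken to be the currency sign and
// turned into an entity, so the Chinese character is destroyed.
$wrong = htmlentities($big5, ENT_QUOTES, 'ISO-8859-1');

// Read as Big5, the two bytes form one character and are left alone;
// only & and < are escaped.
$right = htmlspecialchars($big5, ENT_QUOTES, 'BIG5');

// The page must also declare its encoding so the browser decodes it as
// Big5; header() has to be called before any output is sent.
header('Content-Type: text/html; charset=Big5');
echo $right;
```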

      html_entity_decode ( string string [, int quote_style [, string charset]])

      will do the opposite: decode the HTML back into a normal string that
      you can put in a file.
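      A sketch of the reverse step before writing to a file, assuming Big5 input; the output file name chinese-result.txt is hypothetical. It uses htmlspecialchars_decode to undo htmlspecialchars; html_entity_decode plays the same role for htmlentities and takes the same charset argument.

```php
<?php
// Sketch: undo the HTML escaping, then write the raw bytes to a file.
// Assumptions: Big5 input; 'chinese-result.txt' is a hypothetical name.

$original = "\xA4\xA4 & <";                       // Big5 bytes of 中, then & and <
$html = htmlspecialchars($original, ENT_QUOTES, 'BIG5');

// Turn &amp; and &lt; back into & and <; the Big5 bytes are untouched.
$decoded = htmlspecialchars_decode($html, ENT_QUOTES);

$file = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'chinese-result.txt';
file_put_contents($file, $decoded);               // writes bytes, no conversion
```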

      I hope this helps.

      Greetings,

      Henk Verhoeven,
      www.phpPeanuts.org.

