Regular expressions: problems with swedish characters

  • Toffe

    Regular expressions: problems with swedish characters

    Hi,

    I've got a problem with regular expressions and strings containing
    Swedish characters (åäö).

    I basically have a PHP script that highlights certain words in a text. I
    found the code attached below in the commented manual at php.net. It
    works great for all words that do not contain Swedish characters. The
    words that do contain åäö will not be highlighted.

    Can anyone suggest how I should change my regexp to fix this?

    Thanks,
    toffe

    Code:
    =============
    function highlightErrors($text, $errors) {
        foreach ($errors as $e) {
            $text = highlight_word($text, $e);
        }
        return $text;
    }

    function highlight_word($buff, $query) {
        $buff = preg_replace(
            "/(^|[^A-ZåäöÅÄÖ]){1}(" . preg_quote($query, "/") . ")($|[^A-ZåäöÅÄÖ]){1}/i",
            "\\1<span class='highlight'>\\2</span>\\3",
            $buff
        );
        return $buff;
    }

    =========
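For comparison with the function above, here is a minimal sketch of the same highlighting for UTF-8 text. It assumes PHP 7+ and valid UTF-8 input, which the post does not confirm. The name highlight_word_utf8 is made up for illustration, and the lookarounds are a swapped-in technique rather than the poster's capture groups.

```php
<?php
// Sketch only. Assumes PHP 7+ and that both arguments are valid UTF-8.
// highlight_word_utf8 is a hypothetical name, not from the original post.
function highlight_word_utf8(string $buff, string $query): string
{
    // (?<!\p{L}) and (?!\p{L}) require that no letter touches the match,
    // so the separators are not consumed as they are in the original pattern.
    // /u treats pattern and subject as UTF-8; /i then also folds non-ASCII letters.
    $pattern = '/(?<!\p{L})(' . preg_quote($query, '/') . ')(?!\p{L})/iu';

    // preg_replace returns null on error (e.g. invalid UTF-8); fall back to the input.
    return preg_replace($pattern, "<span class='highlight'>\\1</span>", $buff) ?? $buff;
}

// Usage: \u{e4} is ä and \u{c4} is Ä, written as escapes so the
// encoding of this script file does not matter.
echo highlight_word_utf8("S\u{e4}g H\u{c4}LLO nu", "h\u{e4}llo"), "\n";
```

Because the lookarounds do not consume the neighbouring characters, two listed words separated by a single space can both be highlighted in one pass.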
  • R. Rajesh Jeba Anbiah

    #2
    [FAQ] Regular expressions: How-to use foreign characters (Was Re: Regular expressions: problems with swedish characters)

    Q: How can I match foreign characters such as åäö in regular
    expressions?
    A: Use the hexadecimal representation of those characters, like \xe1

    Refer: http://www.php.net/preg_match#42167
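A minimal sketch of the hexadecimal approach in the answer above, assuming the text is ISO-8859-1 encoded (one byte per character), which the thread does not state outright. The helper name is_swedish_word is made up for illustration.

```php
<?php
// Sketch only. Assumes the subject string is ISO-8859-1 (one byte per character).
// is_swedish_word is a hypothetical helper, not part of the thread.
function is_swedish_word(string $s): bool
{
    // \xe5 \xe4 \xf6 = å ä ö and \xc5 \xc4 \xd6 = Å Ä Ö in ISO-8859-1.
    // Single quotes pass the \x escapes through to PCRE, so the pattern
    // itself stays plain ASCII whatever encoding the script is saved in.
    return preg_match('/^[A-Za-z\xe5\xe4\xf6\xc5\xc4\xd6]+$/', $s) === 1;
}

var_dump(is_swedish_word("sm\xf6rg\xe5s")); // Latin-1 bytes for "smörgås"
var_dump(is_swedish_word("caf\xe9"));       // \xe9 (é) is not in the class
```

Writing the bytes as escapes keeps the pattern independent of the encoding the script file happens to be saved in.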



    • Toffe

      #3
      Re: [FAQ] Regular expressions: How-to use foreign characters (Was

      R. Rajesh Jeba Anbiah wrote:
      > Q: How could I match the foreign characters like åäö in regular
      > expressions?
      > A: Use hexadecimal representation of those characters, like \xe1
      >
      > Refer:
      > http://www.php.net/preg_match#42167

      Sorry for being ignorant and not reading the FAQ before posting, won't
      happen again...

      Thanks a lot for the information!

      -toffe


      • Toffe

        #4
        Re: [FAQ] Regular expressions: How-to use foreign characters (Was

        R. Rajesh Jeba Anbiah wrote:
        > Q: How could I match the foreign characters like åäö in regular
        > expressions?
        > A: Use hexadecimal representation of those characters, like \xe1
        >
        > Refer:
        > http://www.php.net/preg_match#42167

        Hi, thanks for the pointer.

        It almost works the way I want now.
        My script should highlight certain words in the text, but the text can
        be a mix of upper- and lower-case letters. If $query below is hxllo
        and $buff is HXLLO, where x and X are the lower- and upper-case forms
        of some Swedish character, I still don't get a match.

        Any suggestions for how I can fix this?

        Thanks,
        toffe

        Code:
        ====

        $buff =
        preg_replace("/(^|[^A-Z\xe5\xe4\xf6\xc5\xc4\xd6]){1}(".preg_quote($query,"/").
        ")($|[^A-Z\xe5\xe4\xf6\xc5\xc4\xd6]){1}/i",
        "\\1<SURROUNDING>\\2<TAG>\\3", $buff);

        return $buff;
        =========


        • R. Rajesh Jeba Anbiah

          #5
          Re: [FAQ] Regular expressions: How-to use foreign characters (Was Re: Regular expressions: problems with swedish characters)

          Toffe wrote:
          > R. Rajesh Jeba Anbiah wrote:
          > > Q: How could I match the foreign characters like åäö in regular
          > > expressions?
          > > A: Use hexadecimal representation of those characters, like \xe1
          > >
          > > Refer:
          > > http://www.php.net/preg_match#42167
          >
          > It works almost like I want it to now.
          > My script should highlight certain words in the text, but the text could
          > be a mix of upper and lower case letters, and if $query below is hxllo
          > and $buff is HXLLO, where x and X is some Swedish character in its lower
          > and upper cases, I still don't get a match.
          <snip>
          > Code:
          > ====
          >
          > $buff =
          > preg_replace("/(^|[^A-Z\xe5\xe4\xf6\xc5\xc4\xd6]){1}(".preg_quote($query,"/").
          > ")($|[^A-Z\xe5\xe4\xf6\xc5\xc4\xd6]){1}/i",
          > "\\1<SURROUNDING>\\2<TAG>\\3", $buff);
          >
          > return $buff;
          > =========

          IIRC, case-insensitive matching does not pair up the lower- and
          upper-case forms of the foreign characters, so you may have to add
          both the upper- and lower-case characters to the set yourself. You
          may also want to look at
          <http://in.php.net/ucwords#51137>
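The suggestion above could be sketched roughly as follows, assuming ISO-8859-1 input and PCRE's default character tables, where /i only folds ASCII letters. Since the Swedish letters that fail to match sit inside $query, this sketch expands each of them in the query into a class holding both cases. The names swedish_fold_pattern and highlight_word_latin1 are made up for illustration, and the span markup is borrowed from the first post.

```php
<?php
// Sketch only. Assumes $buff and $query are ISO-8859-1 byte strings.
// Both function names are hypothetical, not from the thread.

// Quote the query for use in a regex, then replace every Swedish letter
// with a character class that lists its lower- and upper-case byte.
function swedish_fold_pattern(string $query): string
{
    $classes = [
        "\xe5" => '[\xe5\xc5]', "\xc5" => '[\xe5\xc5]', // å / Å
        "\xe4" => '[\xe4\xc4]', "\xc4" => '[\xe4\xc4]', // ä / Ä
        "\xf6" => '[\xf6\xd6]', "\xd6" => '[\xf6\xd6]', // ö / Ö
    ];
    // strtr with an array does not rescan text it has already replaced.
    return strtr(preg_quote($query, '/'), $classes);
}

function highlight_word_latin1(string $buff, string $query): string
{
    $letters = 'A-Za-z\xe5\xe4\xf6\xc5\xc4\xd6';
    $pattern = '/(^|[^' . $letters . '])('
             . swedish_fold_pattern($query)
             . ')($|[^' . $letters . '])/i'; // /i still folds the ASCII letters
    // preg_replace returns null on error; fall back to the unchanged input.
    return preg_replace($pattern, "\\1<span class='highlight'>\\2</span>\\3", $buff) ?? $buff;
}

// Usage: \xe4 is ä and \xc4 is Ä in ISO-8859-1.
echo highlight_word_latin1("say H\xc4LLO now", "h\xe4llo"), "\n";
```

Setting a matching locale with setlocale() may be another route, but that depends on the locales installed on the server, so it is left out here.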

          BTW, we don't have an FAQ yet. We're still compiling one, and this
          question has been asked before.

          --
          <?php echo 'Just another PHP saint'; ?>
          Email: rrjanbiah-at-Y!com Blog: http://rajeshanbiah.blogspot.com/
