regex

  • Seb

    regex

    Hi,

    Does anyone have an idea how I can replace every character in a string if it is not alphanumeric?

    Something like eregi_replace, but I don't know how to say NOT in regex.

    Thanks
  • Allan Rydberg

    #2
    Re: regex






    $out = preg_replace("/^\W+/, "", $in);





    Seb wrote:
    > Hi,
    >
    > Does anyone have an idea how I can replace every character in a string if it
    > is not alphanumeric?
    >
    > Something like eregi_replace, but I don't know how to say NOT in regex.
    >
    > Thanks


    • Seb

      #3
      Re: regex

      Thanks, that works for me.

      Here is a reference for other people with the same problem:

      Wildcard  Description
      \d        Matches a digit (character class [0-9])
      \D        Matches a non digit ([^0-9])
      \w        Matches a word character ([a-zA-Z0-9_])
      \W        Matches a non-word character ([^a-zA-Z0-9_])
      \s        Matches a space character ([\t\n ])
      \S        Matches a non-space character ([^\t\n ])
      .         Matches any character
      $         Matches "end of line" if placed at the end of a regular expression

      "Allan Rydberg" <alrdbg@southtech.net> wrote in message
      news:c497iu$u8e$1@newshispeed.ch...





      $out = preg_replace("/^\W+/, "", $in);





      Seb wrote:
      > Hi,
      >
      > Does anyone have an idea how I can replace every character in a string if it
      > is not alphanumeric?
      >
      > Something like eregi_replace, but I don't know how to say NOT in regex.
      >
      > Thanks



      • John Dunlop

        #4
        Re: regex

        Followup-to c.l.p. This is off-topic in two groups.

        Seb wrote upsidedown:
        > [Allan Rydberg wrote upsidedown:]
        >
        > > Seb wrote:
        >
        > > > Does anyone have an idea how I can replace every character in a string if it
        > > > is not alphanumeric?

        What do you mean? Will you recast the question, please, Seb?

        > > $out = preg_replace("/^\W+/, "", $in);
        (-----------------------------^
        A typo there!)

        I can't fit the above pattern into any of my interpretations of Seb's
        question.

        preg_replace('_^\W+_','',$foo)

        returns $foo with one or more non-"word" characters at the beginning
        stripped off. If $foo were "-_-", the first hyphen would match and
        get replaced by an empty string, but the underscore and second hyphen
        would remain.
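
To make the difference concrete, here is a minimal sketch. It assumes Seb meant "remove every character that is not an ASCII letter or digit", which is only one reading of the question; the sample strings and variable names are made up for illustration.

```php
<?php
// Hypothetical inputs; the names are illustrative only.
$foo = '-_-';

// Anchored pattern: strips only the leading run of non-"word" characters.
// The underscore counts as a "word" character, so matching stops there.
$leading = preg_replace('/^\W+/', '', $foo);

// Unanchored negated class: removes every character that is not an
// ASCII letter or digit, wherever it occurs (underscores included).
$alnumOnly = preg_replace('/[^a-zA-Z0-9]+/', '', 'Hello, World_42!');

var_dump($leading);   // string(2) "_-"
var_dump($alnumOnly); // string(12) "HelloWorld42"
```

Dropping the ^ anchor lets the pattern match anywhere in the subject, and spelling out [a-zA-Z0-9] keeps the underscore out of the "alphanumeric" set.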

        > Thanks, that works for me.

        Really? You used an atypical definition of "alphanumeric" then.
        Despite Merriam-Webster Online's definition allowing punctuation
        marks -- the inclusion of underscores is described as perverse by
        FOLDOC -- alphanumerics are usually represented by the character
        class [a-zA-Z0-9]. M-W gives the etymology of "alphanumeric" as
        "/alpha/bet/ic/ + /numeric/", i.e., it derived from "alphabet" and
        "numeric". The Manual's pattern syntax guide, however, doesn't
        include underscores in its implicit definition of "alphanumeric".
        (Is there an explicit definition, anywhere in the Manual?) Cf. the
        character type functions,


        > Here is a reference for other people with the same problem:

        I reckon a better reference is the Manual, don't you?

        http://www.php.net/manual/en/pcre.pattern.syntax.php

        > \d  Matches a digit (character class [0-9])
        > \D  Matches a non digit ([^0-9])

        Although your character classes are correct and clarify your
        definition, it'd be less ambiguous to state that \d matches *decimal*
        digits, not just digits, and that \D matches any character that isn't
        a *decimal* digit. \d does not match all hexadecimal digits, for
        example.
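
A small sketch of that point, using made-up sample strings: \d accepts decimal digits only, so a hexadecimal value needs its own character class.

```php
<?php
// \d matches decimal digits only.
$isDecimal = preg_match('/^\d+$/', '2004');     // 1: all decimal digits
$hexAsDecimal = preg_match('/^\d+$/', '1F');    // 0: "F" is not a decimal digit

// A class that also accepts the hexadecimal letters.
$isHex = preg_match('/^[0-9a-fA-F]+$/', '1F');  // 1

var_dump($isDecimal, $hexAsDecimal, $isHex);
```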

        > \w  Matches a word character ([a-zA-Z0-9_])
        > \W  Matches a non-word character ([^a-zA-Z0-9_])

        Your character classes are misleading.

        | A "word" character is any letter or digit or the underscore
        | character, that is, any character which can be part of a Perl
        | "word". The definition of letters and digits is controlled by
        | PCRE's character tables, and may vary if locale-specific matching
        | is taking place. [ ... ]
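
As a rough illustration of why this matters for the "alphanumeric" question, the sketch below compares \W with an explicit negated class. It assumes the default locale and no "u" modifier; the sample string is invented.

```php
<?php
// Hypothetical input containing an underscore, a hyphen, and a space.
$in = 'foo_bar-baz 42';

// \W treats the underscore as a "word" character, so it survives.
$viaW = preg_replace('/\W/', '', $in);

// An explicit class of ASCII letters and digits removes it too.
$viaClass = preg_replace('/[^a-zA-Z0-9]/', '', $in);

var_dump($viaW);     // string(12) "foo_barbaz42"
var_dump($viaClass); // string(11) "foobarbaz42"
```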


        > \s  Matches a space character ([\t\n ])
        > \S  Matches a non-space character ([^\t\n ])

        Your character classes are incorrect and out of sync with your
        natural language descriptions, which are also incorrect. The generic
        character type \s matches "whitespace" characters, not just the space
        character; \S matches any non-"whitespace" character. According to
        the Manual, the characters \s matches are, by default, normally:
        "space, formfeed, newline, carriage return, horizontal tab, and
        vertical tab". The "space" in the above definition covers
        non-breaking spaces and spaces, I think.
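
A short sketch of \s covering more than the space character. The sample string is invented, and it only exercises tab, carriage return, newline, formfeed, and space.

```php
<?php
// Hypothetical input mixing tab, CR LF, formfeed, and a double space.
$messy = "one\ttwo\r\nthree\ffour  five";

// Collapse every run of whitespace into a single space.
$normalized = preg_replace('/\s+/', ' ', $messy);

var_dump($normalized); // string(23) "one two three four five"
```

A class such as [\t\n ] would leave the carriage return and the formfeed in place.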

        > .   Matches any character

        ... excluding newlines by default.

        | Outside a character class, a dot in the pattern matches any one
        | character in the subject, including a non-printing character, but
        | not (by default) newline. If the PCRE_DOTALL option is set, then
        | dots match newlines as well. [ ... ] Dot has no special meaning in
        | a character class.
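
The behaviour described in that excerpt can be sketched as follows. In PHP the PCRE_DOTALL option corresponds to the "s" pattern modifier; the subjects are made-up examples.

```php
<?php
$subject = "a\nb";

// By default the dot does not match a newline.
$default = preg_match('/a.b/', $subject);       // 0

// With the "s" modifier (PCRE_DOTALL) it does.
$dotAll = preg_match('/a.b/s', $subject);       // 1

// Inside a character class the dot is a literal dot.
$literalDot = preg_match('/a[.]b/', 'a.b');     // 1
$notWildcard = preg_match('/a[.]b/', 'axb');    // 0

var_dump($default, $dotAll, $literalDot, $notWildcard);
```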


        > $   Matches "end of line" if placed at the end of a regular expression

        While that may sometimes be true, it doesn't tell the whole story.
        The $ isn't a "wildcard" or generic character type metacharacter.

        | A dollar character is an assertion which is TRUE only if the
        | current matching point is at the end of the subject string, or
        | immediately before a newline character that is the last character
        | in the string (by default). Dollar need not be the last character
        | of the pattern if a number of alternatives are involved, but it
        | should be the last item in any branch in which it appears.
        |
        | [ ... ] The meaning of dollar can be changed so that it matches
        | only at the very end of the string, by setting the
        | PCRE_DOLLAR_ENDONLY option at compile or matching time.
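
A minimal sketch of the default behaviour and of the end-only variant. In PHP, PCRE_DOLLAR_ENDONLY corresponds to the "D" pattern modifier; the subject strings are invented.

```php
<?php
$subject = "foo\n";

// By default, $ also matches immediately before a trailing newline.
$default = preg_match('/foo$/', $subject);    // 1

// With the "D" modifier (PCRE_DOLLAR_ENDONLY), $ matches only at the
// very end of the subject, so the trailing newline makes it fail.
$endOnly = preg_match('/foo$/D', $subject);   // 0

// Without a trailing newline the end-only form matches again.
$noNewline = preg_match('/foo$/D', 'foo');    // 1

var_dump($default, $endOnly, $noNewline);
```

This matters when validating input: without "D", a value that ends in a newline can still pass a pattern that ends in $.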



        HTH.

        --
        Jock


        • Ian.H

          #5
          Re: regex

          On Tue, 30 Mar 2004 17:26:23 +0100, John Dunlop wrote:


          [ snip ]

          >> Here is a reference for other people with the same problem:
          >
          > I reckon a better reference is the Manual, don't you?
          >
          > http://www.php.net/manual/en/pcre.pattern.syntax.php


          [ snip ]


          And to complement John's great response: Regex Coach may be of
          interest to help learn and understand regular expressions too. It is by
          no means aimed only at beginners. I use it pretty regularly to
          help build regex patterns more quickly for Postfix filtering as well as
          coding.

          Download / official site available at:


          <http://www.weitz.de/regex-coach/>



          Regards,

          Ian

          --
          Ian.H
          digiServ Network
          London, UK

