regular expression syntax basics

  • deko

    regular expression syntax basics

    I'm trying to match top-level domains, where $sub = top level domain

    if ( eregi("^com$ | ^org$ | ^net$ | ^biz$ | ^info$ | ^edu$ | ^gov$ | ^int$ |
    ^mil$", $sub) )

    {
    do stuff here
    }

    I've tried quoting each one individually ( "^com$" | "^net$" | ... ) but
    that does not work either. How are quotes supposed to be used here? Should
    I use a single quote? What's the difference?


  • deko

    #2
    Re: regular expression syntax basics

    > I'm trying to match top-level domains, where $sub = top level domain
    >
    > if ( eregi("^com$ | ^org$ | ^net$ | ^biz$ | ^info$ | ^edu$ | ^gov$ | ^int$ |
    > ^mil$", $sub) )
    >
    > {
    > do stuff here
    > }
    >
    > I've tried quoting each one individually ( "^com$" | "^net$" | ... ) but
    > that does not work either. How are quotes supposed to be used here? Should
    > I use a single quote? What's the difference?

    Another example (that works, but could be improved, I think) is this:

    if ( (eregi("bot", $agent)) || (ereg("Google", $agent)) ||
    (ereg("Slurp", $agent)) || (ereg("Scooter", $agent)) ||
    (eregi("Spider", $agent)) || (eregi("Infoseek", $agent)) ||
    (eregi("W3C_Validator", $agent)) || (eregi("ia_archiver", $agent)) )
    {
    do stuff here
    }

    Is there a way to avoid the "||" statements?


    • Justin Koivisto

      #3
      Re: regular expression syntax basics

      deko wrote:
      > I'm trying to match top-level domains, where $sub = top level domain
      >
      > if ( eregi("^com$ | ^org$ | ^net$ | ^biz$ | ^info$ | ^edu$ | ^gov$ | ^int$ |
      > ^mil$", $sub) )
      >
      > {
      > do stuff here
      > }
      >
      > I've tried quoting each one individually ( "^com$" | "^net$" | ... ) but
      > that does not work either. How are quotes supposed to be used here? Should
      > I use a single quote? What's the difference?

      Try more like this:
      eregi("^(com|org|net|biz|info|edu|gov|int|mil)$", $sub)

      --
      Justin Koivisto - spam@koivi.com
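
Why the grouped form works: inside a pattern a space is a literal character, so "^com$ " asks for a space after the end of the string and can never match. Putting one pair of anchors around a single group avoids that. On quoting: the whole pattern is one PHP string, so it gets one pair of quotes; single quotes are often the safer choice because PHP will not try to expand "$" as a variable inside them. Note also that the ereg functions were deprecated in PHP 5.3 and removed in PHP 7.0, so on current PHP the same idea would use preg_match(). A minimal sketch, assuming a helper name (isGenericTld) that is made up for illustration and not from the thread:

```php
<?php
// Hypothetical helper (not from the thread): true if $sub is one of the listed TLDs.
function isGenericTld(string $sub): bool
{
    // One pair of anchors around the whole group; no spaces, since a space
    // inside a pattern is a literal character. The trailing "i" flag makes
    // the match case-insensitive, as eregi() was.
    return preg_match('/^(com|org|net|biz|info|edu|gov|int|mil)$/i', $sub) === 1;
}

var_dump(isGenericTld('COM'));      // bool(true)
var_dump(isGenericTld('example'));  // bool(false)

// The spaced-out pattern from the question never matches 'com':
var_dump(preg_match('/^com$ | ^org$/', 'com'));  // int(0)
```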

      • deko

        #4
        Re: regular expression syntax basics

        > Try more like this:
        > eregi("^(com|org|net|biz|info|edu|gov|int|mil)$", $sub)

        I'll give it a shot. Do you think I could apply the same syntax to this:

        if ( (eregi("bot", $agent)) || (ereg("Google", $agent)) ||
        (ereg("Slurp", $agent)) || (ereg("Scooter", $agent)) ||
        (eregi("Spider", $agent)) || (eregi("Infoseek", $agent)) ||
        (eregi("W3C_Validator", $agent)) || (eregi("ia_archiver", $agent)) )
        {
        do stuff here
        }


        • Justin Koivisto

          #5
          Re: regular expression syntax basics

          deko wrote:
          >> I'm trying to match top-level domains, where $sub = top level domain
          >>
          >> if ( eregi("^com$ | ^org$ | ^net$ | ^biz$ | ^info$ | ^edu$ | ^gov$ | ^int$ |
          >> ^mil$", $sub) )
          >>
          >> {
          >> do stuff here
          >> }
          >>
          >> I've tried quoting each one individually ( "^com$" | "^net$" | ... ) but
          >> that does not work either. How are quotes supposed to be used here? Should
          >> I use a single quote? What's the difference?
          >
          > Another example (that works, but could be improved, I think) is this:
          >
          > if ( (eregi("bot", $agent)) || (ereg("Google", $agent)) ||
          > (ereg("Slurp", $agent)) || (ereg("Scooter", $agent)) ||
          > (eregi("Spider", $agent)) || (eregi("Infoseek", $agent)) ||
          > (eregi("W3C_Validator", $agent)) || (eregi("ia_archiver", $agent)) )
          > {
          > do stuff here
          > }
          >
          > Is there a way to avoid the "||" statements?

          if (
          (eregi("(bot|Google|Slurp|Scooter|Spider|Infoseek|W3C_Validator|ia_archiver)", $agent))
          )

          --
          Justin Koivisto - spam@koivi.com

          • deko

            #6
            Re: regular expression syntax basics

            > if (
            > (eregi("(bot|Google|Slurp|Scooter|Spider|Infoseek|W3C_Validator|ia_archiver)", $agent))
            > )

            Cool!! Thanks! That's much better...

            As an aside, are there many cases where a domain would appear as:

            something.com.something

            Is that only for international domains?


            • Michael Fesser

              #7
              Re: regular expression syntax basics

              .oO(deko)
              > I'm trying to match top-level domains, where $sub = top level domain

              You could also do it without a regex:

              $gTLD = array('com', 'org', 'net', ...);
              if (in_array(strtolower($sub), $gTLD)) {
              // do something
              }

              Micha

              • Michael Fesser

                #8
                Re: regular expression syntax basics

                .oO(deko)
                > Another example (that works, but could be improved, I think) is this:
                >
                > if ( (eregi("bot", $agent)) || (ereg("Google", $agent)) ||
                > [...]

                Much too slow.

                > Is there a way to avoid the "||" statements?

                Another idea:

                $agents = array(
                'bot', 'Google', 'Slurp',
                'Scooter', 'Spider', 'Infoseek',
                'W3C_Validator', 'ia_archiver'
                );
                $pattern = sprintf('#%s#i', implode('|', $agents));

                if (preg_match($pattern, $agent)) {
                // do stuff here
                }

                Micha

                • Justin Koivisto

                  #9
                  Re: regular expression syntax basics

                  deko wrote:
                  > As an aside, are there many cases where a domain would appear as:
                  >
                  > something.com.something
                  >
                  > Is that only for international domains?

                  You mean like example.com.uk?

                  If so, the TLD is "uk" (a country code)...

                  --
                  Justin Koivisto - spam@koivi.com
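
In other words, the top-level domain is whatever follows the last dot, however many labels come before it. A minimal sketch of pulling that label out of a host name; the helper name (lastLabel) is made up for illustration and is not from the thread:

```php
<?php
// Hypothetical helper: return the last dot-separated label of a host name, lowercased.
function lastLabel(string $host): string
{
    $pos = strrpos($host, '.');
    // No dot at all: treat the whole string as the label.
    $label = ($pos === false) ? $host : substr($host, $pos + 1);
    return strtolower($label);
}

echo lastLabel('example.com.uk'), "\n";  // uk
echo lastLabel('example.COM'), "\n";     // com
```

The result could then be checked against a list of known TLDs, as in the earlier posts.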

                  • deko

                    #10
                    Re: regular expression syntax basics

                    > $agents = array(
                    > 'bot', 'Google', 'Slurp',
                    > 'Scooter', 'Spider', 'Infoseek',
                    > 'W3C_Validator', 'ia_archiver'
                    > );
                    > $pattern = sprintf('#%s#i', implode('|', $agents));
                    >
                    > if (preg_match($pattern, $agent)) {
                    > // do stuff here
                    > }

                    I think I understand... Is '#%s#i' removing commas? What does sprintf do?

                    But doesn't creating and imploding the array add an extra step - as opposed
                    to something like:

                    eregi ( "(bot|Google|Slurp|Scooter)" , $agent )
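
On the sprintf question: nothing is removing commas. The commas are only PHP array syntax and never become part of any string. implode() joins the array elements with "|", and sprintf() places the result where the %s placeholder sits; the "#" characters are the pattern delimiters and the trailing "i" makes the match case-insensitive. The end result is the same kind of alternation as the hand-written one, so the extra step mostly buys a list that is easier to maintain. A small sketch that prints each intermediate value; the agent string is made up for illustration:

```php
<?php
$agents = array('bot', 'Google', 'Slurp', 'Scooter');

// implode() joins the array elements with "|" between them.
$alternatives = implode('|', $agents);
echo $alternatives, "\n";  // bot|Google|Slurp|Scooter

// sprintf() drops that string into the %s placeholder. The "#" characters
// are the pattern delimiters and the trailing "i" means case-insensitive.
$pattern = sprintf('#%s#i', $alternatives);
echo $pattern, "\n";       // #bot|Google|Slurp|Scooter#i

// Made-up user agent string, for illustration only.
$agent = 'ExampleCrawler/1.0 (compatible; examplebot)';
var_dump(preg_match($pattern, $agent));  // int(1)
```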


                    • deko

                      #11
                      Re: regular expression syntax basics

                      When using a regex like this:

                      eregi("(bot|google|infoseek|w3c_validator|ia_archiver)", $agent)

                      Can I put meta characters within the parentheses, like this:

                      eregi("(bot$|google|infoseek|w3c_validator|ia_archiver)", $agent)

                      So it would match "robot" or "superbot", as well as just "bot".


                      • deko

                        #12
                        Re: regular expression syntax basics


                        "Michael Fesser" <netizen@gmx.net> wrote in message
                        news:sivtk0lv2u6ui0g4is9rpvtho4mt4qe87r@4ax.com...
                        > .oO(deko)
                        >
                        > >When using a regex like this:
                        > >
                        > >eregi("(bot|google|infoseek|w3c_validator|ia_archiver)", $agent)
                        > >
                        > >Can I put meta characters within the parentheses, like this:
                        > >
                        > >eregi("(bot$|google|infoseek|w3c_validator|ia_archiver)", $agent)
                        > >
                        > >So it would match "robot" or "superbot", as well as just "bot".
                        >
                        > Yes, but you have to do it in regex syntax:
                        >
                        > .*bot
                        >
                        > This matches the literal 'bot' which may be preceded by any char (.) in
                        > any number (*).
                        >
                        > Micha

                        But isn't bot$ saying: match a "b" followed by an "o" followed by a "t"
                        followed by an end-of-line?
                        But then again, I suppose it's unlikely that "bot" will be followed by an
                        end of line in an agent string...


                        • Michael Fesser

                          #13
                          Re: regular expression syntax basics

                          .oO(deko)
                          > But isn't bot$ saying: match a "b" followed by an "o" followed by a "t"
                          > followed by an end-of-line?

                          Hmm, true. ;)

                          OK, I confused it with your other question with the TLDs with the
                          explicit start and end match (^ and $).
                          > But then again, I suppose it's unlikely that "bot" will be followed by an
                          > end of line in an agent string...

                          Yep. I think just 'bot' should be fine.

                          Micha
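
The difference between the three variants discussed here can be checked directly. A small sketch using preg_match() (the ereg functions are no longer available in PHP 7 and later); both agent strings are made up for illustration:

```php
<?php
// Made-up agent strings, for illustration only.
$endsWithBot = 'superbot';
$botInMiddle = 'robot/2.0 (+http://example.com)';

// Unanchored: matches "bot" anywhere, so "robot" and "superbot" are covered.
var_dump(preg_match('/bot/i', $endsWithBot));    // int(1)
var_dump(preg_match('/bot/i', $botInMiddle));    // int(1)

// ".*bot" accepts the same strings; the leading ".*" adds nothing here.
var_dump(preg_match('/.*bot/i', $botInMiddle));  // int(1)

// "bot$" only matches when "bot" is the last thing in the string.
var_dump(preg_match('/bot$/i', $endsWithBot));   // int(1)
var_dump(preg_match('/bot$/i', $botInMiddle));   // int(0)
```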

                          • deko

                            #14
                            Re: regular expression syntax basics

                            > Yep. I think just 'bot' should be fine.

                            Agreed - in this case I think .*bot and bot should return the same thing:

                            eregi("(bot|google|infoseek|etc...|ia_archiver)", $agent)

                            Where I was confused was where to use quotes and meta characters. I have a
                            bunch of other optimizations to do now that I've figured it out...

                            Thanks for the help!

