Regular expression question

  • Kevin

    Regular expression question

    Hi,

    Is anyone in this group good at regular expressions? I have a seemingly
    simple problem, but cannot figure it out myself. I appreciate any help!

    What I want to check is that "dog" appears exactly twice in searchString. If I
    use "^(.*dog.*){2}$", the expression matches even when "dog" appears 3
    times. I want to check that "dog" appears exactly 2 times.

    Regex regEx = new Regex("???");
    string searchString = "my dog is my dog";

    Thanks!


  • mdb

    #2
    Re: Regular expression question

    "Kevin" <nospam@discussion.com> wrote in
    news:ud3I1xbdFHA.3932@TK2MSFTNGP12.phx.gbl:

    > What I want to check is "dog" appear exactly twice in the
    > searchString. If I enter "^(.*dog.*){2}$", the expression matches even
    > when "dog" appear 3 times. I want to check if "dog" appear exactly 2
    > times.


    Hmmm, by nature I don't think that regular expressions will do that in one
    single statement. The easiest way to do that is to just match against "dog"
    and count the number of matches...

    if (Regex.Matches(input, "dog").Count == 2) { ... }
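
For reference, the counting approach can be wrapped in a small helper. This is only a sketch: the class and method names (`DogCount`, `OccursExactly`) are made up for illustration, and `Regex.Escape` is added on the assumption that the search word should be treated literally.

```csharp
using System;
using System.Text.RegularExpressions;

static class DogCount
{
    // Hypothetical helper: true when `word` occurs exactly `expected` times in `input`.
    // Regex.Escape keeps any regex metacharacters in `word` literal.
    public static bool OccursExactly(string input, string word, int expected)
    {
        return Regex.Matches(input, Regex.Escape(word)).Count == expected;
    }

    static void Main()
    {
        Console.WriteLine(OccursExactly("my dog is my dog", "dog", 2)); // True
        Console.WriteLine(OccursExactly("dog dog dog", "dog", 2));      // False
    }
}
```

Regex.Matches returns non-overlapping matches, which is all that is needed for a word like "dog" that cannot overlap itself.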

    --
    -mdb


    • Thomas W. Brown

      #3
      RE: Regular expression question

      "Kevin" wrote:
      > Hi,
      >
      > Is anyone in this group good at regular expression? I have a seemingly
      > simple problem, but cannot figure it out myself. I appreciate any help!
      >
      > What I want to check is "dog" appear exactly twice in the searchString. If I
      > enter "^(.*dog.*){2}$", the expression matches even when "dog" appear 3
      > times. I want to check if "dog" appear exactly 2 times.
      >
      > Regex regEx = new Regex("???");
      > string searchString = "my dog is my dog";

      I think your problem is the ".*" before and after the "dog" pattern. This
      allows the pattern to match something like "foo dog bar dog bletch dog" by
      throwing the third "dog" into the "match anything" section before or after
      one of the legitimate matches.

      You probably need to limit this to being within a word, that is, instead of
      ".*" you probably want something like "\w*" -- match zero or more word
      characters in front of and behind the occurrence of "dog".

      Note that this will count input like "dogeatdog" as only one match, not as
      two.
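
A short sketch of both behaviours described above, for anyone who wants to try them. It assumes the "\w*" suggestion is read as replacing every ".*" in the original pattern; the class and method names are illustrative only.

```csharp
using System;
using System.Text.RegularExpressions;

static class GreedyDemo
{
    // The pattern from the original question.
    public static bool DotStar(string input)
    {
        return Regex.IsMatch(input, "^(.*dog.*){2}$");
    }

    // Assumed reading of the suggestion: every ".*" replaced by "\w*".
    public static bool WordChars(string input)
    {
        return Regex.IsMatch(input, @"^(\w*dog\w*){2}$");
    }

    static void Main()
    {
        // ".*" can swallow the third "dog", so three dogs still match.
        Console.WriteLine(DotStar("foo dog bar dog bletch dog")); // True
        Console.WriteLine(DotStar("abc dog"));                    // False

        // "\w" does not match spaces, so the anchored pattern cannot span several words.
        Console.WriteLine(WordChars("my dog is my dog")); // False
        Console.WriteLine(WordChars("dogeatdog"));        // True
    }
}
```

Under that reading, the anchored "\w*" version only matches input that is a single run of word characters, which would explain a failed match on "my dog is my dog".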

      -- Tom


      • Kevin

        #4
        Re: Regular expression question

        Thanks, but is it really not possible? This seems to be a common thing
        people want to check...

        "mdb" <m_b_r_a_y@c_t_i_u_s_a__d0t__com> wrote in message
        news:Xns967B8DE46E24Cmbrayctiusacom@207.46.248.16...
        > Hmmm by nature I don't think that regular expressions will do that in
        > one single statement. Easiest way to do that is to just match against
        > "dog" and count the number of matches...
        >
        > if (Regex.Matches(input, "dog").Count == 2) { ... }

        • Kevin

          #5
          Re: Regular expression question

          Thanks, but the result is not good. The regex failed to match even
          when the searchString contains dog exactly 2 times.

          "Thomas W. Brown" <thomas_w_brown@countrywide.NOSPAM.com> wrote in message
          news:40B8B570-C02A-4988-A718-7C46AF4CAD7D@microsoft.com...
          > You probably need to limit this to being within a word, that is, instead
          > of ".*" you probably want something like "\w*" -- match zero or more word
          > characters in front of and behind the occurrence of "dog".

          • Thomas W. Brown

            #6
            Re: Regular expression question

            "Kevin" wrote:
            > Thanks, but is it really not possible? This seems to be a common thing
            > people want to check...

            The {2} regex syntax demands two and exactly two *repeats* of the matched
            pattern. Remember that even the * pattern to match zero or more is just
            shorthand for {0,}.

            Besides, what do you want to do with the successful match -- do you need the
            entire captured substring that contains the two "dog" matches, or do you just
            need to know that the match was successful? In the latter case, mdb's
            approach is very clean and simple.

            -- Tom


            • Greg Bacon

              #7
              Re: Regular expression question

              In article <ud3I1xbdFHA.3932@TK2MSFTNGP12.phx.gbl>,
              Kevin <nospam@discussion.com> wrote:

              : Is anyone in this group good at regular expression? I have a
              : seemingly simple problem, but cannot figure it out myself. I
              : appreciate any help!
              :
              : What I want to check is "dog" appear exactly twice in the
              : searchString. If I enter "^(.*dog.*){2}$", the expression matches
              : even when "dog" appear 3 times. I want to check if "dog" appear
              : exactly 2 times.

              Doable but perhaps nonobvious:

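              using System;
              using System.Text.RegularExpressions;

              class Program
              {
                  static void Main(string[] args)
                  {
                      string[] dogs = new string[] {
                          "",
                          "no canines",
                          "abc dog",
                          "dog abc dog",
                          "dog dog dog",
                          "xyzzy dog foo dog bar dog w00t doggone",
                      };

                      // {0} is the "fencepost": a run of text meant to contain no "dog".
                      Regex twodogs = new Regex(
                          String.Format("^{0}dog{0}dog{0}$", "([^d]|d[^o]|do[^g])*"));

                      // Print each input with whether it has exactly two dogs.
                      foreach (string input in dogs)
                          Console.WriteLine(input + ": " + twodogs.IsMatch(input));
                  }
              }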

              The fencepost -- ([^d]|d[^o]|do[^g])* -- matches strings, possibly
              empty, that don't have "dog".

              The ^ and $ anchors are important. Without them, you'll see matches for
              all strings with at least two dogs.
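
One caveat worth checking before relying on the fencepost: each alternative that starts with "d" also consumes the character after it, so input such as "ddog dog" (a "d" directly before "dog") or "dog dog d" (text ending in "d" or "do") is rejected even though it contains exactly two dogs. The sketch below compares the fencepost with a negative-lookahead gap, `(?:(?!dog).)*`, which is a swapped-in technique and not part of the original post; the class and method names are illustrative, and RegexOptions.Singleline is an added assumption so that "." also matches newlines.

```csharp
using System;
using System.Text.RegularExpressions;

static class TwoDogs
{
    // Fencepost from the post above, kept for comparison.
    const string Fencepost = "([^d]|d[^o]|do[^g])*";

    // Assumed alternative: consume one character at a time,
    // but never at a position where "dog" starts.
    const string NoDog = "(?:(?!dog).)*";

    static Regex Build(string gap)
    {
        // Singleline lets '.' match newlines as well.
        return new Regex("^" + gap + "dog" + gap + "dog" + gap + "$",
                         RegexOptions.Singleline);
    }

    public static bool FencepostMatch(string input)
    {
        return Build(Fencepost).IsMatch(input);
    }

    public static bool LookaheadMatch(string input)
    {
        return Build(NoDog).IsMatch(input);
    }

    static void Main()
    {
        string[] inputs = { "dog abc dog", "dog dog dog", "ddog dog", "dog dog d" };
        foreach (string input in inputs)
            Console.WriteLine(input + ": fencepost=" + FencepostMatch(input)
                              + ", lookahead=" + LookaheadMatch(input));
    }
}
```

Both versions agree on the test strings from the post; they differ only on the edge cases listed above.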

              Hope this helps,
              Greg


              • Thomas W. Brown

                #8
                Re: Regular expression question

                "Kevin" wrote:
                > Thanks, but the result is not good. The regex result showed failed match
                > even when dog appears 2 times and the searchString includes dog 2 times.

                Sorry, I was replying off the top of my head (and still am)... let's narrow
                this down a bit...

                Do you want to find "dog" only as a standalone word or even if it occurs
                within a word? If the latter, do you want to find (and count) each
                occurrence within a word or count it only once for the entire word no matter
                how many times "dog" occurs within it?

                "dog" obviously matches the three letter sequence no matter where it occurs.

                "(?<=\b)dog(?=\b)" matches any standalone word, "dog". This might fail,
                though, for "dog" at the beginning or end of the input.

                "(?<=^|\b)dog(?=$|\b)" should match the standalone "dog" anywhere, even at
                the very beginning or end of the line.

                "(?<=^|\b).*dog.*(?=$|\b)" should match any occurrence of "dog" even when
                embedded within a larger word.

                Perhaps, then, "((?<=^|\b).*dog.*(?=$|\b)){2}" will be closer to what you
                need? Again, this is all off the top of my head so give it a try and see what
                happens.
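
A quick way to check the standalone-word case: since \b is itself a zero-width assertion, wrapping it in a lookbehind or lookahead should not change what it matches, and \b also holds at the very start or end of the input when the adjacent character is a word character. The sketch below therefore uses the plain `\bdog\b` form; the class and method names are illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

static class WholeWordDog
{
    // Counts occurrences of "dog" as a standalone word.
    public static int Count(string input)
    {
        return Regex.Matches(input, @"\bdog\b").Count;
    }

    static void Main()
    {
        Console.WriteLine(Count("dog eats dog")); // 2, including start and end of input
        Console.WriteLine(Count("doggone dog"));  // 1
        Console.WriteLine(Count("dogeatdog"));    // 0
    }
}
```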

                -- Tom


                • mdb

                  #9
                  Re: Regular expression question

                  "Kevin" <nospam@discussion.com> wrote in
                  news:#owEFOcdFHA.1036@tk2msftngp13.phx.gbl:

                  > Thanks, but is it really not possible? This seems to be a common thing
                  > people want to check...

                  Well, the problem is that regular expressions are meant to search
                  substrings, not strings as whole entities. (Certainly there are
                  start-of-string and end-of-string qualifiers, but since you didn't indicate
                  that any of the 'dog's would be at the beginning or end of the string, they
                  aren't really applicable.) They are, in a sense, "mathematical"
                  representations of string searches. And mathematically, if you are searching
                  for 2 dogs and you find 3 dogs, then you've still found two dogs (2 dogs is
                  a subset of 3 dogs).

                  This is all just to say that no, I don't think it is possible with ONE
                  regex statement (I'm certainly not 100% sure about this)... But unless
                  you're just interested in finding out if it is possible for your own
                  gratification, there's no reason not to simply ask the Regex system how
                  many times 'dog' matches.

                  Of course, there is usually more than one way to do it... here's a way that
                  doesn't use Matches.Count, but still not in one single statement...

                  if (
                      Regex.IsMatch(input, "dog.*dog")
                      && !Regex.IsMatch(input, "dog.*dog.*dog")
                  )
                  {
                      // 'dog' matches two times, but not three.
                      ...
                  }
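
A minimal, runnable form of the two-test approach. One detail to keep in mind is that "." does not match a newline by default, so this sketch passes RegexOptions.Singleline on the assumption that the dogs may sit on different lines; the class and method names are illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

static class TwoButNotThree
{
    public static bool ExactlyTwo(string input)
    {
        // Singleline makes '.' match newlines too (an added assumption about the input).
        const RegexOptions Options = RegexOptions.Singleline;
        return Regex.IsMatch(input, "dog.*dog", Options)
            && !Regex.IsMatch(input, "dog.*dog.*dog", Options);
    }

    static void Main()
    {
        Console.WriteLine(ExactlyTwo("my dog is my dog")); // True
        Console.WriteLine(ExactlyTwo("dog\ndog"));         // True
        Console.WriteLine(ExactlyTwo("dog dog dog"));      // False
        Console.WriteLine(ExactlyTwo("abc dog"));          // False
    }
}
```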


                  --
                  -mdb


                  • Kevin

                    #10
                    Re: Regular expression question

                    I tried it and it worked! As you mentioned, this is not obvious at all.
                    Thank you, Greg.

                    "Greg Bacon" <gbacon@hiwaay.net> wrote in message
                    news:11be3akcgutjd34@corp.supernews.com...
                    > Doable but perhaps nonobvious:
                    > [...]
                    > Regex twodogs = new Regex(
                    > String.Format("^{0}dog{0}dog{0}$", "([^d]|d[^o]|do[^g])*"));

                    • Kevin

                      #11
                      Re: Regular expression question

                      I understand what you mean. I would prefer an expression that works in a
                      language-independent way, though mdb's approach will certainly work in C#.

                      "Thomas W. Brown" <thomas_w_brown@countrywide.NOSPAM.com> wrote in message
                      news:B0C006F4-3C1B-408A-8721-518B917CC654@microsoft.com...
                      > Besides, what do you want to do with the successful match -- do you need
                      > the entire captured substring that contains the two "dog" matches or do you
                      > just need to know the match was successful. If the latter case then mdb's
                      > approach is very clean and simple.

                      • Greg Bacon

                        #12
                        Re: Regular expression question

                        In article <#L225CddFHA.3324@TK2MSFTNGP10.phx.gbl>,
                        Kevin <nospam@discussion.com> wrote:

                        : I tried it and it worked! As you mentioned, this is not obvious at
                        : all. Thank you, Greg.

                        Glad to help.

                        Greg


                        • Ludovic SOEUR

                          #13
                          Re: Regular expression question

                          An easier way, using backtracking:
                          ^.*(dog.*){2}(?<!(dog.*){3})$
                          And this other example, with only one dog in the regex:
                          ^.*((dog).*){2}(?<!(\2.*){3})$

                          // Requires: using System.Text.RegularExpressions;
                          //           using System.Windows.Forms; (for MessageBox)
                          static void Main() {
                              string[] dogs = new string[] {
                                  "",
                                  "no canines",
                                  "abc dog",
                                  "dog abc dog",
                                  "dog dog dog",
                                  "xyzzy dog foo dog bar dog w00t doggone"
                              };

                              //Regex twodogs = new Regex(@"^.*(dog.*){2}(?<!(dog.*){3})$");
                              Regex twodogs = new Regex(@"^.*((dog).*){2}(?<!(\2.*){3})$");

                              // Show each input with whether it has exactly two dogs.
                              foreach (string input in dogs)
                                  MessageBox.Show(input + ": " + twodogs.IsMatch(input));
                          }

                          If you want to do the same with long words (for example elephant), you only
                          have to replace the regex as follows:
                          ^.*(elephant.*){2}(?<!(elephant.*){3})$
                          or for the second regex:
                          ^.*((elephant).*){2}(?<!(\2.*){3})$

                          If you want exactly 5 elephants, you only have to replace the regex as
                          follows:
                          ^.*(elephant.*){5}(?<!(elephant.*){6})$
                          or for the second regex:
                          ^.*((elephant).*){5}(?<!(\2.*){6})$
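
The substitution described above can be done programmatically. The following is a sketch only: `Exactly` is a hypothetical helper name, Regex.Escape and RegexOptions.Singleline are added assumptions (literal search word, dots allowed to cross newlines), and the pattern relies on variable-length lookbehind, which .NET supports but many other regex engines do not.

```csharp
using System;
using System.Text.RegularExpressions;

static class ExactCount
{
    // Builds ^.*(word.*){n}(?<!(word.*){n+1})$ for the given word and count.
    public static Regex Exactly(string word, int count)
    {
        string w = Regex.Escape(word); // treat the word literally
        string pattern = "^.*(" + w + ".*){" + count + "}"
                       + "(?<!(" + w + ".*){" + (count + 1) + "})$";
        return new Regex(pattern, RegexOptions.Singleline);
    }

    static void Main()
    {
        Regex twoDogs = Exactly("dog", 2);
        foreach (string input in new[] { "abc dog", "dog abc dog", "dog dog dog" })
            Console.WriteLine(input + ": " + twoDogs.IsMatch(input));
    }
}
```

Because of the lookbehind, this form is tied to engines with .NET-style lookbehind support, so it may not be the language-independent expression asked for earlier in the thread.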

                          Hope it helps,

                          Ludovic SOEUR.

                          "Greg Bacon" <gbacon@hiwaay.net> wrote in message
                          news:11bef8k8mvbva6e@corp.supernews.com...
                          > Glad to help.