Regex: matching comma separated list?

  • Bob

    Regex: matching comma separated list?

    I think this is very simple but I am having difficulty doing it. Basically,
    take a comma separated list:
    abc, def, ghi, jk

    A list with only one token does not have any commas:
    abc

    The first character of each token (abc) must not be a digit. I am simply
    trying to parse it to get an array of tokens:
    abc
    def
    ghi
    jk

    ...or for the single token one:
    abc

    I can easily do this with String.Replace and String.Split, but would like to
    do this with regular expressions. Yet I cannot seem to get it to work, here
    is what I have so far:

    String input = "abc, def, ghi, jk";
    String pattern = @"^((?<name>\D.*?)(\x2C )?)+?$";
    Match match = Regex.Match(input, pattern, RegexOptions.ExplicitCapture);

    Any input would be appreciated,

    Thanks


  • Kevin Spencer

    #2
    Re: Regex: matching comma separated list?

    I don't think Regular Expressions are the right tool for this job, Bob.
    Regular Expressions are used to search for patterns, that is, strings which
    share certain characteristics but are not identical. In your case, you want
    to convert a comma-delimited string into an array, and String.Split() does
    just that.

    --
    HTH,

    Kevin Spencer
    Microsoft MVP
    .Net Developer
    A watched clock never boils.


    • Greg Bacon

      #3
      Re: Regex: matching comma separated list?

      Consider the following code:

      static void Main(string[] args)
      {
          string[] inputs = new string[]
          {
              "abc, def, ghi, jk",
              "abc",
              "good, 1bad, good, 2bad",
              "trailingcomma, ",
              ",",
              ",,",
              ",,,",
          };

          string pattern =
              @"^(
                  (
                    |                  # ignore empties
                    (?<token>\D.*?)    # a token worth keeping
                    |\d.*?             # or one to ignore
                  )
                  \s*                  # eat trailing whitespace
                  (,\s*|$)             # separator or done
                )+$                    # catch a sequence of the above
              ";

          Regex tokens = new Regex(pattern, RegexOptions.IgnorePatternWhitespace);

          foreach (string input in inputs)
          {
              Match m = tokens.Match(input);

              Console.WriteLine("input = [" + input + "]:");
              if (m.Success)
              {
                  if (m.Groups["token"].Captures.Count > 0)
                      foreach (Capture c in m.Groups["token"].Captures)
                          Console.WriteLine(" - [" + c.Value + "]");
                  else
                      Console.WriteLine(" - no captures");
              }
              else
                  Console.WriteLine(" - no match.");
          }
      }

      Its output is

      input = [abc, def, ghi, jk]:
      - [abc]
      - [def]
      - [ghi]
      - [jk]
      input = [abc]:
      - [abc]
      input = [good, 1bad, good, 2bad]:
      - [good]
      - [good]
      input = [trailingcomma,]:
      - [trailingcomma]
      input = [,]:
      - no captures
      input = [,,]:
      - no captures
      input = [,,,]:
      - no captures

      It's easy to anticipate Jon Skeet's objections to the regular
      expression above, and he'd certainly be on solid ground. Passing the
      result of a split through a filter would be much clearer, e.g.,

      public static void ExtractGoodTokens(string[] inputs)
      {
          Regex goodtoken = new Regex(@"^\D");

          foreach (string input in inputs)
          {
              ArrayList goodtokens = new ArrayList();

              foreach (string token in Regex.Split(input, @"\s*,\s*"))
                  if (goodtoken.IsMatch(token))
                      goodtokens.Add(token);

              Console.WriteLine("input = [" + input + "]:");
              if (goodtokens.Count > 0)
                  foreach (string token in goodtokens)
                      Console.WriteLine(" - [" + token + "]");
              else
                  Console.WriteLine(" - none");
          }
      }

      Hope this helps,
      Greg
      --
      I have felt for a long time that a talent for programming consists largely
      of the ability to switch readily from microscopic to macroscopic views of
      things, i.e., to change levels of abstraction fluently.
      -- Donald E. Knuth, "Structured Programming with go to Statements"


      • Kevin Spencer

        #4
        Re: Regex: matching comma separated list?

        How about

        string[] aryList = strList.Split(new char[] {','});

        ???

        --
        HTH,

        Kevin Spencer
        Microsoft MVP
        .Net Developer
        A watched clock never boils.


        • Marcus Andrén

          #5
          Re: Regex: matching comma separated list?

          On Sun, 30 Oct 2005 20:06:37 -0800, "Bob" <nobody@nowhere.com> wrote:

          >I can easily do this with String.Replace and String.Split, but would like to
          >do this with regular expressions. Yet I cannot seem to get it to work, here
          >is what I have so far:
          >
          >String input = "abc, def, ghi, jk";
          >String pattern = @"^((?<name>\D.*?)(\x2C )?)+?$";

          This pattern is far from what you want.

          First of all, it is easy to see that as you start with ^ and end with
          $ you will always either match the complete string or nothing at all.

          Secondly, a group's Value doesn't hold multiple matches; it only
          stores the last capture made in the given regular expression match
          (earlier ones are reachable only through the group's Captures
          collection). All ExplicitCapture does is to make sure (\x2C ) as well
          as the outer parentheses don't count as groups. The value of the
          "name" group will only contain the characters captured on the last
          loop.

          This leads to the third problem. As the regex is written it will
          capture a single character and then simply loop and repeat.
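
          To make the second and third points concrete, here is a small
          sketch that runs the pattern from the original post and lists what
          the "name" group ends up holding. The class and method names
          (CaptureDemo, NameCaptures) are made up for illustration; only
          Regex.Match, Group.Value and Group.Captures come from the
          framework.

          ```csharp
          using System;
          using System.Text.RegularExpressions;

          public static class CaptureDemo
          {
              // The pattern from the original post, used with ExplicitCapture as there.
              private const string Pattern = @"^((?<name>\D.*?)(\x2C )?)+?$";

              // Returns the captures of the "name" group, one entry per loop iteration.
              public static string[] NameCaptures(string input)
              {
                  Match match = Regex.Match(input, Pattern, RegexOptions.ExplicitCapture);
                  Group name = match.Groups["name"];

                  string[] captures = new string[name.Captures.Count];
                  for (int i = 0; i < captures.Length; i++)
                      captures[i] = name.Captures[i].Value;
                  return captures;
              }

              public static void Main()
              {
                  string input = "abc, def, ghi, jk";
                  Match match = Regex.Match(input, Pattern, RegexOptions.ExplicitCapture);

                  // Group.Value exposes only the final capture.
                  Console.WriteLine("Value: [" + match.Groups["name"].Value + "]");

                  // Every iteration of the loop is kept in Captures.
                  foreach (string c in NameCaptures(input))
                      Console.WriteLine(" - [" + c + "]");
              }
          }
          ```

          With the sample input, each capture should be a single letter
          rather than a whole token, because the lazy \D.*? stops as soon as
          the rest of the loop is able to continue.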

          This is how it should be done
          (using RegexOptions.IgnorePatternWhitespace):

          string patternSplit =
          @"
          (?<=,|^)  # The character preceding the match is either a comma or
                    # the beginning of the string

          \D.*?     # The string itself should be a non-digit followed by
                    # any number of characters

          (?=,|$)   # The first character after the match should be a comma or
                    # the end of the string
          ";

          This will find all the valid substrings while ignoring those beginning
          with a digit.

          It will however not make a noise if the string consists of invalid
          entries. For example "12abc,def, ghi" will return "def" and "ghi" as
          the two matches while just ignoring 12abc.
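
          As a rough usage sketch, the pattern can be fed to Regex.Matches
          and the results collected into a list. TokenFinder and Find are
          hypothetical names, and the Trim() call is my own addition to drop
          the space that follows each comma; it is not part of the pattern
          itself.

          ```csharp
          using System;
          using System.Collections.Generic;
          using System.Text.RegularExpressions;

          public static class TokenFinder
          {
              // The lookaround pattern from this post, laid out with comments.
              private static readonly Regex Token = new Regex(@"
                  (?<=,|^)   # preceded by a comma or the start of the string
                  \D.*?      # a non-digit, then as few characters as possible
                  (?=,|$)    # followed by a comma or the end of the string
                  ", RegexOptions.IgnorePatternWhitespace);

              // Collects every match, trimming the space that follows a comma.
              public static List<string> Find(string input)
              {
                  List<string> tokens = new List<string>();
                  foreach (Match m in Token.Matches(input))
                      tokens.Add(m.Value.Trim());
                  return tokens;
              }

              public static void Main()
              {
                  foreach (string token in Find("12abc,def, ghi"))
                      Console.WriteLine("[" + token + "]");
              }
          }
          ```

          One caveat worth knowing: a space is itself a non-digit, so an
          entry written as ", 1bad" (with a space after the comma) is still
          matched. Splitting and trimming first, or tightening the first
          character class, would be needed to exclude it.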

          If you need to validate that the string doesn't contain any invalid
          entries, you will have to write a separate regular expression that
          tries to capture the entire string.
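
          One possible shape for such a whole-string check is sketched below.
          The pattern and the names ListValidator and IsValid are my own
          assumptions rather than anything from the thread, and the sketch
          deliberately treats empty entries and trailing commas as invalid,
          which may or may not be what you want.

          ```csharp
          using System;
          using System.Text.RegularExpressions;

          public static class ListValidator
          {
              // Hypothetical whole-string check: every entry must start with a
              // character that is not a digit, whitespace or a comma.
              private static readonly Regex WholeList = new Regex(
                  @"^\s*[^\d\s,][^,]*(,\s*[^\d\s,][^,]*)*$");

              public static bool IsValid(string input)
              {
                  return WholeList.IsMatch(input);
              }

              public static void Main()
              {
                  Console.WriteLine(IsValid("abc, def, ghi, jk"));
                  Console.WriteLine(IsValid("12abc,def, ghi"));
              }
          }
          ```

          Under these assumptions the first list is accepted and the second
          is rejected because of the leading 12abc.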

          --
          Marcus Andrén


          • Kevin Spencer

            #6
            Re: Regex: matching comma separated list?

            Forgot to add, remove the members that start with a number.
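
            Roughly, the split-then-filter idea could look like the sketch
            below. SplitFilter and Tokens are made-up names, and trimming
            whitespace and dropping empty entries are assumptions on my part
            about what the caller wants.

            ```csharp
            using System;
            using System.Collections.Generic;

            public static class SplitFilter
            {
                // Splits on commas, trims each piece, and keeps those that are
                // non-empty and do not start with a digit.
                public static List<string> Tokens(string strList)
                {
                    List<string> kept = new List<string>();
                    foreach (string piece in strList.Split(new char[] { ',' }))
                    {
                        string token = piece.Trim();
                        if (token.Length > 0 && !char.IsDigit(token[0]))
                            kept.Add(token);
                    }
                    return kept;
                }

                public static void Main()
                {
                    foreach (string token in Tokens("good, 1bad, good, 2bad"))
                        Console.WriteLine(token);
                }
            }
            ```

            No regular expression is involved at all here; char.IsDigit does
            the first-character test.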

            --
            HTH,

            Kevin Spencer
            Microsoft MVP
            .Net Developer
            A watched clock never boils.
