Regular Expressions to Split Lists Into Sub-Lists

  • Yimin Rong

    Regular Expressions to Split Lists Into Sub-Lists

    For example, given the string:

        "A, B, C (P, Q, R), D (X, Y [K, L, M ,N], Z)"

    I would like to split it into the following tokens:

    a[0] == "A"
    a[1] == "B"
    a[2] == "C (P, Q, R)"
    a[3] == "D (X, Y [K, L, M ,N], Z)"

    That is, the split should not descend into sub-lists.

    Using PHP's split() with a comma as the delimiter gives 11 tokens,
    because it also splits on the commas inside the sub-lists.
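    To check that count, a minimal sketch of the naive split. PHP's split() was deprecated in PHP 5.3 and removed in PHP 7, so this assumes explode(), which does the same job for a fixed single-character delimiter; the variable names are illustrative.

```php
<?php
// Naive split: every comma is treated as a delimiter, including the
// commas inside the ( ) and [ ] sub-lists.
$input = 'A, B, C (P, Q, R), D (X, Y [K, L, M ,N], Z)';
$naive = array_map('trim', explode(',', $input));

echo count($naive), "\n"; // 11
print_r($naive);
```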

    I can write a routine that scans the input byte by byte and
    increments or decrements a counter whenever it sees an opening "( [ {"
    or closing ") ] }" bracket. While the counter is > 0, delimiters are
    ignored (i.e. keep looking). This is guaranteed to work, but it seems
    rather clunky to me.
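    A sketch of that byte-by-byte routine, assuming PHP 7+ and a hypothetical helper named split_top_level(). It only tracks nesting depth, so it does not check that each closing bracket matches its opener (for example, "(]" is accepted as balanced).

```php
<?php
// Hypothetical helper: split $s on commas that are not inside any
// ( ), [ ] or { } group. Depth goes up on an opener and down on a closer;
// a comma is only a delimiter while the depth is 0.
function split_top_level(string $s): array
{
    $tokens  = [];
    $depth   = 0;
    $current = '';
    $len     = strlen($s);

    for ($i = 0; $i < $len; $i++) {
        $ch = $s[$i];
        if (strpos('([{', $ch) !== false) {
            $depth++;
        } elseif (strpos(')]}', $ch) !== false) {
            $depth--;
        } elseif ($ch === ',' && $depth === 0) {
            // Top-level comma: close the current token and start a new one.
            $tokens[] = trim($current);
            $current  = '';
            continue;
        }
        $current .= $ch;
    }
    $tokens[] = trim($current); // last token has no trailing comma

    return $tokens;
}

$a = split_top_level('A, B, C (P, Q, R), D (X, Y [K, L, M ,N], Z)');
print_r($a);
```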

    Is it possible to extract the tokens using regular expressions? For
    example, substitute the top-level commas with a special delimiter, say
    "~", and then split on that delimiter.
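    One regex approach, sketched under the assumption of PHP's PCRE, which supports recursive patterns: rather than substituting and splitting, match each top-level token directly with preg_match_all(). The named group br (an illustrative name) matches a balanced ( ), [ ] or { } group by recursing into itself, so the commas inside it are consumed rather than treated as boundaries. Unlike explode(), this skips empty tokens such as the gap in "A,,B", and it assumes the brackets are balanced.

```php
<?php
// A token is one or more of: characters that are neither commas nor
// brackets, or a balanced bracket group matched recursively via (?&br).
$pattern = '/
    (?:
        [^,()\[\]{}]++                              # plain characters
      | (?<br>                                      # a balanced group:
            \( (?: [^()\[\]{}]++ | (?&br) )* \)     #   round brackets
          | \[ (?: [^()\[\]{}]++ | (?&br) )* \]     #   square brackets
          | \{ (?: [^()\[\]{}]++ | (?&br) )* \}     #   curly brackets
        )
    )+
/x';

$input = 'A, B, C (P, Q, R), D (X, Y [K, L, M ,N], Z)';
preg_match_all($pattern, $input, $m);
$a = array_map('trim', $m[0]); // $m[0] holds the full matches
print_r($a);
```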

    Thanks for reading.

    Regards,

    YR
  • Curtis

    #2
    Re: Regular Expressions to Split Lists Into Sub-Lists

    Yimin Rong wrote:
    > Is it possible to extract the tokens using regular expressions? E.g.
    > substitute highest level commas with a special delimiter say "~", and
    > split using that delimiter.

    Seems like you already have your answer here. If the delimiters for
    the top-level are different, it shouldn't be a problem to split on them.
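    To illustrate the point: once the top-level delimiter is distinct from the nested commas (here "~", inserted by hand as an assumed example), a plain explode() is enough.

```php
<?php
// Top-level delimiters already replaced with "~"; nested commas untouched.
$marked = 'A~ B~ C (P, Q, R)~ D (X, Y [K, L, M ,N], Z)';
$a = array_map('trim', explode('~', $marked));
print_r($a);
```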

    --
    Curtis


    • Yimin Rong

      #3
      Re: Regular Expressions to Split Lists Into Sub-Lists

      On Aug 17, 6:29 pm, Curtis <dye...@gmail.com> wrote:
      > Seems like you already have your answer here. If the delimiters for
      > the top-level are different, it shouldn't be a problem to split on them.

      Agreed; however, the step I need is to replace the top-level delimiters
      in the input. Do you think regular expression substitution can do this?

      YR
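      A sketch of that substitution, assuming a PHP build whose PCRE supports recursion and the (*SKIP)(*FAIL) backtracking verbs (current PHP versions do). Balanced bracket groups are matched and then deliberately failed, so the engine resumes after each group and only depth-0 commas are replaced with "~". The "~" marker is the one suggested earlier in the thread and must not occur in the data; preg_split() accepts the same pattern directly and avoids that caveat.

```php
<?php
// One balanced bracket group; the named group "br" recurses into itself,
// so nested groups are consumed whole.
$br = '(?<br>\((?:[^()\[\]{}]++|(?&br))*\)'
    . '|\[(?:[^()\[\]{}]++|(?&br))*\]'
    . '|\{(?:[^()\[\]{}]++|(?&br))*\})';

// Alternative 1 matches a bracket group, then (*SKIP)(*FAIL) discards it
// and resumes after the group; alternative 2 matches a top-level comma.
$pattern = '/' . $br . '(*SKIP)(*FAIL)|,/';

$input  = 'A, B, C (P, Q, R), D (X, Y [K, L, M ,N], Z)';
$marked = preg_replace($pattern, '~', $input);
$a      = array_map('trim', explode('~', $marked));

// Same tokens without the intermediate "~" marker.
$direct = array_map('trim', preg_split($pattern, $input));

echo $marked, "\n"; // A~ B~ C (P, Q, R)~ D (X, Y [K, L, M ,N], Z)
print_r($a);
```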
