Can Regex do this ?

  • Craig Kenisston

    Can Regex do this ?

    I suddenly need to split text that may contain any of the
    following tokens:

    Words with quotes or double quotes.
    Words with no quotes at all.
    Numbers with and without decimal points (no commas allowed), which may
    be wrapped in parentheses that I would like to keep as separate tokens
    so I can drop them later.

    They may be separated by commas, spaces or semicolons.

    So my string may have this content:
    ProductNumber; (1234.44), "The Name", 'This, that and more'

    And I would like to get these strings:

    ProductNumber
    ;
    (
    1234.44
    )
    ,
    "The Name"
    ,
    'This, that and more'

    I have spent two days on this with no luck. I've tried several
    combinations, and I either get one piece working or break
    another.

    Thanks in advance for your help.
  • Juan Gabriel Del Cid

    #2
    Re: Can Regex do this ?

    > I have the sudden need to split a text that may have any of the
    > following tokens:
    >
    > - Words with quotes or double quotes.
    > - Words with no quotes at all.
    > - Numbers with and without decimal points,
    > no commas allowed, but may contain parentheses
    > which I would like to keep apart to drop later.
    >
    > They may be separated by commas, spaces or semicolons.

    OK, let's suppose you left out grouping functionality (i.e. quotes and
    double quotes are not grouping operators). In that case, this regular
    expression will split the string for you:

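    // needs: using System.Text.RegularExpressions;
    // note: the separators themselves are not kept in splitItems
    Regex splitter = new Regex(@"[\s,;]+");
    string[] splitItems = splitter.Split(myString);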
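    The pattern above throws the separators away, while Craig's list keeps
    `;` and `,` as tokens. One option worth trying: when the pattern passed
    to `Regex.Split` contains capturing parentheses, the captured text is
    included in the result array. A minimal sketch, still ignoring quotes
    and treating only `,` and `;` (not bare spaces) as separators; the input
    string is just an example. Written as a top-level C# program (C# 9+):

```csharp
using System;
using System.Text.RegularExpressions;

// The capturing group ([,;]) makes Regex.Split return each separator
// as its own element; the \s* on both sides consumes nearby spaces.
string myString = "ProductNumber; 1234.44, Name";
string[] parts = Regex.Split(myString, @"\s*([,;])\s*");

foreach (string part in parts)
    Console.WriteLine(part);
// ProductNumber
// ;
// 1234.44
// ,
// Name
```

    This still splits inside 'This, that and more', which is exactly the
    grouping problem described next.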

    This is without grouping. When you throw in grouping functionality, you need
    a parser. Regular expressions won't cut it. You need to think of:

    - unbalanced grouping chars (e.g. an unclosed quote)
    - escaping grouping chars (e.g. if you want the name O'Neal in a word)
    - double quotes inside single quotes and vice versa

    For this to work you need to write a parser. It's really not that hard, but
    it's not as easy as a regex :-).
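    A minimal sketch of the kind of hand-written scanner meant here, assuming
    the token rules from Craig's post: quoted strings keep their quotes and
    may contain separators or the other kind of quote; `(`, `)`, `,` and `;`
    become single-character tokens; whitespace only separates; everything
    else runs until the next special character. `Tokenize` is a hypothetical
    name. An unclosed quote is reported with a `FormatException`, but escaped
    quotes and apostrophes inside words (the O'Neal case above) are not
    handled. Written as a top-level C# program (C# 9+):

```csharp
using System;
using System.Collections.Generic;

// Hypothetical hand-written tokenizer: one pass, one character at a time.
List<string> Tokenize(string input)
{
    var tokens = new List<string>();
    int i = 0;
    while (i < input.Length)
    {
        char c = input[i];
        if (char.IsWhiteSpace(c))
        {
            i++;                                    // spaces only separate tokens
        }
        else if (c == '"' || c == '\'')
        {
            int close = input.IndexOf(c, i + 1);    // matching quote of the same kind
            if (close < 0)
                throw new FormatException($"Unclosed {c} starting at position {i}");
            tokens.Add(input.Substring(i, close - i + 1));  // keep the quotes
            i = close + 1;
        }
        else if (c == '(' || c == ')' || c == ',' || c == ';')
        {
            tokens.Add(c.ToString());               // single-character token
            i++;
        }
        else
        {
            int start = i;                          // bare word or number
            while (i < input.Length && !char.IsWhiteSpace(input[i])
                   && "\"'(),;".IndexOf(input[i]) < 0)
                i++;
            tokens.Add(input.Substring(start, i - start));
        }
    }
    return tokens;
}

foreach (string token in Tokenize("ProductNumber; (1234.44), \"The Name\", 'This, that and more'"))
    Console.WriteLine(token);
```

    On Craig's sample line this yields the nine strings he listed, in order.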

    Hope this helps,
    -JG



    • Peter Koen

      #3
      Re: Can Regex do this ?

      craigkenisston@hotmail.com (Craig Kenisston) wrote in
      news:7c5541e5.0310221410.56315948@posting.google.com:

      [...]
      >
      > So my string may have this content:
      > ProductNumber; (1234.44), "The Name", 'This, that and more'
      >
      > And I would like to get these strings:
      >
      > ProductNumber
      > ;
      > (
      > 1234.44
      > )
      > ,
      > "The Name"
      > ,
      > 'This, that and more'


      What about

      // s is the input string; String.Split drops the delimiter characters
      string[] result = s.Split(new char[] { '"', '\'', '(', ')' });

      foreach (string str in result)
      {
          string[] nextResult = str.Split(new char[] { ';', ',' });
          // do some further processing
      }

      That would be much faster than a regex.

      --
      best regards

      Peter Koen
      -----------------------------------
      MCAD, CAI/R, CAI/S, CASE/RS, CAT/RS


      • 100

        #4
        Re: Can Regex do this ?

        Hi Craig,
        Grammars fall into different classes, and regular grammars (the ones
        regular expressions describe) are the smallest one.
        IMHO your case cannot be described with regular expressions. Rather, you
        should use a context-free grammar.
        So my suggestion is to stop wasting your time. You can still use regex for
        tokens like strings, numbers and identifiers, but the overall structure of
        the input has to be described by some context-free grammar.
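        To illustrate the token-level part of that advice: a minimal sketch
        that uses `Regex.Matches` instead of `Split`, so each match is one
        token and nothing has to be split on. The pattern is an assumption,
        with one alternative per token kind from Craig's post; on his sample
        line it yields exactly the nine strings he listed. It silently skips
        any character no alternative matches (for example the opening quote of
        an unclosed string), so checking the overall structure still belongs in
        a parser. Written as a top-level C# program (C# 9+):

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// One alternative per token kind, tried left to right at each position:
// a "double-quoted" or 'single-quoted' string (no escapes), a number,
// one of ( ) , ; or a bare word. Regex.Matches returns every
// non-overlapping match, so the whitespace between tokens is skipped.
var tokenPattern = new Regex(
    @"""[^""]*""" +        // double-quoted string
    @"|'[^']*'" +          // single-quoted string
    @"|\d+(?:\.\d+)?" +    // number, optional decimal part
    @"|[(),;]" +           // parentheses and separators
    @"|[^\s(),;""']+");    // bare word

List<string> Tokenize(string input)
{
    var tokens = new List<string>();
    foreach (Match m in tokenPattern.Matches(input))
        tokens.Add(m.Value);
    return tokens;
}

foreach (string token in Tokenize("ProductNumber; (1234.44), \"The Name\", 'This, that and more'"))
    Console.WriteLine(token);
```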
        There are several techniques for parsing text that is described with
        context-free grammars. All of them but one need tools for generating the
        parser. Recently I read a post in this newsgroup where someone was looking
        for *lex* and *yacc* for C#. Those are the kind of tools you need. However,
        they are hard to use if you don't have experience with compiler design.
        They have their own programming language and generate code (a class) for
        parsing the input text.
        What I would suggest is to use the method that can be programmed by hand.
        It is called "recursive descent parsing" and is pretty straightforward.
        Unfortunately I can't point you to good sources, but hopefully someone in
        the group can.
        The *Interpreter* design pattern covers this method. The GoF book on
        design patterns has an example of using it, so you can start there.
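        A minimal recursive descent sketch for Craig's input. The grammar in
        the comments, the token pattern, and the choice to drop quotes and
        parentheses from the results are assumptions made for this example,
        not something from the thread. Each grammar rule becomes one function,
        and `ParseItem` calls itself for nested parentheses, so unbalanced
        input such as `(42` is reported as an error instead of being tokenized
        anyway. Written as a top-level C# program (C# 9+):

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Assumed grammar (made up for this example):
//   list := item { [ "," | ";" ] item }
//   item := "(" item ")" | QUOTED | WORD
// One function per rule; ParseItem recurses for nested parentheses.
var tokenPattern = new Regex(@"""[^""]*""|'[^']*'|[(),;]|[^\s(),;""']+");
var tokens = new List<string>();
int pos = 0;

// Current token, or "" at end of input (real tokens are never empty).
string Peek() => pos < tokens.Count ? tokens[pos] : "";

void Expect(string expected)
{
    if (Peek() != expected)
        throw new FormatException("Expected '" + expected + "' at token " + pos);
    pos++;
}

// item := "(" item ")" | QUOTED | WORD   (parentheses and quotes are dropped)
string ParseItem()
{
    string t = Peek();
    if (t == "" || t == ")" || t == "," || t == ";")
        throw new FormatException("Unexpected " + (t == "" ? "end of input" : "'" + t + "'") + " at token " + pos);
    pos++;
    if (t == "(")
    {
        string inner = ParseItem();   // recursion handles any nesting depth
        Expect(")");
        return inner;
    }
    if (t[0] == '"' || t[0] == '\'')
        return t.Substring(1, t.Length - 2);   // strip the surrounding quotes
    return t;
}

// list := item { [ "," | ";" ] item }
List<string> ParseList(string input)
{
    tokens.Clear();
    foreach (Match m in tokenPattern.Matches(input))
        tokens.Add(m.Value);
    pos = 0;

    var items = new List<string> { ParseItem() };
    while (Peek() != "")
    {
        if (Peek() == "," || Peek() == ";")
            pos++;                     // separators are optional between items
        items.Add(ParseItem());
    }
    return items;
}

foreach (string item in ParseList("ProductNumber; (1234.44), \"The Name\", 'This, that and more'"))
    Console.WriteLine(item);
```

        On the sample line this returns four values: ProductNumber, 1234.44,
        The Name and This, that and more.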


        HTH
        B\rgds
        100


        "Craig Kenisston" <craigkenisston @hotmail.com> wrote in message
        news:7c5541e5.0 310221410.56315 948@posting.goo gle.com...[color=blue]
        > I have the sudden need to split a text that may have any of the
        > following tokens :
        >
        > Words with quotes or double quotes.
        > Words with no quotes at all.
        > Numbers with and without decimal points, no commas allowed, but may
        > contain parenthesis which I would like to keep apart to drop later.
        >
        > They may be separated by comas, spaces or semicolon.
        >
        > So my string my have this content :
        > ProductNumber; (1234.44), "The Name", 'This, that and more'
        >
        > And I would be willing to get this strings :
        >
        > ProductNumber
        > ;
        > (
        > 1234.44
        > )
        > ,
        > "The Name"
        > ,
        > 'This, that and more'
        >
        > I have two days days with this with no luck. I've tried several
        > combinations and I either get one functionality working or drop
        > another.
        >
        > Thanks in advance for your help.[/color]