Regex, TextReader...?

  • Masahiro Ito

    Regex, TextReader...?

    I have attached a block of text similar to the type that I am working
    with.

    I have been learning a lot about Regex - it is quite impressive. I can
    easily capture bits of info, but I keep having trouble with line breaks.

    I want to identify the start and end of blocks of text. Are there some
    tips someone can share?

    EG: in my text, I can grab a collection of everyone's phone numbers with:
    ^"M:"\t"(?<PhoneNumber>[^"])"

    But, what about if I wanted to grab many lines, until it matched a
    certain pattern. I use the ^ to say not the quote, but can I say not 14
    hyphens?

    The way I have split this type of data is inefficient. I match all the
    cases of:
    ^-{14}
    Then I use many math equations to split the file using the index of the
    matches. I am sure Regex must have some way to pattern match a complex
    not, to indicate the end of my match?

    Thank you.
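
One way to say "not 14 hyphens" is a negative lookahead, (?!-{14}), which succeeds only at positions where the separator does not start. The following is a minimal C# sketch, assuming every record begins with a line of 14 hyphens as in the sample. The names BlockSplitDemo and SplitBlocks are made up for illustration, and the input is a shortened copy of the sample with a single space where the real file has a tab.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class BlockSplitDemo
{
    // A separator line, then every character that does not begin another
    // separator line. (?!^-{14}) is the "not 14 hyphens" test.
    // Singleline lets "." cross line breaks; Multiline lets ^ match at the
    // start of every line.
    private static readonly Regex BlockRegex = new Regex(
        @"^-{14}\r?\n(?<Block>(?:(?!^-{14}).)*)",
        RegexOptions.Multiline | RegexOptions.Singleline);

    // Returns the text of each record, without its leading separator line.
    public static List<string> SplitBlocks(string text)
    {
        var blocks = new List<string>();
        foreach (Match match in BlockRegex.Matches(text))
        {
            blocks.Add(match.Groups["Block"].Value);
        }
        return blocks;
    }

    public static void Main()
    {
        // Shortened sample (assumption: one space stands in for the tab).
        string text =
            "--------------\n" +
            "\"M:\" \"3242310532 \"\n" +
            "\"Subscriber Name:\" \"MR Regex\"\n" +
            "\n" +
            "--------------\n" +
            "\"M:\" \"9042437121 \"\n" +
            "\"Subscriber Name:\" \"Fred 1\"\n";

        foreach (string block in SplitBlocks(text))
        {
            Console.WriteLine("Block:");
            Console.WriteLine(block);
        }
    }
}
```

Each block can then be searched on its own for the phone number and the other fields, so no index arithmetic on the separator matches is needed.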




    --------------
    "M:" "3242310532 "
    "Subscriber Name:" "MR Regex"
    "Additional line user name:" ""
    "Sublevel:" " "
    "Sublevel:" ""
    "Reference 1:" ""
    "Reference 2:" ""

    "CURRENT CHARGES"
    "Monthly Service Plan" $40.00
    "Additional Local Airtime" $0.00
    "Long Distance Charges" $0.00
    "Roaming Charges" $0.00
    "Network and Licensing Charges" $7.20
    "Total Taxes:" $7.09
    "Total Current Charges:" $47.20

    "MONTHLY SERVICE PLAN" 11-Oct-03 to 10-Nov-03
    "Service Plan Name" "Total"
    "Mike Dispatch 40 (11-Oct-03 to 10-Nov-03)" $40.00
    "Total Monthly Service Plan Charges" $40.00

    "ADDITIONAL LOCAL AIRTIME"
    "Service" "Total Mins. Used" "Free Mins. Used" "Included Mins.
    Used" "Chargeable Mins. Used" "Total"
    "Direct Connect Private (minutes)" 28:04 28:04 0:00 0:00 $0.00
    "Total Additional Local Airtime Charges" $0.00

    "LONG DISTANCE, ROAMING AND OTHER CALL CHARGES"
    "Service" "Incl. LD Minutes" "Chargeable LD Minutes" "Total"
    "Total Long Distance Charges" $0.00

    "ROAMING"
    "Service" "Roaming Minutes" "Roaming Charges" "Roaming LD Minutes"
    "Roaming LD Charges" "Roaming Surcharge" "Total"
    "Total Roaming Charges" $0.00

    "WIRELESS WEB - PREMIUM SERVICE"
    "Service" "Total Events" "Event Type" "Total"
    "Total Wireless Web Premium Services Charges" $0.00

    "PHONE - PREMIUM SERVICE"
    "Service" "Total Events" "Event Type" "Total"
    "Total Phone Premium Services Charges" $0.00

    "PAGER SERVICES"
    "Service" "Total Messages" "Included Messages" "Chargeable
    Messages" "Total"
    "Total Pager Charges" $0.00

    "VALUE-ADDED SERVICES" 11-Oct-03 to 10-Nov-03
    "Service" "Total"
    "Wireless Web - Surf Sampler (11-Oct-03 to 10-Nov-03)" $0.00
    "Total Value Added Service Charges" $0.00

    "OTHER CHARGES AND CREDIT"
    "Charge or Credit" "Total"
    "Total Other Charges and Credits" $0.00

    "NETWORK and LICENSING CHARGES"
    "Service" "Total"
    "911 Emergency Access Charge (11-Oct-03 to 10-Nov-03)" $0.25
    "System Licensing Charge (11-Oct-03 to 10-Nov-03)" $6.95
    "Total Network Licensing Charges" $7.20

    "TAXES"
    "" "Total"
    "Total Taxes" $7.09

    --------------
    "M:" "9042437121 "
    "Subscriber Name:" "Fred 1"
    "Additional line user name:" ""
    "Sublevel:" " "
    "Sublevel:" ""
    "Reference 1:" ""
    "Reference 2:" ""

    "CURRENT CHARGES"
  • Eric Gunnerson [MS]

    #2
    Re: Regex, TextReader...?

    Yes, you can do it in regex. The trick is to allow your pattern to match
    more than one time. For example, if I had something like:

    1234
    34123
    11313
    113133
    xxxxx

    I could write something like:

    (?<Numbers>^\d+$)+xxxxx

    Which means that I need to look at the Captures collection of the Numbers
    group (Match.Groups["Numbers"].Captures) rather than only the group's
    final value, IIRC.

    Note that in most uses of this technique, what you really need to write is
    something like:

    ((?<Numbers> match numbers) match stuff between numbers)+xxxxx

    so that the match can continue. You may also need to play around with the
    singleline and multiline options.
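
A minimal C# sketch of the repeated-group technique described above, assuming the five-line sample as input. For the group to repeat, each repetition also has to consume the line break, so this version adds \r?\n after the digits and turns on RegexOptions.Multiline so that ^ matches at each line start. Both are assumptions layered on the pattern in the post, and CapturesDemo and NumbersBeforeMarker are illustrative names.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class CapturesDemo
{
    // Returns every value the repeated Numbers group matched before "xxxxx".
    public static string[] NumbersBeforeMarker(string input)
    {
        // \r?\n plays the role of "stuff between numbers": it consumes each
        // line break so the outer group can repeat on the next line.
        var regex = new Regex(@"(?:^(?<Numbers>\d+)\r?\n)+xxxxx",
                              RegexOptions.Multiline);
        Match match = regex.Match(input);
        if (!match.Success)
        {
            return new string[0];
        }

        // Groups["Numbers"].Value holds only the last repetition;
        // Captures holds all of them, in order.
        return match.Groups["Numbers"].Captures
            .Cast<Capture>()
            .Select(c => c.Value)
            .ToArray();
    }

    public static void Main()
    {
        string input = "1234\n34123\n11313\n113133\nxxxxx";
        foreach (string number in NumbersBeforeMarker(input))
        {
            Console.WriteLine(number); // 1234, 34123, 11313, 113133 (one per line)
        }
    }
}
```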

    --
    Eric Gunnerson

    Visit the C# product team at http://www.csharp.net
    Eric's blog is at http://weblogs.asp.net/ericgu/

    This posting is provided "AS IS" with no warranties, and confers no rights.


    • Masahiro Ito

      #3
      Re: Regex, TextReader...?

      Thank you Eric. I was doing a capture group: in my first example, using
      my sample text, I used (?<PhoneNumber>[^"]*) to capture everything until
      the next " in my PhoneNumber collection.

      In this simple example, capturing the Field1 and Field5 values, I cannot
      reliably regex the 'everything between numbers'.

      My attempt (doesn't work):
      Field1:\s(<F1>[0-9]*)[^Field5:]*Field5:\s(?<F5>[0-9.$]*)
      ^trouble^

      Field1: 1234
      Field2: 34123
      Field3: 1313
      Field4: 13133
      Field5: $xxxx.00
      Field6: 2342df
      Field1: 2342
      Field2: 33241
      Field3: 2142
      Field4: 543523
      Field5: $342.00
      Field6: 43254
      Field1: 3415
      Field2: 234235
      Field3: 341
      Field4: 13212533
      Field5: $5234.00
      Field6: 32415

      Of course, I can run two separate captures, but...

      You gave the example technique : ((?<Numbers> match numbers) match stuff
      between numbers)+xxxxx

      Does this +xxxxx match everything until the xxxxx is found? In my regex
      apps (I use Expresso and Regex Workshop as .NET tools) there are no
      matches.

      Thanks,

      Masa


      • Eric Gunnerson [MS]

        #4
        Re: Regex, TextReader...?

        I'm a little confused about what you're trying to do. Given the example text
        below, what is the expected output that you want?

        If I assume that you didn't mean to write xxxx.00 for the Field5 value
        below, the following regex may do what you want:

        new Regex(@"
        (
        (?<S2>.*?)
        Field1:\s(?<F1>[0-9]*)
        (?<S1>.+?)
        Field5:\s(?<F5>[0-9.\$]+)
        )+",
        RegexOptions.IgnorePatternWhitespace);

        All the F1 values will be in the Captures collection of one group, all the
        F5 values in the Captures collection of the other. I named the S1 and S2
        captures so you could see what they're matching.
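
A runnable C# sketch of that pattern, for experimenting. One assumption is added here: RegexOptions.Singleline, so that "." can also match the line breaks between the Field1 and Field5 lines; without it the S1 group cannot span several lines. The input is the second and third records from the sample (the first is left out because its Field5 value is $xxxx.00), and FieldCapturesDemo and CapturesOf are illustrative names.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class FieldCapturesDemo
{
    // The pattern from the post; Singleline is the added assumption.
    private static readonly Regex FieldRegex = new Regex(@"
        (
          (?<S2>.*?)
          Field1:\s(?<F1>[0-9]*)
          (?<S1>.+?)
          Field5:\s(?<F5>[0-9.\$]+)
        )+",
        RegexOptions.IgnorePatternWhitespace | RegexOptions.Singleline);

    // Returns all captures of the named group, in the order they were matched.
    public static string[] CapturesOf(string input, string groupName)
    {
        Match match = FieldRegex.Match(input);
        return match.Groups[groupName].Captures
            .Cast<Capture>()
            .Select(c => c.Value)
            .ToArray();
    }

    public static void Main()
    {
        string input = string.Join("\n", new[]
        {
            "Field1: 2342", "Field2: 33241", "Field3: 2142",
            "Field4: 543523", "Field5: $342.00", "Field6: 43254",
            "Field1: 3415", "Field2: 234235", "Field3: 341",
            "Field4: 13212533", "Field5: $5234.00", "Field6: 32415",
        });

        Console.WriteLine(string.Join(", ", CapturesOf(input, "F1"))); // 2342, 3415
        Console.WriteLine(string.Join(", ", CapturesOf(input, "F5"))); // $342.00, $5234.00
    }
}
```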

        I'd suggest using my Regex Workbench at
        http://www.gotdotnet.com/Community/U...1-4EE2729D7322 -
        it makes playing around with Regex much easier.

        --
        Eric Gunnerson


        • Masahiro Ito

          #5
          Re: Regex, TextReader...?

          Thanks Eric. Actually, I was using your Regex Workbench already - it is
          great! Thank you for sharing it.

          Something is not clicking with me and these regex expressions. Even when I
          paste your regex, I don't believe I am getting the responses you intended.
          In the sample I posted, I am trying to capture the field 1 and field 5
          values. I can capture them separately, but can't seem to grasp the 'skip
          everything until a specific pattern is matched'.

          I am trying to break down your sample piece by piece. Does the @ at the
          start do something?
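
On the @ question: it belongs to C#, not to the regex. It marks a verbatim string literal, in which backslashes are not escape characters and the text may span several lines. A small sketch, with VerbatimStringDemo as an illustrative name:

```csharp
using System;
using System.Text.RegularExpressions;

public static class VerbatimStringDemo
{
    // In a regular string literal every backslash has to be doubled.
    public const string Escaped = "^\\d+\\s\\$";

    // In a verbatim literal (prefixed with @) backslashes are taken
    // literally, so the pattern reads the way the regex engine sees it.
    public const string Verbatim = @"^\d+\s\$";

    public static void Main()
    {
        Console.WriteLine(Escaped == Verbatim);             // True
        Console.WriteLine(Regex.IsMatch("42 $", Verbatim)); // True
    }
}
```

A tool that takes a bare pattern would presumably want only the text between the quotes, without the @ or the surrounding C# code.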

          Also, using Regex Workbench, using your sample in your first reply, I am
          not getting any matches.
          String:
          1234
          34123
          11313
          113133
          xxxxx

          Regex:
          (?<Numbers>^\d+$)+xxxxx

          I have tried every permutation I can think of with multiline/singleline,
          etc. I feel like I am going crazy.

          Thank you.

          Masa
