A vote for re scanner

  • Wade Leftwich

    A vote for re scanner

    Every couple of months I have a use for the experimental 'scanner'
    object in the re module, and when I do, as I did this morning, it's
    really handy. So if anyone is counting votes for making it a standard
    part of the module, here's my vote:

    +1

    -- Wade Leftwich
    Ithaca, NY
  • Jeremy Fincher

    #2
    Re: A vote for re scanner

    wade@lightlink.com (Wade Leftwich) wrote in message news:<5b4785ee.0311100714.1445cdfb@posting.google.com>...
    > Every couple of months I have a use for the experimental 'scanner'
    > object in the re module, and when I do, as I did this morning, it's
    > really handy. So if anyone is counting votes for making it a standard
    > part of the module, here's my vote:

    While I don't think they're still accepting votes :), you've pointed
    me to something I didn't know about until now. What kinds of things
    have you been using re.Scanner for?

    Jeremy

    • Wade Leftwich

      #3
      Re: A vote for re scanner

      tweedgeezer@hotmail.com (Jeremy Fincher) wrote in message news:<698f09f8.0311101455.41f8706a@posting.google.com>...
      > wade@lightlink.com (Wade Leftwich) wrote in message news:<5b4785ee.0311100714.1445cdfb@posting.google.com>...
      > > Every couple of months I have a use for the experimental 'scanner'
      > > object in the re module, and when I do, as I did this morning, it's
      > > really handy. So if anyone is counting votes for making it a standard
      > > part of the module, here's my vote:
      >
      > While I don't think they're still accepting votes :), you've pointed
      > me to something I didn't know about until now. What kinds of things
      > have you been using re.Scanner for?
      >
      > Jeremy

      A scanner is constructed from a regex object and a string to be
      scanned. Each call to the scanner's search() method returns the next
      match object of the regex on the string. So to work on a string that
      has multiple matches, it's the bee's roller skates.
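
The behaviour described above can be sketched roughly as follows. This is written for Python 3 rather than the Python 2 used in this thread, and it relies on the scanner() method of compiled patterns, which is undocumented, so treat it as an assumption about CPython rather than a guaranteed API. The pattern and sample string are made up for illustration.

```python
import re

# Illustrative pattern and input (not from the thread).
pattern = re.compile(r"\d+")
text = "a1 bb22 ccc333"

# scanner() is an undocumented method on compiled patterns in CPython.
scanner = pattern.scanner(text)

found = []
while True:
    m = scanner.search()  # next match object, or None when nothing is left
    if m is None:
        break
    found.append((m.group(), m.start()))

print(found)
```

Each call to search() resumes where the previous match ended, which is what makes the object convenient for strings with several matches.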

      • Dang Griffith

        #4
        Re: A vote for re scanner

        On 12 Nov 2003 13:04:36 -0800, wade@lightlink.com (Wade Leftwich)
        wrote:

        >tweedgeezer@hotmail.com (Jeremy Fincher) wrote in message news:<698f09f8.0311101455.41f8706a@posting.google.com>...
        >> wade@lightlink.com (Wade Leftwich) wrote in message news:<5b4785ee.0311100714.1445cdfb@posting.google.com>...
        >> > Every couple of months I have a use for the experimental 'scanner'
        >> > object in the re module, and when I do, as I did this morning, it's
        >> > really handy. So if anyone is counting votes for making it a standard
        >> > part of the module, here's my vote:
        >>
        >> While I don't think they're still accepting votes :), you've pointed
        >> me to something I didn't know about until now. What kinds of things
        >> have you been using re.Scanner for?
        >>
        >> Jeremy
        >
        >A scanner is constructed from a regex object and a string to be
        >scanned. Each call to the scanner's search() method returns the next
        >match object of the regex on the string. So to work on a string that
        >has multiple matches, it's the bee's roller skates.

        Or in Eric's case, *the* roller skate.
        --dang

        • Alex Martelli

          #5
          Re: A vote for re scanner

          Wade Leftwich wrote:
          ...
          > A scanner is constructed from a regex object and a string to be
          > scanned. Each call to the scanner's search() method returns the next
          > match object of the regex on the string. So to work on a string that
          > has multiple matches, it's the bee's roller skates.

          ...if that method's name was 'next' (and an appropriate __iter__
          also present) it might be even cooler, though...


          Alex

          • Wade Leftwich

            #6
            Re: A vote for re scanner

            Alex Martelli <aleax@aleax.it> wrote:
            > Wade Leftwich wrote:
            > ...
            > > A scanner is constructed from a regex object and a string to be
            > > scanned. Each call to the scanner's search() method returns the next
            > > match object of the regex on the string. So to work on a string that
            > > has multiple matches, it's the bee's roller skates.
            >
            > ...if that method's name was 'next' (and an appropriate __iter__
            > also present) it might be even cooler, though...
            >
            >
            > Alex

            Indeed:

            >>> class CoolerScanner(object):
            ...     def __init__(self, regex, s):
            ...         self.scanner = regex.scanner(s)
            ...     def next(self):
            ...         m = self.scanner.search()
            ...         if m:
            ...             return m
            ...         else:
            ...             raise StopIteration
            ...     def __iter__(self):
            ...         while 1:
            ...             yield self.next()
            ...
            >>> regex = re.compile(r'(?P<before>.)a(?P<after>.)')
            >>> s = '1ab2ac3ad'
            >>> for m in CoolerScanner(regex, s):
            ...     print m.group('before'), m.group('after')
            ...
            1 b
            2 c
            3 d
            >>>

            -- Wade
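
For readers on Python 3, a rough port of the class above might look like the sketch below. Two things change: the iterator method is spelled __next__, and since PEP 479 a StopIteration escaping from inside a generator is turned into a RuntimeError, so __iter__ returns self and no longer yields from a loop. As before, scanner() is undocumented, so its availability is an assumption about CPython.

```python
import re


class CoolerScanner:
    """Iterate over successive matches of a compiled regex in a string."""

    def __init__(self, regex, s):
        # scanner() is undocumented; assumed to behave as in CPython.
        self.scanner = regex.scanner(s)

    def __iter__(self):
        return self

    def __next__(self):
        m = self.scanner.search()
        if m is None:
            raise StopIteration
        return m


regex = re.compile(r"(?P<before>.)a(?P<after>.)")
pairs = [(m.group("before"), m.group("after"))
         for m in CoolerScanner(regex, "1ab2ac3ad")]
print(pairs)
```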

            • Fredrik Lundh

              #7
              Re: A vote for re scanner

              Wade Leftwich wrote:

              > >>> regex = re.compile(r'(?P<before>.)a(?P<after>.)')
              > >>> s = '1ab2ac3ad'
              > >>> for m in CoolerScanner(regex, s):
              > ...     print m.group('before'), m.group('after')
              > ...
              > 1 b
              > 2 c
              > 3 d

              >>> regex = re.compile(r'(?P<before>.)a(?P<after>.)')
              >>> s = '1ab2ac3ad'
              >>> for m in regex.finditer(s):
              ...     print m.group('before'), m.group('after')
              ...
              1 b
              2 c
              3 d

              </F>




              • Fredrik Lundh

                #8
                Re: A vote for re scanner

                Alex Martelli wrote:

                > Wade Leftwich wrote:
                > ...
                > > A scanner is constructed from a regex object and a string to be
                > > scanned. Each call to the scanner's search() method returns the next
                > > match object of the regex on the string. So to work on a string that
                > > has multiple matches, it's the bee's roller skates.
                >
                > ...if that method's name was 'next' (and an appropriate __iter__
                > also present) it might be even cooler, though...

                re.finditer

                </F>




                • Alex Martelli

                  #9
                  Re: A vote for re scanner

                  Fredrik Lundh wrote:

                  > Alex Martelli wrote:
                  >
                  >> Wade Leftwich wrote:
                  >> ...
                  >> > A scanner is constructed from a regex object and a string to be
                  >> > scanned. Each call to the scanner's search() method returns the next
                  >> > match object of the regex on the string. So to work on a string that
                  >> > has multiple matches, it's the bee's roller skates.
                  >>
                  >> ...if that method's name was 'next' (and an appropriate __iter__
                  >> also present) it might be even cooler, though...
                  >
                  > re.finditer

                  Yep. So the scanner isn't warranted any longer, right?


                  Alex

                  • Wade Leftwich

                    #10
                    Re: A vote for re scanner

                    "Fredrik Lundh" <fredrik@pythonware.com> wrote in message news:<mailman.765.1068940219.702.python-list@python.org>...
                    > Wade Leftwich wrote:
                    >
                    > > >>> regex = re.compile(r'(?P<before>.)a(?P<after>.)')
                    > > >>> s = '1ab2ac3ad'
                    > > >>> for m in CoolerScanner(regex, s):
                    > > ...     print m.group('before'), m.group('after')
                    > > ...
                    > > 1 b
                    > > 2 c
                    > > 3 d
                    >
                    > >>> regex = re.compile(r'(?P<before>.)a(?P<after>.)')
                    > >>> s = '1ab2ac3ad'
                    > >>> for m in regex.finditer(s):
                    > ...     print m.group('before'), m.group('after')
                    > ...
                    > 1 b
                    > 2 c
                    > 3 d
                    >
                    > </F>

                    There I go, reimplementing the wheel again. Guess I didn't pay enough
                    attention to "What's New In 2.2". Thanks for the pointer. It appears
                    we don't need that scanner() method after all.

                    However, from my point of view it was a good exercise, because now I
                    know how easy it is to make an iterator.

                    Thanks again

                    -- Wade

                    • Fredrik Lundh

                      #11
                      Re: A vote for re scanner

                      Alex Martelli wrote:

                      > >> ...if that method's name was 'next' (and an appropriate __iter__
                      > >> also present) it might be even cooler, though...
                      > >
                      > > re.finditer
                      >
                      > Yep. So the scanner isn't warranted any longer, right?

                      if you remove it, you'll break re.Scanner.
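
For context, re.Scanner is a separate and likewise undocumented class in the re module. As far as can be told from CPython's source, its scan() method is built on the pattern scanner() method, which is presumably the breakage meant here. A minimal Python 3 sketch of how re.Scanner is typically used follows; the token names and input are made up, and the behaviour shown is an assumption about CPython, not a documented contract.

```python
import re

# Each lexicon entry is (pattern, action). An action is called as
# action(scanner, token); an action of None discards the matched text.
lexicon = [
    (r"\d+", lambda sc, tok: ("INT", int(tok))),
    (r"[A-Za-z]+", lambda sc, tok: ("WORD", tok)),
    (r"\s+", None),
]

scanner = re.Scanner(lexicon)

# scan() returns the list of action results plus whatever text it could
# not match at the point where it stopped.
tokens, remainder = scanner.scan("12 ab 7 ?")
print(tokens, repr(remainder))
```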

                      </F>




                      • allanc

                        #12
                        Line Text Parsing

                        I'm new to Python so bear with me.

                        I'm looking for a way to elegantly parse fixed-width text data (as opposed
                        to CSV) and save the parsed data into a database. The text data comes
                        from an old ISAM-format table and each line may be a different record
                        structure depending on key fields in the line.

                        RegExp with match and split are of interest but it's been too long since
                        I've dabbled with RE to be able to judge whether its use will make the
                        problem more complex.

                        Here's a sample of the records I need to parse:

                        01508390019002 11284361000002S UGARPLUM
                        015083915549 SHORT ON LAST ORDER
                        0150839220692 000002EA BMC 15 KG 001400

                        1st Line is a (portion of) header record.
                        2nd Line is a text instruction record.
                        3rd Line is a Transaction Line Item record.

                        Each type of record has a different structure. But these sets of lines
                        all appear in the one table.


                        Any ideas would be greatly appreciated.

                        Allan

                        • Dang Griffith

                          #13
                          Re: Line Text Parsing

                          On Wed, 04 Feb 2004 19:35:52 GMT, allanc
                          <kawNOSPAMenks@nospamyahoo.ca> wrote:

                          >I'm new to Python so bear with me.
                          >
                          >I'm looking for a way to elegantly parse fixed-width text data (as opposed
                          >to CSV) and save the parsed data into a database. The text data comes
                          >from an old ISAM-format table and each line may be a different record
                          >structure depending on key fields in the line.
                          >
                          >RegExp with match and split are of interest but it's been too long since
                          >I've dabbled with RE to be able to judge whether its use will make the
                          >problem more complex.
                          >
                          >Here's a sample of the records I need to parse:
                          >
                          >01508390019002 11284361000002S UGARPLUM
                          >015083915549 SHORT ON LAST ORDER
                          >0150839220692 000002EA BMC 15 KG 001400
                          >
                          >1st Line is a (portion of) header record.
                          >2nd Line is a text instruction record.
                          >3rd Line is a Transaction Line Item record.
                          >
                          >Each type of record has a different structure. But these sets of lines
                          >all appear in the one table.

                          Are the key fields in fixed positions? If so, pluck them out and use
                          them as an index into a dictionary of functions to call. I can't tell
                          from your example where the keys are, so I'm assuming the first 8 are
                          simply a line number and the next 4 are the key.

                          Maybe something along these lines:

                          def header(x):
                              print 'header: %s' % x # process header

                          def testinstruction(x):
                              print 'test instruction: %s' % x # process test instruction

                          def lineitem(x):
                              print 'lineitem: %s' % x # process line item

                          ptable = {'0190': header, '5549': testinstruction, '2069': lineitem}

                          for line in file("data.dat"):
                              ptable[line[8:12]](line)

                          --dang
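
The same dispatch-table idea, sketched for Python 3. The key position (characters 8 to 11) and the key values are the guess made in the post above, not a known record layout. The handler names, the "unknown" fallback, and the choice to return tuples instead of printing are illustrative assumptions. The sample lines are copied as they appear in this thread, where the original column spacing may not have survived.

```python
def header(line):
    return ("header", line[12:].strip())


def text_instruction(line):
    return ("text instruction", line[12:].strip())


def line_item(line):
    return ("line item", line[12:].strip())


# Key -> handler. Both the key slice and the key values are guesses.
HANDLERS = {"0190": header, "5549": text_instruction, "2069": line_item}


def parse(lines):
    """Yield one (record_type, payload) tuple per input line."""
    for line in lines:
        handler = HANDLERS.get(line[8:12])
        if handler is None:
            # Unrecognised key: pass the line through instead of failing.
            yield ("unknown", line.rstrip("\n"))
        else:
            yield handler(line)


sample = [
    "01508390019002 11284361000002S UGARPLUM\n",
    "015083915549 SHORT ON LAST ORDER\n",
    "0150839220692 000002EA BMC 15 KG 001400\n",
]
records = list(parse(sample))
for rec in records:
    print(rec)
```

Using dict.get() with a fallback means a line with an unexpected key is reported as unknown rather than raising KeyError partway through a file.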

                          • David Goodger

                            #14
                            Re: Line Text Parsing

                            allanc wrote:
                            > Here's a sample of the records I need to parse:
                            >
                            > 01508390019002 11284361000002S UGARPLUM
                            > 015083915549 SHORT ON LAST ORDER
                            > 0150839220692 000002EA BMC 15 KG 001400
                            >
                            > 1st Line is a (portion of) header record.
                            > 2nd Line is a text instruction record.
                            > 3rd Line is a Transaction Line Item record.

                            I've written many programs to parse data very similar to this,
                            until I generalized the algorithm (a line-oriented state machine)
                            into a module. You can find the module (internally documented)
                            at http://docutils.sf.net/docutils/statemachine.py.

                            Hope it helps!

                            --
                            David Goodger http://python.net/~goodger
                            For hire: http://python.net/~goodger/cv


                            • wes weston

                              #15
                              Re: Line Text Parsing



                              allanc wrote:
                              > I'm new to Python so bear with me.
                              >
                              > I'm looking for a way to elegantly parse fixed-width text data (as opposed
                              > to CSV) and save the parsed data into a database. The text data comes
                              > from an old ISAM-format table and each line may be a different record
                              > structure depending on key fields in the line.
                              >
                              > RegExp with match and split are of interest but it's been too long since
                              > I've dabbled with RE to be able to judge whether its use will make the
                              > problem more complex.
                              >
                              > Here's a sample of the records I need to parse:
                              >
                              > 01508390019002 11284361000002S UGARPLUM
                              > 015083915549 SHORT ON LAST ORDER
                              > 0150839220692 000002EA BMC 15 KG 001400
                              >
                              > 1st Line is a (portion of) header record.
                              > 2nd Line is a text instruction record.
                              > 3rd Line is a Transaction Line Item record.
                              >
                              > Each type of record has a different structure. But these sets of lines
                              > all appear in the one table.
                              >
                              >
                              > Any ideas would be greatly appreciated.
                              >
                              > Allan


                              allanc,
                              - Slices such as str[0:5], str[5:] or str[5:-1] get pieces of a string.
                              - You'll probably want to strip leading/trailing spaces; see the string docs.
                              - You may need to cast/convert:
                                  _int = int("55")
                                  _float = float("4.2")
                              wes
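
Put together, the three hints above (slice, strip, convert) might look like the Python 3 sketch below. The field layout, the field names, and the sample line are all hypothetical, since the real record layout is not given in this thread.

```python
# Hypothetical layout: columns 0-4 id, 5-10 price, 11 onward name.
FIELDS = {
    "id": slice(0, 5),
    "price": slice(5, 11),
    "name": slice(11, None),
}


def parse_fixed(line):
    """Slice a fixed-width line into fields, strip padding, convert types."""
    raw = {name: line[sl].strip() for name, sl in FIELDS.items()}
    return {
        "id": int(raw["id"]),
        "price": float(raw["price"]),
        "name": raw["name"],
    }


row = parse_fixed("00042  4.20SUGARPLUM   \n")
print(row)
```

Keeping the column positions in one table means a second record type only needs its own slice table, not a new parsing function.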
