A vote for re scanner

**Jeremy Fincher** · Jul 18 '05, 05:26 AM

Re: A vote for re scanner

wade@lightlink.com (Wade Leftwich) wrote in message news:<5b4785ee.0311100714.1445cdfb@posting.google.com>...
> Every couple of months I have a use for the experimental 'scanner'
> object in the re module, and when I do, as I did this morning, it's
> really handy. So if anyone is counting votes for making it a standard
> part of the module, here's my vote:

While I don't think they're still accepting votes :), you've pointed
me to something I didn't know about until now. What kinds of things
have you been using re.Scanner for?

Jeremy

**Wade Leftwich** · Jul 18 '05, 05:31 AM

Re: A vote for re scanner

tweedgeezer@hotmail.com (Jeremy Fincher) wrote in message news:<698f09f8.0311101455.41f8706a@posting.google.com>...
> wade@lightlink.com (Wade Leftwich) wrote in message news:<5b4785ee.0311100714.1445cdfb@posting.google.com>...
> > Every couple of months I have a use for the experimental 'scanner'
> > object in the re module, and when I do, as I did this morning, it's
> > really handy. So if anyone is counting votes for making it a standard
> > part of the module, here's my vote:
>
> While I don't think they're still accepting votes :), you've pointed
> me to something I didn't know about until now. What kinds of things
> have you been using re.Scanner for?
>
> Jeremy

A scanner is constructed from a regex object and a string to be
scanned. Each call to the scanner's search() method returns the next
match object of the regex on the string. So to work on a string that
has multiple matches, it's the bee's roller skates.
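
A minimal sketch of that usage, in Python 3 syntax. It assumes the undocumented `scanner()` method on compiled pattern objects (present in CPython, but not part of the documented API); the pattern and input string are invented for illustration.

```python
import re

# Hypothetical pattern and input, chosen only for illustration.
pattern = re.compile(r"\d+")
scanner = pattern.scanner("a1 b22 c333")  # undocumented method, assumed available

found = []
while True:
    m = scanner.search()  # next match object, or None once the string is exhausted
    if m is None:
        break
    found.append(m.group())

print(found)
```

Each call to `search()` resumes where the previous match ended, which is what makes the object convenient for strings with several matches.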

**Dang Griffith** · Jul 18 '05, 05:32 AM

Re: A vote for re scanner

On 12 Nov 2003 13:04:36 -0800, wade@lightlink.com (Wade Leftwich)
wrote:

>tweedgeezer@hotmail.com (Jeremy Fincher) wrote in message news:<698f09f8.0311101455.41f8706a@posting.google.com>...
>> wade@lightlink.com (Wade Leftwich) wrote in message news:<5b4785ee.0311100714.1445cdfb@posting.google.com>...
>> > Every couple of months I have a use for the experimental 'scanner'
>> > object in the re module, and when I do, as I did this morning, it's
>> > really handy. So if anyone is counting votes for making it a standard
>> > part of the module, here's my vote:
>>
>> While I don't think they're still accepting votes :), you've pointed
>> me to something I didn't know about until now. What kinds of things
>> have you been using re.Scanner for?
>>
>> Jeremy
>
>A scanner is constructed from a regex object and a string to be
>scanned. Each call to the scanner's search() method returns the next
>match object of the regex on the string. So to work on a string that
>has multiple matches, it's the bee's roller skates.

Or in Eric's case, *the* roller skate.
--dang

**Alex Martelli** · Jul 18 '05, 05:33 AM

Re: A vote for re scanner

Wade Leftwich wrote:
...
> A scanner is constructed from a regex object and a string to be
> scanned. Each call to the scanner's search() method returns the next
> match object of the regex on the string. So to work on a string that
> has multiple matches, it's the bee's roller skates.

...if that method's name was 'next' (and an appropriate __iter__
also present) it might be even cooler, though...

Alex

**Wade Leftwich** · Jul 18 '05, 05:37 AM

Re: A vote for re scanner

Alex Martelli <aleax@aleax.it> wrote:
> Wade Leftwich wrote:
> ...
> > A scanner is constructed from a regex object and a string to be
> > scanned. Each call to the scanner's search() method returns the next
> > match object of the regex on the string. So to work on a string that
> > has multiple matches, it's the bee's roller skates.
>
> ...if that method's name was 'next' (and an appropriate __iter__
> also present) it might be even cooler, though...
>
>
> Alex

Indeed:

>>> import re
>>> class CoolerScanner(object):
...     def __init__(self, regex, s):
...         self.scanner = regex.scanner(s)
...     def next(self):
...         m = self.scanner.search()
...         if m:
...             return m
...         else:
...             raise StopIteration
...     def __iter__(self):
...         while 1:
...             yield self.next()
...
>>> regex = re.compile(r'(?P<before>.)a(?P<after>.)')
>>> s = '1ab2ac3ad'
>>> for m in CoolerScanner(regex, s):
...     print m.group('before'), m.group('after')
...
1 b
2 c
3 d
>>>
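
For readers on Python 3, a rough equivalent might look like the sketch below. Two things differ from the session above: the iterator method is spelled `__next__`, and a `StopIteration` raised inside a generator body is turned into a `RuntimeError` in current Python 3 releases (PEP 479), so `__iter__` here simply returns `self` instead of being a generator. The undocumented `scanner()` method is still assumed to be available.

```python
import re


class CoolerScanner:
    """Iterate over successive matches of `regex` in `s`."""

    def __init__(self, regex, s):
        # Pattern.scanner() is undocumented; assumed available (CPython).
        self.scanner = regex.scanner(s)

    def __iter__(self):
        return self

    def __next__(self):
        m = self.scanner.search()
        if m is None:
            raise StopIteration
        return m


regex = re.compile(r"(?P<before>.)a(?P<after>.)")
pairs = [(m.group("before"), m.group("after"))
         for m in CoolerScanner(regex, "1ab2ac3ad")]
print(pairs)
```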

-- Wade

**Fredrik Lundh** · Jul 18 '05, 05:37 AM

Re: A vote for re scanner

Wade Leftwich wrote:

> >>> regex = re.compile(r'(?P<before>.)a(?P<after>.)')
> >>> s = '1ab2ac3ad'
> >>> for m in CoolerScanner(regex, s):
> ...     print m.group('before'), m.group('after')
> ...
> 1 b
> 2 c
> 3 d

>>> regex = re.compile(r'(?P<before>.)a(?P<after>.)')
>>> s = '1ab2ac3ad'
>>> for m in regex.finditer(s):
...     print m.group('before'), m.group('after')
...
1 b
2 c
3 d

</F>

**Fredrik Lundh** · Jul 18 '05, 05:37 AM

Re: A vote for re scanner

Alex Martelli wrote:

> Wade Leftwich wrote:
> ...
> > A scanner is constructed from a regex object and a string to be
> > scanned. Each call to the scanner's search() method returns the next
> > match object of the regex on the string. So to work on a string that
> > has multiple matches, it's the bee's roller skates.
>
> ...if that method's name was 'next' (and an appropriate __iter__
> also present) it might be even cooler, though...

re.finditer

</F>

**Alex Martelli** · Jul 18 '05, 05:38 AM

Re: A vote for re scanner

Fredrik Lundh wrote:

> Alex Martelli wrote:
>
>> Wade Leftwich wrote:
>> ...
>> > A scanner is constructed from a regex object and a string to be
>> > scanned. Each call to the scanner's search() method returns the next
>> > match object of the regex on the string. So to work on a string that
>> > has multiple matches, it's the bee's roller skates.
>>
>> ...if that method's name was 'next' (and an appropriate __iter__
>> also present) it might be even cooler, though...
>
> re.finditer

Yep. So the scanner isn't warranted any longer, right?

Alex

**Wade Leftwich** · Jul 18 '05, 05:38 AM

Re: A vote for re scanner

"Fredrik Lundh" <fredrik@pythonware.com> wrote in message news:<mailman.765.1068940219.702.python-list@python.org>...
> Wade Leftwich wrote:
>
> > >>> regex = re.compile(r'(?P<before>.)a(?P<after>.)')
> > >>> s = '1ab2ac3ad'
> > >>> for m in CoolerScanner(regex, s):
> > ...     print m.group('before'), m.group('after')
> > ...
> > 1 b
> > 2 c
> > 3 d
>
> >>> regex = re.compile(r'(?P<before>.)a(?P<after>.)')
> >>> s = '1ab2ac3ad'
> >>> for m in regex.finditer(s):
> ...     print m.group('before'), m.group('after')
> ...
> 1 b
> 2 c
> 3 d
>
> </F>

There I go, reimplementing the wheel again. Guess I didn't pay enough
attention to "What's New In 2.2". Thanks for the pointer. It appears
we don't need that scanner() method after all.

However, from my point of view it was a good exercise, because now I
know how easy it is to make an iterator.

Thanks again

-- Wade

**Fredrik Lundh** · Jul 18 '05, 05:38 AM

Re: A vote for re scanner

Alex Martelli wrote:

> >> ...if that method's name was 'next' (and an appropriate __iter__
> >> also present) it might be even cooler, though...
> >
> > re.finditer
>
> Yep. So the scanner isn't warranted any longer, right?

if you remove it, you'll break re.Scanner.
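
For context, `re.Scanner` is a small (and itself undocumented) lexer class in the `re` module that is built on top of the pattern `scanner()` method. A minimal sketch of how it can be used, in Python 3 syntax, with token rules that are invented for illustration:

```python
import re

# Hypothetical token rules; each entry is (pattern, action).
# An action is called as action(scanner, matched_text);
# an action of None means the matched text is skipped.
scanner = re.Scanner([
    (r"\d+", lambda sc, text: ("INT", int(text))),
    (r"[A-Za-z_]\w*", lambda sc, text: ("NAME", text)),
    (r"\s+", None),
])

# scan() returns the list of action results and any unmatched tail.
tokens, remainder = scanner.scan("total 42 items")
print(tokens, repr(remainder))
```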

</F>

**allanc** · Jul 18 '05, 08:09 AM

Line Text Parsing

I'm new to Python, so bear with me.

I'm looking for a way to elegantly parse fixed-width text data (as opposed
to CSV) and save the parsed data into a database. The text data comes
from an old ISAM-format table, and each line may have a different record
structure depending on key fields in the line.

RegExp with match and split are of interest, but it's been too long since
I've dabbled with RE to be able to judge whether its use will make the
problem more complex.

Here's a sample of the records I need to parse:

01508390019002 11284361000002SUGARPLUM
015083915549 SHORT ON LAST ORDER
0150839220692 000002EA BMC 15 KG 001400

1st line is a (portion of a) header record.
2nd line is a text instruction record.
3rd line is a transaction line item record.

Each type of record has a different structure, but these sets of lines
all appear in the one table.

Any ideas would be greatly appreciated.

Allan

**Dang Griffith** · Jul 18 '05, 08:09 AM

Re: Line Text Parsing

On Wed, 04 Feb 2004 19:35:52 GMT, allanc
<kawNOSPAMenks@nospamyahoo.ca> wrote:

>I'm new to Python, so bear with me.
>
>I'm looking for a way to elegantly parse fixed-width text data (as opposed
>to CSV) and save the parsed data into a database. The text data comes
>from an old ISAM-format table, and each line may have a different record
>structure depending on key fields in the line.
>
>RegExp with match and split are of interest, but it's been too long since
>I've dabbled with RE to be able to judge whether its use will make the
>problem more complex.
>
>Here's a sample of the records I need to parse:
>
>01508390019002 11284361000002SUGARPLUM
>015083915549 SHORT ON LAST ORDER
>0150839220692 000002EA BMC 15 KG 001400
>
>1st line is a (portion of a) header record.
>2nd line is a text instruction record.
>3rd line is a transaction line item record.
>
>Each type of record has a different structure, but these sets of lines
>all appear in the one table.

Are the key fields in fixed positions? If so, pluck them out and use
them as an index into a dictionary of functions to call. I can't tell
from your example where the keys are, so I'm assuming the first 8 are
simply a line number and the next 4 are the key.

Maybe something along these lines:

def header(x):
    print 'header: %s' % x  # process header

def testinstruction(x):
    print 'test instruction: %s' % x  # process test instruction

def lineitem(x):
    print 'lineitem: %s' % x  # process line item

ptable = {'0190': header, '5549': testinstruction, '2069': lineitem}

for line in file("data.dat"):
    ptable[line[8:12]](line)

--dang

**David Goodger** · Jul 18 '05, 08:09 AM

Re: Line Text Parsing

allanc wrote:
> Here's a sample of the records I need to parse:
>
> 01508390019002 11284361000002SUGARPLUM
> 015083915549 SHORT ON LAST ORDER
> 0150839220692 000002EA BMC 15 KG 001400
>
> 1st line is a (portion of a) header record.
> 2nd line is a text instruction record.
> 3rd line is a transaction line item record.

I've written many programs to parse data very similar to this,
until I generalized the algorithm (a line-oriented state machine)
into a module. You can find the module (internally documented)
at http://docutils.sf.net/docutils/statemachine.py.

Hope it helps!

--
David Goodger http://python.net/~goodger
For hire: http://python.net/~goodger/cv

**wes weston** · Jul 18 '05, 08:09 AM

Re: Line Text Parsing

allanc wrote:
> I'm new to Python, so bear with me.
>
> I'm looking for a way to elegantly parse fixed-width text data (as opposed
> to CSV) and save the parsed data into a database. The text data comes
> from an old ISAM-format table, and each line may have a different record
> structure depending on key fields in the line.
>
> RegExp with match and split are of interest, but it's been too long since
> I've dabbled with RE to be able to judge whether its use will make the
> problem more complex.
>
> Here's a sample of the records I need to parse:
>
> 01508390019002 11284361000002SUGARPLUM
> 015083915549 SHORT ON LAST ORDER
> 0150839220692 000002EA BMC 15 KG 001400
>
> 1st line is a (portion of a) header record.
> 2nd line is a text instruction record.
> 3rd line is a transaction line item record.
>
> Each type of record has a different structure, but these sets of lines
> all appear in the one table.
>
>
> Any ideas would be greatly appreciated.
>
> Allan

allanc,
- Slices, as in str[0:5], str[5:], or str[5:-1], get pieces of a string.
- You'll probably want to strip leading/trailing spaces; see the strings doc.
- You may need to cast/convert:
    _int = int("55")
    _float = float("4.2")
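
The suggestions above might be combined along these lines. This is only a sketch in Python 3 syntax: the `LAYOUT` offsets and field names are hypothetical (they borrow the guess made earlier in the thread that the first 8 characters are a prefix and the next 4 a key), and `parse_line` is an illustrative helper rather than anything from the original posts.

```python
# Hypothetical layout for illustration only: (field name, start, end) offsets.
# The real offsets depend on the ISAM record definition.
LAYOUT = [("prefix", 0, 8), ("key", 8, 12), ("text", 12, None)]


def parse_line(line, layout=LAYOUT):
    """Slice one fixed-width line into a dict of stripped strings."""
    return {name: line[start:end].strip() for name, start, end in layout}


record = parse_line("015083915549 SHORT ON LAST ORDER\n")
record["prefix"] = int(record["prefix"])  # convert numeric fields as needed
print(record)
```

Keeping the offsets in a table means each record type can have its own layout, selected by the key field.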
wes
