re.match and non-alphanumeric characters

r · Nov 16 '08, 05:05 PM

Re: re.match and non-alphanumeric characters

On Nov 16, 10:33 am, The Web President <mattia.land...@gmail.com>
wrote:

> Dear all,
>
> this is really driving me nuts and any help would be extremely
> appreciated.
>
> I have a string that contains some numeric data. I want to isolate
> these data using re.match, as follows.
>
> bogus = "IFC(35m)"
> data = re.match(r'(\d+)',bogus)
> print data.group(1)
>
> I would expect to have "35" printed out to screen, but instead I get
> an error that the regular expression did not match:
>
> Traceback (most recent call last):
>   File "C:\Documents and Settings\Mattia\Desktop\Neeltje\read.py", line 20, in <module>
>     print data.group(1)
> AttributeError: 'NoneType' object has no attribute 'group'
>
> Note that the same holds if I look for "35" straight, instead of
> "\d+". If instead I look for "IFC" it works fine. That is, apparently
> re.match will match only up to the first non-alphanumeric character
> and ignore anything after a "(", "_", "[" and god knows what else.
>
> I am using Python 2.6 (r26:66721, latest stable version). Am I missing
> something very big and very important?

Try re.search or re.findall.
re.match only matches at the beginning of a string.
I almost never use it.

>>> re.search('(\d+)', bogus).group()
'35'
>>> re.search('(\d+)', bogus).span()
(4, 6)
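
For readers who want to compare the three calls side by side, here is a minimal sketch using the string from the original question. It is written for Python 3 (an assumption; the thread itself uses Python 2.6), and the variable names `at_start`, `anywhere` and `all_numbers` are illustrative only.

```python
import re

bogus = "IFC(35m)"

# re.match tries the pattern only at position 0, where the text is "I", not a digit.
at_start = re.match(r"(\d+)", bogus)

# re.search scans forward and returns the first place where the pattern matches.
anywhere = re.search(r"(\d+)", bogus)

# re.findall returns every non-overlapping match as a list of strings.
all_numbers = re.findall(r"\d+", bogus)

print(at_start)           # None
print(anywhere.group(1))  # 35
print(anywhere.span())    # (4, 6)
print(all_numbers)        # ['35']
```

The parenthesis in "IFC(35m)" plays no role here: re.match fails simply because the digits do not start at the beginning of the string.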

**MRAB** · Nov 16 '08, 05:05 PM

Re: re.match and non-alphanumeric characters

On Nov 16, 4:33 pm, The Web President <mattia.land...@gmail.com>
wrote:

> Dear all,
>
> this is really driving me nuts and any help would be extremely
> appreciated.
>
> I have a string that contains some numeric data. I want to isolate
> these data using re.match, as follows.
>
> bogus = "IFC(35m)"
> data = re.match(r'(\d+)',bogus)
> print data.group(1)
>
> I would expect to have "35" printed out to screen, but instead I get
> an error that the regular expression did not match:
>
> Traceback (most recent call last):
>   File "C:\Documents and Settings\Mattia\Desktop\Neeltje\read.py", line 20, in <module>
>     print data.group(1)
> AttributeError: 'NoneType' object has no attribute 'group'
>
> Note that the same holds if I look for "35" straight, instead of
> "\d+". If instead I look for "IFC" it works fine. That is, apparently
> re.match will match only up to the first non-alphanumeric character
> and ignore anything after a "(", "_", "[" and god knows what else.
>
> I am using Python 2.6 (r26:66721, latest stable version). Am I missing
> something very big and very important?

re.match() anchors the match at the start of the string. What you need
is re.search(). It's all in the documentation! :-)
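
The AttributeError in the question comes from calling .group() on None, which is what both re.match and re.search return when nothing matches. A minimal sketch of guarding against that, written for Python 3 (an assumption); `first_number` is a hypothetical helper name, not something from the thread:

```python
import re

def first_number(text):
    """Return the first run of digits in text, or None if there is none."""
    found = re.search(r"\d+", text)
    if found is None:
        # No digits anywhere: report that instead of calling .group() on None.
        return None
    return found.group()

print(first_number("IFC(35m)"))  # 35
print(first_number("IFC"))       # None
```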

**Gabriel Genellina** · Nov 16 '08, 05:15 PM

Re: re.match and non-alphanumeric characters

On Sun, 16 Nov 2008 14:33:42 -0200, The Web President
<mattia.landoni@gmail.com> wrote:

> I have a string that contains some numeric data. I want to isolate
> these data using re.match, as follows.
>
> bogus = "IFC(35m)"
> data = re.match(r'(\d+)',bogus)
> print data.group(1)
>
> I would expect to have "35" printed out to screen, but instead I get
> an error that the regular expression did not match:

re — Regular expression operations

http://docs.python.org/library/re.html#matching-vs-searching

--
Gabriel Genellina

**Diez B. Roggisch** · Nov 16 '08, 05:45 PM

Re: re.match and non-alphanumeric characters

The Web President wrote:

> Dear all,
>
> this is really driving me nuts and any help would be extremely
> appreciated.
>
> I have a string that contains some numeric data. I want to isolate
> these data using re.match, as follows.
>
> bogus = "IFC(35m)"
> data = re.match(r'(\d+)',bogus)
> print data.group(1)
>
> I would expect to have "35" printed out to screen, but instead I get
> an error that the regular expression did not match:
>
> Traceback (most recent call last):
>   File "C:\Documents and Settings\Mattia\Desktop\Neeltje\read.py", line 20, in <module>
>     print data.group(1)
> AttributeError: 'NoneType' object has no attribute 'group'
>
> Note that the same holds if I look for "35" straight, instead of
> "\d+". If instead I look for "IFC" it works fine. That is, apparently
> re.match will match only up to the first non-alphanumeric character
> and ignore anything after a "(", "_", "[" and god knows what else.
>
> I am using Python 2.6 (r26:66721, latest stable version). Am I missing
> something very big and very important?

Yep - re.search. Match matches the whole string. You want searching.

Diez

**John Machin** · Nov 16 '08, 09:05 PM

Re: re.match and non-alphanumeric characters

On Nov 17, 4:44 am, "Diez B. Roggisch" <de...@nospam.web.de> wrote:

> Match matches the whole string.

*ONLY* if the pattern ends with "$" or r"\Z"

**Diez B. Roggisch** · Nov 16 '08, 11:25 PM

Re: re.match and non-alphanumeric characters

John Machin wrote:

> On Nov 17, 4:44 am, "Diez B. Roggisch" <de...@nospam.web.de> wrote:
>
>> Match matches the whole string.
>
> *ONLY* if the pattern ends with "$" or r"\Z"

You think so?

import re

rex = re.compile("abc.*def")

if rex.match("abc0123455678def"):
    print "matched"

Diez

**Steve Holden** · Nov 16 '08, 11:55 PM

Re: re.match and non-alphanumeric characters

Diez B. Roggisch wrote:

> John Machin wrote:
>
>> On Nov 17, 4:44 am, "Diez B. Roggisch" <de...@nospam.web.de> wrote:
>>
>>> Match matches the whole string.
>>
>> *ONLY* if the pattern ends with "$" or r"\Z"
>
> You think so?
>
> import re
>
> rex = re.compile("abc.*def")
>
> if rex.match("abc0123455678def"):
>     print "matched"

Your test is inconclusive: necessary, but not sufficient.

>>> rex = re.compile("abc.*def")
>>> if rex.match("abc0123455678defPLUSEXTRASTUFF"):
...     print "Matched"
...
Matched
>>>

regards
Steve
--
Steve Holden +1 571 484 6266 +1 800 494 3119
Holden Web LLC http://www.holdenweb.com/

**John Machin** · Nov 17 '08, 12:05 AM

Re: re.match and non-alphanumeric characters

On Nov 17, 10:19 am, "Diez B. Roggisch" <de...@nospam.web.de> wrote:

> John Machin wrote:
>
>> On Nov 17, 4:44 am, "Diez B. Roggisch" <de...@nospam.web.de> wrote:
>>
>>> Match matches the whole string.
>>
>> *ONLY* if the pattern ends with "$" or r"\Z"
>
> You think so?
>
> import re
>
> rex = re.compile("abc.*def")
>
> if rex.match("abc0123455678def"):
>     print "matched"

OK, I'll try again:

The following 3-tuples represent (pattern, string,
matched_portion_of_string):
('abc', 'abc', 'abc')
('abc', 'abcdef', 'abc')
('abc$', 'abc', 'abc')
('abc$', 'abcdef', '<no match>')

Saying "Match matches the whole string" is incorrect; see the second
case. If you want to ensure that the whole string matches the pattern,
the pattern needs to be terminated by "$" or "\Z".
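
The four cases above can be reproduced with a short script. This is a sketch for Python 3 (an assumption; the thread uses Python 2.6), and `matched_portion` is a hypothetical helper name. It also shows re.fullmatch, which is not mentioned in the thread and only exists in Python 3.4 and later, as another way to require that the whole string matches.

```python
import re

cases = [
    ("abc", "abc"),
    ("abc", "abcdef"),
    ("abc$", "abc"),
    ("abc$", "abcdef"),
]

def matched_portion(pattern, string):
    """Return the text re.match consumed, or '<no match>' if it failed."""
    m = re.match(pattern, string)
    return m.group() if m else "<no match>"

results = [(p, s, matched_portion(p, s)) for p, s in cases]
for row in results:
    print(row)

# re.fullmatch (Python 3.4+) succeeds only if the pattern covers the entire string.
print(re.fullmatch("abc", "abcdef"))                            # None
print(re.fullmatch("abc.*def", "abc0123455678def") is not None)  # True

# "$" also matches just before a trailing newline; r"\Z" matches only at the very end.
print(re.match(r"abc$", "abc\n") is not None)   # True
print(re.match(r"abc\Z", "abc\n") is not None)  # False
```

The loop prints the same four tuples listed in the post, with 'abc' matched in the second case even though the string continues with "def".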
