christopher taylor wrote:
First, [0-9A-F] cannot match an "X". Assuming that's a typo, your next
problem is a precedence issue: (X)+ means "one or more (X)", not "one or
more X inside parens". In other words, that pattern matches one or more
X's and captures the last one.
Assuming that you want to find runs of \uXXXX escapes, simply use
non-capturing parentheses:
pat = re.compile(r"(?:\\u[0-9A-F]{4})+")
and use group(0) instead of group(1) to get the match.
</F>
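To make the difference concrete, here is a minimal sketch (Python 3 syntax) comparing the capturing and non-capturing forms. The sample string `txt_line` and its hex digits are made up for illustration and are not from the original post; only the two patterns come from the discussion above.

```python
import re

# Hypothetical sample input: literal backslash-u escapes inside ordinary text.
txt_line = r"prefix \uAD0C\u1BF3 middle \u00E9 suffix"

# Capturing group under "+": each match covers the whole run,
# but group(1) only remembers the last repetition.
capturing = re.compile(r"(\\u[0-9A-F]{4})+")

# Non-capturing group under "+": nothing is captured,
# so group(0) (the whole match) is what you read.
non_capturing = re.compile(r"(?:\\u[0-9A-F]{4})+")

# findall() returns group(1) when the pattern has exactly one group.
last_only = capturing.findall(txt_line)

# group(0) gives the complete run of escapes for each match.
runs = [m.group(0) for m in non_capturing.finditer(txt_line)]

# "X" is not a hex digit, so an escape like \uAD0X is not matched at all.
no_match = non_capturing.search(r"\uAD0X")

print(last_only)
print(runs)
print(no_match)
```

Note that with the capturing pattern, group(0) of each match is still the full run; it is only group(1) (and therefore findall) that shows just the last escape.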
> My issue is that the pattern I used was returning:
>
> [ '\\uAD0X', '\\u1BF3', ... ]
>
> when I expected:
>
> [ '\\uAD0X\\u1BF3', ]
>
> The code looks something like this:
>
> pat = re.compile("(\\\u[0-9A-F]{4})+", re.UNICODE|re.LOCALE)
> #print pat.findall(txt_line)
> results = pat.finditer(txt_line)
>
> I ran the pattern past a couple of my colleagues and they all agreed
> that my pattern should have matched correctly.