How to replace an attribute with empty string using regex?

  • ashitpro
    Recognized Expert Contributor
    • Aug 2007
    • 542

    I am parsing an XML file, and I want to replace an attribute with an empty string.

    Every node has an attribute, something like this:

    id="1hyx36uhpi7 80iq8oiu355"

    I am using the following regex pattern, but it's not working:

    d = re.search('(id= "([a-z0-9A-Z]+)")*',text)
  • bvdet
    Recognized Expert Specialist
    • Oct 2006
    • 2851

    #2
    In what way is it not working?

    Are you parsing the XML with an XML parser such as minidom? If not, you should consider doing so.
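For reference, the parser route might look something like the sketch below. The sample document and its element names are made up for illustration, since the actual XML was not posted; only the `id` attribute comes from the question.

```python
from xml.dom import minidom

# Hypothetical sample document; the real XML from the question is not shown.
xml_text = (
    '<root>'
    '<node id="1hyx36uhpi780iq8oiu355">'
    '<child id="46fhrt5976jkfjhrh"/>'
    '</node>'
    '</root>'
)

doc = minidom.parseString(xml_text)

# '*' matches every element in the document, including the root.
for elem in doc.getElementsByTagName('*'):
    if elem.hasAttribute('id'):
        # Keep the attribute but blank its value.
        elem.setAttribute('id', '')

result = doc.documentElement.toxml()
print(result)
```

If the goal is to drop the attribute entirely rather than blank it, `elem.removeAttribute('id')` should do that instead.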

    If parsing the string directly, have you tried parsing it line by line?

    This seems to work:
    Code:
    >>> import re
    >>> patt = re.compile(r'id="[a-z0-9A-Z]+"')
    >>> s = 'id="1hyx36uhpi780iq8oiu355"   xxxxx xxxxx id="46fhrt5976jkfjhrh"'
    >>> s1 = patt.sub('id=""', s)
    >>> s1
    'id=""   xxxxx xxxxx id=""'
    >>>

    • Oralloy
      Recognized Expert Contributor
      • Jun 2010
      • 988

      #3
      ashitpro,

      It looks like you've got a problematic indefinite repetition operator in your statement:

      Code:
      d = re.search('(id="([a-z0-9A-Z]+)")*',text)
          HERE  --------------------------^
      I think this will cause undesired zero-length matches.
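A quick way to see the effect, as a small sketch: the sample string here is invented, chosen so that the attribute does not sit at the very start of the text.

```python
import re

# Made-up sample text where the attribute is not at position 0.
text = 'xxxxx id="46fhrt5976jkfjhrh"'

# With the trailing *, the group is allowed to repeat zero times, so
# re.search succeeds right away at position 0 with an empty match.
starred = re.search(r'(id="([a-z0-9A-Z]+)")*', text)
print(repr(starred.group(0)), starred.span())

# Without the *, the search scans forward until it finds a real attribute.
plain = re.search(r'id="([a-z0-9A-Z]+)"', text)
print(plain.group(0), plain.group(1))
```

The first search reports a match, but it is an empty one, which is probably why the original pattern looks like it is "not working".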

      Also, I don't know what your data stream looks like, but you may need to check for single quotes (') enclosing the attribute value as well.
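One possible way to cover both quote styles, sketched under the assumption that values contain only letters and digits as in the earlier pattern. The sample string is invented for illustration.

```python
import re

# Capture the opening quote (either " or ') in group 1; the backreference \1
# then requires the closing quote to be the same character.
patt = re.compile(r'''id=(["'])[a-z0-9A-Z]+\1''')

# Made-up sample with one double-quoted and one single-quoted attribute.
s = "id=\"1hyx36uhpi780iq8oiu355\" xxxxx id='46fhrt5976jkfjhrh'"

# Re-emit the captured quote twice so each attribute keeps its own quote style.
s1 = patt.sub(r'id=\1\1', s)
print(s1)
```

The backreference keeps a mismatched pair such as `id="abc'` from matching.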

      Luck!
