RE: parsing "&A" in a string..

  • bruce

    RE: parsing "&A" in a string..

    aha...

    it's the beautifulsoup() that's taking the "&E" and giving the "&E;"...



    -----Original Message-----
    From: python-list-bounces+bedouglas=earthlink.net@python.org
    [mailto:python-list-bounces+bedouglas=earthlink.net@python.org]On Behalf
    Of Fredrik Lundh
    Sent: Sunday, August 31, 2008 1:10 PM
    To: python-list@python.org
    Subject: Re: parsing "&A" in a string..


    bruce wrote:
    > a pretty simple question, i'm guessing.
    >
    > i have a text/html string that looks like:
    > ....(A&E)
    >
    > the issue i have is that when i parse it using xpath/node/toString,
    > i get the following
    >
    > ...(A&E;).

    that's because your parser is interpreting the &E part as an entity
    reference, and the serializer is then adding the missing semicolon.

    bare ampersands must be written as "&amp;" in the file.
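
For illustration only (this is not part of the original message): a minimal sketch of escaping a bare ampersand before the text goes into an HTML file, assuming Python 3 and its standard `html` module. The sample string `raw` is made up for the example.

```python
import html

# Hypothetical sample text containing a bare ampersand.
raw = "Watch it on (A&E) tonight"

# Escape "&", "<" and ">" so the text is safe to embed in HTML.
# quote=False leaves quotation marks alone, which is enough for element content.
escaped = html.escape(raw, quote=False)
print(escaped)  # Watch it on (A&amp;E) tonight

# Unescaping restores the original text, so nothing is lost in the round trip.
restored = html.unescape(escaped)
print(restored == raw)  # True
```

With the ampersand written this way, a parser has no partial entity reference to guess at, so the serializer has nothing to "repair" by adding a semicolon.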

    </F>


  • Tim Roberts

    #2
    Re: parsing "&A" in a string..

    "bruce" <bedouglas@earthlink.net> wrote:
    >
    >it's the beautifulsoup() that's taking the "&E" and giving the "&E;"...
    Right, as it should. "A&E" is not valid HTML, and beautifulsoup expects
    valid HTML.

    This can be difficult to fix in the general case, because your page might
    already contain "&amp;". If it is possible that some of them might be
    wrong while some are right, you can do something like:

    s = s.replace( '&amp;', '&' ).replace( '&', '&amp;' )
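
As a rough sketch of how that one-liner behaves (assuming Python 3; the function names and the regular expression below are hypothetical, not from the thread): the two-step replace is idempotent for "&amp;", but it also re-escapes the ampersand of any other entity such as "&lt;". A heuristic alternative is to escape only ampersands that do not already start something that looks like an entity reference.

```python
import re

def normalize_ampersands(s):
    """The two-step replace from the post: unescape '&amp;', then escape every '&'."""
    return s.replace("&amp;", "&").replace("&", "&amp;")

# Hypothetical pattern: an '&' NOT followed by a name, decimal, or hex
# reference ending in ';'. This is a heuristic, not a full HTML tokenizer.
_BARE_AMP = re.compile(r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)")

def escape_bare_ampersands(s):
    """Escape only ampersands that do not already begin an entity-like reference."""
    return _BARE_AMP.sub("&amp;", s)

mixed = "A&E and B&amp;B"
print(normalize_ampersands(mixed))          # A&amp;E and B&amp;B

# Other entities get double-escaped by the simple approach:
print(normalize_ampersands("1 &lt; 2"))     # 1 &amp;lt; 2

# The regex variant leaves existing references untouched:
print(escape_bare_ampersands("A&E and B&amp;B, 1 &lt; 2"))
# A&amp;E and B&amp;B, 1 &lt; 2
```

The regex variant cannot tell an intended entity from text that merely looks like one (for example "A&E;" would be left as is), so it is only a best-effort cleanup for markup you do not control.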
    --
    Tim Roberts, timr@probo.com
    Providenza & Boekelheide, Inc.
