RE: parsing "&A" in a string..

bruce
#1

RE: parsing "&A" in a string..

Aug 31 '08, 09:15 PM

aha...

it's the beautifulsoup() that's taking the "&E" and giving the "&E;"...

-----Original Message-----
From: python-list-bounces+bedouglas=earthlink.net@python.org
[mailto:python-list-bounces+bedouglas=earthlink.net@python.org]On Behalf
Of Fredrik Lundh
Sent: Sunday, August 31, 2008 1:10 PM
To: python-list@python.org
Subject: Re: parsing "&A" in a string..

bruce wrote:

> a pretty simple question, i'm guessing.
>
> i have a text/html string that looks like:
> ....(A&E)
>
> the issue i have is that when i parse it using xpath/node/toString,
> i get the following
>
> ...(A&E;).

that's because your parser is interpreting the &E part as an entity
reference, and the serializer is then adding the missing semicolon.

bare ampersands must be written as "&amp;" in the file.
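
As a minimal sketch of that rule (not part of the original reply, and assuming Python 3's standard `html` module rather than anything the thread itself used), escaping the text before it goes into the markup turns the bare ampersand into an entity, and unescaping gives back the original text. The variable names are only illustrative.

```python
from html import escape, unescape

# Text as the author intends it to be displayed.
text = "(A&E)"

# Escape it for use in HTML; quote=False leaves quote characters alone.
escaped = escape(text, quote=False)
print(escaped)            # (A&amp;E)

# Decoding the entity recovers the original text.
print(unescape(escaped))  # (A&E)
```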

</F>

--

http://mail.python.org/mailman/listinfo/python-list
Tim Roberts
#2

Sep 1 '08, 04:35 AM

Re: parsing "&A" in a string..

"bruce" <bedouglas@earthlink.net> wrote:

>
>it's the beautifulsoup() that's taking the "&E" and giving the "&E;"...

Right, as it should. "A&E" is not valid HTML, and beautifulsoup expects
valid HTML.

This can be difficult to fix in the general case, because your page might
already contain "&amp;". If it is possible that some of them might be
wrong while some are right, you can do something like:

s = s.replace( '&amp;', '&' ).replace( '&', '&amp;' )
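
A slightly expanded sketch of the same idea, for anyone who wants to try it: the first function is the two-step replace wrapped up as-is, and the second is a hypothetical regex-based alternative (not from this thread) that only escapes ampersands that do not already start something shaped like an entity. Both function names are made up for illustration.

```python
import re

def normalize_ampersands(s):
    # Two-step replace: first undo any existing "&amp;",
    # then escape every "&" so nothing is escaped twice.
    return s.replace("&amp;", "&").replace("&", "&amp;")

# Hypothetical alternative: match "&" only when it is NOT followed by
# a named, decimal, or hex reference ending in ";".
_BARE_AMP = re.compile(
    r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)"
)

def escape_bare_ampersands(s):
    return _BARE_AMP.sub("&amp;", s)

print(normalize_ampersands("A&E and B&amp;W"))      # A&amp;E and B&amp;W
print(normalize_ampersands("a &lt; b"))             # a &amp;lt; b
print(escape_bare_ampersands("A&E &lt; B&amp;W"))   # A&amp;E &lt; B&amp;W
```

Note the second line of output: the two-step replace also escapes the ampersand inside other entities such as "&lt;", so it is only safe when "&amp;" is the only entity the page can contain. The regex variant leaves anything that looks like an entity alone, at the cost of also leaving text like "x&y;" untouched.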
--
Tim Roberts, timr@probo.com
Providenza & Boekelheide, Inc.