Unicode to HTML entities

**Richard Brodie** · May 29 '07, 04:05 PM

Re: Unicode to HTML entities

"Clodoaldo" <clodoaldo.pinto@gmail.com> wrote in message
news:1180453921.357081.89500@n15g2000prd.googlegroups.com...

> I was looking for a function to transform a unicode string into
> htmlentities.

>>> u'São Paulo'.encode('ascii', 'xmlcharrefreplace')

'S&#227;o Paulo'
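
For readers on Python 3, the same call can be sketched as follows. This assumes Python 3, where `str.encode` returns `bytes`, so the result is decoded back to `str`; the helper name `to_char_refs` is made up for illustration and is not part of any library.

```python
def to_char_refs(text):
    """Replace every non-ASCII character with a numeric character reference."""
    # encode() returns bytes in Python 3; the result is pure ASCII,
    # so decoding it back to str is always safe.
    return text.encode('ascii', 'xmlcharrefreplace').decode('ascii')


print(to_char_refs('São Paulo'))  # S&#227;o Paulo
```

Note that the `xmlcharrefreplace` handler produces numeric references (`&#227;`), not named entities such as `&atilde;`.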

**Clodoaldo** · May 29 '07, 04:15 PM

Re: Unicode to HTML entities

On May 29, 12:57 pm, "Richard Brodie" <R.Bro...@rl.ac.uk> wrote:

> "Clodoaldo" <clodoaldo.pi...@gmail.com> wrote in message
> news:1180453921.357081.89500@n15g2000prd.googlegroups.com...
>
> > I was looking for a function to transform a unicode string into
> > htmlentities.
>
> >>> u'São Paulo'.encode('ascii', 'xmlcharrefreplace')
> 'S&#227;o Paulo'

That was a fast answer. I would never find that myself.

Thanks, Clodoaldo

**Duncan Booth** · May 30 '07, 07:35 AM

Re: Unicode to HTML entities

Clodoaldo <clodoaldo.pinto@gmail.com> wrote:

> On May 29, 12:57 pm, "Richard Brodie" <R.Bro...@rl.ac.uk> wrote:
>
> > "Clodoaldo" <clodoaldo.pi...@gmail.com> wrote in message
> > news:1180453921.357081.89500@n15g2000prd.googlegroups.com...
> >
> > > I was looking for a function to transform a unicode string into
> > > htmlentities.
> >
> > >>> u'São Paulo'.encode('ascii', 'xmlcharrefreplace')
> > 'S&#227;o Paulo'
>
> That was a fast answer. I would never find that myself.

You might actually want:

>>> import cgi
>>> cgi.escape(u'São Paulo & Espírito Santo').encode('ascii', 'xmlcharrefreplace')
'S&#227;o Paulo &amp; Esp&#237;rito Santo'

as you have to be sure to escape any ampersands in your unicode
string before doing the encode.
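
The order matters: escaping after the encode would turn the `&` of every `&#227;` into `&amp;`. A sketch of the same two steps for Python 3 follows, assuming `html.escape` from the standard library as the stand-in for `cgi.escape`, which later Python versions removed. The name `escape_and_encode` is hypothetical.

```python
import html


def escape_and_encode(text):
    # Step 1: escape &, < and > while the string is still plain text.
    # quote=False mirrors the default behaviour of the old cgi.escape.
    escaped = html.escape(text, quote=False)
    # Step 2: turn the remaining non-ASCII characters into numeric references.
    return escaped.encode('ascii', 'xmlcharrefreplace').decode('ascii')


print(escape_and_encode('São Paulo & Espírito Santo'))
# S&#227;o Paulo &amp; Esp&#237;rito Santo
```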

**Tommy Nordgren** · May 30 '07, 11:55 AM

Re: Unicode to HTML entities

On 29 May 2007, at 17.52, Clodoaldo wrote:

> I was looking for a function to transform a unicode string into
> htmlentities. Not only the usual html escaping thing but all
> characters.
>
> As I didn't find one, I wrote my own:
>
> # -*- coding: utf-8 -*-
> from htmlentitydefs import codepoint2name
>
> def unicode2htmlentities(u):
>
>     htmlentities = list()
>
>     for c in u:
>         if ord(c) < 128:
>             htmlentities.append(c)
>         else:
>             htmlentities.append('&%s;' % codepoint2name[ord(c)])
>
>     return ''.join(htmlentities)
>
> print unicode2htmlentities(u'São Paulo')
>
> Is there a function like that in one of python builtin modules? If not
> is there a better way to do it?
>
> Regards, Clodoaldo Pinto Neto

In many cases, the need to use html/xhtml entities can be avoided by
generating utf8-coded pages.
------------------------------------------------------
"Home is not where you are born, but where your heart finds peace" -
Tommy Nordgren, "The dying old crone"
tommy.nordgren@comhem.se
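
If named entities are wanted rather than numeric references, the function quoted in this post can be sketched for Python 3 as below. This assumes the Python 3 module `html.entities` (the renamed `htmlentitydefs`) and adds one thing the quoted code does not have: a numeric fallback for characters with no named entity, which would otherwise raise `KeyError`. The function name is hypothetical. Like the quoted version, it leaves ASCII characters, including `&`, untouched.

```python
from html.entities import codepoint2name


def unicode_to_named_entities(text):
    parts = []
    for ch in text:
        code = ord(ch)
        if code < 128:
            # Plain ASCII passes through unchanged.
            parts.append(ch)
        elif code in codepoint2name:
            # A named entity exists, e.g. 227 -> 'atilde'.
            parts.append('&%s;' % codepoint2name[code])
        else:
            # No named entity: fall back to a numeric character reference.
            parts.append('&#%d;' % code)
    return ''.join(parts)


print(unicode_to_named_entities('São Paulo'))  # S&atilde;o Paulo
```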

**Clodoaldo** · May 30 '07, 12:55 PM

Re: Unicode to HTML entities

On May 30, 8:53 am, Tommy Nordgren <tommy.nordg...@comhem.se> wrote:

> On 29 May 2007, at 17.52, Clodoaldo wrote:
>
> > I was looking for a function to transform a unicode string into
> > htmlentities. Not only the usual html escaping thing but all
> > characters.
> >
> > As I didn't find one, I wrote my own:
> >
> > # -*- coding: utf-8 -*-
> > from htmlentitydefs import codepoint2name
> >
> > def unicode2htmlentities(u):
> >
> >     htmlentities = list()
> >
> >     for c in u:
> >         if ord(c) < 128:
> >             htmlentities.append(c)
> >         else:
> >             htmlentities.append('&%s;' % codepoint2name[ord(c)])
> >
> >     return ''.join(htmlentities)
> >
> > print unicode2htmlentities(u'São Paulo')
> >
> > Is there a function like that in one of python builtin modules? If not
> > is there a better way to do it?
> >
> > Regards, Clodoaldo Pinto Neto
>
> In many cases, the need to use html/xhtml entities can be avoided by
> generating utf8-coded pages.

Sure. All my pages are utf-8 encoded. The case I'm dealing with is an
email link whose subject has non-ASCII characters, as in:

<a href=mailto:example@sample.com?subject=Dúvidas>Mail to</a>

Somehow, when the user clicks on the link, the subject reaches his email
client with the non-ASCII chars as garbage.

And before someone points out that I should not expose email addresses,
the email is only linked with the consent of the owner and the source
is obfuscated to make it harder for a robot to harvest it.

Regards, Clodoaldo
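
One possible cause, offered as a guess rather than a diagnosis, is that the subject sits in the URL unencoded. A sketch of percent-encoding it as UTF-8 follows, assuming Python 3 and `urllib.parse.quote`; the helper name `mailto_href` is hypothetical, and whether the garbage disappears still depends on the mail client decoding the escapes as UTF-8.

```python
from urllib.parse import quote


def mailto_href(address, subject):
    # Percent-encode the subject as UTF-8 so the URL contains only ASCII.
    # safe='' also encodes '/', which quote() leaves alone by default.
    return 'mailto:%s?subject=%s' % (address, quote(subject, safe=''))


print(mailto_href('example@sample.com', 'Dúvidas'))
# mailto:example@sample.com?subject=D%C3%BAvidas
```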

**Clodoaldo** · May 30 '07, 12:55 PM

Re: Unicode to HTML entities

On May 30, 4:25 am, Duncan Booth <duncan.bo...@invalid.invalid> wrote:

> Clodoaldo <clodoaldo.pi...@gmail.com> wrote:
>
> > On May 29, 12:57 pm, "Richard Brodie" <R.Bro...@rl.ac.uk> wrote:
> >
> > > "Clodoaldo" <clodoaldo.pi...@gmail.com> wrote in message
> > > news:1180453921.357081.89500@n15g2000prd.googlegroups.com...
> > >
> > > > I was looking for a function to transform a unicode string into
> > > > htmlentities.
> > >
> > > >>> u'São Paulo'.encode('ascii', 'xmlcharrefreplace')
> > > 'S&#227;o Paulo'
> >
> > That was a fast answer. I would never find that myself.
>
> You might actually want:
>
> >>> import cgi
> >>> cgi.escape(u'São Paulo & Espírito Santo').encode('ascii', 'xmlcharrefreplace')
> 'S&#227;o Paulo &amp; Esp&#237;rito Santo'
>
> as you have to be sure to escape any ampersands in your unicode
> string before doing the encode.

I will do it. Thanks.

Regards, Clodoaldo.
