Unicode to HTML entities

  • Clodoaldo

    Unicode to HTML entities

    I was looking for a function to transform a unicode string into
    HTML entities: not only the usual HTML escaping, but all
    non-ASCII characters.

    As I didn't find one, I wrote my own:

    # -*- coding: utf-8 -*-
    from htmlentitydefs import codepoint2name

    def unicode2htmlentities(u):

        htmlentities = list()

        for c in u:
            if ord(c) < 128:
                htmlentities.append(c)
            else:
                htmlentities.append('&%s;' % codepoint2name[ord(c)])

        return ''.join(htmlentities)

    print unicode2htmlentities(u'São Paulo')

    Is there a function like that in one of Python's built-in modules? If not,
    is there a better way to do it?

    Regards, Clodoaldo Pinto Neto
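
    For readers on Python 3, the function above can be sketched roughly as
    follows. This is an adaptation, not the poster's code: it assumes the
    Python 3 module name html.entities (the successor of htmlentitydefs), the
    name unicode_to_html_entities is made up for illustration, and the numeric
    fallback for characters without a named entity is an addition, since a
    plain dictionary lookup would raise KeyError for them.

```python
from html.entities import codepoint2name


def unicode_to_html_entities(text):
    """Replace every non-ASCII character with an HTML entity reference."""
    parts = []
    for ch in text:
        code = ord(ch)
        if code < 128:
            # ASCII characters pass through unchanged.
            parts.append(ch)
        elif code in codepoint2name:
            # Use the named entity when one is defined.
            parts.append('&%s;' % codepoint2name[code])
        else:
            # Assumption: fall back to a numeric reference when no name exists.
            parts.append('&#%d;' % code)
    return ''.join(parts)


print(unicode_to_html_entities('São Paulo'))  # S&atilde;o Paulo
```

    Note that this does not escape &, < or > in the input; it only rewrites
    non-ASCII characters.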

  • Richard Brodie

    #2
    Re: Unicode to HTML entities

    "Clodoaldo" <clodoaldo.pinto@gmail.com> wrote in message
    news:1180453921.357081.89500@n15g2000prd.googlegroups.com...
    > I was looking for a function to transform a unicode string into
    > htmlentities.

    >>> u'São Paulo'.encode('ascii', 'xmlcharrefreplace')
    'S&#227;o Paulo'
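
    The same error handler exists in Python 3, where str.encode returns
    bytes rather than a string. A minimal sketch, assuming the result is
    wanted back as text:

```python
text = 'São Paulo'

# 'xmlcharrefreplace' turns each character that ASCII cannot represent
# into a numeric character reference such as &#227;.
encoded = text.encode('ascii', 'xmlcharrefreplace')  # bytes in Python 3

# Decode back to str if the result is to be embedded in a text template.
as_str = encoded.decode('ascii')
print(as_str)  # S&#227;o Paulo
```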



    • Clodoaldo

      #3
      Re: Unicode to HTML entities

      On May 29, 12:57 pm, "Richard Brodie" <R.Bro...@rl.ac.uk> wrote:
      > "Clodoaldo" <clodoaldo.pi...@gmail.com> wrote in message
      > news:1180453921.357081.89500@n15g2000prd.googlegroups.com...
      >
      > > I was looking for a function to transform a unicode string into
      > > htmlentities.
      >
      > >>> u'São Paulo'.encode('ascii', 'xmlcharrefreplace')
      > 'S&#227;o Paulo'

      That was a fast answer. I would never have found that myself.

      Thanks, Clodoaldo


      • Duncan Booth

        #4
        Re: Unicode to HTML entities

        Clodoaldo <clodoaldo.pinto@gmail.com> wrote:
        > On May 29, 12:57 pm, "Richard Brodie" <R.Bro...@rl.ac.uk> wrote:
        > > "Clodoaldo" <clodoaldo.pi...@gmail.com> wrote in message
        > > news:1180453921.357081.89500@n15g2000prd.googlegroups.com...
        > >
        > > > I was looking for a function to transform a unicode string into
        > > > htmlentities.
        > >
        > > >>> u'São Paulo'.encode('ascii', 'xmlcharrefreplace')
        > > 'S&#227;o Paulo'
        >
        > That was a fast answer. I would never have found that myself.

        You might actually want:

        >>> import cgi
        >>> cgi.escape(u'São Paulo & Espírito Santo').encode('ascii', 'xmlcharrefreplace')
        'S&#227;o Paulo &amp; Esp&#237;rito Santo'

        as you have to be sure to escape any ampersands in your unicode
        string before doing the encode.
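
        Why the order matters can be seen in a small Python 3 sketch. It
        assumes html.escape as the stand-in for cgi.escape (which is not
        available in recent Python 3 releases); the function names
        to_ascii_html and wrong_order are made up for illustration.

```python
import html


def to_ascii_html(text):
    """Escape markup characters first, then replace non-ASCII characters."""
    escaped = html.escape(text, quote=False)
    return escaped.encode('ascii', 'xmlcharrefreplace').decode('ascii')


def wrong_order(text):
    """Encode first, then escape: the '&' of each reference gets escaped too."""
    refs = text.encode('ascii', 'xmlcharrefreplace').decode('ascii')
    return html.escape(refs, quote=False)


sample = 'São Paulo & Espírito Santo'
print(to_ascii_html(sample))  # S&#227;o Paulo &amp; Esp&#237;rito Santo
print(wrong_order(sample))    # S&amp;#227;o Paulo &amp; Esp&amp;#237;rito Santo
```

        In the second form a browser would display the literal text "&#227;"
        instead of the character.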


        • Tommy Nordgren

          #5
          Re: Unicode to HTML entities


          On 29 May 2007, at 17.52, Clodoaldo wrote:
          > I was looking for a function to transform a unicode string into
          > htmlentities. Not only the usual html escaping thing but all
          > characters.
          >
          > As I didn't find one, I wrote my own:
          >
          > # -*- coding: utf-8 -*-
          > from htmlentitydefs import codepoint2name
          >
          > def unicode2htmlentities(u):
          >
          >     htmlentities = list()
          >
          >     for c in u:
          >         if ord(c) < 128:
          >             htmlentities.append(c)
          >         else:
          >             htmlentities.append('&%s;' % codepoint2name[ord(c)])
          >
          >     return ''.join(htmlentities)
          >
          > print unicode2htmlentities(u'São Paulo')
          >
          > Is there a function like that in one of python builtin modules? If not
          > is there a better way to do it?
          >
          > Regards, Clodoaldo Pinto Neto

          In many cases, the need to use HTML/XHTML entities can be avoided by
          generating UTF-8 encoded pages.
          ------------------------------------------------------
          "Home is not where you are born, but where your heart finds peace" -
          Tommy Nordgren, "The dying old crone"
          tommy.nordgren@comhem.se
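
          As a rough illustration of that suggestion, a page can carry the
          characters directly if it is emitted as UTF-8 and says so. The
          helper name render_utf8_page and the page layout are assumptions
          made for this sketch; in practice the server should also send a
          matching charset in its Content-Type header.

```python
import html


def render_utf8_page(title, body_text):
    """Return a small HTML document as UTF-8 bytes (hypothetical helper)."""
    page = (
        '<!DOCTYPE html>\n'
        '<html><head><meta charset="utf-8">'
        '<title>%s</title></head>\n'
        '<body><p>%s</p></body></html>\n'
    ) % (html.escape(title), html.escape(body_text))
    # Markup characters are still escaped; non-ASCII text is left as-is
    # and simply encoded as UTF-8.
    return page.encode('utf-8')


data = render_utf8_page('Cidades', 'São Paulo & Espírito Santo')
print(data.decode('utf-8'))
```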



          • Clodoaldo

            #6
            Re: Unicode to HTML entities

            On May 30, 8:53 am, Tommy Nordgren <tommy.nordg...@comhem.se> wrote:
            > On 29 May 2007, at 17.52, Clodoaldo wrote:
            >
            > > I was looking for a function to transform a unicode string into
            > > htmlentities. Not only the usual html escaping thing but all
            > > characters.
            > >
            > > As I didn't find one, I wrote my own:
            > >
            > > # -*- coding: utf-8 -*-
            > > from htmlentitydefs import codepoint2name
            > >
            > > def unicode2htmlentities(u):
            > >
            > >     htmlentities = list()
            > >
            > >     for c in u:
            > >         if ord(c) < 128:
            > >             htmlentities.append(c)
            > >         else:
            > >             htmlentities.append('&%s;' % codepoint2name[ord(c)])
            > >
            > >     return ''.join(htmlentities)
            > >
            > > print unicode2htmlentities(u'São Paulo')
            > >
            > > Is there a function like that in one of python builtin modules? If not
            > > is there a better way to do it?
            > >
            > > Regards, Clodoaldo Pinto Neto
            >
            > In many cases, the need to use HTML/XHTML entities can be avoided by
            > generating UTF-8 encoded pages.

            Sure. All my pages are UTF-8 encoded. The case I'm dealing with is an
            email link whose subject has non-ASCII characters, as in:

            <a href=mailto:example@sample.com?subject=Dúvidas>Mail to</a>

            Somehow, when the user clicks on the link, the subject reaches his email
            client with the non-ASCII characters turned into garbage.

            And before someone points out that I should not expose email addresses:
            the email is only linked with the consent of the owner, and the source
            is obfuscated to make it harder for a robot to harvest it.

            Regards, Clodoaldo
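
            For a mailto link specifically, one approach worth trying is to
            percent-encode the subject as UTF-8 (the form described by the
            mailto URI specification, RFC 6068) rather than relying on HTML
            entities. The sketch below is in Python 3; the helper name
            mailto_link is made up for illustration, and whether a given
            mail client decodes the subject correctly is not something this
            thread confirms.

```python
import html
from urllib.parse import quote


def mailto_link(address, subject, label):
    """Build an <a> tag whose mailto subject is percent-encoded UTF-8."""
    href = 'mailto:%s?subject=%s' % (
        quote(address, safe='@'),   # keep the '@' readable in the address
        quote(subject, safe=''),    # encode everything outside unreserved chars
    )
    # Escape the attribute value and the link text for embedding in HTML.
    return '<a href="%s">%s</a>' % (html.escape(href), html.escape(label))


print(mailto_link('example@sample.com', 'Dúvidas', 'Mail to'))
# <a href="mailto:example@sample.com?subject=D%C3%BAvidas">Mail to</a>
```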


            • Clodoaldo

              #7
              Re: Unicode to HTML entities

              On May 30, 4:25 am, Duncan Booth <duncan.bo...@invalid.invalid> wrote:
              > Clodoaldo <clodoaldo.pi...@gmail.com> wrote:
              > > On May 29, 12:57 pm, "Richard Brodie" <R.Bro...@rl.ac.uk> wrote:
              > > > "Clodoaldo" <clodoaldo.pi...@gmail.com> wrote in message
              > > > news:1180453921.357081.89500@n15g2000prd.googlegroups.com...
              > > >
              > > > > I was looking for a function to transform a unicode string into
              > > > > htmlentities.
              > > >
              > > > >>> u'São Paulo'.encode('ascii', 'xmlcharrefreplace')
              > > > 'S&#227;o Paulo'
              > >
              > > That was a fast answer. I would never have found that myself.
              >
              > You might actually want:
              >
              > >>> cgi.escape(u'São Paulo & Espírito Santo').encode('ascii', 'xmlcharrefreplace')
              > 'S&#227;o Paulo &amp; Esp&#237;rito Santo'
              >
              > as you have to be sure to escape any ampersands in your unicode
              > string before doing the encode.

              I will do it. Thanks.

              Regards, Clodoaldo.
