Regular expression to identify HTMLEncoded string

  • Gabriela

    Regular expression to identify HTMLEncoded string

    Hi,
    I need help with writing a regexp that identifies HTML encoded
    strings.
    The problem occurred because I have a field in the DB, that contains
    regular ASCII chars, as well as HTMLencoded strings (e.g.:
    &#1494;&#1488;&#1514; &#1500;&#1488;).
    Is there a quick way to determine which strings are HTML encoded?
    Thanks,
    Gabi.
  • Evertjan.

    #2
    Re: Regular expression to identify HTMLEncoded string

    Gabriela wrote on 03 nov 2008 in microsoft.public.inetserver.asp.general:
    > Hi,
    > I need help with writing a regexp that identifies HTML encoded
    > strings.
    > The problem occurred because I have a field in the DB, that contains
    > regular ASCII chars, as well as HTMLencoded strings (e.g.:
    > &#1494;&#1488;&#1514; &#1500;&#1488;).

    These all look to me like regular ASCII chars,
    as there are no irregular ASCII chars.

    > Is there a quick way to determine which strings are HTML encoded?

    // Note the "#": a numeric character reference looks like &#1494;
    var bolResult = /&#\d{4};/.test(str);

    perhaps?

    By the way, a JavaScript string is in Unicode and can contain non-ASCII chars.
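
The one-line test above only catches four-digit decimal references. A slightly broader sketch, in case the field also holds hexadecimal or named references, might look like this. The names `ENTITY_PATTERN`, `looksHtmlEncoded` and `hasNonAscii` are made up for illustration, and the named-entity branch is only a guess at the shape of a name, so text such as "AT&T;" would be a false positive.

```javascript
// Matches decimal (&#1494;), hexadecimal (&#x5D6;) and named (&nbsp;) references.
// Assumption: any "&name;" counts as a named entity; no entity table is consulted.
var ENTITY_PATTERN = /&(?:#\d+|#x[0-9a-f]+|[a-z][a-z0-9]*);/i;

// Hypothetical helper: true if the string contains at least one reference.
function looksHtmlEncoded(str) {
  return ENTITY_PATTERN.test(str);
}

// Hypothetical helper: true if the string contains any character outside ASCII,
// i.e. text that was stored raw instead of being encoded.
function hasNonAscii(str) {
  return /[^\x00-\x7F]/.test(str);
}
```

Running both checks over each field value would separate encoded values, raw non-ASCII values, and plain ASCII values that need no treatment.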

    --
    Evertjan.
    The Netherlands.
    (Please change the x'es to dots in my emailaddress)


    • Anthony Jones

      #3
      Re: Regular expression to identify HTMLEncoded string

      "Gabriela" <frohlinger@yahoo.com> wrote in message
      news:4954b993-5b7b-4e47-a6fc-664decfedef5@40g2000prx.googlegroups.com...
      > Hi,
      > I need help with writing a regexp that identifies HTML encoded
      > strings.
      > The problem occurred because I have a field in the DB, that contains
      > regular ASCII chars, as well as HTMLencoded strings (e.g.:
      > &#1494;&#1488;&#1514; &#1500;&#1488;).
      > Is there a quick way to determine which strings are HTML encoded?

      Are you sure they're not all HTML encoded? (That is, are there any that
      contain characters that would normally be encoded but have not been?)
      Do you know how they came to have this encoding?
      Are there any HTML-specific entities such as &nbsp;, or are they all from
      the simple XML set?
      What is the DB field's data type?

      Why do you want to detect them? Is it because you want to convert the
      strings back?

      If there are no HTML-specific entities, and it's true that there are no
      values where characters that would normally be encoded aren't, then:

      Dim oXML : Set oXML = CreateObject("MSXML2.DOMDocument.3.0")
      oXML.LoadXML "<root>" & sFieldValue & "</root>"

      sDecoded = oXML.documentElement.text
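
If the decoding has to happen in script rather than through an XML parser, a rough JavaScript equivalent for numeric references could look like the sketch below. `decodeNumericEntities` is a hypothetical name, and unlike the XML approach it deliberately leaves `&amp;`, `&lt;` and other named entities alone, so it is only a partial substitute. It assumes an engine that provides `String.fromCodePoint`.

```javascript
// Hypothetical helper: decodes decimal and hexadecimal numeric character
// references; named entities such as &nbsp; or &amp; are left untouched.
function decodeNumericEntities(str) {
  return str.replace(/&#(?:x([0-9a-f]+)|(\d+));/gi, function (match, hex, dec) {
    var codePoint = hex ? parseInt(hex, 16) : parseInt(dec, 10);
    // Values beyond the Unicode range are kept as written instead of throwing.
    if (codePoint > 0x10FFFF) {
      return match;
    }
    return String.fromCodePoint(codePoint);
  });
}
```

Applied to the example from the original question, `decodeNumericEntities("&#1494;&#1488;&#1514; &#1500;&#1488;")` gives back the Hebrew text זאת לא.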

      --
      Anthony Jones - MVP ASP/ASP.NET
