Regex to replace invalid XML string

  • girishk
    New Member
    • Mar 2008
    • 1

    I am getting an XML RSS feed, but I am finding invalid HTML tags in it. The string I receive looks like this:

    <channel>
    <item>
    <category></category>
    <link>www.google.com</link>
    <title>Google Home Page</title>
    <description>This is a google home page</description>
    <pubDate>Thu, 27 Mar 2008</pubDate>
    </item>

    <item>
    <category></category>
    <link>www.msn.com</link>
    <title>Microsoft Home Page</title>
    <description>This is microsoft home page <hl2 </description>
    <pubDate>Thu, 27 Mar 2008</pubDate>
    </item>
    </channel>

    Note the occurrence of the tag-like fragment <hl2 in the second description tag. Is there a regular expression that searches for '<' characters inside the description tags? That is, I should be able to check for '<' in every description tag in the XML string and replace it with string.Empty.

    I would need the code in a .NET language.

    Any help would be greatly appreciated.

    Girish.
  • kenobewan
    Recognized Expert Specialist
    • Dec 2006
    • 4871

    #2
    My first action would be to complain; maybe you're not the only one facing this problem. Then, for existing files, maybe use an XML validator. HTH.
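
    For the regex part of the question, here is a minimal sketch of one way to do it, written in Python only so it is easy to try out. The helper names strip_stray_lt and is_well_formed are made up for this example, and it assumes description bodies are meant to be plain text (no nested markup or CDATA, which this would also strip). The same pattern should carry over to .NET with Regex.Replace, a MatchEvaluator and RegexOptions.Singleline.

```python
import re
import xml.etree.ElementTree as ET

# Matches one <description> element; group 2 is its raw body.
# DOTALL lets the body span several lines, and the lazy quantifier
# stops at the first closing </description> tag.
DESCRIPTION = re.compile(r"(<description>)(.*?)(</description>)", re.DOTALL)


def strip_stray_lt(xml_text):
    """Remove every '<' found inside <description> bodies (hypothetical helper)."""
    def clean(match):
        body = match.group(2).replace("<", "")
        return match.group(1) + body + match.group(3)
    return DESCRIPTION.sub(clean, xml_text)


def is_well_formed(xml_text):
    """Return True if a real XML parser accepts the text (hypothetical helper)."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Trimmed copy of the second item from the question.
feed = """<channel>
<item>
<link>www.msn.com</link>
<title>Microsoft Home Page</title>
<description>This is microsoft home page <hl2 </description>
<pubDate>Thu, 27 Mar 2008</pubDate>
</item>
</channel>"""

fixed = strip_stray_lt(feed)
print(is_well_formed(feed))   # parser check on the raw feed
print(is_well_formed(fixed))  # parser check after the cleanup
```

    Running the result through a parser afterwards is the "validator" step: the regex only handles stray '<' characters, so other problems (an unescaped '&', for example) would still show up there.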
