Strip CDATA with regex

  • Balaras

    Strip CDATA with regex

    Hi,

    Can somebody here please help me a bit with a regular expression?
    I have an XML file where I need to strip the CDATA sections of any
    contained data.

    Eg.
    <xml>
    <tag><[CDATA[ some data ]]></tag>
    <tag><[CDATA[ some more data ]]></tag>
    </xml>

    Should end up like this:
    <xml>
    <tag><[CDATA[]]></tag>
    <tag><[CDATA[]]></tag>
    </xml>

    Now, I have the start and end of the range
    (\[CDATA\[)
    and
    (\]\]>)

    But I cannot figure out how to match any run of characters that does
    not contain the end of the range.

    That is, > is OK and ] is OK,
    but ]]> is not OK.

    Thanks in advance,
    Balaras
  • Martin Honnen

    #2
    Re: Strip CDATA with regex



    Balaras wrote:

    > Can somebody here please help me a bit with a regular expression?
    > I have an XML file where I need to strip the CDATA sections of any
    > contained data.
    >
    > Eg.
    > <xml>
    > <tag><[CDATA[ some data ]]></tag>

    It should be
    <![CDATA[

    > <tag><[CDATA[ some more data ]]></tag>
    > </xml>
    >
    > Should end up like this:
    > <xml>
    > <tag><[CDATA[]]></tag>
    > <tag><[CDATA[]]></tag>
    > </xml>

    How about parsing the XML into a DOM document, manipulating the
    CDATA section nodes, and serializing it back? A Mozilla example:

    var xmlMarkup = [
      '<xml>',
      '<tag><![CDATA[ some data ]]></tag>',
      '<tag><![CDATA[ some more data ]]></tag>',
      '</xml>'
    ].join('\r\n');

    // Parse the markup into an XML DOM document.
    var xmlDocument = new DOMParser().parseFromString(xmlMarkup,
      'application/xml');

    // Empty the CDATA section that is the first child of each <tag>.
    var tagElements = xmlDocument.getElementsByTagName('tag');
    for (var i = 0; i < tagElements.length; i++) {
      var cdataSection = tagElements[i].firstChild;
      // nodeType 4 is CDATA_SECTION_NODE; skip elements with no children.
      if (cdataSection && cdataSection.nodeType == 4) {
        cdataSection.data = '';
      }
    }

    // Serialize the modified document back to markup.
    var newXmlMarkup = new XMLSerializer().serializeToString(xmlDocument);

    That yields

    <xml>
    <tag><![CDATA[]]></tag>
    <tag><![CDATA[]]></tag>
    </xml>
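
If a regular expression is still preferred, as the original question asked, one option is a lazy quantifier that stops at the first ]]> it meets. The following is only a sketch and not part of the original reply: the function name emptyCdataSections is hypothetical, and it assumes the corrected <![CDATA[ opener. CDATA sections cannot nest, so "everything up to the first ]]>" is the whole section. The pattern does not know about comments or processing instructions that might contain the same text.

```javascript
// Hypothetical helper, not from the thread: empty every CDATA section in a string.
function emptyCdataSections(xmlMarkup) {
  // <!\[CDATA\[  literal section opener
  // [\s\S]*?     any characters, including newlines, as few as possible
  // \]\]>        the first section terminator that follows
  return xmlMarkup.replace(/<!\[CDATA\[[\s\S]*?\]\]>/g, '<![CDATA[]]>');
}

var sample = [
  '<xml>',
  '<tag><![CDATA[ some data ]]></tag>',
  '<tag><![CDATA[ a ] b > c ]] d ]]></tag>',
  '</xml>'
].join('\n');

var stripped = emptyCdataSections(sample);
```

A single ] or > inside the section is consumed by [\s\S]*?, and only the full ]]> sequence ends the match. The same lazy-quantifier syntax is also accepted by PCRE-style engines.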


    --

    Martin Honnen

    • Balaras

      #3
      Re: Strip CDATA with regex

      Thanks Martin,

      Actually I posted this to c.l.javascript by accident; it was meant for a
      PHP group. I have to do some preprocessing before the XML is sent to the
      client.

      However your post helped me in another manner :)
      > var newXmlMarkup = new XMLSerializer().serializeToString(xmlDocument);

      I did not know about the XMLSerializer, and I need it :)

      Does IE have an equivalent, or does .innerHTML return valid XML?

      /Balaras

      • Martin Honnen

        #4
        Re: Strip CDATA with regex



        Balaras wrote:

        >> var newXmlMarkup = new XMLSerializer().serializeToString(xmlDocument);
        >
        > I did not know about the XMLSerializer, and I need it :)
        >
        > Does IE have an equivalent, or does .innerHTML return valid XML?

        With IE, an XML DOM document (or any XML DOM node) has a property
        named xml that gives you the serialized markup, so with IE/MSXML you
        can use
        xmlDocument.xml
        to get the markup.
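
The two approaches in this thread can be combined with feature detection. This is a minimal sketch, not something posted in the thread: the function name serializeXml is hypothetical, and it assumes that only MSXML nodes expose a string-valued xml property.

```javascript
// Hypothetical helper: serialize an XML DOM node to markup.
function serializeXml(node) {
  // IE/MSXML nodes expose the serialized markup as a string property.
  if (typeof node.xml === 'string') {
    return node.xml;
  }
  // Mozilla and other browsers that implement XMLSerializer.
  if (typeof XMLSerializer !== 'undefined') {
    return new XMLSerializer().serializeToString(node);
  }
  throw new Error('No XML serializer available for this node');
}
```

Calling serializeXml(xmlDocument) would then stand in for either xmlDocument.xml or the XMLSerializer call, depending on what the browser provides.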

        --

        Martin Honnen
