XML string parsing containing HTML content

  • gagandeepgupta16
    New Member
    • Feb 2007
    • 56

    XML string parsing containing HTML content

    I am having a problem parsing a string that contains HTML tags.

    The situation is similar to the following, as quoted in another forum:


    if (typeof DOMParser == "undefined") {
        // Fallback for browsers without a native DOMParser
        DOMParser = function () {};
        DOMParser.prototype.parseFromString = function (str, contentType) {
            if (typeof ActiveXObject != "undefined") {
                // Internet Explorer: parse with MSXML
                var d = new ActiveXObject("Microsoft.XMLDOM");
                d.async = false; // load synchronously
                d.loadXML(str);
                return d;
            } else if (typeof XMLHttpRequest != "undefined") {
                // Other browsers: load the string through a data: URL
                var req = new XMLHttpRequest();
                req.open("GET", "data:" + (contentType || "application/xml") + ";charset=utf-8," + encodeURIComponent(str), false);
                if (req.overrideMimeType) {
                    req.overrideMimeType(contentType);
                }
                req.send(null);
                return req.responseXML;
            }
        };
    }
    var xml = (new DOMParser()).parseFromString(text, "text/xml");


    In my code, which is similar to the above, I am also not getting a parsed value; the result is null.

    Any suggestions would be appreciated.

    thanks
  • jkmyoung
    Recognized Expert Top Contributor
    • Mar 2006
    • 2057

    #2
    My suggestion is to break it down step by step and figure out where it is failing.

    1. Is the XML string parsed correctly?
    2. Is the URL that you are connecting to working?
    3. Are you connecting to the URL correctly?
    4. Are you sending the correct output?
    5. Is the request returning anything?
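
    For the first check, a small helper can report whether the parser accepted the string. This is only a sketch: getParseError is a hypothetical name, and it assumes the two usual conventions, MSXML's parseError object (errorCode, reason) and the <parsererror> element that other browsers typically place in the returned document.

```javascript
// Hypothetical helper: returns an error message, or null if the
// document appears to have parsed cleanly.
function getParseError(doc) {
  if (!doc) {
    return "parser returned no document";
  }
  // MSXML (ActiveXObject) reports failures through doc.parseError.
  if (doc.parseError && doc.parseError.errorCode !== 0) {
    return doc.parseError.reason;
  }
  // Other browsers typically return a document containing <parsererror>.
  if (typeof doc.getElementsByTagName === "function") {
    var errors = doc.getElementsByTagName("parsererror");
    if (errors.length > 0) {
      return errors[0].textContent || "unknown parse error";
    }
  }
  return null;
}
```

    Calling getParseError(xml) right after parseFromString would show whether the failure is in the parsing step or somewhere later.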


    • gagandeepgupta16
      New Member
      • Feb 2007
      • 56

      #3
      Hi jkmyoung

      Thanks for the reply.
      I did try breaking the text into multiple lines, and I also added code to report the cause: parseError.reason returns "character '>' is expected".

      The text actually contains the HTML source code, including the "<!Doctype..." tag.

      In that same tag there are two strings in double quotes, one after the other, something like

      <!Doctype... "http://..." "some text" >
      and the error position is between the two strings, which breaks the valid XML format.

      But this is what I need as input. Is there a way I can skip the format validation, or convert the HTML into a single XML element?

      Thanks

      • jkmyoung
        Recognized Expert Top Contributor
        • Mar 2006
        • 2057

        #4
        So the doctype is something like <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">?

        You could take the string and remove the line declaring the doctype, since the parser seems to be having a problem with it, and then load the result into an XML object.
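
        A minimal sketch of removing the doctype before parsing. stripDoctype is a hypothetical helper name, and the regular expression assumes the declaration contains no ">" inside its quoted strings.

```javascript
// Hypothetical helper: removes a leading <!DOCTYPE ...> declaration
// (matched case-insensitively), including an optional internal
// subset in square brackets, plus any whitespace that follows it.
function stripDoctype(text) {
  return text.replace(/<!DOCTYPE\s[^>\[]*(\[[^\]]*\])?[^>]*>\s*/i, "");
}

var source = '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" ' +
  '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">\n<html><body/></html>';
var cleaned = stripDoctype(source); // "<html><body/></html>"
```

        The cleaned string can then be passed to parseFromString as before. Note that the rest of the HTML still has to be well-formed XML for the parse to succeed.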

        Alternatively, the problem could be that the doctype is not formatted correctly.
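
        On the other option raised in the question, carrying the HTML as the content of a single XML element: one common approach is to wrap it in a CDATA section, so the parser treats it as plain text and never looks at the doctype. This is a sketch only; wrapInCdata and the element name "page" are made up for illustration, and it assumes you control how the XML string is built.

```javascript
// Hypothetical helper: wraps raw HTML in a CDATA section inside one element.
function wrapInCdata(elementName, html) {
  // "]]>" would end the CDATA section early, so split it across two sections.
  var safe = html.split("]]>").join("]]]]><![CDATA[>");
  return "<" + elementName + "><![CDATA[" + safe + "]]></" + elementName + ">";
}

var wrapped = wrapInCdata("page", "<p>hi</p>");
// "<page><![CDATA[<p>hi</p>]]></page>"
```

        After parsing, the original HTML comes back as the text content of that element rather than as a tree of XML nodes.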
