Previewing user input HTML

  • Steve Swift

    Previewing user input HTML

    I have a page that accepts user input, including HTML. I would like to
    offer a preview of what the user's HTML will look like, but I'd also like
    to avoid having to parse their HTML to ensure that it is valid.

    The sorts of things that cause problems are unmatched quotes inside the
    HTML and mismatched <>'s around the HTML. There are probably others
    (thus demonstrating why I need to avoid parsing it).

    The mismatched <>'s are not too difficult - I can add a ">" of my own,
    but then it will be visible.

    I realise we are into the land of handling invalid HTML, so all bets are
    off, but is there any good approach to such a problem?

    If I do end up parsing the user's HTML, do I need to worry about more
    than mismatched <>'s and quotes (inside the <>'s)? Remember, I don't
    actually care what it looks like, as long as it doesn't upset my own
    HTML which follows the preview.

    --
    Steve Swift


  • Ben C

    #2
    Re: Previewing user input HTML

    On 2008-09-30, Steve Swift <Steve.J.Swift@gmail.com> wrote:
    > I have a page that accepts user input, including HTML. I would like to
    > offer a preview of what the user's HTML will look like, but I'd also like
    > to avoid having to parse their HTML to ensure that it is valid.
    <snip>
    > ... Remember, I don't
    > actually care what it looks like, as long as it doesn't upset my own
    > HTML which follows the preview.

    I think if you use innerHTML, your own HTML will probably be OK.

    The browser will parse their garbage to create a subtree for the element
    whose innerHTML you're setting, and then attach that subtree to your DOM
    tree. It won't paste their garbage into your HTML and parse the whole
    lot again.

    To be absolutely sure, you could parse their input before attaching it
    to your DOM tree.

    Something like:

    var div = document.createElement("div"); // unattached node
    div.innerHTML = userGarbage;             // parsed into a subtree of div only

    Then use appendChild to attach the div into your DOM tree.

    But I don't think that will be necessary.
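
    A minimal sketch of the approach above, wrapped in two helper functions.
    The names buildPreview and showPreview, the doc parameter (pass the
    page's document) and the "preview" id in the usage line are assumptions
    made for illustration; none of them come from the original post.

    ```javascript
    // Hypothetical helper: let the browser parse the user's HTML into an
    // unattached container, so its normal error recovery deals with any
    // unclosed tags or quotes inside that container only.
    function buildPreview(userHtml, doc) {
      var div = doc.createElement("div"); // unattached node
      div.innerHTML = userHtml;           // parsed as a fragment of div
      return div;
    }

    // Hypothetical helper: empty an existing container element and attach
    // a freshly parsed preview to it.
    function showPreview(container, userHtml, doc) {
      while (container.firstChild) {
        container.removeChild(container.firstChild);
      }
      container.appendChild(buildPreview(userHtml, doc));
    }
    ```

    In a page this might be called as
    showPreview(document.getElementById("preview"), textarea.value, document).
    Note that this only keeps unbalanced markup from leaking into the rest
    of the page. It does not stop script in the input from running: an
    onerror attribute on an img, for example, can fire even while the node
    is still unattached.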


    • Ben Bacarisse

      #3
      Re: Previewing user input HTML

      Steve Swift <Steve.J.Swift@gmail.com> writes:
      > I have a page that accepts user input, including HTML. I would like to
      > offer a preview of what the user's HTML will look like, but I'd also
      > like to avoid having to parse their HTML to ensure that it is valid.
      <snip>
      > ... Remember, I don't
      > actually care what it looks like, as long as it doesn't upset my own
      > HTML which follows the preview.

      Can you side-step the problem by keeping the user HTML separate and
      displaying it using an <object> element?

      --
      Ben.
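
      One way this could look on the client, as a rough sketch: encode the
      user's HTML as a data: URL and point an <object> element at it, so it
      loads as its own document. The function names, the doc parameter and
      the width/height values are assumptions for illustration, and browser
      support for data: URLs in <object> varies; serving the preview from a
      separate URL on the server is the other obvious route.

      ```javascript
      // Hypothetical helper: turn the user's HTML into a data: URL so it is
      // loaded as a separate document, not as part of the host page's markup.
      function toDataUrl(userHtml) {
        return "data:text/html;charset=utf-8," + encodeURIComponent(userHtml);
      }

      // Hypothetical helper: build an <object> element that displays the
      // user's HTML. The caller attaches it with appendChild.
      function buildObjectPreview(userHtml, doc) {
        var obj = doc.createElement("object");
        obj.setAttribute("type", "text/html");
        obj.setAttribute("data", toDataUrl(userHtml));
        obj.setAttribute("width", "100%");  // arbitrary size for the sketch
        obj.setAttribute("height", "300");
        return obj;
      }
      ```

      Because the input never becomes part of the host page's source, an
      unclosed tag or quote in it cannot swallow the markup that follows.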


      • David Stone

        #4
        Re: Previewing user input HTML

        In article <slrnge3n19.3s0.spamspam@bowser.marioworld>,
        Ben C <spamspam@spam.eggs> wrote:
        <snip>
        > To be absolutely sure, you could parse their input before attaching it
        > to your DOM tree.
        <snip>
        > But I don't think that will be necessary.

        I don't know about that, but it seems to me that you will need to run the
        user-provided HTML through something first, just to ensure that no
        malicious code has been inserted that could pose a security risk. I
        believe the Perl CGI module has a function or functions you can use to
        do this, and I would be willing to bet you can find equivalent JS tools.

        Which leads to the thought that, since you're going to have to
        pre-process the user HTML anyway, maybe you could also pipe it through
        something like HTML Tidy (I think that's its name)?
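
        The post does not name the Perl function. CGI.pm does provide
        escapeHTML, which may or may not be the one meant. As a sketch of the
        JS equivalent of that idea (escapeHtml is a hypothetical name), the
        function below turns markup characters into entities so the input
        shows up as literal text. That is the strictest form of "running it
        through something": it suits echoing the raw source next to a
        preview, but it is not a sanitizer for HTML that is meant to render.

        ```javascript
        // Hypothetical helper: replace the characters that are significant
        // in HTML with entities. "&" must be handled first so the entities
        // produced by the later replacements are not escaped a second time.
        function escapeHtml(text) {
          return String(text)
            .replace(/&/g, "&amp;")
            .replace(/</g, "&lt;")
            .replace(/>/g, "&gt;")
            .replace(/"/g, "&quot;")
            .replace(/'/g, "&#39;");
        }
        ```

        Letting some tags through while blocking script needs an allowlist
        of elements and attributes, which is a larger job than this sketch
        attempts.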


        • Ben C

          #5
          Re: Previewing user input HTML

          On 2008-09-30, David Stone <no.email@domain.invalid> wrote:
          <snip>
          > I don't know about that, but it seems to me that you will need to run the
          > user-provided HTML through something first, just to ensure that no
          > malicious code has been inserted that could pose a security risk.
          <snip>
          > Which leads to the thought that, since you're going to have to
          > pre-process the user HTML anyway, maybe you could also pipe it through
          > something like HTML Tidy (I think that's its name)?

          I think we're thinking about different things. Perhaps because he used
          the word "preview" I got it into my head that this user HTML was not
          going back to the server (wiki style) but being added to the page there
          and then with JS on the client.

          The idea of innerHTML is that you're using the browser's own normal
          broken HTML handling to deal with things, and it's basically all you've
          got on the client.

          But it's much more likely that the data is going back to the server, in
          which case yes you could run it through tidy and other checkers like
          that easily.
