get inner content with regular expression

  • Sami

    get inner content with regular expression

    How can I get the inner content of a tag with a regular expression?

    I couldn't get the opening and closing tags to match properly.

    Input
    "fjkdjfkdj <div>sadfdf dfdf <b>dfd</b>dfdf<div>nested<div>tags</div>.</div>
    </div>dfdfdf"

    Get content of the first div tag

    Output
    "sadfdf dfdf <b>dfd</b>dfdf<div>nested<div>tags</div>.</div>"


    thank you
    Sami

  • maximz2005

    #2
    Re: get inner content with regular expression

    On Aug 24, 3:17 pm, "Sami" <sam...@ymail.com> wrote:
    > How can I get the inner content of a tag with regular expression
    >
    > I couldn't get the opening and closing tags to match properly
    >
    > Input
    > "fjkdjfkdj <div>sadfdf dfdf <b>dfd</b>dfdf<div>nested<div>tags</div>.</div>
    > </div>dfdfdf"
    >
    > Get content of the first div tag
    >
    > Output
    > "sadfdf dfdf <b>dfd</b>dfdf<div>nested<div>tags</div>.</div>"
    >
    > thank you
    > Sami

    There is no way to do it using regular expressions other than
    hardcoding it, because there are several nested <div> and </div> tags
    inside the main <div> tag. In the program I'm building right now, I use
    a regex to find the content between two tags in HTML (if it were XML it
    would be much easier!), but I don't have multiple tags with the same name.
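
To make the limitation concrete, here is a minimal Python sketch (the thread itself names no language, and `inner_content` is a hypothetical helper, not code from any poster). It assumes a simple non-greedy pattern: that works when a tag name appears only once, but stops at the first closing tag when tags of the same name are nested.

```python
import re

def inner_content(html, tag):
    """Return the text between the first <tag> and the next </tag>, or None."""
    # Non-greedy match: stops at the FIRST closing tag it encounters.
    pattern = re.compile(
        r"<{0}\b[^>]*>(.*?)</{0}>".format(re.escape(tag)), re.DOTALL
    )
    match = pattern.search(html)
    return match.group(1) if match else None

flat = "abc <b>bold text</b> def"
nested = "x <div>a<div>b</div>c</div> y"

print(inner_content(flat, "b"))      # bold text
print(inner_content(nested, "div"))  # a<div>b   (cut off at the inner </div>)
```

Switching to a greedy `.*` does not fix this in general, since it would then run on to the last closing tag in the whole input, including unrelated sibling tags.
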
    Now, if your content is XML (it can be HTML, but it must be well-formed),
    there is a much easier approach: read it as an XML document and search
    for the correct tag node. Very simple. (To see whether your HTML fits,
    search for a well-formed HTML checker.)
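
A rough sketch of that XML approach, assuming Python's standard `xml.etree.ElementTree` module and the sample input from the question. The `<root>` wrapper and the `inner_xml` helper are assumptions added for illustration, because an XML document needs a single root element and the sample has text outside the first div.

```python
import xml.etree.ElementTree as ET

def inner_xml(element):
    """Serialize everything between an element's opening and closing tags."""
    # element.text is the text before the first child; ET.tostring() on a
    # child includes that child's trailing text (its "tail").
    parts = [element.text or ""]
    parts.extend(ET.tostring(child, encoding="unicode") for child in element)
    return "".join(parts)

source = (
    "fjkdjfkdj <div>sadfdf dfdf <b>dfd</b>dfdf"
    "<div>nested<div>tags</div>.</div>\n</div>dfdfdf"
)

# Wrap in a hypothetical <root> element so the fragment parses as one document.
root = ET.fromstring("<root>" + source + "</root>")
first_div = root.find(".//div")  # first <div> in document order

print(inner_xml(first_div))
```

This only works if the markup really is well-formed; an unclosed tag or an unescaped `&` makes `ET.fromstring` raise `ParseError`.
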


    • Jesse Houwing

      #3
      Re: get inner content with regular expression

      Hello maximz2005,
      You can use the HTML Agility Pack (on CodePlex) to read the HTML as if it were
      XML, and then you could easily get the contents you wanted. You can also use regex
      for this, though you'd end up using the more advanced constructs (the hardest
      ones to understand), like balancing groups. (More info here: http://blogs.msdn.com/bclteam/archiv...15/396452.aspx)
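
To illustrate the idea behind balanced matching, here is a small Python sketch. It is not the .NET balancing-group syntax: Python's `re` module has no such construct, so this swaps in an explicit depth counter over the tags that a regex finds. The `balanced_inner` name is hypothetical, and the sketch assumes there are no self-closing tags of the same name and no tags hidden inside comments or attribute values.

```python
import re

def balanced_inner(html, tag="div"):
    """Return the content of the first <tag>...</tag> pair, honoring nesting."""
    token = re.compile(
        r"<(/?){0}\b[^>]*>".format(re.escape(tag)), re.IGNORECASE
    )
    depth = 0
    start = None
    for m in token.finditer(html):
        if m.group(1) == "":          # opening tag: go one level deeper
            if depth == 0:
                start = m.end()       # content begins after the outermost open tag
            depth += 1
        elif depth > 0:               # closing tag that matches an open one
            depth -= 1
            if depth == 0:
                return html[start:m.start()]
    return None                       # no tag found, or tags never balanced

source = (
    "fjkdjfkdj <div>sadfdf dfdf <b>dfd</b>dfdf"
    "<div>nested<div>tags</div>.</div>\n</div>dfdfdf"
)
print(balanced_inner(source))
```

A balancing-group pattern does the same bookkeeping inside the regex engine: each opening tag pushes a capture, each closing tag pops one, and the match succeeds only when the stack is empty.
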

      --
      Jesse Houwing
      jesse.houwing at sogeti.nl

