HELP !!! (Capture text between html tags)

  • zeny
    New Member
    • Jul 2006
    • 44

    Hey ppl,

    How can we capture text between html tags using regular expressions? For example, how to capture the words "hello", "world", "bla", "bla" and "bla" in the following input:

    <br><i>hello world <br><br> bla bla bla <br>

    Best Regards
  • blazedaces
    Contributor
    • May 2007
    • 284

    #2
    A parser may already exist for this kind of thing, but why don't you just parse it yourself? Read in the text one character at a time. When you get to a ">", look at what comes next. If it's a "<", do nothing; otherwise keep the text and store it (capture it?). Basically, if there's text between a ">" and the next "<", you extract it. Otherwise, you keep going.

    Hope this helped,
    -blazed

    Edit: you might also want to do something to ignore whitespace-only text in case someone goes <br> <br>Something </br></br> ...
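
The character-by-character approach described above could be sketched roughly as follows. This is only an illustration: the class name `TagTextScanner` and the method `extractText` are made up for this example, and the sketch assumes simple markup (no comments, scripts, or attribute values containing ">").

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not from the thread: a hand-rolled scanner that
// keeps the text found outside of <...> tags.
class TagTextScanner {

    // Returns every run of text between tags, trimmed, skipping
    // runs that are empty or whitespace-only.
    static List<String> extractText(String html) {
        List<String> pieces = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean insideTag = false;

        for (int i = 0; i < html.length(); i++) {
            char c = html.charAt(i);
            if (c == '<') {
                // A tag starts, so the text collected so far is complete.
                flush(current, pieces);
                insideTag = true;
            } else if (c == '>') {
                insideTag = false;
            } else if (!insideTag) {
                current.append(c);
            }
        }
        flush(current, pieces); // text after the last tag, if any
        return pieces;
    }

    // Stores the collected text unless it is whitespace-only, then resets the buffer.
    private static void flush(StringBuilder current, List<String> pieces) {
        String text = current.toString().trim();
        if (!text.isEmpty()) {
            pieces.add(text);
        }
        current.setLength(0);
    }

    public static void main(String[] args) {
        System.out.println(extractText("<br><i>hello world <br><br> bla bla bla <br>"));
        // prints [hello world, bla bla bla]
    }
}
```

The trim-and-skip step in `flush` covers the whitespace-only case from the edit note, so `<br> <br>Something </br></br>` yields just `Something`.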

    • JosAH
      Recognized Expert MVP
      • Mar 2007
      • 11453

      #3
      Java has an entire framework implemented for manipulating HTML so why do it
      all yourself? Create an HTMLEditorKit and an HTMLDocument. Make
      the kit read data into the document given a simple Reader. When the content
      is loaded, create an HTMLDocument.Iterator using the document. The iterator
      needs an HTML.Tag to iterate over; the iterator delivers the text between
      the tags.

      kind regards,

      Jos
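
For readers who want to try the built-in Swing HTML support, here is a minimal sketch. Note that it does not follow the HTMLEditorKit plus HTMLDocument plus iterator route described above; it swaps in the lighter `ParserDelegator` callback API from the same `javax.swing.text.html` family, which reports text runs without building a document. The class name `SwingHtmlText` and the method `extractWords` are hypothetical, and splitting each text run into words is an assumption about what the original question wants.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper, not from the thread: lets the JDK's HTML parser
// find the text and collects the individual words.
class SwingHtmlText {

    static List<String> extractWords(String html) {
        List<String> words = new ArrayList<>();

        // The parser calls handleText once for each run of text it finds
        // between tags; tags themselves go to other callbacks we ignore.
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                for (String word : new String(data).trim().split("\\s+")) {
                    if (!word.isEmpty()) {
                        words.add(word);
                    }
                }
            }
        };

        try {
            // 'true' tells the parser to ignore any charset directive in the input.
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(extractWords("<br><i>hello world <br><br> bla bla bla <br>"));
    }
}
```

Compared with hand-written scanning, this leaves details such as comments, entities, and malformed tags to the parser.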

      • jx2
        New Member
        • Feb 2007
        • 228

        #4
        I would use a regular expression:
        Code:
        String[] yourArray; // the text pieces between the tags (may include empty or whitespace-only strings)
        yourArray = yourString.split("<[^>]*>");
        Unfortunately there is no implode method, so if you need a single string you have to join those parts together yourself, e.g. with a StringBuilder.

        good luck with your project
        jan jarczyk
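
A slightly fuller sketch of the split idea, under a few assumptions: the class name `TagSplitter` and its methods are invented for this example, and the pattern `<[^>]*>` is taken from the post as-is, so it will misfire on tags whose attribute values contain ">". Splitting on tags leaves empty and whitespace-only pieces in the array, so the sketch filters them out before breaking the rest into words. Newer Java versions (8 and later) do ship `String.join`, which covers the implode case.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not from the thread: split on tags, then clean up.
class TagSplitter {

    // Returns the individual words found between the tags.
    static List<String> words(String html) {
        List<String> words = new ArrayList<>();
        for (String piece : html.split("<[^>]*>")) {
            String trimmed = piece.trim();
            if (trimmed.isEmpty()) {
                continue; // adjacent tags produce empty pieces
            }
            for (String word : trimmed.split("\\s+")) {
                words.add(word);
            }
        }
        return words;
    }

    // Joins the words back into one space-separated string (Java 8+).
    static String joined(String html) {
        return String.join(" ", words(html));
    }

    public static void main(String[] args) {
        String input = "<br><i>hello world <br><br> bla bla bla <br>";
        System.out.println(words(input));  // prints [hello, world, bla, bla, bla]
        System.out.println(joined(input)); // prints hello world bla bla bla
    }
}
```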
