Removing tags from an HTML file

  • litun
    New Member
    • Mar 2008
    • 12

    When I read the HTML file, I get the tags as well. After storing it as a text file, the same tags are stored too, and when I extract a particular sentence the tags still appear in the result. I don't want them. How can I get my result without the tags?
    With regards
  • sukatoa
    Contributor
    • Nov 2007
    • 539

    #2
    Originally posted by litun
    When I read the HTML file, I get the tags as well. After storing it as a text file, the same tags are stored too, and when I extract a particular sentence the tags still appear in the result. I don't want them. How can I get my result without the tags?
    With regards
    Can you post the content in that text file?

    Update us,
    sukatoa


    • nomad
      Recognized Expert Contributor
      • Mar 2007
      • 664

      #3
      Originally posted by litun
      When I read the HTML file, I get the tags as well. After storing it as a text file, the same tags are stored too, and when I extract a particular sentence the tags still appear in the result. I don't want them. How can I get my result without the tags?
      With regards
      Copy and paste the contents into a text application like Notepad. You can then copy and paste that content back into the application you are using.

      nomad


      • litun
        New Member
        • Mar 2008
        • 12

        #4
        Originally posted by sukatoa
        Can you post the content in that text file?

        Update us,
        sukatoa
        <h1>this document contains information about me cse students. students are now doing their project.
        they are<I> working </I>in different individual project. generally the <U>document</U> is just like a progess report .</h>
        I want to get the word "working", which is in italics, and to retrieve the sentence containing that italic word.


        • sukatoa
          Contributor
          • Nov 2007
          • 539

          #5
          Originally posted by litun
          <h1>this document contains information about me cse students. students are now doing their project.
          they are<I> working </I>in different individual project. generally the <U>document</U> is just like a progess report .</h>
          I want to get the word "working", which is in italics, and to retrieve the sentence containing that italic word.
          You may use split(). For example, given
          <I> working </I>
          you can split the whole string on the regex "<I>" and then on "</I>" (calling the function twice). It returns the pieces as an array of Strings.

          You can now search for "working". That string should be in the second element of the String array, and so on.
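
          A minimal sketch of the split() idea, assuming the tags are written exactly as "<I>" and "</I>" (upper case, no attributes) and that sentences end with a full stop. The class and method names (SplitTags, italicWords, stripTags, sentenceContaining) are made up for illustration and do not come from the thread.

```java
import java.util.ArrayList;
import java.util.List;

final class SplitTags {
    // Returns the text found between each <I> ... </I> pair (assumed spelling).
    static List<String> italicWords(String text) {
        List<String> words = new ArrayList<>();
        // First split: parts[0] is the text before the first <I>.
        String[] parts = text.split("<I>");
        for (int i = 1; i < parts.length; i++) {
            // Second split: everything before </I> is the italic text.
            // The limit of 2 keeps an empty leading piece instead of dropping it.
            String[] inner = parts[i].split("</I>", 2);
            words.add(inner[0].trim());
        }
        return words;
    }

    // Removes anything that looks like a tag, e.g. <h1>, <I>, </U>.
    static String stripTags(String text) {
        return text.replaceAll("<[^>]*>", "");
    }

    // Returns the first tag-free sentence containing the word, or "" if none.
    static String sentenceContaining(String text, String word) {
        for (String sentence : stripTags(text).split("\\.")) {
            if (sentence.contains(word)) {
                return sentence.trim();
            }
        }
        return "";
    }
}
```

          Calling SplitTags.italicWords(...) on the posted text would give the italic word, and SplitTags.sentenceContaining(...) the sentence around it with the tags removed. A regex-based approach like this is fragile for real HTML (lower-case tags, attributes, nesting), so treat it as a quick fix only.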


          Based on your example,


          Please correct me if I'm wrong,
          sukatoa


          • JosAH
            Recognized Expert MVP
            • Mar 2007
            • 11453

            #6
            There's no need to read that file and fiddle with those tags yourself. Have a look
            at the HTMLEditorKit. It can create an HTMLDocument for you. Given this document
            and a Reader, the kit can read the content into the document. The document can
            give you an iterator over a certain HTML.Tag. You just want to iterate over
            (i.e. get the content of) the <i> ... </i> tags. Read the API documentation for
            these classes and interfaces.
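
            A minimal sketch of that approach, assuming the HTML is already in a String. The class and method names (ItalicText, italicText) are hypothetical; the Swing classes and calls are the standard javax.swing.text.html ones.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.BadLocationException;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

final class ItalicText {
    // Returns the text of every <i> ... </i> run in the given HTML.
    static List<String> italicText(String html) {
        HTMLEditorKit kit = new HTMLEditorKit();
        HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
        // Avoids a ChangedCharSetException if the page declares a charset.
        doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);
        List<String> result = new ArrayList<>();
        try {
            // Let the kit parse the markup into the document.
            kit.read(new StringReader(html), doc, 0);
            // Iterate over the runs of text carrying the <i> tag.
            HTMLDocument.Iterator it = doc.getIterator(HTML.Tag.I);
            while (it.isValid()) {
                int start = it.getStartOffset();
                int end = it.getEndOffset();
                result.add(doc.getText(start, end - start).trim());
                it.next();
            }
        } catch (IOException | BadLocationException e) {
            throw new IllegalStateException("could not parse HTML", e);
        }
        return result;
    }
}
```

            The document holds only the text, so doc.getText(0, doc.getLength()) should give the content without any tags, which is a sturdier way to strip them than string splitting.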

            kind regards,

            Jos
