How to retrieve the title from a document written in HTML

  • litun
    New Member
    • Mar 2008
    • 12

    How to retrieve the title from a document written in HTML

    Hi,
    How can we retrieve the title from a document with the .htm extension? We need to find the title tags and read what is between them. For example, given <Title>Dictionary</Title>, we need to retrieve "Dictionary" and store it in a String or a String array.
    Please help me.
    Thanks
  • r035198x
    MVP
    • Sep 2006
    • 13225

    #2
    Originally posted by litun
    Hi,
    How can we retrieve the title from a document with the .htm extension? We need to find the title tags and read what is between them. For example, given <Title>Dictionary</Title>, we need to retrieve "Dictionary" and store it in a String or a String array.
    Please help me.
    Thanks
    And what have you done so far?


    • litun
      New Member
      • Mar 2008
      • 12

      #3
      Originally posted by r035198x
      And what have you done so far?
      So far I have read an HTML file, but I am not able to retrieve its title. Suppose the title of the document I am reading is "Dictionary"; I want to get "Dictionary" as my output. I know that in HTML the title is marked by the tags <head><title>Dictionary</title>, so I want to retrieve the word "Dictionary" as my output. Please help me.
      Thanks


      • r035198x
        MVP
        • Sep 2006
        • 13225

        #4
        Originally posted by litun
        So far I have read an HTML file, but I am not able to retrieve its title. Suppose the title of the document I am reading is "Dictionary"; I want to get "Dictionary" as my output. I know that in HTML the title is marked by the tags <head><title>Dictionary</title>, so I want to retrieve the word "Dictionary" as my output. Please help me.
        Thanks
        How have you read the HTML file?
        Can you post the code you have used?


        • litun
          New Member
          • Mar 2008
          • 12

          #5
          Originally posted by r035198x
          How have you read the HTML file?
          Can you post the code you have used?
          Yes, sure. You can use
          [CODE]
          FileInputStream fis = new FileInputStream("C:\\test.html");
          [/CODE]
          and then read the file with the same code we use for .txt files.
          Bye
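For readers following along, here is a minimal sketch of what "the same code we use for .txt files" might look like. The class name `HtmlFileReader`, the method `readAll`, and the choice of UTF-8 are assumptions made for illustration; the thread does not say which encoding the file uses or how the original poster read it.

```java
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class HtmlFileReader {

    // Hypothetical helper: reads every byte from the stream and decodes it as UTF-8.
    // UTF-8 is an assumption; use the file's real encoding if it differs.
    public static String readAll(InputStream in) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
            return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java HtmlFileReader <path-to-html-file>");
            return;
        }
        // try-with-resources closes the stream even if reading fails
        try (InputStream fis = new FileInputStream(args[0])) {
            System.out.println(readAll(fis));
        }
    }
}
```

Reading the whole file into one String, rather than line by line, means the title can still be found when the opening and closing tags sit on different lines.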


          • pronerd
            Recognized Expert Contributor
            • Nov 2006
            • 392

            #6
            Originally posted by litun
            Yes, sure. You can use
            [CODE]
            FileInputStream fis = new FileInputStream("C:\\test.html");
            [/CODE]
            and then read the file with the same code we use for .txt files.
            Bye
            The purpose of forums is generally to help with specific issues, not to write your code for you. Google can provide literally thousands of examples of reading through files if you would just put forth the effort to look.

            Finding the title tags is just a matter of using the String class's indexOf() and substring() methods. You would need to add some extra logic to handle differences in case, such as <title> versus <Title>.

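            Code:
                String lineFromFile = "  blah blah Blah  <Title>Some Title</Title>";
            
                // Index of the first character of the opening tag
                int startingPoint = lineFromFile.indexOf("<Title>");
            
                // Index of the first character of the closing tag
                int endingPoint = lineFromFile.indexOf("</Title>");
            
                // Skip the 7 characters of "<Title>"; substring's end index is exclusive
                String pageTitle = lineFromFile.substring(startingPoint + 7, endingPoint);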
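The "extra logic" for case might be sketched as follows. This is only one possible approach under stated assumptions: `TitleFinder`, `indexOfIgnoreCase`, and `extractTitle` are hypothetical names, and the helper is built on the standard `String.regionMatches(boolean, int, String, int, int)` method so the indexes always refer to the original text.

```java
public class TitleFinder {

    // Hypothetical helper: a case-insensitive indexOf built on String.regionMatches.
    public static int indexOfIgnoreCase(String text, String target, int from) {
        for (int i = Math.max(from, 0); i + target.length() <= text.length(); i++) {
            if (text.regionMatches(true, i, target, 0, target.length())) {
                return i;
            }
        }
        return -1;
    }

    // Returns the trimmed title text, or null when no complete title element is found.
    // Simplification: any tag whose name merely starts with "title" would also match.
    public static String extractTitle(String html) {
        int open = indexOfIgnoreCase(html, "<title", 0);
        if (open < 0) {
            return null;
        }
        // Skip to the end of the opening tag, in case it carries attributes
        int contentStart = html.indexOf('>', open);
        if (contentStart < 0) {
            return null;
        }
        contentStart++;
        int close = indexOfIgnoreCase(html, "</title", contentStart);
        if (close < 0) {
            return null;
        }
        return html.substring(contentStart, close).trim();
    }

    public static void main(String[] args) {
        String sample = "<html><head><TITLE> Dictionary </TITLE></head><body></body></html>";
        System.out.println(extractTitle(sample)); // prints "Dictionary"
    }
}
```

Returning null for a missing tag avoids the exception that substring() would throw if indexOf() came back as -1.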

              • hsn
                New Member
                • Sep 2007
                • 237

                #8
                Man, bytes.com is amazing. I have learned so much from this website, and hopefully will learn more.

                Would the method that pronerd showed work for XML too?

                Another question: if I used normal Strings to read from the HTML page and there was a picture, what would happen when I read the picture and store it in the String?

                thanks

                hsn


                • pronerd
                  Recognized Expert Contributor
                  • Nov 2006
                  • 392

                  #9
                  Originally posted by hsn
                  Would the method that pronerd showed work for XML too?
                  Yes. Anything in a String object can be parsed that way. There are specific APIs for parsing XML, though, that may work better. You can Google for examples of the DOM and SAX APIs. One catch: you cannot assume that HTML will actually be written according to XML syntax rules, so those APIs are not a good option for arbitrary HTML files.
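As a rough illustration of the DOM route for input that really is well-formed XML, the sketch below uses the JDK's `javax.xml.parsers` API. The class `XmlTitleReader` and method `titleOf` are hypothetical names. It assumes the input has no DOCTYPE (a DOCTYPE may make the parser try to load a DTD) and that the element is spelled `title` in lower case, since XML element names are case-sensitive.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class XmlTitleReader {

    // Hypothetical helper: returns the text of the first <title> element, or null if none exists.
    // Throws IllegalArgumentException when the input is not well-formed XML.
    public static String titleOf(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            NodeList titles = doc.getElementsByTagName("title");
            return titles.getLength() == 0 ? null : titles.item(0).getTextContent().trim();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<html><head><title>Dictionary</title></head><body/></html>";
        System.out.println(titleOf(xml)); // prints "Dictionary"
    }
}
```

Typical hand-written HTML, with unclosed tags or unquoted attributes, would be rejected by this parser, which is the catch described above.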




                  Originally posted by hsn
                  If I used normal Strings to read from the HTML page and there was a picture, what would happen when I read the picture and store it in the String?
                  I am not sure what you mean. If you are talking about reading in the text of the image tag, it would be written out the same way it was read in. If you are talking about trying to read the actual binary contents of an image file, that would not work. First, I am not even sure how you would do that. Second, Strings are meant for character data, i.e. numbers, letters, and punctuation. Arbitrary binary data generally does not survive being decoded into a String, so keep it in a byte array instead.

                  If you want to manipulate or modify image files you should look into using the JAI (Java Advanced Imaging) API.
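To make the point about binary data concrete, here is a small sketch that decodes bytes into a String and encodes them back, then checks whether anything changed. `BinaryVsText` and `survivesStringRoundTrip` are hypothetical names, and UTF-8 is an assumed encoding chosen for the demonstration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class BinaryVsText {

    // Hypothetical check: decode the bytes as UTF-8 text, encode them back,
    // and report whether the result is byte-for-byte identical to the input.
    public static boolean survivesStringRoundTrip(byte[] data) {
        String asText = new String(data, StandardCharsets.UTF_8);
        byte[] back = asText.getBytes(StandardCharsets.UTF_8);
        return Arrays.equals(data, back);
    }

    public static void main(String[] args) {
        byte[] text = "Dictionary".getBytes(StandardCharsets.UTF_8);
        // The first eight bytes of every PNG file
        byte[] pngSignature = {(byte) 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A};

        System.out.println(survivesStringRoundTrip(text));         // prints "true"
        System.out.println(survivesStringRoundTrip(pngSignature)); // prints "false"
    }
}
```

The byte 0x89 is not valid UTF-8 on its own, so decoding replaces it with a placeholder character and the original value is lost. That is why image data is normally kept in a byte[] or read through a stream.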


                  • hsn
                    New Member
                    • Sep 2007
                    • 237

                    #10
                    Originally posted by pronerd
                    Yes. Anything in a String object can be parsed that way. There are specific APIs for parsing XML, though, that may work better. You can Google for examples of the DOM and SAX APIs. One catch: you cannot assume that HTML will actually be written according to XML syntax rules, so those APIs are not a good option for arbitrary HTML files.

                    I am not sure what you mean. If you are talking about reading in the text of the image tag, it would be written out the same way it was read in. If you are talking about trying to read the actual binary contents of an image file, that would not work. First, I am not even sure how you would do that. Second, Strings are meant for character data, i.e. numbers, letters, and punctuation. Arbitrary binary data generally does not survive being decoded into a String, so keep it in a byte array instead.

                    If you want to manipulate or modify image files you should look into using the JAI (Java Advanced Imaging) API.
                    Good to know.
                    Thanks a lot, mate.

                    regards
                    hsn
