HTML DOM representation

  • Merin Lalu
    New Member
    • Oct 2011
    • 20

    HTML DOM representation

    Can anyone help me?

    I have an HTML document and need Java code that builds the DOM tree representation of that document. Once I have the DOM tree, I can proceed to the next step: finding the path of each node.

    Please help.
  • Frinavale
    Recognized Expert Expert
    • Oct 2006
    • 9749

    #2
    Have you considered using the following classes to manage DOM manipulation with Java?
    • DocumentBuilderFactory
    • DocumentBuilder
    • Document


    DocumentBuilderFactory and DocumentBuilder are in the javax.xml.parsers package; Document is in org.w3c.dom.

    -Frinny
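As a rough illustration of the classes suggested above, the sketch below parses markup with DocumentBuilder and builds an XPath-like path string for every element and non-blank text node. It assumes the HTML is well-formed (XHTML-style), because DocumentBuilder is an XML parser and will reject typical loosely written HTML. The class name DomPaths, the helper names, and the path format (`/tag[n]` and `/text()`) are hypothetical choices made for this example, not part of any library.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class: the name and path format are illustrative only.
class DomPaths {

    /** Parses well-formed markup and returns one path per element and non-blank text node. */
    static List<String> paths(String markup) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(markup)));
            List<String> out = new ArrayList<>();
            collect(doc.getDocumentElement(), "", out);
            return out;
        } catch (Exception e) {
            // Assumption: callers treat malformed input as a programming error.
            throw new IllegalArgumentException("Could not parse markup", e);
        }
    }

    /** Depth-first walk that appends this node's path, then recurses into its children. */
    private static void collect(Node node, String parentPath, List<String> out) {
        String path;
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            path = parentPath + "/" + node.getNodeName() + "[" + indexAmongSameName(node) + "]";
        } else if (node.getNodeType() == Node.TEXT_NODE && !node.getNodeValue().trim().isEmpty()) {
            path = parentPath + "/text()";
        } else {
            return; // skip comments, whitespace-only text, and other node types
        }
        out.add(path);
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            collect(children.item(i), path, out);
        }
    }

    /** 1-based position of an element among preceding siblings with the same tag name. */
    private static int indexAmongSameName(Node node) {
        int index = 1;
        for (Node s = node.getPreviousSibling(); s != null; s = s.getPreviousSibling()) {
            if (s.getNodeType() == Node.ELEMENT_NODE && s.getNodeName().equals(node.getNodeName())) {
                index++;
            }
        }
        return index;
    }

    public static void main(String[] args) {
        String page = "<html><body><p>one</p><p>two <b>bold</b></p></body></html>";
        for (String p : paths(page)) {
            System.out.println(p);
        }
    }
}
```

For the sample string in main, the b element ends up with the path /html[1]/body[1]/p[2]/b[1]. Passing the parent's path down the recursion avoids walking back up the tree for every node; the same idea should carry over to other tree APIs that expose a node's children.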


    • Merin Lalu
      New Member
      • Oct 2011
      • 20

      #3
      Thank you for your reply.

      Actually, I want to do this using the htmlparser 1.6 library in Java.

      I have an HTML document, and I got the DOM tree representation of it using the following code, but I need the path of each node in the DOM tree.

      The code is as follows:
      Code:
      import org.htmlparser.Node;
      import org.htmlparser.NodeFilter;
      import org.htmlparser.Parser;
      import org.htmlparser.filters.TagNameFilter;
      import org.htmlparser.nodes.RemarkNode;
      import org.htmlparser.nodes.TagNode;
      import org.htmlparser.nodes.TextNode;
      import org.htmlparser.util.EncodingChangeException;
      import org.htmlparser.util.NodeIterator;
      import org.htmlparser.util.NodeList;
      import org.htmlparser.util.ParserException;

      public class SimpleParser2 {
          // Accumulates tag names and text in document order.
          static String str = "";

          public static void main(String[] args) throws ParserException {
              if (args.length < 1 || args[0].equals("-help")) {
                  System.out.println("HTML Parser v" + Parser.getVersion() + "\n");
                  return;
              }

              Parser parser = new Parser();
              NodeFilter filter = null;
              if (args.length > 1) {
                  // Optional second argument: keep only tags with this name.
                  filter = new TagNameFilter(args[1]);
              } else {
                  parser.setFeedback(Parser.STDOUT);
                  Parser.getConnectionManager().setMonitor(parser);
              }

              try {
                  parser.setResource(args[0]);
                  processAll(parser.parse(filter));
              } catch (EncodingChangeException ece) {
                  // The page declared a different encoding: reset and parse again.
                  try {
                      parser.reset();
                      processAll(parser.parse(filter));
                  } catch (ParserException e) {
                      e.printStackTrace();
                  }
              } catch (ParserException e) {
                  e.printStackTrace();
              }

              // Print once after the traversal instead of after every tag.
              System.out.println("\nTree Nodes::\n" + str);
          }

          // Walks every node in the list, recursing into each subtree.
          static void processAll(NodeList list) throws ParserException {
              for (NodeIterator i = list.elements(); i.hasMoreNodes(); ) {
                  processMyNodes(i.nextNode());
              }
          }

          // Depth-first traversal: appends text content and tag names to str.
          static void processMyNodes(Node node) throws ParserException {
              if (node instanceof TextNode) {
                  str = str + ((TextNode) node).getText();
              } else if (node instanceof RemarkNode) {
                  // Comments are ignored.
              } else if (node instanceof TagNode) {
                  TagNode tag = (TagNode) node;
                  str = str + tag.getTagName();
                  NodeList children = tag.getChildren();
                  if (children != null) {
                      processAll(children);
                  }
              }
          }
      }

      Can you help me get the path of each node as a String?
      Last edited by Frinavale; Oct 16 '11, 11:19 PM. Reason: Added code tags. Please post code in code tags.
