Extract Descriptions

  • Nikkhah
    New Member
    • Apr 2010
    • 12

    Extract Descriptions

    Hello,
    I want to extract URLs and their descriptions from a search result page. I use the Jericho library and can extract the URLs, but I don't know how to extract their descriptions.
    Please help me.
    Thanks.
  • jkmyoung
    Recognized Expert Top Contributor
    • Mar 2006
    • 2057

    #2
    I'm assuming that you're taking an HTML result from a site like Google and extracting the URLs. Could we see a couple of results from your sample input (the search result), and how you're extracting the URLs? It's probably just a quick modification to the URL extraction.


    • Nikkhah
      New Member
      • Apr 2010
      • 12

      #3
      I use HtmlUnit, and this is my code that extracts the URLs:
      Code:
      List<HtmlAnchor> anchors = page.getAnchors();

      for (HtmlAnchor anchor : anchors) {
          // anchor.getAttribute("title");
          if (isSkipLink(anchor)) {
              continue;
          }
          // remaining loop body was not included in the post
      }

      // Skip site-relative links and cached-copy links.
      private static boolean isSkipLink(HtmlAnchor anchor) {
          String href = anchor.getHrefAttribute();
          return href.startsWith("/") || href.indexOf("/search?q=cache:") > 0;
      }
      Now I want to extract the title and description of each URL in the search results.
      Could you help me?


      • jkmyoung
        Recognized Expert Top Contributor
        • Mar 2006
        • 2057

        #4
        Try getTextContent().
        If that doesn't work, you might have to go into the anchor's child nodes and pick out the ones that are text nodes.
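
To illustrate the two approaches suggested above, here is a minimal sketch that uses only the JDK's W3C DOM classes. It assumes that HtmlUnit's nodes expose the same `getTextContent()` and `getChildNodes()` methods as `org.w3c.dom.Node`. The class name `AnchorText`, the helper `directText`, and the sample markup are hypothetical. A real search result page will have different markup, so locating the description element is left as an assumption here.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class AnchorText {

    // Hypothetical, well-formed stand-in for one search result entry.
    static final String SAMPLE =
        "<div><a href=\"http://example.com/\">Example <b>Domain</b> home</a>"
        + "<p>Short description of the page.</p></div>";

    // Parses a well-formed markup string into a W3C DOM document.
    static Document parse(String markup) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(markup)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse markup", e);
        }
    }

    // Concatenates only the text nodes that are direct children of the node,
    // ignoring text nested inside child elements such as <b>.
    static String directText(Node node) {
        StringBuilder sb = new StringBuilder();
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE) {
                sb.append(child.getNodeValue());
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Document doc = parse(SAMPLE);
        Node anchor = doc.getElementsByTagName("a").item(0);
        Node description = doc.getElementsByTagName("p").item(0);

        // Approach 1: all text below the node, nested elements included.
        System.out.println(anchor.getTextContent());
        // Approach 2: walk the children and keep only direct text nodes.
        System.out.println(directText(anchor));
        // The description is read the same way once its element is located.
        System.out.println(description.getTextContent());
    }
}
```

`getTextContent()` is usually the simpler choice for a link title, because it also picks up text wrapped in nested tags such as `<b>`. Walking the child nodes only helps when that nested text should be left out.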
