Website Crawler

  • Petrosa
    New Member
    • Apr 2007
    • 9

    Website Crawler

    Hey all,

    I have a project in which I need to make a web crawler that finds the links in a website and then represents the site's structure as a 3D tree. I found an example crawler at http://java.sun.com/developer/technicalArticles/ThirdParty/WebCrawler/, but it seems very old (from when Java was at 1.3), some packages seem to have changed a bit, and now it doesn't work properly. What modifications are needed to make that code work now?

    My main question is: are there any specific packages now available in Java that I can use if I decide to write the crawler from scratch? What should I use to make my life easier?
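For the link-finding part, the JDK itself already ships an HTML parser (in javax.swing.text.html) that can pull the href of every <a> tag out of a page, so no third-party package is strictly required. A minimal sketch, assuming the page has already been downloaded into a String; the class name LinkExtractor is hypothetical. That parser is built on an old HTML 3.2 DTD, so for messy real-world pages a dedicated HTML parsing library may cope better.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

/** Collects the href value of every <a> tag, using the HTML parser bundled with the JDK. */
final class LinkExtractor {
    static List<String> extract(String html) {
        List<String> links = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.A) {
                    // Anchors without href (e.g. <a name="top">) are skipped
                    Object href = attrs.getAttribute(HTML.Attribute.HREF);
                    if (href != null) {
                        links.add(href.toString());
                    }
                }
            }
        };
        try {
            // ignoreCharSet = true: the text is already a decoded Java String
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return links;
    }
}
```

The returned hrefs are raw, so relative links such as /about.html still have to be resolved against the page URL before they are crawled.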

    For the 3D representation part, I was told to use VRML. Is that a good idea, or is there anything else I could use?
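To judge whether VRML fits, it helps to see roughly what the output would look like. A sketch, assuming node positions have already been computed and links are given as pairs of node indices; VrmlWriter and its parameters are hypothetical. It writes a VRML97 ("#VRML V2.0 utf8") file in which each page is a sphere wrapped in an Anchor node carrying the page URL, so a VRML viewer can follow it when clicked, and all links share one IndexedLineSet.

```java
import java.util.List;
import java.util.Locale;

/** Writes a VRML97 scene: one clickable sphere per page, one line segment per link. */
final class VrmlWriter {
    static String write(List<String> urls, double[][] positions, int[][] edges) {
        StringBuilder sb = new StringBuilder("#VRML V2.0 utf8\n");
        for (int i = 0; i < urls.size(); i++) {
            double[] p = positions[i];
            sb.append("Anchor { url \"").append(escape(urls.get(i))).append("\" children [\n")
              .append(String.format(Locale.ROOT,
                      "  Transform { translation %.2f %.2f %.2f children [\n", p[0], p[1], p[2]))
              .append("    Shape { appearance Appearance { material Material { diffuseColor 0.2 0.6 1 } }\n")
              .append("            geometry Sphere { radius 0.3 } }\n")
              .append("  ] }\n] }\n");
        }
        // All links share one IndexedLineSet; each segment is "fromIndex toIndex -1"
        sb.append("Shape { geometry IndexedLineSet {\n  coord Coordinate { point [\n");
        for (double[] p : positions) {
            sb.append(String.format(Locale.ROOT, "    %.2f %.2f %.2f,\n", p[0], p[1], p[2]));
        }
        sb.append("  ] }\n  coordIndex [\n");
        for (int[] e : edges) {
            sb.append("    ").append(e[0]).append(' ').append(e[1]).append(" -1,\n");
        }
        sb.append("  ]\n} }\n");
        return sb.toString();
    }

    // VRML strings are double-quoted; backslashes and quotes must be escaped
    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }
}
```

Commas inside VRML97 multi-value fields count as whitespace, so the trailing commas are harmless.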
  • Petrosa
    New Member
    • Apr 2007
    • 9

    #2
    Can anyone point me in the right direction?

    • JosAH
      Recognized Expert MVP
      • Mar 2007
      • 11453

      #3
      I hope you know that the web is not a tree structure; it's a graph structure. Btw,
      what is a 3D tree? Every tree, no matter its arity (number of children), can be
      represented in 2D space. But you don't need a tree; you need a graph, or even
      a multigraph (more than one edge between the same pair of nodes).
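The graph/multigraph point can be made concrete with an adjacency map that counts parallel links. A minimal sketch; LinkGraph is a hypothetical name, and pages are assumed to be identified by their URL strings.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Directed multigraph of pages: repeated links between the same two pages are counted. */
final class LinkGraph {
    // from-page -> (to-page -> how many <a> links point from one to the other)
    private final Map<String, Map<String, Integer>> out = new LinkedHashMap<>();

    void addPage(String page) {
        out.computeIfAbsent(page, k -> new LinkedHashMap<>());
    }

    void addLink(String from, String to) {
        addPage(to); // pages that only receive links are still nodes
        out.computeIfAbsent(from, k -> new LinkedHashMap<>()).merge(to, 1, Integer::sum);
    }

    Set<String> pages() {
        return out.keySet();
    }

    Map<String, Integer> linksFrom(String page) {
        return out.getOrDefault(page, Collections.emptyMap());
    }

    int multiplicity(String from, String to) {
        return linksFrom(from).getOrDefault(to, 0);
    }

    int edgeCount() {
        int total = 0;
        for (Map<String, Integer> targets : out.values()) {
            for (int count : targets.values()) {
                total += count;
            }
        }
        return total;
    }
}
```

A page that links back to the home page closes a cycle, which a tree cannot represent, and a page linking to the same target from both its header and footer gives a multiplicity of 2.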

      kind regards,

      Jos

      • Petrosa
        New Member
        • Apr 2007
        • 9

        #4
        Originally posted by JosAH
        I hope you know that the web is not a tree structure; it's a graph structure. Btw,
        what is a 3D tree? Every tree, no matter its arity (number of children), can be
        represented in 2D space. But you don't need a tree; you need a graph, or even
        a multigraph (more than one edge between the same pair of nodes).

        kind regards,

        Jos
        By 3D I mean that each node is represented by a basic 3D shape; that's why I was asked to use VRML for the shapes. I only need to represent a single website, not many websites connected together, which is why I said tree. Now that I think of it, graph is probably the more appropriate term, since a page can link to many others, not just to its own sub-pages. The instructions I got said tree... I guess they used the wrong term :)
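Restricting the crawl to a single website mostly comes down to resolving each href against the page it appears on and dropping anything on another host. A breadth-first sketch under stated assumptions: SiteCrawler is a hypothetical name, and the linksOf function (also hypothetical) is expected to download a page and return its raw href values, or an empty list on failure; fetching itself is left out. java.net.URI does the resolution.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.function.Function;

/** Breadth-first crawl limited to the start page's host; returns page -> same-site links. */
final class SiteCrawler {
    static Map<String, List<String>> crawl(String start, int maxPages,
                                           Function<String, List<String>> linksOf) {
        String host = URI.create(start).getHost();
        Map<String, List<String>> graph = new LinkedHashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        queue.add(start);
        seen.add(start);
        while (!queue.isEmpty() && graph.size() < maxPages) {
            String page = queue.poll();
            URI base = URI.create(page);
            List<String> targets = new ArrayList<>();
            for (String href : linksOf.apply(page)) {
                String target = resolve(base, href);
                // Skip malformed links and links to other hosts (or mailto:, javascript:, ...)
                if (target == null || !Objects.equals(URI.create(target).getHost(), host)) {
                    continue;
                }
                targets.add(target); // duplicates are kept, so parallel links survive
                if (seen.add(target)) {
                    queue.add(target); // first time this page is seen: schedule a visit
                }
            }
            graph.put(page, targets);
        }
        return graph;
    }

    // Resolves href against the page URL and strips any #fragment; null if malformed
    private static String resolve(URI base, String href) {
        try {
            String s = base.resolve(new URI(href.trim())).toString();
            int hash = s.indexOf('#');
            return hash < 0 ? s : s.substring(0, hash);
        } catch (URISyntaxException | IllegalArgumentException e) {
            return null;
        }
    }
}
```

Breadth-first order means that when maxPages cuts the crawl short, the pages closest to the home page are the ones kept. A start URL with an explicit path (http://example.com/ rather than http://example.com) avoids a known quirk in how URI.resolve handles an empty base path.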

        So, are there any libraries I can use to write the crawler from scratch and make my life easier?
        How about VRML? I think it's an outdated choice for representing the website, so is there a good way to do it in Java, or any other suggestions on what to use?

        Thanks in advance
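Whatever ends up rendering the scene (VRML, or a Java 3D API), the pages need coordinates first. One simple option, which also recovers the "tree" from the assignment, is to lay out the breadth-first spanning tree of the site graph: click-distance from the home page sets the height, and the pages at each depth are spread around a circle. A sketch; LayeredLayout, the spacing factor of 2.0 and the circular arrangement are arbitrary assumptions, not a standard algorithm for this task.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Places pages in 3D: BFS depth from the root sets the height, each layer forms a circle. */
final class LayeredLayout {
    static Map<String, double[]> layout(String root, Map<String, List<String>> graph) {
        // BFS gives each page its shortest click-distance from the root,
        // i.e. its depth in a breadth-first spanning tree of the site graph
        Map<String, Integer> depth = new LinkedHashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        depth.put(root, 0);
        queue.add(root);
        while (!queue.isEmpty()) {
            String page = queue.poll();
            for (String next : graph.getOrDefault(page, Collections.emptyList())) {
                if (!depth.containsKey(next)) {
                    depth.put(next, depth.get(page) + 1);
                    queue.add(next);
                }
            }
        }
        // Group pages by depth (in discovery order), then spread each layer around a circle
        Map<Integer, List<String>> layers = new TreeMap<>();
        depth.forEach((page, d) -> layers.computeIfAbsent(d, k -> new ArrayList<>()).add(page));
        Map<String, double[]> pos = new LinkedHashMap<>();
        for (Map.Entry<Integer, List<String>> layer : layers.entrySet()) {
            List<String> pages = layer.getValue();
            double radius = 2.0 * layer.getKey(); // the root sits at the centre
            for (int i = 0; i < pages.size(); i++) {
                double angle = 2 * Math.PI * i / pages.size();
                pos.put(pages.get(i), new double[] {
                        radius * Math.cos(angle), -layer.getKey(), radius * Math.sin(angle)});
            }
        }
        return pos;
    }
}
```

Pages that cannot be reached from the root by following links are left out of the layout; links that are not part of the spanning tree can still be drawn as extra lines between the placed nodes.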

        • Petrosa
          New Member
          • Apr 2007
          • 9

          #5
          Any help from anyone?
