Hi Folks,
I am writing a Java program to analyse an HTML page. It connects to a website and then extracts ALL the links from it. I think the best way to do this is to use the <a href=...>...</a> tags as a guide.
I have the code posted further down so far.
It reads through every line of the HTML document at the URL and puts it into data. I was thinking of storing all the URLs from the site in an array later on, but it is the extraction of the links that I am unsure about. Possibly some kind of string tokenizer? I really need something that will scan through a string, character by character, until it hits <a href=" and will then record the data until it hits </a>, giving me the URL.
Code:
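import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

// BufferedReader replaces the deprecated DataInputStream.readLine()
BufferedReader webadd = new BufferedReader(
        new InputStreamReader(new URL("http://www.anyrandomurl.com/").openStream()));
String data;
// Read inside the loop condition so the first line is processed too
while ((data = webadd.readLine()) != null) {
    // *** HELP NEEDED HERE ***
}
webadd.close();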
Each URL I find can then be added to something like URLs[], which I will loop through later.
I think it will only take a line or so of code. Any ideas?
Cheers!
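One possible way to do the extraction described above, as a rough sketch rather than a definitive answer: instead of a tokenizer, a regular expression can pick out the quoted value of each href attribute. Note that the URL ends at the closing quote of the attribute, not at </a>. The class name LinkExtractor, the method extractLinks and the sample page string are all made up for illustration, and the sketch assumes the whole page has first been collected into a single String (for example by appending each line to a StringBuilder), because a tag can span more than one line. A regular expression is not a full HTML parser, so unquoted or otherwise unusual attributes will be missed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the name is only for illustration.
class LinkExtractor {
    // Matches an opening <a ...> tag and captures a quoted href value
    // (group 1 for double quotes, group 2 for single quotes).
    private static final Pattern HREF = Pattern.compile(
            "<a\\s[^>]*?href\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)')",
            Pattern.CASE_INSENSITIVE);

    // Returns every quoted href value found in the given HTML text, in order.
    static List<String> extractLinks(String html) {
        List<String> urls = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            urls.add(m.group(1) != null ? m.group(1) : m.group(2));
        }
        return urls;
    }

    public static void main(String[] args) {
        // Sample input standing in for the downloaded page.
        String page = "<p><a href=\"http://www.anyrandomurl.com/one\">One</a> "
                + "<A HREF='two.html'>Two</A></p>";
        for (String url : extractLinks(page)) {
            System.out.println(url);
        }
    }
}
```

A List is used here instead of a fixed-size array because the number of links is not known in advance; it can be looped over later in the same way, or converted with toArray if an array is really needed.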