Hi everyone,
This is my first thread since I just joined. Does anyone know how to crawl a particular URL using Python? I tried to build a breadth-first crawler but have had little success.
With wget, if you are more familiar with it than I am, how can I get it to output the crawled links (just the links, not the actual HTML content) to a file?
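For the Python side, here is a minimal breadth-first crawler sketch using only the standard library. It is just a sketch under a few assumptions: it stays on the starting host, `max_pages` is an arbitrary cap I added so it terminates, and it only follows `href` attributes of `<a>` tags.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html, base_url):
    """Return absolute URLs for every <a href=...> in the page."""
    parser = LinkParser()
    parser.feed(html)
    return [urljoin(base_url, href) for href in parser.links]


def crawl(start_url, max_pages=50):
    """Breadth-first crawl from start_url; returns the list of URLs visited."""
    seen = {start_url}
    queue = deque([start_url])  # FIFO queue gives breadth-first order
    visited = []
    host = urlparse(start_url).netloc
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        try:
            with urlopen(url, timeout=10) as resp:
                html = resp.read().decode("utf-8", errors="replace")
        except OSError:
            continue  # skip pages that fail to download or time out
        for link in extract_links(html, url):
            # stay on the same host and avoid revisiting pages
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return visited
```

You would then call something like `crawl("http://www.museum.vic.gov.au")` and write the returned list to a file, one URL per line.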
Currently I have something like:
wget -q -E -O outfile --proxy-user=username --proxy-password=mypassword -r http://www.museum.vic.gov.au
but it outputs the actual crawled HTML content to 'outfile', and I only want the crawled links.
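One possible approach, assuming a reasonably recent GNU wget: run it in --spider mode (which requests pages without saving them) and filter the URLs out of its log. The exact log format can vary between wget versions and locales, so treat this as a sketch rather than a guaranteed recipe:

```shell
# --spider: check pages without saving content; -r: recurse; -l 2: depth limit
# wget logs each request on a line starting with "--<timestamp>--  <url>"
wget -r -l 2 --spider http://www.museum.vic.gov.au 2>&1 \
  | grep '^--' \
  | awk '{print $3}' \
  | sort -u > links.txt
```

The `2>&1` is needed because wget writes its progress log to stderr, not stdout.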
Thank you in advance