I am writing a small C# application that downloads the files linked from a given URL. When I parse the HTML, I check the response headers for each link: if the content type is "text/html", I skip the link, since I assume it is a page rather than a file (well, it is a file, but I just don't need text files, htm/html, etc.).
However, I recently tried to download PDF files from: http://pdonline.brisbane.qld.gov.au/...key=A001617250
The rendered page shows links to PDF files, but the links in the page source do not point at .pdf URLs. Instead, I get something like this: href='../documentmaster/viewdocumentftp.aspx?key=pDGvyFzSN3Zr4%2fPWwQU5dVdkX8j%2f22aqh8uXeTWiRJKIv9X6L5lVzFSgp%2bJfwrwE'
When I check the response headers for such a link, the content type is text/html, so I skip it. I need a way to detect responses of this kind that actually deliver some other type of document (a PDF in this example).
This is how I check the headers:
Code:
...
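HttpWebRequest request = (HttpWebRequest)WebRequest.Create(s);
// Dispose the response so the underlying connection is released.
using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
{
    string contentType = response.ContentType.ToLower();
    // Skip the link when the server reports an HTML page.
    if (contentType.Contains("text/html"))
        bSkip = true;
}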
...
where:
s - is the link URI;
bSkip - is a bool variable indicating whether the link should be skipped.
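A minimal sketch of a slightly stricter version of this header check, in case it helps frame the problem. It assumes the server might send a Content-Disposition header marking the response as an attachment, which I have not confirmed for this site. LinkFilter and ShouldSkip are hypothetical names, not part of my application.

```csharp
using System;

public static class LinkFilter
{
    // Hypothetical helper: returns true when the headers suggest an HTML page
    // rather than a downloadable file.
    public static bool ShouldSkip(string contentType, string contentDisposition)
    {
        // Assumption: a Content-Disposition of "attachment" marks a download,
        // whatever the Content-Type claims.
        if (!string.IsNullOrEmpty(contentDisposition) &&
            contentDisposition.TrimStart().StartsWith("attachment", StringComparison.OrdinalIgnoreCase))
        {
            return false;
        }

        // Unknown type: keep the link rather than lose a possible document.
        if (string.IsNullOrEmpty(contentType))
        {
            return false;
        }

        // Compare only the media type, ignoring parameters such as "; charset=utf-8".
        string mediaType = contentType.Split(';')[0].Trim();
        return mediaType.Equals("text/html", StringComparison.OrdinalIgnoreCase);
    }
}
```

With the response object from my code it could be called as bSkip = LinkFilter.ShouldSkip(response.ContentType, response.Headers["Content-Disposition"]); but it only helps if the server actually sets that header.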
The thing is that I am indeed getting an HTML response, but I am hoping to find a way to tell whether a document is embedded in it without downloading and parsing every HTML file, since there might be lots of them and only a few with actual embedded documents.
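To make the idea concrete, here is a rough sketch of the kind of cheap probe I have in mind: read only the first few kilobytes of the body and look for hints. The hints themselves are assumptions on my part (the "%PDF-" signature that PDF files start with, and embed/object/iframe tags as a sign of an embedded document); the page may well load its document some other way, for example through script. BodyProbe and its methods are hypothetical names.

```csharp
using System;
using System.IO;
using System.Text;

public static class BodyProbe
{
    // Reads at most maxBytes from the stream, so large pages are not downloaded in full.
    public static byte[] ReadPrefix(Stream body, int maxBytes)
    {
        byte[] buffer = new byte[maxBytes];
        int total = 0;
        while (total < maxBytes)
        {
            int read = body.Read(buffer, total, maxBytes - total);
            if (read == 0) break; // end of stream
            total += read;
        }
        Array.Resize(ref buffer, total);
        return buffer;
    }

    // True when the data starts with the PDF signature "%PDF-",
    // i.e. the body is a PDF even though the header said text/html.
    public static bool StartsWithPdfSignature(byte[] prefix)
    {
        byte[] signature = Encoding.ASCII.GetBytes("%PDF-");
        if (prefix.Length < signature.Length) return false;
        for (int i = 0; i < signature.Length; i++)
        {
            if (prefix[i] != signature[i]) return false;
        }
        return true;
    }

    // Heuristic (assumption): HTML that embeds a document often contains one of these tags.
    public static bool MentionsEmbeddedDocument(byte[] prefix)
    {
        string text = Encoding.ASCII.GetString(prefix).ToLowerInvariant();
        return text.Contains("<embed") || text.Contains("<object") || text.Contains("<iframe");
    }
}
```

The stream would come from response.GetResponseStream(), with something like ReadPrefix(stream, 4096) as the first step. This still costs one request per link, which is exactly what I would like to avoid if there is a better way.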
Thanks in advance. Any help is much appreciated.
If you need further details, please let me know.