Get proper HTML response

  • l034n
    New Member
    • Oct 2008
    • 15

    Get proper HTML response

    I am writing a small C# application that downloads the files linked from a given URL. While parsing the HTML, I request each link and check the response headers: if the content type is "text/html", I skip the link, since I assume it is a page rather than a file (technically it is a file, but I don't need text files, htm/html, etc.).
    However, I recently tried to download PDF files from: http://pdonline.brisbane.qld.gov.au/...key=A001617250
    The rendered page shows links to PDF files, but in the page source those links do not point at .pdf URLs. Instead, I get something like this: href='../documentmaster/viewdocumentftp.aspx?key=pDGvyFzSN3Zr4%2fPWwQU5dVdkX8j%2f22aqh8uXeTWiRJKIv9X6L5lVzFSgp%2bJfwrwE'
    When I request such a link, the content type in the response headers is text/html, so I skip it. I need a way to detect when a response of this kind actually leads to some other type of document (a PDF in this example).

    This is how I check the headers:

    Code:
    ...
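    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(s);
    HttpWebResponse response = (HttpWebResponse)request.GetResponse();
    // Lower-case the value so the comparison is case-insensitive;
    // ContentType may also carry a suffix such as "; charset=utf-8".
    string contentType = response.ContentType.ToLower();

    // Skip links whose response is an HTML page rather than a file.
    if (contentType.Contains("text/html"))
        bSkip = true;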
    ...
    where:
    s - the link URI;
    bSkip - a bool variable indicating whether the link should be skipped.
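
    A slightly stricter version of the check above might look like the sketch below. It compares the bare media type instead of doing a substring match, and it also looks at the Content-Disposition header, which some servers set to "attachment" when a link is meant to be downloaded. ResponseFilter and ShouldSkip are hypothetical names, and I am only assuming that a server like the one above sends such a header at all.

```csharp
using System;

static class ResponseFilter
{
    // Hypothetical helper: returns true when the response looks like a plain
    // HTML page, false when it looks like a downloadable file.
    public static bool ShouldSkip(string contentType, string contentDisposition)
    {
        // "Content-Disposition: attachment" signals a download regardless of
        // the media type the server reports.
        if (!string.IsNullOrEmpty(contentDisposition) &&
            contentDisposition.TrimStart().StartsWith("attachment", StringComparison.OrdinalIgnoreCase))
        {
            return false;
        }

        // Drop parameters such as "; charset=utf-8" and compare the bare media type.
        string mediaType = (contentType ?? "").Split(';')[0].Trim();
        return mediaType.Equals("text/html", StringComparison.OrdinalIgnoreCase)
            || mediaType.Equals("application/xhtml+xml", StringComparison.OrdinalIgnoreCase);
    }
}
```

    With the request code above, this would be called as: bSkip = ResponseFilter.ShouldSkip(response.ContentType, response.Headers["Content-Disposition"]);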

    The thing is that I really do get an HTML response, but I am hoping to find a way to tell whether there is a document embedded in it without downloading and parsing every HTML page, since there may be many of them and only a few with actual embedded documents.
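
    For reference, the fallback I would like to avoid (fetching each HTML page and scanning it) could look roughly like the sketch below. It assumes the document is embedded through an iframe, frame, embed or object tag, which I have not confirmed for the site above. EmbeddedDocumentScanner and FindEmbeddedSources are hypothetical names, and a regular expression is only a rough stand-in for a real HTML parser.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class EmbeddedDocumentScanner
{
    // Matches the src/data attribute of tags that commonly embed another document.
    static readonly Regex EmbedPattern = new Regex(
        @"<(?:iframe|frame|embed|object)\b[^>]*?\b(?:src|data)\s*=\s*[""']([^""']+)[""']",
        RegexOptions.IgnoreCase);

    // Hypothetical helper: returns the URLs referenced by embedding tags in the page.
    public static List<string> FindEmbeddedSources(string html)
    {
        var sources = new List<string>();
        foreach (Match m in EmbedPattern.Matches(html))
        {
            sources.Add(m.Groups[1].Value);
        }
        return sources;
    }
}
```

    Each URL returned this way would still need its own request and content type check, so this does not remove the cost of downloading the HTML pages themselves.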

    Thanks in advance. Any help is much appreciated.
    If you need further details, please let me know.