allowing http redirects??

This topic is closed.
  • Jason Manfield

    allowing http redirects??

    I am trying to crawl the web using the following code snippet.

    HttpWebRequest req = (HttpWebRequest)WebRequest.Create(url);
    req.UserAgent = "Mozilla/4.0 (compatible; MSIE 6.0; Win32)";

    ....


    By default, req.AllowAutoRedirect is true and
    MaximumAutomaticRedirections is 50.
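For reference, a minimal sketch of setting these two properties explicitly rather than relying on the defaults. The class name `RedirectSettings`, the helper `CreateRequest`, and the `http://example.com/` URL are placeholders for illustration, not part of the original snippet.

```csharp
using System;
using System.Net;

class RedirectSettings
{
    // Builds a request with the redirect-related properties set explicitly.
    // Creating the request does not contact the server yet.
    public static HttpWebRequest CreateRequest(string url)
    {
        HttpWebRequest req = (HttpWebRequest)WebRequest.Create(url);
        req.UserAgent = "Mozilla/4.0 (compatible; MSIE 6.0; Win32)";
        req.AllowAutoRedirect = true;           // same as the default
        req.MaximumAutomaticRedirections = 50;  // same as the default
        return req;
    }

    static void Main()
    {
        // Placeholder URL; substitute the URL being crawled.
        HttpWebRequest req = CreateRequest("http://example.com/");
        Console.WriteLine("AllowAutoRedirect: " + req.AllowAutoRedirect);
        Console.WriteLine("MaximumAutomaticRedirections: " + req.MaximumAutomaticRedirections);
    }
}
```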


    When I try to crawl the following URL.



    I get NameResolutionFailure exception. However, I am able to open this URL
    from the browser and it gets redirected to:




    How do I force my C# code to go to the redirected url?

  • Joerg Jooss

    #2
    Re: allowing http redirects??

    Jason Manfield wrote:
    > I am trying to crawl the web using the following code snippet.
    >
    > HttpWebRequest req = (HttpWebRequest)WebRequest.Create(url);
    > req.UserAgent = "Mozilla/4.0 (compatible; MSIE 6.0; Win32)";
    >
    > ...
    >
    > By default, the req.AllowAutoRedirect is true and
    > MaximumAutomaticRedirections is 50.
    >
    > When I try to crawl the following URL.
    >
    > http://citeseer.ist.psu.edu/rd/55811...oad/http://citeseer.ist.psu.edu/cache/papers/cs/7145/http:zSzzSzwww.stanford.eduzSzclasszSzcs343zSzpszSzpathprof.pdf/ball96efficient.pdf
    >
    > I get NameResolutionFailure exception. However, I am able to open
    > this URL from the browser and it gets redirected to:
    >
    > http://citeseer.ist.psu.edu/cache/pa...zzSzwww.stanford.eduzSzclasszSzcs343zSzpszSzpathprof.pdf/ball96efficient.pdf
    >
    > How do I force my C# code to go to the redirected url?

    What .NET version do you use? I can get this document with no problem:

    HTTP/1.1 200 OK
    Date: Wed, 01 Jun 2005 20:52:37 GMT
    Server: Apache/2.0.53 (Unix)
    Last-Modified: Wed, 08 Nov 2000 19:17:28 GMT
    ETag: "17801be-21d36-c5214200"
    Accept-Ranges: bytes
    Content-Length: 138550
    Connection: close
    Content-Type: application/pdf
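A status line and header dump like the one above can be printed from C# along these lines. This is only a sketch, not necessarily how the output above was obtained; the class name `HeaderDump`, the helper `FormatStatusLine`, and the fallback URL are assumptions for illustration.

```csharp
using System;
using System.Net;

class HeaderDump
{
    // Formats a status line in the style shown above, e.g. "HTTP/1.1 200 OK".
    public static string FormatStatusLine(Version protocol, HttpStatusCode code, string description)
    {
        return "HTTP/" + protocol + " " + (int)code + " " + description;
    }

    static void Main(string[] args)
    {
        // Placeholder URL unless one is passed on the command line.
        string url = args.Length > 0 ? args[0] : "http://example.com/";
        HttpWebRequest req = (HttpWebRequest)WebRequest.Create(url);

        using (HttpWebResponse resp = (HttpWebResponse)req.GetResponse())
        {
            Console.WriteLine(FormatStatusLine(resp.ProtocolVersion, resp.StatusCode, resp.StatusDescription));
            foreach (string name in resp.Headers.AllKeys)
            {
                Console.WriteLine(name + ": " + resp.Headers[name]);
            }
            // ResponseUri is the URI that actually answered, after any automatic redirects.
            Console.WriteLine("Final URI: " + resp.ResponseUri);
        }
    }
}
```

Comparing `resp.ResponseUri` with the requested URL shows whether a redirect was followed.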

    Cheers,
    --

    mailto:news-reply@joergjooss.de


    • Jason Manfield

      #3
      Re: allowing http redirects??

      Joerg

      My .NET version is 2.0.4.

      I assume it worked for you from C# code. Did you set any property in
      HttpWebRequest to make it work?

      My code can open the redirected URL, but not the original url (with rd in it).

      Jason


      • Joerg Jooss

        #4
        Re: allowing http redirects??

        Jason Manfield wrote:
        > Joerg
        >
        > My .NET version is 2.0.4.
        >
        > I assume it worked for you from C# code. Did you set any property in
        > HttpWebRequest to make it work?

        Nothing special, but I tried it using .NET 1.1 SP1. It doesn't work for
        me in .NET 2.0 Beta 2 either.
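The thread does not establish why the request fails under 2.0. One way to see which hop breaks is to turn off automatic redirects and follow the Location header by hand. The sketch below is an assumption-laden illustration, not a confirmed fix: the class name `ManualRedirects`, the helpers `IsRedirect` and `ResolveLocation`, the limit of 10 hops, and the fallback URL are all made up for the example.

```csharp
using System;
using System.Net;

class ManualRedirects
{
    // True for the status codes that normally carry a Location header.
    public static bool IsRedirect(HttpStatusCode code)
    {
        int n = (int)code;
        return n == 301 || n == 302 || n == 303 || n == 307;
    }

    // Resolves a Location value (absolute or relative) against the current URI.
    public static Uri ResolveLocation(Uri current, string location)
    {
        return new Uri(current, location);
    }

    static void Main(string[] args)
    {
        // Placeholder URL unless one is passed on the command line.
        Uri uri = new Uri(args.Length > 0 ? args[0] : "http://example.com/");
        const int maxHops = 10; // hypothetical limit for this sketch

        try
        {
            for (int hop = 0; hop < maxHops; hop++)
            {
                HttpWebRequest req = (HttpWebRequest)WebRequest.Create(uri);
                req.UserAgent = "Mozilla/4.0 (compatible; MSIE 6.0; Win32)";
                req.AllowAutoRedirect = false; // handle each redirect ourselves

                using (HttpWebResponse resp = (HttpWebResponse)req.GetResponse())
                {
                    Console.WriteLine((int)resp.StatusCode + " " + uri);
                    string location = resp.Headers["Location"];
                    if (!IsRedirect(resp.StatusCode) || location == null)
                    {
                        return; // final response reached
                    }
                    Console.WriteLine("  Location: " + location);
                    uri = ResolveLocation(uri, location);
                }
            }
            Console.WriteLine("Gave up after " + maxHops + " redirects.");
        }
        catch (WebException ex)
        {
            // ex.Status would be NameResolutionFailure for the error reported in this thread.
            Console.WriteLine("Failed at " + uri + ": " + ex.Status);
        }
    }
}
```

Printing each Location value before requesting it shows exactly which URI the name resolution is attempted for.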


        Cheers,
        --

        mailto:news-reply@joergjooss.de
