WebResponse.GetResponseStream returns incomplete stream

  • vito16
    New Member
    • Feb 2008
    • 2

    Hi,

    I have some C# code for a console application that was correctly grabbing pages until recently, when the data started coming back incomplete. I need to grab all of the information, including sponsored links, for a URL similar to "http://search.live.com/results.aspx?q=sony+bmg". All sponsored ads on the site are contained in one of two <div> tags, <div id="at"> or <div id="ar">, but when I save the output to a file and search for these tags, they are not found. If I open the same link in a browser and view the source, I easily find the <div> tags containing the sponsored links. This was working for the past couple of months and only stopped recently, so I don't believe the problem is in my code. Below is a simplified version of my program that shows the same problem.

    Code:
    using System;
    using System.IO;
    using System.Net;
    
    namespace WebReader
    {
        class Program
        {
            static void Main(string[] args)
            {
                string url = "http://search.live.com/results.aspx?q=sony+bmg";
                Uri uri = new Uri(url);
    
                WebRequest req = WebRequest.Create(uri);
    
                // Dispose the response, reader and writer so the connection
                // and the file handle are released even if an exception occurs.
                using (WebResponse resp = req.GetResponse())
                using (Stream stream = resp.GetResponseStream())
                using (StreamReader sr = new StreamReader(stream))
                {
                    string s = sr.ReadToEnd();
    
                    using (StreamWriter myFile = new StreamWriter("c:\\WebRead.txt"))
                    {
                        myFile.Write(s);
                    }
                }
            }
        }
    }
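The manual check from the question (saving the page and searching it for the two sponsored-link containers) can be automated so that runs are easy to compare. A minimal sketch, assuming modern C# (top-level statements, C# 9 or later), that the markup starts literally with `<div id="at"` or `<div id="ar"` (double quotes, `id` as the first attribute), and that `HasSponsoredBlocks` is a hypothetical helper name:

```csharp
using System;
using System.IO;

// Hypothetical helper: true if the HTML contains either sponsored-link container.
// Assumes the tags appear literally as <div id="at" ... or <div id="ar" ...
// (case-insensitive match; other attribute orders or quoting are not detected).
static bool HasSponsoredBlocks(string html) =>
    html.IndexOf("<div id=\"at\"", StringComparison.OrdinalIgnoreCase) >= 0 ||
    html.IndexOf("<div id=\"ar\"", StringComparison.OrdinalIgnoreCase) >= 0;

// Check the file written by the downloader, if it exists.
const string path = "c:\\WebRead.txt";
if (File.Exists(path))
{
    Console.WriteLine(HasSponsoredBlocks(File.ReadAllText(path))
        ? "Sponsored-link containers found."
        : "No sponsored-link containers in the saved page.");
}
else
{
    Console.WriteLine($"{path} not found; run the downloader first.");
}
```

Running this against the saved file with and without a change to the request makes it quick to tell whether the server actually sent the ad blocks.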
    Things I have tried:
    • Buffering the response in a MemoryStream, with the same results
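Buffering through a MemoryStream cannot recover anything here: it reads exactly the bytes the server sent, just as StreamReader.ReadToEnd does, so if the server leaves the ads out, both approaches miss them. A minimal sketch illustrating this, assuming C# 9 or later (top-level statements), UTF-8 content, hypothetical helper names `ReadAllBuffered` and `ReadAllDirect`, and in-memory bytes standing in for `resp.GetResponseStream()`:

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper: buffers the entire stream in memory, then decodes it.
// Assumes UTF-8; a real page's charset may differ.
static string ReadAllBuffered(Stream source)
{
    using var buffer = new MemoryStream();
    source.CopyTo(buffer); // keeps reading until the source reports end of data
    return Encoding.UTF8.GetString(buffer.ToArray());
}

// Hypothetical helper: reads directly with StreamReader.ReadToEnd, as the original program does.
static string ReadAllDirect(Stream source)
{
    using var reader = new StreamReader(source, Encoding.UTF8);
    return reader.ReadToEnd();
}

// Stand-in for resp.GetResponseStream(): two in-memory copies of the same page.
byte[] page = Encoding.UTF8.GetBytes("<html><body>organic results only</body></html>");
string viaBuffer = ReadAllBuffered(new MemoryStream(page));
string viaReader = ReadAllDirect(new MemoryStream(page));
Console.WriteLine(viaBuffer == viaReader); // both approaches see the same content
```

Since both paths return the same string, missing content points at what the server chose to send rather than at how the stream is read.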



    Any suggestions that you may have on how to get the entire page returned so I can parse it would be appreciated.
  • vito16
    New Member
    • Feb 2008
    • 2

    #2
    I have solved the problem: it looks like live.com now requires a user agent before it returns the sponsored ads. I only had to add one line, which casts my WebRequest object to an HttpWebRequest and sets its UserAgent property. Below is the updated code that resolved my problem.


    Code:
    using System;
    using System.IO;
    using System.Net;
    
    namespace WebReader
    {
        class Program
        {
            static void Main(string[] args)
            {
                string url = "http://search.live.com/results.aspx?q=sony+bmg";
                Uri uri = new Uri(url);
    
                WebRequest req = WebRequest.Create(uri);
                // The fix: send a browser-like User-Agent, which live.com appears
                // to require before it includes the sponsored ads in the page.
                ((HttpWebRequest)req).UserAgent = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)";
    
                using (WebResponse resp = req.GetResponse())
                using (Stream stream = resp.GetResponseStream())
                using (StreamReader sr = new StreamReader(stream))
                {
                    string s = sr.ReadToEnd();
    
                    using (StreamWriter myFile = new StreamWriter("c:\\WebRead.txt"))
                    {
                        myFile.Write(s);
                    }
                }
            }
        }
    }
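On current .NET, WebRequest and HttpWebRequest are marked obsolete (SYSLIB0014) and HttpClient is the recommended API. The same fix, sending a User-Agent header, can be expressed with HttpRequestMessage. A sketch under stated assumptions: `BuildRequest` is a hypothetical helper, the code uses top-level statements (C# 9 or later), and it only builds the request, so it runs without network access:

```csharp
using System;
using System.Net.Http;

// Hypothetical helper: builds (but does not send) a GET request with a browser-like User-Agent.
static HttpRequestMessage BuildRequest(string url, string userAgent)
{
    var request = new HttpRequestMessage(HttpMethod.Get, url);
    // TryAddWithoutValidation stores the string without strict header parsing.
    request.Headers.TryAddWithoutValidation("User-Agent", userAgent);
    return request;
}

const string Url = "http://search.live.com/results.aspx?q=sony+bmg";
const string UserAgent = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)";

using HttpRequestMessage request = BuildRequest(Url, UserAgent);
Console.WriteLine(request.Headers.Contains("User-Agent"));

// To actually send it (requires network access):
// using var client = new HttpClient();
// using HttpResponseMessage response = await client.SendAsync(request);
// string html = await response.Content.ReadAsStringAsync();
```

Setting the header per request rather than on `HttpClient.DefaultRequestHeaders` keeps a shared client free of site-specific settings.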
