Hello, first time posting and new to C#, so please forgive my noobishness. I have built a little page scraper to grab a list of television shows from an input string. The class is posted below.
It works fairly well, but the for loop never finishes. Stranger still, the DocumentCompleted event fires 3 times. When I debug, spans.Count is 16 and n increments 1, 2 .. 14, then stops. It doesn't hang and it doesn't reach 15; the handler just ends and the code after the loop never executes.
The last span on the scraped page is empty: <span></span>. Is that the source of my problem? Why would the loop just stop processing?
PS: I switched to the indexed for loop after I originally had the same problem with a foreach and wanted to see what was happening.
Any help would be GREATLY appreciated. I'm really stumped. Any advice you have on technique would also be really helpful.
Code:
public class Show
{
    public ArrayList seasons;
    public string name;
    public Boolean loaded;
    public int showId;

    public Show(string n)
    {
        name = n;
        WebBrowser scraper = new WebBrowser();
        scraper.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(this.scrapeForShows);
        scraper.Url = new Uri("http://www.tv.com/search.php?type=Search&stype=ajax_search&search_type=program&offset=0&qs=" + name);
    }

    public void scrapeForShows(object sender, WebBrowserDocumentCompletedEventArgs e)
    {
        HtmlDocument doc = ((WebBrowser)sender).Document;
        HtmlElementCollection spans = doc.GetElementsByTagName("span");
        Dictionary<string, string> showsFound = new Dictionary<string, string>();
        for (int n = 0; n < spans.Count; n++)
        {
            var html = spans[n].InnerHtml.ToString();
            if (Regex.IsMatch(html, @"Show\:"))
            {
                string showName = Regex.Match(html, @"<(a|A).*>(?<show>.*)</(a|A)>").Groups["show"].ToString();
                showName = Regex.Match(showName, @"\w.*\w").ToString();
                showsFound.Add(showName, Regex.Match(html, @"/show/(?<num>\d+)/").Groups["num"].ToString());
            }
        }
        if (showsFound.ContainsKey(name))
        {
            MessageBox.Show(name.ToString());
        }
    }
}