HtmlElementCollection foreach loop won't end

  • soljin
    New Member
    • Mar 2008
    • 1

    Hello, first time posting and new to C#, so please forgive my noobishness. I have built a little page scraper to grab a list of television shows from an input string. Here is the class:

    Code:
        public class Show
        {
            public ArrayList seasons;
            public string name;
            public Boolean loaded;
            public int showId;
            public Show(string n)
            {
                name = n;
                WebBrowser scraper = new WebBrowser();
                scraper.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(this.scrapeForShows);
                scraper.Url = new Uri("http://www.tv.com/search.php?type=Search&stype=ajax_search&search_type=program&offset=0&qs=" + name);
            }
            public void scrapeForShows(object sender, WebBrowserDocumentCompletedEventArgs e)
            {
                HtmlDocument doc = ((WebBrowser)sender).Document;
                HtmlElementCollection spans = doc.GetElementsByTagName("span");
                Dictionary<string, string> showsFound = new Dictionary<string, string>();
                for (int n = 0; n < spans.Count;n++)
                {
                    var html = spans[n].InnerHtml.ToString();
                    if (Regex.IsMatch(html, @"Show\:"))
                    {
                        string showName = Regex.Match(html, @"<(a|A).*>(?<show>.*)</(a|A)>").Groups["show"].ToString();
                        showName = Regex.Match(showName, @"\w.*\w").ToString();
                        showsFound.Add(showName, Regex.Match(html, @"/show/(?<num>\d+)/").Groups["num"].ToString());
                    }
                }
                if (showsFound.ContainsKey(name))
                {
                    MessageBox.Show(name.ToString());
                }
            }
        }
    It works fairly well, but the for statement never ends. Stranger still, the DocumentCompleted event fires 3 times. When I debug, it says that spans.Count is 16, and n increments 1, 2 .. 14 and then stops. It doesn't hang, it doesn't do 15, it just ends and doesn't execute the code after the loop.

    The last span on the scraped page is empty: <span></span>. Is that the source of my problem? Why would the loop just stop processing?

    PS: I switched to the indexed for statement after I originally had the same problem with a foreach and wanted to see what was happening.

    Any help would be GREATLY appreciated. I'm really stumped. Also, any advice you have on technique would be really helpful.
  • nateraaaa
    Recognized Expert Contributor
    • May 2007
    • 664

    #2
    Try setting n = 1 and n <= spans.Count. Let me know if you get a different result.

    Nathan
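
    One plausible explanation for the symptoms above, offered as a guess rather than a confirmed diagnosis: if InnerHtml comes back null for the empty <span></span>, then calling .ToString() on it throws, and an exception thrown inside a DocumentCompleted handler can end the handler without any visible error. Separately, Dictionary.Add throws if the same show name is found twice. HtmlElementCollection is zero-based, so valid indexes run from 0 to Count - 1. The sketch below shows a null-safe version of the loop on plain strings so it runs without a WebBrowser; the FindShows helper, the SpanScanSketch class, the slightly tightened regex, and the sample data are all made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SpanScanSketch
{
    // Hypothetical helper: takes the InnerHtml of each span (entries may be null)
    // and returns a map of show name -> show id.
    public static Dictionary<string, string> FindShows(IEnumerable<string> spanHtml)
    {
        var showsFound = new Dictionary<string, string>();
        foreach (string html in spanHtml)
        {
            // Skip empty spans instead of dereferencing a null string.
            if (string.IsNullOrEmpty(html)) continue;
            if (!html.Contains("Show:")) continue;

            // Non-greedy match on the link text; IgnoreCase covers <a> and <A>.
            string showName = Regex.Match(html, @"<a[^>]*>(?<show>.*?)</a>",
                RegexOptions.IgnoreCase).Groups["show"].Value.Trim();
            string showId = Regex.Match(html, @"/show/(?<num>\d+)/").Groups["num"].Value;
            if (showName.Length == 0) continue;

            // The indexer overwrites an existing key; Add would throw on a duplicate.
            showsFound[showName] = showId;
        }
        return showsFound;
    }

    public static void Main()
    {
        // Made-up sample data standing in for the scraped spans.
        var spans = new string[]
        {
            "Show: <a href=\"/show/123/summary.html\">Example Show</a>",
            "Show: <A href=\"/show/123/summary.html\">Example Show</A>",
            "Episode: <a href=\"/episode/9/\">Pilot</a>",
            null // stands in for the empty <span></span>
        };

        var shows = FindShows(spans);
        Console.WriteLine(shows.Count);           // prints 1
        Console.WriteLine(shows["Example Show"]); // prints 123
    }
}
```

    In the original handler, the same idea would mean checking spans[n].InnerHtml for null before using it, and wrapping the handler body in a try/catch while debugging so that any exception becomes visible.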
