I'm trying to parse HTML that resides locally by using the HtmlDocument class,
and unfortunately you can only get an instance of an HtmlDocument through the
WebBrowser control.
Some of the HTML files I want to parse are quite large, so I want to get the
HtmlDocument on a separate thread. But for some reason, whenever I move the
code that navigates the WebBrowser to a separate thread, the DocumentCompleted
event is never fired. When I step through, I can see that some of the document
is loading, but not all of it. Here is some code:
-----------
....
//Start the thread
Thread worker = new Thread(new ParameterizedThreadStart(LoadHtml));
worker.SetApartmentState(ApartmentState.STA);
worker.Start(files);
....
private void LoadHtml(object obj)
{
    foreach (FileInfo fileinfo in (FileInfo[])obj)
    {
        //Create a new chapter and add it to the list of chapters
        HtmlParser parser = new HtmlParser(fileinfo);
        m_listHtmlParsers.Add(parser);
    }
}
public class HtmlParser
{
    public HtmlParser(FileInfo fileinfo)
    {
        //Set up our WebBrowser Control, which will Parse the HtmlDocument
        //that it contains once the DocumentCompleted Event is fired
        m_wbHtmlParser.DocumentCompleted +=
            new WebBrowserDocumentCompletedEventHandler(ParseHtml);
        m_wbHtmlParser.Navigate(m_fiChapterFile.FullName);
    }
    private void ParseHtml(object sender,
        WebBrowserDocumentCompletedEventArgs e)
    {
        //We never get here
    }
}
------------------
If I take this out of the thread, the DocumentCompleted event gets fired
and everything works. I think what is happening is that the thread exits
before the document is completely loaded. But I'm not sure how to
make sure all the HtmlParsers have fired the DocumentCompleted event before
the thread ends.
Any help would be greatly appreciated.