HTML screen scraper tool?

  • GaryDean

    html screen scrapper tool?

    Does anyone know of HTML screen-scraping software that works with .NET
    projects? We need to get data from a government site that only provides it
    on a web page.
    Thanks,
    Gary


  • Allen Chen [MSFT]

    #2
    RE: html screen scrapper tool?

    Hi Gary,

    Here's a working sample with source code. You can refer to it and write
    your own project.


    If it's not what you need, please let me know and describe the requirement
    in more detail.

    Regards,
    Allen Chen
    Microsoft Online Support

    Delighting our customers is our #1 priority. We welcome your comments and
    suggestions about how we can improve the support we provide to you. Please
    feel free to let my manager know what you think of the level of service
    provided. You can send feedback directly to my manager at:
    msdnmg@microsoft.com.

    ==================================================
    Get notification to my posts through email? Please refer to
    http://msdn.microsoft.com/en-us/subs...#notifications.

    Note: The MSDN Managed Newsgroup support offering is for non-urgent issues
    where an initial response from the community or a Microsoft Support
    Engineer within 1 business day is acceptable. Please note that each follow
    up response may take approximately 2 business days as the support
    professional working with you may need further investigation to reach the
    most efficient resolution. The offering is not appropriate for situations
    that require urgent, real-time or phone-based interactions or complex
    project analysis and dump analysis issues. Issues of this nature are best
    handled working with a dedicated Microsoft Support Engineer by contacting
    Microsoft Customer Support Services (CSS) at
    http://support.microsoft.com/select/...tance&ln=en-us.
    ==================================================
    This posting is provided "AS IS" with no warranties, and confers no rights.
    --------------------
    | From: "GaryDean" <gdeanblakely@newsgroup.nospam>
    | Subject: html screen scrapper tool?
    | Date: Sun, 19 Oct 2008 16:37:06 -0700
    | Lines: 7
    | X-Priority: 3
    | X-MSMail-Priority: Normal
    | X-Newsreader: Microsoft Outlook Express 6.00.2900.3028
    | X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.3028
    | X-RFC2646: Format=Flowed; Original
    | Message-ID: <eO7RTOkMJHA.728@TK2MSFTNGP03.phx.gbl>
    | Newsgroups: microsoft.public.dotnet.framework.aspnet
    | NNTP-Posting-Host: ip68-110-7-92.tc.ph.cox.net 68.110.7.92
    | Path: TK2MSFTNGHUB02.phx.gbl!TK2MSFTNGP01.phx.gbl!TK2MSFTNGP03.phx.gbl
    | Xref: TK2MSFTNGHUB02.phx.gbl microsoft.public.dotnet.framework.aspnet:78136
    | X-Tomcat-NG: microsoft.public.dotnet.framework.aspnet
    |
    | Anyone know of html screen scrapper software that will work with .net
    | projects. We need to get data back from a gov site that only provides it on
    | a webpage.
    | Thanks,
    | Gary


    • Allen Chen [MSFT]

      #3
      RE: html screen scrapper tool?

      Hi Gary,

      Is this issue solved?

      Regards,
      Allen Chen
      Microsoft Online Support


      • GaryDean

        #4
        Re: html screen scrapper tool?

        Allen,
        Well, not exactly. What we are trying to do is navigate a website
        programmatically. Our research so far has turned up several methods:

        1. Using the WebBrowser control with its properties and methods.

        2. Creating an InternetExplorer application in memory, like so:

           ' Late-bound COM automation of Internet Explorer (VB6/VBA style)
           Set uf1_cbutt1_click_ie = CreateObject("InternetExplorer.Application")

           With uf1_cbutt1_click_ie
               .Navigate "http://www.cfmu.eurocontrol.be/chmi_public/ciahome.jsp?serv1=ifpuvs"
               .Visible = True

               ' Wait until the page has finished loading (READYSTATE_COMPLETE = 4)
               Do While .Busy Or .ReadyState <> 4
                   DoEvents
               Loop

               ' Fill in the form named "ifpuvs" and click its submit button
               With .Document.ifpuvs
                   .arcid.Value = "A value here"
                   .rules.Value = "Another value here etc etc"
                   .cmd.Click
               End With
           End With

        3. Using an XPath reader or the HTML Agility Pack (that reads pages, but I
           don't know if we can fill in fields and push buttons with it).

        4. Creating a WebClient and a UTF8Encoding object and calling the
           DownloadData method.
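
Approaches 3 and 4 both come down to downloading the HTML and parsing it without a browser. Below is a minimal sketch of that idea, written in Python rather than .NET purely for illustration and using only the standard library. Here `fetch_html` plays the role of `WebClient.DownloadData` plus `UTF8Encoding`, and `TableTextParser` stands in for a parser such as the HTML Agility Pack. All names and the sample page are hypothetical and are not taken from the site discussed in this thread.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class TableTextParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def _end_cell(self):
        if self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None

    def _end_row(self):
        if self._row is not None:
            self._end_cell()
            self.rows.append(self._row)
            self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._end_row()  # tolerate a missing </tr>
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._end_cell()  # tolerate a missing </td>
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._end_cell()
        elif tag == "tr":
            self._end_row()


def extract_rows(html):
    """Return the table rows found in an HTML string."""
    parser = TableTextParser()
    parser.feed(html)
    parser.close()
    parser._end_row()  # flush a row left open at end of input
    return parser.rows


def fetch_html(url, encoding="utf-8"):
    """Download a page and decode its bytes to text."""
    with urlopen(url) as response:
        return response.read().decode(encoding, errors="replace")


# Hypothetical page content, used so the sketch runs without network access.
SAMPLE_PAGE = """
<html><body><table>
  <tr><th>Flight</th><th>Status</th></tr>
  <tr><td>ABC123</td><td>Accepted</td></tr>
</table></body></html>
"""

if __name__ == "__main__":
    for row in extract_rows(SAMPLE_PAGE):
        print(row)
```

For a live page, `extract_rows(fetch_html(url))` would be the equivalent call. This route only reads what the server sends back, so it is unlikely to help where the data is produced by script running in the browser.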

        We haven't been able to find any "best practices" articles on this subject.
        It seems there are several different approaches, and different people know
        a little bit about each one. Which way is best?
        Thanks for following up,
        Gary


        • Allen Chen [MSFT]

          #5
          Re: html screen scrapper tool?

          Hi Gary,

          By "navigate a website programmatically", do you mean you want to create
          a robot that drives the page in IE? If so, that is really tough work. If
          that is the only requirement, you can use a third-party tool such as
          WatiN.



          This response contains a reference to a third party World Wide Web site.
          Microsoft is providing this information as a convenience to you. Microsoft
          does not control these sites and has not tested any software or information
          found on these sites; therefore, Microsoft cannot make any representations
          regarding the quality, safety, or suitability of any software or
          information found there. There are inherent dangers in the use of any
          software found on the Internet, and Microsoft cautions you to make sure
          that you completely understand the risk before retrieving any software from
          the Internet.

          Quote from Gary:
          ==================================================
          We haven't been able to find any "best practices" articles on this subject.
          It seems there are several different approaches, and different people know
          a little bit about each one. Which way is best?
          ==================================================

          It's not that easy to give you an answer as to which way is the best. I
          think we'd better leave this topic open because each way deserves
          investigation.
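
As one data point for that investigation: when the target form is plain HTML, filling in fields and "pushing the button" can sometimes be done without a browser, by sending the same request the browser would send. The sketch below is in Python rather than .NET, purely for illustration. The field names `arcid`, `rules` and `cmd` and the form name `ifpuvs` come from Gary's snippet; the page markup, the URLs and the names `FormFieldParser` and `build_submission` are hypothetical, since the real page's structure is not shown in this thread.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin
from urllib.request import Request


class FormFieldParser(HTMLParser):
    """Record the action, method and named <input> defaults of one form."""

    def __init__(self, form_name):
        super().__init__()
        self.form_name = form_name
        self.action = None   # stays None if the form is never found
        self.method = "get"
        self.fields = {}
        self._inside = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and attrs.get("name") == self.form_name:
            self._inside = True
            self.action = attrs.get("action") or ""
            self.method = (attrs.get("method") or "get").lower()
        elif tag == "input" and self._inside and attrs.get("name"):
            self.fields[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self._inside = False


def build_submission(page_url, html, form_name, values):
    """Build (but do not send) the request that submitting the form would make."""
    parser = FormFieldParser(form_name)
    parser.feed(html)
    parser.close()
    if parser.action is None:
        raise ValueError(f"no form named {form_name!r} in the page")
    fields = dict(parser.fields)
    fields.update(values)                       # caller's values override defaults
    target = urljoin(page_url, parser.action)   # empty action means the page itself
    encoded = urlencode(fields)
    if parser.method == "post":
        return Request(target, data=encoded.encode("ascii"), method="POST")
    separator = "&" if "?" in target else "?"
    return Request(target + separator + encoded, method="GET")


# Hypothetical page and URL, used so the sketch runs without network access.
SAMPLE_URL = "http://example.invalid/ciahome.jsp"
SAMPLE_PAGE = """
<form name="ifpuvs" action="validate.jsp" method="post">
  <input type="text" name="arcid" value="">
  <input type="text" name="rules" value="">
  <input type="submit" name="cmd" value="Validate">
</form>
"""

if __name__ == "__main__":
    request = build_submission(
        SAMPLE_URL, SAMPLE_PAGE, "ifpuvs",
        {"arcid": "A value here", "rules": "Another value here"},
    )
    print(request.get_method(), request.full_url)
    print(request.data)
```

Passing the returned object to `urllib.request.urlopen` would actually send it. This only works when the server accepts a bare form post; pages that depend on script, cookies set by earlier pages, or hidden state may still need a tool that drives a real browser, which is what WatiN does.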

          Please let me know if you make any progress on this issue.

          Regards,
          Allen Chen
          Microsoft Online Support

          --------------------
          | From: "GaryDean" <gdeanblakely@newsgroup.nospam>
          | References: <eO7RTOkMJHA.728@TK2MSFTNGP03.phx.gbl> <BjGDFSONJHA.8140@TK2MSFTNGHUB02.phx.gbl>
          | Subject: Re: html screen scrapper tool?
          | Date: Thu, 23 Oct 2008 08:59:50 -0700
          | Lines: 113
          | X-Priority: 3
          | X-MSMail-Priority: Normal
          | X-Newsreader: Microsoft Outlook Express 6.00.2900.3028
          | X-RFC2646: Format=Flowed; Original
          | X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.3028
          | Message-ID: <uceJahSNJHA.2348@TK2MSFTNGP05.phx.gbl>
          | Newsgroups: microsoft.public.dotnet.framework.aspnet
          | NNTP-Posting-Host: ip68-110-7-92.tc.ph.cox.net 68.110.7.92
          | Path: TK2MSFTNGHUB02.phx.gbl!TK2MSFTNGP01.phx.gbl!TK2MSFTNGP05.phx.gbl
          | Xref: TK2MSFTNGHUB02.phx.gbl microsoft.public.dotnet.framework.aspnet:78482
          | X-Tomcat-NG: microsoft.public.dotnet.framework.aspnet

          • GaryDean

            #6
            Re: html screen scrapper tool?

            Allen,
            WatiN, along with the IE Developer Toolbar, does the trick!
            Thanks so much,
            Gary
