scripting browsers from Python

  • Michele Simionato

    scripting browsers from Python

    I would like to know what is available for scripting browsers from
    Python. For instance, webbrowser.open lets me perform GET requests,
    but I would like to do POST requests too. I don't want to use urllib
    to emulate a browser; I am interested in checking that browser X
    really works as intended with my application. Any suggestions?

    Michele Simionato

  • Simon Brunning

    #2
    Re: scripting browsers from Python

    On 31 May 2005 00:52:33 -0700, Michele Simionato
    <michele.simionato@gmail.com> wrote:
    > I would like to know what is available for scripting browsers from
    > Python.
    > For instance, webbrowser.open let me to perform GET requests, but I
    > would like to do POST requests too. I don't want to use urllib to
    > emulate a browser, I am interested in checking that browser X really
    > works as intended with my application. Any suggestion?

    I don't know of anything cross platform, or even cross browser, but on
    Windows, IE can be automated via COM - see
    <http://www.mayukhbose.com/python/IEC/> for example - and other
    browsers should be able to be automated either via COM or by driving
    the GUI with WATSUP (<http://www.tizmoi.net/watsup/intro.html>).

    --
    Cheers,
    Simon B,
    simon@brunningonline.net,


    • davidb@mcs.st-and.ac.uk

      #3
      Re: scripting browsers from Python

      Michele Simionato wrote:

      > I would like to know what is available for scripting browsers from
      > Python.
      > For instance, webbrowser.open let me to perform GET requests, but I
      > would like to do POST requests too. I don't want to use urllib to
      > emulate a browser, I am interested in checking that browser X really
      > works as intended with my application. Any suggestion?

      For Konqueror running on KDE, you can use DCOP to control the browser.
      There are a couple of different, but related, Python modules that you
      can use to do this. See the following page for more information:



      I believe this approach has been used quite successfully with other
      KDE applications:



      You should still be able to automate the browser with just popen2 and
      the "dcop" command line tool if you are really desperate. I once had
      to resort to this ad-hoc approach in the distant past but, these days,
      I'd recommend one of the above modules instead.

      David


      • Kent Johnson

        #4
        Re: scripting browsers from Python

        Simon Brunning wrote:
        > On 31 May 2005 00:52:33 -0700, Michele Simionato
        > <michele.simionato@gmail.com> wrote:
        >
        >> I would like to know what is available for scripting browsers from
        >> Python.
        >
        > I don't know of anything cross platform, or even cross browser, but on
        > Windows, IE can be automated via COM - see
        > <http://www.mayukhbose.com/python/IEC/> for example

        Also http://pamie.sourceforge.net/

        Kent


        • John J. Lee

          #5
          Re: scripting browsers from Python

          "Michele Simionato" <michele.simionato@gmail.com> writes:

          > I would like to know what is available for scripting browsers from
          > Python.
          > For instance, webbrowser.open let me to perform GET requests, but I
          > would like to do POST requests too. I don't want to use urllib to
          > emulate a browser, I am interested in checking that browser X really
          > works as intended with my application. Any suggestion?

          Yes:



          In a previous post I mentioned Selenium as a Web app testing tool that is like no other in terms of functionality and implementation. I've...





          Unfortunately, there's still no free (as in speech) "macro recorder"
          implemented as a browser plugin (nor even one implemented on the HTTP
          level that can produce output in a form Selenium understands, AFAIK).

          For some other relevant links, see under "Misc Links" here (and for
          that matter, the previous bullet point too):




          John


          • Olivier Favre-Simon

            #6
            Re: scripting browsers from Python

            On Tue, 31 May 2005 00:52:33 -0700, Michele Simionato wrote:

            > I would like to know what is available for scripting browsers from
            > Python.
            > For instance, webbrowser.open let me to perform GET requests, but I
            > would like to do POST requests too. I don't want to use urllib to
            > emulate a browser, I am interested in checking that browser X really
            > works as intended with my application. Any suggestion?
            >
            > Michele Simionato


            ClientForm http://wwwsearch.sourceforge.net/ClientForm/

            I use it to automate POSTs of entire image directories to
            imagevenue.com, imagehigh.com and similar hosts.

            It works on top of urllib2.

            You access forms by name or index, then you access HTML elements as
            dict-style keys of the form.

            It supports file uploads within a POST.

            The only drawbacks I've found are:
            - it does not support nested forms (since forms are returned in a list)
            - it does not like ill-formed HTML. It uses HTMLParser as the underlying
              parser; you may pass a parser class as a parameter (say SGMLParser, for
              greater acceptance of stupid HTML code), but that is tricky because
              there is no well-defined parser interface.

            Hope this helps.


            • Michele Simionato

              #7
              Re: scripting browsers from Python

              This looks interesting, but I need an example here. What would be the
              command to open Konqueror to a given page and to post a form with given
              parameters? kde.org has tons of material, but I am getting lost and I
              can't find anything relevant to my simple problem.

              Michele Simionato


              • has

                #8
                Re: scripting browsers from Python

                Simon Brunning wrote:
                > On 31 May 2005 00:52:33 -0700, Michele Simionato
                > <michele.simionato@gmail.com> wrote:
                > > I would like to know what is available for scripting browsers from
                > > Python.
                >
                > I don't know of anything cross platform, or even cross browser, but on
                > Windows, IE can be automated via COM

                On OS X you can use appscript
                <http://freespace.virgin.net/hamish.sanderson/appscript.html>, either
                directly via the application's scripting interface if it has one or
                indirectly by manipulating its GUI via GUI Scripting.

                HTH


                • Jeff Epler

                  #9
                  Re: scripting browsers from Python

                  I wanted to have a Python program make my browser do a POST. I am using
                  Firefox on Linux.

                  Here's what I did:
                  * Prepare an HTML page on the local disk that looks like this:
                  <html><body onload="document.forms[0].submit()">
                  <div style="display: none">
                  <form method=post accept-charset="utf-8" action="http://www.example.com/cgi-bin/example.cgi">
                  <input name=field1 value=value1>
                  <input name=field2 value=value2>
                  <textarea name=text>....</textarea>
                  <input type=submit name=blah>
                  </form>
                  </div>
                  Submitting form...
                  </body>
                  </html>

                  * Point the web browser at it. In my case, the webbrowser module didn't
                  work immediately, so I just used os.system() with a hardcoded browser
                  name.
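
The two steps above can be sketched in Python 3 roughly as follows. This is a minimal illustration, not the poster's actual script: the helper names `build_post_page` and `post_via_browser` are hypothetical, hidden inputs stand in for the `display: none` wrapper, and the standard `webbrowser` module is assumed to find a working browser (the post notes it did not always do so).

```python
import html
import tempfile
import webbrowser
from pathlib import Path


def build_post_page(action, fields):
    """Return an HTML page whose form submits itself as soon as it loads."""
    # Escape names and values so quotes or angle brackets cannot break the markup.
    inputs = "\n".join(
        '<input type="hidden" name="{}" value="{}">'.format(
            html.escape(name, quote=True), html.escape(value, quote=True)
        )
        for name, value in fields.items()
    )
    return (
        '<html><body onload="document.forms[0].submit()">\n'
        '<form method="post" accept-charset="utf-8" action="{}">\n'
        "{}\n"
        "</form>\n"
        "Submitting form...\n"
        "</body></html>\n"
    ).format(html.escape(action, quote=True), inputs)


def post_via_browser(action, fields):
    """Write the page to a temporary file and ask the default browser to open it."""
    page = build_post_page(action, fields)
    with tempfile.NamedTemporaryFile(
        "w", suffix=".html", delete=False, encoding="utf-8"
    ) as handle:
        handle.write(page)
    # webbrowser.open returns False when no browser could be launched.
    return webbrowser.open(Path(handle.name).as_uri())
```

Calling `post_via_browser("http://www.example.com/cgi-bin/example.cgi", {"field1": "value1"})` would then make the real browser perform the POST. One caveat worth knowing: a field named `submit` would shadow the form's `submit()` method in JavaScript, which is presumably why the submit button above is named `blah`.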

                  Jeff



                  • John J. Lee

                    #10
                    Re: scripting browsers from Python

                    Olivier Favre-Simon <olivier.favre-simon@club-internet.fr> writes:

                    > On Tue, 31 May 2005 00:52:33 -0700, Michele Simionato wrote:
                    >
                    > > I would like to know what is available for scripting browsers from
                    > > Python.
                    [...]
                    > ClientForm http://wwwsearch.sourceforge.net/ClientForm/
                    >
                    > I use it for automation of POSTs of entire image directories to
                    > imagevenue.com/imagehigh.com/etc hosts.

                    This doesn't actually address what the OP wanted: it's not a browser.

                    > The only drawback I've found are:
                    > - does not support nested forms (since forms are returned in a list)

                    Nested forms?? Good grief. Can you point me at a real life example
                    of such HTML? Can probably fix the parser to work around this.

                    > - does not like ill-formed HTML (Uses HTMLParser as the underlying parser.
                    > you may pass a parser class as parameter (say SGMLParser for greater
                    > acceptance of stupid HTML code) but it's tricky because there is no well
                    > defined parser interface)

                    Titus Brown says he's trying to fix sgmllib (to some extent, at least).

                    Also, you can always feed stuff through mxTidy.

                    I'd like to have a reimplementation of ClientForm on top of something
                    like BeautifulSoup...


                    John


                    • David Boddie

                      #11
                      Re: scripting browsers from Python

                      Michele Simionato wrote:
                      > This looks interesting, but I need an example here. What would be the
                      > command to open Konqueror to a given page and to post a form with given
                      > parameters?

                      Launch Konqueror, note the process ID (pid), and use the dcop command
                      line tool to open the page at a specified URL:

                      dcop konqueror-<pid> konqueror-mainwindow#1 openURL <URL>
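
From Python, the same call could be driven with the standard `subprocess` module (the modern replacement for popen2). This is only a sketch under several assumptions: it needs a KDE 3 era system where the `dcop` tool exists, the helper names are hypothetical, the launched process is assumed to keep its pid rather than re-executing under another one, and a fixed pause is assumed to be long enough for Konqueror to register with DCOP.

```python
import subprocess
import time


def dcop_open_url_command(pid, url):
    """Build the argument list for the dcop call shown above."""
    return ["dcop", "konqueror-{}".format(pid), "konqueror-mainwindow#1", "openURL", url]


def open_in_konqueror(url, startup_delay=5.0):
    """Launch Konqueror, then ask it over DCOP to load `url`."""
    browser = subprocess.Popen(["konqueror"])
    # Assumption: a fixed pause is enough for Konqueror to register with DCOP.
    time.sleep(startup_delay)
    # check=True raises CalledProcessError if dcop reports a failure.
    subprocess.run(dcop_open_url_command(browser.pid, url), check=True)
    return browser
```

Passing the arguments as a list avoids shell quoting problems with the `#` in `konqueror-mainwindow#1` and with special characters in the URL.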

                      Unfortunately, I don't think it's possible to manipulate the page
                      purely with DCOP, even with Python bindings, although I hope that
                      someone can prove me wrong.

                      > kde.org has tons a material, but I am getting lost and I don't find
                      > anything relevant to my simple problem.

                      A quick search revealed this discussion about using JavaScript with
                      DCOP:



                      This might be the best you can hope for with scripting outside the
                      browser. I've been trying to enable support for in-browser scripting
                      with Konqueror using KPart plugins, but this requires up to date
                      versions of sip, PyQt and PyKDE:



                      If you want to pursue that route, let me know and I'll try and tidy up
                      what I have.

                      David


                      • Olivier Favre-Simon

                        #12
                        Re: scripting browsers from Python

                        On Wed, 01 Jun 2005 22:27:44 +0000, John J. Lee wrote:

                        > Olivier Favre-Simon <olivier.favre-simon@club-internet.fr> writes:
                        >
                        >> On Tue, 31 May 2005 00:52:33 -0700, Michele Simionato wrote:
                        >>
                        >> > I would like to know what is available for scripting browsers from
                        >> > Python.
                        > [...]
                        >> ClientForm http://wwwsearch.sourceforge.net/ClientForm/
                        >>
                        >> I use it for automation of POSTs of entire image directories to
                        >> imagevenue.com/imagehigh.com/etc hosts.
                        >
                        > This doesn't actually address what the OP wanted: it's not a browser.

                        Yep. I didn't read with sufficient care. He really wants scripting, not
                        web scraping.

                        >> The only drawback I've found are:
                        >> - does not support nested forms (since forms are returned in a list)
                        >
                        > Nested forms?? Good grief. Can you point me at a real life example of
                        > such HTML? Can probably fix the parser to work around this.

                        What I mean is: the parser does not detect a missing </form>, so it
                        thinks that there are nested forms and raises a ParseError.

                        Browsers have an easier task at spotting non-matching form tags, because
                        they can use matching table or div tags around to imply that the form is
                        closed (DOM approach).

                        Not easy with a SAXish approach like HTMLParser.
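
To make the difficulty concrete, here is a minimal sketch of an event-driven form collector written against Python 3's standard `html.parser` module. It is not ClientForm's code: `FormCollector` and `parse_forms` are hypothetical names, and the recovery rule (a new `<form>` implicitly closes the one still open) is just one possible heuristic, chosen here as an assumption because the parser sees only a stream of tags and has no tree to consult.

```python
from html.parser import HTMLParser


class FormCollector(HTMLParser):
    """Collect forms and their named controls from tag events."""

    def __init__(self):
        super().__init__()
        self.forms = []       # finished forms, in document order
        self._current = None  # the form being read, or None

    def _close_current(self):
        if self._current is not None:
            self.forms.append(self._current)
            self._current = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            # A <form> while another is open: assume the earlier </form> was forgotten.
            self._close_current()
            self._current = {
                "name": attrs.get("name"),
                "action": attrs.get("action"),
                "fields": {},
            }
        elif tag in ("input", "textarea", "select") and self._current is not None:
            name = attrs.get("name")
            if name:
                self._current["fields"][name] = attrs.get("value", "")

    def handle_endtag(self, tag):
        if tag == "form":
            self._close_current()

    def close(self):
        super().close()
        self._close_current()  # a form still open at the end of the input


def parse_forms(html_text):
    """Return the list of forms found in `html_text`."""
    parser = FormCollector()
    parser.feed(html_text)
    parser.close()
    return parser.forms
```

With this rule, markup such as `<form name="a">...<form name="b">...</form>` yields two separate forms instead of an error. A browser can make a better guess because it also sees the enclosing table or div structure.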

                        I don't mean nested forms should be supported, they are crap (is this
                        even legal code?)

                        >> - does not like ill-formed HTML (Uses HTMLParser as the underlying
                        >> parser. you may pass a parser class as parameter (say SGMLParser for
                        >> greater acceptance of stupid HTML code) but it's tricky because there
                        >> is no well defined parser interface)
                        >
                        > Titus Brown says he's trying to fix sgmllib (to some extent, at least).
                        >
                        > Also, you can always feed stuff through mxTidy.
                        >
                        > I'd like to have a reimplementation of ClientForm on top of something
                        > like BeautifulSoup...
                        >
                        >
                        > John

                        Taken separately, ClientForm, HTMLParser and SGMLParser each work
                        well.

                        But it would be cool if competent people in the HTML parsing domain
                        joined up and defined a base parser interface, the same way smart guys
                        did with WSGI for web servers.

                        Then libs like ClientForm would not raise, say, an AttributeError if some
                        custom parser class does not implement a given attribute.

                        Adding an otherwise unused attribute to a parser just in case one day it
                        will interop with ClientForm sounds silly. And what if ClientForm changes
                        its attributes, etc.

                        No really, whatever the chosen codebase, a common parser interface would
                        be great.



                        • Stephen Thorne

                          #13
                          Re: scripting browsers from Python

                          On 31 May 2005 00:52:33 -0700, Michele Simionato
                          <michele.simionato@gmail.com> wrote:
                          > I would like to know what is available for scripting browsers from
                          > Python.
                          > For instance, webbrowser.open let me to perform GET requests, but I
                          > would like to do POST requests too. I don't want to use urllib to
                          > emulate a browser, I am interested in checking that browser X really
                          > works as intended with my application. Any suggestion?
                          >
                          > Michele Simionato

                          I use pbp, http://pbp.berlios.de/

                          It's essentially a python commandline webbrowser suitable for testing
                          websites. It makes it easy to do things like:

                          go http://user:pass@mywebsite/secure/
                          follow Admin
                          follow Configure
                          formvalue config max_widgets 300
                          submit config

                          in a script, and then run that script at your leisure.

                          As it's designed for testing, everything you do is essentially an
                          assertion, so if anything fails it fails spectacularly with debug
                          messages and non-zero exit codes. You can also load Python code so
                          you can do arbitrary stuff.

                          --
                          Stephen Thorne
                          Development Engineer


                          • John J. Lee

                            #14
                            Re: scripting browsers from Python

                            Olivier Favre-Simon <olivier.favre-simon@club-internet.fr> writes:
                            [...]
                            > > I'd like to have a reimplementation of ClientForm on top of something
                            > > like BeautifulSoup...
                            > >
                            > >
                            > > John
                            >
                            > When taken separately, either ClientForm, HTMLParser or SGMLParser work
                            > well.
                            >
                            > But it would be cool that competent people in the HTML parsing domain join
                            > up, and define a base parser interface, the same way smart guys did with
                            > WSGI for webservers.

                            Perhaps. Given a mythical fixed quantity of volunteer coding effort I
                            could assign to any HTML parsing project, I'd really prefer that
                            somebody separated out the HTML parsing, tree building and DOM code
                            from Mozilla and/or Konqueror.

                            > So libs like ClientForm would not raise say an AttributeError if some
                            > custom parser class does not implement a given attribute.
                            >
                            > Adding an otherwise unused attribute to a parser just in case one day it
                            > will interop with ClientForm sounds silly. And what if ClientForm changes
                            > its attributes, etc.
                            [...]

                            I'm sorry, I didn't really follow that at all.

                            What I hoped to get from implementing the ClientForm interface on top
                            of something like BeautifulSoup was actually two things:

                            1. Better parsing

                            2. Access to a nice, and comprehensive, object model that lets you do
                            things with non-form elements, and the ability to move back and
                            forth between ClientForm and BeautifulSoup objects. I already did
                            this for the HTML DOM with DOMForm (unsupported), but for various
                            reasons the implementation is horrid, and since I no longer intend
                            to put in the effort to support JavaScript, I'd prefer a nicer tree
                            API than the DOM.


                            John


                            • John J. Lee

                              #15
                              Re: scripting browsers from Python

                              [Michele Simionato]
                              > > I would like to know what is available for scripting browsers from
                              > > Python.
                              [...]
                              > > to do POST requests too. I don't want to use urllib to emulate a
                              > > browser, I am interested in checking that browser X really works
                              > > as intended with my application. Any suggestion?
                              [...]
                              [Stephen Thorne]
                              > I use pbp, http://pbp.berlios.de/
                              [...]

                              Again, that doesn't do what Michele wants.


                              John
