html + javascript automations = [mechanize + ?? ] or something else?

  • John

    html + javascript automations = [mechanize + ?? ] or something else?


    I have to write a spider for a webpage that uses HTML + JavaScript. I
    had it written using mechanize, but the authors of the webpage now use
    a lot of JavaScript, and mechanize can no longer do the job.
    Does anyone know how I could get my spider to understand JavaScript?
    Is there a way to control a browser like Firefox from Python itself?
    How about IE? That way, we would not have to go through something
    like mechanize.

    Thanks in advance for your help/comments,
    --j

  • John

    #2
    Re: html + javascript automations = [mechanize + ?? ] or something else?


    I am curious about the webbrowser module. I can open Firefox
    using webbrowser.open(), but can one control it? Say, enter a
    login/password on a webpage? Send keystrokes to Firefox?
    Mouse clicks?

    Thanks,
    --j

    John wrote:
    > I have to write a spider for a webpage that uses html + javascript. I
    > had it written using mechanize
    > but the authors of the webpage now use a lot of javascript. Mechanize
    > can no longer do the job.
    > Does anyone know how I could automate my spider to understand
    > javascript? Is there a way
    > to control a browser like firefox from python itself? How about IE?
    > That way, we do not have
    > to go thru something like mechanize?
    >
    > Thanks in advance for your help/comments,
    > --j


    • Benjamin Niemann

      #3
      Re: html + javascript automations = [mechanize + ?? ] or something else?

      Hello,

      John wrote:
      > John wrote:
      >> I have to write a spider for a webpage that uses html + javascript. I
      >> had it written using mechanize
      >> but the authors of the webpage now use a lot of javascript. Mechanize
      >> can no longer do the job.
      >> Does anyone know how I could automate my spider to understand
      >> javascript? Is there a way
      >> to control a browser like firefox from python itself? How about IE?
      >> That way, we do not have
      >> to go thru something like mechanize?
      >
      > I am curious about the webbrowser module. I can open up firefox
      > using webbrowser.open(), but can one control it? Say enter a
      > login / passwd on a webpage? Send keystrokes to firefox?
      > mouse clicks?

      Not with the webbrowser module; it can only launch a browser.
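
To illustrate the limitation, here is a minimal sketch of everything the standard webbrowser module offers. `webbrowser.open()` returns only a success flag, so nothing comes back that could be used to type, click or read the page. The `launch` helper and its `opener` parameter are hypothetical names, added only so the call can be stubbed out.

```python
import webbrowser


def launch(url, opener=webbrowser.open):
    """Ask the default browser to show `url`.

    Only a success flag comes back: there is no handle for typing,
    clicking or reading the page afterwards.
    `opener` is a hypothetical hook so the real call can be replaced.
    """
    return bool(opener(url))


# Usage (opens a real browser window, so it is left commented out):
# launch("http://plone.org")
```

Anything beyond launching needs a different tool, such as the COM route discussed in this thread.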

      On the mechanize website you will also find DOMForm
      <http://wwwsearch.sourceforge.net/DOMForm/>, a web scraper with
      basic JS support (using the SpiderMonkey engine from the Mozilla project).
      But note that DOMForm is at an early stage and no longer developed
      (according to the site; I have never used it myself).

      You could try to script IE (perhaps also Firefox, I don't know) using COM.
      This can be done with the pywin32 module
      <https://sourceforge.net/projects/pywin32/>. How this is done in detail
      is a Windows issue; you may find help and documentation in a
      Windows-specific group or mailing list, on MSDN, and so on. You can
      usually translate COM calls from VB, C#, etc. quite directly to Python.


      HTH

      --
      Benjamin Niemann
      Email: pink at odahoda dot de
      WWW: http://pink.odahoda.de/


      • Andrey Khavryuchenko

        #4
        Re: html + javascript automations = [mechanize + ?? ] or something else?

        John,

        "J" == John wrote:

        J> I have to write a spider for a webpage that uses html + javascript. I
        J> had it written using mechanize but the authors of the webpage now use a
        J> lot of javascript. Mechanize can no longer do the job. Does anyone
        J> know how I could automate my spider to understand javascript? Is there
        J> a way to control a browser like firefox from python itself? How about
        J> IE? That way, we do not have to go thru something like mechanize?

        To my knowledge, there is no way to test JavaScript other than to fire up a
        browser.

        So, you might check out Selenium (http://www.openqa.org/selenium/) and its
        Python module.

        --
        Andrey V Khavryuchenko
        Software Development Company http://www.kds.com.ua/


        • Diez B. Roggisch

          #5
          Re: html + javascript automations = [mechanize + ?? ] or something else?

          > To my knowledge, there is no way to test JavaScript other than to fire up a
          > browser.
          >
          > So, you might check out Selenium (http://www.openqa.org/selenium/) and its
          > Python module.

          No use in that: to be remote-controlled by Python, Selenium must be run
          on the server site itself, due to JS security model restrictions.

          Diez


          • Duncan Booth

            #6
            Re: html + javascript automations = [mechanize + ?? ] or something else?

            "John" <weekender_ny@yahoo.com> wrote:
            > Is there a way
            > to control a browser like firefox from python itself? How about IE?

            IE is easy enough to control and you have full access to the DOM:

            >>> import win32com.client
            >>> win32com.client.gencache.EnsureModule('{EAB22AC0-30C1-11CF-A7EB-0000C05BAE0B}', 0, 1, 1)
            <module 'win32com.gen_py.EAB22AC0-30C1-11CF-A7EB-0000C05BAE0Bx0x1x1' from
            'C:\Python25\lib\site-packages\win32com\gen_py\EAB22AC0-30C1-11CF-A7EB-0000C05BAE0Bx0x1x1.py'>
            >>> IE = win32com.client.DispatchEx('InternetExplorer.Application.1')
            >>> dir(IE)
            ['CLSID', 'ClientToWindow', 'ExecWB', 'GetProperty', 'GoBack', 'GoForward',
            'GoHome', 'GoSearch', 'Navigate', 'Navigate2', 'PutProperty',
            'QueryStatusWB', 'Quit', 'Refresh', 'Refresh2', 'ShowBrowserBar', 'Stop',
            '_ApplyTypes_', '__call__', '__cmp__', '__doc__', '__getattr__',
            '__init__', '__int__', '__module__', '__repr__', '__setattr__', '__str__',
            '__unicode__', '_get_good_object_', '_get_good_single_object_', '_oleobj_',
            '_prop_map_get_', '_prop_map_put_', 'coclass_clsid']
            >>> IE.Visible = True
            >>> IE.Navigate("http://plone.org")
            >>> while IE.Busy: pass
            ...
            >>> print IE.Document.getElementById("portlet-news").innerHTML
            <DT class=portletHeader><A class="feedButton link-plain"
            href="feed://plone.org/news/newslisting/RSS"><IMG title="RSS subscription
            feed for news items" alt=RSS src="http://plone.org/rss.gif"></A><A
            href="http://plone.org/news">News</A></DT>

            .... and so on ...


            See
            ...ernetexplorer.asp
            for the documentation.
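
The interactive session above could be wrapped up roughly as follows. This is only a sketch under stated assumptions: it needs Windows with pywin32 installed, the helper names `wait_until_ready` and `fetch_inner_html` are made up for illustration, and the tight `while IE.Busy: pass` loop is swapped for a polling loop with a timeout so a stuck page does not hang the script. Only COM members that appear in the session (`Navigate`, `Busy`, `Quit`, `Document.getElementById`) are used.

```python
import time


def wait_until_ready(browser, timeout=30.0, poll=0.1, sleep=time.sleep):
    """Poll `browser.Busy` until it is False or `timeout` seconds pass.

    Returns True when the browser reports it is idle, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while browser.Busy:
        if time.monotonic() >= deadline:
            return False
        sleep(poll)
    return True


def fetch_inner_html(url, element_id, timeout=30.0):
    """Open `url` in IE via COM and return the innerHTML of one element.

    Assumes Windows with pywin32; the import is local so that this
    module still loads on other platforms.
    """
    import win32com.client  # provided by pywin32

    ie = win32com.client.DispatchEx("InternetExplorer.Application")
    try:
        ie.Navigate(url)
        if not wait_until_ready(ie, timeout):
            raise TimeoutError("page did not finish loading: " + url)
        return ie.Document.getElementById(element_id).innerHTML
    finally:
        ie.Quit()  # close the browser even if something failed
```

On a suitable machine, `fetch_inner_html("http://plone.org", "portlet-news")` would correspond to the last step of the session.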


            • Andrey Khavryuchenko

              #7
              Re: html + javascript automations = [mechanize + ?? ] or something else?

              Diez,

              "DBR" == Diez B Roggisch wrote:

              >> To my knowledge, there is no way to test JavaScript other than to fire up a
              >> browser.
              >>
              >> So, you might check out Selenium (http://www.openqa.org/selenium/) and its
              >> Python module.

              DBR> No use in that: to be remote-controlled by Python, Selenium must be run
              DBR> on the server site itself, due to JS security model restrictions.

              Sorry, I missed the word 'spider' in the original post.

              --
              Andrey V Khavryuchenko
              Software Development Company http://www.kds.com.ua/


              • ina

                #8
                Re: html + javascript automations = [mechanize + ?? ] or something else?


                John wrote:
                > I have to write a spider for a webpage that uses html + javascript. I
                > had it written using mechanize
                > but the authors of the webpage now use a lot of javascript. Mechanize
                > can no longer do the job.
                > Does anyone know how I could automate my spider to understand
                > javascript? Is there a way
                > to control a browser like firefox from python itself? How about IE?
                > That way, we do not have
                > to go thru something like mechanize?
                >
                > Thanks in advance for your help/comments,
                > --j

                You want pamie, iec or ishybrowser. Pamie is probably the best choice
                since it gets patches and updates on a regular basis.

                http://pamie.sourceforge.net/


                • John

                  #9
                  Re: html + javascript automations = [mechanize + ?? ] or something else?


                  I tried to install PAMIE (but I have mostly used Python on Cygwin on
                  Windows).
                  The section "What will you need to run PAMIE" says I will need
                  "Mark Hammonds Win32 All", which I cannot find. Can anyone tell me how
                  to install PAMIE? Do I need a Python for Windows that is different
                  from Cygwin's Python?

                  Thanks,
                  --j

                  ina wrote:
                  > John wrote:
                  >> I have to write a spider for a webpage that uses html + javascript. I
                  >> had it written using mechanize
                  >> but the authors of the webpage now use a lot of javascript. Mechanize
                  >> can no longer do the job.
                  >> Does anyone know how I could automate my spider to understand
                  >> javascript? Is there a way
                  >> to control a browser like firefox from python itself? How about IE?
                  >> That way, we do not have
                  >> to go thru something like mechanize?
                  >>
                  >> Thanks in advance for your help/comments,
                  >> --j
                  >
                  > You want pamie, iec or ishybrowser. Pamie is probably the best choice
                  > since it gets patches and updates on a regular basis.
                  >
                  > http://pamie.sourceforge.net/


                  • John

                    #10
                    Re: html + javascript automations = [mechanize + ?? ] or something else?



                    My python2.5 installation on windows did not come with "win32com".
                    How do I install/get this module for windows?

                    Thanks,
                    --j

                    Duncan Booth wrote:
                    > IE is easy enough to control and you have full access to the DOM:
                    > [...]


                    • Gabriel Genellina

                      #11
                      Re: html + javascript automations = [mechanize + ?? ] or something else?

                      "John" <weekender_ny@yahoo.com> wrote in message
                      news:1169441279.556814.16770@38g2000cwa.googlegroups.com...
                      > My python2.5 installation on windows did not come with "win32com".
                      > How do I install/get this module for windows?

                      Look for the pywin32 package at sourceforge.net
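
A quick way to confirm whether win32com is available before going further is sketched below. `has_module` is a hypothetical helper name. On current Python versions pywin32 is also published on PyPI, so `pip install pywin32` is one way to get it, which postdates this thread.

```python
import importlib.util


def has_module(name):
    """Return True if a top-level module called `name` can be found."""
    return importlib.util.find_spec(name) is not None


if not has_module("win32com"):
    # win32com ships as part of the pywin32 package.
    print("win32com not found; install pywin32 (for example: pip install pywin32)")
```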

                      --
                      Gabriel Genellina



                      • John

                        #12
                        Re: html + javascript automations = [mechanize + ?? ] or something else?


                        I tried it; it didn't work with the Python 2.5 MSI distribution from
                        python.org, but ActiveState Python worked. Now I can open IE using COM.
                        What I am trying to figure out is how to click an x,y coordinate on a
                        page in IE automatically using COM. How about typing something
                        automatically? Any ideas?

                        Thanks,
                        --j

                        Gabriel Genellina wrote:
                        > "John" <weekender_ny@yahoo.com> wrote in message
                        > news:1169441279.556814.16770@38g2000cwa.googlegroups.com...
                        >
                        >> My python2.5 installation on windows did not come with "win32com".
                        >> How do I install/get this module for windows?
                        >
                        > Look for the pywin32 package at sourceforge.net
                        >
                        > --
                        > Gabriel Genellina


                        • Duncan Booth

                          #13
                          Re: html + javascript automations = [mechanize + ?? ] or something else?

                          "John" <weekender_ny@yahoo.com> wrote:
                          > I tried it, didn't work with the python25 distribution msi file that is
                          > on python.org
                          > But activestate python worked. Now I can open IE using COM. What I am
                          > trying
                          > to figure out is how to click an x,y coordinate on a page in IE
                          > automatically
                          > using COM. How about typing something automatically...Any ideas?

                          Don't think about clicking a coordinate or typing something; think about
                          the actions on the page. For example, to fill in a field on a form you'll
                          want something like:

                          ie.document.forms[formname][fieldname].value = 'whatever'

                          To click a button, call its click method, e.g.

                          submit = ie.document.forms[0]['submit']
                          submit.focus()
                          submit.click()

                          Check out the documentation at msdn.microsoft.com for the application,
                          document, form etc. objects. Generally speaking, anything you could have
                          done through javascript you should be able to do through automation, plus a
                          few other things that javascript might have blocked for security
                          reasons.
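
The two snippets above can be combined into one small helper, sketched here under assumptions: `fill_and_submit` is a hypothetical name, and the code assumes that `document.forms` and each form can be subscripted by name or index exactly as in the post. Whether that subscripting works on a live COM object depends on how pywin32 wraps the collection, so treat it as unverified.

```python
def fill_and_submit(document, form_key, values, submit_key="submit"):
    """Fill fields of one form by name, then click its submit control.

    `document` is anything shaped like the IE DOM document:
    `document.forms[form_key]` yields a form, and `form[field_name]`
    yields a control with a `value` attribute. With a live COM
    browser object this would be `ie.Document` (assumption).
    """
    form = document.forms[form_key]
    for field_name, text in values.items():
        form[field_name].value = text  # same effect as typing into the field
    button = form[submit_key]
    button.focus()
    button.click()  # same effect as clicking the button
    return form
```

A call might look like `fill_and_submit(ie.Document, 0, {"user": "j", "passwd": "secret"})`, where the form index and field names are made up for illustration.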
