internet searching program

  • KillSwitch

    internet searching program

    Is it possible to make a program to search a site on the internet,
    then get certain information from the web pages that match and display
    them? Like, you would put in keywords to be searched for on
    youtube.com, then it would search youtube.com, get the names of the
    videos, the links, and the embed information? Or something like that.
  • Steven D'Aprano

    #2
    Re: internet searching program

    On Fri, 08 Aug 2008 19:59:02 -0700, KillSwitch wrote:
    > Is it possible to make a program to search a site on the internet, then
    > get certain information from the web pages that match and display them?
    > Like, you would put in keywords to be searched for on youtube.com, then
    > it would search youtube.com, get the names of the videos, the links, and
    > the embed information? Or something like that.
    Search the Internet? Hmmm... I'm not sure, but I think Google does
    something quite like that, but I don't know if they do it with a computer
    program or an army of trained monkeys.



    --
    Steven


    • KillSwitch

      #3
      Re: internet searching program

      No, I mean to search the internet really fast and display only REALLY
      SPECIFIC information about certain web pages. Like, stuff Google
      wouldn't have. For instance, in the YouTube example, it would list the
      names of the videos, their URLs, and the EMBED information without you
      having to view the site. And BTW, I am extremely new to Python. I am
      going to buy a few books soon, and I can really start my learning then.
      I'll look up some tutorials on the 'net in the meantime.


      • Michael Tobis

        #4
        Re: internet searching program

        I think you are talking about "screen scraping".

        Your program can get the HTML for the page and search it for an
        appropriate pattern.

        Look at the source for a YouTube view page and you will see a string

        var embedUrl = 'http://....

        You can write code to search for that in the HTML text.
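A minimal sketch of that search in Python, assuming the page source contains a line shaped like the `var embedUrl = '...'` string above. The regular expression, the hypothetical helper name `find_embed_url`, and the sample HTML are made up for illustration rather than taken from real YouTube markup, and they will stop matching as soon as the site changes its page layout.

```python
import re

# Matches  var embedUrl = '...'  and captures the text between the quotes.
# This pattern is an assumption based on the string quoted in the post.
EMBED_RE = re.compile(r"var\s+embedUrl\s*=\s*'([^']*)'")


def find_embed_url(html):
    """Hypothetical helper: return the first embedUrl value in the HTML, or None."""
    match = EMBED_RE.search(html)
    return match.group(1) if match else None


# Made-up page fragment, not real YouTube markup.
sample = "<script>var embedUrl = 'http://example.com/embed/abc123';</script>"
print(find_embed_url(sample))  # http://example.com/embed/abc123
```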

        But embedUrl is not a standard JavaScript item; it is part of the
        YouTube design. You will be relying on that, and if YouTube changes
        how they provide such a string, you will be out of luck; at best you
        will have to rewrite part of your code.

        In other words, screen scraping relies on the site not doing a major
        reworking and on you doing a little reverse engineering of the page
        source.

        So how do you get the HTML into your program? In Python the answer
        to that is urllib.
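A minimal sketch of the fetching half using Python 3's `urllib.request` (in the Python 2 of 2008 the equivalents were `urllib.urlopen` and `urllib2.urlopen`). The helper name `fetch_html` and the placeholder User-Agent string are illustrative assumptions, not part of any site's requirements.

```python
import urllib.request


def fetch_html(url, timeout=10):
    """Hypothetical helper: download a page and return its body as text."""
    # Some sites reject requests without a browser-like User-Agent;
    # the value used here is an arbitrary placeholder.
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        # Use the charset the server declares, falling back to UTF-8.
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

For example, `fetch_html("http://www.youtube.com/")` would return the page source as a string, which a pattern search like the one for `embedUrl` can then scan.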



        mt


        • alex23

          #5
          Re: internet searching program

          On Aug 9, 12:59 pm, KillSwitch <gu.yakahug...@gmail.com> wrote:
          > Is it possible to make a program to search a site on the internet,
          > then get certain information from the web pages that match and display
          > them? Like, you would put in keywords to be searched for on
          > youtube.com, then it would search youtube.com, get the names of the
          > videos, the links, and the embed information? Or something like that.
          You might find the mechanize module handy:


          And possibly BeautifulSoup:
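BeautifulSoup and mechanize are third-party packages. As a rough stand-in that needs only the standard library, the sketch below swaps in `html.parser.HTMLParser` to pull out each link's `href` and visible text, which covers the "names of the videos and the links" part of the question. The class `LinkCollector` and function `extract_links` are hypothetical names, and this is not how BeautifulSoup works internally.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical parser: collect (href, link text) pairs from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []    # finished (href, text) pairs
        self._href = None  # href of the <a> we are currently inside
        self._text = []    # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html):
    """Hypothetical helper: return every (href, text) pair in the HTML."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links
```

Anchors without an `href` attribute are skipped, since their `href` comes back as `None`.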


          • greg

            #6
            Re: internet searching program

            Michael Tobis wrote:
            > I think you are talking about "screen scraping".
            >
            > Your program can get the html for the page, and search for an
            > appropriate pattern.
            However, it wouldn't be "really fast", because you
            still have to fetch all the pages that might contain
            data you're looking for.

            Google searches are fast because they've already
            fetched all the web pages in the world and indexed
            them.

            You might get somewhere using a program that does
            a site-specific Google search to find potentially
            relevant pages, then goes and looks at those pages
            for further information.
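A site-specific search like the one greg describes uses Google's `site:` operator. A small sketch of building such a query URL, assuming the plain `http://www.google.com/search?q=...` form that appears later in this thread; `site_search_url` is a hypothetical name, and the code only builds the string. Whether a script may actually fetch those results is the question the later posts argue about.

```python
from urllib.parse import urlencode


def site_search_url(site, keywords):
    """Hypothetical helper: build a search URL restricted to one site."""
    query = "site:%s %s" % (site, keywords)
    # urlencode escapes ':' as %3A and turns spaces into '+'.
    return "http://www.google.com/search?" + urlencode({"q": query})


print(site_search_url("youtube.com", "python tutorial"))
# http://www.google.com/search?q=site%3Ayoutube.com+python+tutorial
```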

            Another possibility might be to crawl the site and
            build your own index based on the information you're
            interested in.
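Building your own index could start as a minimal inverted index: each word maps to the set of pages that contain it, so answering a query no longer requires re-fetching anything. This sketch assumes a crawler has already downloaded the pages into a `{url: text}` dict; `build_index` and `search` are hypothetical names, and a real index would also deal with ranking, stemming, and stop words.

```python
import re
from collections import defaultdict


def build_index(pages):
    """Hypothetical helper: map each lowercase word to the URLs containing it.

    `pages` is a dict of {url: page_text} fetched earlier by a crawler.
    """
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(url)
    return index


def search(index, query):
    """Hypothetical helper: return the URLs that contain every query word."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return set()
    # .get() avoids inserting empty entries into the defaultdict.
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results
```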

            --
            Greg


            • KillSwitch

              #7
              Re: internet searching program

              Thanks a lot for everyone's help. I am really looking forward to
              learning the ins and outs of Python, my first programming language.


              • BAnderton

                #8
                Re: internet searching program

                I was doing something very similar on my Windows XP machine a year ago
                (with Python 2.4) and used Mayukh Bose's Internet Explorer controller
                (see http://www.mayukhbose.com/python/IEC/index.php for details and
                download). It worked very nicely for my needs and was rather
                intuitive (generally much easier and requiring fewer brain cells than
                using urllib). Here are some clips from the project:

                from IEC import IEController  # assumes the downloaded module is IEC.py

                # this window will be for our initial data pull-in
                ie = IEController()
                ie.Navigate('http://<whateverYourSiteNameIs>')
                ie.ClickButton(caption='Advanced')
                ...
                ie.SetInputValue('search_string', strUserID)
                ie.ClickButton(name='image')
                ...
                strAllText = ie.GetDocumentText()  # gets all HTML source code from the current page
                ...
                ie.CloseWindow()

                I wish someone could make a similar one for Firefox.


                • Support Desk

                  #9
                  RE: internet searching program

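                  Google doesn't allow use of their APIs anymore; I believe Yahoo has one,
                  or you could do something like this:

                  import os
                  import shlex
                  import urllib.parse

                  searchstring = 'stuff here'
                  # URL-encode the query, then shell-quote the URL so the space can't split it
                  url = 'http://www.google.com/search?q=%s' % urllib.parse.quote_plus(searchstring)
                  x = os.popen('lynx -dump %s' % shlex.quote(url)).readlines()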


                  -----Original Message-----
                  From: Steven D'Aprano [mailto:steve@REMOVE-THIS-cybersource.com.au]
                  Sent: Friday, August 08, 2008 11:22 PM
                  To: python-list@python.org
                  Subject: Re: internet searching program


                  • alex23

                    #10
                    Re: internet searching program

                    On Aug 12, 12:03 am, "Support Desk" <m...@ipglobal.net> wrote:
                    > Google doesn't allow use of their APIs anymore, I believe Yahoo has one
                    Are you sure?

                    "Google Custom Search enables you to search over a website or a
                    collection of websites. You can harness the power of Google to create
                    a search engine tailored to your needs and interests, and you can
                    present the results in your website. Your custom search engine can
                    prioritize or restrict search results based on websites you specify."

                    http://code.google.com/apis/customsearch/

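For scripted access, Google later introduced the Custom Search JSON API (it postdates this thread, and its availability and quotas have changed over time, so check the current documentation). A sketch, assuming you have created a search engine ID (`cx`) and an API key; `API_KEY` and `ENGINE_ID` are placeholders, `custom_search_url` and `parse_results` are hypothetical helper names, and the parsing relies on the response's `items` list, whose entries carry `title` and `link` fields.

```python
import json
from urllib.parse import urlencode

API_KEY = "YOUR_API_KEY"      # placeholder, not a real key
ENGINE_ID = "YOUR_ENGINE_ID"  # placeholder "cx" search engine ID


def custom_search_url(query, api_key=API_KEY, engine_id=ENGINE_ID):
    """Hypothetical helper: build a Custom Search JSON API request URL."""
    params = {"key": api_key, "cx": engine_id, "q": query}
    return "https://www.googleapis.com/customsearch/v1?" + urlencode(params)


def parse_results(body):
    """Hypothetical helper: turn a JSON response body into (title, link) pairs."""
    data = json.loads(body)
    # Use .get() so a response without an "items" list yields no results.
    return [(item["title"], item["link"]) for item in data.get("items", [])]
```

In use, the body would come from something like `urllib.request.urlopen(custom_search_url("python tutorial")).read()`, which is then passed to `parse_results`.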


                    • maoxw@tradeinfo.cn

                      #11
                      Re: internet searching program

                      On Aug 12, 1:44 pm, alex23 <wuwe...@gmail.com> wrote:
                      > On Aug 12, 12:03 am, "Support Desk" <m...@ipglobal.net> wrote:
                      >
                      >> Google doesn't allow use of their APIs anymore, I believe Yahoo has one
                      >
                      > Are you sure?
                      >
                      > "Google Custom Search enables you to search over a website or a
                      > collection of websites. You can harness the power of Google to create
                      > a search engine tailored to your needs and interests, and you can
                      > present the results in your website. Your custom search engine can
                      > prioritize or restrict search results based on websites you specify."
                      >
                      > http://code.google.com/apis/customsearch/



                      • Support Desk

                        #12
                        RE: internet searching program


                        Yes, I believe Custom Search lets you embed a Google search in your website and customize it, but Google no longer allows a script to access search results unless you go about it in a roundabout way.




                        -----Original Message-----
                        From: maoxw@tradeinfo.cn [mailto:maoxw@tradeinfo.cn]
                        Sent: Tuesday, August 12, 2008 3:09 AM
                        To: python-list@python.org
                        Subject: Re: internet searching program
