HTTP-POST simultaneous requests

This topic is closed.
  • mark

    HTTP-POST simultaneous requests

    Hello,

    I want to create a PHP scraper that will get some information from
    e.g. 5 sites simultaneously. I tried the following script:
    http://www.phpied.com/simultaneuos-h...php-with-curl/

    Everything works fine, but what I want is a simultaneous scraper
    (something multithreaded, where these 5 websites are loaded not one
    after another, but over different sockets).

    In addition, I would like to display each result as soon as it has
    been scraped. So when the first HTTP POST gets an answer, it should
    show that result and wait for the rest of the pages (not display
    everything when all scraping is done).
    Any ideas how I can achieve this? Thanks!

    regards, Mark
  • Jerry Stuckle

    #2
    Re: HTTP-POST simultaneous requests

    mark wrote:
    > Hello,
    >
    > I want to create a php scraper that will get some information from
    > e.g. 5 sites simultaneously. I tried the following script:
    > http://www.phpied.com/simultaneuos-h...php-with-curl/
    >
    > Everything works fine, but what I want is simultaneous (something to
    > multithread, when these 5 websites will be loaded not one after
    > another, but by using different sockets) scraper.
    >
    > In addition I would like to display the results as soon as it will be
    > scraped. So when first http-post get answer, it will show the result
    > and wait for the rest of the pages (not display everything when all
    > scraping is done).
    > Any ideas how can I achieve it? Thanks!
    >
    > regards, Mark

    Sorry, PHP doesn't do multithreading very well. Probably the best you
    can do is start multiple background processes to do the work then
    communicate via a database, shared memory, etc.

    As for displaying the contents immediately - again, not guaranteed
    possible. You can flush() the buffers in PHP - but that doesn't
    guarantee the data will be sent by the webserver to the client
    immediately, nor does it guarantee the client will display the data
    before the whole response has been received.
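
For illustration, here is a minimal sketch of the flush() approach mentioned above. The helper name emit_chunk() is hypothetical (it is not part of PHP), and the short delay only stands in for waiting on a slow request. As noted, the web server or the browser may still hold the output back, so this is a best-effort technique rather than a guarantee.

```php
<?php
// Sketch only: emit_chunk() is a hypothetical helper, not a PHP built-in.
function emit_chunk(string $html): void
{
    echo $html;
    // If PHP-level output buffering is active, push that buffer out first...
    if (ob_get_level() > 0) {
        ob_flush();
    }
    // ...then ask PHP to hand the pending output to the web server.
    flush();
}

foreach (['first', 'second', 'third'] as $label) {
    emit_chunk("<p>Result for the {$label} site</p>\n");
    usleep(250000); // stand-in for waiting on the next slow request
}
```

Whether the client actually sees the three paragraphs one at a time depends on server-side buffering and compression settings and on the browser itself.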

    Sounds like Java might be a better fit.

    --
    ==================
    Remove the "x" from my email address
    Jerry Stuckle
    JDS Computer Training Corp.
    jstucklex@attglobal.net
    ==================


    • Iván Sánchez Ortega

      #3
      Re: HTTP-POST simultaneous requests

      mark wrote:
      > In addition I would like to display the results as soon as it will be
      > scraped. So when first http-post get answer, it will show the result
      > and wait for the rest of the pages (not display everything when all
      > scraping is done).
      > Any ideas how can I achieve it? Thanks!

      Either:
      - Run in console and use fork().
      - Use raw HTTP and some socket_select() magic.
      - curl_multi_exec().
      - Rely on JavaScript and AJAX techniques, and make the web browser launch 5
      queries to your web server, each one scraping a site.
      - Use ignore_user_abort() and a mix of raw HTTP with sockets to blindly
      launch PHP threads. This one's quite tricky to pull off.

      There may be more ways to do this, but unless you know what a critical
      section is, please stay away from concurrent (AKA multithread) programming.

      Besides, you want IPC to get the results as they appear - to make your life
      easier, you should stick with either curl_multi queries or rely on
      javascript to individually fetch results as they are ready.
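
As a rough illustration of the curl_multi route suggested above, the sketch below POSTs to several URLs in parallel and hands each response to a callback as soon as that transfer finishes, instead of waiting for all of them. The names fetch_all() and $onDone are hypothetical, the 30-second timeout is an arbitrary choice, and error handling is kept to a minimum, so treat it as a starting point under those assumptions rather than a finished scraper.

```php
<?php
// Sketch only: fetch_all() and the $onDone callback are hypothetical names.

/**
 * POST to several URLs in parallel and report each response as it completes.
 *
 * @param array    $requests map of URL => array of POST fields
 * @param callable $onDone   called as $onDone($url, $body, $error);
 *                           $body is null on failure, $error is null on success
 * @return int number of transfers that finished (successfully or not)
 */
function fetch_all(array $requests, callable $onDone): int
{
    $mh = curl_multi_init();
    $handles = [];

    foreach ($requests as $url => $postFields) {
        $ch = curl_init($url);
        curl_setopt_array($ch, [
            CURLOPT_RETURNTRANSFER => true,  // keep the body instead of echoing it
            CURLOPT_POST           => true,
            CURLOPT_POSTFIELDS     => http_build_query($postFields),
            CURLOPT_TIMEOUT        => 30,    // arbitrary limit for this sketch
            CURLOPT_PRIVATE        => $url,  // tag the handle so it can be identified later
        ]);
        curl_multi_add_handle($mh, $ch);
        $handles[] = $ch; // hold our own reference until the function returns
    }

    $finished = 0;
    do {
        // Let libcurl make progress on all transfers.
        $status = curl_multi_exec($mh, $running);

        // Report every transfer that completed since the last pass.
        while (($info = curl_multi_info_read($mh)) !== false) {
            $ch  = $info['handle'];
            $url = curl_getinfo($ch, CURLINFO_PRIVATE);
            if ($info['result'] === CURLE_OK) {
                $onDone($url, curl_multi_getcontent($ch), null);
            } else {
                $onDone($url, null, curl_strerror($info['result']));
            }
            curl_multi_remove_handle($mh, $ch);
            $finished++;
        }

        // Wait for socket activity (up to 1 s) instead of busy-looping.
        if ($running > 0 && curl_multi_select($mh, 1.0) === -1) {
            usleep(1000);
        }
    } while ($running > 0 && $status === CURLM_OK);

    curl_multi_close($mh);
    return $finished;
}
```

A caller would pass something like `['http://example.com/search' => ['q' => 'php']]` (a placeholder URL) together with a callback that echoes the scraped fragment and then calls flush(), so each site's result is sent toward the browser as it arrives. The order in which the callback fires follows completion order, not the order of the input array.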


      --
      ----------------------------------
      Iván Sánchez Ortega -ivan-algarroba-sanchezortega-punto-es-

      Now listening to: Deep Forest - Music.Detected_ (2002) - [4] Computer
      Machine (5:12) (99.061996%)


      • Manuel Lemos

        #4
        Re: HTTP-POST simultaneous requests

        Hello,

        on 10/04/2008 05:09 PM mark said the following:
        > Hello,
        >
        > I want to create a php scraper that will get some information from
        > e.g. 5 sites simultaneously. I tried the following script:
        > http://www.phpied.com/simultaneuos-h...php-with-curl/
        >
        > Everything works fine, but what I want is simultaneous (something to
        > multithread, when these 5 websites will be loaded not one after
        > another, but by using different sockets) scraper.
        >
        > In addition I would like to display the results as soon as it will be
        > scraped. So when first http-post get answer, it will show the result
        > and wait for the rest of the pages (not display everything when all
        > scraping is done).
        > Any ideas how can I achieve it? Thanks!

        This class can do exactly what you describe:

        http://www.phpclasses.org/thread

        This other class also uses separate HTTP requests to run multiple
        parallel tasks but these are started from the browser side using AJAX
        requests:

        http://www.phpclasses.org/phpthreader


        --

        Regards,
        Manuel Lemos

        Find and post PHP jobs


        PHP Classes - Free ready to use OOP components written in PHP


        • Jerry Stuckle

          #5
          Re: HTTP-POST simultaneous requests

          Manuel Lemos wrote:
          > Hello,
          >
          > on 10/04/2008 05:09 PM mark said the following:
          >> Hello,
          >>
          >> I want to create a php scraper that will get some information from
          >> e.g. 5 sites simultaneously. I tried the following script:
          >> http://www.phpied.com/simultaneuos-h...php-with-curl/
          >> Everything works fine, but what I want is simultaneous (something to
          >> multithread, when these 5 websites will be loaded not one after
          >> another, but by using different sockets) scraper.
          >>
          >> In addition I would like to display the results as soon as it will be
          >> scraped. So when first http-post get answer, it will show the result
          >> and wait for the rest of the pages (not display everything when all
          >> scraping is done).
          >> Any ideas how can I achieve it? Thanks!
          >
          > This class can do exactly what you describe:
          >
          > http://www.phpclasses.org/thread
          >
          > This other class also uses separate HTTP requests to run multiple
          > parallel tasks but these are started from the browser side using AJAX
          > requests:
          >
          > http://www.phpclasses.org/phpthreader

          Why don't you tell him that's your own site you're spamming again, Manuel?

          And those are your own classes (which, BTW, aren't worth a damn) you're
          spamming?


          • C. (http://symcbean.blogspot.com/)

            #6
            Re: HTTP-POST simultaneous requests

            On 4 Oct, 21:09, mark <mkazmier...@gmail.com> wrote:
            > Hello,
            >
            > I want to create a php scraper that will get some information from
            > e.g. 5 sites simultaneously. I tried the following script:
            > http://www.phpied.com/simultaneuos-h...php-with-curl/
            > Everything works fine, but what I want is simultaneous (something to
            > multithread, when these 5 websites will be loaded not one after
            > another, but by using different sockets) scraper.

            That's exactly what curl_multi_* does.

            > In addition I would like to display the results as soon as it will be
            > scraped. So when first http-post get answer, it will show the result
            > and wait for the rest of the pages (not display everything when all
            > scraping is done).
            > Any ideas how can I achieve it? Thanks!

            This is not a trivial bit of coding. It's not impossible but since you
            seem to be relying on cut-and-paste coding, do you think you're
            overstretching your abilities?

            C.


            • R. Rajesh Jeba Anbiah

              #7
              Re: HTTP-POST simultaneous requests

              On Oct 5, 7:44 am, Jerry Stuckle <jstuck...@attglobal.net> wrote:
              > Manuel Lemos wrote:
              <snip>
              >> This class can do exactly what you describe:
              >> http://www.phpclasses.org/thread
              >> This other class also uses separate HTTP requests to run multiple
              >> parallel tasks but these are started from the browser side using AJAX
              >> requests:
              >> http://www.phpclasses.org/phpthreader
              >
              > Why don't you tell him that's your own site you're spamming again, Manuel?
              >
              > And those are your own classes (which, BTW, aren't worth a damn) you're
              > spamming?

              What's your solution? Do you have a better approach?

              --
              <?php echo 'Just another PHP saint'; ?>
              Email: rrjanbiah-at-Y!com Blog: http://rajeshanbiah.blogspot.com/


              • Jerry Stuckle

                #8
                Re: HTTP-POST simultaneous requests

                R. Rajesh Jeba Anbiah wrote:
                > On Oct 5, 7:44 am, Jerry Stuckle <jstuck...@attglobal.net> wrote:
                >> Manuel Lemos wrote:
                > <snip>
                >>> This class can do exactly what you describe:
                >>> http://www.phpclasses.org/thread
                >>> This other class also uses separate HTTP requests to run multiple
                >>> parallel tasks but these are started from the browser side using AJAX
                >>> requests:
                >>> http://www.phpclasses.org/phpthreader
                >> Why don't you tell him that's your own site you're spamming again, Manuel?
                >>
                >> And those are your own classes (which, BTW, aren't worth a damn) you're
                >> spamming?
                >
                > What's your solution? Do you have better approach?
                >
                > --
                > <?php echo 'Just another PHP saint'; ?>
                > Email: rrjanbiah-at-Y!com Blog: http://rajeshanbiah.blogspot.com/

                Yes, curl_multi_exec(), as Iván indicated.

                Manuel is just a spammer - virtually every answer he posts refers to
                something on his site. And he doesn't even indicate it's his own site
                when he spams it.

                Now I wouldn't mind if he were giving good technical advice. But I've
                looked at some of his scripts. I've seen relatively new PHP programmers
                do better.


                • Michael Fesser

                  #9
                  Re: HTTP-POST simultaneous requests

                  .oO(Jerry Stuckle)

                  > R. Rajesh Jeba Anbiah wrote:
                  >> On Oct 5, 7:44 am, Jerry Stuckle <jstuck...@attglobal.net> wrote:
                  >>> And those are your own classes (which, BTW, aren't worth a damn) you're
                  >>> spamming?
                  >>
                  >> What's your solution? Do you have better approach?
                  >
                  > Yes, curl_multi_exec(), as Iván indicated.
                  >
                  > Manuel is just a spammer

                  Wrong.

                  > virtually every answer he posts refers to
                  > something on his site.

                  Nothing wrong with that. I would also point to my own classes to solve a
                  given problem if they would be freely available.

                  > And he doesn't even indicate it's his own site
                  > when he spams it.

                  Not necessary.

                  It would be spam if it would be totally OT, but he posts ready-to-use
                  solutions to PHP problems. It doesn't matter if these solutions are his
                  own or not. Even if they would be commercial, it wouldn't be spam in the
                  given context.

                  > Now I wouldn't mind if he were giving good technical advice. But I've
                  > looked at some of his scripts.

                  Some. But surely not all. They might not fit your coding standards, but
                  this doesn't give you the right to discredit them on every chance you
                  get. If you have a problem with them, come to the point and post exactly
                  what you don't like. And _prove_ it by posting code samples.

                  > I've seen relatively new PHP programmers
                  > do better.

                  If you don't like his solutions, post better ones or simply ignore him.
                  It's always good to have a choice between various ways to solve a
                  problem. He's contributing to the community by posting alternatives.

                  You OTOH are just trolling by attacking him personally on each and every
                  post. This sucks.

                  Enough is enough! >:-(

                  Micha


                  • salmobytes

                    #10
                    Re: HTTP-POST simultaneous requests

                    Jerry Stuckle wrote:
                    Manuel Lemos wrote:
                    Jerry Stuckle has a personality problem.
                    He seems to live on comp.lang.php like a rat addicted to the cocaine
                    lever in a laboratory cage. He seems to do nothing else. Does his
                    employer know how much time he spends insulting people, complaining,
                    posturing? He seems to be a competent hacker. But also a lonely,
                    friendless, nasty dispositioned jerk.

                    Manuel Lemos is a mature, considerate and helpful guy by comparison.


                    • Jerry Stuckle

                      #11
                      Re: HTTP-POST simultaneous requests

                      salmobytes wrote:
                      > Jerry Stuckle wrote:
                      >> Manuel Lemos wrote:
                      >
                      > Jerry Stuckle has a personality problem.
                      > He seems to live on comp.lang.php like a rat addicted to the cocaine
                      > lever in a laboratory cage. He seems to do nothing else. Does his
                      > employer know how much time he spends insulting people, complaining,
                      > posturing? He seems to be a competent hacker. But also a lonely,
                      > friendless, nasty dispositioned jerk.
                      >
                      > Manuel Lemos is a mature, considerate and helpful guy by comparison.

                      ROFLMAO!

                      FYI, I am my own employer - an independent consultant. And I suspect I
                      make a lot more than most of the people in this newsgroup.

                      No, I don't "live" here. But I check in a few times during the day,
                      usually when I need to take a break from coding.

                      As for Manuel - "mature" people don't need to spam their websites at
                      every opportunity. When was the last time you saw him give advice which
                      wasn't on his website? Not very often.

                      OTOH, I never refer to my website for solutions. Many here don't even
                      know what it is (which is fine with me).


                      • Jerry Stuckle

                        #12
                        Re: HTTP-POST simultaneous requests

                        Michael Fesser wrote:
                        > .oO(Jerry Stuckle)
                        >
                        >> R. Rajesh Jeba Anbiah wrote:
                        >>> On Oct 5, 7:44 am, Jerry Stuckle <jstuck...@attglobal.net> wrote:
                        >>>> And those are your own classes (which, BTW, aren't worth a damn) you're
                        >>>> spamming?
                        >>> What's your solution? Do you have better approach?
                        >>
                        >> Yes, curl_multi_exec(), as Iván indicated.
                        >>
                        >> Manuel is just a spammer
                        >
                        > Wrong.
                        >
                        >> virtually every answer he posts refers to
                        >> something on his site.
                        >
                        > Nothing wrong with that. I would also point to my own classes to solve a
                        > given problem if they would be freely available.
                        >
                        >> And he doesn't even indicate it's his own site
                        >> when he spams it.
                        >
                        > Not necessary.
                        >
                        > It would be spam if it would be totally OT, but he posts ready-to-use
                        > solutions to PHP problems. It doesn't matter if these solutions are his
                        > own or not. Even if they would be commercial, it wouldn't be spam in the
                        > given context.
                        >
                        >> Now I wouldn't mind if he were giving good technical advice. But I've
                        >> looked at some of his scripts.
                        >
                        > Some. But surely not all. They might not fit your coding standards, but
                        > this doesn't give you the right to discredit them on every chance you
                        > get. If you have a problem with them, come to the point and post exactly
                        > what you don't like. And _prove_ it by posting code samples.
                        >
                        >> I've seen relatively new PHP programmers
                        >> do better.
                        >
                        > If you don't like his solutions, post better ones or simply ignore him.
                        > It's always good to have a choice between various ways to solve a
                        > problem. He's contributing to the community by posting alternatives.
                        >
                        > You OTOH are just trolling by attacking him personally on each and every
                        > post. This sucks.
                        >
                        > Enough is enough! >:-(
                        >
                        > Micha

                        Sorry, Micha, as much as I respect you, I have to disagree. How many
                        posts has Manuel made which had solutions - other than saying "see this
                        website" - and not telling people it is his?

                        I don't spam my website - because its content is not germane to this
                        newsgroup. I do sometimes refer people to other websites. But at NO
                        time have I ever referred anyone to a site where I have a pecuniary
                        interest. And if I did, I'd at least tell them it was my site.

                        And no, I haven't looked at every one of his scripts. But I know bad
                        coding when I see it. And there is no reason to inflict such garbage on
                        new PHP programmers who are trying to learn how to do things the right
                        way. It's at least worth warning them that the coding is lousy.



                        • Michael Fesser

                          #13
                          Re: HTTP-POST simultaneous requests

                          .oO(salmobytes)

                          > If you Google "Jerry Stuckle" you get quite an impressive list of link
                          > titles. Here are just a few samples:
                          > [...]
                          >
                          > ...this list goes on for page after page. It's almost endless.
                          > What is it about you Jerry?

                          What does this have to do with PHP?

                          Micha


                          • salmobytes

                            #14
                            Re: HTTP-POST simultaneous requests

                            Jerry Stuckle wrote:
                            > I'm not afraid to call a troll a troll - or a spammer a spammer.
                            > I'm afraid the trolls and spammers don't like that.

                            If you go back and study the posts to this group you see there
                            are basically two groups: newbie help seekers and a small core of
                            top-notch guys Jerry accepts because of their expertise, or perhaps
                            because he's afraid to attack them.

                            Every newcomer who isn't a supplicating, hat-in-hand beginner
                            immediately gets attacked by Jerry and then disappears, all too
                            often never to be heard from again. You do this group a great
                            disservice. You're drying it up, shrinking it down into your own
                            personal, wrinkled, fascist soap box forum.


                            • Jerry Stuckle

                              #15
                              Re: HTTP-POST simultaneous requests

                              salmobytes wrote:
                              > Jerry Stuckle wrote:
                              >> I'm not afraid to call a troll a troll - or a spammer a spammer.
                              >> I'm afraid the trolls and spammers don't like that.
                              >
                              > If you go back and study the posts to this group you see there
                              > are basically two groups: newbie help seekers and a small core of
                              > top-notch guys Jerry accepts because of their expertise, or perhaps
                              > because he's afraid to attack them.
                              >
                              > Every newcomer who isn't a supplicating, hat-in-hand beginner
                              > immediately gets attacked by Jerry and then disappears, all too
                              > often never to be heard from again. You do this group a great
                              > disservice. You're drying it up, shrinking it down into your own
                              > personal, wrinkled, fascist soap box forum.

                              Wrong. I answer a lot of newbie questions. It's the spammers and
                              trolls I can't stand.

                              And Manuel is not a "newbie" in this newsgroup. He's spammed it many
                              times before.

