Parsing pricerunner.com results via regular expression.

  • ojsimon
    New Member
    • May 2007
    • 59

    Parsing pricerunner.com results via regular expression.

    Hi, I have been trying to write a regular expression in PHP that will get the price from any product page at pricerunner.com. If you could suggest a regular expression I would be very grateful.
    Thanks
  • pbmods
    Recognized Expert Expert
    • Apr 2007
    • 5821

    #2
    Changed forum title to better match contents.

    Heya, ojsimon. Welcome to TSDN!

    First thing to do is to write the script to connect to PriceRunner and grab the results page. Then all you need to do is examine the results, locate the element that contains the price and program a [regex] search for it.

    Once you get to that point, let us know if you have any further problems.


    • ojsimon
      New Member
      • May 2007
      • 59

      #3
      I have been trying to write a regular expression to do this. I have done the other things; the regular expression is where my problem lies.


      • pbmods
        Recognized Expert Expert
        • Apr 2007
        • 5821

        #4
        Originally posted by ojsimon
        I have been trying to write a regular expression to do this i have done the other things this is where my problem lies
        Post a snippet of the pricerunner.com stream that your script needs to parse so we can see how the regular expression needs to be structured.


        • ojsimon
          New Member
          • May 2007
          • 59

          #5
          Compare prices : Nokia N93
          Talk time: 5, standby time: 240, Camera: Yes, Integrated, 180 gram, WAP, GPRS, MP3 More product info
          Price range:
          £405.38 - £409.99

          From here I need the price range.
          Thanks


          • pbmods
            Recognized Expert Expert
            • Apr 2007
            • 5821

            #6
            Originally posted by ojsimon
            Compare prices : Nokia N93
            Talk time: 5, standby time: 240, Camera: Yes, Integrated, 180 gram, WAP, GPRS, MP3 More product info
            Price range:
            £405.38 - £409.99
            Is pricerunner sending your script plain text like that, or are you receiving HTML or an RSS feed?


            • ojsimon
              New Member
              • May 2007
              • 59

              #7
              HTML, and I want it to work for all PriceRunner product pages.
              Thanks


              • pbmods
                Recognized Expert Expert
                • Apr 2007
                • 5821

                #8
                Originally posted by ojsimon
                html, and i want it to work for all pricerunner product pages
                Thanks
                All you have to do is find the HTML tags that contain the data you need, then create capturing groups to extract the values you need.

                So for example, if your data were located here:
                Originally posted by Pricerunner.com Feed
                [code=html]
                <div>Price Range:</div>£405.38 - £409.99
                [/code]
                [code=regexp]
                /(?<=<div>Price Range:<\/div>)£(\d+\.\d{2})\s-\s£(\d+\.\d{2})/
                [/code]

                Run that through preg_match, and your match array will be:
                Code:
                array
                (
                    [0] => £405.38 - £409.99
                    [1] => 405.38
                    [2] => 409.99
                )
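
                A minimal sketch of passing that pattern to preg_match, assuming the page really contains the markup from the example above. The $html string here is a stand-in, not actual PriceRunner output:

                ```php
                <?php
                // Stand-in for the fetched page; the real markup may differ.
                $html = '<div>Price Range:</div>£405.38 - £409.99';

                // The lookbehind anchors on the label; the two groups capture the prices.
                $pattern = '/(?<=<div>Price Range:<\/div>)£(\d+\.\d{2})\s-\s£(\d+\.\d{2})/';

                if (preg_match($pattern, $html, $matches) === 1) {
                    // $matches[1] and $matches[2] are strings; cast them if you need numbers.
                    $low  = (float) $matches[1];
                    $high = (float) $matches[2];
                    echo "Low: $low, high: $high\n";
                } else {
                    echo "No price range found\n";
                }
                ```

                preg_match returns 1 on a match, 0 on no match and false on error, so checking the return value before reading $matches avoids undefined-index notices.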


                • PriceRunnerUS
                  New Member
                  • May 2007
                  • 1

                  #9
                  Hi There

                  I'm from PriceRunner.

                  You can just access our API and get everything back in XML format. That would be much easier for you and we would not have the load on our server :)

                  Send me a mail and I will ensure that you get going.

                  Best
                  Martin Andersen
                  GM, PriceRunner.com


                  • ojsimon
                    New Member
                    • May 2007
                    • 59

                    #10
                    Sorry for such a late reply, but how do I use
                    /(?<=<div>Price Range:<\/div>)£(\d+\.\d{2})\s-\s£(\d+\.\d{2})/
                    in order to get the price? I don't understand how you put this in a preg_match and preg_replace. What I am doing at the moment is a simple PHP get-source command; is that OK?
                    Thanks


                    • pbmods
                      Recognized Expert Expert
                      • Apr 2007
                      • 5821

                      #11
                      Heya, ojsimon.

                      Originally posted by ojsimon
                      Sorry such a late reply but how do i use ... in order to get the price i don't understand how you put this in a preg match and replace. and what i am doing at the moment is a simple php get source command is that ok.
                      The regex uses a lookbehind to require (but not include in the match) the block that comes just before the data you want.
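
                      To illustrate what the lookbehind changes, here is a small sketch comparing the same pattern with and without it. The $html string is an assumed stand-in for the real page:

                      ```php
                      <?php
                      // Stand-in markup; the real page may be structured differently.
                      $html = '<div>Price Range:</div>£405.38 - £409.99';

                      // Without a lookbehind, the label is part of the overall match.
                      preg_match('/<div>Price Range:<\/div>£\d+\.\d{2}/', $html, $plain);

                      // With a lookbehind, the label must precede the match but is not included in it.
                      preg_match('/(?<=<div>Price Range:<\/div>)£\d+\.\d{2}/', $html, $behind);

                      echo $plain[0], "\n";
                      echo $behind[0], "\n";
                      ```

                      Note that PCRE lookbehinds must have a fixed length, which is why a literal label works here but something like .* inside the lookbehind would not.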

                      But as PriceRunnerUS mentioned, there is an API for retrieving the info you're looking for.


                      • ojsimon
                        New Member
                        • May 2007
                        • 59

                        #12
                        I cannot find an API for the UK version of PriceRunner, sorry. Could you please show me how to put it into preg_match and preg_replace? I still do not understand this despite quite a lot of research.
                        Thanks


                        • pbmods
                          Recognized Expert Expert
                          • Apr 2007
                          • 5821

                          #13
                          Heya, ojsimon.

                          It looks like to get access to their API, you must first become a partner:


                          Not sure if that means that you have to give them money. I sent a PM to PriceRunnerUS and asked him to provide more details. We'll see what happens.

                          The search results page looks a little tricky to parse, but it looks like every price is listed like this:

                          [code=html]<span class="listprice">£184.99</span>[/code]

                          So you need to grab the '184.99' inside of that SPAN. To do that, you can use preg_match_all() with a lookbehind:

                          [code=php]
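                          // urlencode() keeps spaces and special characters in the search term from breaking the URL.
                          $html = file_get_contents('http://pricerunner.co.uk/search?q=' . urlencode($searchOrWhateverYouCalledIt));
                          // $matches[0] will hold every price found on the page, without the £ sign.
                          preg_match_all('/(?<=<span class="listprice">£)\d+\.\d{2}/', $html, $matches);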
                          [/code]
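
                          Those two lines do not print anything by themselves, so here is a rough sketch of how the results might be displayed. It runs against a stand-in string rather than the live site, and the second price is made up for illustration:

                          ```php
                          <?php
                          // Stand-in for the search results page; in practice $html would come from file_get_contents().
                          $html = '<span class="listprice">£184.99</span><span class="listprice">£179.00</span>';

                          // preg_match_all returns the number of full matches; $matches[0] holds each matched price.
                          $count = preg_match_all('/(?<=<span class="listprice">£)\d+\.\d{2}/', $html, $matches);

                          if ($count === false || $count === 0) {
                              echo "No prices found\n";
                          } else {
                              foreach ($matches[0] as $price) {
                                  echo $price, "\n";
                              }
                          }
                          ```

                          Using echo on $matches directly only prints the word "Array", so a loop like this or print_r($matches) is needed to see the values.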


                          • ojsimon
                            New Member
                            • May 2007
                            • 59

                            #14
                            Thanks for all your help. I tried the code you suggested and it returned a blank page. I tried to echo $matches and $html but neither worked. As I am an absolute beginner with PHP, I have no idea what to do. Could you please explain? Thanks again for all your help.
                            Olie


                            • ojsimon
                              New Member
                              • May 2007
                              • 59

                              #15
                              Sorry, how do I use preg_match and preg_replace? Could you tell me any sites where I can learn how to use them to fulfill my previous request?
                              Thanks
