match regular expressions in a webpage, not the entire source

  • charliefortune

    match regular expressions in a webpage, not the entire source

    In order to check availability, I want to visit many different pages
    from a database and try to match the regular expressions 'out of
    stock' or 'unavailable' or 'sold out' etc. If found, the product will
    be flagged as unavailable.

    I tried using fopen() and preg_match(), but the problem is that if the
    regular expression matches inside JavaScript or comments, the item gets
    wrongly flagged. Is there a way of taking what appears on the user's
    screen, after parsing the page, and looking at it as a string, I suppose?

    I feel I might be missing a concept here, can anyone help?

  • d

    #2
    Re: match regular expressions in a webpage, not the entire source

    "charliefortune" <google@charliefortune.com> wrote in message
    news:1137498421.729248.212430@o13g2000cwo.googlegroups.com...
    > In order to check availability, I want to visit many different pages
    > from a database and try to match the regular expressions 'out of
    > stock' or 'unavailable' or 'sold out' etc. If found, the product will
    > be flagged as unavailable.
    >
    > I tried using fopen() and preg_match(), but the problem is that if the
    > regular expression matches inside JavaScript or comments, the item gets
    > wrongly flagged. Is there a way of taking what appears on the user's
    > screen, after parsing the page, and looking at it as a string, I suppose?
    >
    > I feel I might be missing a concept here, can anyone help?

    The answer is right in front of you ;) Remove the tags from the source
    code, (first using regular expressions to remove script chunks, then using
    strip_tags to clean out the rest), and you have your document ready to be
    examined.
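
The two-step cleanup described above can be sketched in PHP roughly as follows. This is a minimal illustration, not the poster's actual code: the helper names visibleText() and looksUnavailable() are hypothetical, and the regular expressions assume reasonably well-formed markup. It removes static script, style and comment text only; it does not run any JavaScript.

```php
<?php
// Hypothetical helper: reduce an HTML document to roughly the text a visitor reads.
function visibleText(string $html): string
{
    // Drop <script> and <style> elements together with their contents.
    $html = preg_replace('#<(script|style)\b[^>]*>.*?</\1\s*>#is', ' ', $html);
    // Drop HTML comments.
    $html = preg_replace('/<!--.*?-->/s', ' ', $html);
    // Remove the remaining tags, decode entities such as &amp;, collapse whitespace.
    $text = html_entity_decode(strip_tags($html), ENT_QUOTES, 'UTF-8');
    return trim(preg_replace('/\s+/', ' ', $text));
}

// Hypothetical helper: true if one of the stock phrases appears in the visible text.
function looksUnavailable(string $html): bool
{
    return preg_match('/out of stock|unavailable|sold out/i', visibleText($html)) === 1;
}
```

With this, a page whose only "SOLD OUT" sits inside a script block or a comment is no longer flagged, while the same words in the page body still are.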

    dave



    • charliefortune

      #3
      Re: match regular expressions in a webpage, not the entire source

      If I understand correctly, this won't tell me the result of executed
      scripts on the page, such as this:

      if (index != -1) {
        if (productPrice[index] == 'SOLD OUT') {
          alert("This Product cannot be purchased at this time.");
          result=false;
        }
      }

      This code is present in all the products, whether in stock or out, and
      I need it to run to decide whether or not the words 'SOLD OUT' are
      going to appear. I am wondering if eval() might provide the answer? I
      am experimenting with it, with little success so far.


      • d

        #4
        Re: match regular expressions in a webpage, not the entire source

        "charliefortune" <google@charliefortune.com> wrote in message
        news:1137501487.909794.145540@g47g2000cwa.googlegroups.com...
        > If I understand correctly, this won't tell me the result of executed
        > scripts on the page, such as this:
        >
        > if (index != -1) {
        >   if (productPrice[index] == 'SOLD OUT') {
        >     alert("This Product cannot be purchased at this time.");
        >     result=false;
        >   }
        > }
        >
        > This code is present in all the products, whether in stock or out, and
        > I need it to run to decide whether or not the words 'SOLD OUT' are
        > going to appear. I am wondering if eval() might provide the answer? I
        > am experimenting with it, with little success so far.

        Eval is NEVER the answer. Ugh. Seriously, if you think eval is the answer,
        you're doing something horribly wrong.

        Nothing will tell you the output of executed scripts on the page, except the
        browser in which it's running (which isn't possible in your case, as PHP is
        not a javascript-enabled web browser). If you want to see the output,
        follow the input, and make your own deductions from that :) Could you show
        me the complete block of javascript? Maybe I can help you.

        "Winners don't do eval" :-P

        dave



        • charliefortune

          #5
          Re: match regular expressions in a webpage, not the entire source

          Here is an example of one of the pages of a sold out item

          http://www.subsidesports.com/uk/stor...ist.jsp?id=402,

          and here is an in-stock one

          http://www.subsidesports.com/uk/stor...ist.jsp?id=402,

          Looking through, it seems that the javaScript array element
          productPrice[0] contains the information I need, so I suppose the
          sensible thing would be to look at this alone to decide if an item is
          in stock. I think it is null if the product is available. So my
          question now becomes ....

          how can I test the value of this variable using PHP on the retrieved
          document? Surely not by looking for the regex

          productPrice[0] = 'SOLD OUT' ?

          thanks
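
One way to read that value from the retrieved source without executing any JavaScript is to match the assignment itself. The sketch below is only an illustration: the helper names productPriceValue() and isSoldOut() are hypothetical, and it assumes the page assigns a quoted string literal in the form productPrice[0] = '...', which the thread does not confirm.

```php
<?php
// Hypothetical helper: return the string literal assigned to productPrice[$index]
// in the page source, or null if no such assignment is found.
function productPriceValue(string $source, int $index = 0): ?string
{
    // Matches an assignment (single "="), not a comparison ("=="),
    // with the value in single or double quotes.
    $pattern = '/productPrice\[' . $index . '\]\s*=\s*([\'"])(.*?)\1/';
    if (preg_match($pattern, $source, $m) === 1) {
        return $m[2];
    }
    return null;
}

// Hypothetical helper: true only when the assigned value is 'SOLD OUT'.
function isSoldOut(string $source): bool
{
    $value = productPriceValue($source);
    return $value !== null && strcasecmp(trim($value), 'SOLD OUT') === 0;
}
```

Because the pattern requires an assignment to a numeric index, the comparison productPrice[index] == 'SOLD OUT' that appears on every product page does not trigger a match.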


          • d

            #6
            Re: match regular expressions in a webpage, not the entire source

            "charliefortune" <google@charliefortune.com> wrote in message
            news:1137506279.617801.311270@f14g2000cwb.googlegroups.com...
            > Here is an example of one of the pages of a sold out item
            >
            > http://www.subsidesports.com/uk/stor...ist.jsp?id=402,
            >
            > and here is an in-stock one
            >
            > http://www.subsidesports.com/uk/stor...ist.jsp?id=402,
            >
            > Looking through, it seems that the javaScript array element
            > productPrice[0] contains the information I need, so I suppose the
            > sensible thing would be to look at this alone to decide if an item is
            > in stock. I think it is null if the product is available. So my
            > question now becomes ....
            >
            > how can I test the value of this variable using PHP on the retrieved
            > document? Surely not by looking for the regex
            >
            > productPrice[0] = 'SOLD OUT' ?
            >
            > thanks

            Why don't you check to see if this text is in the document or not:

            <span style="font-size:20px; color:#000000;" >SOLD OUT&nbsp;</span>

            Surely that'll tell you if it's sold out or not, regardless of javascript.
            You don't even have to strip any tags before checking for it... :)
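
A check along those lines might look like the sketch below. It is an assumption-laden illustration: hasSoldOutMarker() and hasSoldOutSpan() are hypothetical names, the exact-match version only works while the shop keeps emitting the markup quoted above byte for byte, and the second function is a looser variant added here in case attributes or spacing change.

```php
<?php
// Hypothetical helper: look for the exact markup quoted in this post.
function hasSoldOutMarker(string $html): bool
{
    // Assumes sold-out pages contain this string unchanged.
    $marker = '<span style="font-size:20px; color:#000000;" >SOLD OUT&nbsp;</span>';
    return strpos($html, $marker) !== false;
}

// Hypothetical, more tolerant variant: any <span> whose whole content is
// "SOLD OUT", optionally followed by &nbsp;.
function hasSoldOutSpan(string $html): bool
{
    return preg_match('#<span\b[^>]*>\s*SOLD OUT\s*(&nbsp;)?\s*</span>#i', $html) === 1;
}
```

Either function would be called on the raw page source, for example the string returned by file_get_contents() for the product URL, with no tag stripping beforehand.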



            • charliefortune

              #7
              Re: match regular expressions in a webpage, not the entire source

              yes, that's it. thanks for your help.

              I am starting to think that there is no way of taking a URL and turning
              it into a string that represents what the browser would output to the
              screen (without writing a browser itself). Or else there is an area of
              PHP functions that deal with this that I am unaware of.

              Thanks again.

              Ruari


              • d

                #8
                Re: match regular expressions in a webpage, not the entire source

                "charliefortune" <google@charliefortune.com> wrote in message
                news:1137511697.835293.223240@g14g2000cwa.googlegroups.com...
                > yes, that's it. thanks for your help.
                >
                > I am starting to think that there is no way of taking a URL and turning
                > it into a string that represents what the browser would output to the
                > screen (without writing a browser itself). Or else there is an area of
                > PHP functions that deal with this that I am unaware of.

                Exactly - that's what a browser is for :)
                > Thanks again.

                any time!
                > Ruari

