Regular Expression?

  • abhishek1234321
    New Member
    • Jul 2010
    • 18

    Regular Expression?

    Hi, I want to match the pattern below in the following page: http://results.vtu.ac.in/default.php...&submit=SUBMIT
    I have tried several regular expressions, but none of them worked.

    Code:
    <TR>
    
                               <TD width="513">
    
    <B>ABHISHEK S JAIN (1rn08cs006) </B><br><br><br><br><hr><table><tr><td><b>Semester:</b></td><td><b>4</b></td><td></td><td> &nbsp;&nbsp;&nbsp;&nbsp;<b> Result:&nbsp;&nbsp;FIRST CLASS WITH DISTINCTION </b></td></tr></table><hr><table><tr><td width=250>Subject</td><td width=60 align=center>External </td><td width=60 align=center>Internal</td><td align=center width=60>Total</td><td align=center width=60>Result</td></tr><br><tr><td width=250><i>Engineering Mathematics - IV (06MAT41)</i></td><td width=60 align=center>86</td><td width=60 align=center>25</td><td width=60 align=center>111</td><td  width=60 align=center><b>P</b></td></tr><tr><td width=250><i>Graph Theory and Combinatorics (06CS42)</i></td><td width=60 align=center>75</td><td width=60 align=center>20</td><td width=60 align=center>95</td><td width=60 align=center><b>P</b></td></tr><tr><td width=250><i>Analysis and Design of Algorithms (06CS43)</i></td><td width=60 align=center>70</td><td width=60 align=center>20</td><td width=60 align=center>90</td><td width=60 align=center><b>P</b></td></tr><tr><td width=250><i>Object Oriented Programming with C++ (06CS44)</i></td><td width=60 align=center>49</td><td width=60 align=center>22</td><td width=60 align=center>71</td><td width=60 align=center><b>P</b></td></tr><tr><td width=250><i>Microprocessors (06CS45)</i></td><td width=60 align=center>66</td><td width=60 align=center>22</td><td width=60 align=center>88</td><td width=60 align=center><b>P</b></td></tr><tr><td width=250><i>Computer Organization (06CS46)</i></td><td width=60 align=center>44</td><td width=60 align=center>22</td><td width=60 align=center>66</td><td width=60 align=center><b>P</b></td></tr><tr><td width=250><i>Object Oriented Programming Lab (06CSL47)</i></td><td width=60 align=center>44</td><td width=60 align=center>24</td><td width=60 align=center>68</td><td width=60 align=center><b>P</b></td></tr><tr><td width=250><i>Microprocessors Lab (06CSL48)</i></td><td width=60 align=center>42</td><td 
width=60 align=center>20</td><td width=60 align=center>62</td><td width=60 align=center><b>P</b></td></tr></table><br><br><table><tr><td></td><td></td><td>Total Marks:</td><td> 651 &nbsp;&nbsp;&nbsp; </td></tr></table>                              </TD></TR>

    Can anyone help me out with this? :)
  • kovik
    Recognized Expert Top Contributor
    • Jun 2007
    • 1044

    #2
    That depends. Which elements of this "pattern" are static, and which are dynamic? And what restrictions are you placing on valid matches versus invalid matches?


    • abhishek1234321
      New Member
      • Jul 2010
      • 18

      #3
      In the string I posted, all the HTML is static; only the data between the tags is dynamic. That is, it changes for each USN (University Seat Number); 1rn08cs006 is mine. :)

      Tags like <TR><TD width="513"><B> and </td></tr></table></TD></TR> are static. :)


      • kovik
        Recognized Expert Top Contributor
        • Jun 2007
        • 1044

        #4
        Then just match the link as a static string, with that one part written as a dynamic portion.

        Code:
        // The dots and the question mark are escaped so that they match literally.
        preg_match_all('~(http://results\.vtu\.ac\.in/default\.php\?rid=[^&"\']+&submit=SUBMIT)~s', $your_data, $results);
        print_r($results);


        • abhishek1234321
          New Member
          • Jul 2010
          • 18

          #5
          No, no, you didn't understand my question. I said the HTML rendered is dynamic. I need to match the HTML rendered by that link. I have a variable, say $x, which stores the HTML content of that link, like this:
          Code:
          $x = file_get_contents(THE LINK);
          Now I need to match the pattern below:
          Code:
             <TR>
             
                                       <TD width="513">
            
           <B>ABHISHEK S JAIN (1rn08cs006) </B><br><br><br><br><hr><table><tr><td><b>Semester:</b></td><td><b>4</b></td><td></td><td> &nbsp;&nbsp;&nbsp;&nbsp;<b> Result:&nbsp;&nbsp;FIRST CLASS WITH DISTINCTION </b></td></tr></table><hr><table><tr><td width=250>Subject</td><td width=60 align=center>External </td><td width=60 align=center>Internal</td><td align=center width=60>Total</td><td align=center width=60>Result</td></tr><br><tr><td width=250><i>Engineering Mathematics - IV (06MAT41)</i></td><td width=60 align=center>86</td><td width=60 align=center>25</td><td width=60 align=center>111</td><td  width=60 align=center><b>P</b></td></tr><tr><td width=250><i>Graph Theory and Combinatorics (06CS42)</i></td><td width=60 align=center>75</td><td width=60 align=center>20</td><td width=60 align=center>95</td><td width=60 align=center><b>P</b></td></tr><tr><td width=250><i>Analysis and Design of Algorithms (06CS43)</i></td><td width=60 align=center>70</td><td width=60 align=center>20</td><td width=60 align=center>90</td><td width=60 align=center><b>P</b></td></tr><tr><td width=250><i>Object Oriented Programming with C++ (06CS44)</i></td><td width=60 align=center>49</td><td width=60 align=center>22</td><td width=60 align=center>71</td><td width=60 align=center><b>P</b></td></tr><tr><td width=250><i>Microprocessors (06CS45)</i></td><td width=60 align=center>66</td><td width=60 align=center>22</td><td width=60 align=center>88</td><td width=60 align=center><b>P</b></td></tr><tr><td width=250><i>Computer Organization (06CS46)</i></td><td width=60 align=center>44</td><td width=60 align=center>22</td><td width=60 align=center>66</td><td width=60 align=center><b>P</b></td></tr><tr><td width=250><i>Object Oriented Programming Lab (06CSL47)</i></td><td width=60 align=center>44</td><td width=60 align=center>24</td><td width=60 align=center>68</td><td width=60 align=center><b>P</b></td></tr><tr><td width=250><i>Microprocessors Lab (06CSL48)</i></td><td width=60 
align=center>42</td><td width=60 align=center>20</td><td width=60 align=center>62</td><td width=60 align=center><b>P</b></td></tr></table><br><br><table><tr><td></td><td></td><td>Total Marks:</td><td> 651 &nbsp;&nbsp;&nbsp; </td></tr></table>                              </TD></TR>
          I need a pattern to match the above HTML. :)


          • kovik
            Recognized Expert Top Contributor
            • Jun 2007
            • 1044

            #6
            You need to match the HTML rendered by the link...? As in, you need to virtually click the link and get the resulting page? At no point did you even hint at that...

            If that is what you are after, you will need to use file_get_contents() or cURL.
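As a rough sketch of the fetch step mentioned above, the snippet below wraps file_get_contents() in a small helper. The function name fetch_html and the 10-second timeout are assumptions made for illustration; they do not come from the thread.

```php
<?php
// Hypothetical helper (not from the thread): fetch a page body as a string.
// Returns null instead of false when the request fails.
function fetch_html(string $url, int $timeoutSeconds = 10): ?string
{
    // The 'http' options only take effect for http:// and https:// URLs.
    $context = stream_context_create([
        'http' => ['timeout' => $timeoutSeconds],
    ]);

    // file_get_contents() returns false on failure; @ suppresses the warning
    // so the caller can handle the null return instead.
    $html = @file_get_contents($url, false, $context);

    return $html === false ? null : $html;
}
```

With this in place, $x = fetch_html(THE LINK); would hold either the page HTML or null, so the regex step can be skipped when the fetch fails.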


            • abhishek1234321
              New Member
              • Jul 2010
              • 18

              #7
              I'm sorry I didn't mention that. But after fetching the page, can you tell me what the pattern might be?
              Code:
              $pattern = "/<B>(.*?)(?= <\/td><\/tr><\/table>.*<\/TD><\/TR>)/im";
              I am using the above pattern right now, but I am not able to match these lines:
              Code:
               <TR>
                                            <TD width="513">


              • kovik
                Recognized Expert Top Contributor
                • Jun 2007
                • 1044

                #8
                I'm not sure what you are trying to do, but the only reason to use multi-line mode (m) is if you are matching the beginning and end of lines with ^ and $. You probably want single-line mode (s), which makes the dot (.) match newlines as well, treating your data as a single line.
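To illustrate the difference between the two modifiers, here is a minimal sketch. The sample string is made up for the demonstration (a name split across two lines); only the modifiers themselves come from the post above.

```php
<?php
// Made-up sample: the text between <B> and </B> spans two lines.
$html = "<B>ABHISHEK S JAIN\n(1rn08cs006) </B>";

// Without /s the dot stops at the newline, so the closing tag is never reached.
$withoutS = preg_match('/<B>(.*?)<\/B>/', $html, $m1);   // 0

// With /s the dot also matches "\n", so the whole span is captured.
$withS = preg_match('/<B>(.*?)<\/B>/s', $html, $m2);     // 1

// /m only changes the anchors: ^ matches at the start of every line.
$startsWithoutM = preg_match_all('/^\S+/', $html);       // 1
$startsWithM    = preg_match_all('/^\S+/m', $html);      // 2

var_dump($withoutS, $withS, $startsWithoutM, $startsWithM);
```

Note that /m has no effect on the dot at all, which is why the original pattern could not get past the line breaks.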


                • abhishek1234321
                  New Member
                  • Jul 2010
                  • 18

                  #9
                  Oh, you mean I should use (s), and then the dot (.) can be used to match across multiple lines too? Am I correct? :) So finally the pattern becomes:
                  Code:
                  $pattern = "/<TR>(.)<TD width=\"513\">(.)<B>(.*?)(?= <\/td><\/tr><\/table>.*<\/TD><\/TR>)/is";


                  • kovik
                    Recognized Expert Top Contributor
                    • Jun 2007
                    • 1044

                    #10
                    Haha, not exactly. A dot on its own matches only a single character. It works the same way as the dots in your first regex, where the quantifier (.*?) is what lets it match more than one character.
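The thread never shows the pattern that finally worked, so the following is only one possible sketch, not the poster's actual solution. It assumes the outer tags are always uppercase and the inner table tags lowercase, as in the posted HTML, and it uses \s* rather than a bare dot to absorb the whitespace between tags. The sample in $x is a shortened version of the posted markup.

```php
<?php
// Shortened sample of the posted HTML (subject rows omitted).
$x = <<<HTML
<TR>

                           <TD width="513">

<B>ABHISHEK S JAIN (1rn08cs006) </B><br><hr><table><tr><td>Total Marks:</td><td> 651 &nbsp;&nbsp;&nbsp; </td></tr></table>                              </TD></TR>
HTML;

// \s* absorbs any run of spaces and newlines between the static tags.
// No /i here: case sensitivity keeps </TD></TR> from matching the inner </td></tr>.
$pattern = '~<TR>\s*<TD width="513">\s*<B>\s*(.*?)\s*\(([^)]+)\)\s*</B>(.*?)</TD>\s*</TR>~s';

$name = $usn = $total = null;
if (preg_match($pattern, $x, $m)) {
    $name = $m[1];   // text before the parenthesis
    $usn  = $m[2];   // text inside the parentheses
    // Pull the total out of the captured block with a second, simpler pattern.
    if (preg_match('~Total Marks:</td><td>\s*(\d+)~', $m[3], $t)) {
        $total = (int) $t[1];
    }
}

printf("%s | %s | %d\n", $name, $usn, $total);
// prints: ABHISHEK S JAIN | 1rn08cs006 | 651
```

Using ~ as the delimiter avoids escaping every slash in the closing tags. If the site's markup changes, a regex like this breaks silently, so checking the return value of preg_match() as above is worthwhile.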


                    • abhishek1234321
                      New Member
                      • Jul 2010
                      • 18

                      #11
                      Can you please provide me with the final regex? :)


                      • abhishek1234321
                        New Member
                        • Jul 2010
                        • 18

                        #12
                        Hey, thanks a lot, man. I got it working!


                        • kovik
                          Recognized Expert Top Contributor
                          • Jun 2007
                          • 1044

                          #13
                          No problem.
