Posts requests for a research article

  • ricowyder
    New Member
    • Jul 2009
    • 10

    Posts requests for a research article

    Hi,

    I'm not a programmer, and I am doing research for a non-profit report
    about start-ups and branding. I want to digest some statistics. For
    this I have a list of about a thousand names that I want to check in
    the trademark registry. This check can be done on this page:



    Currently, I have to type each name by hand into form field 4
    (next to "Mark"). A pop-up window comes up, and I then see whether or
    not the name is registered.

    I would already be happy if I could provide the list and get 1000
    pop-up windows to check and close again. Even better would be an
    output list telling me whether or not each brand is registered.

    Is there anyone who could help me with this endeavour?

    Acknowledgements will surely be publicly displayed.

    Thank you so much for answering and giving me positive/negative
    advice.

    Cheers

    Rico
  • acoder
    Recognized Expert MVP
    • Nov 2006
    • 16032

    #2
    Which language are you using? You could use Curl (see link) to target the form action page, get the response and parse it for each name. It would be easier if the site offered a simpler interface for fetching results, but either way there's no need to open 1,000 pop-up windows.

    • ricowyder
      New Member
      • Jul 2009
      • 10

      #3
      Thanks a lot. Unfortunately I cannot change the interface as it's an official site of the trademark registry. Curl would be ok, if you could help me.

      • acoder
        Recognized Expert MVP
        • Nov 2006
        • 16032

        #4
        Either use it on the command line, or if you know one of the supported languages (30 or so of them), use them instead. Which one can you program in?

        • ricowyder
          New Member
          • Jul 2009
          • 10

          #5
          Hi acoder. Sorry to bother you. As I wrote at the beginning, I am not a programmer. I know some css, xml.

          • acoder
            Recognized Expert MVP
            • Nov 2006
            • 16032

            #6
            Ah, sorry, I missed that. I'm afraid that this is going to be difficult without some programming. What you can do is look at the cURL command line options (manual) and see if it's possible to make multiple requests. There's probably a tool out there to do this for you, but I'm not aware of any. Having said that, it shouldn't be difficult for someone to program one.

            • ricowyder
              New Member
              • Jul 2009
              • 10

              #7
              Thanks a lot for your help!

              • Markus
                Recognized Expert Expert
                • Jun 2007
                • 6092

                #8
                If you could specify what on the results page shows that the 'mark' is registered, then I could help you pull the data (using PHP).

                I can POST the data to the landing page, and retrieve the results, but to parse the results I need to know what you want from them; the 'Verbal Elements' possibly?

                • ricowyder
                  New Member
                  • Jul 2009
                  • 10

                  #9
                  Hi Markus

                  Thank you so much for helping me.

                  The results page (a pop-up window) will list a part called "Search Summary":

                  Search Summary
                  MAR/supertext: 0 occurrences in 0 records.

                  So "MAR" stands for the "Mark" field, followed by "/" and the searched name ("supertext").

                  What I'm interested in is the "name of input", the number of occurrences and the number of records.

                  Let me know if you need more information. In what format would you need the list of names?
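
A minimal PHP sketch of how a summary line of that shape could be split into the three values. The function name `parse_search_summary` and the pattern are assumptions for illustration, based only on the one sample line quoted above. The real page may wrap the text in HTML or use singular forms, so the pattern may need adjusting.

```php
<?php
// Hypothetical helper (not part of the registry site or of cURL):
// pulls the searched name and both counts out of a "Search Summary"
// line such as "MAR/supertext: 0 occurrences in 0 records."
// Returns null when the text does not contain a line of that assumed shape.
function parse_search_summary(string $text): ?array
{
    // Assumed layout: MAR/<name>: <n> occurrence(s) in <m> record(s)
    $pattern = '~MAR/(.+?):\s*(\d+)\s+occurrences?\s+in\s+(\d+)\s+records?~i';
    if (!preg_match($pattern, $text, $m)) {
        return null;
    }
    return [
        'name'        => trim($m[1]),
        'occurrences' => (int) $m[2],
        'records'     => (int) $m[3],
    ];
}

// The sample line from the post above.
$summary = parse_search_summary('MAR/supertext: 0 occurrences in 0 records.');
print_r($summary);
```

Returning null rather than zeros for an unmatched page keeps "not registered" separate from "the page did not look as expected".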

                  • Markus
                    Recognized Expert Expert
                    • Jun 2007
                    • 6092

                    #10
                    Okay - so you want to retrieve the part '0 occurrences in 0 records'? We can do that.

                    The names would, for simplicity's sake, be in a .txt file with each name on its own line. But I don't need the file - you keep it - I'll just help you with the coding.
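
As a rough illustration of the one-name-per-line format, here is a small PHP sketch that tidies such a list before it is used. The helper name `clean_names` and the sample lines are made up for the example. It trims stray whitespace and line endings, drops blank lines and removes duplicates so that no name is looked up twice.

```php
<?php
// Hypothetical helper: turns the raw lines of a names file into a clean list.
function clean_names(array $lines): array
{
    // Strip surrounding whitespace, including "\n" or "\r\n" line endings.
    $names = array_map('trim', $lines);
    // Drop lines that are empty after trimming.
    $names = array_filter($names, function ($name) {
        return $name !== '';
    });
    // Remove duplicates and re-index from 0.
    return array_values(array_unique($names));
}

// Made-up sample lines, shaped like what file() returns without any flags.
$lines = ["supertext\n", "\n", "  Acme \r\n", "supertext\n"];
print_r(clean_names($lines));
```

With a real file this would be called as `clean_names(file('/path/to/file.txt'))`.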

                    Mark (will BRB in 10 minutes).

                    • ricowyder
                      New Member
                      • Jul 2009
                      • 10

                      #11
                      Hi Mark, great. I'll put the names in a .txt one name each line. Thank you soo much!

                      • Markus
                        Recognized Expert Expert
                        • Jun 2007
                        • 6092

                        #12
                        You'll need PHP set up on your system - that ain't hard to do, though.

                        Anyway, here's the source (it's simple and dirty, but does the job).

                        Code:
                        <?php
                        
                        // Prevent ourselves from timing out
                        set_time_limit(0);
                        // Error reporting max
                        error_reporting(E_ALL);
                        
                        // The names that we will be searching for.
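                        // (the flags strip the trailing newline from each name and skip blank lines)
                        $names = file('/path/to/file.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
                        // You could also do it with an array, like so: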
                        //$names = array(
                        //	'Mark', 'Sarah', 'Rachel',
                        //	'Chris', 'Jake', 'Josh'
                        //);
                        
                        // Create a single resource for cURL.
                        $curl = curl_init('http://www.wipo.int/cgi-mad/guest/bool_srch5?ENG+11');
                        $ref  = "http://www.wipo.int/ipdl/en/search/madrid/search-struct.jsp";
                        // Set some options that don't change.
                        curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
                        curl_setopt($curl, CURLOPT_POST,           1);
                        curl_setopt($curl, CURLOPT_FOLLOWLOCATION, 1);
                        curl_setopt($curl, CURLOPT_REFERER,        $ref);
                        
                        // Our post array: we'll be changing only one key.
                        $post = array(
                        	'ADDITIONAL_FIELD_1' => '',
                        	'ADDITIONAL_FIELD_2' => '',
                        	'ADDITIONAL_FIELD_3' => 'IMAGE',
                        	
                        	// This key was listed twice (3, then 2); PHP keeps the last value, so only 2 was ever sent.
                        	'ADDITIONAL_FIELD_COUNT'  => 2,
                        	'ADDITIONAL_TERM_COUNT'  => 2,
                        	
                        	'BRIEF_ELEMENT_SET' => 'HITNUM,PN,MAR',
                        	'DBSELECT2'    		=> 'MADRID-FULL.vdb',
                        	'DISPLAYCOUNT'      => 1000, // Override the default '25 per page' crap.
                        	'ELEMENT_SET'	=> 'BASICHTML-ENG',
                        	
                        	'FIELD_1'			=> 'IRN',
                        	'FIELD_2'			=> 'HOL',
                        	'FIELD_3'			=> 'REP',
                        	'FIELD_4'			=> 'MAR',
                        	'FIELD_5'			=> 'VC',
                        	'FIELD_6'			=> 'NCL',
                        	'FIELD_7'			=> 'GSE',
                        	'FIELD_8'			=> 'GSF',
                        	'FIELD_9'			=> 'GSES',
                        	'FIELD_10'			=> 'ORI',
                        	'FIELD_11'			=> 'OBA',
                        	'FIELD_12'			=> 'OBR',
                        	'FIELD_13'			=> 'DSA',
                        	
                        	'OPERATOR_1'		=> 'AND',
                        	'OPERATOR_2'		=> 'AND',
                        	'OPERATOR_3'		=> 'AND',
                        	'OPERATOR_4'		=> 'AND',
                        	'OPERATOR_5'		=> 'AND',
                        	'OPERATOR_6'		=> 'AND',
                        	'OPERATOR_7'		=> 'AND',
                        	'OPERATOR_8'		=> 'AND',
                        	'OPERATOR_9'		=> 'AND',
                        	'OPERATOR_10'	=> 'AND',
                        	'OPERATOR_11'	=> 'AND',
                        	'OPERATOR_12'	=> 'AND',
                        	
                        	'RANKTYPE'		=> 'KEY',
                        	
                        	'SEARCH_TYPE_1'	=> '',
                        	'SEARCH_TYPE_2'	=> '',
                        	'SEARCH_TYPE_3'	=> '',
                        	'SEARCH_TYPE_4'	=> '',
                        	'SEARCH_TYPE_5'	=> 'ANY',
                        	'SEARCH_TYPE_6'	=> 'ANY',
                        	'SEARCH_TYPE_7'	=> '',
                        	'SEARCH_TYPE_8'	=> '',
                        	'SEARCH_TYPE_9'	=> '',
                        	'SEARCH_TYPE_10'	=> '',
                        	'SEARCH_TYPE_11'	=> '',
                        	'SEARCH_TYPE_12'	=> '',
                        	'SEARCH_TYPE_13'	=> '',
                        	
                        	'SEPDISPLAY'		=> '',
                        	
                        	'TERM_1'			=> '',
                        	'TERM_2'			=> '',
                        	'TERM_3'			=> '',
                        	// Missing TERM_4
                        	'TERM_5'			=> '',
                        	'TERM_6'			=> '',
                        	'TERM_7'			=> '',
                        	'TERM_8'			=> '',
                        	'TERM_9'			=> '',
                        	'TERM_10'			=> '',
                        	'TERM_11'			=> '',
                        	'TERM_12'			=> '',
                        	'TERM_13'			=> '',
                        	
                        	'NUM_TERMS'		=> '13'
                        );
                        
                        // Build the URL-encoded POST body once. It has to be a string: handing cURL
                        // the array itself would send multipart/form-data instead.
                        // The trailing '&' lets us append TERM_4 for each name below.
                        $post_string = http_build_query($post) . '&';
                        
                        $results = array();
                        // Now we process
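                        foreach ($names as $name) {
                        	// Guard against stray whitespace or newlines around a name.
                        	$name = trim($name);
                        	if ($name === '') {
                        		continue;
                        	}
                        	
                        	// Inject POST into curl; urlencode() keeps spaces or '&' in a name from breaking the request.
                        	curl_setopt($curl, CURLOPT_POSTFIELDS, $post_string . 'TERM_4=' . urlencode($name));
                        	// Thanks to CURLOPT_RETURNTRANSFER we can
                        	// save whatever results our query generates.
                        	$return = curl_exec($curl);
                        	if ($return === false) {
                        		$results[$name] = 'request failed: ' . curl_error($curl);
                        		continue;
                        	}
                        	
                        	// Grab e.g. "0 occurrences in 0 records" from the Search Summary.
                        	if (preg_match('/([0-9]+ occurrences? in [0-9]+ records?)/', $return, $matches)) {
                        		$results[$name] = $matches[1];
                        	} else {
                        		$results[$name] = 'no summary found';
                        	}
                        }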
                        
                        print_r($results);
                        
                        curl_close($curl);
                        Save it as 'curl.php' and then, in a terminal (command line) type 'php /path/to/curl.php' (substituting the /path/to/ with the actual path ;).

                        You'll receive the output inside the terminal.

                        If you'd like the output piped to a text file, change the command to this: php /path/to/curl.php > /path/to/output.txt

                        The second path does not have to exist - the terminal will create it (or overwrite it if it does exist).
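
Since the goal is an output list, here is a hedged sketch of printing the results as tab-separated lines instead of using `print_r`, so the piped file opens cleanly in a spreadsheet. The function name `results_to_tsv` is made up for this example, and the sample array contains illustrative values only, not real registry results.

```php
<?php
// Hypothetical helper: formats a name => result array as tab-separated text,
// one name per line, with a header row.
function results_to_tsv(array $results): string
{
    $lines = ["name\tresult"];
    foreach ($results as $name => $result) {
        $lines[] = $name . "\t" . $result;
    }
    return implode("\n", $lines) . "\n";
}

// Illustrative values only.
$demo = [
    'supertext' => '0 records',
    'example'   => '3 records',
];
echo results_to_tsv($demo);
```

In the script above, this would mean replacing `print_r($results);` with `echo results_to_tsv($results);`.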

                        Let me know how you get on & if you need a hand.

                        - mark.

                        • ricowyder
                          New Member
                          • Jul 2009
                          • 10

                          #13
                          Thanks :-) I'll do my best and will let you know...

                          • Markus
                            Recognized Expert Expert
                            • Jun 2007
                            • 6092

                            #14
                            I updated the above code - so make sure you are using the correct one.

                            • ricowyder
                              New Member
                              • Jul 2009
                              • 10

                              #15
                              Hi Markus

                              I ran this:

                              php C:\curl.php > C:\new.txt

                              The prompt says: the command "php" is either misspelled or could not be found

                              Do I need something else to run this except the cmd?
