I would like to know how to scrape Google SERPs in a big project.
What I have:
- Website application written in PHP with simple user management
- A cURL script which, for 9 phrases, scrapes the SERP every hour and fetches the top 100 domains for each phrase, saving them to the DB. It does 9 (phrases) * 10 (pages) * 24 (hours) = 2160 requests per 24 hours. More precisely: at 10:00 it makes at most 1 request per 6 seconds (a 3-6 second freeze between requests), finishes the cron run, waits until 11:00, and repeats this every hour. It is driven by cron. It has worked fine for the last month, I haven't been banned, and I believe this is the very maximum I can get before Google sends me to hell. (A simplified sketch of the loop is shown below.)
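Roughly, what the cron script does each hour looks like this. This is a simplified sketch, not my exact code; the DB connection, the phrase list, the `serp_results` table name and the domain parsing are placeholders for illustration:

```php
<?php
// Simplified sketch of the hourly cron run (not the exact script).
// $pdo credentials, the phrase list, the table name and the parsing are assumptions.

$pdo = new PDO('mysql:host=localhost;dbname=serp', 'dbuser', 'dbpass');
$phrases = ['phrase one', 'phrase two' /* ... 9 phrases in total ... */];

foreach ($phrases as $phrase) {
    for ($page = 0; $page < 10; $page++) {            // 10 pages = top 100 results
        $url = 'https://www.google.com/search?q=' . urlencode($phrase)
             . '&start=' . ($page * 10);

        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (X11; Linux x86_64)');
        $html = curl_exec($ch);
        curl_close($ch);

        // Very naive domain extraction; the real parser is more robust.
        preg_match_all('#<a href="https?://([^/"]+)/#i', (string) $html, $m);
        $stmt = $pdo->prepare(
            'INSERT INTO serp_results (phrase, page, domain) VALUES (?, ?, ?)'
        );
        foreach (array_unique($m[1]) as $domain) {
            $stmt->execute([$phrase, $page, $domain]);
        }

        sleep(rand(3, 6)); // the 3-6 second freeze between requests
    }
}
```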
What I want:
Expand the concept of my cURL script to multiple users.
Scenario: each user could get the top 100 domains for each phrase he wants, once per hour. For example: user A sets 3 phrases for analysis, which gives 30 requests per hour for that user, and user B sets 8 phrases for analysis, which gives 110 requests per hour overall; that amount of requests is impossible to handle from 1 IP without being punished by Google. I probably need to set up 1 proxy server per user to get 1 unique IP, on which I can run at most 9 phrase scrapes per hour (the limit my current script stays under). BUT EVEN THEN it is just bad to let a user analyze only 9 phrases every hour.
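As far as I understand, routing a single request through a user's assigned proxy would look roughly like this with cURL. The proxy address and credentials here are pure placeholders, since I have no experience with proxies yet:

```php
<?php
// Hypothetical: one SERP request sent out through a user's assigned proxy.
// Proxy host/port and credentials are placeholders.

function fetchSerpPage($phrase, $page, $proxy, $proxyAuth = '')
{
    $url = 'https://www.google.com/search?q=' . urlencode($phrase)
         . '&start=' . ($page * 10);

    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_PROXY, $proxy);                // e.g. "203.0.113.10:3128"
    if ($proxyAuth !== '') {
        curl_setopt($ch, CURLOPT_PROXYUSERPWD, $proxyAuth); // e.g. "user:password"
    }
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (X11; Linux x86_64)');
    $html = curl_exec($ch);
    curl_close($ch);

    return $html;
}

// Every request for user A would then leave from A's unique IP:
$html = fetchSerpPage('some phrase', 0, '203.0.113.10:3128', 'proxyuser:proxypass');
```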
The only (desperate) idea I see for now is to buy a lot of hosting accounts with cron and somehow "assign", for example, 10 proxy servers (each with a unique IP) to each hosting. Let's say I have 1000 users, so I need 100 hostings with cron and 10 proxies per hosting, which gives me 1000 unique IPs for 1000 users. Each hosting would use the same database where user settings are kept. This DB would provide the SERP-scraping settings for each user (UserID, PhraseToGoogleSearch, DomainToFilterInSERP, SERPLanguage) to each cron process. Basically: 10 users = 10 proxy servers = 1 hosting. How about that?
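As I imagine it, each hosting's cron would read only its own slice of users from the shared settings DB and then scrape through each user's proxy. Something like the sketch below; the table names, the hosting_id column and the user_proxies table are assumptions, only the four settings columns come from my description above:

```php
<?php
// Sketch: one hosting's cron loads the settings of its 10 assigned users from the
// shared DB and scrapes through each user's proxy.
// Table names, hosting_id and user_proxies are assumptions.

$hostingId = 7; // this hosting handles the users assigned to ID 7

$pdo = new PDO('mysql:host=shared-db.example.com;dbname=serp', 'dbuser', 'dbpass');

$stmt = $pdo->prepare(
    'SELECT s.UserID, s.PhraseToGoogleSearch, s.DomainToFilterInSERP, s.SERPLanguage,
            p.proxy_address
       FROM user_serp_settings s
       JOIN user_proxies p ON p.UserID = s.UserID
      WHERE p.hosting_id = ?'
);
$stmt->execute([$hostingId]);

foreach ($stmt->fetchAll(PDO::FETCH_ASSOC) as $row) {
    for ($page = 0; $page < 10; $page++) {
        // fetchSerpPage() is the proxy-aware helper sketched earlier.
        $html = fetchSerpPage($row['PhraseToGoogleSearch'], $page, $row['proxy_address']);
        // ... parse, filter by $row['DomainToFilterInSERP'], save, sleep 3-6 s ...
    }
}
```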
It would be nice to give every user the possibility to expand his 9 analyses (9 phrase|domain SERP scrapes per cron run) per hour to, for example, 80 or even 800, regardless of cost. I want to know how to implement my suggested user <=> IP idea, or any solution that would be better. I have never had any experience with proxies or managing IPs.
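The only way I can picture going beyond 9 phrases per hour for one user is to reserve several proxies for him and spread his phrases across them, roughly as below. This is completely hypothetical on my side, since I don't know whether this is how proxy pools are normally managed:

```php
<?php
// Hypothetical: spread one user's phrases over a pool of proxies so that no single
// IP carries more than ~9 phrase scrapes per hour.
// Needs at least ceil(count($phrases) / 9) proxies to actually respect that limit.

$phrases = [/* e.g. 80 phrases for one user */];
$proxies = [/* proxy addresses reserved for this user */];

$maxPhrasesPerProxyPerHour = 9;
$batches = array_chunk($phrases, $maxPhrasesPerProxyPerHour);

foreach ($batches as $i => $batch) {
    $proxy = $proxies[$i % count($proxies)]; // wraps around if the pool is too small
    foreach ($batch as $phrase) {
        for ($page = 0; $page < 10; $page++) {
            $html = fetchSerpPage($phrase, $page, $proxy); // helper sketched earlier
            // ... parse, save, sleep 3-6 s ...
        }
    }
}
```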