Instrumented web proxy

**Miki** · Mar 27 '08, 09:15 PM

Re: Instrumented web proxy

Hello Andrew,

> Tiny HTTP Proxy in Python looks promising as it's nominally simple (not
> many lines of code).
>
> http://www.okisoft.co.jp/esc/python/proxy/
>
> It does what it's supposed to, but I'm a bit at a loss as to where to
> intercept the traffic. I suspect it should be quite straightforward, but
> I'm finding the code a bit opaque.
>
> Any suggestions?

From a quick look at the code, you can hook into do_GET, where you have
the URL (see the urlunparse line). If you want the actual content of the
page, you'll need to hook into _read_write (data = i.recv(8192)).

HTH,
--
Miki <miki.tebeka@gmail.com>

PythonWise

http://pythonwise.blogspot.com

If it won't be simple, it simply won't be. [Hire me, source code]

**Paul Rubin** · Mar 27 '08, 09:25 PM

Re: Instrumented web proxy

Andrew McLean <andrew-news@andros.org.uk> writes:

> I would like to write a web (http) proxy which I can instrument to
> automatically extract information from certain web sites as I browse
> them. Specifically, I would want to process URLs that match a
> particular regexp. For those URLs I would have code that parsed the
> content and logged some of it.
>
> Think of it as web scraping under manual control.

I've used Proxy 3 for this, a very cool program with powerful
capabilities for on the fly html rewriting.

Amit’s Web Proxy Project

http://theory.stanford.edu/~amitp/proxy.html
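The requirement quoted above (match URLs against a regexp, parse the matching pages, log part of the content) can be sketched independently of any particular proxy as a function that takes the URL and the raw response body. This is a sketch under assumptions: `URL_PATTERN`, `TitleAndLinks` and `process` are hypothetical names, the standard library's `html.parser` stands in for whatever parser suits the target site, and logging the page title and links is only an example of "some of it".

```python
import logging
import re
from html.parser import HTMLParser

log = logging.getLogger("scraper")

# Hypothetical rule: only item pages on this (example) site are scraped.
URL_PATTERN = re.compile(r"^https?://www\.example\.com/items/\d+")


class TitleAndLinks(HTMLParser):
    """Collect the <title> text and every <a href> value on a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:  # skip anchors without an href (or with an empty one)
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def process(url, body, charset="utf-8"):
    """Parse and log pages whose URL matches URL_PATTERN; return None otherwise."""
    if not URL_PATTERN.match(url):
        return None
    parser = TitleAndLinks()
    parser.feed(body.decode(charset, errors="replace"))
    parser.close()
    record = {"url": url, "title": parser.title.strip(), "links": parser.links}
    log.info("scraped %s: %r (%d links)", url, record["title"], len(record["links"]))
    return record
```

A proxy would call `process(url, body)` from whichever point sees the complete response. The `charset` argument is assumed to come from the response's Content-Type header, falling back to UTF-8.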

**Andrew McLean** · Mar 28 '08, 06:15 PM

Re: Instrumented web proxy

Paul Rubin wrote:

> Andrew McLean <andrew-news@andros.org.uk> writes:
>
>> I would like to write a web (http) proxy which I can instrument to
>> automatically extract information from certain web sites as I browse
>> them. Specifically, I would want to process URLs that match a
>> particular regexp. For those URLs I would have code that parsed the
>> content and logged some of it.
>>
>> Think of it as web scraping under manual control.
>
> I've used Proxy 3 for this, a very cool program with powerful
> capabilities for on the fly html rewriting.
>
> http://theory.stanford.edu/~amitp/proxy.html

This looks very useful. Unfortunately I can't seem to get it to run
under Windows (specifically Vista) using Python 1.5.2, 2.2.3 or 2.5.2.
I'll try Linux if I get a chance.
