parse http header

**johny10151981** · Aug 9 '10, 08:50 PM

First separate all the lines. Every line ends with "\r\n".
The first line has a fixed format. It always starts with the method: GET/POST/DELETE/HEAD or something else I can't recall right now.
From the second line on, you can split each line at the first ":".

If the request is a POST, then after the "\r\n\r\n" you get the posted data (for a form, the URL-encoded fields), usually on one line.
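A minimal C++ sketch of the splitting described above: cut the header block into lines at "\r\n", keep the first line as is, and split every later line at the first ":". The names `ParsedHead`, `trim` and `parseHead` are made up for this illustration, and folded (multi-line) values are deliberately not handled.

```cpp
#include <map>
#include <string>

// Hypothetical holder for the first line and the header fields.
struct ParsedHead {
    std::string startLine;                      // e.g. "GET /index.html HTTP/1.1"
    std::map<std::string, std::string> fields;  // name -> value (last one wins)
};

// Remove spaces and tabs from both ends of s.
static std::string trim(const std::string& s) {
    const char* ws = " \t";
    std::size_t b = s.find_first_not_of(ws);
    if (b == std::string::npos) return "";
    std::size_t e = s.find_last_not_of(ws);
    return s.substr(b, e - b + 1);
}

// Parse everything up to the blank line; whatever follows it is left alone.
ParsedHead parseHead(const std::string& raw) {
    ParsedHead out;
    std::size_t pos = 0;
    bool first = true;
    while (pos < raw.size()) {
        std::size_t eol = raw.find("\r\n", pos);
        if (eol == std::string::npos) eol = raw.size();
        std::string line = raw.substr(pos, eol - pos);
        pos = eol + 2;                          // step over the "\r\n"
        if (line.empty()) break;                // blank line: end of headers
        if (first) { out.startLine = line; first = false; continue; }
        std::size_t colon = line.find(':');
        if (colon == std::string::npos) continue;  // no colon: skip the line
        out.fields[trim(line.substr(0, colon))] = trim(line.substr(colon + 1));
    }
    return out;
}
```

Header names are case-insensitive in HTTP, so a real lookup would probably want to normalize the case of the name before storing it.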

**johny10151981** · Aug 9 '10, 08:53 PM

Your posted header is missing one line.

Your posted header says it is a POST request with a content length of 25, but I can't see those 25 bytes of posted data. I guess your program stopped after getting "\r\n\r\n", which is not right.

**Oralloy** · Aug 9 '10, 09:24 PM

Parsing is always non-trivial. The line break information you need is described in section 2.2 of the spec, however.

Why are you writing this parser, rather than using one that's already been implemented?

**dschu012** · Aug 9 '10, 11:39 PM

@Oralloy
I was using a small HTTP client sample. In the sample they were just ignoring the header data and going straight to the \r\n\r\n to get the response data. I am only doing GETs, and one of the URLs has a redirect, which is why I needed the header data.

@johny10151981
It was only an example, not real data, and I am only doing GETs. Your suggestion doesn't help with multi-line values.

Thanks for referring me to section 2.2. It answered my question.

**Oralloy** · Aug 10 '10, 01:37 AM

dschu012,

If you're using a very light-weight sample as your starting point, then you're going to have to do some work to parse the headers. Since you're only interested in one of the headers, you can cheat by reading the headers one line at a time and processing them. If you find the redirect, you're done; if you don't, you have the content.

Pseudo code:

Code:

looking = inHeaders = true;
while (looking && inHeaders)
  read line                 // one line, with the trailing "\r\n" removed
  if (eof)
    inHeaders = false
  else if (line == "")      // blank line: end of the headers
    inHeaders = false
  else if (strncmp(line, "Location:", 9) != 0)  // redirects arrive in the Location header
    ; // noOp - not found
  else
    looking = false // found the header we want
end while

if (!looking)
  process redirect
else
  process result

If you read the specification, then you have a good idea of how messy parsing the headers can be.

Rather than re-inventing the wheel to parse headers, it might be worth spending some time to find a somewhat better example to start with.

If you're not stuck with C++, there is a really good Perl module for accessing web servers.

On the other hand, if you're stuck with C++, I'd say use a combination of Lex and Yacc to really simplify the work.

Another good option would be to read the entire mess into a single buffer and parse it using regex. I'm pretty sure that it'll be fairly easy to write regular expressions to parse the headers. Start by dividing the headers from the content at the first occurrence of "\r\n\r\n". Then tear the headers off of the header block one at a time using one regex, repeatedly.
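A rough sketch of that regex approach with `std::regex` (C++11), assuming the whole response is already in one `std::string`. The function name `regexHeaders` and the pattern are illustrative choices, not something taken from the spec, and folded values, quoting and comments are not handled.

```cpp
#include <regex>
#include <string>
#include <utility>
#include <vector>

using Header = std::pair<std::string, std::string>;

// Splits `response` at the first "\r\n\r\n", stores the content in `body`,
// and tears the fields off the header block one at a time.
// The first line (the status line) is skipped.
std::vector<Header> regexHeaders(const std::string& response, std::string& body) {
    std::vector<Header> out;
    std::size_t split = response.find("\r\n\r\n");
    std::string head = (split == std::string::npos) ? response
                                                    : response.substr(0, split);
    body = (split == std::string::npos) ? "" : response.substr(split + 4);
    head += "\r\n";  // make sure the last header line also ends in CRLF

    // One field: name, colon, optional blanks, value, CRLF.
    static const std::regex field("([^:\r\n]+):[ \t]*([^\r\n]*)\r\n");

    auto it = head.cbegin() + head.find("\r\n") + 2;  // skip the status line
    std::smatch m;
    // match_continuous forces each match to begin exactly at `it`.
    while (std::regex_search(it, head.cend(), m, field,
                             std::regex_constants::match_continuous)) {
        out.emplace_back(m[1].str(), m[2].str());
        it = m[0].second;  // continue right after the matched field
    }
    return out;
}
```

The loop stops at the first line that does not look like `name: value`, so a malformed line ends the scan early rather than being skipped.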

Failing that, I'd write a simple state machine/recognizer for general headers. The problem is that by the time you're done implementing all the quoting forms and comments, you're going to have a rather complex bit of software.
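To give an idea of the size of even a stripped-down recognizer, here is a three-state sketch in C++. It assumes the input is the header block after the first line, and it treats a line that begins with a space or tab as a continuation of the previous value, which is how I read the folding rule in section 2.2. Quoted strings and comments are not handled at all, and runs of blanks inside a value are collapsed to one space. `State`, `Field` and `scanFields` are made-up names.

```cpp
#include <string>
#include <utility>
#include <vector>

using Field = std::pair<std::string, std::string>;

enum class State { LineStart, Name, Value };

// Walks the header block one character at a time and collects name/value pairs.
std::vector<Field> scanFields(const std::string& block) {
    std::vector<Field> out;
    State st = State::LineStart;
    std::string name, value;
    auto flush = [&] {  // emit the pending field, if any
        while (!value.empty() && value.back() == ' ') value.pop_back();
        if (!name.empty()) out.emplace_back(name, value);
        name.clear();
        value.clear();
    };
    for (std::size_t i = 0; i < block.size(); ++i) {
        char c = block[i];
        bool blank = (c == ' ' || c == '\t');
        bool crlf = (c == '\r' && i + 1 < block.size() && block[i + 1] == '\n');
        switch (st) {
        case State::LineStart:
            if (blank) {                 // folded line: keep adding to the value
                if (!value.empty() && value.back() != ' ') value += ' ';
                st = State::Value;
            } else if (crlf) {           // blank line: end of headers
                flush();
                return out;
            } else {                     // a new field starts here
                flush();
                name += c;
                st = State::Name;
            }
            break;
        case State::Name:
            if (c == ':') st = State::Value;
            else if (crlf) { name.clear(); ++i; st = State::LineStart; }  // no colon: drop
            else name += c;
            break;
        case State::Value:
            if (crlf) { ++i; st = State::LineStart; }
            else if (blank) { if (!value.empty() && value.back() != ' ') value += ' '; }
            else value += c;
            break;
        }
    }
    flush();
    return out;
}
```

Every quoting form or comment rule added on top of this means more states, which is where the complexity comes from.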

See section 2 of the document you sent...
