# HTTP 1.1 Web Server

This is an implementation overview of the very basic HTTP 1.1 server provided.

## Configuration
Configuration is done in a configuration file. `webhttp.config` is a wrapper
for Python's `SafeConfigParser`. The default configuration file is
`~/.webpy.ini`. An example is given in `webpy.ini.example`. Most settings are
self-explanatory, but two may require explanation:

 * `index` is similar to Apache's `DirectoryIndex`;
 * the `error`*`xxx`* settings point to files, relative to the root content
   directory, that should be served in case of errors.

Most settings can be overridden by command line flags and have sensible
defaults.

A configuration file is not required. If none is provided, the default
settings are:

 * Hostname: localhost
 * Port: 8001
 * Timeout: 15s
 * Maximum number of connections: 1000

## Logging
Logging is done using `logging`. `webhttp.weblogging` is a wrapper for this
library that inserts the right name of the logger (currently `webhttp`).
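
A wrapper of this kind can be very small. The sketch below is an assumption
about its shape, not the actual `webhttp.weblogging` code; only the logger name
`webhttp` comes from the text, and `get_logger` and `debug` are hypothetical
helper names.

```python
import logging

LOGGER_NAME = "webhttp"  # logger name mentioned in the text


def get_logger():
    """Return the shared logger so callers never repeat the name."""
    return logging.getLogger(LOGGER_NAME)


def debug(message, *args):
    """Log a debug message on the shared logger."""
    get_logger().debug(message, *args)


logging.basicConfig(level=logging.DEBUG)
debug("closing connection to %s after timeout", "127.0.0.1")
```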

## Parsing
[RFC 2616][rfc2616] gives a context-free grammar specification of HTTP requests
and responses. To parse these elements appropriately, the CFG has been
translated directly to regular expressions in `webhttp.regexes`. This leads to
bulky (and possibly inefficient) regexes, but because each regex mirrors a rule
in the specification, the code stays maintainable.
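
To illustrate the approach, here is a sketch of how a few grammar rules can be
composed into a regex for the request line. The constant names are assumptions
and need not match those in `webhttp.regexes`, and `REQUEST_URI` is a
deliberate simplification of the much larger rule in the RFC.

```python
import re

# Terminals and small rules, named after the RFC 2616 grammar.
SP = r" "
CRLF = r"\r\n"
DIGIT = r"[0-9]"
# token = 1*<any CHAR except CTLs or separators>
TOKEN = r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+"
# HTTP-Version = "HTTP" "/" 1*DIGIT "." 1*DIGIT
HTTP_VERSION = r"HTTP/" + DIGIT + r"+\." + DIGIT + r"+"
# Simplification: anything up to the next space or line break.
REQUEST_URI = r"[^ \r\n]+"
# Request-Line = Method SP Request-URI SP HTTP-Version CRLF
REQUEST_LINE = (
    r"(?P<method>" + TOKEN + r")" + SP
    + r"(?P<uri>" + REQUEST_URI + r")" + SP
    + r"(?P<version>" + HTTP_VERSION + r")" + CRLF
)

match = re.match(REQUEST_LINE, "GET /index.html HTTP/1.1\r\n")
print(match.group("method"), match.group("uri"), match.group("version"))
```

Building larger rules by concatenating smaller ones is what makes the resulting
expressions bulky, but each constant can be checked against one line of the
grammar.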

## Connection handling
A `webhttp.server.Server` object opens a listening socket on the specified port
and hostname. Whenever a connection is requested, a `ConnectionHandler` object
is created. Extending `threading.Thread`, this class will run in a separate
thread, allowing simultaneous connections.

The `ConnectionHandler` will read data and feed it, 4096 bytes at a time, to a
`parser.RequestParser`. This class uses a buffer to hold unfinished requests,
because a request's size may exceed 4096 bytes. Since the `RequestParser` keeps
reading after yielding the first request, persistent connections are implicitly
supported (more below).
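
The buffering idea can be sketched as follows. This is not the actual
`RequestParser`: the class name `BufferedRequestSplitter` is hypothetical, and
the sketch assumes requests without a body (such as GET), so that a request
ends at the first blank line.

```python
class BufferedRequestSplitter:
    """Hypothetical stand-in for the buffering part of a request parser."""

    def __init__(self):
        self.buffer = b""

    def feed(self, data):
        """Add received bytes and yield every complete request seen so far."""
        self.buffer += data
        while b"\r\n\r\n" in self.buffer:
            request, self.buffer = self.buffer.split(b"\r\n\r\n", 1)
            yield request + b"\r\n\r\n"


raw = (b"GET /a HTTP/1.1\r\nHost: x\r\n\r\n"
       b"GET /b HTTP/1.1\r\nHost: x\r\n\r\n")
splitter = BufferedRequestSplitter()
requests = []
# 10-byte chunks stand in for the 4096-byte reads described above.
for start in range(0, len(raw), 10):
    requests.extend(splitter.feed(raw[start:start + 10]))
print(len(requests))
```

Because unfinished data stays in the buffer between calls, a second request on
the same connection is picked up without any special handling.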

For every request that the `RequestParser` yields, a
`composer.ResponseComposer` is used to compose the appropriate response. This
response is then sent back to the client in the `ConnectionHandler`.

### Persistent connections
As mentioned above, the *handling* of requests in persistent connections is
supported implicitly. However, this does not hold for *closing* persistent
connections. In accordance with [RFC 2616][rfc2616], persistent connections are
closed in one of the following situations:

 * When the configured timeout is exceeded while waiting for a new request, the
   connection will be closed and a debug level log message is produced.
 * When the composed response has a `Connection: close` header, the connection
   will be closed directly after sending that response.

When a client requests closure of a persistent connection by sending the
`Connection: close` header, the response will always include this header as
well. This situation is therefore covered by the second bullet point.
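
A minimal sketch of such a connection loop is given below. It is not the actual
`ConnectionHandler`: `handle_connection` and `respond` are hypothetical names,
and for brevity every `recv` call is treated as one complete request. The demo
uses `socket.socketpair()` so that no real network setup is needed.

```python
import socket


def handle_connection(conn, timeout, respond):
    """Serve requests until a timeout or a `Connection: close` response.

    `respond` is a hypothetical callable that maps request bytes to a
    (response_bytes, close_after) tuple.
    """
    conn.settimeout(timeout)
    try:
        while True:
            try:
                data = conn.recv(4096)
            except socket.timeout:
                return "timeout"  # first bullet point
            if not data:
                return "client closed"
            response, close_after = respond(data)
            conn.sendall(response)
            if close_after:
                return "connection: close"  # second bullet point
    finally:
        conn.close()


def respond(data):
    """Echo the client's wish to close in the response headers."""
    close = b"connection: close" in data.lower()
    headers = b"HTTP/1.1 200 OK\r\nContent-Length: 0\r\n"
    if close:
        headers += b"Connection: close\r\n"
    return headers + b"\r\n", close


server_side, client_side = socket.socketpair()
client_side.sendall(b"GET / HTTP/1.1\r\nHost: x\r\nConnection: close\r\n\r\n")
print(handle_connection(server_side, 0.2, respond))
client_side.close()
```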

## Serving GET requests
As explained above, the `ResponseComposer` is responsible for building
responses for requests. It does this through three key methods.

 * `compose_response` tackles directory traversal attacks by directly refusing
   to handle URIs that contain `..`. If this check passes, a new `Process`
   is created in which `serve` (see below) is run. This is necessary
   because `serve` uses methods that could theoretically time out, for example
   when handling excessively large files.

 * `serve` implements most of the logic. It creates a `Resource` object for the
   URI requested, handles ETag-related headers (see below), sets the
   `Content-Type` header as needed, handles encoding (see below), sets the
   `Connection` header as needed and, perhaps most importantly, does error
   handling. If all goes well, it returns a `Response` which is then returned
   from `compose_response`.

 * `serve_error` is a `serve` wrapper that serves the error page for a
   particular HTTP status code (see Configuration above).
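
The `..` check in `compose_response` can be sketched as below. The helper name
`resolve` is hypothetical, and applying the check to the percent-decoded path
as well as to the raw URI is an assumption of this sketch; the text only states
that URIs containing `..` are refused. The sketch is written for Python 3, so
`urllib.parse` stands in for `urlparse`.

```python
import os
from urllib.parse import unquote, urlparse


def resolve(root, uri):
    """Map a request URI to a path under `root`, or None if it is refused."""
    path = unquote(urlparse(uri).path)
    # Refuse anything containing "..", before and after percent-decoding.
    if ".." in uri or ".." in path:
        return None
    return os.path.join(root, path.lstrip("/"))


print(resolve("/srv/www", "/docs/index.html?lang=en"))
print(resolve("/srv/www", "/docs/../../etc/passwd"))
print(resolve("/srv/www", "/%2e%2e/etc/passwd"))
```

A plain substring test is stricter than necessary, since it also rejects
harmless names such as `a..b`, but it is easy to verify.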

### Status codes

200 for successful requests; 404 if the requested resource could not be found;
403 if the user running the server doesn't have permission to read the
resource, or if a directory without an index file has been requested.
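
These rules can be sketched as a single function. `status_for` is a
hypothetical helper, not part of the server, and the default index file name
`index.html` is an assumption; in the server it would come from the `index`
setting.

```python
import os
import tempfile


def status_for(path, index="index.html"):
    """Pick a status code for a file system path, following the rules above."""
    if not os.path.exists(path):
        return 404
    if os.path.isdir(path):
        path = os.path.join(path, index)
        if not os.path.isfile(path):
            return 403  # directory without an index file
    if not os.access(path, os.R_OK):
        return 403  # server user may not read the file
    return 200


with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, "page.html"), "w") as handle:
        handle.write("<p>hello</p>")
    os.mkdir(os.path.join(root, "empty"))
    print(status_for(os.path.join(root, "page.html")))
    print(status_for(os.path.join(root, "missing.html")))
    print(status_for(os.path.join(root, "empty")))
```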

## Caching using ETags
ETags are properties of resources and are therefore generated by the `Resource`
class, in its `generate_etag` method. This uses the MD5 hash of the result of
`os.stat()` on the requested file. A cryptographically secure hash is not
needed for this: only collision resistance is relevant to some degree, and
clients that are concerned about collisions can simply not send conditional
requests.

`Resource` also has an `etag_match` method that checks if a given ETag list
matches the ETag of the resource.
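
A sketch of both methods as free functions is shown below. Which `os.stat()`
fields go into the hash is an assumption of this sketch (inode, size and
modification time), as is the handling of `*`, which RFC 2616 defines as
matching any entity. Weak ETags are not handled.

```python
import hashlib
import os
import tempfile


def generate_etag(path):
    """Hash selected stat fields of `path` into a hex ETag value."""
    info = os.stat(path)
    fingerprint = "{}-{}-{}".format(
        info.st_ino, info.st_size, info.st_mtime_ns)
    return hashlib.md5(fingerprint.encode("ascii")).hexdigest()


def etag_match(path, etags):
    """Return True if any quoted ETag in `etags` matches, or if one is `*`."""
    current = '"%s"' % generate_etag(path)
    return any(tag.strip() in ("*", current) for tag in etags)


with tempfile.NamedTemporaryFile(delete=False) as handle:
    handle.write(b"hello")
    path = handle.name
before = generate_etag(path)
with open(path, "ab") as handle:
    handle.write(b" world")  # changes the size, hence the ETag
after = generate_etag(path)
print(before != after, etag_match(path, ['"%s"' % after]))
os.remove(path)
```

Hashing file metadata instead of file content keeps ETag generation cheap, at
the cost of producing a new ETag whenever the metadata changes.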

### Status codes
304 if the ETag matched, so that the client can use its cached copy.

## Encoding
Encoding is done through the `Resource` class. The `get_content` method takes
an optional `encoding` argument (default: `identity`). Internally, another
module, `webhttp.encodings`, is used. This module has functions to convert the
internal representation of an encoding to a string and vice versa, and to
encode and decode strings using some encoding. Currently, only `gzip` and
`identity` are supported.
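
The encode and decode functions might look roughly like this. The sketch is
written for Python 3, so `io.BytesIO` takes the place of `StringIO`, and the
function names and signatures are assumptions rather than the actual
`webhttp.encodings` API.

```python
import gzip
import io


def encode(data, encoding="identity"):
    """Encode `data` (bytes) with one of the two supported encodings."""
    if encoding == "identity":
        return data
    if encoding == "gzip":
        buffer = io.BytesIO()
        with gzip.GzipFile(fileobj=buffer, mode="wb") as stream:
            stream.write(data)
        return buffer.getvalue()
    raise ValueError("unsupported encoding: %s" % encoding)


def decode(data, encoding="identity"):
    """Reverse `encode` for the same encoding name."""
    if encoding == "identity":
        return data
    if encoding == "gzip":
        with gzip.GzipFile(fileobj=io.BytesIO(data), mode="rb") as stream:
            return stream.read()
    raise ValueError("unsupported encoding: %s" % encoding)


body = b"hello world " * 100
compressed = encode(body, "gzip")
print(len(compressed) < len(body), decode(compressed, "gzip") == body)
```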

### Status codes
406 if only unknown encodings have been requested.
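
Choosing an encoding from the `Accept-Encoding` header could be sketched as
follows. `choose_encoding` is a hypothetical helper; it ignores q-values and
simply takes the first supported name, and treating a missing or empty header
as `identity` is an assumption. A return value of `None` corresponds to the 406
case.

```python
SUPPORTED = ("gzip", "identity")


def choose_encoding(accept_encoding):
    """Return a supported encoding name, or None if a 406 should be sent."""
    if not accept_encoding.strip():
        return "identity"  # assumption: no preference means no encoding
    # Drop parameters such as ";q=0.5" and normalise the names.
    requested = [item.split(";")[0].strip().lower()
                 for item in accept_encoding.split(",")]
    for name in requested:
        if name in SUPPORTED:
            return name
    return None  # only unknown encodings were requested


print(choose_encoding("gzip, deflate"))
print(choose_encoding("br, deflate"))
```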

## Acknowledgements
The following Python modules have been used, in alphabetical order:

 * `configparser`, in `webhttp.config`, for parsing the ini configuration file
 * `gzip`, in `webhttp.encodings`, for gzip encoding
 * `hashlib`, in `webhttp.resource`, for ETag generation
 * `mimetypes`, in `webhttp.resource`, for guessing content type and encoding
 * `multiprocessing`, in `webhttp.composer`, for timing out the `serve` call
 * `StringIO`, in `webhttp.encodings`, for gzip encoding
 * `urlparse`, in `webhttp.resource`, for parsing URIs

In addition to these, the following fairly standard libraries were used:
`argparse`, `binascii`, `itertools`, `logging`, `os`, `re`, `socket`, `sys`,
`threading`, `time`, `unittest`.

[rfc2616]: http://tools.ietf.org/html/rfc2616