-rw-r--r-- | project1/proj1_s4498062/DOCUMENTATION.md | 135
-rw-r--r-- | project1/proj1_s4498062/Makefile | 4
l---------[-rw-r--r--] | project1/proj1_s4498062/documentation.txt | 6
-rw-r--r-- | project1/proj1_s4498062/webpy.ini.example | 11
4 files changed, 149 insertions, 7 deletions
diff --git a/project1/proj1_s4498062/DOCUMENTATION.md b/project1/proj1_s4498062/DOCUMENTATION.md
new file mode 100644
index 0000000..6c3a96c
--- /dev/null
+++ b/project1/proj1_s4498062/DOCUMENTATION.md
@@ -0,0 +1,135 @@
+# HTTP 1.1 Web Server
+
+This is an implementation overview of the very basic HTTP 1.1 server provided.
+
+## Configuration
+Configuration is done in a configuration file. `webhttp.config` is a wrapper
+for Python's `SafeConfigParser`. The default configuration file is
+`~/.webpy.ini`. An example has been given in `webpy.ini.example`. It should be
+self-explanatory. There are two settings that may require explanation:
+
+ * `index` is similar to Apache's `DirectoryIndex`;
+ * the `error`*`xxx`* settings point to files relative to the root content
+   directory that hold files that should be served in case of errors.
+
+Most settings can be overridden by command line flags and have sensible
+defaults.
+
+It is not necessary to use a configuration file. If not provided, the default
+settings are:
+
+ * Hostname: localhost
+ * Port: 8001
+ * Timeout: 15s
+ * Maximum number of connections: 1000
+
+## Logging
+Logging is done using `logging`. `webhttp.weblogging` is a wrapper for this
+library that inserts the right name of the logger (currently `webhttp`).
+
+## Parsing
+[RFC 2616][rfc2616] gives a context free grammar specification of HTTP requests
+and responses. To parse these elements appropriately, the CFG has been
+translated directly to regular expressions, in `webhttp.regexes`. This leads to
+bulky (and possibly inefficient) regexes, but on the other hand produces
+maintainable code.
+
+## Connection handling
+A `webhttp.server.Server` object opens a listening socket on the specified port
+and hostname. Whenever a connection is requested, a `ConnectionHandler` object
+is created. Extending `threading.Thread`, this class will run in a separate
+thread, allowing simultaneous connections.
+
+The `ConnectionHandler` will read data and feed it, 4096 bytes at a time, to a
+`parser.RequestParser`. This class uses a buffer to hold unfinished requests,
+because a request's size may exceed 4096 bytes. Since the `RequestParser` keeps
+reading after yielding the first request, persistent connections are implicitly
+supported (more below).
+
+For every request that the `RequestParser` yields, a
+`composer.ResponseComposer` is used to compose the appropriate response. This
+response is then sent back to the client in the `ConnectionHandler`.
+
+### Persistent connections
+As mentioned above, the *handling* of requests in persistent connections is
+supported implicitly. However, this does not hold for *closing* persistent
+connections. In accordance with [RFC 2616][rfc2616], persistent connections are
+closed in one of the following situations:
+
+ * When the configured timeout is exceeded while waiting for a new request, the
+   connection will be closed and a debug level log message is produced.
+ * When the composed response has a `Connection: close` header, the connection
+   will be closed directly after sending that response.
+
+When a client requests closure of a persistent connection through sending the
+`Connection: close` header, the response will always include this header as
+well. This situation is therefore covered by the second bullet point.
+
+## Serving GET requests
+As explained above, the `ResponseComposer` is responsible for building
+responses for requests. It does this through three key methods.
+
+ * `compose_response` tackles directory traversal attacks by directly refusing
+   to handle URIs that contain `..`. If this check is passed, a new `Process`
+   is created in which `serve` (see below) is handled. This is necessary
+   because `serve` uses methods that could theoretically time out, for example
+   when handling excessively large files.
+
+ * `serve` implements most of the logic. It creates a `Resource` object for the
+   URI requested, handles ETag-related headers (see below), sets the
+   `Content-Type` header as needed, handles encoding (see below), sets the
+   `Connection` header as needed and, perhaps most importantly, does error
+   handling. If all goes well, it returns a `Response` which is then returned
+   from `compose_response`.
+
+ * `serve_error` is a `serve` wrapper that serves the error page for a
+   particular HTTP status code (see Configuration above).
+
+### Status codes
+
+200 for successful requests, 404 if the requested resource could not be found,
+403 if the user running the server doesn't have permission to read the
+resource, or if a directory without index file has been requested.
+
+## Caching using ETags
+ETags are properties of resources and are therefore generated by the `Resource`
+class, method `generate_etag`. This uses the md5 hash of the result of
+`os.stat()` on the file requested. We don't need a cryptographically secure
+hash for this: only collision resistance is in some way relevant, but clients
+that are worried about collisions can always simply not send conditional
+requests.
+
+`Resource` also has an `etag_match` method that checks if a given ETag list
+matches the ETag of the resource.
+
+### Status codes
+304 if the ETag matched and the cache is used.
+
+## Encoding
+Encoding is done through the `Resource` class. An optional argument `encoding`
+has been given to the `get_content` method (default: `identity`). Internally,
+another module, `encodings`, is used. This module has functions to convert the
+internal representation of an encoding to a string and vice versa, and to
+encode and decode strings using some encoding. Currently, only `gzip` and
+`identity` are supported.
+
+### Status codes
+406 if only unknown encodings have been requested.
+
+## Acknowledgements
+The following Python modules have been used, in alphabetical order:
+
+ * `configparser`, in `webhttp.config`, for parsing the ini configuration file
+ * `gzip`, in `webhttp.encodings`, for gzip encoding
+ * `hashlib`, in `webhttp.resource`, for ETag generation
+ * `mimetypes`, in `webhttp.resource`, for guessing content type and encoding
+ * `multiprocessing`, in `webhttp.composer`, for timing out the `serve` call
+ * `StringIO`, in `webhttp.encodings`, for gzip encoding
+ * `urlparse`, in `webhttp.resource`, for parsing URIs
+
+In addition to these, the following fairly standard libraries were used:
+`argparse`, `binascii`, `itertools`, `logging`, `os`, `re`, `socket`, `sys`,
+`threading`, `time`, `unittest`.
+
+[rfc2616]: http://tools.ietf.org/html/rfc2616
+
diff --git a/project1/proj1_s4498062/Makefile b/project1/proj1_s4498062/Makefile
index 39c4da7..a6c4261 100644
--- a/project1/proj1_s4498062/Makefile
+++ b/project1/proj1_s4498062/Makefile
@@ -18,7 +18,7 @@ test: webtests.py webhttp/*.py
 	python $<
 clean:
-	rm -vf webserver webserver.c
+	rm -rvf webserver webserver.c **/*.pyc **/__pycache__
-.PHONY: clean all run
+.PHONY: clean all run test
diff --git a/project1/proj1_s4498062/documentation.txt b/project1/proj1_s4498062/documentation.txt
index 630fc88..fc84f4f 100644..120000
--- a/project1/proj1_s4498062/documentation.txt
+++ b/project1/proj1_s4498062/documentation.txt
@@ -1,5 +1 @@
-Don't forget to document your implementation:
--> language + external libraries used (if any)
--> control flow with headers/status codes considered for each requirement (GET, persistent connections, ETag, encoding)
--> concurrency, hashing, resource encoding
--> challenges (if any)
\ No newline at end of file
+DOCUMENTATION.md
\ No newline at end of file
diff --git a/project1/proj1_s4498062/webpy.ini.example b/project1/proj1_s4498062/webpy.ini.example
new file mode 100644
index 0000000..52ea962
--- /dev/null
+++ b/project1/proj1_s4498062/webpy.ini.example
@@ -0,0 +1,11 @@
+[webhttp]
+hostname=
+port=8001
+timeout=15
+max_connections=1000
+
+root=content
+index=index.html
+
+error404=/error/404.html
+error403=/error/403.html
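
For reviewers who want to see how the example `webpy.ini.example` combines with the defaults listed in DOCUMENTATION.md, here is a minimal sketch. It is not the code in `webhttp.config`: it assumes Python 3's `configparser.ConfigParser` (the commit itself refers to `SafeConfigParser`), and `load_settings`, `DEFAULTS` and `EXAMPLE_INI` are hypothetical names introduced only for this illustration.

```python
import configparser

# Defaults as listed in DOCUMENTATION.md; they apply when a key or the whole
# configuration file is absent. The dict name is an assumption for this sketch.
DEFAULTS = {
    "hostname": "localhost",
    "port": "8001",
    "timeout": "15",
    "max_connections": "1000",
}

# Contents of webpy.ini.example as added in this commit.
EXAMPLE_INI = """\
[webhttp]
hostname=
port=8001
timeout=15
max_connections=1000

root=content
index=index.html

error404=/error/404.html
error403=/error/403.html
"""


def load_settings(text=None):
    """Hypothetical helper: parse ini text and return the [webhttp] settings."""
    parser = configparser.ConfigParser(defaults=DEFAULTS)
    if text is not None:
        parser.read_string(text)
    if not parser.has_section("webhttp"):
        # No file (or no section) given: fall back to the defaults only.
        parser.add_section("webhttp")
    section = parser["webhttp"]
    return {
        "hostname": section.get("hostname"),
        "port": section.getint("port"),
        "timeout": section.getint("timeout"),
        "max_connections": section.getint("max_connections"),
        "root": section.get("root", fallback=None),
        "index": section.get("index", fallback=None),
        # Collect the error<xxx> settings into a {status code: path} mapping.
        "error_pages": {
            key[len("error"):]: value
            for key, value in section.items()
            if key.startswith("error")
        },
    }
```

Calling `load_settings(EXAMPLE_INI)` yields the values from the file, with the empty `hostname=` kept as an empty string; calling `load_settings()` with no text yields only the documented defaults. How the real server interprets an empty hostname is not stated in the commit, so the sketch leaves that value untouched.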