Diffstat (limited to 'project1/proj1_s4498062')
-rw-r--r--  project1/proj1_s4498062/DOCUMENTATION.md  135
-rw-r--r--  project1/proj1_s4498062/Makefile  4
l--------- [-rw-r--r--]  project1/proj1_s4498062/documentation.txt  6
-rw-r--r--  project1/proj1_s4498062/webpy.ini.example  11
4 files changed, 149 insertions, 7 deletions
diff --git a/project1/proj1_s4498062/DOCUMENTATION.md b/project1/proj1_s4498062/DOCUMENTATION.md
new file mode 100644
index 0000000..6c3a96c
--- /dev/null
+++ b/project1/proj1_s4498062/DOCUMENTATION.md
@@ -0,0 +1,135 @@
+# HTTP 1.1 Web Server
+
+This is an implementation overview of the very basic HTTP 1.1 server provided.
+
+## Configuration
+Configuration is done in a configuration file. `webhttp.config` is a wrapper
+for Python's `SafeConfigParser`. The default configuration file is
+`~/.webpy.ini`. An example has been given in `webpy.ini.example`. It should be
+self-explanatory. There are two settings that may require explanation:
+
+ * `index` is similar to Apache's `DirectoryIndex`;
+ * the `error`*`xxx`* settings are paths, relative to the root content
+ directory, of the files that should be served in case of errors.
+
+Most settings can be overridden by command line flags and have sensible
+defaults.
+
+It is not necessary to use a configuration file. If not provided, the default
+settings are:
+
+ * Hostname: localhost
+ * Port: 8001
+ * Timeout: 15s
+ * Maximum number of connections: 1000
+
+## Logging
+Logging is done using `logging`. `webhttp.weblogging` is a wrapper for this
+library that inserts the right name of the logger (currently `webhttp`).
+
+## Parsing
+[RFC 2616][rfc2616] gives a context free grammar specification of HTTP requests
+and responses. To parse these elements appropriately, the CFG has been
+translated directly to regular expressions, in `webhttp.regexes`. This leads to
+bulky (and possibly inefficient) regexes, but on the other hand produces
+maintainable code.
+
+## Connection handling
+A `webhttp.server.Server` object opens a listening socket on the specified port
+and hostname. Whenever a connection is requested, a `ConnectionHandler` object
+is created. Extending `threading.Thread`, this class will run in a separate
+thread, allowing simultaneous connections.
+
+The `ConnectionHandler` will read data and feed it, 4096 bytes at a time, to a
+`parser.RequestParser`. This class uses a buffer to hold unfinished requests,
+because a request's size may exceed 4096 bytes. Since the `RequestParser` keeps
+reading after yielding the first request, persistent connections are implicitly
+supported (more below).
+
+For every request that the `RequestParser` yields, a
+`composer.ResponseComposer` is used to compose the appropriate response. This
+response is then sent back to the client in the `ConnectionHandler`.
+
+### Persistent connections
+As mentioned above, the *handling* of requests in persistent connections is
+supported implicitly. However, this does not hold for *closing* persistent
+connections. In accordance with [RFC 2616][rfc2616], persistent connections are
+closed in one of the following situations:
+
+ * When the configured timeout is exceeded while waiting for a new request, the
+ connection will be closed and a debug level log message is produced.
+ * When the composed response has a `Connection: close` header, the connection
+ will be closed directly after sending that response.
+
+When a client requests closure of a persistent connection through sending the
+`Connection: close` header, the response will always include this header as
+well. This situation is therefore covered by the second bullet point.
+
+## Serving GET requests
+As explained above, the `ResponseComposer` is responsible for building
+responses for requests. It does this through three key methods.
+
+ * `compose_response` tackles directory traversal attacks by directly refusing
+ to handle URIs that contain `..`. If this check is passed, a new `Process`
+ is created in which `serve` (see below) is run. This is necessary
+ because `serve` uses methods that could theoretically time out, for example
+ when handling excessively large files.
+
+ * `serve` implements most of the logic. It creates a `Resource` object for the
+ URI requested, handles ETag-related headers (see below), sets the
+ `Content-Type` header as needed, handles encoding (see below), sets the
+ `Connection` header as needed and, perhaps most importantly, does error
+ handling. If all goes well, it returns a `Response` which is then returned
+ from `compose_response`.
+
+ * `serve_error` is a `serve` wrapper that serves the error page for a
+ particular HTTP status code (see Configuration above).
+
+### Status codes
+
+200 for successful requests, 404 if the requested resource could not be found,
+403 if the user running the server doesn't have permission to read the
+resource or if a directory without an index file has been requested.
+
+## Caching using ETags
+ETags are properties of resources and are therefore generated by the `Resource`
+class, method `generate_etag`. This uses the MD5 hash of the result of
+`os.stat()` on the file requested. We don't need a cryptographically secure
+hash for this: only collision resistance is in some way relevant, but clients
+that are worried about collisions can always simply not send conditional
+requests.
+
+`Resource` also has an `etag_match` method that checks if a given ETag list
+matches the ETag of the resource.
+
+### Status codes
+304 if the ETag matched and the cache is used.
+
+## Encoding
+Encoding is done through the `Resource` class. An optional argument `encoding`
+has been given to the `get_content` method (default: `identity`). Internally,
+another module, `encodings`, is used. This module has functions to convert the
+internal representation of an encoding to a string and vice versa, and to
+encode and decode strings using some encoding. Currently, only `gzip` and
+`identity` are supported.
+
+### Status codes
+406 if only unknown encodings have been requested.
+
+## Acknowledgements
+The following Python modules have been used, in alphabetical order:
+
+ * `configparser`, in `webhttp.config`, for parsing the ini configuration file
+ * `gzip`, in `webhttp.encodings`, for gzip encoding
+ * `hashlib`, in `webhttp.resource`, for ETag generation
+ * `mimetypes`, in `webhttp.resource`, for guessing content type and encoding
+ * `multiprocessing`, in `webhttp.composer`, for timing out the `serve` call
+ * `StringIO`, in `webhttp.encodings`, for gzip encoding
+ * `urlparse`, in `webhttp.resource`, for parsing URIs
+
+In addition to these, the following fairly standard libraries were used:
+`argparse`, `binascii`, `itertools`, `logging`, `os`, `re`, `socket`, `sys`,
+`threading`, `time`, `unittest`.
+
+[rfc2616]: http://tools.ietf.org/html/rfc2616
+
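Reviewer sketch for the "Caching using ETags" section of DOCUMENTATION.md above. It is a minimal illustration, not the project's code: the names `generate_etag` and `etag_match` come from the documentation, but their signatures here are assumptions, and so is the choice to hash only the inode, size and modification time rather than the full `os.stat()` result (which also carries the access time). It is written for Python 3, while the project itself appears to target Python 2.

```python
import hashlib
import os


def generate_etag(path):
    """Return a quoted ETag derived from selected os.stat() fields."""
    st = os.stat(path)
    # Assumption: only fields that change when the content changes are hashed.
    raw = "{}-{}-{}".format(st.st_ino, st.st_size, st.st_mtime_ns)
    return '"{}"'.format(hashlib.md5(raw.encode("ascii")).hexdigest())


def etag_match(path, header_value):
    """Check a comma-separated ETag list (e.g. If-None-Match) against a file."""
    candidates = []
    for tag in header_value.split(","):
        tag = tag.strip()
        # Weak validators ("W/...") are compared on their opaque part only.
        if tag.startswith("W/"):
            tag = tag[2:]
        candidates.append(tag)
    return "*" in candidates or generate_etag(path) in candidates
```

With helpers like these, a request whose `If-None-Match` list matches would be answered with 304 and no body, as described under "Status codes" in that section.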
diff --git a/project1/proj1_s4498062/Makefile b/project1/proj1_s4498062/Makefile
index 39c4da7..a6c4261 100644
--- a/project1/proj1_s4498062/Makefile
+++ b/project1/proj1_s4498062/Makefile
@@ -18,7 +18,7 @@ test: webtests.py webhttp/*.py
python $<
clean:
- rm -vf webserver webserver.c
+ rm -rvf webserver webserver.c **/*.pyc **/__pycache__
-.PHONY: clean all run
+.PHONY: clean all run test
diff --git a/project1/proj1_s4498062/documentation.txt b/project1/proj1_s4498062/documentation.txt
index 630fc88..fc84f4f 100644..120000
--- a/project1/proj1_s4498062/documentation.txt
+++ b/project1/proj1_s4498062/documentation.txt
@@ -1,5 +1 @@
-Don't forget to document your implementation:
--> language + external libraries used (if any)
--> control flow with headers/status codes considered for each requirement (GET, persistent connections, ETag, encoding)
--> concurrency, hashing, resource encoding
--> challenges (if any)
\ No newline at end of file
+DOCUMENTATION.md
\ No newline at end of file
diff --git a/project1/proj1_s4498062/webpy.ini.example b/project1/proj1_s4498062/webpy.ini.example
new file mode 100644
index 0000000..52ea962
--- /dev/null
+++ b/project1/proj1_s4498062/webpy.ini.example
@@ -0,0 +1,11 @@
+[webhttp]
+hostname=
+port=8001
+timeout=15
+max_connections=1000
+
+root=content
+index=index.html
+
+error404=/error/404.html
+error403=/error/403.html
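Reviewer sketch of how a file like `webpy.ini.example` could be read. This is not the project's `webhttp.config` wrapper: `load_settings` and the shape of the returned dictionary are hypothetical, and the code uses Python 3's `configparser`. The fallback values are the defaults listed in DOCUMENTATION.md (localhost, 8001, 15 s, 1000 connections).

```python
import configparser

# Defaults as listed in DOCUMENTATION.md; configparser stores them as strings.
DEFAULTS = {
    "hostname": "localhost",
    "port": "8001",
    "timeout": "15",
    "max_connections": "1000",
}

EXAMPLE = """\
[webhttp]
hostname=
port=8001
timeout=15
max_connections=1000

root=content
index=index.html

error404=/error/404.html
error403=/error/403.html
"""


def load_settings(text):
    """Parse ini text and return the [webhttp] settings as a dictionary."""
    parser = configparser.ConfigParser(defaults=DEFAULTS)
    parser.read_string(text)
    if not parser.has_section("webhttp"):
        # No configuration given: fall back to the defaults only.
        parser.add_section("webhttp")
    section = parser["webhttp"]
    return {
        # An empty value (as in the example file) is kept as-is.
        "hostname": section.get("hostname"),
        "port": section.getint("port"),
        "timeout": section.getint("timeout"),
        "max_connections": section.getint("max_connections"),
        "root": section.get("root"),
        "index": section.get("index"),
        # Collect the errorXXX settings, keyed by status code.
        "errors": {
            key[len("error"):]: value
            for key, value in section.items()
            if key.startswith("error")
        },
    }
```

Calling `load_settings(EXAMPLE)` yields integer values for `port`, `timeout` and `max_connections`, and maps the `error404` and `error403` entries to their status codes; calling it with an empty string returns only the documented defaults.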