authorErin van der Veen2017-12-14 16:24:29 +0100
committerErin van der Veen2017-12-14 16:24:29 +0100
commit075bad53a26f1fe2aeda092a0ba56396bf8a2142 (patch)
treee7507c242cef31c3f4e54df51bef538bf1c45e10
parentUpdate Fields.md (diff)
Start second blogpost
-rw-r--r--Implementation.md31
1 files changed, 31 insertions, 0 deletions
diff --git a/Implementation.md b/Implementation.md
new file mode 100644
index 0000000..337b9fe
--- /dev/null
+++ b/Implementation.md
@@ -0,0 +1,31 @@
+# Implementation
+
+## Feasibility
+The Plan mentions the following:
+> We consider a vector space where every possible search field represents a binary parameter.
+> A vector has `1` for the parameter if and only if it is included in the search (excluded from the blacklist).
+> We will then run a hill-climbing algorithm through this higher-dimensional vector space
+> in order to find a vector (an index setting) for which the ranking results are best.
+
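To make the Plan concrete: the search over binary field vectors it describes could be sketched as follows, where `score` stands in for evaluating the ranking quality of an index setting (the neighbourhood and stopping rule are our assumptions, not part of the Plan):

```python
# Hypothetical sketch of the hill-climbing search from the Plan.
# An index setting is a binary tuple: 1 = field included in the search.

import random

def neighbours(setting):
    """All settings that differ from `setting` in exactly one field."""
    for i in range(len(setting)):
        flipped = list(setting)
        flipped[i] = 1 - flipped[i]
        yield tuple(flipped)

def hill_climb(score, n_fields, seed=0):
    """Greedily flip single fields until no neighbour scores higher."""
    random.seed(seed)
    current = tuple(random.randint(0, 1) for _ in range(n_fields))
    while True:
        best = max(neighbours(current), key=score)
        if score(best) <= score(current):
            return current
        current = best
```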
+Soon after we began implementing this feature using a locally run version of nordlys, we encountered some issues.
+The most notable was that our machines were unable to index the full DB-Pedia set in a reasonable amount of time using a reasonable amount of resources.
+We therefore decided that the best option was to use a subset of the DB-Pedia dataset.
+
+The subset that we settled on consists of the documents that have a relevance score assigned to them for at least one query.
+We then only considered the results of a given query in our assessment.
+
+This has the added benefit that the relevance (both the human assessment and the score) is precomputed.
+This means that simply parsing the files provided by nordlys is enough to implement any kind of field-selected assessment.
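As a rough illustration, assuming the human assessments come in TREC-style qrels lines (`query_id 0 doc_id relevance`, a common format for such files; the exact layout of the nordlys files is an assumption here), extracting the judged subset could look like:

```python
# Hypothetical sketch: collect the judged (query, document) pairs,
# assuming TREC-style qrels lines "query_id 0 doc_id relevance".

def parse_qrels(lines):
    """Map each query id to its judged documents and their relevance."""
    qrels = {}
    for line in lines:
        query_id, _, doc_id, relevance = line.split()
        qrels.setdefault(query_id, {})[doc_id] = int(relevance)
    return qrels
```

Only documents appearing in this mapping need to be indexed, which keeps the subset small.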
+
+Unfortunately, it turned out that hill-climbing was also beyond the scope of the assignment.
+With only two programmers, neither of whom has much experience implementing such algorithms, the task was slightly too much work.
+Instead, we decided to take a different approach and statically analyse the importance of all fields.
+The measure that we use takes the following form:
+
+![Field Relevance Measure](http://mathurl.com/yc2ptq63.png "Field Relevance Measure")
+
+where `relevance` is the BM25 relevance that is stored by nordlys, `D` is the set of documents, `Q` the set of queries, `tf` the function that counts the number of times any of the query terms occurs in the given field, and `|f|` the size of the field.
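Reading the definitions above as a relevance-weighted, length-normalised term-frequency sum per field (our interpretation of the image; the function names are placeholders), the measure could be computed as:

```python
# Hypothetical sketch of the field-importance measure described above:
# sum over (query, document) pairs of relevance * tf / |f| for one field.

def field_importance(field, pairs, tf, field_size):
    """Score one field over all (query, document, relevance) pairs.

    pairs      -- iterable of (query, document, bm25_relevance) tuples
    tf         -- tf(query, document, field): occurrences of any query
                  term in `field` of `document`
    field_size -- field_size(document, field): number of terms in the field
    """
    total = 0.0
    for query, document, relevance in pairs:
        size = field_size(document, field)
        if size > 0:
            total += relevance * tf(query, document, field) / size
    return total
```

Ranking the fields by this score then gives a static ordering of their importance, with no search through the vector space required.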
+
+## Implementation
+
+## Intermediate Result