author stephandooper 2018-01-12 11:57:24 +0100
committer GitHub 2018-01-12 11:57:24 +0100
commit f66b9b6bd79a12088c65e7254e1b0e2b2c817d82
tree 2b91c7b0eee5d75f9ce4c0227f8c024c9af5d8c9
parent Update Evaluation.md
Update Evaluation.md
-rw-r--r-- Evaluation.md 2
1 files changed, 1 insertions, 1 deletions
diff --git a/Evaluation.md b/Evaluation.md
index 6f0565e..8c04c2d 100644
--- a/Evaluation.md
+++ b/Evaluation.md
@@ -4,7 +4,7 @@
The results we obtained are described in the second blogpost: implementation. In this evaluation we describe the fields that could be added to the index, in order to assess the importance of each field. As the table with the scores and ranks shows, not all fields that BM25 finds relevant are included in Nordlys. Only `<foaf:givenName>`, `<dbp:name>`, `<foaf:name>`, `<dbo:wikiPageWikiLinkText>` and `<rdfs:label>` are used in Nordlys, so there may be further fields worth adding to the index.
-In this case, we might try adding fields that have a high BM25 score. The fields with the top two ranks in both the BM25 and the human assessment rankings are `<dbo:abstract>` and `<rdfs:comment>`. Even though these fields are ranked so highly by BM25, we do not recommend adding them both, since the `<rdfs:comment>` field is simply a shorter version of `<dbo:abstract>`. Also, since these fields contain large texts, adding them both to the index would likely increase the computing time by quite a bit. Instead, we recommend only adding the `<rdfs:comment>` field.
+In this case, we might try adding fields that have a high BM25 score. The fields with the top two ranks in both the BM25 and the human assessment rankings are `<dbo:abstract>` and `<rdfs:comment>`. Even though these fields are ranked so highly by BM25, we do not recommend adding them both, since the `<rdfs:comment>` field is simply a shorter version of `<dbo:abstract>`. Also, since these fields contain large texts, adding them both to the index would likely increase the computing time substantially. Instead, we recommend only adding the `<rdfs:comment>` field.
Two other fields we might want to add are `<dc:description>`, which has rank 8 in the BM25 ranking, and `<dbp:shortDescription>`, which has rank 9. These descriptions are likely to be searched for, so we recommend adding one of them to the index. Since the two fields overlap heavily, we recommend the higher-ranked `<dc:description>`. Both descriptions are already relatively short, so the difference in computation between the two fields will not be very large.