path: root/Evaluation.md
author Luuk Arts 2018-01-12 12:04:29 +0100
committer GitHub 2018-01-12 12:04:29 +0100
commit b1dfd42c6e0a6dcb285f823da20eca2cf7ae3d3e
tree 4acba61bffe94054ab42f50c77faa4427813af42 /Evaluation.md
parent Update Evaluation.md
Update Evaluation.md
Diffstat (limited to 'Evaluation.md')
-rw-r--r-- Evaluation.md 8
1 files changed, 4 insertions, 4 deletions
diff --git a/Evaluation.md b/Evaluation.md
index 8c04c2d..2370da9 100644
--- a/Evaluation.md
+++ b/Evaluation.md
@@ -2,15 +2,15 @@
## Explanation of results
-The results we obtained are described in the second blog post, on the implementation. In this evaluation we describe the fields that could be added to the index, in order to assess the importance of each field. As the table of scores and ranks shows, not all fields that BM25 finds relevant are included in Nordlys. Only `<foaf:givenName>`, `<dbp:name>`, `<foaf:name>`, `<dbo:wikiPageWikiLinkText>` and `<rdfs:label>` are used in Nordlys, so there may be some fields that we want to add to the index.
+The results we obtained are described in the second blog post, on the implementation. In this evaluation we describe the fields that could be added to the index, in order to assess the importance of each field. As the table of scores and ranks shows, not all fields that BM25 finds relevant are included in Nordlys. Only `<foaf:givenName>`, `<dbp:name>`, `<foaf:name>`, `<dbo:wikiPageWikiLinkText>` and `<rdfs:label>` are used in Nordlys, so there may be some fields that we want to add to the index. We now evaluate the fields that appear in our lists of important fields, as ranked by both the BM25 relevance scores and our human assessments.
-In this case, we might try adding fields that have a high BM25 score. The fields with the top two ranks in both the BM25 and the human assessment rankings are `<dbo:abstract>` and `<rdfs:comment>`. Even though BM25 ranks these fields so highly, we do not recommend adding both, since `<rdfs:comment>` is simply a shorter version of `<dbo:abstract>`. Also, since these fields contain long texts, adding both to the index would likely increase the computing time substantially. Instead, we recommend adding only the `<rdfs:comment>` field.
+The fields with the top two ranks in both the BM25 and the human assessment rankings are `<dbo:abstract>` and `<rdfs:comment>`. Even though BM25 ranks these fields so highly, we do not recommend adding both, since `<rdfs:comment>` is simply a shorter version of `<dbo:abstract>`. Also, since these fields contain long texts, adding both to the index would likely increase the computing time by quite a bit. Instead, we recommend adding only the `<rdfs:comment>` field.
Two other fields we might want to add are `<dc:description>`, which has BM25 rank 8, and `<dbp:shortDescription>`, which has rank 9. Descriptions are likely to be searched for, so we would recommend adding one of these fields to the index. Since the two fields overlap considerably, we recommend adding `<dc:description>` because it is ranked higher: both descriptions are already relatively short, so the difference in computation between the two fields is small and the choice can be based on rank alone.
-Some fields that scored well with our human assessment as the relevance measure turned out to have a low ranking with BM25: `<dbp:ground>` (rank 802), `<dbo:foundingYear>` (rank 299) and `<dbp:foundation>` (rank 266). We suspect that this difference arises because many DBpedia entries don't contain these fields. For example, countries, people and many organisations don't have a founding year in DBpedia.
+Some fields that scored well with our human assessment as the relevance measure turned out to have a low ranking with BM25: `<dbp:ground>` (rank 802), `<dbo:foundingYear>` (rank 299) and `<dbp:foundation>` (rank 266). We suspect that this difference arises because many DBpedia entries don't contain these fields, since they are too specific. For example, countries, people and many organisations don't have a founding year in DBpedia. Other fields in our lists that are also too specific are `<dbp:bridgeName>` and `<dbp:producer>` (which we had a hard time even finding in DBpedia).
-Another high-scoring field that we don't want to add is `<dbp:caption>`, which has BM25 rank 7. `<dbp:caption>` can be the caption of an image or a table, and such images and tables are often added to support information already contained in other fields, so this field does not provide much new information.
+Another high-scoring field that we don't want to add is `<dbp:caption>`, which has BM25 rank 7. `<dbp:caption>` can be the caption of an image or a table, and such images and tables are often added to support information already contained in other fields, so this field does not provide much new information. For a similar reason we also do not add `<dbp:mapCaption>`, `<dbp:imageCaption>` and `<dbp:pushpinMapCaption>`.
## Further Research