author | xPaulienx | 2018-01-12 12:22:49 +0100 |
---|---|---|
committer | GitHub | 2018-01-12 12:22:49 +0100 |
commit | 6f61436de587be5b8a8b7f1ed450e2cc4f8f6959 | |
tree | 3d42244b2228a20d0a9a0865aba46bae57d21627 | |
parent | Update Evaluation.md (diff) |
Update Evaluation.md
-rw-r--r-- | Evaluation.md | 9 |
1 files changed, 7 insertions, 2 deletions
diff --git a/Evaluation.md b/Evaluation.md
index ecad4c0..e8d45c8 100644
--- a/Evaluation.md
+++ b/Evaluation.md
@@ -10,15 +10,20 @@ Two other fields we might want to add are `<dc:description>` that has rank 8 of
 Some fields that scored well using our human assessment as a relevance measure turned out to have a low ranking when using BM25. These fields are `<dbp:ground>` (802), `<dbo:foundingYear>` (299), `<dbp:foundation>` (266). We suspect that the reason for this difference is that a lot of the DBpedia entries don't contain these fields, because they're too specific. For example, countries, people and a lot of organisations don't have a founding year in DBpedia. Other fields in our lists that are also too specific are `<dbp:bridgeName>`, `<dbp:producer>` (which we had a hard time even finding in DBpedia).
-Another field with a high score that we don't want to add is `<dbp:caption>`, which is rank 7 for BM25. Since `<dbp:caption>` can be the caption of an image or a table, which are often added in support of information contained in other fields, this field does not provide a lot of new information. For a similar reason we also do not add `<dbp:mapCaption>`, `<dbp:imageCaption>` and `<dbp:pushpinMapCaption>`.
+Another field with a high score that we do not want to add is `<dbp:caption>`, which is rank 7 for BM25. Since `<dbp:caption>` can be the caption of an image or a table, which are often added in support of information contained in other fields, this field does not provide a lot of new information. For a similar reason we also do not add `<dbp:mapCaption>`, `<dbp:imageCaption>` and `<dbp:pushpinMapCaption>`.
+## Conclusion
+Our conclusion is that it would be best to add `<rdfs:comment>` and `<dc:description>`, because we think those are the fields that would influence the results most significantly. Both fields say more about the topic than the fields used by Nordlys (`<foaf:givenName>`, `<dbp:name>`, `<foaf:name>`, `<dbo:wikiPageWikiLinkText>` and `<rdfs:label>`). For example, a user who is looking for the football player Messi but does not know his name could use the query: argentine footballplayer. That user is then more likely to find the person they are looking for, since this is the description of Messi in DBpedia.
 ## Further Research
 ### Hill climbing to validate or improve the results of our statistical analysis of the importance of fields.
 Unfortunately, we could not apply hill climbing in our own research because we did not have enough programmers to carry it out. In further research, it would still be interesting to apply hill climbing to search engine optimization. Then we could use hill climbing to add all the fields to the index in order to test the importance of those fields. It would also be interesting to see how this hill climbing algorithm can be optimized.
-- Adding the fields to the index and comparing the new NDCG scores with baseline runs.
+### Adding the fields to the index and comparing the new NDCG scores with baseline runs.
+In further research, it would also be interesting to actually implement the suggestion we make here. In that case, we would add the fields `<rdfs:comment>` and `<dc:description>` to the index to see whether they really improve the search results, and compare the new NDCG scores with those of the baseline runs.
+
+
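
The NDCG comparison proposed under "Further Research" could look roughly like the sketch below. It is not part of the commit: the function names, the run layout (query id mapped to the graded relevance of each returned result, in rank order) and the example grades are all hypothetical. As a simplification, the ideal ranking is computed from the listed gains only, whereas a full evaluation would take it from all judged entities for the query.

```python
import math
from typing import Dict, List, Sequence


def dcg(gains: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the top-k graded relevance values."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains[:k]))


def ndcg(gains: Sequence[float], k: int) -> float:
    """DCG divided by the DCG of the ideal ordering of the same gains."""
    ideal = dcg(sorted(gains, reverse=True), k)
    return dcg(gains, k) / ideal if ideal > 0 else 0.0


def mean_ndcg(run: Dict[str, List[float]], k: int = 10) -> float:
    """Average NDCG@k over all queries in a run."""
    return sum(ndcg(gains, k) for gains in run.values()) / len(run)


# Made-up relevance grades, for illustration only (not measured results).
baseline_run = {"q1": [0, 2, 1], "q2": [1, 0, 0]}
extended_run = {"q1": [2, 1, 0], "q2": [1, 0, 0]}  # index with the extra fields

if __name__ == "__main__":
    print("baseline NDCG@10:", round(mean_ndcg(baseline_run), 4))
    print("extended NDCG@10:", round(mean_ndcg(extended_run), 4))
```

Running both indexes on the same query set and comparing the two means would show whether adding `<rdfs:comment>` and `<dc:description>` helps; in practice a standard evaluation tool and a significance test would be preferable to this hand-rolled version.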