author | Erin van der Veen | 2018-01-12 21:02:07 +0100 |
---|---|---|
committer | GitHub | 2018-01-12 21:02:07 +0100 |
commit | 23e6acc79bf658cf31b99dc0de69816c419abaab (patch) | |
tree | 637ab9b8f49978b9e977039dbdd071dd85fc4603 /Evaluation.md | |
parent | Update Evaluation.md (diff) |
Add reasoning behind advantage of hillclimbing over statistical analysis
Diffstat (limited to 'Evaluation.md')
-rw-r--r-- | Evaluation.md | 8 |
1 files changed, 6 insertions, 2 deletions
diff --git a/Evaluation.md b/Evaluation.md
index ee17694..c5cba19 100644
--- a/Evaluation.md
+++ b/Evaluation.md
@@ -18,9 +18,13 @@ Our conclusion is that it would be best to add `<rdfs:comment>` and `<dc:descrip

 ## Further Research
 ### Hill climbing to validate or improve the results of our statical analysis of the importance of fields.
-Unfortunately, we couldn't apply hill climbing to our own research because we did not have enough programmers in order to carry out this. In further research, it would still be interesting to apply hill climbing for search engine optimalization. Then, we could use hill climbing to add all the fields to the index in order to test the importance of those fields. It would be interesting to see how this hill climbing algorithm can be optimized.
+Unfortunately, we couldn't apply hill climbing to our own research because we did not have enough programmers in order to carry out this. In further research, it would still be interesting to apply hill climbing for search engine optimalization because it takes the value of bm25 into account. Something we did not manage to do for our research as of yet.

-#### TODO: In what respect is Hill Climbing better than statical analysis?
+In particular, using hill climbing takes duplicate data in multiple fields into account. For most wikipedia articles the name of the page also occurs in the abstract of said page. Therefore, adding the name of the page might not actually increase the evaluation of the search algorithm. Our current data does not take such correlations into account.
+
+The inverse might also be true, some fields that we think are not relevant (because they do not often contain a search term) might actually have the search term in such specific cases that it actually increases the overall evaluation of the system.
+
+It would be interesting to see how this hill climbing algorithm can optimize search.

 ### Adding the fields to the index and comparing the new NDCG scores with baseline runs.
 In further research, it would also be interesting to really implement the suggestion we do now. In that case, we would add the fields: `<rdfs:comment>` and `<dc:description>` to see whether it really optimizes the search results. In that case we would have to add those fields to the index and compare them with the NDCG scores with baseline runs.
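
The hill-climbing idea described in the added lines of Evaluation.md could look roughly like the sketch below. It is not code from this repository. The function name `hill_climb_fields`, the toy `COVERAGE` data, and the scoring function `toy_score` are all hypothetical; in the setting the commit describes, `evaluate` would presumably rebuild the index with the given fields and return an NDCG score. The toy score only illustrates the duplicate-data point: `name` answers nothing that `abstract` does not already answer, so adding it brings no gain.

```python
from typing import Callable, FrozenSet, Iterable, Tuple


def hill_climb_fields(
    candidates: Iterable[str],
    evaluate: Callable[[FrozenSet[str]], float],
    start: FrozenSet[str] = frozenset(),
) -> Tuple[FrozenSet[str], float]:
    """Greedy hill climbing over sets of indexed fields.

    Each step toggles (adds or removes) the single field that improves
    the score the most, and stops when no single toggle improves it.
    """
    fields = list(candidates)
    current = frozenset(start)
    current_score = evaluate(current)
    while True:
        best_neighbour, best_score = None, current_score
        for field in fields:
            neighbour = current ^ {field}  # toggle one field
            score = evaluate(neighbour)
            if score > best_score:
                best_neighbour, best_score = neighbour, score
        if best_neighbour is None:
            return current, current_score  # local optimum reached
        current, current_score = best_neighbour, best_score


# Hypothetical toy data: which queries each field can answer.
COVERAGE = {
    "name": {"q1", "q2"},
    "abstract": {"q1", "q2", "q3"},  # already contains what "name" offers
    "comment": {"q4"},               # rarely matches, but uniquely useful
}


def toy_score(fields: FrozenSet[str]) -> float:
    """Assumed stand-in for an NDCG run: queries answered minus a small cost per field."""
    answered = set()
    for field in fields:
        answered |= COVERAGE[field]
    return len(answered) - 0.1 * len(fields)


best_fields, best_score = hill_climb_fields(COVERAGE, toy_score)
```

On this toy data the search keeps `abstract` and `comment` and leaves `name` out, because every candidate set is scored as a whole and overlap between fields therefore lowers the gain of the redundant one. Plain hill climbing can stop in a local optimum, so random restarts might be needed on real data.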