# Plan

## The Idea

The DBpedia-Entity repository provides base rankings for a selection of retrieval algorithms over multiple sets of queries. These base rankings were obtained by running the algorithms on the dataset, where the dataset was reduced to contain only a subset of all possible fields. In particular, the fields used by the base rankings were:

| Field | Description | Predicates | Notes |
| --- | --- | --- | --- |
| Names | Names of the entity | ``, ``, ``, ``, ``, ``, ``, ``, ``, ``, ``, ``, ``, ``, ``, ``, ``, ``, ``, ``, `` | |
| Categories | Entity types | `` | |
| Similar entity names | Entity name variants | `!`, `!`, `` | `!` denotes reverse direction (i.e. ``) |
| Attributes | Literal attributes of the entity | All ``, where *"o"* is a literal and *"p"* is not in *Names*, *Categories*, *Similar entity names*, or the blacklist predicates. For each `` triple, if `p matches ` both *p* and *o* are stored (i.e. *"p o"* is indexed). | |
| Related entity names | URI relations of the entity | Similar to the *Attributes* field, but *"o"* should be a URI. | |

These fields are built from the following files of the 2015-10 dump:

- `anchor_text_en.ttl`
- `article_categories_en.ttl`
- `disambiguations_en.ttl`
- `infobox_properties_en.ttl`
- `instance_types_transitive_en.ttl`
- `labels_en.ttl`
- `long_abstracts_en.ttl`
- `mappingbased_literals_en.ttl`
- `mappingbased_objects_en.ttl`
- `page_links_en.ttl`
- `persondata_en.ttl`
- `short_abstracts_en.ttl`
- `transitive_redirects_en.ttl`

Our hypothesis is that the fields are not all equally important. Our idea is therefore to use some form of hill-climbing algorithm to determine which combination of fields (or possibly which field weights) produces the best rankings.
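
As a rough illustration of the hill-climbing idea, the sketch below searches over per-field weights in the range 0 to 1. It is only a minimal sketch under several assumptions: the field identifiers in `FIELDS`, the step size, and the function names are hypothetical, and `toy_evaluate` is a stand-in for the real evaluation step (running a retrieval algorithm with the given weights and scoring the resulting ranking against the relevance judgments), which is not implemented here.

```python
import random
from typing import Callable, Dict, List, Tuple

Weights = Dict[str, float]

# Hypothetical identifiers for the five fields listed in the table above.
FIELDS: List[str] = [
    "names",
    "categories",
    "similar_entity_names",
    "attributes",
    "related_entity_names",
]


def hill_climb(
    evaluate: Callable[[Weights], float],
    fields: List[str],
    step: float = 0.1,
    max_iters: int = 1000,
    seed: int = 0,
) -> Tuple[Weights, float]:
    """First-improvement hill climbing over field weights in [0, 1]."""
    rng = random.Random(seed)
    current: Weights = {f: 1.0 for f in fields}  # start with every field fully on
    best = evaluate(current)

    for _ in range(max_iters):
        # Neighbours: change exactly one field's weight by +step or -step.
        moves = [(f, d) for f in fields for d in (step, -step)]
        rng.shuffle(moves)
        improved = False
        for field, delta in moves:
            # Clamp to [0, 1] and round to avoid floating-point drift.
            new_w = round(min(1.0, max(0.0, current[field] + delta)), 6)
            if new_w == current[field]:
                continue
            neighbour = dict(current)
            neighbour[field] = new_w
            score = evaluate(neighbour)
            if score > best:
                current, best, improved = neighbour, score, True
                break  # accept the first improving move
        if not improved:
            break  # local optimum: no single-step change helps
    return current, best


# Stand-in objective (an assumption, NOT the real evaluation): it peaks at
# TOY_TARGET so the search has something to find. In practice this would run
# retrieval with the given weights and return an effectiveness score.
TOY_TARGET: Weights = {
    "names": 1.0,
    "categories": 0.5,
    "similar_entity_names": 0.2,
    "attributes": 0.8,
    "related_entity_names": 0.0,
}


def toy_evaluate(weights: Weights) -> float:
    return -sum((weights[f] - TOY_TARGET[f]) ** 2 for f in weights)


if __name__ == "__main__":
    best_weights, best_score = hill_climb(toy_evaluate, FIELDS)
    print(best_weights, best_score)
```

Restricting the weights to 0 or 1 (for example with `step=1.0`) turns the same loop into a search over field combinations rather than weights. Because hill climbing only finds a local optimum, restarting from several random initial weightings would be a natural extension.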