author | Erin van der Veen | 2017-09-29 11:06:31 +0200 |
---|---|---|
committer | Erin van der Veen | 2017-09-29 11:06:31 +0200 |
commit | 288b5e20b18ca5ee784fbd0bd4adfc49a6db9947 (patch) | |
tree | ff267ba58d9ba3fc91220226c1a0838841fa21a5 /Plan.md | |
parent | Update PLAN.md (diff) |
Write Idea section of plan
Diffstat (limited to 'Plan.md')
-rw-r--r-- | Plan.md | 32 |
1 files changed, 32 insertions, 0 deletions
@@ -0,0 +1,32 @@
+# Plan
+
+## The Idea
+The DBpedia-Entity repository has base rankings for a select number of retrieval algorithms over multiple sets of queries.
+These base rankings were obtained by running the algorithms on the dataset, where the dataset was reduced to contain only a subset of all possible fields.
+In particular, the fields used by the base rankings were:
+
+| Field | Description | Predicates | Notes |
+| --- | --- | --- | --- |
+| Names | Names of the entity | `<foaf:name>`, `<dbp:name>`, `<foaf:givenName>`, `<foaf:surname>`, `<dbp:officialName>`, `<dbp:fullname>`, `<dbp:nativeName>`, `<dbp:birthName>`, `<dbo:birthName>`, `<dbp:nickname>`, `<dbp:showName>`, `<dbp:shipName>`, `<dbp:clubname>`, `<dbp:unitName>`, `<dbp:otherName>`, `<dbo:formerName>`, `<dbp:birthname>`, `<dbp:alternativeNames>`, `<dbp:otherNames>`, `<dbp:names>`, `<rdfs:label>` | |
+| Categories | Entity types | `<dcterms:subject>` | |
+| Similar entity names | Entity name variants | `!<dbo:wikiPageRedirects>`, `!<dbo:wikiPageDisambiguates>`, `<dbo:wikiPageWikiLinkText>` | `!` denotes reverse direction (i.e. `<o, p, s>`) |
+| Attributes | Literal attributes of entity | All `<s, p, o>`, where *"o"* is a literal and *"p"* is not in *Names*, *Categories*, *Similar entity names*, and blacklist predicates. For each `<s, p, o>` triple, if `p matches <dbp:.*>`, both *p* and *o* are stored (i.e. *"p o"* is indexed). | |
+| Related entity names | URI relations of entity | Similar to *Attributes* field, but *"o"* should be a URI. | |
+
+These fields were extracted from the following files of the 2015-10 dump:
+- `anchor_text_en.ttl`
+- `article_categories_en.ttl`
+- `disambiguations_en.ttl`
+- `infobox_properties_en.ttl`
+- `instance_types_transitive_en.ttl`
+- `labels_en.ttl`
+- `long_abstracts_en.ttl`
+- `mappingbased_literals_en.ttl`
+- `mappingbased_objects_en.ttl`
+- `page_links_en.ttl`
+- `persondata_en.ttl`
+- `short_abstracts_en.ttl`
+- `transitive_redirects_en.ttl`
+
+Our hypothesis is that not all fields are equally important.
+As such, our idea is to use some kind of Hill-Climbing algorithm to determine which combination of fields (or field weights) produces the best output.
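
The hill-climbing idea in the plan could be sketched roughly as follows. This is only an illustration under stated assumptions, not the method the repository ends up using: `hill_climb`, `toy_score`, and `TARGET` are hypothetical names, the weights are assumed to lie in [0, 1], and the toy objective merely stands in for running a retrieval model with the given field weights and scoring its ranking against relevance judgments.

```python
from typing import Callable, Dict

# The five fields from the table in the plan.
FIELDS = [
    "names",
    "categories",
    "similar_entity_names",
    "attributes",
    "related_entity_names",
]

Weights = Dict[str, float]


def hill_climb(
    evaluate: Callable[[Weights], float],
    start: Weights,
    step: float = 0.1,
    max_iters: int = 1000,
) -> Weights:
    """Greedy coordinate-wise hill climbing over field weights in [0, 1]."""
    current = dict(start)
    best_score = evaluate(current)
    for _ in range(max_iters):
        improved = False
        for field in FIELDS:
            for delta in (step, -step):
                candidate = dict(current)
                # Clamp to [0, 1]; rounding avoids floating-point drift.
                candidate[field] = round(
                    min(1.0, max(0.0, candidate[field] + delta)), 6
                )
                score = evaluate(candidate)
                if score > best_score:  # accept only strict improvements
                    current, best_score = candidate, score
                    improved = True
        if not improved:  # no neighbour is better: local optimum reached
            break
    return current


# Hypothetical stand-in objective: in a real experiment this would run
# retrieval with the given weights and return an effectiveness measure.
TARGET: Weights = {
    "names": 1.0,
    "categories": 0.2,
    "similar_entity_names": 0.6,
    "attributes": 0.0,
    "related_entity_names": 0.4,
}


def toy_score(weights: Weights) -> float:
    return -sum((weights[f] - TARGET[f]) ** 2 for f in FIELDS)


best = hill_climb(toy_score, {f: 0.5 for f in FIELDS})
print(best)
```

Restricting the weights to 0 and 1 (with `step=1.0`) would turn the same loop into a search over field subsets rather than weights. Like any hill climber, it can stop in a local optimum, so restarts from several starting points may be worth considering.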