author Erin van der Veen 2017-09-29 11:06:31 +0200
committer Erin van der Veen 2017-09-29 11:06:31 +0200
commit 288b5e20b18ca5ee784fbd0bd4adfc49a6db9947
tree ff267ba58d9ba3fc91220226c1a0838841fa21a5 /Plan.md
parent Update PLAN.md
Write Idea section of plan
Diffstat (limited to 'Plan.md')
-rw-r--r-- Plan.md 32
1 files changed, 32 insertions, 0 deletions
diff --git a/Plan.md b/Plan.md
new file mode 100644
index 0000000..78bf7c5
--- /dev/null
+++ b/Plan.md
@@ -0,0 +1,32 @@
+# Plan
+
+## The Idea
+The DBpedia-Entity repository provides base rankings for a selected number of retrieval algorithms over multiple sets of queries.
+These base rankings were obtained by running the algorithms on a version of the dataset that was reduced to a subset of all possible fields.
+In particular, the fields used by the base rankings were:
+
+| Field | Description | Predicates | Notes |
+| --- | --- | --- | --- |
+| Names | Names of the entity | `<foaf:name>`, `<dbp:name>`, `<foaf:givenName>`, `<foaf:surname>`, `<dbp:officialName>`, `<dbp:fullname>`, `<dbp:nativeName>`, `<dbp:birthName>`, `<dbo:birthName>`, `<dbp:nickname>`, `<dbp:showName>`, `<dbp:shipName>`, `<dbp:clubname>`, `<dbp:unitName>`, `<dbp:otherName>`, `<dbo:formerName>`, `<dbp:birthname>`, `<dbp:alternativeNames>`, `<dbp:otherNames>`, `<dbp:names>`, `<rdfs:label>` | |
+| Categories | Entity types | `<dcterms:subject>` | |
+| Similar entity names | Entity name variants | `!<dbo:wikiPageRedirects>`, `!<dbo:wikiPageDisambiguates>`, `<dbo:wikiPageWikiLinkText>` | `!` denotes reverse direction (i.e. `<o, p, s>`) |
+| Attributes | Literal attributes of entity | All `<s, p, o>`, where *"o"* is a literal and *"p"* is not in *Names*, *Categories*, *Similar entity names*, or the blacklisted predicates. For each `<s, p, o>` triple, if `p` matches `<dbp:.*>`, both *p* and *o* are stored (i.e. *"p o"* is indexed). | |
+| Related entity names | URI relations of entity | Similar to the *Attributes* field, but *"o"* should be a URI. | |
+
+These fields were taken from the following files of the 2015-10 dump:
+- `anchor_text_en.ttl`
+- `article_categories_en.ttl`
+- `disambiguations_en.ttl`
+- `infobox_properties_en.ttl`
+- `instance_types_transitive_en.ttl`
+- `labels_en.ttl`
+- `long_abstracts_en.ttl`
+- `mappingbased_literals_en.ttl`
+- `mappingbased_objects_en.ttl`
+- `page_links_en.ttl`
+- `persondata_en.ttl`
+- `short_abstracts_en.ttl`
+- `transitive_redirects_en.ttl`
+
+Our hypothesis is that not all of the fields are equally important.
+As such, our idea is to use a hill-climbing algorithm to determine which combination of fields (or possibly field weights) produces the best ranking.
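
The hill-climbing idea in the last paragraph can be sketched as follows. This is a minimal steepest-ascent variant, not necessarily the one the plan will end up using. The field identifiers, the step size, the `[0, 1]` weight range, and `toy_score` are all assumptions made for illustration; a real `score` function would re-run a retrieval algorithm with the given field weights and return an evaluation metric, which the plan does not specify.

```python
from typing import Callable, Dict, Iterator, Tuple

Weights = Dict[str, float]

# Hypothetical identifiers for the five fields in the table above.
FIELDS = ["names", "categories", "similar_entity_names",
          "attributes", "related_entity_names"]


def neighbours(weights: Weights, step: float) -> Iterator[Weights]:
    """Yield every weighting that differs from `weights` in one field by one step."""
    for field in weights:
        for delta in (-step, step):
            # Clamp to [0, 1]; rounding keeps repeated steps free of float drift.
            value = round(min(1.0, max(0.0, weights[field] + delta)), 6)
            if value != weights[field]:
                yield {**weights, field: value}


def hill_climb(score: Callable[[Weights], float], start: Weights,
               step: float = 0.1, max_iters: int = 1000) -> Tuple[Weights, float]:
    """Steepest-ascent hill climbing: move to the best neighbour until none improves."""
    current, current_score = dict(start), score(start)
    for _ in range(max_iters):
        # Score each neighbour exactly once, since real evaluations are expensive.
        scored = [(score(n), n) for n in neighbours(current, step)]
        if not scored:
            break
        best_score, best = max(scored, key=lambda pair: pair[0])
        if best_score <= current_score:
            break  # local optimum reached
        current, current_score = best, best_score
    return current, current_score


# Stand-in objective for demonstration only: it peaks at an arbitrary target
# weighting and says nothing about which fields actually matter.
TOY_TARGET = {"names": 1.0, "categories": 0.0, "similar_entity_names": 0.5,
              "attributes": 0.5, "related_entity_names": 0.5}


def toy_score(weights: Weights) -> float:
    return -sum((weights[f] - TOY_TARGET[f]) ** 2 for f in FIELDS)
```

Calling `hill_climb(toy_score, {f: 0.5 for f in FIELDS})` walks the weights towards `TOY_TARGET`. Restricting weights to 0 and 1 (with `step=1.0`) would turn the same search into selecting a subset of fields. Like any hill climber, it can stop at a local optimum, so restarts from different starting weights may be needed.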