author Erin van der Veen 2017-09-29 12:02:14 +0200
committer Erin van der Veen 2017-09-29 12:02:14 +0200
commit b3e25d406b71f4d6ca12b9bcf5120ccdf8cf4197
tree 0403401d5c166ed617f950d636f9b60aa13421b0 /Plan.md
parent Mention that there are two ways to Index the data
Fix issues in Plan.md
Diffstat (limited to 'Plan.md')
-rw-r--r-- Plan.md 6
1 files changed, 3 insertions, 3 deletions
diff --git a/Plan.md b/Plan.md
index 5a2611f..df17f16 100644
--- a/Plan.md
+++ b/Plan.md
@@ -29,13 +29,13 @@ Of the following files from the 2015-10 dump:
- `transitive_redirects_en.ttl`
There are two indexes that are used for this result.
-<!-- TODO: Are they? --!>
+<!-- TODO: Are they? -->
Both Indexes are likely implemented by the Nordlys package that we will describe below.
-###Index A
+### Index A
- A new field called "catchall" is used; it encompass the content of all other fields. Duplicate values are not removed in this field.
-###Index B
+### Index B
- Anchor texts (i.e. contents of `<dbo:wikiPageWikiLinkText>` predicate) are added to both "similar entity names" and "attributes" fields.
- Entity URIs are resolved differently for the "related entity names" field. Names for related entities are extracted in the same way as it is done for "names" field (see predicates for "names" in the above table), but only one arbitrary name is used for each related entity.
- Category URIs are resolved using `category_labels_en.ttl` file
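
The "catchall" field described under Index A can be illustrated with a minimal sketch. This is not taken from Nordlys or from this repository: the function name `build_catchall` and the sample entity are hypothetical, and only the field names "names" and "attributes" come from the plan text. It assumes each field holds a list of string values and shows that duplicates are kept when the fields are combined.

```python
from typing import Dict, List


def build_catchall(fields: Dict[str, List[str]]) -> List[str]:
    """Concatenate the values of all other fields, keeping duplicates."""
    catchall: List[str] = []
    for name, values in fields.items():
        if name == "catchall":
            # Never fold an existing catchall field back into itself.
            continue
        catchall.extend(values)
    return catchall


# Hypothetical entity; the values are made up for illustration.
entity = {
    "names": ["Nijmegen"],
    "attributes": ["Nijmegen", "city in the Netherlands"],
}
entity["catchall"] = build_catchall(entity)
print(entity["catchall"])
```

Because the values are appended rather than collected into a set, a term that occurs in several fields also occurs several times in "catchall", which matches the note that duplicate values are not removed.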