Dataset: yago-annotated-facts (development version)
This is a subset of the YAGO 4 knowledge base (paper), which is based on Wikidata, in the version from February 24, 2020. The dataset includes only the fact annotations expressed in RDF-star, that is, facts about facts. Each stream element corresponds to one item in Wikidata.
Info
Download this metadata in RDF: Turtle, N-Triples, RDF/XML, Jelly
Source repository: dataset-yago-annotated-facts
Permanent URL: https://w3id.org/riverbench/datasets/yago-annotated-facts/dev
Stream preview
<< <http://yago-knowledge.org/resource/_Q56236170> <http://schema.org/dissolutionDate> "2009"^^<http://www.w3.org/2001/XMLSchema#gYear> >>
<http://schema.org/endDate> "2009-12"^^<http://www.w3.org/2001/XMLSchema#gYearMonth>;
<http://schema.org/startDate> "2009-06"^^<http://www.w3.org/2001/XMLSchema#gYearMonth> .
<< <http://yago-knowledge.org/resource/Open_Science_Radio_Q18744554> <http://schema.org/creator> <http://yago-knowledge.org/resource/Matthias_Fromm_Q18748012> >>
<http://schema.org/startDate> "2013-01-02"^^<http://www.w3.org/2001/XMLSchema#date> .
<< <http://yago-knowledge.org/resource/Open_Science_Radio_Q18744554> <http://schema.org/creator> <http://yago-knowledge.org/resource/Konrad_Förstner_Q18744528> >>
<http://schema.org/startDate> "2014-01-19"^^<http://www.w3.org/2001/XMLSchema#date> .
<< <http://yago-knowledge.org/resource/Carnaval_na_avenida_Central,_atual_avenida_Rio_Branco_Q65621070> <http://schema.org/dateCreated> "1906-06-22"^^<http://www.w3.org/2001/XMLSchema#date> >>
<http://schema.org/endDate> "1906"^^<http://www.w3.org/2001/XMLSchema#gYear>;
<http://schema.org/startDate> "1906"^^<http://www.w3.org/2001/XMLSchema#gYear> .
<< <http://yago-knowledge.org/resource/Margherita_Cagol> <http://schema.org/nationality> <http://yago-knowledge.org/resource/Kingdom_of_Italy> >>
<http://schema.org/endDate> "1946-06-18"^^<http://www.w3.org/2001/XMLSchema#date>;
<http://schema.org/startDate> "1945-04-08"^^<http://www.w3.org/2001/XMLSchema#date> .
<< <http://yago-knowledge.org/resource/Margherita_Cagol> <http://schema.org/nationality> <http://yago-knowledge.org/resource/Italy> >>
<http://schema.org/endDate> "1975-06-05"^^<http://www.w3.org/2001/XMLSchema#date>;
<http://schema.org/startDate> "1946-06-18"^^<http://www.w3.org/2001/XMLSchema#date> .
<< <http://yago-knowledge.org/resource/Mihrengiz_Kadın> <http://schema.org/nationality> <http://yago-knowledge.org/resource/Ottoman_Empire> >>
<http://schema.org/endDate> "1923"^^<http://www.w3.org/2001/XMLSchema#gYear>;
<http://schema.org/startDate> "1869"^^<http://www.w3.org/2001/XMLSchema#gYear> .
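Each element in the preview above starts with a quoted triple (`<< subject predicate object >>`) that is then annotated with properties such as `schema:startDate`. As a rough illustration, the quoted triple can be pulled apart with a regular expression. This is only a sketch: it assumes the subject and predicate are IRIs, as in the preview, and it is not a substitute for a real Turtle-star parser. The function name `split_quoted_triple` is hypothetical.

```python
import re

# Matches "<< <subject> <predicate> object >>" at the start of an element.
# Assumption: subject and predicate are IRIs, as in the preview.
QUOTED_TRIPLE = re.compile(r"<<\s*<([^>]+)>\s+<([^>]+)>\s+(.+?)\s*>>")


def split_quoted_triple(line: str) -> tuple[str, str, str]:
    """Return (subject IRI, predicate IRI, raw object term) of the quoted triple."""
    match = QUOTED_TRIPLE.match(line.strip())
    if match is None:
        raise ValueError("line does not start with a quoted triple")
    return match.group(1), match.group(2), match.group(3)


# First line of the first element in the preview
example = (
    '<< <http://yago-knowledge.org/resource/_Q56236170> '
    '<http://schema.org/dissolutionDate> '
    '"2009"^^<http://www.w3.org/2001/XMLSchema#gYear> >>'
)
subject, predicate, obj = split_quoted_triple(example)
print(subject)
print(predicate)
print(obj)
```

The object is returned as a raw term (IRI in angle brackets or literal with its datatype), so further decoding is left to the caller.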
General information
- Title: YAGO annotated facts (en)
- Identifier:
yago-annotated-facts
- Has version:
dev
- Theme:
- Encyclopaedia (eurovoc:4137)
- Metadata (eurovoc:c_40f54e0c)
- Open data (eurovoc:c_5ea6e5c4)
- Creator:
- The creators and contributors of Wikidata (1)
- Name: The creators and contributors of Wikidata
- Homepage: https://www.wikidata.org/
- The YAGO team of Télécom Paris and the Max Planck Institute for Informatics (2)
- Name: The YAGO team of Télécom Paris and the Max Planck Institute for Informatics
- Homepage: https://yago-knowledge.org/contributors
- Piotr Sowiński (3)
- Name: Piotr Sowiński
- Nickname: Ostrzyciel
- Homepage:
- License: https://spdx.org/licenses/CC-BY-SA-3.0
- Source:
- Date Issued: 2023-04-30
- Date Modified: 2024-08-29
- Landing page: yago-annotated-facts (dev)
- Conforms To: Metadata (https://w3id.org/riverbench/schema/metadata)
Technical metadata
- Has stream type usage:
- RDF stream type usage (1)
- Type: RDF stream type usage (stax:RdfStreamTypeUsage)
- Comment: The dataset can be viewed as a flattened stream of triples. (en)
- Has stream type: Flat RDF triple stream (stax:flatTripleStream)
- RDF stream type usage (2)
- Type: RDF stream type usage (stax:RdfStreamTypeUsage)
- Comment: The dataset can be viewed as a stream of graphs. Each graph corresponds to the RDF-star annotations of one Wikidata item. (en)
- Has stream type: RDF subject graph stream (stax:subjectGraphStream)
- Has stream element count: 617,768
- Has stream element split:
- Type: Stream elements split by topic (rb:TopicStreamElementSplit)
- Comment: Every stream element corresponds to one Wikidata item. (en)
- Has subject shape:
- Comment: Custom target – subject of any quoted triple in the subject position. (en)
- Target custom: YAGO annotated facts target (rb:yagoTarget)
- Uses vocabulary: http://schema.org/
- Conforms to W3C RDF 1.1 specification: no
- Conforms to W3C RDF-star draft specification as of December 17, 2021: yes
- Uses generalized triples: no
- Uses generalized RDF datasets: no
- Uses RDF-star: yes
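The two stream type usages listed above describe the same data from two angles: as a stream of small graphs (one per Wikidata item) or as a single flat stream of triples. The sketch below illustrates the relationship with plain Python tuples. The data structure is a simplification chosen for this example, with terms abbreviated, and is not how any RiverBench tool represents streams.

```python
from itertools import chain

Triple = tuple[str, str, str]

# Subject graph stream view: one list of annotation triples per stream element.
# Values are abbreviated from the stream preview; the layout is illustrative only.
subject_graph_stream: list[list[Triple]] = [
    [
        ("<< :Open_Science_Radio schema:creator :Matthias_Fromm >>",
         "schema:startDate", '"2013-01-02"'),
    ],
    [
        ("<< :Margherita_Cagol schema:nationality :Italy >>",
         "schema:endDate", '"1975-06-05"'),
        ("<< :Margherita_Cagol schema:nationality :Italy >>",
         "schema:startDate", '"1946-06-18"'),
    ],
]

# Flat triple stream view: the same triples with element boundaries dropped.
flat_triple_stream: list[Triple] = list(chain.from_iterable(subject_graph_stream))

print(len(subject_graph_stream), "stream elements")
print(len(flat_triple_stream), "triples")
```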
Distributions
Download links
The dataset is published in a few size variants, each containing a specific number of stream elements. For each size, there are three distribution types available: flat (just an N-Triples/N-Quads file), streaming (a .tar.gz archive with Turtle/TriG files, one file per stream element), and Jelly (a native binary format for streaming RDF). See the documentation for more details.
Distribution size | Statements | Flat | Streaming | Jelly |
---|---|---|---|---|
10K | 22,977 | 256.7 KB | 376.5 KB | 260.8 KB |
100K | 226,648 | 2.4 MB | 3.6 MB | 2.7 MB |
Full | 2,484,547 | 28.7 MB | 36.2 MB | 28.9 MB |
The full metadata of all distributions can be found below.
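Since a streaming distribution is a .tar.gz archive with one file per stream element, it can be read sequentially with Python's standard `tarfile` module. The following is a minimal sketch, not an official loader: the helper names and the member file names in the demo archive are made up, and the real archive layout should be checked in the documentation.

```python
import io
import tarfile
from typing import BinaryIO, Iterator


def iter_stream_elements(archive: BinaryIO) -> Iterator[tuple[str, str]]:
    """Yield (member name, text content) for each regular file in a .tar.gz archive."""
    # "r|gz" reads the archive as a forward-only stream, without seeking back.
    with tarfile.open(fileobj=archive, mode="r|gz") as tar:
        for member in tar:
            if not member.isfile():
                continue
            extracted = tar.extractfile(member)
            if extracted is None:
                continue
            yield member.name, extracted.read().decode("utf-8")


def build_demo_archive() -> io.BytesIO:
    """Build a tiny in-memory archive; file names and contents are hypothetical."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
        for name, text in [("0000.ttl", "# element 0\n"), ("0001.ttl", "# element 1\n")]:
            data = text.encode("utf-8")
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    buffer.seek(0)
    return buffer


elements = list(iter_stream_elements(build_demo_archive()))
print(len(elements))
```

For a downloaded file, the same function could be called on `open("stream_10K.tar.gz", "rb")`.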
Full stream distribution
- Title: Full stream distribution
- Identifier:
stream-full
- Has file name:
stream_full.tar.gz
- Has stream type usage:
- Type: RDF stream type usage (stax:RdfStreamTypeUsage)
- Comment: The dataset can be viewed as a stream of graphs. Each graph corresponds to the RDF-star annotations of one Wikidata item. (en)
- Has stream type: RDF subject graph stream (stax:subjectGraphStream)
- Has distribution type:
- Full distribution (rb:fullDistribution)
- Stream distribution (rb:streamDistribution)
- Has stream element count: 617,768
- Byte size: 36.2 MB
- Media type: text/turtle
- Packaging format: application/tar
- Compression format: application/gzip
- Checksum:
- Checksum (1)
- Type: Checksum (spdx:Checksum)
- ChecksumValue:
7ee64c45c2834a0e5e5ab1d85c7f8a5b
- Algorithm: ChecksumAlgorithm_md5 (spdx:checksumAlgorithm_md5)
- Checksum (2)
- Type: Checksum (spdx:Checksum)
- ChecksumValue:
95a8a97909c001174a7e394e344b956d1143535a
- Algorithm: ChecksumAlgorithm_sha1 (spdx:checksumAlgorithm_sha1)
- Download URL: https://w3id.org/riverbench/datasets/yago-annotated-facts/dev/files/stream_full.tar.gz
- Statistics: statistics-full
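Each distribution lists an MD5 and a SHA-1 checksum, which can be used to verify a download. A possible way to do this with Python's `hashlib` is sketched below; the helper name `file_digest` is our own, and the demo runs on a small temporary file rather than on a real distribution.

```python
import hashlib
import os
import tempfile


def file_digest(path: str, algorithm: str) -> str:
    """Return the hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Demo on a temporary file so the sketch runs without any download.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
    demo_path = tmp.name

demo_md5 = file_digest(demo_path, "md5")
demo_sha1 = file_digest(demo_path, "sha1")
os.remove(demo_path)

print(demo_md5)
print(demo_sha1)
```

For the full stream distribution, one would compare `file_digest("stream_full.tar.gz", "md5")` against `7ee64c45c2834a0e5e5ab1d85c7f8a5b` and the `"sha1"` digest against `95a8a97909c001174a7e394e344b956d1143535a`.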
Full Jelly distribution
- Title: Full Jelly distribution
- Identifier:
jelly-full
- Has file name:
jelly_full.jelly.gz
- Has stream type usage:
- RDF stream type usage (1)
- Type: RDF stream type usage (stax:RdfStreamTypeUsage)
- Comment: The dataset can be viewed as a stream of graphs. Each graph corresponds to the RDF-star annotations of one Wikidata item. (en)
- Has stream type: RDF subject graph stream (stax:subjectGraphStream)
- RDF stream type usage (2)
- Type: RDF stream type usage (stax:RdfStreamTypeUsage)
- Comment: The dataset can be viewed as a flattened stream of triples. (en)
- Has stream type: Flat RDF triple stream (stax:flatTripleStream)
- Has distribution type:
- Full distribution (rb:fullDistribution)
- Jelly distribution (rb:jellyDistribution)
- Has stream element count: 617,768
- Byte size: 28.9 MB
- Media type: application/x-jelly-rdf
- Compression format: application/gzip
- Checksum:
- Checksum (1)
- Type: Checksum (spdx:Checksum)
- ChecksumValue:
798bb6c728d973777f5bfcf84b089c4b
- Algorithm: ChecksumAlgorithm_md5 (spdx:checksumAlgorithm_md5)
- Checksum (2)
- Type: Checksum (spdx:Checksum)
- ChecksumValue:
8408d2b846440935c0405dea9c6ce8bfd2525ded
- Algorithm: ChecksumAlgorithm_sha1 (spdx:checksumAlgorithm_sha1)
- Download URL: https://w3id.org/riverbench/datasets/yago-annotated-facts/dev/files/jelly_full.jelly.gz
- Statistics: statistics-full
Full flat distribution
- Title: Full flat distribution
- Identifier:
flat-full
- Has file name:
flat_full.nt.gz
- Has stream type usage:
- Type: RDF stream type usage (stax:RdfStreamTypeUsage)
- Comment: The dataset can be viewed as a flattened stream of triples. (en)
- Has stream type: Flat RDF triple stream (stax:flatTripleStream)
- Has distribution type:
- Flat distribution (rb:flatDistribution)
- Full distribution (rb:fullDistribution)
- Has stream element count: 617,768
- Byte size: 28.7 MB
- Media type: application/n-triples
- Compression format: application/gzip
- Checksum:
- Checksum (1)
- Type: Checksum (spdx:Checksum)
- ChecksumValue:
b10036c730245cf23137376ad209dfb2
- Algorithm: ChecksumAlgorithm_md5 (spdx:checksumAlgorithm_md5)
- Checksum (2)
- Type: Checksum (spdx:Checksum)
- ChecksumValue:
e47427312e2d585fa8fadc8598e7e642e4eac518
- Algorithm: ChecksumAlgorithm_sha1 (spdx:checksumAlgorithm_sha1)
- Download URL: https://w3id.org/riverbench/datasets/yago-annotated-facts/dev/files/flat_full.nt.gz
- Statistics: statistics-full
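The flat distributions are gzip-compressed N-Triples files, so they can be scanned line by line without loading the whole file. The sketch below counts statements under the assumption that every non-empty, non-comment line holds exactly one statement, as is usual for N-Triples. The function name and the two sample lines (adapted from the stream preview) are for illustration only.

```python
import gzip
import io
from typing import BinaryIO


def count_statements(source: BinaryIO) -> int:
    """Count non-empty, non-comment lines in a gzip-compressed N-Triples file."""
    with gzip.open(source, mode="rt", encoding="utf-8") as lines:
        return sum(
            1 for line in lines
            if line.strip() and not line.lstrip().startswith("#")
        )


# Two sample statements in a one-line-per-statement layout (assumed).
sample = (
    '<< <http://yago-knowledge.org/resource/_Q56236170> '
    '<http://schema.org/dissolutionDate> '
    '"2009"^^<http://www.w3.org/2001/XMLSchema#gYear> >> '
    '<http://schema.org/endDate> '
    '"2009-12"^^<http://www.w3.org/2001/XMLSchema#gYearMonth> .\n'
    '<< <http://yago-knowledge.org/resource/_Q56236170> '
    '<http://schema.org/dissolutionDate> '
    '"2009"^^<http://www.w3.org/2001/XMLSchema#gYear> >> '
    '<http://schema.org/startDate> '
    '"2009-06"^^<http://www.w3.org/2001/XMLSchema#gYearMonth> .\n'
)
compressed = io.BytesIO(gzip.compress(sample.encode("utf-8")))
statement_count = count_statements(compressed)
print(statement_count)
```

If the one-statement-per-line assumption holds, running this on `flat_full.nt.gz` should give the number of statements listed for the full distribution.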
100K elements stream distribution
- Title: 100K elements stream distribution
- Identifier:
stream-100k
- Has file name:
stream_100K.tar.gz
- Has stream type usage:
- Type: RDF stream type usage (stax:RdfStreamTypeUsage)
- Comment: The dataset can be viewed as a stream of graphs. Each graph corresponds to the RDF-star annotations of one Wikidata item. (en)
- Has stream type: RDF subject graph stream (stax:subjectGraphStream)
- Has distribution type:
- Partial distribution (rb:partialDistribution)
- Stream distribution (rb:streamDistribution)
- Has stream element count: 100,000
- Byte size: 3.6 MB
- Media type: text/turtle
- Packaging format: application/tar
- Compression format: application/gzip
- Checksum:
- Checksum (1)
- Type: Checksum (spdx:Checksum)
- ChecksumValue:
6b3a7379f74a09325ddc637a13228083
- Algorithm: ChecksumAlgorithm_md5 (spdx:checksumAlgorithm_md5)
- Checksum (2)
- Type: Checksum (spdx:Checksum)
- ChecksumValue:
8a277d70a948b9d1ede6356ecf755ee635166123
- Algorithm: ChecksumAlgorithm_sha1 (spdx:checksumAlgorithm_sha1)
- Download URL: https://w3id.org/riverbench/datasets/yago-annotated-facts/dev/files/stream_100K.tar.gz
- Statistics: statistics-100k
100K elements Jelly distribution
- Title: 100K elements Jelly distribution
- Identifier:
jelly-100k
- Has file name:
jelly_100K.jelly.gz
- Has stream type usage:
- RDF stream type usage (1)
- Type: RDF stream type usage (stax:RdfStreamTypeUsage)
- Comment: The dataset can be viewed as a flattened stream of triples. (en)
- Has stream type: Flat RDF triple stream (stax:flatTripleStream)
- RDF stream type usage (2)
- Type: RDF stream type usage (stax:RdfStreamTypeUsage)
- Comment: The dataset can be viewed as a stream of graphs. Each graph corresponds to the RDF-star annotations of one Wikidata item. (en)
- Has stream type: RDF subject graph stream (stax:subjectGraphStream)
- Has distribution type:
- Jelly distribution (rb:jellyDistribution)
- Partial distribution (rb:partialDistribution)
- Has stream element count: 100,000
- Byte size: 2.7 MB
- Media type: application/x-jelly-rdf
- Compression format: application/gzip
- Checksum:
- Checksum (1)
- Type: Checksum (spdx:Checksum)
- ChecksumValue:
e7f1becd817dcb6ec18069d84876101e
- Algorithm: ChecksumAlgorithm_md5 (spdx:checksumAlgorithm_md5)
- Checksum (2)
- Type: Checksum (spdx:Checksum)
- ChecksumValue:
8d01624d3577a10f3b29d3f444ee16be5e4b78be
- Algorithm: ChecksumAlgorithm_sha1 (spdx:checksumAlgorithm_sha1)
- Download URL: https://w3id.org/riverbench/datasets/yago-annotated-facts/dev/files/jelly_100K.jelly.gz
- Statistics: statistics-100k
100K elements flat distribution
- Title: 100K elements flat distribution
- Identifier:
flat-100k
- Has file name:
flat_100K.nt.gz
- Has stream type usage:
- Type: RDF stream type usage (stax:RdfStreamTypeUsage)
- Comment: The dataset can be viewed as a flattened stream of triples. (en)
- Has stream type: Flat RDF triple stream (stax:flatTripleStream)
- Has distribution type:
- Flat distribution (rb:flatDistribution)
- Partial distribution (rb:partialDistribution)
- Has stream element count: 100,000
- Byte size: 2.4 MB
- Media type: application/n-triples
- Compression format: application/gzip
- Checksum:
- Checksum (1)
- Type: Checksum (spdx:Checksum)
- ChecksumValue:
3e1a342ac81835c781235990e0dde8d4
- Algorithm: ChecksumAlgorithm_md5 (spdx:checksumAlgorithm_md5)
- Checksum (2)
- Type: Checksum (spdx:Checksum)
- ChecksumValue:
06943438ccf05510c97f53c246b0c4f425ecf27f
- Algorithm: ChecksumAlgorithm_sha1 (spdx:checksumAlgorithm_sha1)
- Download URL: https://w3id.org/riverbench/datasets/yago-annotated-facts/dev/files/flat_100K.nt.gz
- Statistics: statistics-100k
10K elements stream distribution
- Title: 10K elements stream distribution
- Identifier:
stream-10k
- Has file name:
stream_10K.tar.gz
- Has stream type usage:
- Type: RDF stream type usage (stax:RdfStreamTypeUsage)
- Comment: The dataset can be viewed as a stream of graphs. Each graph corresponds to the RDF-star annotations of one Wikidata item. (en)
- Has stream type: RDF subject graph stream (stax:subjectGraphStream)
- Has distribution type:
- Partial distribution (rb:partialDistribution)
- Stream distribution (rb:streamDistribution)
- Has stream element count: 10,000
- Byte size: 376.5 KB
- Media type: text/turtle
- Packaging format: application/tar
- Compression format: application/gzip
- Checksum:
- Checksum (1)
- Type: Checksum (spdx:Checksum)
- ChecksumValue:
2ed92d6710623cd85bcba727fc8fc9b1
- Algorithm: ChecksumAlgorithm_md5 (spdx:checksumAlgorithm_md5)
- Checksum (2)
- Type: Checksum (spdx:Checksum)
- ChecksumValue:
e60b7a726e858d8a703db4c66443df0fa65bbae7
- Algorithm: ChecksumAlgorithm_sha1 (spdx:checksumAlgorithm_sha1)
- Download URL: https://w3id.org/riverbench/datasets/yago-annotated-facts/dev/files/stream_10K.tar.gz
- Statistics: statistics-10k
10K elements Jelly distribution
- Title: 10K elements Jelly distribution
- Identifier:
jelly-10k
- Has file name:
jelly_10K.jelly.gz
- Has stream type usage:
- RDF stream type usage (1)
- Type: RDF stream type usage (stax:RdfStreamTypeUsage)
- Comment: The dataset can be viewed as a flattened stream of triples. (en)
- Has stream type: Flat RDF triple stream (stax:flatTripleStream)
- RDF stream type usage (2)
- Type: RDF stream type usage (stax:RdfStreamTypeUsage)
- Comment: The dataset can be viewed as a stream of graphs. Each graph corresponds to the RDF-star annotations of one Wikidata item. (en)
- Has stream type: RDF subject graph stream (stax:subjectGraphStream)
- Has distribution type:
- Jelly distribution (rb:jellyDistribution)
- Partial distribution (rb:partialDistribution)
- Has stream element count: 10,000
- Byte size: 260.8 KB
- Media type: application/x-jelly-rdf
- Compression format: application/gzip
- Checksum:
- Checksum (1)
- Type: Checksum (spdx:Checksum)
- ChecksumValue:
bc236eb78830a27977547e1b3e90f012
- Algorithm: ChecksumAlgorithm_md5 (spdx:checksumAlgorithm_md5)
- Checksum (2)
- Type: Checksum (spdx:Checksum)
- ChecksumValue:
0544f2958b9ab5ae8c86c01f36be3b54c987f85e
- Algorithm: ChecksumAlgorithm_sha1 (spdx:checksumAlgorithm_sha1)
- Download URL: https://w3id.org/riverbench/datasets/yago-annotated-facts/dev/files/jelly_10K.jelly.gz
- Statistics: statistics-10k
10K elements flat distribution
- Title: 10K elements flat distribution
- Identifier:
flat-10k
- Has file name:
flat_10K.nt.gz
- Has stream type usage:
- Type: RDF stream type usage (stax:RdfStreamTypeUsage)
- Comment: The dataset can be viewed as a flattened stream of triples. (en)
- Has stream type: Flat RDF triple stream (stax:flatTripleStream)
- Has distribution type:
- Flat distribution (rb:flatDistribution)
- Partial distribution (rb:partialDistribution)
- Has stream element count: 10,000
- Byte size: 256.7 KB
- Media type: application/n-triples
- Compression format: application/gzip
- Checksum:
- Checksum (1)
- Type: Checksum (spdx:Checksum)
- ChecksumValue:
0219ab000043593eac7df6b505900ce4
- Algorithm: ChecksumAlgorithm_md5 (spdx:checksumAlgorithm_md5)
- Checksum (2)
- Type: Checksum (spdx:Checksum)
- ChecksumValue:
59461e65ea844ec242ba1cad62bff107c4d69d8a
- Algorithm: ChecksumAlgorithm_sha1 (spdx:checksumAlgorithm_sha1)
- Download URL: https://w3id.org/riverbench/datasets/yago-annotated-facts/dev/files/flat_10K.nt.gz
- Statistics: statistics-10k
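The download URLs listed above follow a regular pattern: a common base URL, then the distribution type, the size, and a type-specific extension. The helper below reconstructs them from that pattern. It is a convenience sketch inferred from the nine URLs on this page (the names `distribution_url`, `EXTENSIONS`, and `SIZES` are our own) and may not hold for other datasets or versions.

```python
BASE_URL = "https://w3id.org/riverbench/datasets/yago-annotated-facts/dev/files/"

# File extension for each distribution type, as seen in the listed file names.
EXTENSIONS = {"stream": "tar.gz", "jelly": "jelly.gz", "flat": "nt.gz"}
SIZES = ("10K", "100K", "full")


def distribution_url(kind: str, size: str) -> str:
    """Build the download URL for one distribution, e.g. ("flat", "10K")."""
    if kind not in EXTENSIONS:
        raise ValueError(f"unknown distribution type: {kind}")
    if size not in SIZES:
        raise ValueError(f"unknown size: {size}")
    return f"{BASE_URL}{kind}_{size}.{EXTENSIONS[kind]}"


for kind in EXTENSIONS:
    for size in SIZES:
        print(distribution_url(kind, size))
```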
Statistics
Statistics for full distributions
- Title: Statistics for full distributions
Statistic | Sum | Unique (approx.) | Mean | St. dev. | Min. | Max. |
---|---|---|---|---|---|---|
IRIs | 3,631,687 | 594,855 | 5.88 | 3.22 | 3 | 853 |
Blank nodes | 0 | N/A | 0.00 | 0.00 | 0 | 0 |
Objects | 3,127,393 | 166,380 | 5.06 | 5.06 | 1 | 853 |
Graphs | 617,768 | 1 | 1.00 | 0.00 | 1 | 1 |
Statements | 2,484,547 | N/A | 4.02 | 6.10 | 1 | 1,455 |
Literals | 1,736,327 | 57,578 | 2.81 | 2.50 | 1 | 66 |
Simple literals | 211 | 174 | 0.00 | 0.02 | 0 | 3 |
Datatype literals | 1,736,116 | 57,405 | 2.81 | 2.50 | 1 | 66 |
Language literals | 0 | 0 | 0.00 | 0.00 | 0 | 0 |
ASCII control chars | 0 | N/A | 0.00 | 0.00 | 0 | 0 |
Quoted triples | 2,484,547 | N/A | 4.02 | 6.10 | 1 | 1,455 |
Subjects | 2,009,932 | 1,896,569 | 3.25 | 3.04 | 2 | 850 |
Predicates | 1,622,855 | 75 | 2.63 | 0.48 | 2 | 3 |
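The Mean values in the table above appear to be the Sum divided by the number of stream elements (617,768 for the full distributions). This reading is an inference from the numbers, not something the page states. A quick check for three of the rows:

```python
# Number of stream elements in the full distributions
STREAM_ELEMENTS = 617_768

# Sum and published Mean for three rows of the table above
sums = {"IRIs": 3_631_687, "Statements": 2_484_547, "Literals": 1_736_327}
published_means = {"IRIs": "5.88", "Statements": "4.02", "Literals": "2.81"}

computed_means = {name: f"{total / STREAM_ELEMENTS:.2f}" for name, total in sums.items()}

for name, mean in computed_means.items():
    status = "matches" if mean == published_means[name] else "differs from"
    print(f"{name}: computed {mean} {status} published {published_means[name]}")
```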
Statistics for 100K distributions
- Title: Statistics for 100K distributions
Statistic | Sum | Unique (approx.) | Mean | St. dev. | Min. | Max. |
---|---|---|---|---|---|---|
IRIs | 502,972 | 102,657 | 5.03 | 5.30 | 3 | 853 |
Blank nodes | 0 | N/A | 0.00 | 0.00 | 0 | 0 |
Objects | 332,939 | 52,847 | 3.33 | 5.46 | 1 | 853 |
Graphs | 100,000 | 1 | 1.00 | 0.00 | 1 | 1 |
Statements | 226,648 | N/A | 2.27 | 9.28 | 1 | 1,455 |
Literals | 187,612 | 37,329 | 1.88 | 0.98 | 1 | 49 |
Simple literals | 66 | 66 | 0.00 | 0.03 | 0 | 3 |
Datatype literals | 187,546 | 37,263 | 1.88 | 0.97 | 1 | 49 |
Language literals | 0 | 0 | 0.00 | 0.00 | 0 | 0 |
ASCII control chars | 0 | N/A | 0.00 | 0.00 | 0 | 0 |
Quoted triples | 226,648 | N/A | 2.27 | 9.28 | 1 | 1,455 |
Subjects | 246,103 | 237,937 | 2.46 | 5.24 | 2 | 850 |
Predicates | 257,646 | 32 | 2.58 | 0.49 | 2 | 3 |
Statistics for 10K distributions
- Title: Statistics for 10K distributions
Statistic | Sum | Unique (approx.) | Mean | St. dev. | Min. | Max. |
---|---|---|---|---|---|---|
IRIs | 49,533 | 10,233 | 4.95 | 0.93 | 3 | 10 |
Blank nodes | 0 | N/A | 0.00 | 0.00 | 0 | 0 |
Objects | 33,009 | 7,553 | 3.30 | 1.39 | 1 | 13 |
Graphs | 10,000 | 1 | 1.00 | 0.00 | 1 | 1 |
Statements | 22,977 | N/A | 2.30 | 1.34 | 1 | 10 |
Literals | 19,576 | 7,332 | 1.96 | 0.90 | 1 | 8 |
Simple literals | 0 | 0 | 0.00 | 0.00 | 0 | 0 |
Datatype literals | 19,576 | 7,332 | 1.96 | 0.90 | 1 | 8 |
Language literals | 0 | 0 | 0.00 | 0.00 | 0 | 0 |
ASCII control chars | 0 | N/A | 0.00 | 0.00 | 0 | 0 |
Quoted triples | 22,977 | N/A | 2.30 | 1.34 | 1 | 10 |
Subjects | 23,762 | 23,762 | 2.38 | 0.53 | 2 | 7 |
Predicates | 26,100 | 12 | 2.61 | 0.49 | 2 | 3 |