Dataset: yago-annotated-facts (development version)
This dataset is a subset of the YAGO 4 knowledge base (paper), which is based on Wikidata, in the version from February 24, 2020. It includes only the fact annotations in RDF-star, that is, facts about facts. Each stream element corresponds to one item in Wikidata.
Info
Download this metadata in RDF: Turtle, N-Triples, RDF/XML, Jelly
Source repository: dataset-yago-annotated-facts
Permanent URL: https://w3id.org/riverbench/datasets/yago-annotated-facts/dev
Stream preview
<< <http://yago-knowledge.org/resource/_Q56236170> <http://schema.org/dissolutionDate> "2009"^^<http://www.w3.org/2001/XMLSchema#gYear> >>
<http://schema.org/endDate> "2009-12"^^<http://www.w3.org/2001/XMLSchema#gYearMonth>;
<http://schema.org/startDate> "2009-06"^^<http://www.w3.org/2001/XMLSchema#gYearMonth> .
<< <http://yago-knowledge.org/resource/Open_Science_Radio_Q18744554> <http://schema.org/creator> <http://yago-knowledge.org/resource/Matthias_Fromm_Q18748012> >>
<http://schema.org/startDate> "2013-01-02"^^<http://www.w3.org/2001/XMLSchema#date> .
<< <http://yago-knowledge.org/resource/Open_Science_Radio_Q18744554> <http://schema.org/creator> <http://yago-knowledge.org/resource/Konrad_Förstner_Q18744528> >>
<http://schema.org/startDate> "2014-01-19"^^<http://www.w3.org/2001/XMLSchema#date> .
<< <http://yago-knowledge.org/resource/Carnaval_na_avenida_Central,_atual_avenida_Rio_Branco_Q65621070> <http://schema.org/dateCreated> "1906-06-22"^^<http://www.w3.org/2001/XMLSchema#date> >>
<http://schema.org/endDate> "1906"^^<http://www.w3.org/2001/XMLSchema#gYear>;
<http://schema.org/startDate> "1906"^^<http://www.w3.org/2001/XMLSchema#gYear> .
<< <http://yago-knowledge.org/resource/Margherita_Cagol> <http://schema.org/nationality> <http://yago-knowledge.org/resource/Kingdom_of_Italy> >>
<http://schema.org/endDate> "1946-06-18"^^<http://www.w3.org/2001/XMLSchema#date>;
<http://schema.org/startDate> "1945-04-08"^^<http://www.w3.org/2001/XMLSchema#date> .
<< <http://yago-knowledge.org/resource/Margherita_Cagol> <http://schema.org/nationality> <http://yago-knowledge.org/resource/Italy> >>
<http://schema.org/endDate> "1975-06-05"^^<http://www.w3.org/2001/XMLSchema#date>;
<http://schema.org/startDate> "1946-06-18"^^<http://www.w3.org/2001/XMLSchema#date> .
<< <http://yago-knowledge.org/resource/Mihrengiz_Kadın> <http://schema.org/nationality> <http://yago-knowledge.org/resource/Ottoman_Empire> >>
<http://schema.org/endDate> "1923"^^<http://www.w3.org/2001/XMLSchema#gYear>;
<http://schema.org/startDate> "1869"^^<http://www.w3.org/2001/XMLSchema#gYear> .
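To make the shape of the preview concrete, the following is a minimal sketch of how one might pull the quoted triple and its annotations out of text shaped like the statements above. It is not a Turtle-star parser: the regular expressions and the `parse_annotations` helper are illustrative assumptions that only cover the patterns visible in the preview (absolute IRIs, datatyped literals without escaped quotes, and `;` / `.` terminators). For real workloads an RDF-star-aware library would be the safer choice.

```python
import re

# Patterns covering only the term shapes seen in the preview (an assumption).
IRI = r"<[^<>\s]*>"
LITERAL = r'"[^"]*"(?:\^\^<[^<>\s]*>)?'
TERM = rf"(?:{IRI}|{LITERAL})"

# << subject predicate object >>
QUOTED = re.compile(rf"<<\s*({IRI})\s+({IRI})\s+({TERM})\s*>>")
# predicate object, terminated by ';' or '.'
ANNOTATION = re.compile(rf"({IRI})\s+({TERM})\s*[;.]")


def parse_annotations(text):
    """Return a list of (quoted_triple, [(predicate, object), ...]) pairs."""
    results = []
    matches = list(QUOTED.finditer(text))
    for index, match in enumerate(matches):
        # The annotations of one quoted triple run up to the next quoted triple.
        end = matches[index + 1].start() if index + 1 < len(matches) else len(text)
        body = text[match.end():end]
        annotations = [(a.group(1), a.group(2)) for a in ANNOTATION.finditer(body)]
        results.append((match.groups(), annotations))
    return results


SAMPLE = '''<< <http://yago-knowledge.org/resource/Margherita_Cagol> <http://schema.org/nationality> <http://yago-knowledge.org/resource/Italy> >>
<http://schema.org/endDate> "1975-06-05"^^<http://www.w3.org/2001/XMLSchema#date>;
<http://schema.org/startDate> "1946-06-18"^^<http://www.w3.org/2001/XMLSchema#date> .
'''

for fact, annotations in parse_annotations(SAMPLE):
    print(fact)
    for predicate, value in annotations:
        print("   ", predicate, value)
```

Each result pairs the annotated fact with its `schema:startDate` / `schema:endDate` style qualifiers, which is the "facts about facts" structure the dataset description refers to.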
General information
- Title: YAGO annotated facts (en)
- Identifier: yago-annotated-facts
- Version: dev
- Theme:
- Encyclopaedia (eurovoc:4137)
- Metadata (eurovoc:c_40f54e0c)
- Open data (eurovoc:c_5ea6e5c4)
- Creator:
- The creators and contributors of Wikidata (1)
- Name: The creators and contributors of Wikidata
- Homepage: https://www.wikidata.org/
- The YAGO team of Télécom Paris and the Max Planck Institute for Informatics (2)
- Name: The YAGO team of Télécom Paris and the Max Planck Institute for Informatics
- Homepage: https://yago-knowledge.org/contributors
- Piotr Sowiński (3)
- Name: Piotr Sowiński
- Nickname: Ostrzyciel
- Homepage:
- License: https://spdx.org/licenses/CC-BY-SA-3.0
- Source:
- Date Issued: 2023-04-30
- Date Modified: 2026-06-20
- Landing page: yago-annotated-facts (dev)
Technical metadata
- Has stream type usage:
- RDF stream type usage (1)
- Type: RDF stream type usage (stax:RdfStreamTypeUsage)
- Comment: The dataset can be viewed as a flattened stream of triples. (en)
- Has stream type: Flat RDF triple stream (stax:flatTripleStream)
- RDF stream type usage (2)
- Type: RDF stream type usage (stax:RdfStreamTypeUsage)
- Comment: The dataset can be viewed as a stream of graphs. Each graph corresponds to the RDF-star annotations of one Wikidata item. (en)
- Has stream type: RDF subject graph stream (stax:subjectGraphStream)
- Has stream element count: 617,768
- Has stream element split:
- Type: Stream elements split by topic (rb:TopicStreamElementSplit)
- Comment: Every stream element corresponds to one Wikidata item. (en)
- Has subject shape:
- Comment: Custom target – subject of any quoted triple in the subject position. (en)
- Target custom: YAGO annotated facts target (rb:yagoTarget)
- Uses vocabulary: http://schema.org/
- Conforms to W3C RDF 1.1 specification: no
- Conforms to W3C RDF-star draft specification as of December 17, 2021: yes
- Uses generalized triples: no
- Uses generalized RDF datasets: no
- Uses RDF-star: yes
Distributions
Download links
The dataset is published in a few size variants, each containing a specific number of stream elements. For each size, there are three distribution types available: flat (an N-Triples/N-Quads file in the RDF Message Log format), streaming (a .tar.gz archive with Turtle/TriG files, one file per stream element), and Jelly (a native binary format for streaming RDF). See the documentation for more details.
| Distribution size | Statements | Flat | Streaming | Jelly |
|---|---|---|---|---|
| 10K | 22,977 | 267.8 KB | 376.3 KB | 257.1 KB |
| 100K | 226,648 | 2.5 MB | 3.6 MB | 2.7 MB |
| Full | 2,484,547 | 29.4 MB | 36.2 MB | 28.9 MB |
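The download URLs listed in the distribution metadata on this page follow a regular pattern of distribution type, size, and file extension. As a small sketch, assuming that pattern holds for every combination in the table above, the URLs can be built programmatically. The `distribution_url` helper and its lookup tables are hypothetical names introduced here for illustration.

```python
# Base path and file extensions as they appear in the download URLs on this page.
BASE = "https://w3id.org/riverbench/datasets/yago-annotated-facts/dev/files"
EXTENSIONS = {"flat": "nt.gz", "stream": "tar.gz", "jelly": "jelly.gz"}
SIZES = ("10K", "100K", "full")


def distribution_url(kind, size):
    """Build the download URL for one distribution (hypothetical helper)."""
    if kind not in EXTENSIONS:
        raise ValueError(f"unknown distribution type: {kind!r}")
    if size not in SIZES:
        raise ValueError(f"unknown distribution size: {size!r}")
    return f"{BASE}/{kind}_{size}.{EXTENSIONS[kind]}"


for kind in EXTENSIONS:
    for size in SIZES:
        print(distribution_url(kind, size))
```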
The full metadata of all distributions can be found below.
Full flat distribution
- Title: Full flat distribution
- Identifier: flat-full
- Has file name: flat_full.nt.gz
- Has distribution type:
- Flat distribution (RDF Messages) (rb:flatDistribution)
- Full distribution (rb:fullDistribution)
- Has stream type usage:
- Type: RDF stream type usage (stax:RdfStreamTypeUsage)
- Comment: The dataset can be viewed as a flattened stream of triples. (en)
- Has stream type: Flat RDF triple stream (stax:flatTripleStream)
- Has stream element count: 617,768
- Byte size: 29.4 MB
- Media type: application/n-triples
- Compression format: application/gzip
- Checksum:
- Checksum (1)
- Type: Checksum (spdx:Checksum)
- ChecksumValue: 90a9b18fcf5b6cc143eb1028c4c708bb
- Algorithm: ChecksumAlgorithm_md5 (spdx:checksumAlgorithm_md5)
- Checksum (2)
- Type: Checksum (spdx:Checksum)
- ChecksumValue: d2803acf2dd6130dfb1e528c72958771a0daf11e
- Algorithm: ChecksumAlgorithm_sha1 (spdx:checksumAlgorithm_sha1)
- Statistics: statistics-full
- Download URL: https://w3id.org/riverbench/datasets/yago-annotated-facts/dev/files/flat_full.nt.gz
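Each distribution lists an MD5 and a SHA-1 checksum, which can be used to check a downloaded file for corruption. A minimal sketch using only the Python standard library follows; the function names and the local file path in the usage note are assumptions made for illustration.

```python
import hashlib


def file_digest(path, algorithm="sha1", chunk_size=1 << 20):
    """Hash a file in chunks so large downloads need not fit in memory."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def verify_checksum(path, expected, algorithm="sha1"):
    """Return True if the file's digest matches the published value."""
    return file_digest(path, algorithm) == expected.strip().lower()
```

For example, after saving the full flat distribution locally as `flat_full.nt.gz` (an assumed path), `verify_checksum("flat_full.nt.gz", "d2803acf2dd6130dfb1e528c72958771a0daf11e")` should return `True` if the file matches the SHA-1 value listed above.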
Full stream distribution
- Title: Full stream distribution
- Identifier: stream-full
- Has file name: stream_full.tar.gz
- Has distribution type:
- Full distribution (rb:fullDistribution)
- Stream distribution (rb:streamDistribution)
- Has stream type usage:
- Type: RDF stream type usage (stax:RdfStreamTypeUsage)
- Comment: The dataset can be viewed as a stream of graphs. Each graph corresponds to the RDF-star annotations of one Wikidata item. (en)
- Has stream type: RDF subject graph stream (stax:subjectGraphStream)
- Has stream element count: 617,768
- Byte size: 36.2 MB
- Media type: text/turtle
- Packaging format: application/tar
- Compression format: application/gzip
- Checksum:
- Checksum (1)
- Type: Checksum (spdx:Checksum)
- ChecksumValue: 78fdef6a9024db89d1ad2e5e3482a1fd
- Algorithm: ChecksumAlgorithm_md5 (spdx:checksumAlgorithm_md5)
- Checksum (2)
- Type: Checksum (spdx:Checksum)
- ChecksumValue: 6bbb7e7b93fc6c6c94d70116c96992c6332a4a99
- Algorithm: ChecksumAlgorithm_sha1 (spdx:checksumAlgorithm_sha1)
- Statistics: statistics-full
- Download URL: https://w3id.org/riverbench/datasets/yago-annotated-facts/dev/files/stream_full.tar.gz
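Because the streaming distribution is a .tar.gz archive with one Turtle file per stream element, it can be read element by element without unpacking it to disk. The sketch below uses Python's standard `tarfile` module. The `iter_stream_elements` name is hypothetical, and the code makes no assumption about how files inside the archive are named; it simply yields them in archive order.

```python
import tarfile


def iter_stream_elements(path=None, fileobj=None):
    """Yield (member_name, turtle_text) for each file in a .tar.gz archive.

    Pass either a filesystem path or an open binary file object.
    """
    with tarfile.open(name=path, fileobj=fileobj, mode="r:gz") as archive:
        for member in archive:
            if not member.isfile():
                continue  # skip directories and other non-file entries
            data = archive.extractfile(member).read()
            yield member.name, data.decode("utf-8")
```

With the archive saved locally (for example as `stream_full.tar.gz`, an assumed path), `for name, text in iter_stream_elements("stream_full.tar.gz")` would hand each stream element's Turtle text to an RDF-star-capable parser of your choice.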
Full Jelly distribution
- Title: Full Jelly distribution
- Identifier: jelly-full
- Has file name: jelly_full.jelly.gz
- Has distribution type:
- Full distribution (rb:fullDistribution)
- Jelly distribution (rb:jellyDistribution)
- Has stream type usage:
- RDF stream type usage (1)
- Type: RDF stream type usage (stax:RdfStreamTypeUsage)
- Comment: The dataset can be viewed as a stream of graphs. Each graph corresponds to the RDF-star annotations of one Wikidata item. (en)
- Has stream type: RDF subject graph stream (stax:subjectGraphStream)
- RDF stream type usage (2)
- Type: RDF stream type usage (stax:RdfStreamTypeUsage)
- Comment: The dataset can be viewed as a flattened stream of triples. (en)
- Has stream type: Flat RDF triple stream (stax:flatTripleStream)
- Has stream element count: 617,768
- Byte size: 28.9 MB
- Media type: application/x-jelly-rdf
- Compression format: application/gzip
- Checksum:
- Checksum (1)
- Type: Checksum (spdx:Checksum)
- ChecksumValue: ed21e2130a2ce738837498f195b37da0
- Algorithm: ChecksumAlgorithm_md5 (spdx:checksumAlgorithm_md5)
- Checksum (2)
- Type: Checksum (spdx:Checksum)
- ChecksumValue: 8e77fe4b8e8a9d2ea716223f9a7904e04ce6b8b7
- Algorithm: ChecksumAlgorithm_sha1 (spdx:checksumAlgorithm_sha1)
- Statistics: statistics-full
- Download URL: https://w3id.org/riverbench/datasets/yago-annotated-facts/dev/files/jelly_full.jelly.gz
100K elements flat distribution
- Title: 100K elements flat distribution
- Identifier: flat-100k
- Has file name: flat_100K.nt.gz
- Has distribution type:
- Flat distribution (RDF Messages) (rb:flatDistribution)
- Partial distribution (rb:partialDistribution)
- Has stream type usage:
- Type: RDF stream type usage (stax:RdfStreamTypeUsage)
- Comment: The dataset can be viewed as a flattened stream of triples. (en)
- Has stream type: Flat RDF triple stream (stax:flatTripleStream)
- Has stream element count: 100,000
- Byte size: 2.5 MB
- Media type: application/n-triples
- Compression format: application/gzip
- Checksum:
- Checksum (1)
- Type: Checksum (spdx:Checksum)
- ChecksumValue: e3b1d9624cd388295cc10c69016c2f28
- Algorithm: ChecksumAlgorithm_md5 (spdx:checksumAlgorithm_md5)
- Checksum (2)
- Type: Checksum (spdx:Checksum)
- ChecksumValue: 7d23cb9e51a89dd5a84051e48d9f4081e75f03ad
- Algorithm: ChecksumAlgorithm_sha1 (spdx:checksumAlgorithm_sha1)
- Statistics: statistics-100k
- Download URL: https://w3id.org/riverbench/datasets/yago-annotated-facts/dev/files/flat_100K.nt.gz
100K elements stream distribution
- Title: 100K elements stream distribution
- Identifier: stream-100k
- Has file name: stream_100K.tar.gz
- Has distribution type:
- Partial distribution (rb:partialDistribution)
- Stream distribution (rb:streamDistribution)
- Has stream type usage:
- Type: RDF stream type usage (stax:RdfStreamTypeUsage)
- Comment: The dataset can be viewed as a stream of graphs. Each graph corresponds to the RDF-star annotations of one Wikidata item. (en)
- Has stream type: RDF subject graph stream (stax:subjectGraphStream)
- Has stream element count: 100,000
- Byte size: 3.6 MB
- Media type: text/turtle
- Packaging format: application/tar
- Compression format: application/gzip
- Checksum:
- Checksum (1)
- Type: Checksum (spdx:Checksum)
- ChecksumValue: a7d253134c269fd12fc2d3a4b9194df5
- Algorithm: ChecksumAlgorithm_md5 (spdx:checksumAlgorithm_md5)
- Checksum (2)
- Type: Checksum (spdx:Checksum)
- ChecksumValue: bfdaa5a54a420b7449308c2eea9a7f7d6ec224dd
- Algorithm: ChecksumAlgorithm_sha1 (spdx:checksumAlgorithm_sha1)
- Statistics: statistics-100k
- Download URL: https://w3id.org/riverbench/datasets/yago-annotated-facts/dev/files/stream_100K.tar.gz
100K elements Jelly distribution
- Title: 100K elements Jelly distribution
- Identifier: jelly-100k
- Has file name: jelly_100K.jelly.gz
- Has distribution type:
- Jelly distribution (rb:jellyDistribution)
- Partial distribution (rb:partialDistribution)
- Has stream type usage:
- RDF stream type usage (1)
- Type: RDF stream type usage (stax:RdfStreamTypeUsage)
- Comment: The dataset can be viewed as a flattened stream of triples. (en)
- Has stream type: Flat RDF triple stream (stax:flatTripleStream)
- RDF stream type usage (2)
- Type: RDF stream type usage (stax:RdfStreamTypeUsage)
- Comment: The dataset can be viewed as a stream of graphs. Each graph corresponds to the RDF-star annotations of one Wikidata item. (en)
- Has stream type: RDF subject graph stream (stax:subjectGraphStream)
- Has stream element count: 100,000
- Byte size: 2.7 MB
- Media type: application/x-jelly-rdf
- Compression format: application/gzip
- Checksum:
- Checksum (1)
- Type: Checksum (spdx:Checksum)
- ChecksumValue: 30843f52dbb335e324f29a16c01a5dce
- Algorithm: ChecksumAlgorithm_md5 (spdx:checksumAlgorithm_md5)
- Checksum (2)
- Type: Checksum (spdx:Checksum)
- ChecksumValue: 981bb1f53301996502be6eefe09a3ede5ee338a3
- Algorithm: ChecksumAlgorithm_sha1 (spdx:checksumAlgorithm_sha1)
- Statistics: statistics-100k
- Download URL: https://w3id.org/riverbench/datasets/yago-annotated-facts/dev/files/jelly_100K.jelly.gz
10K elements flat distribution
- Title: 10K elements flat distribution
- Identifier: flat-10k
- Has file name: flat_10K.nt.gz
- Has distribution type:
- Flat distribution (RDF Messages) (rb:flatDistribution)
- Partial distribution (rb:partialDistribution)
- Has stream type usage:
- Type: RDF stream type usage (stax:RdfStreamTypeUsage)
- Comment: The dataset can be viewed as a flattened stream of triples. (en)
- Has stream type: Flat RDF triple stream (stax:flatTripleStream)
- Has stream element count: 10,000
- Byte size: 267.8 KB
- Media type: application/n-triples
- Compression format: application/gzip
- Checksum:
- Checksum (1)
- Type: Checksum (spdx:Checksum)
- ChecksumValue: 965a7772ce6e1be31722d153ac11e3f4
- Algorithm: ChecksumAlgorithm_md5 (spdx:checksumAlgorithm_md5)
- Checksum (2)
- Type: Checksum (spdx:Checksum)
- ChecksumValue: de2a17811471d41aa196096b7e9834494aeb4d3a
- Algorithm: ChecksumAlgorithm_sha1 (spdx:checksumAlgorithm_sha1)
- Statistics: statistics-10k
- Download URL: https://w3id.org/riverbench/datasets/yago-annotated-facts/dev/files/flat_10K.nt.gz
10K elements stream distribution
- Title: 10K elements stream distribution
- Identifier: stream-10k
- Has file name: stream_10K.tar.gz
- Has distribution type:
- Partial distribution (rb:partialDistribution)
- Stream distribution (rb:streamDistribution)
- Has stream type usage:
- Type: RDF stream type usage (stax:RdfStreamTypeUsage)
- Comment: The dataset can be viewed as a stream of graphs. Each graph corresponds to the RDF-star annotations of one Wikidata item. (en)
- Has stream type: RDF subject graph stream (stax:subjectGraphStream)
- Has stream element count: 10,000
- Byte size: 376.3 KB
- Media type: text/turtle
- Packaging format: application/tar
- Compression format: application/gzip
- Checksum:
- Checksum (1)
- Type: Checksum (spdx:Checksum)
- ChecksumValue: f217b524241db4433a4d4217dfc59375
- Algorithm: ChecksumAlgorithm_md5 (spdx:checksumAlgorithm_md5)
- Checksum (2)
- Type: Checksum (spdx:Checksum)
- ChecksumValue: 7d4b331fcc1aa4bed48ebfacacc002d6d7e70c68
- Algorithm: ChecksumAlgorithm_sha1 (spdx:checksumAlgorithm_sha1)
- Statistics: statistics-10k
- Download URL: https://w3id.org/riverbench/datasets/yago-annotated-facts/dev/files/stream_10K.tar.gz
10K elements Jelly distribution
- Title: 10K elements Jelly distribution
- Identifier: jelly-10k
- Has file name: jelly_10K.jelly.gz
- Has distribution type:
- Jelly distribution (rb:jellyDistribution)
- Partial distribution (rb:partialDistribution)
- Has stream type usage:
- RDF stream type usage (1)
- Type: RDF stream type usage (stax:RdfStreamTypeUsage)
- Comment: The dataset can be viewed as a stream of graphs. Each graph corresponds to the RDF-star annotations of one Wikidata item. (en)
- Has stream type: RDF subject graph stream (stax:subjectGraphStream)
- RDF stream type usage (2)
- Type: RDF stream type usage (stax:RdfStreamTypeUsage)
- Comment: The dataset can be viewed as a flattened stream of triples. (en)
- Has stream type: Flat RDF triple stream (stax:flatTripleStream)
- Has stream element count: 10,000
- Byte size: 257.1 KB
- Media type: application/x-jelly-rdf
- Compression format: application/gzip
- Checksum:
- Checksum (1)
- Type: Checksum (spdx:Checksum)
- ChecksumValue: c2a93aee67e9cdaec18596140362e72a
- Algorithm: ChecksumAlgorithm_md5 (spdx:checksumAlgorithm_md5)
- Checksum (2)
- Type: Checksum (spdx:Checksum)
- ChecksumValue: 10eac8147ed492bba6ebc334d56498b982b1b695
- Algorithm: ChecksumAlgorithm_sha1 (spdx:checksumAlgorithm_sha1)
- Statistics: statistics-10k
- Download URL: https://w3id.org/riverbench/datasets/yago-annotated-facts/dev/files/jelly_10K.jelly.gz
Statistics
Statistics for full distributions
- Title: Statistics for full distributions
| Statistic | Sum | Unique | Mean | St. dev. | Min. | Max. |
|---|---|---|---|---|---|---|
| IRIs | 3,631,687 | ~591,866 | 5.88 | 3.22 | 3 | 853 |
| Blank nodes | 0 | N/A | 0.00 | 0.00 | 0 | 0 |
| Literals | 1,736,327 | ~57,521 | 2.81 | 2.50 | 1 | 66 |
| Simple literals | 211 | ~174 | 0.00 | 0.02 | 0 | 3 |
| Datatype literals | 1,736,116 | ~57,356 | 2.81 | 2.50 | 1 | 66 |
| Language literals | 0 | ~0 | 0.00 | 0.00 | 0 | 0 |
| Datatypes | 647,861 | 6 | 1.05 | 0.22 | 1 | 3 |
| ASCII control chars | 0 | N/A | 0.00 | 0.00 | 0 | 0 |
| Quoted triples | 2,484,547 | N/A | 4.02 | 6.10 | 1 | 1,455 |
| Subjects | 2,009,932 | ~1,905,199 | 3.25 | 3.04 | 2 | 850 |
| Predicates | 1,622,855 | ~75 | 2.63 | 0.48 | 2 | 3 |
| Objects | 3,127,393 | ~165,841 | 5.06 | 5.06 | 1 | 853 |
| Graphs | 617,768 | ~1 | 1.00 | 0.00 | 1 | 1 |
| Statements | 2,484,547 | N/A | 4.02 | 6.10 | 1 | 1,455 |
| Bytes per statement | N/A | N/A | 336.15 | 642.39 | 0.78 | 311,033.00 |
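The Mean column appears to be the Sum divided by the number of stream elements (617,768 for the full distributions). That reading is an inference from the numbers rather than something the page states, and the short check below only confirms that it is consistent with a few rows of the table above. The `mean_per_element` helper is a hypothetical name.

```python
ELEMENTS_FULL = 617_768  # stream element count of the full distributions

# (Sum, reported Mean) pairs copied from the table above.
ROWS = {
    "IRIs": (3_631_687, 5.88),
    "Literals": (1_736_327, 2.81),
    "Subjects": (2_009_932, 3.25),
    "Predicates": (1_622_855, 2.63),
    "Objects": (3_127_393, 5.06),
    "Statements": (2_484_547, 4.02),
}


def mean_per_element(total, elements=ELEMENTS_FULL):
    """Average count per stream element."""
    return total / elements


for name, (total, reported) in ROWS.items():
    computed = mean_per_element(total)
    status = "ok" if abs(computed - reported) < 0.005 else "differs"
    print(f"{name}: computed {computed:.4f}, reported {reported:.2f} ({status})")
```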
Statistics for 100K distributions
- Title: Statistics for 100K distributions
| Statistic | Sum | Unique | Mean | St. dev. | Min. | Max. |
|---|---|---|---|---|---|---|
| IRIs | 502,972 | ~102,634 | 5.03 | 5.30 | 3 | 853 |
| Blank nodes | 0 | N/A | 0.00 | 0.00 | 0 | 0 |
| Literals | 187,612 | ~37,278 | 1.88 | 0.98 | 1 | 49 |
| Simple literals | 66 | ~66 | 0.00 | 0.03 | 0 | 3 |
| Datatype literals | 187,546 | ~37,218 | 1.88 | 0.97 | 1 | 49 |
| Language literals | 0 | ~0 | 0.00 | 0.00 | 0 | 0 |
| Datatypes | 110,424 | 5 | 1.10 | 0.31 | 1 | 3 |
| ASCII control chars | 0 | N/A | 0.00 | 0.00 | 0 | 0 |
| Quoted triples | 226,648 | N/A | 2.27 | 9.28 | 1 | 1,455 |
| Subjects | 246,103 | ~238,851 | 2.46 | 5.24 | 2 | 850 |
| Predicates | 257,646 | ~32 | 2.58 | 0.49 | 2 | 3 |
| Objects | 332,939 | ~52,703 | 3.33 | 5.46 | 1 | 853 |
| Graphs | 100,000 | ~1 | 1.00 | 0.00 | 1 | 1 |
| Statements | 226,648 | N/A | 2.27 | 9.28 | 1 | 1,455 |
| Bytes per statement | N/A | N/A | 291.52 | 1,282.61 | 0.78 | 311,033.00 |
Statistics for 10K distributions
- Title: Statistics for 10K distributions
| Statistic | Sum | Unique | Mean | St. dev. | Min. | Max. |
|---|---|---|---|---|---|---|
| IRIs | 49,533 | ~10,228 | 4.95 | 0.93 | 3 | 10 |
| Blank nodes | 0 | N/A | 0.00 | 0.00 | 0 | 0 |
| Literals | 19,576 | ~7,330 | 1.96 | 0.90 | 1 | 8 |
| Simple literals | 0 | ~0 | 0.00 | 0.00 | 0 | 0 |
| Datatype literals | 19,576 | ~7,330 | 1.96 | 0.90 | 1 | 8 |
| Language literals | 0 | ~0 | 0.00 | 0.00 | 0 | 0 |
| Datatypes | 11,195 | 3 | 1.12 | 0.33 | 1 | 3 |
| ASCII control chars | 0 | N/A | 0.00 | 0.00 | 0 | 0 |
| Quoted triples | 22,977 | N/A | 2.30 | 1.34 | 1 | 10 |
| Subjects | 23,762 | ~23,838 | 2.38 | 0.53 | 2 | 7 |
| Predicates | 26,100 | ~12 | 2.61 | 0.49 | 2 | 3 |
| Objects | 33,009 | ~7,548 | 3.30 | 1.39 | 1 | 13 |
| Graphs | 10,000 | ~1 | 1.00 | 0.00 | 1 | 1 |
| Statements | 22,977 | N/A | 2.30 | 1.34 | 1 | 10 |
| Bytes per statement | N/A | N/A | 293.82 | 197.27 | 11.50 | 2,500.00 |