yago-annotated-facts (development version)
This dataset is a subset of the YAGO 4 knowledge base (paper), which is based on Wikidata, in the version from February 24, 2020. It includes only the fact annotations expressed in RDF-star, that is, facts about facts. Each stream element corresponds to one item in Wikidata.
Info
Download this metadata in RDF: Turtle, N-Triples, RDF/XML, Jelly
Source repository: yago-annotated-facts
Stream preview
0000000000.ttl
<< <http://yago-knowledge.org/resource/_Q56236170> <http://schema.org/dissolutionDate> "2009"^^<http://www.w3.org/2001/XMLSchema#gYear> >>
<http://schema.org/endDate> "2009-12"^^<http://www.w3.org/2001/XMLSchema#gYearMonth> ;
<http://schema.org/startDate> "2009-06"^^<http://www.w3.org/2001/XMLSchema#gYearMonth> .
0000000010.ttl
<< <http://yago-knowledge.org/resource/Open_Science_Radio_Q18744554> <http://schema.org/creator> <http://yago-knowledge.org/resource/Konrad_Förstner_Q18744528> >>
<http://schema.org/startDate> "2014-01-19"^^<http://www.w3.org/2001/XMLSchema#date> .
<< <http://yago-knowledge.org/resource/Open_Science_Radio_Q18744554> <http://schema.org/creator> <http://yago-knowledge.org/resource/Matthias_Fromm_Q18748012> >>
<http://schema.org/startDate> "2013-01-02"^^<http://www.w3.org/2001/XMLSchema#date> .
0000000100.ttl
<< <http://yago-knowledge.org/resource/Carnaval_na_avenida_Central,_atual_avenida_Rio_Branco_Q65621070> <http://schema.org/dateCreated> "1906-06-22"^^<http://www.w3.org/2001/XMLSchema#date> >>
<http://schema.org/endDate> "1906"^^<http://www.w3.org/2001/XMLSchema#gYear> ;
<http://schema.org/startDate> "1906"^^<http://www.w3.org/2001/XMLSchema#gYear> .
0000001000.ttl
<< <http://yago-knowledge.org/resource/Margherita_Cagol> <http://schema.org/nationality> <http://yago-knowledge.org/resource/Italy> >>
<http://schema.org/endDate> "1975-06-05"^^<http://www.w3.org/2001/XMLSchema#date> ;
<http://schema.org/startDate> "1946-06-18"^^<http://www.w3.org/2001/XMLSchema#date> .
<< <http://yago-knowledge.org/resource/Margherita_Cagol> <http://schema.org/nationality> <http://yago-knowledge.org/resource/Kingdom_of_Italy> >>
<http://schema.org/endDate> "1946-06-18"^^<http://www.w3.org/2001/XMLSchema#date> ;
<http://schema.org/startDate> "1945-04-08"^^<http://www.w3.org/2001/XMLSchema#date> .
0000010000.ttl
<< <http://yago-knowledge.org/resource/Mihrengiz_Kadın> <http://schema.org/nationality> <http://yago-knowledge.org/resource/Ottoman_Empire> >>
<http://schema.org/endDate> "1923"^^<http://www.w3.org/2001/XMLSchema#gYear> ;
<http://schema.org/startDate> "1869"^^<http://www.w3.org/2001/XMLSchema#gYear> .
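Each preview element consists of one or more quoted triples (`<< s p o >>`) in the subject position, followed by the annotations attached to them. The following is a minimal sketch of how such an element could be taken apart in Python. It assumes the exact layout shown in the preview (full IRIs, plain or typed literals, no prefixes, no blank nodes) and is not a general Turtle-star parser; the names `parse_annotations` and `SAMPLE` are made up for this illustration.

```python
import re

# One RDF term as it appears in the preview: an IRI or a (possibly typed) literal.
TERM = r'(?:<[^>]*>|"(?:[^"\\]|\\.)*"(?:\^\^<[^>]*>)?)'
# A quoted triple in the subject position: << subject predicate object >>
QUOTED = re.compile(rf"<<\s*({TERM})\s+({TERM})\s+({TERM})\s*>>")
# One annotation: a predicate-object pair terminated by ';' or '.'
PAIR = re.compile(rf"({TERM})\s+({TERM})\s*[;.]")


def parse_annotations(turtle_star):
    """Yield (quoted_triple, predicate, object) for every annotation.

    quoted_triple is a (subject, predicate, object) tuple of raw term strings.
    Assumption: the text follows the layout of the stream preview.
    """
    matches = list(QUOTED.finditer(turtle_star))
    for i, match in enumerate(matches):
        # The annotations of a quoted triple run until the next quoted triple.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(turtle_star)
        body = turtle_star[match.end():end]
        for predicate, obj in PAIR.findall(body):
            yield match.groups(), predicate, obj


# First element of the stream preview (0000000000.ttl).
SAMPLE = """\
<< <http://yago-knowledge.org/resource/_Q56236170> <http://schema.org/dissolutionDate> "2009"^^<http://www.w3.org/2001/XMLSchema#gYear> >>
<http://schema.org/endDate> "2009-12"^^<http://www.w3.org/2001/XMLSchema#gYearMonth> ;
<http://schema.org/startDate> "2009-06"^^<http://www.w3.org/2001/XMLSchema#gYearMonth> .
"""

if __name__ == "__main__":
    for quoted, predicate, obj in parse_annotations(SAMPLE):
        print(quoted[0], predicate, obj)
```

For real workloads, an RDF library with RDF-star support is likely the better choice; the regular expressions here only serve to make the structure of an element explicit.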
General information
- Title: YAGO annotated facts (en)
- Identifier:
yago-annotated-facts
- Has version:
dev
- Theme:
- Encyclopaedia (eurovoc:4137)
- Metadata (eurovoc:c_40f54e0c)
- Open data (eurovoc:c_5ea6e5c4)
- Creator:
- The creators and contributors of Wikidata (1)
- Name: The creators and contributors of Wikidata
- Homepage: https://www.wikidata.org/
- The YAGO team of Télécom Paris and the Max Planck Institute for Informatics (2)
- Name: The YAGO team of Télécom Paris and the Max Planck Institute for Informatics
- Homepage: https://yago-knowledge.org/contributors
- Piotr Sowiński (3)
- Name: Piotr Sowiński
- Nickname: Ostrzyciel
- Homepage:
- License: https://spdx.org/licenses/CC-BY-SA-3.0
- Source:
- Date Issued: 2023-04-30
- Date Modified: 2023-12-01
- Landing page: yago-annotated-facts (dev)
- Conforms To: Metadata (https://w3id.org/riverbench/schema/metadata)
Technical metadata
- Has stream type usage:
- RDF stream type usage (1)
- Type: RDF stream type usage (stax:RdfStreamTypeUsage)
- Comment: The dataset can be viewed as a stream of graphs. Each graph corresponds to the RDF-star annotations of one Wikidata item. (en)
- Has stream type: RDF subject graph stream (stax:subjectGraphStream)
- RDF stream type usage (2)
- Type: RDF stream type usage (stax:RdfStreamTypeUsage)
- Comment: The dataset can be viewed as a flattened stream of triples. (en)
- Has stream type: Flat RDF triple stream (stax:flatTripleStream)
- Has stream element count: 617,768
- Has stream element split:
- Type: Stream elements split by topic (rb:TopicStreamElementSplit)
- Has subject shape:
- Comment: Custom target – subject of any quoted triple in the subject position. (en)
- Target custom: YAGO annotated facts target (rb:yagoTarget)
- Comment: Every stream element corresponds to one Wikidata item. (en)
- Uses vocabulary: http://schema.org/
- Conforms to W3C RDF 1.1 specification: no
- Conforms to W3C RDF-star draft specification as of December 17, 2021: yes
- Uses generalized triples: no
- Uses generalized RDF datasets: no
- Uses RDF-star: yes
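The topic split described above (one stream element per Wikidata item, keyed by the subject of the quoted triple in the subject position) can be illustrated on flat N-Triples-star lines. This is a rough sketch, not the procedure used to build the dataset: it assumes one statement per line, that each line starts with a quoted triple whose subject is an IRI, and that statements about the same item are contiguous. The function names are hypothetical.

```python
import re
from itertools import groupby

# Subject IRI of the quoted triple that opens an N-Triples-star line.
QUOTED_SUBJECT = re.compile(r"^<<\s*(<[^>]*>)")


def topic(line):
    """Return the subject of the quoted triple in the subject position."""
    match = QUOTED_SUBJECT.match(line)
    if match is None:
        raise ValueError(f"line does not start with a quoted triple: {line!r}")
    return match.group(1)


def split_by_topic(lines):
    """Group consecutive statements that share a topic into stream elements.

    Assumption: statements about one item are adjacent in the input, so a
    single pass with groupby is enough.
    """
    stripped = (line.strip() for line in lines)
    non_empty = (line for line in stripped if line)
    for subject, group in groupby(non_empty, key=topic):
        yield subject, list(group)
```

If the input were not ordered by item, the statements would have to be collected in a dictionary keyed by `topic(line)` instead.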
Distributions
Full stream distribution
- Title: Full stream distribution
- Identifier:
stream-full
- Has file name:
stream_full.tar.gz
- Has stream type usage:
- Type: RDF stream type usage (stax:RdfStreamTypeUsage)
- Comment: The dataset can be viewed as a stream of graphs. Each graph corresponds to the RDF-star annotations of one Wikidata item. (en)
- Has stream type: RDF subject graph stream (stax:subjectGraphStream)
- Has distribution type:
- Full distribution (rb:fullDistribution)
- Stream distribution (rb:streamDistribution)
- Has stream element count: 617,768
- Byte size: 36.16 MB
- Media type: text/turtle
- Packaging format: application/tar
- Compression format: application/gzip
- Checksum:
- Checksum (1)
- Type: Checksum (spdx:Checksum)
- ChecksumValue:
919810b0ba300c0962f9f01cf8e5e4b1
- Algorithm: ChecksumAlgorithm_md5 (spdx:checksumAlgorithm_md5)
- Checksum (2)
- Type: Checksum (spdx:Checksum)
- ChecksumValue:
a3098bb5abc1697f3d5cef2f682e15b6847f47ad
- Algorithm: ChecksumAlgorithm_sha1 (spdx:checksumAlgorithm_sha1)
- Download URL: https://w3id.org/riverbench/datasets/yago-annotated-facts/dev/files/stream_full.tar.gz
- Statistics: statistics-full
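The stream distribution is a gzip-compressed tar archive of Turtle files, one file per stream element. A minimal sketch of reading it element by element in Python, without unpacking the archive to disk, might look as follows. It assumes that every regular file in the archive is one element encoded as UTF-8 and that the archive order is the stream order; the function name `iter_stream_elements` is made up for this example.

```python
import tarfile


def iter_stream_elements(path=None, fileobj=None):
    """Yield (member_name, turtle_text) for each file in a .tar.gz stream.

    Pass either a file system path or an open binary file object.
    Assumption: each regular file in the archive is one stream element.
    """
    with tarfile.open(name=path, fileobj=fileobj, mode="r:gz") as archive:
        for member in archive:
            if not member.isfile():
                continue  # skip directories and other non-file entries
            with archive.extractfile(member) as handle:
                yield member.name, handle.read().decode("utf-8")


if __name__ == "__main__":
    # File name taken from the metadata above; download it first.
    for name, text in iter_stream_elements("stream_full.tar.gz"):
        print(name, len(text))
        break
```

Iterating over the archive keeps memory use bounded by the size of a single element, which matters more for the full distribution than for the 10K one.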
Full Jelly distribution
- Title: Full Jelly distribution
- Identifier:
jelly-full
- Has file name:
jelly_full.jelly.gz
- Has stream type usage:
- RDF stream type usage (1)
- Type: RDF stream type usage (stax:RdfStreamTypeUsage)
- Comment: The dataset can be viewed as a flattened stream of triples. (en)
- Has stream type: Flat RDF triple stream (stax:flatTripleStream)
- RDF stream type usage (2)
- Type: RDF stream type usage (stax:RdfStreamTypeUsage)
- Comment: The dataset can be viewed as a stream of graphs. Each graph corresponds to the RDF-star annotations of one Wikidata item. (en)
- Has stream type: RDF subject graph stream (stax:subjectGraphStream)
- Has distribution type:
- Full distribution (rb:fullDistribution)
- Jelly distribution (rb:jellyDistribution)
- Has stream element count: 617,768
- Byte size: 29.91 MB
- Media type: application/x-jelly-rdf
- Compression format: application/gzip
- Checksum:
- Checksum (1)
- Type: Checksum (spdx:Checksum)
- ChecksumValue:
3a45bfe44a486e8a2c23b7d2ee9ce452
- Algorithm: ChecksumAlgorithm_md5 (spdx:checksumAlgorithm_md5)
- Checksum (2)
- Type: Checksum (spdx:Checksum)
- ChecksumValue:
c1e592493e9571d3e5b4501a04fc4b6525013d89
- Algorithm: ChecksumAlgorithm_sha1 (spdx:checksumAlgorithm_sha1)
- Download URL: https://w3id.org/riverbench/datasets/yago-annotated-facts/dev/files/jelly_full.jelly.gz
- Statistics: statistics-full
Full flat distribution
- Title: Full flat distribution
- Identifier:
flat-full
- Has file name:
flat_full.nt.gz
- Has stream type usage:
- Type: RDF stream type usage (stax:RdfStreamTypeUsage)
- Comment: The dataset can be viewed as a flattened stream of triples. (en)
- Has stream type: Flat RDF triple stream (stax:flatTripleStream)
- Has distribution type:
- Flat distribution (rb:flatDistribution)
- Full distribution (rb:fullDistribution)
- Has stream element count: 617,768
- Byte size: 28.75 MB
- Media type: application/n-triples
- Compression format: application/gzip
- Checksum:
- Checksum (1)
- Type: Checksum (spdx:Checksum)
- ChecksumValue:
5cebabe22bfa4e4ee2cd0f1b2502547a
- Algorithm: ChecksumAlgorithm_md5 (spdx:checksumAlgorithm_md5)
- Checksum (2)
- Type: Checksum (spdx:Checksum)
- ChecksumValue:
f70d95958960eb247ecbe6df159a642abbb77152
- Algorithm: ChecksumAlgorithm_sha1 (spdx:checksumAlgorithm_sha1)
- Download URL: https://w3id.org/riverbench/datasets/yago-annotated-facts/dev/files/flat_full.nt.gz
- Statistics: statistics-full
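Every distribution lists an MD5 and a SHA-1 checksum, which can be used to check a download for corruption. The sketch below computes both digests in one pass using Python's standard `hashlib` module and compares them with the values listed for `flat_full.nt.gz`. The helper names `file_digests` and `verify` are illustrative, and the local file path is an assumption.

```python
import hashlib

# Checksums listed above for flat_full.nt.gz.
EXPECTED_FLAT_FULL = {
    "md5": "5cebabe22bfa4e4ee2cd0f1b2502547a",
    "sha1": "f70d95958960eb247ecbe6df159a642abbb77152",
}


def file_digests(path, chunk_size=1 << 20):
    """Return the MD5 and SHA-1 hex digests of a file, read in chunks."""
    md5 = hashlib.md5()
    sha1 = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            md5.update(chunk)
            sha1.update(chunk)
    return {"md5": md5.hexdigest(), "sha1": sha1.hexdigest()}


def verify(path, expected):
    """Return True if both digests of the file match the expected values."""
    return file_digests(path) == expected


if __name__ == "__main__":
    # Assumes the file was downloaded into the current directory.
    print(verify("flat_full.nt.gz", EXPECTED_FLAT_FULL))
```

The same function works for the other distributions when given the checksum values from their respective sections.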
100K elements stream distribution
- Title: 100K elements stream distribution
- Identifier:
stream-100k
- Has file name:
stream_100K.tar.gz
- Has stream type usage:
- Type: RDF stream type usage (stax:RdfStreamTypeUsage)
- Comment: The dataset can be viewed as a stream of graphs. Each graph corresponds to the RDF-star annotations of one Wikidata item. (en)
- Has stream type: RDF subject graph stream (stax:subjectGraphStream)
- Has distribution type:
- Partial distribution (rb:partialDistribution)
- Stream distribution (rb:streamDistribution)
- Has stream element count: 100,000
- Byte size: 3.57 MB
- Media type: text/turtle
- Packaging format: application/tar
- Compression format: application/gzip
- Checksum:
- Checksum (1)
- Type: Checksum (spdx:Checksum)
- ChecksumValue:
a3553791d17cf80132f30ed5a6670c95
- Algorithm: ChecksumAlgorithm_md5 (spdx:checksumAlgorithm_md5)
- Checksum (2)
- Type: Checksum (spdx:Checksum)
- ChecksumValue:
3cc1ebd96552f2c9b1bc042a799ab8f14160dbbb
- Algorithm: ChecksumAlgorithm_sha1 (spdx:checksumAlgorithm_sha1)
- Download URL: https://w3id.org/riverbench/datasets/yago-annotated-facts/dev/files/stream_100K.tar.gz
- Statistics: statistics-100k
100K elements Jelly distribution
- Title: 100K elements Jelly distribution
- Identifier:
jelly-100k
- Has file name:
jelly_100K.jelly.gz
- Has stream type usage:
- RDF stream type usage (1)
- Type: RDF stream type usage (stax:RdfStreamTypeUsage)
- Comment: The dataset can be viewed as a stream of graphs. Each graph corresponds to the RDF-star annotations of one Wikidata item. (en)
- Has stream type: RDF subject graph stream (stax:subjectGraphStream)
- RDF stream type usage (2)
- Type: RDF stream type usage (stax:RdfStreamTypeUsage)
- Comment: The dataset can be viewed as a flattened stream of triples. (en)
- Has stream type: Flat RDF triple stream (stax:flatTripleStream)
- Has distribution type:
- Jelly distribution (rb:jellyDistribution)
- Partial distribution (rb:partialDistribution)
- Has stream element count: 100,000
- Byte size: 2.98 MB
- Media type: application/x-jelly-rdf
- Compression format: application/gzip
- Checksum:
- Checksum (1)
- Type: Checksum (spdx:Checksum)
- ChecksumValue:
aeb910980d635f706e3c50f33b82f522
- Algorithm: ChecksumAlgorithm_md5 (spdx:checksumAlgorithm_md5)
- Checksum (2)
- Type: Checksum (spdx:Checksum)
- ChecksumValue:
5c327e6e23c01cb5356d2705e160b272875cab12
- Algorithm: ChecksumAlgorithm_sha1 (spdx:checksumAlgorithm_sha1)
- Download URL: https://w3id.org/riverbench/datasets/yago-annotated-facts/dev/files/jelly_100K.jelly.gz
- Statistics: statistics-100k
100K elements flat distribution
- Title: 100K elements flat distribution
- Identifier:
flat-100k
- Has file name:
flat_100K.nt.gz
- Has stream type usage:
- Type: RDF stream type usage (stax:RdfStreamTypeUsage)
- Comment: The dataset can be viewed as a flattened stream of triples. (en)
- Has stream type: Flat RDF triple stream (stax:flatTripleStream)
- Has distribution type:
- Flat distribution (rb:flatDistribution)
- Partial distribution (rb:partialDistribution)
- Has stream element count: 100,000
- Byte size: 2.38 MB
- Media type: application/n-triples
- Compression format: application/gzip
- Checksum:
- Checksum (1)
- Type: Checksum (spdx:Checksum)
- ChecksumValue:
459fcc6e818aba459271874ba7c01515
- Algorithm: ChecksumAlgorithm_md5 (spdx:checksumAlgorithm_md5)
- Checksum (2)
- Type: Checksum (spdx:Checksum)
- ChecksumValue:
ba8de0be6ca49a118aed54ca674c67e3947fcdec
- Algorithm: ChecksumAlgorithm_sha1 (spdx:checksumAlgorithm_sha1)
- Download URL: https://w3id.org/riverbench/datasets/yago-annotated-facts/dev/files/flat_100K.nt.gz
- Statistics: statistics-100k
10K elements stream distribution
- Title: 10K elements stream distribution
- Identifier:
stream-10k
- Has file name:
stream_10K.tar.gz
- Has stream type usage:
- Type: RDF stream type usage (stax:RdfStreamTypeUsage)
- Comment: The dataset can be viewed as a stream of graphs. Each graph corresponds to the RDF-star annotations of one Wikidata item. (en)
- Has stream type: RDF subject graph stream (stax:subjectGraphStream)
- Has distribution type:
- Partial distribution (rb:partialDistribution)
- Stream distribution (rb:streamDistribution)
- Has stream element count: 10,000
- Byte size: 376.48 KB
- Media type: text/turtle
- Packaging format: application/tar
- Compression format: application/gzip
- Checksum:
- Checksum (1)
- Type: Checksum (spdx:Checksum)
- ChecksumValue:
068ae9fc47cbed7dc8e6e57fe368a0bf
- Algorithm: ChecksumAlgorithm_md5 (spdx:checksumAlgorithm_md5)
- Checksum (2)
- Type: Checksum (spdx:Checksum)
- ChecksumValue:
e19961eed6e9fd5c16f9611e702ceabdb63ff747
- Algorithm: ChecksumAlgorithm_sha1 (spdx:checksumAlgorithm_sha1)
- Download URL: https://w3id.org/riverbench/datasets/yago-annotated-facts/dev/files/stream_10K.tar.gz
- Statistics: statistics-10k
10K elements Jelly distribution
- Title: 10K elements Jelly distribution
- Identifier:
jelly-10k
- Has file name:
jelly_10K.jelly.gz
- Has stream type usage:
- RDF stream type usage (1)
- Type: RDF stream type usage (stax:RdfStreamTypeUsage)
- Comment: The dataset can be viewed as a stream of graphs. Each graph corresponds to the RDF-star annotations of one Wikidata item. (en)
- Has stream type: RDF subject graph stream (stax:subjectGraphStream)
- RDF stream type usage (2)
- Type: RDF stream type usage (stax:RdfStreamTypeUsage)
- Comment: The dataset can be viewed as a flattened stream of triples. (en)
- Has stream type: Flat RDF triple stream (stax:flatTripleStream)
- Has distribution type:
- Jelly distribution (rb:jellyDistribution)
- Partial distribution (rb:partialDistribution)
- Has stream element count: 10,000
- Byte size: 301.51 KB
- Media type: application/x-jelly-rdf
- Compression format: application/gzip
- Checksum:
- Checksum (1)
- Type: Checksum (spdx:Checksum)
- ChecksumValue:
906021e7c741953561d54ee08a532abb
- Algorithm: ChecksumAlgorithm_md5 (spdx:checksumAlgorithm_md5)
- Checksum (2)
- Type: Checksum (spdx:Checksum)
- ChecksumValue:
2828def90c036707ef7214ca86ccbe47e7764e58
- Algorithm: ChecksumAlgorithm_sha1 (spdx:checksumAlgorithm_sha1)
- Download URL: https://w3id.org/riverbench/datasets/yago-annotated-facts/dev/files/jelly_10K.jelly.gz
- Statistics: statistics-10k
10K elements flat distribution
- Title: 10K elements flat distribution
- Identifier:
flat-10k
- Has file name:
flat_10K.nt.gz
- Has stream type usage:
- Type: RDF stream type usage (stax:RdfStreamTypeUsage)
- Comment: The dataset can be viewed as a flattened stream of triples. (en)
- Has stream type: Flat RDF triple stream (stax:flatTripleStream)
- Has distribution type:
- Flat distribution (rb:flatDistribution)
- Partial distribution (rb:partialDistribution)
- Has stream element count: 10,000
- Byte size: 256.83 KB
- Media type: application/n-triples
- Compression format: application/gzip
- Checksum:
- Checksum (1)
- Type: Checksum (spdx:Checksum)
- ChecksumValue:
edaa110cde79d3c57846c39ea1ec7827
- Algorithm: ChecksumAlgorithm_md5 (spdx:checksumAlgorithm_md5)
- Checksum (2)
- Type: Checksum (spdx:Checksum)
- ChecksumValue:
82db849d4410765888daba032cea8d635b6f0be6
- Algorithm: ChecksumAlgorithm_sha1 (spdx:checksumAlgorithm_sha1)
- Download URL: https://w3id.org/riverbench/datasets/yago-annotated-facts/dev/files/flat_10K.nt.gz
- Statistics: statistics-10k
Statistics
Statistics for full distributions
- Title: Statistics for full distributions
Statistic | Sum | Unique | Mean | St. dev. | Min. | Max. |
---|---|---|---|---|---|---|
IRIs | 3,631,687 | 594,855 | 5.88 | 3.22 | 3 | 853 |
Blank nodes | 0 | N/A | 0.00 | 0.00 | 0 | 0 |
Graphs | 617,768 | N/A | 1.00 | 0.00 | 1 | 1 |
Statements | 2,484,547 | N/A | 4.02 | 6.10 | 1 | 1,455 |
Literals | 1,736,327 | 57,578 | 2.81 | 2.50 | 1 | 66 |
Simple literals | 211 | 174 | 0.00 | 0.02 | 0 | 3 |
Datatype literals | 1,736,116 | 57,405 | 2.81 | 2.50 | 1 | 66 |
Language literals | 0 | 0 | 0.00 | 0.00 | 0 | 0 |
Quoted triples | 2,484,547 | N/A | 4.02 | 6.10 | 1 | 1,455 |
Subjects | 2,009,932 | N/A | 3.25 | 3.04 | 2 | 850 |
Predicates | 1,622,855 | N/A | 2.63 | 0.48 | 2 | 3 |
Objects | 3,127,393 | N/A | 5.06 | 5.06 | 1 | 853 |
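The Mean column appears to be the Sum divided by the number of stream elements (617,768 for the full distributions), rounded to two decimals. This reading is an assumption, not something the page states, but the following small check reproduces the reported means for several rows of the table above.

```python
# Number of stream elements in the full distributions.
ELEMENTS = 617_768

# (sum, reported mean) pairs copied from the table for the full distributions.
ROWS = {
    "IRIs": (3_631_687, 5.88),
    "Graphs": (617_768, 1.00),
    "Statements": (2_484_547, 4.02),
    "Literals": (1_736_327, 2.81),
    "Subjects": (2_009_932, 3.25),
    "Predicates": (1_622_855, 2.63),
    "Objects": (3_127_393, 5.06),
}


def mean_per_element(total, elements=ELEMENTS):
    """Average count per stream element, rounded like the table."""
    return round(total / elements, 2)


if __name__ == "__main__":
    for name, (total, reported) in ROWS.items():
        computed = mean_per_element(total)
        status = "ok" if computed == reported else "mismatch"
        print(f"{name}: computed {computed:.2f}, reported {reported:.2f} ({status})")
```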
Statistics for 100K distributions
- Title: Statistics for 100K distributions
Statistic | Sum | Unique | Mean | St. dev. | Min. | Max. |
---|---|---|---|---|---|---|
IRIs | 502,972 | 102,657 | 5.03 | 5.30 | 3 | 853 |
Blank nodes | 0 | N/A | 0.00 | 0.00 | 0 | 0 |
Graphs | 100,000 | N/A | 1.00 | 0.00 | 1 | 1 |
Statements | 226,648 | N/A | 2.27 | 9.28 | 1 | 1,455 |
Literals | 187,612 | 37,329 | 1.88 | 0.98 | 1 | 49 |
Simple literals | 66 | 66 | 0.00 | 0.03 | 0 | 3 |
Datatype literals | 187,546 | 37,263 | 1.88 | 0.97 | 1 | 49 |
Language literals | 0 | 0 | 0.00 | 0.00 | 0 | 0 |
Quoted triples | 226,648 | N/A | 2.27 | 9.28 | 1 | 1,455 |
Subjects | 246,103 | N/A | 2.46 | 5.24 | 2 | 850 |
Predicates | 257,646 | N/A | 2.58 | 0.49 | 2 | 3 |
Objects | 332,939 | N/A | 3.33 | 5.46 | 1 | 853 |
Statistics for 10K distributions
- Title: Statistics for 10K distributions
Statistic | Sum | Unique | Mean | St. dev. | Min. | Max. |
---|---|---|---|---|---|---|
IRIs | 49,533 | 10,233 | 4.95 | 0.93 | 3 | 10 |
Blank nodes | 0 | N/A | 0.00 | 0.00 | 0 | 0 |
Graphs | 10,000 | N/A | 1.00 | 0.00 | 1 | 1 |
Statements | 22,977 | N/A | 2.30 | 1.34 | 1 | 10 |
Literals | 19,576 | 7,332 | 1.96 | 0.90 | 1 | 8 |
Simple literals | 0 | 0 | 0.00 | 0.00 | 0 | 0 |
Datatype literals | 19,576 | 7,332 | 1.96 | 0.90 | 1 | 8 |
Language literals | 0 | 0 | 0.00 | 0.00 | 0 | 0 |
Quoted triples | 22,977 | N/A | 2.30 | 1.34 | 1 | 10 |
Subjects | 23,762 | N/A | 2.38 | 0.53 | 2 | 7 |
Predicates | 26,100 | N/A | 2.61 | 0.49 | 2 | 3 |
Objects | 33,009 | N/A | 3.30 | 1.39 | 1 | 13 |