yago-annotated-facts (development version)
This is a subset of the YAGO 4 knowledge base (paper), which is based on Wikidata, in the version from February 24, 2020. The dataset includes only the fact annotations expressed in RDF-star, that is, facts about facts. Each stream element corresponds to one item in Wikidata.
Info
Download this metadata in RDF: Turtle, N-Triples, RDF/XML
Source repository: yago-annotated-facts
General information
- Title: YAGO annotated facts
- Identifier: yago-annotated-facts
- Has version: dev
- Theme: Encyclopedic (rbt:encyclopedic)
- Creator:
- The creators and contributors of Wikidata (1)
- Name: The creators and contributors of Wikidata
- Homepage: https://www.wikidata.org/
- The YAGO team of Télécom Paris and the Max Planck Institute for Informatics (2)
- Name: The YAGO team of Télécom Paris and the Max Planck Institute for Informatics
- Homepage: https://yago-knowledge.org/contributors
- Piotr Sowiński (3)
- Name: Piotr Sowiński
- Nickname: Ostrzyciel
- Homepage:
- License: https://spdx.org/licenses/CC-BY-SA-3.0
- Source:
- Date Issued: 2023-04-30
- Date Modified: 2023-05-08
- Landing page: yago-annotated-facts (dev)
- Conforms To: Metadata (https://w3id.org/riverbench/schema/metadata)
Technical metadata
- Has stream element type: Triples (rb:triples)
- Has stream element count: 617,768
- Has stream element split:
- Type: Stream elements split by topic (rb:TopicStreamElementSplit)
- Comment: Every stream element corresponds to one Wikidata item.
- Uses ontology: http://schema.org/
- Conforms to W3C RDF 1.1 specification: no
- Conforms to W3C RDF-star draft specification as of December 17, 2021: yes
- Uses generalized triples: no
- Uses generalized RDF datasets: no
- Uses RDF-star: yes
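To make the "facts about facts" structure more concrete, the following is a minimal sketch of what one annotation could look like in N-Triples-star and how its parts might be separated. The IRIs and the literal value are invented for illustration (they are not taken from the dataset), `split_annotation` is a hypothetical helper, and the regular expression only handles the simple shape of a quoted triple in the subject position without nesting. It is not a general RDF-star parser.

```python
import re

# Hypothetical N-Triples-star line; the IRIs and the value are invented for
# illustration and are not taken from the dataset.
EXAMPLE_LINE = (
    '<< <http://example.org/Alice> <http://example.org/memberOf> '
    '<http://example.org/Band> >> '
    '<http://example.org/startDate> '
    '"2001"^^<http://www.w3.org/2001/XMLSchema#gYear> .'
)

# Matches only the simple shape "<< s p o >> p2 o2 ." with no nested quoted triples.
ANNOTATION = re.compile(
    r"^<<\s*(?P<quoted>.+?)\s*>>\s+(?P<predicate>\S+)\s+(?P<object>.+?)\s*\.$"
)


def split_annotation(line):
    """Split one annotation line into (quoted triple, predicate, object)."""
    match = ANNOTATION.match(line.strip())
    if match is None:
        raise ValueError("not a simple quoted-triple annotation")
    return match.group("quoted"), match.group("predicate"), match.group("object")
```

For real processing, an RDF library with RDF-star support is the safer choice; the sketch is only meant to show that the annotated fact sits inside `<< ... >>` and the annotation is attached to it as an ordinary predicate and object.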
Distributions
Full triple stream distribution
- Title: Full triple stream distribution
- Identifier: stream-full
- Has file name: stream_full.tar.gz
- Has distribution type:
- Full distribution (rb:fullDistribution)
- Triple stream distribution (rb:tripleStreamDistribution)
- Has stream element count: 617,768
- Byte size: 36.16 MB
- Media type: text/turtle
- Packaging format: application/tar
- Compression format: application/gzip
- Checksum:
- Checksum (1)
- Type: Checksum (spdx:Checksum)
- ChecksumValue: c9e06af6b58ed50dd1a5a7a4b778849c
- Algorithm: ChecksumAlgorithm_md5 (spdx:checksumAlgorithm_md5)
- Checksum (2)
- Type: Checksum (spdx:Checksum)
- ChecksumValue: b21f2846fab96a6054f813de27e992efe38404c8
- Algorithm: ChecksumAlgorithm_sha1 (spdx:checksumAlgorithm_sha1)
- Download URL: https://w3id.org/riverbench/datasets/yago-annotated-facts/dev/files/stream_full.tar.gz
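The published checksums can be used to confirm that a download is intact. The following is a minimal sketch using Python's standard `hashlib` module; the function names `file_digest` and `verify_download` are illustrative, and the local file path is whatever the archive was saved as.

```python
import hashlib

# Checksums published on this page for stream_full.tar.gz.
EXPECTED_MD5 = "c9e06af6b58ed50dd1a5a7a4b778849c"
EXPECTED_SHA1 = "b21f2846fab96a6054f813de27e992efe38404c8"


def file_digest(path, algorithm, chunk_size=1 << 20):
    """Return the hex digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_md5=EXPECTED_MD5, expected_sha1=EXPECTED_SHA1):
    """Return True only if both published checksums match the local file."""
    return (
        file_digest(path, "md5") == expected_md5
        and file_digest(path, "sha1") == expected_sha1
    )
```

For example, `verify_download("stream_full.tar.gz")` would return `True` for an intact copy. The same functions work for the other distributions when the expected values listed for them are passed in.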
Has statistics
IRI count statistics
- Type: IRI count statistics (rb:IriCountStatistics)
- Sum: 3,631,687
- Unique count (estimated): 594,855
- Mean: 5.88
- Standard deviation: 3.22
- Minimum: 3
- Maximum: 853
Blank node count statistics
- Type: Blank node count statistics (rb:BlankNodeCountStatistics)
- Sum: 0
- Mean: 0.00
- Standard deviation: 0.00
- Minimum: 0
- Maximum: 0
Literal count statistics
- Type: Literal count statistics (rb:LiteralCountStatistics)
- Sum: 1,736,327
- Unique count (estimated): 57,578
- Mean: 2.81
- Standard deviation: 2.50
- Minimum: 1
- Maximum: 66
Simple literal count statistics
- Type: Simple literal count statistics (rb:SimpleLiteralCountStatistics)
- Sum: 211
- Unique count (estimated): 174
- Mean: 0.00
- Standard deviation: 0.02
- Minimum: 0
- Maximum: 3
Datatype literal count statistics
- Type: Datatype literal count statistics (rb:DatatypeLiteralCountStatistics)
- Sum: 1,736,116
- Unique count (estimated): 57,405
- Mean: 2.81
- Standard deviation: 2.50
- Minimum: 1
- Maximum: 66
Language string count statistics
- Type: Language string count statistics (rb:LanguageLiteralCountStatistics)
- Sum: 0
- Unique count (estimated): 0
- Mean: 0.00
- Standard deviation: 0.00
- Minimum: 0
- Maximum: 0
Quoted triple count statistics
- Type: Quoted triple count statistics (rb:QuotedTripleCountStatistics)
- Sum: 2,484,547
- Mean: 4.02
- Standard deviation: 6.10
- Minimum: 1
- Maximum: 1,455
Subject count statistics
- Type: Subject count statistics (rb:SubjectCountStatistics)
- Sum: 2,009,932
- Mean: 3.25
- Standard deviation: 3.04
- Minimum: 2
- Maximum: 850
Predicate count statistics
- Type: Predicate count statistics (rb:PredicateCountStatistics)
- Sum: 1,622,855
- Mean: 2.63
- Standard deviation: 0.48
- Minimum: 2
- Maximum: 3
Object count statistics
- Type: Object count statistics (rb:ObjectCountStatistics)
- Sum: 3,127,393
- Mean: 5.06
- Standard deviation: 5.06
- Minimum: 1
- Maximum: 853
Graph count statistics
- Type: Graph count statistics (rb:GraphCountStatistics)
- Sum: 0
- Mean: 0.00
- Standard deviation: 0.00
- Minimum: 0
- Maximum: 0
Statement count statistics
- Type: Statement count statistics (rb:StatementCountStatistics)
- Sum: 2,484,547
- Mean: 4.02
- Standard deviation: 6.10
- Minimum: 1
- Maximum: 1,455
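The reported means appear to be per-stream-element averages, that is, each sum divided by the stream element count of the distribution. That reading is an inference from the numbers rather than something the page states; the sketch below checks it against the figures above for the full distribution, and also checks that the three literal categories add up to the literal total.

```python
ELEMENT_COUNT = 617_768  # stream element count of the full distribution

# (sum, reported mean) pairs copied from the statistics above.
REPORTED = {
    "IRI": (3_631_687, 5.88),
    "Literal": (1_736_327, 2.81),
    "Quoted triple": (2_484_547, 4.02),
    "Subject": (2_009_932, 3.25),
    "Predicate": (1_622_855, 2.63),
    "Object": (3_127_393, 5.06),
}


def mean_matches(total, reported_mean, count=ELEMENT_COUNT, tolerance=0.005):
    """True if total / count agrees with the reported two-decimal mean."""
    return abs(total / count - reported_mean) <= tolerance


ALL_MEANS_CONSISTENT = all(
    mean_matches(total, mean) for total, mean in REPORTED.values()
)

# Simple, datatype and language literals together account for every literal.
LITERALS_ADD_UP = 211 + 1_736_116 + 0 == 1_736_327
```

For instance, 3,631,687 IRIs over 617,768 stream elements gives roughly 5.879, which rounds to the reported 5.88.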
Full flat distribution
- Title: Full flat distribution
- Identifier: flat-full
- Has file name: flat_full.nt.gz
- Has distribution type:
- Flat distribution (rb:flatDistribution)
- Full distribution (rb:fullDistribution)
- Has stream element count: 617,768
- Byte size: 28.75 MB
- Media type: application/n-triples
- Compression format: application/gzip
- Checksum:
- Checksum (1)
- Type: Checksum (spdx:Checksum)
- ChecksumValue: 5cebabe22bfa4e4ee2cd0f1b2502547a
- Algorithm: ChecksumAlgorithm_md5 (spdx:checksumAlgorithm_md5)
- Checksum (2)
- Type: Checksum (spdx:Checksum)
- ChecksumValue: f70d95958960eb247ecbe6df159a642abbb77152
- Algorithm: ChecksumAlgorithm_sha1 (spdx:checksumAlgorithm_sha1)
- Download URL: https://w3id.org/riverbench/datasets/yago-annotated-facts/dev/files/flat_full.nt.gz
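Because the flat distribution is a single gzip-compressed N-Triples file, it can be scanned line by line without unpacking it to disk first. The sketch below counts statements under the assumption that every non-empty, non-comment line holds exactly one statement, which is how N-Triples is laid out; `count_statements` is an illustrative name.

```python
import gzip


def count_statements(path):
    """Count statements in a gzipped N-Triples file (one statement per line)."""
    count = 0
    with gzip.open(path, mode="rt", encoding="utf-8") as lines:
        for line in lines:
            stripped = line.strip()
            # Skip blank lines and comment lines, which N-Triples permits.
            if stripped and not stripped.startswith("#"):
                count += 1
    return count
```

If the assumption holds, `count_statements("flat_full.nt.gz")` should agree with the statement count sum reported in the statistics for this distribution.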
Has statistics
IRI count statistics
- Type: IRI count statistics (rb:IriCountStatistics)
- Sum: 3,631,687
- Unique count (estimated): 594,855
- Mean: 5.88
- Standard deviation: 3.22
- Minimum: 3
- Maximum: 853
Blank node count statistics
- Type: Blank node count statistics (rb:BlankNodeCountStatistics)
- Sum: 0
- Mean: 0.00
- Standard deviation: 0.00
- Minimum: 0
- Maximum: 0
Literal count statistics
- Type: Literal count statistics (rb:LiteralCountStatistics)
- Sum: 1,736,327
- Unique count (estimated): 57,578
- Mean: 2.81
- Standard deviation: 2.50
- Minimum: 1
- Maximum: 66
Simple literal count statistics
- Type: Simple literal count statistics (rb:SimpleLiteralCountStatistics)
- Sum: 211
- Unique count (estimated): 174
- Mean: 0.00
- Standard deviation: 0.02
- Minimum: 0
- Maximum: 3
Datatype literal count statistics
- Type: Datatype literal count statistics (rb:DatatypeLiteralCountStatistics)
- Sum: 1,736,116
- Unique count (estimated): 57,405
- Mean: 2.81
- Standard deviation: 2.50
- Minimum: 1
- Maximum: 66
Language string count statistics
- Type: Language string count statistics (rb:LanguageLiteralCountStatistics)
- Sum: 0
- Unique count (estimated): 0
- Mean: 0.00
- Standard deviation: 0.00
- Minimum: 0
- Maximum: 0
Quoted triple count statistics
- Type: Quoted triple count statistics (rb:QuotedTripleCountStatistics)
- Sum: 2,484,547
- Mean: 4.02
- Standard deviation: 6.10
- Minimum: 1
- Maximum: 1,455
Subject count statistics
- Type: Subject count statistics (rb:SubjectCountStatistics)
- Sum: 2,009,932
- Mean: 3.25
- Standard deviation: 3.04
- Minimum: 2
- Maximum: 850
Predicate count statistics
- Type: Predicate count statistics (rb:PredicateCountStatistics)
- Sum: 1,622,855
- Mean: 2.63
- Standard deviation: 0.48
- Minimum: 2
- Maximum: 3
Object count statistics
- Type: Object count statistics (rb:ObjectCountStatistics)
- Sum: 3,127,393
- Mean: 5.06
- Standard deviation: 5.06
- Minimum: 1
- Maximum: 853
Graph count statistics
- Type: Graph count statistics (rb:GraphCountStatistics)
- Sum: 0
- Mean: 0.00
- Standard deviation: 0.00
- Minimum: 0
- Maximum: 0
Statement count statistics
- Type: Statement count statistics (rb:StatementCountStatistics)
- Sum: 2,484,547
- Mean: 4.02
- Standard deviation: 6.10
- Minimum: 1
- Maximum: 1,455
100K elements triple stream distribution
- Title: 100K elements triple stream distribution
- Identifier: stream-100k
- Has file name: stream_100K.tar.gz
- Has distribution type:
- Partial distribution (rb:partialDistribution)
- Triple stream distribution (rb:tripleStreamDistribution)
- Has stream element count: 100,000
- Byte size: 3.57 MB
- Media type: text/turtle
- Packaging format: application/tar
- Compression format: application/gzip
- Checksum:
- Checksum (1)
- Type: Checksum (spdx:Checksum)
- ChecksumValue: 93db75a276ca63541bfb58f85dd1a4cb
- Algorithm: ChecksumAlgorithm_md5 (spdx:checksumAlgorithm_md5)
- Checksum (2)
- Type: Checksum (spdx:Checksum)
- ChecksumValue: ae652aa0df32689cb6b31ab652149c3fabe95216
- Algorithm: ChecksumAlgorithm_sha1 (spdx:checksumAlgorithm_sha1)
- Download URL: https://w3id.org/riverbench/datasets/yago-annotated-facts/dev/files/stream_100K.tar.gz
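The smaller stream samples are convenient for trying out a reader before moving to the full archive. The sketch below iterates over a stream distribution with Python's standard `tarfile` module. It assumes that every regular file inside the archive holds one stream element serialized as Turtle; the page gives the packaging and media type but not the internal layout, so treat that as an assumption. `iter_stream_elements` is an illustrative name.

```python
import tarfile


def iter_stream_elements(archive_path):
    """Yield (member name, Turtle text) for each regular file in the archive.

    Assumes one stream element per archive member.
    """
    with tarfile.open(archive_path, mode="r:gz") as archive:
        for member in archive:
            # Directories and other non-file entries carry no stream element.
            if not member.isfile():
                continue
            extracted = archive.extractfile(member)
            yield member.name, extracted.read().decode("utf-8")
```

Under that assumption, consuming the generator for `stream_100K.tar.gz` would yield 100,000 items, matching the stream element count listed above. The Turtle text of each item still needs an RDF-star-aware parser to be turned into triples.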
Has statistics
IRI count statistics
- Type: IRI count statistics (rb:IriCountStatistics)
- Sum: 502,972
- Unique count (estimated): 102,657
- Mean: 5.03
- Standard deviation: 5.30
- Minimum: 3
- Maximum: 853
Blank node count statistics
- Type: Blank node count statistics (rb:BlankNodeCountStatistics)
- Sum: 0
- Mean: 0.00
- Standard deviation: 0.00
- Minimum: 0
- Maximum: 0
Literal count statistics
- Type: Literal count statistics (rb:LiteralCountStatistics)
- Sum: 187,612
- Unique count (estimated): 37,329
- Mean: 1.88
- Standard deviation: 0.98
- Minimum: 1
- Maximum: 49
Simple literal count statistics
- Type: Simple literal count statistics (rb:SimpleLiteralCountStatistics)
- Sum: 66
- Unique count (estimated): 66
- Mean: 0.00
- Standard deviation: 0.03
- Minimum: 0
- Maximum: 3
Datatype literal count statistics
- Type: Datatype literal count statistics (rb:DatatypeLiteralCountStatistics)
- Sum: 187,546
- Unique count (estimated): 37,263
- Mean: 1.88
- Standard deviation: 0.97
- Minimum: 1
- Maximum: 49
Language string count statistics
- Type: Language string count statistics (rb:LanguageLiteralCountStatistics)
- Sum: 0
- Unique count (estimated): 0
- Mean: 0.00
- Standard deviation: 0.00
- Minimum: 0
- Maximum: 0
Quoted triple count statistics
- Type: Quoted triple count statistics (rb:QuotedTripleCountStatistics)
- Sum: 226,648
- Mean: 2.27
- Standard deviation: 9.28
- Minimum: 1
- Maximum: 1,455
Subject count statistics
- Type: Subject count statistics (rb:SubjectCountStatistics)
- Sum: 246,103
- Mean: 2.46
- Standard deviation: 5.24
- Minimum: 2
- Maximum: 850
Predicate count statistics
- Type: Predicate count statistics (rb:PredicateCountStatistics)
- Sum: 257,646
- Mean: 2.58
- Standard deviation: 0.49
- Minimum: 2
- Maximum: 3
Object count statistics
- Type: Object count statistics (rb:ObjectCountStatistics)
- Sum: 332,939
- Mean: 3.33
- Standard deviation: 5.46
- Minimum: 1
- Maximum: 853
Graph count statistics
- Type: Graph count statistics (rb:GraphCountStatistics)
- Sum: 0
- Mean: 0.00
- Standard deviation: 0.00
- Minimum: 0
- Maximum: 0
Statement count statistics
- Type: Statement count statistics (rb:StatementCountStatistics)
- Sum: 226,648
- Mean: 2.27
- Standard deviation: 9.28
- Minimum: 1
- Maximum: 1,455
100K elements flat distribution
- Title: 100K elements flat distribution
- Identifier: flat-100k
- Has file name: flat_100K.nt.gz
- Has distribution type:
- Flat distribution (rb:flatDistribution)
- Partial distribution (rb:partialDistribution)
- Has stream element count: 100,000
- Byte size: 2.38 MB
- Media type: application/n-triples
- Compression format: application/gzip
- Checksum:
- Checksum (1)
- Type: Checksum (spdx:Checksum)
- ChecksumValue: 459fcc6e818aba459271874ba7c01515
- Algorithm: ChecksumAlgorithm_md5 (spdx:checksumAlgorithm_md5)
- Checksum (2)
- Type: Checksum (spdx:Checksum)
- ChecksumValue: ba8de0be6ca49a118aed54ca674c67e3947fcdec
- Algorithm: ChecksumAlgorithm_sha1 (spdx:checksumAlgorithm_sha1)
- Download URL: https://w3id.org/riverbench/datasets/yago-annotated-facts/dev/files/flat_100K.nt.gz
Has statistics
IRI count statistics
- Type: IRI count statistics (rb:IriCountStatistics)
- Sum: 502,972
- Unique count (estimated): 102,657
- Mean: 5.03
- Standard deviation: 5.30
- Minimum: 3
- Maximum: 853
Blank node count statistics
- Type: Blank node count statistics (rb:BlankNodeCountStatistics)
- Sum: 0
- Mean: 0.00
- Standard deviation: 0.00
- Minimum: 0
- Maximum: 0
Literal count statistics
- Type: Literal count statistics (rb:LiteralCountStatistics)
- Sum: 187,612
- Unique count (estimated): 37,329
- Mean: 1.88
- Standard deviation: 0.98
- Minimum: 1
- Maximum: 49
Simple literal count statistics
- Type: Simple literal count statistics (rb:SimpleLiteralCountStatistics)
- Sum: 66
- Unique count (estimated): 66
- Mean: 0.00
- Standard deviation: 0.03
- Minimum: 0
- Maximum: 3
Datatype literal count statistics
- Type: Datatype literal count statistics (rb:DatatypeLiteralCountStatistics)
- Sum: 187,546
- Unique count (estimated): 37,263
- Mean: 1.88
- Standard deviation: 0.97
- Minimum: 1
- Maximum: 49
Language string count statistics
- Type: Language string count statistics (rb:LanguageLiteralCountStatistics)
- Sum: 0
- Unique count (estimated): 0
- Mean: 0.00
- Standard deviation: 0.00
- Minimum: 0
- Maximum: 0
Quoted triple count statistics
- Type: Quoted triple count statistics (rb:QuotedTripleCountStatistics)
- Sum: 226,648
- Mean: 2.27
- Standard deviation: 9.28
- Minimum: 1
- Maximum: 1,455
Subject count statistics
- Type: Subject count statistics (rb:SubjectCountStatistics)
- Sum: 246,103
- Mean: 2.46
- Standard deviation: 5.24
- Minimum: 2
- Maximum: 850
Predicate count statistics
- Type: Predicate count statistics (rb:PredicateCountStatistics)
- Sum: 257,646
- Mean: 2.58
- Standard deviation: 0.49
- Minimum: 2
- Maximum: 3
Object count statistics
- Type: Object count statistics (rb:ObjectCountStatistics)
- Sum: 332,939
- Mean: 3.33
- Standard deviation: 5.46
- Minimum: 1
- Maximum: 853
Graph count statistics
- Type: Graph count statistics (rb:GraphCountStatistics)
- Sum: 0
- Mean: 0.00
- Standard deviation: 0.00
- Minimum: 0
- Maximum: 0
Statement count statistics
- Type: Statement count statistics (rb:StatementCountStatistics)
- Sum: 226,648
- Mean: 2.27
- Standard deviation: 9.28
- Minimum: 1
- Maximum: 1,455
10K elements triple stream distribution
- Title: 10K elements triple stream distribution
- Identifier: stream-10k
- Has file name: stream_10K.tar.gz
- Has distribution type:
- Partial distribution (rb:partialDistribution)
- Triple stream distribution (rb:tripleStreamDistribution)
- Has stream element count: 10,000
- Byte size: 376.46 KB
- Media type: text/turtle
- Packaging format: application/tar
- Compression format: application/gzip
- Checksum:
- Checksum (1)
- Type: Checksum (spdx:Checksum)
- ChecksumValue: 7e304d69042c206d4db615a9886e5db0
- Algorithm: ChecksumAlgorithm_md5 (spdx:checksumAlgorithm_md5)
- Checksum (2)
- Type: Checksum (spdx:Checksum)
- ChecksumValue: 3f32b11ba612cc400ba7e95973e69318d729d2e2
- Algorithm: ChecksumAlgorithm_sha1 (spdx:checksumAlgorithm_sha1)
- Download URL: https://w3id.org/riverbench/datasets/yago-annotated-facts/dev/files/stream_10K.tar.gz
Has statistics
IRI count statistics
- Type: IRI count statistics (rb:IriCountStatistics)
- Sum: 49,533
- Unique count (estimated): 10,233
- Mean: 4.95
- Standard deviation: 0.93
- Minimum: 3
- Maximum: 10
Blank node count statistics
- Type: Blank node count statistics (rb:BlankNodeCountStatistics)
- Sum: 0
- Mean: 0.00
- Standard deviation: 0.00
- Minimum: 0
- Maximum: 0
Literal count statistics
- Type: Literal count statistics (rb:LiteralCountStatistics)
- Sum: 19,576
- Unique count (estimated): 7,332
- Mean: 1.96
- Standard deviation: 0.90
- Minimum: 1
- Maximum: 8
Simple literal count statistics
- Type: Simple literal count statistics (rb:SimpleLiteralCountStatistics)
- Sum: 0
- Unique count (estimated): 0
- Mean: 0.00
- Standard deviation: 0.00
- Minimum: 0
- Maximum: 0
Datatype literal count statistics
- Type: Datatype literal count statistics (rb:DatatypeLiteralCountStatistics)
- Sum: 19,576
- Unique count (estimated): 7,332
- Mean: 1.96
- Standard deviation: 0.90
- Minimum: 1
- Maximum: 8
Language string count statistics
- Type: Language string count statistics (rb:LanguageLiteralCountStatistics)
- Sum: 0
- Unique count (estimated): 0
- Mean: 0.00
- Standard deviation: 0.00
- Minimum: 0
- Maximum: 0
Quoted triple count statistics
- Type: Quoted triple count statistics (rb:QuotedTripleCountStatistics)
- Sum: 22,977
- Mean: 2.30
- Standard deviation: 1.34
- Minimum: 1
- Maximum: 10
Subject count statistics
- Type: Subject count statistics (rb:SubjectCountStatistics)
- Sum: 23,762
- Mean: 2.38
- Standard deviation: 0.53
- Minimum: 2
- Maximum: 7
Predicate count statistics
- Type: Predicate count statistics (rb:PredicateCountStatistics)
- Sum: 26,100
- Mean: 2.61
- Standard deviation: 0.49
- Minimum: 2
- Maximum: 3
Object count statistics
- Type: Object count statistics (rb:ObjectCountStatistics)
- Sum: 33,009
- Mean: 3.30
- Standard deviation: 1.39
- Minimum: 1
- Maximum: 13
Graph count statistics
- Type: Graph count statistics (rb:GraphCountStatistics)
- Sum: 0
- Mean: 0.00
- Standard deviation: 0.00
- Minimum: 0
- Maximum: 0
Statement count statistics
- Type: Statement count statistics (rb:StatementCountStatistics)
- Sum: 22,977
- Mean: 2.30
- Standard deviation: 1.34
- Minimum: 1
- Maximum: 10
10K elements flat distribution
- Title: 10K elements flat distribution
- Identifier: flat-10k
- Has file name: flat_10K.nt.gz
- Has distribution type:
- Flat distribution (rb:flatDistribution)
- Partial distribution (rb:partialDistribution)
- Has stream element count: 10,000
- Byte size: 256.83 KB
- Media type: application/n-triples
- Compression format: application/gzip
- Checksum:
- Checksum (1)
- Type: Checksum (spdx:Checksum)
- ChecksumValue: edaa110cde79d3c57846c39ea1ec7827
- Algorithm: ChecksumAlgorithm_md5 (spdx:checksumAlgorithm_md5)
- Checksum (2)
- Type: Checksum (spdx:Checksum)
- ChecksumValue: 82db849d4410765888daba032cea8d635b6f0be6
- Algorithm: ChecksumAlgorithm_sha1 (spdx:checksumAlgorithm_sha1)
- Download URL: https://w3id.org/riverbench/datasets/yago-annotated-facts/dev/files/flat_10K.nt.gz
Has statistics
IRI count statistics
- Type: IRI count statistics (rb:IriCountStatistics)
- Sum: 49,533
- Unique count (estimated): 10,233
- Mean: 4.95
- Standard deviation: 0.93
- Minimum: 3
- Maximum: 10
Blank node count statistics
- Type: Blank node count statistics (rb:BlankNodeCountStatistics)
- Sum: 0
- Mean: 0.00
- Standard deviation: 0.00
- Minimum: 0
- Maximum: 0
Literal count statistics
- Type: Literal count statistics (rb:LiteralCountStatistics)
- Sum: 19,576
- Unique count (estimated): 7,332
- Mean: 1.96
- Standard deviation: 0.90
- Minimum: 1
- Maximum: 8
Simple literal count statistics
- Type: Simple literal count statistics (rb:SimpleLiteralCountStatistics)
- Sum: 0
- Unique count (estimated): 0
- Mean: 0.00
- Standard deviation: 0.00
- Minimum: 0
- Maximum: 0
Datatype literal count statistics
- Type: Datatype literal count statistics (rb:DatatypeLiteralCountStatistics)
- Sum: 19,576
- Unique count (estimated): 7,332
- Mean: 1.96
- Standard deviation: 0.90
- Minimum: 1
- Maximum: 8
Language string count statistics
- Type: Language string count statistics (rb:LanguageLiteralCountStatistics)
- Sum: 0
- Unique count (estimated): 0
- Mean: 0.00
- Standard deviation: 0.00
- Minimum: 0
- Maximum: 0
Quoted triple count statistics
- Type: Quoted triple count statistics (rb:QuotedTripleCountStatistics)
- Sum: 22,977
- Mean: 2.30
- Standard deviation: 1.34
- Minimum: 1
- Maximum: 10
Subject count statistics
- Type: Subject count statistics (rb:SubjectCountStatistics)
- Sum: 23,762
- Mean: 2.38
- Standard deviation: 0.53
- Minimum: 2
- Maximum: 7
Predicate count statistics
- Type: Predicate count statistics (rb:PredicateCountStatistics)
- Sum: 26,100
- Mean: 2.61
- Standard deviation: 0.49
- Minimum: 2
- Maximum: 3
Object count statistics
- Type: Object count statistics (rb:ObjectCountStatistics)
- Sum: 33,009
- Mean: 3.30
- Standard deviation: 1.39
- Minimum: 1
- Maximum: 13
Graph count statistics
- Type: Graph count statistics (rb:GraphCountStatistics)
- Sum: 0
- Mean: 0.00
- Standard deviation: 0.00
- Minimum: 0
- Maximum: 0
Statement count statistics
- Type: Statement count statistics (rb:StatementCountStatistics)
- Sum: 22,977
- Mean: 2.30
- Standard deviation: 1.34
- Minimum: 1
- Maximum: 10