yago-annotated-facts (0.1.0)

This dataset is a subset of the YAGO 4 knowledge base (paper), which is built on Wikidata; it uses the version from February 24, 2020. It includes only the fact annotations, expressed in RDF-star, that is, facts about facts. Each stream element corresponds to one item in Wikidata.

Info

Download this metadata in RDF: Turtle, N-Triples, RDF/XML
Source repository: yago-annotated-facts

General information

Technical metadata

  • Has stream element type: Triples (rb:triples)
  • Has stream element count: 617,768
  • Has stream element split:
    • Type: Stream elements split by topic (rb:TopicStreamElementSplit)
    • Comment: Every stream element corresponds to one Wikidata item.
  • Uses ontology: http://schema.org/
  • Conforms to W3C RDF 1.1 specification: no
  • Conforms to W3C RDF-star draft specification as of December 17, 2021: yes
  • Uses generalized triples: no
  • Uses generalized RDF datasets: no
  • Uses RDF-star: yes
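
The conformance flags above follow from how fact annotations are built: an RDF-star annotation uses a whole triple as a term, which plain RDF 1.1 does not allow. The sketch below models this with nested tuples. The class names, the helper `uses_rdf_star`, and the example.org IRIs are all hypothetical and chosen for illustration; they are not taken from the dataset or from any RDF library.

```python
from typing import NamedTuple, Union


class IRI(NamedTuple):
    value: str


class Literal(NamedTuple):
    lexical: str
    datatype: str


class Triple(NamedTuple):
    subject: "Term"
    predicate: IRI
    obj: "Term"


# In RDF-star a term may itself be a triple (a "quoted triple").
Term = Union[IRI, Literal, Triple]


def uses_rdf_star(triple: Triple) -> bool:
    """Return True if the subject or object is itself a triple."""
    return isinstance(triple.subject, Triple) or isinstance(triple.obj, Triple)


# A made-up base fact; the example.org IRIs do not come from YAGO.
fact = Triple(
    IRI("http://example.org/Alice"),
    IRI("http://schema.org/spouse"),
    IRI("http://example.org/Bob"),
)

# A made-up annotation: a fact about the fact above.
annotation = Triple(
    fact,
    IRI("http://schema.org/startDate"),
    Literal("2010-01-01", "http://www.w3.org/2001/XMLSchema#date"),
)

print(uses_rdf_star(fact))        # False: expressible in RDF 1.1
print(uses_rdf_star(annotation))  # True: requires RDF-star
```

Because every stream element in this dataset consists of such annotations, the dataset as a whole cannot conform to RDF 1.1.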

Distributions

Full triple stream distribution

Has statistics

IRI count statistics
  • Type: IRI count statistics (rb:IriCountStatistics)
  • Sum: 3,631,687
  • Unique count (estimated): 594,855
  • Mean: 5.88
  • Standard deviation: 3.22
  • Minimum: 3
  • Maximum: 853
Blank node count statistics
Literal count statistics
  • Type: Literal count statistics (rb:LiteralCountStatistics)
  • Sum: 1,736,327
  • Unique count (estimated): 57,578
  • Mean: 2.81
  • Standard deviation: 2.50
  • Minimum: 1
  • Maximum: 66
Simple literal count statistics
  • Type: Simple literal count statistics (rb:SimpleLiteralCountStatistics)
  • Sum: 211
  • Unique count (estimated): 174
  • Mean: 0.00
  • Standard deviation: 0.02
  • Minimum: 0
  • Maximum: 3
Datatype literal count statistics
  • Type: Datatype literal count statistics (rb:DatatypeLiteralCountStatistics)
  • Sum: 1,736,116
  • Unique count (estimated): 57,405
  • Mean: 2.81
  • Standard deviation: 2.50
  • Minimum: 1
  • Maximum: 66
Language string count statistics
  • Type: Language string count statistics (rb:LanguageLiteralCountStatistics)
  • Sum: 0
  • Unique count (estimated): 0
  • Mean: 0.00
  • Standard deviation: 0.00
  • Minimum: 0
  • Maximum: 0
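
The three literal categories above appear to partition the overall literal count. The short check below uses only the sums listed for this distribution; treating the categories as disjoint is an assumption that the figures are consistent with, not something the page states.

```python
# Sums copied from the statistics listed above for the full distribution.
simple_literals = 211
datatype_literals = 1_736_116
language_strings = 0
all_literals = 1_736_327

# Assumption: each literal falls into exactly one of the three categories.
assert simple_literals + datatype_literals + language_strings == all_literals

# Share of literals that carry an explicit datatype.
datatype_share = datatype_literals / all_literals
print(f"Datatype literals: {datatype_share:.4%} of all literals")
```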
Quoted triple count statistics
Subject count statistics
  • Type: Subject count statistics (rb:SubjectCountStatistics)
  • Sum: 2,009,932
  • Mean: 3.25
  • Standard deviation: 3.04
  • Minimum: 2
  • Maximum: 850
Predicate count statistics
  • Type: Predicate count statistics (rb:PredicateCountStatistics)
  • Sum: 1,622,855
  • Mean: 2.63
  • Standard deviation: 0.48
  • Minimum: 2
  • Maximum: 3
Object count statistics
  • Type: Object count statistics (rb:ObjectCountStatistics)
  • Sum: 3,127,393
  • Mean: 5.06
  • Standard deviation: 5.06
  • Minimum: 1
  • Maximum: 853
Graph count statistics
  • Type: Graph count statistics (rb:GraphCountStatistics)
  • Sum: 0
  • Mean: 0.00
  • Standard deviation: 0.00
  • Minimum: 0
  • Maximum: 0
Statement count statistics
  • Type: Statement count statistics (rb:StatementCountStatistics)
  • Sum: 2,484,547
  • Mean: 4.02
  • Standard deviation: 6.10
  • Minimum: 1
  • Maximum: 1,455
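
The means reported above can be reproduced by dividing each sum by the stream element count (617,768) from the technical metadata. The sketch below assumes that every statistic is computed per stream element; the page does not say so explicitly, but the figures agree to two decimal places. The helper name `per_element_mean` is illustrative.

```python
ELEMENT_COUNT = 617_768  # "Has stream element count" from the technical metadata

# (sum, reported mean) pairs copied from the full triple stream statistics.
reported = {
    "IRI": (3_631_687, 5.88),
    "Literal": (1_736_327, 2.81),
    "Simple literal": (211, 0.00),
    "Subject": (2_009_932, 3.25),
    "Predicate": (1_622_855, 2.63),
    "Object": (3_127_393, 5.06),
    "Statement": (2_484_547, 4.02),
}


def per_element_mean(total: int, elements: int) -> float:
    """Average number of occurrences per stream element."""
    return total / elements


for name, (total, mean) in reported.items():
    computed = per_element_mean(total, ELEMENT_COUNT)
    # Reported values are rounded to two decimals, so allow half a unit
    # in the last place.
    assert abs(computed - mean) <= 0.005, name
    print(f"{name:<15} computed {computed:.2f}  reported {mean:.2f}")
```

For example, the statement statistics imply roughly four annotation statements per Wikidata item on average, while the maximum of 1,455 shows that a few items carry far more.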

Full flat distribution

Has statistics

IRI count statistics
  • Type: IRI count statistics (rb:IriCountStatistics)
  • Sum: 3,631,687
  • Unique count (estimated): 594,855
  • Mean: 5.88
  • Standard deviation: 3.22
  • Minimum: 3
  • Maximum: 853
Blank node count statistics
Literal count statistics
  • Type: Literal count statistics (rb:LiteralCountStatistics)
  • Sum: 1,736,327
  • Unique count (estimated): 57,578
  • Mean: 2.81
  • Standard deviation: 2.50
  • Minimum: 1
  • Maximum: 66
Simple literal count statistics
  • Type: Simple literal count statistics (rb:SimpleLiteralCountStatistics)
  • Sum: 211
  • Unique count (estimated): 174
  • Mean: 0.00
  • Standard deviation: 0.02
  • Minimum: 0
  • Maximum: 3
Datatype literal count statistics
  • Type: Datatype literal count statistics (rb:DatatypeLiteralCountStatistics)
  • Sum: 1,736,116
  • Unique count (estimated): 57,405
  • Mean: 2.81
  • Standard deviation: 2.50
  • Minimum: 1
  • Maximum: 66
Language string count statistics
  • Type: Language string count statistics (rb:LanguageLiteralCountStatistics)
  • Sum: 0
  • Unique count (estimated): 0
  • Mean: 0.00
  • Standard deviation: 0.00
  • Minimum: 0
  • Maximum: 0
Quoted triple count statistics
Subject count statistics
  • Type: Subject count statistics (rb:SubjectCountStatistics)
  • Sum: 2,009,932
  • Mean: 3.25
  • Standard deviation: 3.04
  • Minimum: 2
  • Maximum: 850
Predicate count statistics
  • Type: Predicate count statistics (rb:PredicateCountStatistics)
  • Sum: 1,622,855
  • Mean: 2.63
  • Standard deviation: 0.48
  • Minimum: 2
  • Maximum: 3
Object count statistics
  • Type: Object count statistics (rb:ObjectCountStatistics)
  • Sum: 3,127,393
  • Mean: 5.06
  • Standard deviation: 5.06
  • Minimum: 1
  • Maximum: 853
Graph count statistics
  • Type: Graph count statistics (rb:GraphCountStatistics)
  • Sum: 0
  • Mean: 0.00
  • Standard deviation: 0.00
  • Minimum: 0
  • Maximum: 0
Statement count statistics
  • Type: Statement count statistics (rb:StatementCountStatistics)
  • Sum: 2,484,547
  • Mean: 4.02
  • Standard deviation: 6.10
  • Minimum: 1
  • Maximum: 1,455

100K elements triple stream distribution

Has statistics

IRI count statistics
  • Type: IRI count statistics (rb:IriCountStatistics)
  • Sum: 502,972
  • Unique count (estimated): 102,657
  • Mean: 5.03
  • Standard deviation: 5.30
  • Minimum: 3
  • Maximum: 853
Blank node count statistics
Literal count statistics
  • Type: Literal count statistics (rb:LiteralCountStatistics)
  • Sum: 187,612
  • Unique count (estimated): 37,329
  • Mean: 1.88
  • Standard deviation: 0.98
  • Minimum: 1
  • Maximum: 49
Simple literal count statistics
  • Type: Simple literal count statistics (rb:SimpleLiteralCountStatistics)
  • Sum: 66
  • Unique count (estimated): 66
  • Mean: 0.00
  • Standard deviation: 0.03
  • Minimum: 0
  • Maximum: 3
Datatype literal count statistics
  • Type: Datatype literal count statistics (rb:DatatypeLiteralCountStatistics)
  • Sum: 187,546
  • Unique count (estimated): 37,263
  • Mean: 1.88
  • Standard deviation: 0.97
  • Minimum: 1
  • Maximum: 49
Language string count statistics
  • Type: Language string count statistics (rb:LanguageLiteralCountStatistics)
  • Sum: 0
  • Unique count (estimated): 0
  • Mean: 0.00
  • Standard deviation: 0.00
  • Minimum: 0
  • Maximum: 0
Quoted triple count statistics
Subject count statistics
  • Type: Subject count statistics (rb:SubjectCountStatistics)
  • Sum: 246,103
  • Mean: 2.46
  • Standard deviation: 5.24
  • Minimum: 2
  • Maximum: 850
Predicate count statistics
  • Type: Predicate count statistics (rb:PredicateCountStatistics)
  • Sum: 257,646
  • Mean: 2.58
  • Standard deviation: 0.49
  • Minimum: 2
  • Maximum: 3
Object count statistics
  • Type: Object count statistics (rb:ObjectCountStatistics)
  • Sum: 332,939
  • Mean: 3.33
  • Standard deviation: 5.46
  • Minimum: 1
  • Maximum: 853
Graph count statistics
  • Type: Graph count statistics (rb:GraphCountStatistics)
  • Sum: 0
  • Mean: 0.00
  • Standard deviation: 0.00
  • Minimum: 0
  • Maximum: 0
Statement count statistics
  • Type: Statement count statistics (rb:StatementCountStatistics)
  • Sum: 226,648
  • Mean: 2.27
  • Standard deviation: 9.28
  • Minimum: 1
  • Maximum: 1,455

100K elements flat distribution

Has statistics

IRI count statistics
  • Type: IRI count statistics (rb:IriCountStatistics)
  • Sum: 502,972
  • Unique count (estimated): 102,657
  • Mean: 5.03
  • Standard deviation: 5.30
  • Minimum: 3
  • Maximum: 853
Blank node count statistics
Literal count statistics
  • Type: Literal count statistics (rb:LiteralCountStatistics)
  • Sum: 187,612
  • Unique count (estimated): 37,329
  • Mean: 1.88
  • Standard deviation: 0.98
  • Minimum: 1
  • Maximum: 49
Simple literal count statistics
  • Type: Simple literal count statistics (rb:SimpleLiteralCountStatistics)
  • Sum: 66
  • Unique count (estimated): 66
  • Mean: 0.00
  • Standard deviation: 0.03
  • Minimum: 0
  • Maximum: 3
Datatype literal count statistics
  • Type: Datatype literal count statistics (rb:DatatypeLiteralCountStatistics)
  • Sum: 187,546
  • Unique count (estimated): 37,263
  • Mean: 1.88
  • Standard deviation: 0.97
  • Minimum: 1
  • Maximum: 49
Language string count statistics
  • Type: Language string count statistics (rb:LanguageLiteralCountStatistics)
  • Sum: 0
  • Unique count (estimated): 0
  • Mean: 0.00
  • Standard deviation: 0.00
  • Minimum: 0
  • Maximum: 0
Quoted triple count statistics
Subject count statistics
  • Type: Subject count statistics (rb:SubjectCountStatistics)
  • Sum: 246,103
  • Mean: 2.46
  • Standard deviation: 5.24
  • Minimum: 2
  • Maximum: 850
Predicate count statistics
  • Type: Predicate count statistics (rb:PredicateCountStatistics)
  • Sum: 257,646
  • Mean: 2.58
  • Standard deviation: 0.49
  • Minimum: 2
  • Maximum: 3
Object count statistics
  • Type: Object count statistics (rb:ObjectCountStatistics)
  • Sum: 332,939
  • Mean: 3.33
  • Standard deviation: 5.46
  • Minimum: 1
  • Maximum: 853
Graph count statistics
  • Type: Graph count statistics (rb:GraphCountStatistics)
  • Sum: 0
  • Mean: 0.00
  • Standard deviation: 0.00
  • Minimum: 0
  • Maximum: 0
Statement count statistics
  • Type: Statement count statistics (rb:StatementCountStatistics)
  • Sum: 226,648
  • Mean: 2.27
  • Standard deviation: 9.28
  • Minimum: 1
  • Maximum: 1,455

10K elements triple stream distribution

Has statistics

IRI count statistics
  • Type: IRI count statistics (rb:IriCountStatistics)
  • Sum: 49,533
  • Unique count (estimated): 10,233
  • Mean: 4.95
  • Standard deviation: 0.93
  • Minimum: 3
  • Maximum: 10
Blank node count statistics
Literal count statistics
  • Type: Literal count statistics (rb:LiteralCountStatistics)
  • Sum: 19,576
  • Unique count (estimated): 7,332
  • Mean: 1.96
  • Standard deviation: 0.90
  • Minimum: 1
  • Maximum: 8
Simple literal count statistics
  • Type: Simple literal count statistics (rb:SimpleLiteralCountStatistics)
  • Sum: 0
  • Unique count (estimated): 0
  • Mean: 0.00
  • Standard deviation: 0.00
  • Minimum: 0
  • Maximum: 0
Datatype literal count statistics
  • Type: Datatype literal count statistics (rb:DatatypeLiteralCountStatistics)
  • Sum: 19,576
  • Unique count (estimated): 7,332
  • Mean: 1.96
  • Standard deviation: 0.90
  • Minimum: 1
  • Maximum: 8
Language string count statistics
  • Type: Language string count statistics (rb:LanguageLiteralCountStatistics)
  • Sum: 0
  • Unique count (estimated): 0
  • Mean: 0.00
  • Standard deviation: 0.00
  • Minimum: 0
  • Maximum: 0
Quoted triple count statistics
Subject count statistics
  • Type: Subject count statistics (rb:SubjectCountStatistics)
  • Sum: 23,762
  • Mean: 2.38
  • Standard deviation: 0.53
  • Minimum: 2
  • Maximum: 7
Predicate count statistics
Object count statistics
  • Type: Object count statistics (rb:ObjectCountStatistics)
  • Sum: 33,009
  • Mean: 3.30
  • Standard deviation: 1.39
  • Minimum: 1
  • Maximum: 13
Graph count statistics
  • Type: Graph count statistics (rb:GraphCountStatistics)
  • Sum: 0
  • Mean: 0.00
  • Standard deviation: 0.00
  • Minimum: 0
  • Maximum: 0
Statement count statistics
  • Type: Statement count statistics (rb:StatementCountStatistics)
  • Sum: 22,977
  • Mean: 2.30
  • Standard deviation: 1.34
  • Minimum: 1
  • Maximum: 10

10K elements flat distribution

Has statistics

IRI count statistics
  • Type: IRI count statistics (rb:IriCountStatistics)
  • Sum: 49,533
  • Unique count (estimated): 10,233
  • Mean: 4.95
  • Standard deviation: 0.93
  • Minimum: 3
  • Maximum: 10
Blank node count statistics
Literal count statistics
  • Type: Literal count statistics (rb:LiteralCountStatistics)
  • Sum: 19,576
  • Unique count (estimated): 7,332
  • Mean: 1.96
  • Standard deviation: 0.90
  • Minimum: 1
  • Maximum: 8
Simple literal count statistics
  • Type: Simple literal count statistics (rb:SimpleLiteralCountStatistics)
  • Sum: 0
  • Unique count (estimated): 0
  • Mean: 0.00
  • Standard deviation: 0.00
  • Minimum: 0
  • Maximum: 0
Datatype literal count statistics
  • Type: Datatype literal count statistics (rb:DatatypeLiteralCountStatistics)
  • Sum: 19,576
  • Unique count (estimated): 7,332
  • Mean: 1.96
  • Standard deviation: 0.90
  • Minimum: 1
  • Maximum: 8
Language string count statistics
  • Type: Language string count statistics (rb:LanguageLiteralCountStatistics)
  • Sum: 0
  • Unique count (estimated): 0
  • Mean: 0.00
  • Standard deviation: 0.00
  • Minimum: 0
  • Maximum: 0
Quoted triple count statistics
Subject count statistics
  • Type: Subject count statistics (rb:SubjectCountStatistics)
  • Sum: 23,762
  • Mean: 2.38
  • Standard deviation: 0.53
  • Minimum: 2
  • Maximum: 7
Predicate count statistics
Object count statistics
  • Type: Object count statistics (rb:ObjectCountStatistics)
  • Sum: 33,009
  • Mean: 3.30
  • Standard deviation: 1.39
  • Minimum: 1
  • Maximum: 13
Graph count statistics
  • Type: Graph count statistics (rb:GraphCountStatistics)
  • Sum: 0
  • Mean: 0.00
  • Standard deviation: 0.00
  • Minimum: 0
  • Maximum: 0
Statement count statistics
  • Type: Statement count statistics (rb:StatementCountStatistics)
  • Sum: 22,977
  • Mean: 2.30
  • Standard deviation: 1.34
  • Minimum: 1
  • Maximum: 10