Dataset: politiquices (development version)
Support and opposition relations extracted from news articles archived in Arquivo.pt. The dataset describes Portuguese-language news articles and the political stances expressed in them. See also: dataset source, more information about the project (in Portuguese).
Info
Download this metadata in RDF: Turtle, N-Triples, RDF/XML, Jelly
Source repository: dataset-politiquices
Permanent URL: https://w3id.org/riverbench/datasets/politiquices/dev
Stream preview
PREFIX ns1: <http://purl.org/dc/elements/1.1/>
PREFIX ns2: <http://www.politiquices.pt/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
[ ns2:ent1 <http://www.wikidata.org/entity/Q745134>;
ns2:ent1_str "Mota Amaral";
ns2:ent2 <http://www.wikidata.org/entity/Q57398>;
ns2:ent2_str "Cavaco";
ns2:score "0.6829476952552795"^^xsd:float;
ns2:type "ent1_other_ent2";
ns2:url <https://www.linguateca.pt/CHAVE?PUBLICO-19940819-122>
] .
<https://www.linguateca.pt/CHAVE?PUBLICO-19940819-122>
ns1:date "1994-08-19"^^xsd:date;
ns1:title "Mota Amaral com Cavaco"@pt .
PREFIX ns1: <http://www.politiquices.pt/>
PREFIX ns2: <http://purl.org/dc/elements/1.1/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
[ ns1:ent1 <http://www.wikidata.org/entity/Q737410>;
ns1:ent1_str "Manuel Alegre";
ns1:ent2 <http://www.wikidata.org/entity/Q1688029>;
ns1:ent2_str "Jerónimo de Sousa";
ns1:score "1.0"^^xsd:float;
ns1:type "ent1_opposes_ent2";
ns1:url <https://publico.pt/1238588>
] .
<https://publico.pt/1238588>
ns2:date "2005-11-12"^^xsd:date;
ns2:title "Manuel Alegre critica campanha agressiva de Jerónimo de Sousa"@pt .
PREFIX ns1: <http://www.politiquices.pt/>
PREFIX ns2: <http://purl.org/dc/elements/1.1/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
[ ns1:ent1 <http://www.wikidata.org/entity/Q76>;
ns1:ent1_str "Obama";
ns1:ent2 <http://www.wikidata.org/entity/Q567>;
ns1:ent2_str "Merkel";
ns1:score "0.9981821775436401"^^xsd:float;
ns1:type "ent1_other_ent2";
ns1:url <https://arquivo.pt/wayback/20141127055917/http://www.publico.pt/mundo/noticia/o-ebola-e-a-mais-grave-urgencia-sanitaria-dos-ultimos-anos-dizem-obama-e-merkel-1673060>
] .
<https://arquivo.pt/wayback/20141127055917/http://www.publico.pt/mundo/noticia/o-ebola-e-a-mais-grave-urgencia-sanitaria-dos-ultimos-anos-dizem-obama-e-merkel-1673060>
ns2:date "2014-11-27"^^xsd:date;
ns2:title "O ébola é \"a mais grave urgência sanitária dos últimos anos\", dizem Obama e Merkel"@pt .
PREFIX ns1: <http://www.politiquices.pt/>
PREFIX ns2: <http://purl.org/dc/elements/1.1/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
[ ns1:ent1 <http://www.wikidata.org/entity/Q2571494>;
ns1:ent1_str "Carlos César";
ns1:ent2 <http://www.wikidata.org/entity/Q57398>;
ns1:ent2_str "Cavaco";
ns1:score "0.9920841455459595"^^xsd:float;
ns1:type "ent1_opposes_ent2";
ns1:url <https://arquivo.pt/wayback/20151119204457/http://observador.pt/2015/11/18/carlos-cesar-responsabiliza-cavaco-por-incontinencia-verbal-entre-os-partidos/>
] .
<https://arquivo.pt/wayback/20151119204457/http://observador.pt/2015/11/18/carlos-cesar-responsabiliza-cavaco-por-incontinencia-verbal-entre-os-partidos/>
ns2:date "2015-11-19"^^xsd:date;
ns2:title "Carlos César responsabiliza Cavaco por \"incontinência verbal\" entre os partidos"@pt .
PREFIX ns1: <http://purl.org/dc/elements/1.1/>
PREFIX ns2: <http://www.politiquices.pt/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
[ ns2:ent1 <http://www.wikidata.org/entity/Q456034>;
ns2:ent1_str "Isabel dos Santos";
ns2:ent2 <http://www.wikidata.org/entity/Q1318666>;
ns2:ent2_str "Ulrich";
ns2:score "1.0"^^xsd:float;
ns2:type "ent1_opposes_ent2";
ns2:url <https://arquivo.pt/wayback/20151003185118/http://economico.sapo.pt/noticias/isabel-dos-santos-vai-oporse-ao-plano-de-ulrich-para-angola_230444.html>
] .
<https://arquivo.pt/wayback/20151003185118/http://economico.sapo.pt/noticias/isabel-dos-santos-vai-oporse-ao-plano-de-ulrich-para-angola_230444.html>
ns1:date "2015-10-03"^^xsd:date;
ns1:title "Isabel dos Santos vai opor-se ao plano de Ulrich para Angola"@pt .
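To make the shape of one stream element concrete, here is a minimal Python sketch that pulls the string-valued fields out of one of the previewed elements. It relies on plain pattern matching over the preview text and is not a Turtle parser, so it would not cope with escaped quotes such as those in some article titles. The names `ELEMENT` and `summarize` are illustrative and not part of the dataset or of RiverBench.

```python
import re

# One stream element copied from the preview above (prefix lines omitted,
# indentation added for readability).
ELEMENT = '''
[ ns1:ent1 <http://www.wikidata.org/entity/Q737410>;
  ns1:ent1_str "Manuel Alegre";
  ns1:ent2 <http://www.wikidata.org/entity/Q1688029>;
  ns1:ent2_str "Jerónimo de Sousa";
  ns1:score "1.0"^^xsd:float;
  ns1:type "ent1_opposes_ent2";
  ns1:url <https://publico.pt/1238588>
] .
'''


def summarize(element: str) -> dict:
    """Collect the quoted literal values of one previewed element.

    Assumption: every literal of interest has the form prefix:name "value"
    with no escaped quotes inside the value.
    """
    fields = dict(re.findall(r'\w+:(\w+) "([^"]*)"', element))
    return {
        "ent1": fields["ent1_str"],
        "ent2": fields["ent2_str"],
        "type": fields["type"],
        "score": float(fields["score"]),
    }


summary = summarize(ELEMENT)
print(summary)
```

For anything beyond a quick look at the preview, a real RDF library would be a safer choice than regular expressions.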
General information
- Title: Politiquices (en)
- Identifier: politiquices
- Version: dev
- Theme:
- Political communication (eurovoc:c_9eea2203)
- Political press (eurovoc:2600)
- Politics (eurovoc:4704)
- Creator:
- David Soares Batista (1)
- Name: David Soares Batista
- Comment: Dataset creator
- Homepage: https://www.politiquices.pt/about
- Piotr Sowiński (2)
- Name: Piotr Sowiński
- Nickname: Ostrzyciel
- Comment: Processing the dataset
- Homepage:
- License: https://spdx.org/licenses/CC-BY-4.0
- Source:
- Date Issued: 2023-05-01
- Date Modified: 2024-10-18
- Landing page: politiquices (dev)
Technical metadata
- Has stream type usage:
- RDF stream type usage (1)
- Type: RDF stream type usage (stax:RdfStreamTypeUsage)
- Comment: The dataset can be viewed as a stream of graphs corresponding to news articles. (en)
- Has stream type: RDF graph stream (stax:graphStream)
- RDF stream type usage (2)
- Type: RDF stream type usage (stax:RdfStreamTypeUsage)
- Comment: The dataset can be viewed as a flattened stream of triples. (en)
- Has stream type: Flat RDF triple stream (stax:flatTripleStream)
- Has stream element count: 17,773
- Has stream element split:
- Type: Stream elements split by topic (rb:TopicStreamElementSplit)
- Comment: Each stream element corresponds to one news article. (en)
- Uses vocabulary:
- Conforms to W3C RDF 1.1 specification: yes
- Conforms to W3C RDF-star draft specification as of December 17, 2021: yes
- Uses generalized triples: no
- Uses generalized RDF datasets: no
- Uses RDF-star: no
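The two stream type usages listed above describe the same data at different granularity: a stream of graphs (one per news article) or a flattened stream of triples. A minimal Python sketch of that relationship follows. The tuple representation, the `flatten` helper, and the placeholder terms are assumptions made for illustration and do not reflect how RiverBench stores or serves the data.

```python
from itertools import chain
from typing import Iterable, Iterator, List, Tuple

# Hypothetical in-memory representation: a triple is a tuple of three strings.
Triple = Tuple[str, str, str]


def flatten(graph_stream: Iterable[List[Triple]]) -> Iterator[Triple]:
    """Turn a stream of graphs into a flat stream of triples, keeping order."""
    return chain.from_iterable(graph_stream)


# Two toy "graphs" with placeholder terms, standing in for two news articles.
graphs = [
    [("_:b0", "type", "ent1_opposes_ent2"), ("<article-1>", "date", "2005-11-12")],
    [("_:b1", "type", "ent1_other_ent2"), ("<article-2>", "date", "2014-11-27")],
]

flat = list(flatten(graphs))
print(len(flat))
```

Flattening loses the element boundaries, which is why the graph-stream view is the one to use when each article needs to be processed as a unit.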
Distributions
Download links
The dataset is published in two size variants (10K and Full), each containing a specific number of stream elements. For each size, there are three distribution types available: flat (just an N-Triples/N-Quads file), streaming (a .tar.gz archive with Turtle/TriG files, one file per stream element), and Jelly (a native binary format for streaming RDF). See the documentation for more details.
Distribution size | Statements | Flat | Streaming | Jelly |
---|---|---|---|---|
10K | 90,000 | 1.6 MB | 1.4 MB | 1.4 MB |
Full | 159,957 | 2.9 MB | 2.4 MB | 2.5 MB |
The full metadata of all distributions can be found below.
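As a rough illustration of working with a flat distribution, the sketch below counts the statements in a gzipped N-Triples file using only the Python standard library. It assumes the file has already been downloaded to the working directory under the file name given in the metadata (`flat_10K.nt.gz`); the helper name `count_statements` is made up for this example.

```python
import gzip
from pathlib import Path


def count_statements(path) -> int:
    """Count non-empty, non-comment lines in a gzipped N-Triples file.

    N-Triples puts one statement on each line, so counting lines counts
    statements.
    """
    count = 0
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            stripped = line.strip()
            if stripped and not stripped.startswith("#"):
                count += 1
    return count


if __name__ == "__main__":
    local_copy = Path("flat_10K.nt.gz")  # assumed: downloaded beforehand
    if local_copy.exists():
        print(count_statements(local_copy))
```

If the download is intact, the result should agree with the Statements column of the table above.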
10K elements Jelly distribution
- Title: 10K elements Jelly distribution
- Identifier: jelly-10k
- Has file name: jelly_10K.jelly.gz
- Has distribution type:
- Jelly distribution (rb:jellyDistribution)
- Partial distribution (rb:partialDistribution)
- Has stream type usage:
- RDF stream type usage (1)
- Type: RDF stream type usage (stax:RdfStreamTypeUsage)
- Comment: The dataset can be viewed as a flattened stream of triples. (en)
- Has stream type: Flat RDF triple stream (stax:flatTripleStream)
- RDF stream type usage (2)
- Type: RDF stream type usage (stax:RdfStreamTypeUsage)
- Comment: The dataset can be viewed as a stream of graphs corresponding to news articles. (en)
- Has stream type: RDF graph stream (stax:graphStream)
- Has stream element count: 10,000
- Byte size: 1.4 MB
- Media type: application/x-jelly-rdf
- Compression format: application/gzip
- Checksum:
- Checksum (1)
- Type: Checksum (spdx:Checksum)
- ChecksumValue: 0cb848132117b243bc6af1a44b11e3d8
- Algorithm: ChecksumAlgorithm_md5 (spdx:checksumAlgorithm_md5)
- Checksum (2)
- Type: Checksum (spdx:Checksum)
- ChecksumValue: 902c699b4b2fcd9462a98627b259b72a8cec19f6
- Algorithm: ChecksumAlgorithm_sha1 (spdx:checksumAlgorithm_sha1)
- Statistics: statistics-10k
- Download URL: https://w3id.org/riverbench/datasets/politiquices/dev/files/jelly_10K.jelly.gz
10K elements flat distribution
- Title: 10K elements flat distribution
- Identifier: flat-10k
- Has file name: flat_10K.nt.gz
- Has distribution type:
- Flat distribution (rb:flatDistribution)
- Partial distribution (rb:partialDistribution)
- Has stream type usage:
- Type: RDF stream type usage (stax:RdfStreamTypeUsage)
- Comment: The dataset can be viewed as a flattened stream of triples. (en)
- Has stream type: Flat RDF triple stream (stax:flatTripleStream)
- Has stream element count: 10,000
- Byte size: 1.6 MB
- Media type: application/n-triples
- Compression format: application/gzip
- Checksum:
- Checksum (1)
- Type: Checksum (spdx:Checksum)
- ChecksumValue: a9fa06d576847f0561ed8918fc5d393b
- Algorithm: ChecksumAlgorithm_md5 (spdx:checksumAlgorithm_md5)
- Checksum (2)
- Type: Checksum (spdx:Checksum)
- ChecksumValue: bbb08f28db574bf22199c34c2f641515df1b70dd
- Algorithm: ChecksumAlgorithm_sha1 (spdx:checksumAlgorithm_sha1)
- Statistics: statistics-10k
- Download URL: https://w3id.org/riverbench/datasets/politiquices/dev/files/flat_10K.nt.gz
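Each distribution lists an MD5 and a SHA-1 checksum, which can be used to check a download. The following is a minimal Python sketch of such a check using the standard `hashlib` module. The helper names `file_digest` and `matches` are illustrative, and the expected value is the SHA-1 listed above for `flat_10K.nt.gz`.

```python
import hashlib


def file_digest(path, algorithm: str = "sha1", chunk_size: int = 1 << 16) -> str:
    """Return the hex digest of a file, reading it in chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected: str, algorithm: str = "sha1") -> bool:
    """Compare a file's digest with a published checksum value."""
    return file_digest(path, algorithm) == expected.lower()


# SHA-1 value copied from the flat-10k metadata above.
EXPECTED_SHA1 = "bbb08f28db574bf22199c34c2f641515df1b70dd"

# Usage, assuming the file was downloaded to the working directory:
# print(matches("flat_10K.nt.gz", EXPECTED_SHA1))
```

The same helpers work for the MD5 values by passing `algorithm="md5"`.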
Full Jelly distribution
- Title: Full Jelly distribution
- Identifier: jelly-full
- Has file name: jelly_full.jelly.gz
- Has distribution type:
- Full distribution (rb:fullDistribution)
- Jelly distribution (rb:jellyDistribution)
- Has stream type usage:
- RDF stream type usage (1)
- Type: RDF stream type usage (stax:RdfStreamTypeUsage)
- Comment: The dataset can be viewed as a stream of graphs corresponding to news articles. (en)
- Has stream type: RDF graph stream (stax:graphStream)
- RDF stream type usage (2)
- Type: RDF stream type usage (stax:RdfStreamTypeUsage)
- Comment: The dataset can be viewed as a flattened stream of triples. (en)
- Has stream type: Flat RDF triple stream (stax:flatTripleStream)
- Has stream element count: 17,773
- Byte size: 2.5 MB
- Media type: application/x-jelly-rdf
- Compression format: application/gzip
- Checksum:
- Checksum (1)
- Type: Checksum (spdx:Checksum)
- ChecksumValue: f561170e2e4920b719c89139ba96252e
- Algorithm: ChecksumAlgorithm_md5 (spdx:checksumAlgorithm_md5)
- Checksum (2)
- Type: Checksum (spdx:Checksum)
- ChecksumValue: 1cd06bcebd4e8328de5367c6f1cada7a75ab3f52
- Algorithm: ChecksumAlgorithm_sha1 (spdx:checksumAlgorithm_sha1)
- Statistics: statistics-full
- Download URL: https://w3id.org/riverbench/datasets/politiquices/dev/files/jelly_full.jelly.gz
Full flat distribution
- Title: Full flat distribution
- Identifier: flat-full
- Has file name: flat_full.nt.gz
- Has distribution type:
- Flat distribution (rb:flatDistribution)
- Full distribution (rb:fullDistribution)
- Has stream type usage:
- Type: RDF stream type usage (stax:RdfStreamTypeUsage)
- Comment: The dataset can be viewed as a flattened stream of triples. (en)
- Has stream type: Flat RDF triple stream (stax:flatTripleStream)
- Has stream element count: 17,773
- Byte size: 2.9 MB
- Media type: application/n-triples
- Compression format: application/gzip
- Checksum:
- Checksum (1)
- Type: Checksum (spdx:Checksum)
- ChecksumValue: 00df81d90a440c2e74c9d930ed7ad27b
- Algorithm: ChecksumAlgorithm_md5 (spdx:checksumAlgorithm_md5)
- Checksum (2)
- Type: Checksum (spdx:Checksum)
- ChecksumValue: 8290273fe3d6ed32a604e7ad4480647c228ba70d
- Algorithm: ChecksumAlgorithm_sha1 (spdx:checksumAlgorithm_sha1)
- Statistics: statistics-full
- Download URL: https://w3id.org/riverbench/datasets/politiquices/dev/files/flat_full.nt.gz
10K elements stream distribution
- Title: 10K elements stream distribution
- Identifier: stream-10k
- Has file name: stream_10K.tar.gz
- Has distribution type:
- Partial distribution (rb:partialDistribution)
- Stream distribution (rb:streamDistribution)
- Has stream type usage:
- Type: RDF stream type usage (stax:RdfStreamTypeUsage)
- Comment: The dataset can be viewed as a stream of graphs corresponding to news articles. (en)
- Has stream type: RDF graph stream (stax:graphStream)
- Has stream element count: 10,000
- Byte size: 1.4 MB
- Media type: text/turtle
- Packaging format: application/tar
- Compression format: application/gzip
- Checksum:
- Checksum (1)
- Type: Checksum (spdx:Checksum)
- ChecksumValue: 36f8b10b24400c369132187914c688fd
- Algorithm: ChecksumAlgorithm_md5 (spdx:checksumAlgorithm_md5)
- Checksum (2)
- Type: Checksum (spdx:Checksum)
- ChecksumValue: 82ef3af42baa39c41ba869b58660fe12ddcb8f11
- Algorithm: ChecksumAlgorithm_sha1 (spdx:checksumAlgorithm_sha1)
- Statistics: statistics-10k
- Download URL: https://w3id.org/riverbench/datasets/politiquices/dev/files/stream_10K.tar.gz
Full stream distribution
- Title: Full stream distribution
- Identifier: stream-full
- Has file name: stream_full.tar.gz
- Has distribution type:
- Full distribution (rb:fullDistribution)
- Stream distribution (rb:streamDistribution)
- Has stream type usage:
- Type: RDF stream type usage (stax:RdfStreamTypeUsage)
- Comment: The dataset can be viewed as a stream of graphs corresponding to news articles. (en)
- Has stream type: RDF graph stream (stax:graphStream)
- Has stream element count: 17,773
- Byte size: 2.4 MB
- Media type: text/turtle
- Packaging format: application/tar
- Compression format: application/gzip
- Checksum:
- Checksum (1)
- Type: Checksum (spdx:Checksum)
- ChecksumValue: 677a5b130be6c5d280354dcbcc8fc5cb
- Algorithm: ChecksumAlgorithm_md5 (spdx:checksumAlgorithm_md5)
- Checksum (2)
- Type: Checksum (spdx:Checksum)
- ChecksumValue: e29309e2e642485c17bf287b1c2b0a4018e729d6
- Algorithm: ChecksumAlgorithm_sha1 (spdx:checksumAlgorithm_sha1)
- Statistics: statistics-full
- Download URL: https://w3id.org/riverbench/datasets/politiquices/dev/files/stream_full.tar.gz
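The stream distributions are gzipped tar archives of Turtle files, one per stream element. A minimal Python sketch for walking such an archive element by element with the standard `tarfile` module is given below. It assumes the archive was downloaded locally, and it makes no assumption about how the members inside the archive are named or ordered beyond the order in which they are stored. `iter_elements` is a hypothetical helper name.

```python
import tarfile


def iter_elements(archive_path):
    """Yield (member name, Turtle text) for each regular file in the archive."""
    with tarfile.open(archive_path, "r:gz") as archive:
        for member in archive:
            if not member.isfile():
                continue  # skip directories and other non-file entries
            extracted = archive.extractfile(member)
            if extracted is None:
                continue
            yield member.name, extracted.read().decode("utf-8")


# Usage, assuming stream_full.tar.gz is in the working directory:
# for name, turtle_text in iter_elements("stream_full.tar.gz"):
#     print(name, len(turtle_text))
```

Reading members in archive order keeps memory use low, since only one element is held at a time.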
Statistics
Statistics for 10K distributions
- Title: Statistics for 10K distributions
Metric | Sum | Unique | Mean | St. dev. | Min. | Max. |
---|---|---|---|---|---|---|
IRIs | 119,998 | ~10,704 | 12.00 | 0.01 | 11 | 12 |
Blank nodes | 10,000 | N/A | 1.00 | 0.00 | 1 | 1 |
Literals | 60,000 | ~22,255 | 6.00 | 0.00 | 6 | 6 |
Simple literals | 30,000 | ~1,064 | 3.00 | 0.00 | 3 | 3 |
Datatype literals | 20,000 | ~11,259 | 2.00 | 0.00 | 2 | 2 |
Language literals | 10,000 | ~9,953 | 1.00 | 0.00 | 1 | 1 |
Datatypes | 20,000 | 2 | 2.00 | 0.00 | 2 | 2 |
ASCII control chars | 5 | N/A | 0.00 | 0.03 | 0 | 2 |
Quoted triples | 0 | N/A | 0.00 | 0.00 | 0 | 0 |
Subjects | 20,000 | ~20,027 | 2.00 | 0.00 | 2 | 2 |
Predicates | 90,000 | ~9 | 9.00 | 0.00 | 9 | 9 |
Objects | 89,998 | ~32,878 | 9.00 | 0.01 | 8 | 9 |
Graphs | 10,000 | ~1 | 1.00 | 0.00 | 1 | 1 |
Statements | 90,000 | N/A | 9.00 | 0.00 | 9 | 9 |
Bytes per statement | N/A | N/A | 142.15 | 17.60 | 104.67 | 218.67 |
Statistics for full distributions
- Title: Statistics for full distributions
Metric | Sum | Unique | Mean | St. dev. | Min. | Max. |
---|---|---|---|---|---|---|
IRIs | 213,274 | ~18,562 | 12.00 | 0.01 | 11 | 12 |
Blank nodes | 17,773 | N/A | 1.00 | 0.00 | 1 | 1 |
Literals | 106,638 | ~36,211 | 6.00 | 0.00 | 6 | 6 |
Simple literals | 53,319 | ~1,295 | 3.00 | 0.00 | 3 | 3 |
Datatype literals | 35,546 | ~17,272 | 2.00 | 0.00 | 2 | 2 |
Language literals | 17,773 | ~17,642 | 1.00 | 0.00 | 1 | 1 |
Datatypes | 35,546 | 2 | 2.00 | 0.00 | 2 | 2 |
ASCII control chars | 21 | N/A | 0.00 | 0.04 | 0 | 3 |
Quoted triples | 0 | N/A | 0.00 | 0.00 | 0 | 0 |
Subjects | 35,546 | ~35,498 | 2.00 | 0.00 | 2 | 2 |
Predicates | 159,957 | ~9 | 9.00 | 0.00 | 9 | 9 |
Objects | 159,955 | ~54,678 | 9.00 | 0.01 | 8 | 9 |
Graphs | 17,773 | ~1 | 1.00 | 0.00 | 1 | 1 |
Statements | 159,957 | N/A | 9.00 | 0.00 | 9 | 9 |
Bytes per statement | N/A | N/A | 142.10 | 17.69 | 104.67 | 218.67 |
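The totals in the two tables can be cross-checked against the per-element figures. Every element has exactly 9 statements (minimum and maximum are both 9), so the statement total should equal the element count times 9. The IRI total falls slightly short of 12 per element because the minimum is 11. The short Python sketch below redoes that arithmetic using only numbers reported on this page; the variable names are illustrative.

```python
STATEMENTS_PER_ELEMENT = 9   # min = max = 9 in both tables
IRIS_PER_ELEMENT_MAX = 12    # max IRIs per element in both tables

# Figures copied from the statistics tables above.
reported = {
    "10K": {"elements": 10_000, "statements": 90_000, "iris": 119_998},
    "Full": {"elements": 17_773, "statements": 159_957, "iris": 213_274},
}

checks = {}
for name, row in reported.items():
    statements_ok = row["elements"] * STATEMENTS_PER_ELEMENT == row["statements"]
    iri_shortfall = row["elements"] * IRIS_PER_ELEMENT_MAX - row["iris"]
    checks[name] = (statements_ok, iri_shortfall)
    print(name, statements_ok, iri_shortfall)
```

In both variants the statement totals match exactly, and the IRI total is 2 below the 12-per-element ceiling, which is the same gap as between the Objects and Statements totals.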