Metadata
RiverBench includes rich RDF metadata for each dataset, profile, schema, and the suite itself. This metadata is used to generate the website, and can also be used by other tools. The metadata is permissively licensed.
Accessing metadata
On each page of a RiverBench resource (e.g., dataset, task, profile) you will find a box with links to the RDF metadata. You can also use HTTP content negotiation on permanent URLs (starting with https://w3id.org/riverbench/) to request the machine-readable metadata instead of the HTML page.
You can find the permanent URL in the Info box with the metadata download links, or by copying the Permanent URL link in the top-right corner of the page.
Examples of URLs that will return the metadata with content negotiation:
- https://w3id.org/riverbench/
- https://w3id.org/riverbench/v/dev
- https://w3id.org/riverbench/v/dev/categories/flat
- https://w3id.org/riverbench/v/dev/profiles/stream-datasets
- https://w3id.org/riverbench/v/dev/tasks/flat-compression
- https://w3id.org/riverbench/datasets/nanopubs/dev
- https://w3id.org/riverbench/schema/metadata/dev
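As a rough illustration of content negotiation, the sketch below uses only Python's standard library to request Turtle instead of HTML by setting the Accept header. The helper names `build_metadata_request` and `fetch_metadata` are hypothetical, and the sketch assumes a text-based format (decoding Jelly, which is binary, as UTF-8 would fail).

```python
from urllib.request import Request, urlopen

# One of the permanent URLs listed above.
DATASET_URL = "https://w3id.org/riverbench/datasets/nanopubs/dev"


def build_metadata_request(url: str, content_type: str = "text/turtle") -> Request:
    """Build a GET request whose Accept header asks for RDF instead of HTML."""
    return Request(url, headers={"Accept": content_type})


def fetch_metadata(url: str, content_type: str = "text/turtle") -> str:
    """Download text-based metadata (not Jelly, which is binary).

    urlopen follows the w3id.org redirects and keeps the Accept header.
    """
    with urlopen(build_metadata_request(url, content_type)) as response:
        return response.read().decode("utf-8")
```

For example, `fetch_metadata(DATASET_URL, "application/n-triples")` should return the dataset's metadata as N-Triples.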
To request a metadata file in a given format explicitly, you can also append .nt, .ttl, .rdf, or .jelly to these URLs.
The following metadata formats are supported:
- N-Triples (.nt, content type application/n-triples)
- Turtle (.ttl, content type text/turtle)
- RDF/XML (.rdf, content type application/rdf+xml)
- Jelly (.jelly, content type application/x-jelly-rdf)
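The extension/content-type pairs above can be kept in a small lookup table, for instance when a script needs to pick either an explicit URL or an Accept header. This is a minimal sketch with hypothetical helper names; it assumes the permanent URL is given without a trailing slash (e.g., a dataset or profile URL), since how the bare root URL handles an appended extension is not covered here.

```python
# Supported metadata formats: file extension -> content type.
METADATA_FORMATS = {
    "nt": "application/n-triples",
    "ttl": "text/turtle",
    "rdf": "application/rdf+xml",
    "jelly": "application/x-jelly-rdf",
}


def metadata_url(permanent_url: str, extension: str) -> str:
    """Append an explicit format extension to a permanent URL.

    Assumes the URL has no trailing slash.
    """
    if extension not in METADATA_FORMATS:
        raise ValueError(f"unsupported metadata format: {extension!r}")
    return f"{permanent_url}.{extension}"


def media_type_for(extension: str) -> str:
    """Content type matching a format extension (usable as an Accept value)."""
    return METADATA_FORMATS[extension]
```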
If you are curious, you can find the rules that make this work here.
Metadata dumps
Starting from RiverBench version 2.0.0, the entire metadata of RiverBench is published in easily accessible dumps. The dump for a given RiverBench release can be downloaded from the main page of RiverBench. The download links are in the "Info" box near the top of the page.
The dumps can also be downloaded directly using the following URLs, where {version} is the version tag of the suite release (e.g., dev or 2.0.0):
https://w3id.org/riverbench/dumps/{version}.{extension}.gz
- Metadata dump without benchmark results (a single RDF graph).
- Available since RiverBench 2.0.0.
- Supported extensions: nt, ttl, rdf, jelly.
https://w3id.org/riverbench/dumps-with-results/{version}.{extension}.gz
- Metadata dump with community-reported benchmark results. The default graph contains the RiverBench metadata. The benchmark results are in named graphs, using the nanopublication structure.
- Available since RiverBench 2.1.0.
- Supported extensions: nq, trig, jelly.
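The two URL templates can be filled in programmatically. The sketch below (standard library only; `dump_url` and `download_dump` are hypothetical names) checks the extension against the lists documented for each dump type and stores the decompressed dump on disk. It does not check the version against the availability notes (2.0.0 and 2.1.0), so older tags will simply fail at download time.

```python
import gzip
import shutil
from urllib.request import urlopen

# Extensions documented for each dump type.
DUMP_EXTENSIONS = {"nt", "ttl", "rdf", "jelly"}
DUMP_WITH_RESULTS_EXTENSIONS = {"nq", "trig", "jelly"}


def dump_url(version: str, extension: str, with_results: bool = False) -> str:
    """Build the download URL of a gzipped metadata dump."""
    allowed = DUMP_WITH_RESULTS_EXTENSIONS if with_results else DUMP_EXTENSIONS
    if extension not in allowed:
        raise ValueError(f"{extension!r} is not available for this dump type")
    base = "dumps-with-results" if with_results else "dumps"
    return f"https://w3id.org/riverbench/{base}/{version}.{extension}.gz"


def download_dump(url: str, target_path: str) -> None:
    """Stream the .gz file and write the decompressed dump to target_path."""
    with urlopen(url) as response, \
            gzip.GzipFile(fileobj=response, mode="rb") as gz, \
            open(target_path, "wb") as out:
        shutil.copyfileobj(gz, out)
```

For example, `download_dump(dump_url("dev", "trig", with_results=True), "riverbench-dev.trig")` would fetch the development metadata together with the benchmark results in named graphs.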
Editing metadata
A large portion of the metadata is generated automatically. The rest is written by hand in Turtle files spread across several repositories:
- RiverBench main repo / metadata.ttl – metadata about the suite itself
- {Dataset repo} / metadata.ttl – metadata about the dataset
- {Category repo} / metadata.ttl – metadata about the benchmark category
- {Category repo} / profiles / {profile name}.ttl – metadata about the profile
- {Category repo} / tasks / {task name} / metadata.ttl – metadata about the benchmark task
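If you script edits against a local checkout, the layout above can be expressed as a simple lookup. This is a hypothetical helper, not part of RiverBench. It returns paths relative to the root of the relevant repository (main repo, dataset repo, or category repo), and the profile and task names in the usage examples are borrowed from the sample URLs earlier on this page.

```python
from typing import Optional


def metadata_file(kind: str, name: Optional[str] = None) -> str:
    """Path of the hand-written Turtle file, relative to its repository root."""
    if kind in ("suite", "dataset", "category"):
        # Suite, dataset and category metadata live at the repository root.
        return "metadata.ttl"
    if kind in ("profile", "task") and not name:
        raise ValueError(f"a {kind} name is required")
    if kind == "profile":
        return f"profiles/{name}.ttl"
    if kind == "task":
        return f"tasks/{name}/metadata.ttl"
    raise ValueError(f"unknown resource kind: {kind!r}")
```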
All of these files can be conveniently accessed and edited using the Edit this page or Edit metadata button at the top of the corresponding page.
Feel free to submit pull requests to these files to fix errors or add new information. After the pull request is accepted, the changes will be reflected automatically on the website and in the READMEs.
Used ontologies
The metadata mainly uses these ontologies:
- DCAT 3
- DCMI Metadata Terms
- FOAF
- RDF Stream Taxonomy (RDF-STaX) – for RDF stream type annotations
- EuroVoc – for dataset themes
- VoID
- RiverBench metadata ontology
- RiverBench documentation ontology
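To see how much each vocabulary contributes to a dump, you can count triples by the namespace of their predicate. The sketch below is a hypothetical standard-library helper that works on N-Triples lines. It only includes the namespaces of the well-known W3C/DCMI vocabularies above (RiverBench, RDF-STaX and EuroVoc namespaces would need to be added). It also attributes triples by predicate only, so a class IRI such as dcat:Distribution used as the object of rdf:type is counted under "other".

```python
import re
from collections import Counter
from typing import Iterable

# Namespaces of some of the vocabularies listed above; extend as needed.
NAMESPACES = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dcterms": "http://purl.org/dc/terms/",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "void": "http://rdfs.org/ns/void#",
}

# In N-Triples the predicate is always an IRI that follows the subject
# (an IRI or a blank node label), so a simple pattern is enough.
PREDICATE = re.compile(r"^(?:<[^>]*>|_:\S+)\s+<([^>]*)>")


def count_vocabulary_usage(lines: Iterable[str]) -> Counter:
    """Count triples per vocabulary prefix, based on the predicate IRI."""
    counts: Counter = Counter()
    for line in lines:
        match = PREDICATE.match(line.strip())
        if not match:
            continue  # skip blank lines and comments
        iri = match.group(1)
        prefix = next(
            (p for p, ns in NAMESPACES.items() if iri.startswith(ns)), "other"
        )
        counts[prefix] += 1
    return counts
```

Applied to a decompressed N-Triples dump, e.g. `count_vocabulary_usage(open("riverbench-dev.nt", encoding="utf-8"))`, this gives a rough overview of which vocabularies the metadata relies on.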