Use PROV-O, isBasedOn, and ProvONE for provenance information

Discussion: #72

Status

Accepted

Decision

Provide provenance information about data processing workflows, data derivation relationships, and versioning information using PROV-O, ProvONE, and schema.org predicates.

schema:isBasedOn.
PROV-O namespace (http://www.w3.org/ns/prov#) predicates, including prov:wasDerivedFrom, prov:wasRevisionOf, prov:used, and prov:generatedBy
ProvONE, which specializes PROV for reproducible software workflows, can be used to specify provone:Program and provone:Execution that create derived products

Context

High level relationships that link datasets based on their processing workflows and versioning relationships are critical for consumers to differentiate versions of a schema:Dataset, to clarify when a dataset is derived from one or more source Datasets, and to specify linkages to the software that created these derived Datasets. The PROV-O recommendation provides the widely-adopted vocabulary for representing this type of information, and should be used within Dataset descriptions, as most relevant properties are missing from schema.org. The main exception is schema:isBasedOn, which provides a predicate for indicating that a Dataset was derived from one or more source Datasets. Producers and consumers should interpret schema:isBasedOn to be an equivalent property to prov:wasDerivedFrom (in the owl:equivalentProperty sense). Either is acceptable for representing derivation relationships, but there is utility in expressing the relationship with both predicates. When other PROV predicates are used, it is preferred to use prov:wasDerivedFrom. Workflow relationships among processing software and both source and derived data files shoudl be described with prov:used and prov:generatedBy predicates, and by using workflow classes from ProvONE.

Consequences

Versioning and derivation relationships will be clearer
Aggregators and search systems should use these properties to cluster versions of Datasets, and to provide bi-directional linkages to derived data products
Software processing and other processing relationships can be provided via linkages to prov:Activity, and particularly provone:Execution and provone:Program for specifying software processes that create derived products. However, these relationships get complicated as multi-step software workflows can be complex. Providing these relationships is useful when researchers strive to describe reproducible research through software workflows.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

72-provenance.md

72-provenance.md

Use PROV-O, isBasedOn, and ProvONE for provenance information

Status

Decision

Context

Consequences

Files

72-provenance.md

Latest commit

History

72-provenance.md

File metadata and controls

Use PROV-O, isBasedOn, and ProvONE for provenance information

Status

Decision

Context

Consequences