Intention of adding provenance of DOIs with isBasedOn? #72
Great idea, @alko-k! Could you write up a proposal here with an example we could use in the documentation?
Thanks, Adam. I'm also adding the schema.org GitHub issue for isBasedOn: schemaorg/schemaorg#1993
schema:isBasedOn is a reasonable, although lightweight, provenance statement. In our other work, we use PROV-O predicates like Here's an example data package in which we've embedded the PROV-O properties in our ORE manifest for the data package. You can look at the RDF triples we're using with a tool like rapper:
$ rapper -o turtle https://cn.dataone.org/cn/v2/object/resource_map_urn:uuid:c2e7831c-3e38-4ac1-a0b5-dff3a00ad9f1
So, I'd like to see our guidance recommend PROV-O vocabularies for provenance, with a recommendation that
Thanks @mbjones for all your insight and examples. There is one small issue, though: the 'structured-data/testing-tool' that Google provides will not pass the test with the additional prov properties...
Yeah, we have encountered the Google SDTT throwing an error when it encounters types outside of schema.org. It is annoying, for sure. We have discussed this with Google, and they indicated that their tools still import documents containing other types but simply ignore those types. We've asked them to downgrade these errors to warnings, but they said the SDTT is focused on Google's own import, so they want to keep them as errors. For our recommendations, we've decided to 1) mostly recommend schema.org types, but 2) go ahead and recommend other types when needed if there isn't something suitable in schema.org. Our recommendations on external vocabularies in the
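For a rough idea of which terms a schema.org-focused validator is likely to flag, each property key and `@type` can be expanded against the document's `@context` and compared to the `https://schema.org/` namespace. A minimal sketch, assuming an inline `@context` made only of `@vocab` plus simple prefix strings (no remote contexts, no term definitions); this is not a conforming JSON-LD expander, and `expand_term` / `non_schema_terms` are hypothetical helpers:

```python
SCHEMA = "https://schema.org/"


def expand_term(term, context):
    """Expand a compact term using a simplified @context (prefixes and @vocab only)."""
    if term.startswith("@"):
        return term  # JSON-LD keyword, not a vocabulary term
    if term.startswith(("http://", "https://")):
        return term  # already an absolute IRI
    if ":" in term:
        prefix, suffix = term.split(":", 1)
        if prefix in context:
            return context[prefix] + suffix
        return term  # undefined prefix: left as-is, i.e. read as an absolute IRI
    return context.get("@vocab", "") + term


def non_schema_terms(doc):
    """Return the sorted expanded property and type IRIs that fall outside schema.org."""
    context = doc.get("@context", {})
    found = set()

    def walk(node):
        if isinstance(node, list):
            for item in node:
                walk(item)
        elif isinstance(node, dict):
            for key, value in node.items():
                if key == "@context":
                    continue
                if key == "@type":
                    for t in value if isinstance(value, list) else [value]:
                        iri = expand_term(t, context)
                        if not iri.startswith(SCHEMA):
                            found.add(iri)
                elif not key.startswith("@"):
                    iri = expand_term(key, context)
                    if not iri.startswith(SCHEMA):
                        found.add(iri)
                walk(value)

    walk(doc)
    return sorted(found)


example = {
    "@context": {"@vocab": SCHEMA, "prov": "http://www.w3.org/ns/prov#"},
    "@type": "Dataset",
    "name": "Example dataset",
    "prov:wasDerivedFrom": "https://doi.org/10.xxxx/Dataset-1",
}
print(non_schema_terms(example))  # ['http://www.w3.org/ns/prov#wasDerivedFrom']
```

A key such as `schema:isBasedOn` is also reported when the context defines no `schema` prefix, since JSON-LD then reads it as an absolute IRI with the scheme `schema:`.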
Other projects have had similar goals of using schema.org to describe science artifacts, and they all seem to trickle in external vocabularies as they need to describe specifics. One example of a specification that mixes schema.org and W3C PROV is RO-Crate. You can see that they used mostly schema.org in this example but also brought in PROV (and used it side by side with schema.org).
Normal W3C PROV
As JSON-LD,
Minimal Extension to ProvONE
I began a new branch https://github.com/ESIPFed/science-on-schema.org/tree/feature_72_provenance for editing the Guide and a new proposed provenance ADR on how we recommend handling provenance information. Editing is not complete; I'm still working on:
@ashepherd @datadavev @fils @alko-k @smrgeoinfo I've completed a first draft of the provenance proposal. Please review the:
@amoeba @csjx @gothub @mpsaloha Given your familiarity with our use of PROV and ProvONE in DataONE, I would appreciate it if you could look this over as well. You'll note that I omitted the use of
See also: schemaorg/schemaorg#1905
@davev Thanks for the pointer on schema:Action; I was unaware of that. I think it could be successfully used in place of
{
  "@context": {
    "@vocab": "https://schema.org/",
    "schema": "https://schema.org/",
    "prov": "http://www.w3.org/ns/prov#",
    "provone": "http://purl.dataone.org/provone/2015/01/15/ontology#"
  },
"@id": "https://doi.org/10.xxxx/Dataset-2",
"@type": "Dataset",
"name": "Removal of organic carbon by natural bacterioplankton communities as a function of pCO2 from laboratory experiments between 2012 and 2016",
"prov:wasDerivedFrom": "https://doi.org/10.xxxx/Dataset-1",
"schema:isBasedOn": "https://doi.org/10.xxxx/Dataset-1",
"prov:wasGeneratedBy":
{
"@id": "https://example.org/executions/execution-42",
"@type": "provone:Execution",
"prov:hadPlan": "https://somerepository.org/datasets/10.xxxx/Dataset-2.v2/process-script.R",
"prov:used": "https://doi.org/10.xxxx/Dataset-1"
}
}
And here's the same structure rewritten with only schema.org using
{
  "@context": {
    "@vocab": "https://schema.org/",
    "schema": "https://schema.org/"
  },
"@graph": [
{
"@id": "https://doi.org/10.xxxx/Dataset-2",
"@type": "https://schema.org/Dataset",
"https://schema.org/name": "Removal of organic carbon by natural bacterioplankton communities as a function of pCO2 from laboratory experiments between 2012 and 2016",
"schema:isBasedOn": "https://doi.org/10.xxxx/Dataset-1"
},
{
"@id": "https://example.org/executions/execution-42",
"@type": "schema:CreateAction",
"schema:instrument": "https://somerepository.org/datasets/10.xxxx/Dataset-2.v2/process-script.R",
"schema:object": "https://doi.org/10.xxxx/Dataset-1",
"schema:result": "https://doi.org/10.xxxx/Dataset-2"
}
]
}
I think
Which is confusingly similar to If we did all of this with schema.org, I'd want to be explicit in the guide about the intended mapping to PROV, so that equivalence could be established with people using the more precise PROV and ProvONE vocabularies. I think by comparing them to more explicit vocabularies we can make our intended interpretation clear. Feedback appreciated.
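One way to make that intended mapping explicit is a lookup table from the schema.org-only pattern to PROV/ProvONE. The sketch below assumes the correspondences implied by pairing the two examples above (isBasedOn to prov:wasDerivedFrom, CreateAction to provone:Execution, instrument to prov:hadPlan, object to prov:used, and result as the inverse of prov:wasGeneratedBy); these are not published equivalences from either vocabulary, node values are assumed to be plain IRI strings, and `graph_to_prov` is a hypothetical helper:

```python
PROV = "http://www.w3.org/ns/prov#"
PROVONE = "http://purl.dataone.org/provone/2015/01/15/ontology#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Assumed correspondences, read off the two examples in this thread.
PROPERTY_MAP = {
    "schema:isBasedOn": PROV + "wasDerivedFrom",
    "schema:instrument": PROV + "hadPlan",
    "schema:object": PROV + "used",
}
TYPE_MAP = {"schema:CreateAction": PROVONE + "Execution"}


def graph_to_prov(graph):
    """Translate schema.org-only @graph nodes into (subject, predicate, object) PROV triples."""
    triples = []
    for node in graph:
        subject = node["@id"]
        node_type = node.get("@type")
        if isinstance(node_type, str) and node_type in TYPE_MAP:
            triples.append((subject, RDF_TYPE, TYPE_MAP[node_type]))
        for key, value in node.items():
            if key in PROPERTY_MAP:
                triples.append((subject, PROPERTY_MAP[key], value))
            elif key == "schema:result":
                # schema:result points from the action to its output, so the PROV
                # statement runs the other way: output prov:wasGeneratedBy action.
                triples.append((value, PROV + "wasGeneratedBy", subject))
    return triples


graph = [
    {"@id": "https://doi.org/10.xxxx/Dataset-2", "@type": "https://schema.org/Dataset",
     "schema:isBasedOn": "https://doi.org/10.xxxx/Dataset-1"},
    {"@id": "https://example.org/executions/execution-42", "@type": "schema:CreateAction",
     "schema:instrument": "https://somerepository.org/datasets/10.xxxx/Dataset-2.v2/process-script.R",
     "schema:object": "https://doi.org/10.xxxx/Dataset-1",
     "schema:result": "https://doi.org/10.xxxx/Dataset-2"},
]
for triple in graph_to_prov(graph):
    print(triple)
```

Writing the table down this way also makes the weak spots visible: where ProvONE would put prov:hadPlan on a qualified association, this flat mapping attaches it directly to the execution, as the PROV example above does.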
Your example seems ok to me. I read That said, what is the practical benefit of using only |
btw, this is another way of writing your second example above to be slightly more Dataset centric:
|
This looks really good, and the edits to the Dataset guide look and read great. Something that stands out to me is the shape of the I might flatten it, like:
{
"@context": {
"@vocab": "https://schema.org/",
"prov": "http://www.w3.org/ns/prov#",
"provone": "http://purl.dataone.org/provone/2015/01/15/ontology#"
},
"@id": "https://doi.org/10.xxxx/Dataset-2",
"@type": "Dataset",
"name": "Removal of organic carbon by natural bacterioplankton communities as a function of pCO2 from laboratory experiments between 2012 and 2016",
"prov:wasDerivedFrom": "https://doi.org/10.xxxx/Dataset-1",
"prov:wasGeneratedBy": "https://somerepository.org/datasets/10.xxxx/Dataset-2.v2/process-script.R"
}
(The I can see you're trying to find a way to capture an execution explicitly, and my example makes the execution implicit and vague. Another property, like While Schema.org tends to be pretty flat, SOSO doesn't really shy away from it, so an alternative to my super-flat example might look like:
{
"@context": {
"@vocab": "https://schema.org/",
"prov": "http://www.w3.org/ns/prov#",
"provone": "http://purl.dataone.org/provone/2015/01/15/ontology#"
},
"@id": "https://doi.org/10.xxxx/Dataset-2",
"@type": "Dataset",
"name": "Removal of organic carbon by natural bacterioplankton communities as a function of pCO2 from laboratory experiments between 2012 and 2016",
"prov:wasDerivedFrom": "https://doi.org/10.xxxx/Dataset-1",
"prov:wasGeneratedBy": {
"@id": "https://somerepository.org/datasets/10.xxxx/Dataset-2.v2/process-script.R",
"@type": "Foo",
"foo:used": "https://doi.org/10.xxxx/Dataset-1"
}
}
But I don't think we have the terms to do this right now.
@amoeba Thanks, Bryce. I agree about the
@PaoloMissier @ludaesch Do you have any thoughts on this issue discussing provenance representation in schema.org and PROV/ProvONE? See in particular #72 (comment) and the comments that follow.
I think this looks good, Matt!
I second avoiding blank nodes, except in cases where we could rarely, if
ever, imagine wanting to reference that blank node's graph in some other
context.
I was curious why "schema:isBasedOn" is not also recommended for the case
where "prov:wasRevisionOf" is used, given that the definition of
"schema:isBasedOn" is so broad:
A resource from which this work is derived or from which it is a
modification or adaption.
and as "prov:wasRevisionOf" is an rdfs:sub-property of
"prov:wasDerivedFrom", it seems that "schema:isBasedOn" is also appropriate
for describing this type of "modification that retains substantial content
from the original entity" (sensu PROV:wasRevisionOf).
Similarly, in the diagram "Indicating a software workflow or processing
activity: prov:used and prov:wasGeneratedBy"
"prov:wasRevisionOf" would also seem to fit this template and might be
added to the diagram (shown as a sub-property?), where its potential
representation by the "schema:isBasedOn" predicate could also be depicted.
So it might be useful at least to clarify in the text that
"PROV:wasRevisionOf" is a sub-property of "PROV:wasDerivedFrom", and
possibly as well that the former could also be represented by
"schema:isBasedOn"?
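The reasoning here is the standard RDFS rule that a statement made with a sub-property also holds for each of its super-properties. A minimal sketch of that inference, with the two relevant PROV-O rdfs:subPropertyOf axioms hard-coded as a single-parent chain (an assumption that suffices for these properties, not a general reasoner; `entailed` is a hypothetical helper):

```python
PROV = "http://www.w3.org/ns/prov#"

# rdfs:subPropertyOf axioms from PROV-O that matter for this discussion.
SUPER_PROPERTY = {
    PROV + "wasRevisionOf": PROV + "wasDerivedFrom",
    PROV + "wasDerivedFrom": PROV + "wasInfluencedBy",
}


def entailed(triples):
    """Return the input triples plus those implied by climbing each predicate's super-properties."""
    result = set(triples)
    for subject, predicate, obj in triples:
        while predicate in SUPER_PROPERTY:
            predicate = SUPER_PROPERTY[predicate]
            result.add((subject, predicate, obj))
    return result


for triple in sorted(entailed({("v2", PROV + "wasRevisionOf", "v1")})):
    print(triple)
```

A record that only asserts prov:wasRevisionOf therefore also answers a query for prov:wasDerivedFrom, which is the argument for emitting a derivation-level property such as schema:isBasedOn in that case too.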
I thought about the statement that "schema:isBasedOn" is an
*OWL:equivalentProperty* with "PROV:wasDerivedFrom". I think this might be
a bit of an overstep, as I'm not sure their extensions would be identical. I
feel that "schema:isBasedOn" is a bit broader. For example, I would be
comfortable saying that the movie "West Side Story" *schema:isBasedOn* the
book "Romeo & Juliet", but would be less comfortable asserting that the
movie "West Side Story" *prov:wasDerivedFrom* the book "Romeo & Juliet". I
would be comfortable saying the movie "West Side Story" *prov:wasInfluencedBy*
(i.e., the super-property of "prov:wasDerivedFrom") the book "Romeo & Juliet".
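The difference between asserting owl:equivalentProperty and a one-way relationship shows up directly in what gets inferred. A minimal sketch with placeholder identifiers for the two works: `EQUIVALENT` encodes the equivalence as implication in both directions, while `DERIVED_IMPLIES_BASED_ON` is one possible reading of the weaker alternative (every prov:wasDerivedFrom statement is also a schema:isBasedOn statement, but not vice versa). Both rule tables and `infer` are illustrative assumptions, not part of either vocabulary:

```python
SCHEMA = "https://schema.org/"
PROV = "http://www.w3.org/ns/prov#"

# owl:equivalentProperty: each property implies the other.
EQUIVALENT = {
    SCHEMA + "isBasedOn": [PROV + "wasDerivedFrom"],
    PROV + "wasDerivedFrom": [SCHEMA + "isBasedOn"],
}
# Weaker alternative: wasDerivedFrom implies isBasedOn, but not the reverse.
DERIVED_IMPLIES_BASED_ON = {PROV + "wasDerivedFrom": [SCHEMA + "isBasedOn"]}


def infer(triples, rules):
    """Apply property-implication rules repeatedly until no new triples appear."""
    result = set(triples)
    changed = True
    while changed:
        changed = False
        for subject, predicate, obj in list(result):
            for implied in rules.get(predicate, []):
                if (subject, implied, obj) not in result:
                    result.add((subject, implied, obj))
                    changed = True
    return result


claim = {("WestSideStory", SCHEMA + "isBasedOn", "RomeoAndJuliet")}
derived = ("WestSideStory", PROV + "wasDerivedFrom", "RomeoAndJuliet")
print(derived in infer(claim, EQUIVALENT))                # True
print(derived in infer(claim, DERIVED_IMPLIES_BASED_ON))  # False
```

Under the equivalence, the West Side Story statement forces the derivation claim that seems too strong; under the one-way rule it does not.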
Anyhow, these are just some thoughts; I hope I'm not dancing on the head of a pin.
thanks,
Mark
On Tue, Jul 28, 2020 at 1:42 PM Matt Jones wrote:
@amoeba Thanks, Bryce. I agree about the @id
for the execution instance. I seriously considered making it a blank node
by omitting the @id because people often don't track executions. They do,
however, track execution times and other properties, so it would be nice to
have something to hang those properties on, and to differentiate multiple
executions of the same script (especially for model runs, etc.). But there's
been a lot of discussion in this group about avoiding blank nodes, so I
thought it prudent to put in some stand-in for the execution identifier. I
would prefer to leave it out, though.
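Whether a given record would introduce blank nodes can be checked mechanically: in JSON-LD, a node object without an `@id` is given a blank-node identifier when the document is converted to RDF. A minimal sketch that reports such nested objects by path; it skips the top-level object and `@value`/`@list`/`@set` objects, and `blank_node_paths` is a hypothetical helper, not part of any JSON-LD library:

```python
NON_NODE_KEYS = ("@value", "@list", "@set")


def blank_node_paths(node, path="$"):
    """List paths to nested node objects that have no @id (these become blank nodes in RDF)."""
    found = []
    if isinstance(node, dict):
        is_node = not any(key in node for key in NON_NODE_KEYS)
        if path != "$" and is_node and "@id" not in node:
            found.append(path)
        for key, value in node.items():
            if key != "@context":
                found.extend(blank_node_paths(value, f"{path}.{key}"))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            found.extend(blank_node_paths(item, f"{path}[{index}]"))
    return found


record = {
    "@id": "https://doi.org/10.xxxx/Dataset-2",
    "prov:wasGeneratedBy": {
        "@type": "provone:Execution",  # no @id: would be a blank node
        "prov:used": "https://doi.org/10.xxxx/Dataset-1",
    },
}
print(blank_node_paths(record))  # ['$.prov:wasGeneratedBy']
```

Giving the execution a stand-in `@id` (for example `https://example.org/executions/execution-42`) makes the list empty.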
Thanks for the comments @mpsaloha. I was equating I looked for a subproperty in SO that was equivalent to
Hi Matt,
My comments are interlaced below...
> Thanks for the comments @mpsaloha. I was equating schema:isBasedOn with prov:wasDerivedFrom, whereas we have interpreted the subproperty prov:wasRevisionOf to specialize the property to the narrower case where the new entity is both derived from the original AND it represents a new version of the same entity. So, all revisions are derivations, but not all derivations are revisions.
Agreed. Since "prov:wasRevisionOf" is a subproperty of
"prov:wasDerivedFrom", its extension is included in, but narrower than, the
extension of its superproperty. I hope it was clear that I agree with this
interpretation.
> In DataONE, we interpret prov:wasRevisionOf to mean that the new object is meant to explicitly replace the original version, and is wholly substitutable for the original.
Ah, this narrows the definition of "prov:wasRevisionOf" a bit relative to
the PROV specs, where it is recommended for cases where an entity is a
modification of its precursor and "contains substantial content" from it
(and "replaces" is not specified).
> We use that to hide older versions of Datasets in search results. And there are definitely broader uses of prov:wasDerivedFrom, such as when two data sources are combined into an integrated whole, but the new Dataset is not meant to replace the original per se. So I'd like us to be able to express both the *derived from* and *replaces* semantics from PROV.
I see, thanks for clarifying! I think "prov:qualifiedRevision" could
be used to indicate this "replaces" function, similar to Example 44 in
https://www.w3.org/TR/prov-o/
> I looked for a subproperty in SO that was equivalent to prov:wasRevisionOf, and didn't find a match.
Yes, I looked too and couldn't find one.
> There could be one, though. The closest thing I could find is that there is schema:UpdateAction <https://schema.org/UpdateAction> which is meant to explicitly be an action in which the schema:result replaces the schema:object, but because these same properties are used in all schema:Action classes, such as schema:CreateAction <https://schema.org/CreateAction>, the interpretation of the schema:result as a "replacement" only applies within the context of schema:UpdateAction. So, I couldn't find a dedicated subproperty indicating replacement semantics in SO, and I left the prov:wasRevisionOf for the time being. Maybe there's another approach.
Thanks for pointing this out, and I agree it doesn't seem to fit the bill
as is, although it might work as a triple of "schema:ReplaceAction" within
a "prov:qualifiedRevision" pattern (cf. Examples 44 and 62 in PROV-O).
But this complicates things a bit. I guess my main concerns were 1)
formally stating that schema:isBasedOn is an owl:equivalentProperty of
prov:wasDerivedFrom due to potentially non-congruent extensions; and 2)
that as prov:wasRevisionOf is a subproperty of prov:wasDerivedFrom,
schema:isBasedOn is also suitable for describing it. But you've described
how you also want prov:wasRevisionOf to strongly indicate "replaces earlier
version". Nevertheless, schema:isBasedOn would remain true even in this
case-- just less constraining?
cheers,
Mark
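The search behaviour described in the quoted text, where an older version is hidden once something is declared a revision of it while plain derivations leave their sources visible, comes down to a set difference. A hypothetical sketch over (subject, predicate, object) tuples; it illustrates the rule only and is not DataONE's indexing code:

```python
PROV = "http://www.w3.org/ns/prov#"


def visible_datasets(dataset_ids, triples):
    """Hide datasets that another object declares itself a prov:wasRevisionOf."""
    superseded = {obj for _, predicate, obj in triples if predicate == PROV + "wasRevisionOf"}
    return [d for d in dataset_ids if d not in superseded]


triples = [
    ("dataset-v2", PROV + "wasRevisionOf", "dataset-v1"),  # v2 replaces v1
    ("merged", PROV + "wasDerivedFrom", "dataset-v2"),     # derivation only, no replacement
]
print(visible_datasets(["dataset-v1", "dataset-v2", "merged"], triples))
# ['dataset-v2', 'merged']
```

This is also why the replacement semantics cannot simply ride on schema:isBasedOn: if derivations hid their sources, "merged" would wrongly hide "dataset-v2".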
Dave, I think we'd then lose the notion of the replacer being *a
revision_of* rather than simply a *substitute_for* the replacee. The example
they give, of changing movies, is very different from the notion that the
derived entity contains significant components of the original entity.
Mark
On Thu, Jul 30, 2020 at 2:54 PM Dave Vieglais wrote:
Perhaps SO:ReplaceAction ("The act of editing a recipient by replacing an
old object with a new object"), with its replacee and replacer properties,
corresponds to prov:wasRevisionOf? The description doesn't necessarily mean
the replacer is a revision, though; it could just be a new instance.
Yep, agreed. The act of substitution is clear, but the notion that the replacement is a revision of the replacee is not.
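The context-dependence can be made concrete: following the reading quoted earlier, schema:result only reads as a replacement when the enclosing action is an UpdateAction, and ReplaceAction (a subtype of UpdateAction) names the two sides replacer and replacee. A minimal sketch that extracts a (new, old) pair only in those contexts, using a small hard-coded slice of the schema.org type hierarchy and assuming `@type` is a single unprefixed string; `replacement_pair` is hypothetical. Nothing in its output says the new object is a revision of the old one, which is exactly the gap noted in this exchange:

```python
# A small slice of the schema.org Action hierarchy (child -> parent).
PARENT_TYPE = {
    "ReplaceAction": "UpdateAction",
    "UpdateAction": "Action",
    "CreateAction": "Action",
}


def is_subtype(action_type, ancestor):
    """True if action_type equals ancestor or inherits from it in PARENT_TYPE."""
    while action_type is not None:
        if action_type == ancestor:
            return True
        action_type = PARENT_TYPE.get(action_type)
    return False


def replacement_pair(action):
    """Return (new, old) when the action expresses replacement, else None."""
    action_type = action.get("@type")
    if not isinstance(action_type, str):
        return None
    if is_subtype(action_type, "ReplaceAction"):
        return action.get("replacer"), action.get("replacee")
    if is_subtype(action_type, "UpdateAction"):
        return action.get("result"), action.get("object")
    return None  # e.g. CreateAction: the result is new, nothing is replaced


print(replacement_pair({"@type": "ReplaceAction", "replacer": "v2", "replacee": "v1"}))
print(replacement_pair({"@type": "CreateAction", "object": "d1", "result": "d2"}))
```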
Of all of the options above, the one that is most human-understandable is:
And I think an @id is necessary, because I can see people needing to query for every dataset/product that used a particular commonly used script as part of a processing chain when that script is found to have a bug that requires reprocessing everything that used it.
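With an identifiable script, the reprocessing question becomes a simple lookup. A minimal sketch over JSON-LD records shaped like the PROV/ProvONE example earlier in the thread, assuming compact `prov:` keys and the execution nested under `prov:wasGeneratedBy`; in practice this would more likely be a SPARQL query against a triple store, and `datasets_using_script` is hypothetical:

```python
def datasets_using_script(records, script_url):
    """Return @ids of records whose generating execution declares script_url as its plan."""
    hits = []
    for record in records:
        generated_by = record.get("prov:wasGeneratedBy")
        executions = generated_by if isinstance(generated_by, list) else [generated_by]
        if any(isinstance(e, dict) and e.get("prov:hadPlan") == script_url for e in executions):
            hits.append(record["@id"])
    return hits


BUGGY = "https://somerepository.org/datasets/10.xxxx/Dataset-2.v2/process-script.R"
records = [
    {"@id": "https://doi.org/10.xxxx/Dataset-2",
     "prov:wasGeneratedBy": {"@id": "https://example.org/executions/execution-42",
                             "prov:hadPlan": BUGGY}},
    {"@id": "https://doi.org/10.xxxx/Dataset-3"},
]
print(datasets_using_script(records, BUGGY))  # ['https://doi.org/10.xxxx/Dataset-2']
```

The execution's own `@id` is not needed for this query; it matters when the same script has several runs whose times or parameters must be told apart.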
I edited @rduerr's example to preserve the formatting in the JSON; no content change. +1 for that encoding approach.
Discussed the proposal and ADR during the SOSO call on Aug 3. There was general consensus that PROV-O and ProvONE predicates are preferred because of their greater semantic precision. We agreed to move towards approving the ADR, but will give people another week or so to comment. @mbjones will prepare a PR with minor revisions shortly thereafter.
I looked at the ADR and the updated text; looks good to me.
Uploaded the current ProvONE OWL file to COR for better community visibility and navigation. See:
PR #134 merged the accepted provenance features into develop, and they will now be included in the release, so I'm closing this issue.
Hi again @ashepherd,
Is there an intention to add the isBasedOn schema.org property to refer to older DOIs in the full dataset JSON?
Thanks,
Alexandra