Proposal: Disallow non-vocabulary keywords #241
-
We can use the "inline" suggestions to enable an ad-hoc vocab that's only valid within the meta-schema that declares it. This would eliminate the need for a URI and other boilerplate. This may only be a valid approach for SVA vocabs, though, as non-SVAs would need code.
-
I assume this should say, "Non-SVA keywords from optional vocabularies that are not supported by the implementation SHOULD be collected as annotations". I think this could be problematic. If we collect the value of a non-SVA keyword as an annotation, the annotation result specified by the vocabulary might be different than the one collected. A consumer of the annotations may expect one or the other annotation result and might end up broken if the validator adds or removes support for that vocabulary. I think the safest thing to do for a non-SVA keyword from an optional vocabulary that isn't supported is to completely ignore it, including no output unit for that keyword in the results.
-
Several of this proposal's assumptions are now being tracked or addressed directly:
-
Related: json-schema-org/json-schema-spec#1365 and associated PR json-schema-org/json-schema-spec#1244 (@awwright).
-
I expect a consequence of not supporting unknown keywords is no longer being able to process schemas with unsupported optional vocabularies (which... why then have the ability to declare them optional?), unless the vocabulary can be expressed in a meaningful machine-readable way that says, "These keywords are mine," so that the implementation can know to ignore those (because the vocab is optional). For another discussion, perhaps: are optional vocabularies even useful? One of the goals we have is interoperability. If a schema that uses an optional vocab is processed by two implementations, one of which understands that vocab and the other doesn't, then they could have differing validation results. (Maybe the meta-schema author included a vocab with validation keywords as optional.) This breaks interoperability. Is that something we need to cover, since it was a user that wrote that meta-schema?
-
@gregsdennis
I'm aware of |
-
Would the proposed change break Linked Data and JSON-LD (and json-ld-schema), which use namespaced:URIs for some or all attributes? A jsonschema parser could raise a warning when there are unknown keywords in the schema, but I suspect that excluding unknown keywords would limit the reusability of jsonschema, specifically for Linked Data use cases. There are validators more complex than jsonschema for which there hopefully needn't be yet another schema definition document, and for which unreserved:keywords could be used, for example, to define online validation. Linked Data use cases that should be added to the Use Cases document, to make sure this change wouldn't break them:
-
Couldn't decentralizing json-schema help avoid the breaking change of disallowing non-vocabulary keywords? For example, the @cfworker/json-schema library treats each keyword in each draft as a separate schema. In this way, the library finds the correct interpreter for any combination of schemas.
-
This feels like the same as XML's Unique Particle Attribution, discussion here: https://lists.w3.org/Archives/Public/www-tag/2004Aug/att-0010/NRMVersioningProposal.html Has that been reviewed? Does JSON schema want to make JSON into XML? Food for thought.
-
HTTP and MIME both solved this issue with headers by allowing non-standard headers as long as they were prefixed by X-. It seems like a similar solution would be possible here. Even though it would still break existing schemas, it would also continue to allow organizations to create custom tooling with additional fields that aren't part of the declared vocabulary. Failing to support some method for orgs to create additional fields will simply lead to them turning off schema validation in some cases.
-
This is based on numerous recent discussions including @awwright raising it on a recent community call, @karenetheridge raising it in an issue comment, and possibly others I'm forgetting, with apologies.
For further discussion on how to continue support for this, please see https://github.com/orgs/json-schema-org/discussions/329.
Terminology
Therefore, `title` and `readOnly` are SVAs. Hyper-Schema's `links` is not, because it is a template that is filled out with instance data, and therefore requires custom code to support properly.
Scope of this discussion
This topic interacts with many complex topics, but let's try to keep this discussion focused by assuming that certain problems are solvable. It is fair game to raise doubts over that solvability, since the proposal here requires that these problems are solved, but let's not dive into how to solve these things here (I have tons of thoughts on several of these points, and I know others do as well).
Edit - A new discussion on how to solve these things is now here.
Assume a working machine-readable minimal association of keywords and vocabularies
The topic of creating a machine-readable vocabulary definition is closely related, but please let's keep that complex topic as separate as possible. For the purpose of this discussion assume that:
Assume a strengthened dialect/vocabulary determination requirement
Currently it's conformant to ignore `$schema`, and the behavior of unknown `$schema` values or missing `$schema` is under-constrained. There are numerous ideas on how to improve this, how to connect it to the media type registration, and how compatibility should be handled. These ideas are already being discussed in multiple other discussions, and should stay there.
Let's assume that whatever happens with all of that, in the future:
`$schema` or any replacement/addition in this area MUST be respected (including whatever behavior or range of allowable behaviors are defined for the absence of `$schema`), such that schemas are always processed accordingly or are not processed at all (in a way that clearly informs the caller of the refusal to process).
`true` or `false` in `$vocabulary`, whether those exact keywords and values are still in use or not. This discussion will simply refer to required vs optional vocabularies, which are currently implemented by `true` and `false` respectively.
Assume better explanation of optional vocabularies
As of 2020-12, optional vocabularies exist for two purposes:
`links`) to be gracefully ignored in validation-only contexts
This is not as clear as it could be in the current specification, in part because in 2019-09 I had some other ideas in mind as shown by the weird handling of the format vocabulary. All of that was (correctly) jettisoned in 2020-12.
Let's assume that we agree on use case 1, and reach a consensus on whether use case 2 and/or any other use cases are intended, and clarify that in the spec. SVA-only vocabularies are central to the question of removing non-vocabulary keywords, so they are the main thing to worry about here.
Problem
Vocabularies carve out namespaces allowing keywords of the same name from different sources to be distinguished. Schema authors can avoid conflicts among identically named keywords by managing namespaces through `$vocabulary`.
Allowing non-vocabulary keywords means that the entire possible set of keyword names is a single namespace that is always available. While a meta-schema can describe non-vocabulary keywords, there are no normative requirements regarding enabling or disabling them. Implementations can enable or disable them at will, including by default, potentially masking or contradicting vocabulary keywords.
This means that any new keyword added to a vocabulary (including the JSON Schema core vocabulary) might conflict with an always-enabled (or unpredictably-enabled) non-vocabulary keyword in the wild. Therefore, no keyword addition can be considered safe, making forward compatibility impossible to guarantee.
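To make the collision concrete, here is a minimal Python sketch. Everything in it is hypothetical: the keyword sets, the `whatever` keyword being standardized later, and the `find_collisions` helper are assumptions for illustration only, not anything defined by the specification.

```python
# Hypothetical illustration of the forward-compatibility problem.
# None of these keyword sets reflect a real release.

def find_collisions(schema, known_before, added_in_new_release):
    """Return keywords that are non-vocabulary keywords today but would be
    claimed by a vocabulary once the (hypothetical) new release lands."""
    non_vocabulary = set(schema) - set(known_before)
    return sorted(non_vocabulary & set(added_in_new_release))

# A schema in the wild that uses an ad-hoc keyword as an annotation.
schema = {"type": "string", "whatever": {"note": "internal use"}}

known_before = {"type", "title", "readOnly"}
added_in_new_release = {"whatever"}  # assumed future standard keyword

collisions = find_collisions(schema, known_before, added_in_new_release)
```

Because non-vocabulary keywords are always available, nothing in the schema itself signals that `whatever` was meant as a private annotation rather than the newly standardized keyword.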
This has also led to some people assuming an ability to treat standard keywords as non-vocabulary keywords as a way to ignore `$vocabulary` controls, which directly violates schema/meta-schema author intent.
The current situation
Keyword recognition
Since there is no mechanism to associate keywords and vocabularies, there are currently two cases that an implementation might encounter:
`$vocabulary` (either `true` or `false`)
Currently, if an optional vocabulary is unsupported, its keywords appear unknown in the same way as non-vocabulary keywords.
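A rough Python sketch of why the two kinds of unknown keyword are indistinguishable today. The `SUPPORTED` table, the `classify` helper, and the `https://example.com/vocab/hyper-like` URI are assumptions made up for this illustration; only the two 2020-12 vocabulary URIs are real.

```python
# Keywords a hypothetical implementation has code for, keyed by vocabulary URI.
# Only a couple of keywords per vocabulary are listed, for brevity.
SUPPORTED = {
    "https://json-schema.org/draft/2020-12/vocab/validation": {"type", "minimum"},
    "https://json-schema.org/draft/2020-12/vocab/meta-data": {"title", "readOnly"},
}

def classify(keyword, declared_vocabularies):
    """Classify a keyword using only what an implementation knows today.

    declared_vocabularies maps vocabulary URI -> required flag, mirroring
    the shape of $vocabulary in a meta-schema.
    """
    for uri in declared_vocabularies:
        if keyword in SUPPORTED.get(uri, set()):
            return "recognized"
    # No keyword-to-vocabulary association exists, so everything else
    # falls into a single undifferentiated bucket.
    return "unknown"

declared = {
    "https://json-schema.org/draft/2020-12/vocab/validation": True,
    "https://json-schema.org/draft/2020-12/vocab/meta-data": True,
    "https://example.com/vocab/hyper-like": False,  # optional, unsupported (placeholder URI)
}

results = {kw: classify(kw, declared) for kw in ("title", "links", "whatever")}
```

Here `links` (imagined as belonging to the unsupported optional vocabulary) and `whatever` (belonging to no vocabulary at all) both come back as `"unknown"`, which is the ambiguity described above.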
Vocabulary creation
Creating a vocabulary right now requires
Using the vocabulary requires writing a custom meta-schema that adds the vocabulary URI and references the associated meta-schema, and referencing that custom meta-schema in the schema.
All of this has to be done even if the vocabulary is only used as an optional vocabulary and therefore does not require custom code.
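For a sense of how much boilerplate this involves, here is a sketch of the pieces written as Python dicts. The `example.com` URIs and the `whatever` vocabulary are placeholders, the list of standard vocabularies is a subset chosen for brevity, and `split_vocabularies` is a hypothetical helper, not part of any library.

```python
# A custom meta-schema that adds one optional vocabulary.
# All example.com URIs are placeholders for illustration.
CUSTOM_META_SCHEMA = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "$id": "https://example.com/meta/with-whatever",
    "$dynamicAnchor": "meta",
    "$vocabulary": {
        # A subset of the standard 2020-12 vocabularies, all required.
        "https://json-schema.org/draft/2020-12/vocab/core": True,
        "https://json-schema.org/draft/2020-12/vocab/applicator": True,
        "https://json-schema.org/draft/2020-12/vocab/validation": True,
        "https://json-schema.org/draft/2020-12/vocab/meta-data": True,
        # The new vocabulary, declared optional.
        "https://example.com/vocab/whatever": False,
    },
    "allOf": [
        {"$ref": "https://json-schema.org/draft/2020-12/schema"},
        # The vocabulary's own meta-schema (placeholder URI).
        {"$ref": "https://example.com/meta/whatever-vocab"},
    ],
}

# A schema has to point at the custom meta-schema to use the keyword.
SCHEMA = {
    "$schema": "https://example.com/meta/with-whatever",
    "type": "string",
    "whatever": "some annotation value",
}

def split_vocabularies(meta_schema):
    """Return (required, optional) vocabulary URIs from $vocabulary."""
    vocab = meta_schema.get("$vocabulary", {})
    required = sorted(uri for uri, flag in vocab.items() if flag)
    optional = sorted(uri for uri, flag in vocab.items() if not flag)
    return required, optional

required, optional = split_vocabularies(CUSTOM_META_SCHEMA)
```

All of this is needed just so that `whatever` can be carried along as an annotation.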
The proposal
Tradeoffs
In doing this, we lose the ability to casually use additional keywords without declaring a meta-schema and vocabulary for using them. Let's say we had a keyword `whatever` that we just wanted to have treated as an annotation. Currently, you can just use it without doing anything else.
This proposal would require:
`whatever` (we can emphasize that there are very easy ways to assign URIs such as `tag:[email protected],2022:your-vocab-name` so people don't need to worry about hostnames, HTTPS, etc.)
Note that while you can create a meta-schema for your vocabulary and reference it, this is not strictly required.
Various ideas that folks have had (to be discussed elsewhere!) for inlining keywords in vocabulary declarations, or even inlining extension meta-schemas in schemas, could reduce the impact of this scenario (and vocabularies in general, which is why the overall topic of streamlining vocabularies belongs in another discussion).
This could be as low-impact as associating a URI with a list of keyword names (and SVA-ness) and writing something slightly more complex than `$schema` with a URI in the schema to indicate the usage.
I think it's worth doing, and also it points to several other topics that we would need to prioritize in order to make it viable.
The current approach is too heavy-weight, but for specifically replacing the "collect unknown keywords as annotations" case, I think we can make it work in a more lightweight manner.
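As a rough sketch of what "a URI with a list of keyword names (and SVA-ness)" might look like as a data structure, consider the following Python. The `VocabularyDeclaration` class, the `categorize` function, its category labels, and the `example.com` URIs are all assumptions for illustration; the actual machine-readable format is a topic for the other discussions mentioned above.

```python
from dataclasses import dataclass

@dataclass
class VocabularyDeclaration:
    """Hypothetical minimal association of a vocabulary URI with its keywords."""
    uri: str
    keywords: dict  # keyword name -> True if the keyword is an SVA

def categorize(keyword, declarations, supported_uris):
    """Say which situation a keyword is in, without prescribing what to do."""
    for decl in declarations:
        if keyword in decl.keywords:
            if decl.uri in supported_uris:
                return "supported"
            if decl.keywords[keyword]:
                return "unsupported SVA"
            return "unsupported non-SVA"
    # Not claimed by any declared vocabulary.
    return "non-vocabulary"

declarations = [
    VocabularyDeclaration("https://example.com/vocab/whatever", {"whatever": True}),
    VocabularyDeclaration("https://example.com/vocab/hyper-like", {"links": False}),
]
supported_uris = set()  # this imaginary implementation supports neither vocabulary

categories = {
    kw: categorize(kw, declarations, supported_uris)
    for kw in ("whatever", "links", "mystery")
}
```

With an association like this, an implementation could tell an unsupported SVA keyword apart from a genuinely undeclared keyword, which is the distinction the "collect unknown keywords as annotations" replacement depends on.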