Proposal: Disallow non-vocabulary keywords #241

handrews · 2022-09-17T23:15:50Z

This is based on numerous recent discussions including @awwright raising it on a recent community call, @karenetheridge raising it in an issue comment, and possibly others I'm forgetting, with apologies.

For further discussion on how to continue support for this, please see https://github.com/orgs/json-schema-org/discussions/329.

Terminology

a keyword is supported if the implementation knows to run a specific piece of code for it
a keyword is understood if the implementation knows what vocabulary it comes from, but doesn't have specific code for it
a keyword is unknown if the implementation cannot figure out whether it comes from any particular vocabulary or not
a simple value annotation (SVA) keyword is one that unconditionally uses the exact keyword value as the annotation value

Therefore, title and readOnly are SVAs. Hyper-Schema's links is not, because it is a template that is filled out with instance data, and therefore requires custom code to support properly.
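To make the four terms concrete, here is a minimal sketch of how an implementation might classify a keyword. It assumes a hypothetical in-memory association table (keyword to vocabulary URI plus an SVA flag) and a set of keywords the implementation has code for; the table shape and function names are illustrative, not anything specified.

```python
from enum import Enum

class Status(Enum):
    SUPPORTED = "supported"    # the implementation has code for the keyword
    UNDERSTOOD = "understood"  # its vocabulary is known, but there is no keyword-specific code
    UNKNOWN = "unknown"        # no vocabulary can be identified for it

# Hypothetical association data: keyword -> (vocabulary URI, is_sva).
META_DATA = "https://json-schema.org/draft/2019-09/vocab/meta-data"
HYPER_SCHEMA = "https://json-schema.org/draft/2019-09/vocab/hyper-schema"
ASSOCIATION = {
    "title": (META_DATA, True),
    "readOnly": (META_DATA, True),
    "links": (HYPER_SCHEMA, False),  # a template filled from instance data, so not an SVA
}

def classify(keyword, implemented):
    """Classify a keyword given the set of keywords the implementation has code for."""
    if keyword in implemented:
        return Status.SUPPORTED
    if keyword in ASSOCIATION:
        return Status.UNDERSTOOD
    return Status.UNKNOWN

def is_sva(keyword):
    """True if the keyword's value is used unchanged as its annotation value."""
    entry = ASSOCIATION.get(keyword)
    return entry is not None and entry[1]
```

With `implemented = {"title"}`, `readOnly` comes back as understood: the implementation knows where it comes from, and because it is an SVA it could still be handled without custom code.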

Scope of this discussion

This topic interacts with many complex topics, but let's try to keep this discussion focused by assuming that certain problems are solvable. It is fair game to raise doubts over that solvability, since the proposal here requires that these problems are solved, but let's not dive into how to solve these things here (I have tons of thoughts on several of these points, and I know others do as well).

Edit - A new discussion on how to solve these things is now here.

Assume a working machine-readable minimal association of keywords and vocabularies

The topic of creating a machine-readable vocabulary definition is closely related, but please let's keep that complex topic as separate as possible. For the purpose of this discussion assume that:

We will be able to create something that will at least indicate which keywords go with which vocabularies, most likely (but not necessarily) a separate file identified by the vocabulary URI.
We will be able to mark which of those keywords are SVAs in whatever format we come up with.
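Purely to illustrate the assumption, here is one possible shape for such association data, loaded as JSON, plus an index builder that rejects a keyword claimed by two vocabularies in the same dialect (one reasonable policy given that vocabularies are meant to keep identically named keywords apart). Nothing about this format is specified; the `keywords`/`sva` field names and the example.com URIs are made up.

```python
import json

# Hypothetical association documents, keyed by vocabulary URI.
DOCS = json.loads("""
{
  "https://example.com/vocab/ui": {
    "keywords": {"widget": {"sva": true}, "order": {"sva": true}}
  },
  "https://example.com/vocab/forms": {
    "keywords": {"order": {"sva": false}}
  }
}
""")

def build_index(docs, vocab_uris):
    """Map each keyword to (vocabulary URI, is_sva) for the vocabularies in use.

    Raises ValueError if two vocabularies in use claim the same keyword name,
    since a dialect needs an unambiguous keyword-to-vocabulary mapping.
    """
    index = {}
    for uri in vocab_uris:
        for name, info in docs[uri]["keywords"].items():
            if name in index:
                raise ValueError(f"{name!r} claimed by both {index[name][0]} and {uri}")
            index[name] = (uri, info["sva"])
    return index
```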

Assume a strengthened dialect/vocabulary determination requirement

Currently it's conformant to ignore $schema, and the behavior of unknown $schema values or missing $schema is under-constrained. There are numerous ideas on how to improve this, how to connect it to the media type registration, and how compatibility should be handled. These ideas are already being discussed in multiple other discussions, and should stay there.

Let's assume that whatever happens with all of that, in the future:

$schema or any replacement/addition in this area MUST be respected (including whatever behavior or range of allowable behaviors are defined for the absence of $schema), such that schemas are always processed accordingly or are not processed at all (in a way that clearly informs the caller of the refusal to process).
Whatever this ends up looking like, it still has something analogous to setting vocabularies to true or false in $vocabulary, whether those exact keywords and values are still in use or not. This discussion will simply refer to required vs optional vocabularies, which are currently implemented by true and false respectively.
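In 2020-12 terms, the required/optional rule this assumption relies on looks roughly like the following sketch: an unsupported required vocabulary means refusing to process (and telling the caller), while an unsupported optional one is skipped. The function and exception names are hypothetical.

```python
class RefusedToProcess(Exception):
    """Raised so the caller is clearly told the schema will not be processed."""

def check_vocabularies(vocabulary, supported):
    """Apply the required/optional rule to a meta-schema's "$vocabulary" object.

    `vocabulary` maps vocabulary URIs to True (required) or False (optional).
    Returns the optional vocabularies that will be skipped.
    """
    missing = [uri for uri, required in vocabulary.items()
               if required and uri not in supported]
    if missing:
        raise RefusedToProcess(f"unsupported required vocabularies: {missing}")
    return [uri for uri, required in vocabulary.items()
            if not required and uri not in supported]
```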

Assume better explanation of optional vocabularies

As of 2020-12, optional vocabularies exist for two purposes:

to allow for the common case of vocabularies consisting of only SVAs to be implemented without custom code
to allow for vocabularies consisting of only non-validation-impacting keywords (so, SVAs but also templated annotations such as Hyper-Schema's links) to be gracefully ignored in validation-only contexts

This is not as clear as it could be in the current specification, in part because in 2019-09 I had some other ideas in mind as shown by the weird handling of the format vocabulary. All of that was (correctly) jettisoned in 2020-12.

Let's assume that we agree on use case 1, and reach a consensus on whether use case 2 and/or any other use cases are intended, and clarify that in the spec. SVA-only vocabularies are central to the question of removing non-vocabulary keywords, so they are the main thing to worry about here.

Problem

Vocabularies carve out namespaces allowing keywords of the same name from different sources to be distinguished. Schema authors can avoid conflicts among identically named keywords by managing namespaces through $vocabulary.

Allowing non-vocabulary keywords means that the entire possible set of keyword names is a single namespace that is always available. While a meta-schema can describe non-vocabulary keywords, there are no normative requirements regarding enabling or disabling them. Implementations can enable or disable them at will, including by default, potentially masking or contradicting vocabulary keywords.

This means that any new keyword added to a vocabulary (including the JSON Schema core vocabulary) might conflict with an always-enabled (or unpredictably-enabled) non-vocabulary keyword in the wild. Therefore, no keyword addition can be considered safe, making forward compatibility impossible to guarantee.

This has also led to some people assuming an ability to treat standard keywords as non-vocabulary keywords as a way to ignore $vocabulary controls, which directly violates schema/meta-schema author intent.
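A small illustration of the forward-compatibility problem, restricted to annotation keywords for brevity: `deprecated` was not defined in draft-07 but was added to the meta-data vocabulary in 2019-09, so any schema that had been using it as an ad-hoc keyword would now have it read as the standard annotation. The keyword sets cover only the annotation keywords of each version, and the function is illustrative.

```python
# Annotation keywords defined by draft-07 (which predates vocabularies) and by
# the 2019-09 meta-data vocabulary.
DRAFT_07_ANNOTATIONS = {"title", "description", "default", "readOnly", "writeOnly", "examples"}
DRAFT_2019_09_META_DATA = DRAFT_07_ANNOTATIONS | {"deprecated"}

def new_collisions(schema, old_keywords, new_keywords):
    """Keywords a schema used ad hoc under the old set that the new set now defines."""
    ad_hoc = set(schema) - old_keywords
    return sorted(ad_hoc & new_keywords)
```

For `{"type": "string", "deprecated": "use newField instead"}` this returns `["deprecated"]`: a value that used to be free-form text is now subject to the standard keyword's meaning.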

The current situation

Keyword recognition

Since there is no mechanism to associate keywords and vocabularies, there are currently two cases that an implementation might encounter:

supported keywords from vocabularies in $vocabulary (either true or false)
unknown keywords

Currently, if an optional vocabulary is unsupported, its keywords appear unknown in the same way as non-vocabulary keywords.

Vocabulary creation

Creating a vocabulary right now requires

specifying the keywords
assigning a URI to the vocabulary
creating a vocabulary meta-schema and assigning a URI to it
assuming we add a file for associating keywords with vocabularies, one would need to write that as well

Using the vocabulary requires writing a custom meta-schema that adds the vocabulary URI and references the associated meta-schema, and referencing that custom meta-schema in the schema.

All of this has to be done even if the vocabulary is only used as an optional vocabulary and therefore does not require custom code.
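For reference, the custom meta-schema step in 2020-12 typically looks like the output of this sketch: the standard vocabularies are listed in `$vocabulary` alongside the new one, and `allOf` pulls in both the standard meta-schema and the vocabulary's own meta-schema. The example.com URIs are placeholders and the helper is hypothetical.

```python
DRAFT = "https://json-schema.org/draft/2020-12"
# Hypothetical extension vocabulary and the meta-schema describing its keywords.
EXT_VOCAB = "https://example.com/vocab/ui"
EXT_VOCAB_META = "https://example.com/meta/ui"

def custom_meta_schema(meta_id, extra_vocabs):
    """Build a 2020-12 dialect meta-schema that adds extension vocabularies.

    `extra_vocabs` maps a vocabulary URI to (required, vocabulary meta-schema URI).
    """
    vocabulary = {f"{DRAFT}/vocab/{name}": True for name in (
        "core", "applicator", "unevaluated", "validation",
        "meta-data", "format-annotation", "content")}
    all_of = [{"$ref": f"{DRAFT}/schema"}]
    for uri, (required, meta_uri) in extra_vocabs.items():
        vocabulary[uri] = required  # False marks the vocabulary as optional
        all_of.append({"$ref": meta_uri})
    return {
        "$schema": f"{DRAFT}/schema",
        "$id": meta_id,
        "$vocabulary": vocabulary,
        "$dynamicAnchor": "meta",
        "allOf": all_of,
    }
```

A schema then opts in with `"$schema": "https://example.com/meta/ui-dialect"` (again a placeholder), even if the extension vocabulary is only optional and needs no custom code.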

The proposal

Non-vocabulary keywords are banned, meaning that they cause an error when encountered even if the meta-schema allows them
Since the vocabulary-keyword association file notes which keywords are SVAs, an SVA-only vocabulary can automatically be detected
SVA-only vocabularies are considered supported regardless of whether they are optional or required, as they do not require custom code to implement
Since there are no longer unknown keywords, there is no need to specify what to do with them
SVA keywords MUST always be collected as annotations (assuming annotations are being collected), no matter whether the vocabulary that defines them is optional-supported, optional-unsupported, or required
Non-SVA keywords from optional vocabularies SHOULD be collected as annotations, and MUST otherwise be ignored
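The rules above can be sketched as a single annotation-collection pass over one schema object. This assumes the hypothetical association data has already been loaded into `Vocab` objects; assertion evaluation and subschema traversal are left out, and all names are illustrative.

```python
from dataclasses import dataclass

class NonVocabularyKeyword(Exception):
    """A keyword that no vocabulary in the dialect claims: an error under the proposal."""

class RefusedToProcess(Exception):
    """A required vocabulary is neither SVA-only nor implemented."""

@dataclass
class Vocab:
    uri: str
    required: bool
    keywords: dict  # keyword name -> True if it is an SVA

def vocab_supported(vocab, implemented):
    # SVAs need no code, so an SVA-only vocabulary is always supported.
    return all(sva or kw in implemented for kw, sva in vocab.keywords.items())

def collect_annotations(schema, vocabs, implemented, collect_blind=True):
    """Return the annotations for one schema object under the proposed rules."""
    for v in vocabs:
        if v.required and not vocab_supported(v, implemented):
            raise RefusedToProcess(v.uri)
    owner = {kw: v for v in vocabs for kw in v.keywords}
    annotations = {}
    for kw, value in schema.items():
        vocab = owner.get(kw)
        if vocab is None:
            raise NonVocabularyKeyword(kw)  # banned even if the meta-schema allows it
        if vocab.keywords[kw]:
            annotations[kw] = value  # SVA: MUST collect, whatever the vocabulary's status
        elif kw in implemented:
            pass  # keyword-specific code would compute its own annotation here
        elif collect_blind:
            # Only optional vocabularies reach this point (required ones were checked
            # above), so this is the unsupported non-SVA case: SHOULD collect.
            annotations[kw] = value
        # Otherwise the keyword MUST be ignored.
    return annotations
```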

Tradeoffs

In doing this, we lose the ability to casually use additional keywords without declaring a meta-schema and vocabulary for using them. Let's say we had a keyword whatever that we just wanted to have treated as an annotation. Currently, you can just use it without doing anything else.

This proposal would require:

assigning a URI to a vocabulary containing whatever (we can emphasize that there are very easy ways to assign URIs such as tag:[email protected],2022:your-vocab-name so people don't need to worry about hostnames, HTTPS, etc.)
writing the keyword-vocabulary association and providing it somehow
writing a custom meta-schema including the vocabulary and providing it somehow
referencing the custom meta-schema
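Since the first step leans on tag: URIs to keep naming cheap, here is a small helper that assembles one in the `tag:authority,date:specific` shape defined by RFC 4151. It only loosely checks the authority (a domain name or e-mail address in the RFC) and the date, so treat it as a sketch rather than a full validator of the RFC grammar.

```python
import re

# RFC 4151 dates are YYYY, YYYY-MM, or YYYY-MM-DD.
DATE = re.compile(r"\d{4}(-\d{2}(-\d{2})?)?")

def tag_uri(authority, date, specific):
    """Build a tag: URI for naming a vocabulary without owning a web host."""
    if not DATE.fullmatch(date):
        raise ValueError(f"bad tag date: {date!r}")
    if not authority or "," in authority or ":" in authority:
        raise ValueError(f"bad authority: {authority!r}")
    return f"tag:{authority},{date}:{specific}"
```

For example, `tag_uri("example.com", "2022", "your-vocab-name")` returns `"tag:example.com,2022:your-vocab-name"`.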

Note that while you can create a meta-schema for your vocabulary and reference it, this is not strictly required.

Various ideas that folks have had (to be discussed elsewhere!) for inlining keywords in vocabulary declarations, or even inlining extension meta-schemas in schemas, could reduce the impact of this scenario (and vocabularies in general, which is why the overall topic of streamlining vocabularies belongs in another discussion).

This could be as low-impact as associating a URI with a list of keyword names (and SVA-ness) and writing something slightly more complex than $schema with a URI in the schema to indicate the usage.

I think it's worth doing, and also it points to several other topics that we would need to prioritize in order to make it viable.

The current approach is too heavy-weight, but for specifically replacing the "collect unknown keywords as annotations" case, I think we can make it work in a more lightweight manner.

gregsdennis · 2022-09-18T00:26:52Z · Maintainer

We can use the "inline" suggestions to enable an ad-hoc vocab that's only valid within the meta-schema that declares it. This would eliminate the need for a URI and other boilerplate things.

This may only be a valid approach for SVA vocabs, though, as non-SVAs would need code.

handrews Sep 18, 2022
Author

Yeah that's a good possible way to streamline. SVA-only vocabs are my main concern here, as they are the only behavior that can currently be accomplished without code anyway. We need to preserve that in a reasonably lightweight manner, but we don't need to improve any of the other cases (in this discussion, anyway).

jdesrosiers · 2022-09-21T17:15:48Z · Maintainer

Non-SVA keywords from optional vocabularies SHOULD be collected as annotations, and MUST otherwise be ignored

I assume this should say, "Non-SVA keywords from optional vocabularies that are not supported by the implementation SHOULD be collected as annotations".

I think this could be problematic. If we collect the value of a non-SVA keyword as an annotation, the annotation result specified by the vocabulary might be different than the one collected. A consumer of the annotations may expect one or the other annotation result and might end up broken if the validator adds or removes support for that vocabulary.

I think the safest thing to do for a non-SVA keyword from an optional vocabulary that isn't supported is to completely ignore it, including no output unit for that keyword in the results.

handrews Sep 21, 2022
Author

@jdesrosiers yes to the "that are not supported by the implementation" part.

What you propose would be a change from the current behavior of collecting every such keyword as an annotation regardless of its nature. Is that intentional?

What you are proposing would effectively create two distinct (rather than overlapping) use cases for optional vocabularies:

SVA-only vocabularies, which can be supported without code as optional vocabularies. These should basically always be considered "optional", which raises the question of whether the optional/required distinction even matters: if an implementation can determine that a vocabulary is SVA-only, it can support it as a required vocabulary, as that's all it needs to "understand".
non-validation-impacting, non-SVA-only vocabularies, which would just be ignored. This is the "please don't refuse to process my hyper-schema just because you don't understand hyper-schema" scenario. The pro of what you suggest is that if the caller gets the links annotations at all, they will be filled out properly. The con is that if the implementation doesn't support hyper-schema, the caller gets nothing and does not even get the benefit of knowing which links are valid for the instance, even if they have to do all of the template-filling themselves.

A middle path for the second case would involve indicating in the output that the annotations were collected blindly, at which point the caller can determine if they want to bother. Variations on this would allow the caller to signal whether they want blind collection, or whether they want a refusal, which would be a way of balancing/negotiating schema author and caller requirements.
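The middle path could be sketched as an output unit that records whether an annotation was collected blindly, with the caller choosing between blind collection, omission, and refusal. The `blind` field and the `BlindPolicy` names are hypothetical; the current output format defines nothing like them.

```python
from enum import Enum

class BlindPolicy(Enum):
    COLLECT = "collect"  # return unsupported non-SVA values, flagged as blind
    OMIT = "omit"        # leave them out of the output entirely
    REFUSE = "refuse"    # refuse rather than return raw keyword values

def annotation_unit(keyword, value, supported, policy):
    """Return an output unit for a non-SVA annotation keyword, or None to omit it.

    `value` is the computed annotation when `supported` is True, and the raw
    keyword value otherwise.
    """
    if supported:
        return {"keyword": keyword, "annotation": value, "blind": False}
    if policy is BlindPolicy.COLLECT:
        return {"keyword": keyword, "annotation": value, "blind": True}
    if policy is BlindPolicy.OMIT:
        return None
    raise RuntimeError(f"refusing to process: {keyword!r} is not supported")
```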

jdesrosiers Sep 21, 2022
Maintainer

What you propose would be a change from the current behavior of collecting every such keyword as an annotation regardless of its nature. Is that intentional?

Yes, it's intentional. Your proposal didn't create the problem I identified, but if we're fixing the problems with unknown keyword annotation collection, why leave some of the problems unfixed?

I agree with all the rest of what you wrote. The only way I see it being safe to collect non-SVA unimplemented keywords as SVAs is if they are distinguishable from the fully implemented keyword result. For example, in my implementation, every keyword has a URI. Let's say that the URI for links is https://json-schema.org/keywords/links. If the links were collected as an SVA it could be identified as https://json-schema.org/keywords/unknown#links. Someone could code against the "unknown-links" URI and the "links" URI and seamlessly switch over when the validator suddenly supports the hyper-schema vocabulary.
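A rough sketch of how a consumer could code against both URIs and switch over seamlessly. The URIs are the illustrative ones from the comment above, not anything published, and the helper names are hypothetical.

```python
BASE = "https://json-schema.org/keywords/"
LINKS = BASE + "links"
UNKNOWN_LINKS = BASE + "unknown#links"

def keyword_uri(keyword, implemented):
    """Name an annotation by keyword URI, marking blindly collected values as unknown."""
    return f"{BASE}{keyword}" if implemented else f"{BASE}unknown#{keyword}"

def get_links(annotations):
    """Accept either form, preferring the fully evaluated one.

    Returns (value, fully_evaluated).
    """
    if LINKS in annotations:
        return annotations[LINKS], True  # templates already filled in by the validator
    if UNKNOWN_LINKS in annotations:
        return annotations[UNKNOWN_LINKS], False  # raw value: caller fills the templates
    return None, False
```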

handrews Sep 21, 2022
Author

@jdesrosiers

Yes, it's intentional. Your proposal didn't create the problem I identified, but if we're fixing the problems with unknown keyword annotation collection, why leave some of the problems unfixed?

Yeah, you've identified a really great opportunity and solution here!

And yes, any mechanism that allows someone to determine what they're getting would allow for a reasonable grey zone. If combined with the ability to request specific annotations (#236), that would make it easier to determine if an evaluation is safe or not. Potentially you could request that non-SVAs only be collected if fully supported, or request them to be collected regardless.

handrews · 2022-09-25T19:34:38Z · Author

Several of this proposal's assumptions are now being tracked or addressed directly:

Better explanation of optional vocabularies is addressed by PR json-schema-spec#1295 (Normative language for "$vocabulary")
Strengthening the requirements around optional vocabularies is being tracked as issue json-schema-spec#1300 (Strengthen requirements around optional vocabularies)
Strengthening the requirements around dialect/vocabulary determination is being tracked as issue json-schema-spec#1301 (Strengthen "$schema" + "$vocabulary" requirements)

gregsdennis · 2022-12-15T20:40:13Z · Maintainer

Related: json-schema-org/json-schema-spec#1365 and associated PR json-schema-org/json-schema-spec#1244 (@awwright).

gregsdennis · 2023-01-26T22:50:14Z · Maintainer

I expect a consequence of not supporting unknown keywords is no longer being able to process schemas with unsupported optional vocabularies (which... why then have the ability to declare them optional?), unless the vocabulary can be expressed in a meaningful machine-readable way that says, "These keywords are mine," so that the implementation can know to ignore those (because the vocab is optional).

For another discussion perhaps

Are optional vocabularies even useful? One of the goals we have is interoperability. If a schema that uses an optional vocab is processed by two implementations, one of which understands that vocab and the other doesn't, then they could have differing validation results. (Maybe the meta-schema author included a vocab with validation keywords as optional.) This breaks interoperability.

Is that something we need to cover since it was a user that wrote that meta-schema?
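A toy illustration of that concern: if an optional vocabulary contains an assertion, such as a hypothetical `maxWords` keyword, an implementation that supports the vocabulary and one that skips it reach different verdicts on the same instance. Both `maxWords` and this validator are invented for the example.

```python
def validate(instance, schema, supports_words_vocab):
    """Toy string validator: `maxLength` is standard; `maxWords` belongs to a
    hypothetical optional vocabulary that only some implementations support."""
    if "maxLength" in schema and len(instance) > schema["maxLength"]:
        return False
    if supports_words_vocab and "maxWords" in schema:
        if len(instance.split()) > schema["maxWords"]:
            return False
    # An implementation without the optional vocabulary silently skips maxWords.
    return True
```

With `schema = {"maxLength": 100, "maxWords": 3}` and the instance `"one two three four"`, the implementation that supports the vocabulary returns `False` and the one that does not returns `True`.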

SorinGFS · 2023-01-31T21:59:27Z

@gregsdennis
...continuation from inappropriate topic.

Would you still have this stance if adequate tooling were provided to aid in the migration? For example, if alterschema were to support migrating any unknown keywords to one of the proposed solutions:
@foo convention
ad-hoc annotation vocabs
(Again, I'd like to have proposals discussed over there. Let's keep this to the "breaking changes" topic.)

I'm aware of alterschema's existence, but for my small projects I don't think it's relevant; I can easily make the transition to anything. I think those with complicated projects should be asked what their opinion is.

SorinGFS Feb 9, 2023

@gregsdennis

Would you still have this stance if adequate tooling were provided to aid in the migration? For example, if alterschema were to support migrating any unknown keywords to one of the proposed solutions:
@foo convention
ad-hoc annotation vocabs

The first proposed solution means a simple parse-and-replace in the schema, then the same parse-and-replace in dependent components.
The second proposed solution would be much more complicated than the first one, because keywords are to be found in a different object.
The second proposed solution would be cleaner than the first one, but would require more work.
IMO if you want a competent opinion, I think that the tool you are talking about should first exist and be capable of conversions that can then be tested by those interested for several scenarios. As far as I know, there is not even an example that can be analyzed for the second scenario. If there was now a tool that would work, maybe those interested would imagine more easily what they have to do and maybe they would be favorable to the change...
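To gauge the first option, here is roughly what the parse-and-replace step could look like, assuming the @foo convention simply means prefixing each unknown keyword with `@`. The sketch only knows the 2020-12 keyword set and subschema locations, does not follow `$ref` into dependent documents, and is hypothetical code unrelated to alterschema's actual behavior.

```python
# 2020-12 keywords whose value is one subschema, an array of subschemas,
# or an object whose values are subschemas.
SINGLE = {"additionalProperties", "propertyNames", "items", "contains", "not",
          "if", "then", "else", "unevaluatedItems", "unevaluatedProperties",
          "contentSchema"}
ARRAY = {"allOf", "anyOf", "oneOf", "prefixItems"}
MAP = {"properties", "patternProperties", "$defs", "dependentSchemas"}

KNOWN = SINGLE | ARRAY | MAP | {
    "$schema", "$id", "$ref", "$anchor", "$dynamicRef", "$dynamicAnchor",
    "$vocabulary", "$comment",
    "type", "const", "enum", "multipleOf", "maximum", "exclusiveMaximum",
    "minimum", "exclusiveMinimum", "maxLength", "minLength", "pattern",
    "maxItems", "minItems", "uniqueItems", "maxContains", "minContains",
    "maxProperties", "minProperties", "required", "dependentRequired",
    "title", "description", "default", "deprecated", "readOnly", "writeOnly",
    "examples", "format", "contentEncoding", "contentMediaType",
}

def prefix_unknown(schema, known=KNOWN, prefix="@"):
    """Return a copy of `schema` with every unknown keyword renamed to prefix + name."""
    if not isinstance(schema, dict):
        return schema  # boolean schemas have no keywords
    out = {}
    for kw, value in schema.items():
        if kw in SINGLE:
            value = prefix_unknown(value, known, prefix)
        elif kw in ARRAY:
            value = [prefix_unknown(s, known, prefix) for s in value]
        elif kw in MAP:
            # Keys here are property names or definition names, not keywords.
            value = {name: prefix_unknown(s, known, prefix) for name, s in value.items()}
        # Values of unknown keywords are opaque and left untouched.
        out[kw if kw in known or kw.startswith(prefix) else prefix + kw] = value
    return out
```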

westurner · 2023-02-01T12:35:06Z

I expect a consequence of not supporting unknown keywords is no longer being able to process schemas with unsupported optional vocabularies (which... why then have the ability to declare them optional?), unless the vocabulary can be expressed in a meaningful machine-readable way that says, "These keywords are mine," so that they implementation can know to ignore those (because the vocab is optional).

Would the proposed change break Linked Data and JSON-LD (and json-ld-schema), which use namespaced:URIs for some or all attributes?

A jsonschema parser could raise a warning when there are unknown keywords in the schema, but I suspect that excluding unknown keywords would limit the reusability of jsonschema specifically for linked data use cases. There exist validators more complex than jsonschema; hopefully there needn't be yet another schema definition document for them, if unreserved:keywords can be used in JSON Schema to define, for example, online validation.

Linked Data use cases to make sure this change wouldn't break, that should be added to the Use Cases document:

validate schema.org JSONLD, where attributes are full URIs or uri:CURIEs
CURIE: https://en.wikipedia.org/wiki/CURIE
(W3C) QName: https://en.wikipedia.org/wiki/QName

jdesrosiers Feb 1, 2023
Maintainer

JSON Schema has no relation to Linked Data, JSON-LD, and Schema.org. Nothing we do will affect your use of those technologies.

westurner Feb 2, 2023

JSON Schema has no relation to Linked Data, JSON-LD, and Schema.org. Nothing we do will effect your use of those technologies.

Is that confirmed with test cases and/or, IDK, a statement of intent to work well with Linked Data, too?

Relequestual Feb 2, 2023
Maintainer

JSON Schema has no relation to Linked Data, JSON-LD, and Schema.org. Nothing we do will effect your use of those technologies.

Is that confirmed with test cases and/or a IDK a statement of intent to work well with Linked Data, too?

I don't really know what this means. There is nothing in the JSON Schema specification that links to JSON-LD or Schema.org.
In what way do you see them as interacting?

westurner Feb 2, 2023

JSON Schema MAY be used to validate data that is also already JSON-LD or that is also validated with SHACL.

How would such a statement of intent to not break Linked Data - e.g. the aforementioned specs - affect whether for example the existing json-ld-schema test cases which may not (?) even currently use custom keywords or extension vocabularies?:
https://github.com/mulesoft-labs/json-ld-schema/tree/master/src/test/data/schema

JSON Schema test case: Schema.org IDK ScholarlyArticle as JSON-LD
- and then also with SHACL but which repo?

Relequestual Feb 2, 2023
Maintainer

JSON Schema MAY be used to validate data...

I think that's kind of the point here. JSON Schema validates data. JSON data. JSON Schema doesn't care what sort of JSON data, as long as it's valid JSON data.

Any custom keywords or extensions can be formed into a vocabulary (or something else should we provide another method for pure embedding or signalling pure annotation extensions).

The fact that the data is JSON-LD, or any other data structure that uses JSON, matters not. Unless I'm really missing something here, in which case, please enlighten me.

Taking a look at the specific example you linked, I can see there's a custom dialect and vocabulary. That's great to see! It uses JSON Schema 2019-09. Updating it to the "non-breaking era" JSON Schema looks like it would be trivial. As for tooling? It really depends how it's being used, so I couldn't comment.

SorinGFS · 2023-02-09T19:50:53Z

@gregsdennis

Couldn't json-schema decentralization help avoid the breaking change of disallowing non-vocabulary keywords? For example, the @cfworker/json-schema library treats each keyword in each draft as a separate schema. In this way, the library finds the correct interpreter for any combination of schemas.

gregsdennis Feb 9, 2023
Maintainer

json-schema decentralization

I don't understand what you mean by this or how it relates to disallowing non-vocab keywords?

For example, @cfworker/json-schema library treats each keyword in each draft as a separate schema.

Do you mean it "treats each keyword in each draft as a separate keyword?" A keyword by itself isn't a schema.

If so, users typically expect that a keyword in one version means the same thing and operate the same way in the next version. Their primary complaint is that keywords keep changing from one version to the next.

SorinGFS Feb 9, 2023

Do you mean it "treats each keyword in each draft as a separate keyword?" A keyword by itself isn't a schema.

Agreed, a keyword is not a schema, but behind a keyword is an intention, which can in this way be identified and coupled with the right interpreter, like a translator.

users typically expect that a keyword in one version means the same thing and operate the same way in the next version. Their primary complaint is that keywords keep changing from one version to the next.

The point is, if you have a way to identify back in time which intent was behind a used keyword, it doesn't matter how it changed. You will be able to collect and map all the intents to their correct actions. As an analogy, that is how pandoc works and is able to convert almost any document format into any other. They do not care how a document marks its formatting; they have their own internal, non-human-readable representation onto which they map every intent in a document, and then they translate all the intents to another document type using their formatting rules. See this picture.

This technique will bring backwards compatibility and give users an easy transition to the latest release, so there will be no reason for them to stick to older drafts. Plus, if in the next draft or drafts you just add new keywords and don't change older ones, I think that will satisfy everybody.

Relequestual Feb 10, 2023
Maintainer

I believe I understand what you're saying, and it is something that has been discussed previously, and may come up again, however...

The pandoc analogy doesn't hold. Pandoc is a tool for conversion. It exists and can be used when the need arises. JSON Schema is a standard, with many implementations across many languages. The schema itself is supposed to be interoperable across them all, providing the same validation result. They cannot all be expected to also provide translation from previous versions to the current one, nor to bolt on a specific tool in a specific language.

While there is a tool which does schema upgrades, it is standalone. Requiring people to implement it in order to be a compliant validator is unreasonable.

SorinGFS Feb 10, 2023

While there is a tool which does schema upgrades, it is stand alone. Making it required that people implement it to be a compliant validation is unreasonable.

I also think the same, it shouldn't be required to use a tool. And I imagine this working based only on an instruction set about re-referencing old drafts and old-draft based schemas in a decentralized structure, and the next drafts being released directly in that decentralized format. The users of new drafts would not even need to be aware of that set of instructions. In the next period I will try to provide a functional example.

SorinGFS Feb 21, 2023

@Relequestual

As I promised, here is my view of decentralized json-schema. The work is still at the beginning and there is a lot of work to do to identify all the rules that must apply, but at least you can get a view of what I'm trying to achieve. For example, by using it I was able to detect all the keywords used since the first draft:

Detected keyword definitions across versions from draft-00 to draft/next:

============================================
= = = = = = = 1 1 1 $anchor
= = = = = = 1 1 1 1 $comment
= = = = = = = = 1 1 $dynamicAnchor
= = = = = 1 1 1 1 1 $id
= = = = = = = 1 1 1 $recursiveAnchor
1 1 1 1 1 1 1 1 1 1 $ref
1 1 1 1 1 1 1 1 1 1 $schema
= = = = = = = 1 1 1 $vocabulary
= = = = = = = 1 1 = absoluteKeywordLocation
= = = 1 1 1 1 1 = = additionalItems
1 1 1 1 1 1 1 1 1 1 additionalProperties
= = = = 1 1 1 1 1 1 allOf
1 1 1 = = = = = = = alternate
= = = = = = 1 1 1 = anchor
= = = = = = 1 1 1 = anchorPointer
= = = = = = = 1 1 1 annotations
= = = = 1 1 1 1 1 1 anyOf
= = = = = = 1 1 = = attachmentPointer
= = = = = 1 1 1 = = base
= = = = 1 1 = = = = binaryEncoding
= = = = = 1 1 1 1 1 const
= = = = = 1 1 1 1 1 contains
1 1 1 1 = = 1 1 1 1 contentEncoding
= = = = = = 1 1 1 1 contentMediaType
= = = = = = = 1 1 1 contentSchema
= = = = = = 1 1 = = contextPointer
= = = = = = 1 1 = = contextUri
1 1 1 1 1 1 1 1 1 1 default
= = = 1 1 1 1 1 1 1 dependencies
= = = = = = = 1 1 1 dependentRequired
= = = = = = = 1 1 1 dependentSchemas
= = = = = = = 1 1 1 deprecated
1 1 1 1 1 1 1 1 1 1 description
1 1 1 1 = = = = = = disallow
= = 1 1 = = = = = = divisibleBy
= = = = = = = = = 1 droppedAnnotations
= = = = = = 1 1 1 1 else
= = = = 1 = = = = = encType
1 1 1 1 = = = = = = enctype
1 1 1 1 1 1 1 1 1 1 enum
= = = = = = = = 1 = error
= = = = = = = 1 1 1 errors
= = = = = = = = = 1 evaluationPath
= = = = = 1 1 1 1 1 examples
= = = 1 1 1 1 1 1 1 exclusiveMaximum
= = = 1 1 1 1 1 1 1 exclusiveMinimum
1 1 1 1 = = = = = = extends
1 1 1 1 1 1 1 1 1 1 format
1 1 1 1 1 = = = = = fragmentResolution
= = = = = = 1 1 1 = headerSchema
1 1 1 1 1 1 1 1 1 = href
= = = = = = 1 1 = = hrefInputTemplates
= = = = = = 1 1 = = hrefPrepopulatedInput
= = = = = 1 1 1 1 = hrefSchema
= = = = = = = 1 = = https://json-schema.org/draft/2019-09/vocab/applicator
= = = = = = = 1 = = https://json-schema.org/draft/2019-09/vocab/content
= = = = = = = 1 = = https://json-schema.org/draft/2019-09/vocab/core
= = = = = = = 1 = = https://json-schema.org/draft/2019-09/vocab/format
= = = = = = = 1 1 = https://json-schema.org/draft/2019-09/vocab/hyper-schema
= = = = = = = 1 = = https://json-schema.org/draft/2019-09/vocab/meta-data
= = = = = = = 1 = = https://json-schema.org/draft/2019-09/vocab/validation
= = = = = = = = 1 = https://json-schema.org/draft/2020-12/vocab/applicator
= = = = = = = = 1 = https://json-schema.org/draft/2020-12/vocab/content
= = = = = = = = 1 = https://json-schema.org/draft/2020-12/vocab/core
= = = = = = = = 1 = https://json-schema.org/draft/2020-12/vocab/format-annotation
= = = = = = = = 1 = https://json-schema.org/draft/2020-12/vocab/format-assertion
= = = = = = = = 1 = https://json-schema.org/draft/2020-12/vocab/meta-data
= = = = = = = = 1 = https://json-schema.org/draft/2020-12/vocab/unevaluated
= = = = = = = = 1 = https://json-schema.org/draft/2020-12/vocab/validation
= = = = = = = = = 1 https://json-schema.org/draft/next/vocab/applicator
= = = = = = = = = 1 https://json-schema.org/draft/next/vocab/content
= = = = = = = = = 1 https://json-schema.org/draft/next/vocab/core
= = = = = = = = = 1 https://json-schema.org/draft/next/vocab/format-annotation
= = = = = = = = = 1 https://json-schema.org/draft/next/vocab/format-assertion
= = = = = = = = = 1 https://json-schema.org/draft/next/vocab/meta-data
= = = = = = = = = 1 https://json-schema.org/draft/next/vocab/unevaluated
= = = = = = = = = 1 https://json-schema.org/draft/next/vocab/validation
1 1 1 1 1 = = = = = id
= = = = = = 1 1 1 1 if
= = = = = = = 1 1 1 instanceLocation
1 1 1 1 1 1 1 1 1 1 items
= = = = = = = 1 1 = keywordLocation
1 1 1 1 1 1 1 1 1 = links
= = = = = = = 1 1 1 maxContains
1 1 = = = = = = = = maxDecimal
1 1 1 1 1 1 1 1 1 1 maxItems
1 1 1 1 1 1 1 1 1 1 maxLength
= = = = 1 1 1 1 1 1 maxProperties
1 1 1 1 1 1 1 1 1 1 maximum
1 1 1 = = = = = = = maximumCanEqual
= = = = 1 1 = = = = media
1 1 1 1 1 1 = = = = mediaType
1 1 1 1 1 = = = = = method
= = = = = = = 1 1 1 minContains
1 1 1 1 1 1 1 1 1 1 minItems
1 1 1 1 1 1 1 1 1 1 minLength
= = = = 1 1 1 1 1 1 minProperties
1 1 1 1 1 1 1 1 1 1 minimum
1 1 1 = = = = = = = minimumCanEqual
= = = = 1 1 1 1 1 1 multipleOf
= = = = = = = = = 1 nested
= = = = 1 1 1 1 1 1 not
= = = = 1 1 1 1 1 1 oneOf
1 1 1 = = = = = = = optional
1 1 1 1 1 = = = = = pathStart
1 1 1 1 1 1 1 1 1 1 pattern
= = = 1 1 1 1 1 1 1 patternProperties
= = = = = = = = 1 1 prefixItems
1 1 1 1 1 1 1 1 1 1 properties
= = = = = = = = = 1 propertyDependencies
= = = = = 1 1 1 1 1 propertyNames
= = = = = 1 1 1 1 1 readOnly
1 1 1 1 = = = = = = readonly
1 1 1 1 1 1 1 1 1 = rel
= = = 1 1 1 1 1 1 1 required
1 1 1 1 = = = = = = requires
1 1 1 1 = = = = = = root
= = = = 1 = = = = = schema
= = = = = = = = = 1 schemaLocation
= = = = = 1 = = = = submissionEncType
= = = = = = 1 1 1 = submissionMediaType
= = = = = 1 1 1 1 = submissionSchema
= = = = = = 1 1 1 = targetHints
= = = = = = 1 1 1 = targetMediaType
= = 1 1 1 1 1 1 1 = targetSchema
= = = = = = 1 1 = = targetUri
= = = = = = 1 1 1 = templatePointers
= = = = = = 1 1 1 = templateRequired
= = = = = = 1 1 1 1 then
1 1 1 1 1 1 1 1 1 1 title
1 1 1 1 1 1 1 1 1 1 type
= = = = = = = 1 1 1 unevaluatedItems
= = = = = = = 1 1 1 unevaluatedProperties
= = 1 1 1 1 1 1 1 1 uniqueItems
= = = = = = = 1 1 1 valid
= = = = = = 1 1 1 1 writeOnly
============================================

all keywords count: 136
keyword definition referrers: 1062
keyword definition footprints: 467

======= checking invalid references =======
[ 'https://json-schema.org/draft/2020-12/meta/hyper-schema' ]
============================================
invalid references count: 1
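To work with output like the above, the rows can be parsed back into per-keyword flags. This sketch reads `1` as "defined in that version" and `=` as "absent", which is how the rows appear to be laid out; the function names are hypothetical, and non-row lines (separators, counts) are skipped.

```python
def parse_rows(text):
    """Parse 'flag flag ... keyword' rows into {keyword: [bool per version column]}."""
    table = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue  # blank lines and single-token separators
        *flags, name = parts
        if all(f in ("1", "=") for f in flags):
            table[name] = [f == "1" for f in flags]
    return table

def in_every_version(table):
    """Keywords marked present in every column."""
    return sorted(name for name, flags in table.items() if all(flags))

def first_version_index(table, name):
    """0-based index of the first column in which `name` is marked present."""
    return table[name].index(True)
```

Fed the three rows for `$ref`, `$anchor`, and `extends`, `in_every_version` returns `["$ref"]` and `first_version_index(table, "$anchor")` returns `7`.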

automaton82 · 2023-03-05T13:31:22Z

This feels like the same as XML's Unique Particle Attribution, discussion here:

https://lists.w3.org/Archives/Public/www-tag/2004Aug/att-0010/NRMVersioningProposal.html

Has that been reviewed? Does JSON schema want to make JSON into XML? Food for thought.

gregsdennis Mar 5, 2023
Maintainer

Thanks for raising this. That looks like a namespacing disambiguation. That's only marginally related to what we're doing here.

Here, we just want to ensure that every keyword used in a schema is defined somewhere. For us, that means in a vocabulary. We don't have a concept of namespacing (though separately, we have been investigating how that could work).

Does JSON schema want to make JSON into XML?

I don't see how we could make JSON into XML. They're quite different syntaxes and data models. XML does have a lot that's good, which is why it was king for a long time, so what would be wrong with wanting to support some of those features if we can?

jherico · 2023-03-05T17:40:49Z

HTTP and MIME both solved this issue with headers by allowing non-standard headers as long as they were prefixed by X-. It seems like a similar solution would be possible here. Even though it would still break existing schemas, it would also continue to allow organizations to create custom tooling with additional fields that aren't part of the declared vocabulary. Failing to support some method for orgs to create additional fields will simply lead to them turning off schema validation in some cases.

AjaxGb Mar 5, 2023

Agreed, reserving some prefix like x- or _ for custom fields would be a simple and clean way to solve the issue.

gregsdennis Mar 5, 2023
Maintainer

Please see https://github.com/orgs/json-schema-org/discussions/329#discussioncomment-4988859.

Also, prefixed headers like that are now advised against.
