Reorganizing the meta-schemas #1159

Open
gregsdennis opened this issue Dec 11, 2021 · 12 comments
@gregsdennis
Member

Currently, the keywords are organized based on what "kind" of keyword they are: applicator vs annotation vs assertion (vs "special").

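| core | applicator | validation | unevaluated | meta-data | format | content |
| --- | --- | --- | --- | --- | --- | --- |
| $id | prefixItems | type | unevaluatedItems | title | format | contentEncoding |
| $schema | items | const | unevaluatedProperties | description | | contentMediaType |
| $ref | contains | enum | | default | | contentSchema |
| $anchor | additionalProperties | multipleOf | | deprecated | | |
| $dynamicRef | properties | maximum | | readOnly | | |
| $dynamicAnchor | patternProperties | exclusiveMaximum | | writeOnly | | |
| $vocabulary | dependentSchemas | minimum | | examples | | |
| $comment | propertyNames | exclusiveMinimum | | | | |
| $defs | if | maxLength | | | | |
| | then | minLength | | | | |
| | else | pattern | | | | |
| | allOf | maxItems | | | | |
| | anyOf | minItems | | | | |
| | oneOf | uniqueItems | | | | |
| | not | maxContains | | | | |
| | | minContains | | | | |
| | | maxProperties | | | | |
| | | minProperties | | | | |
| | | required | | | | |
| | | dependentRequired | | | | |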

I think it might be easier for schema authors if we organized the keywords by function. This table reorganizes the keywords primarily by what kind of data the keyword addresses. It still has some "special" categories as well.

| core | meta-data | combinatorial | array | object | number | string | other/multiple | format |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| $id | title | if | prefixItems | properties | maximum | maxLength | type | format |
| $schema | description | then | items | patternProperties | exclusiveMaximum | minLength | const | |
| $ref | default | else | unevaluatedItems | additionalProperties | minimum | pattern | enum | |
| $anchor | deprecated | allOf | maxItems | unevaluatedProperties | exclusiveMinimum | contentEncoding | maxContains | |
| $dynamicRef | readOnly | anyOf | minItems | maxProperties | multipleOf | contentMediaType | minContains | |
| $dynamicAnchor | writeOnly | oneOf | uniqueItems | minProperties | | contentSchema | contains | |
| $vocabulary | examples | not | | required | | | | |
| $comment | | | | dependentRequired | | | | |
| $defs | | | | dependentSchemas | | | | |
| | | | | propertyNames | | | | |

format is still on its own so that we can list its vocabulary with a value of false while leaving the door open to others using it with true.

Aside from that, I think this organization makes more sense from an author's point of view.

@jdesrosiers
Member

I'm not sure how this makes anything easier. Vocabularies should be organized to make it easy to combine them to make new dialects. For example, if I'm defining a dialect for data definition, I might want applicator keywords, but not validation keywords that don't apply to the DDL domain. The current organization isn't perfect, but I don't see how the proposal is better. I can't imagine how splitting by JSON type would be useful for constructing dialects. Why would I ever want my dialect to support object keywords and not array keywords?
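For illustration, a dialect that takes the applicator keywords but leaves out the validation keywords might be declared roughly as follows. This is a minimal sketch in Python, assuming the draft 2020-12 vocabulary and meta-schema URIs; the `make_dialect` helper and the `https://example.com/ddl-dialect` identifier are hypothetical and not part of any specification.

```python
import json

# Assumption: draft 2020-12 base URI for the published vocabularies and meta-schemas.
BASE = "https://json-schema.org/draft/2020-12"


def make_dialect(dialect_id, vocab_names):
    """Build a dialect meta-schema that requires only the named vocabularies."""
    return {
        "$schema": f"{BASE}/schema",
        "$id": dialect_id,  # hypothetical identifier chosen by the dialect author
        "$dynamicAnchor": "meta",
        # Each listed vocabulary is marked true, i.e. implementations must understand it.
        "$vocabulary": {f"{BASE}/vocab/{name}": True for name in vocab_names},
        # Pull in the matching single-vocabulary meta-schemas for syntax checking.
        "allOf": [{"$ref": f"{BASE}/meta/{name}"} for name in vocab_names],
    }


# A data-definition dialect: core plus applicator, with validation left out.
ddl_dialect = make_dialect("https://example.com/ddl-dialect", ["core", "applicator"])
print(json.dumps(ddl_dialect, indent=2))
```

Note that under the current organization `type` lives in the validation vocabulary, so a dialect composed this way would not include it.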

@gregsdennis
Member Author

gregsdennis commented Dec 12, 2021

> Why would I ever want my dialect to support object keywords and not array keywords?

Maybe you don't need arrays with your data model.

But I'm not asking that question. I'm asking why keywords for arrays exist in multiple vocabs. In this particular case, what is the use of separating items and prefixItems from minItems and maxItems? If I have an array, I want to be able to define it, and having to draw on two vocabs makes that a little harder.

Organization by "keyword type" doesn't really seem helpful to anyone other than spec authors.

@jdesrosiers
Member

> what is the use of separating items and prefixItems from minItems and maxItems?

This is exactly the example I gave. If I'm creating a DDL dialect, I only want keywords that I can apply to data-types. minItems and maxItems are validation keywords. They apply to the value, not the type definition. Having an applicator vocabulary makes some sense. It's the keywords that define structure. Any dialect can start with that as the skeleton of their dialect and fill in their keywords to flesh it out. I just can't imagine a use-case where it would make sense to compose vocabularies based on type.

However, vocabulary organization is arbitrary and no matter what we choose, there will always be use cases where it doesn't make sense. I'd rather define keywords than vocabularies. Then people can combine them however they like without being constrained by the categorization we choose for them.

@karenetheridge
Member

> Aside from that, I think this organization makes more sense from an author's point of view.

This table would certainly be useful to include on the documentation site; there are lots of keywords and categorizing them in different ways can make it easier for a schema author to find what they need.

Perhaps a list of all the keywords, with a column for what vocabulary they belong to (and a link to the spec entry for each), and a column showing what instance type(s) they are applicable for? So, basically, a simplified form of https://docs.google.com/spreadsheets/d/18SIXnzyjXTJZgqeo5W-qIEwq-bNKXb5M76Pq_47r2Is/edit#gid=0
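A keyword list of that shape could be generated from two small lookup tables. The following is a rough sketch in Python, not an existing tool: `keyword_table` is a hypothetical helper, only a handful of keywords are included as an excerpt, and the spec-link column is left out because the link targets are not given here.

```python
# Excerpt of the current vocabulary assignments, taken from the first table in this issue.
VOCABULARY = {
    "items": "applicator",
    "maxItems": "validation",
    "unevaluatedItems": "unevaluated",
    "properties": "applicator",
    "required": "validation",
    "maximum": "validation",
    "pattern": "validation",
    "title": "meta-data",
}

# Instance types each keyword applies to ("any" means it is not type-specific).
INSTANCE_TYPES = {
    "items": ["array"],
    "maxItems": ["array"],
    "unevaluatedItems": ["array"],
    "properties": ["object"],
    "required": ["object"],
    "maximum": ["number"],
    "pattern": ["string"],
    "title": ["any"],
}


def keyword_table(vocabulary, instance_types):
    """Render one markdown table row per keyword, sorted by keyword name."""
    lines = [
        "| keyword | vocabulary | instance types |",
        "| --- | --- | --- |",
    ]
    for keyword in sorted(vocabulary):
        types = ", ".join(instance_types[keyword])
        lines.append(f"| {keyword} | {vocabulary[keyword]} | {types} |")
    return "\n".join(lines)


table = keyword_table(VOCABULARY, INSTANCE_TYPES)
print(table)
```

Keeping the two lookups separate means the same list can be sorted or filtered by either column on a documentation page.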

@gregsdennis
Member Author

gregsdennis commented Dec 19, 2021

> If I'm creating a DDL dialect, I only want keywords that I can apply to data-types... - @jdesrosiers

I don't know that the average schema author is going to be creating dialects, though. That involves writing meta-schemas. Most schema authors are just going to be taking the base meta-schema.

If I am such a schema author, I don't know what an "applicator" is. I just have an array, and I need to write a schema. To do that, I want to know what keywords I can use that pertain to arrays. As it stands, I have to look in applicator (again, what is that?), validation, and unevaluated to find them all.

With the proposed organization, all I have to do is look at the array meta-schema/vocabulary (and perhaps glance over the other/multiple one) to find keywords that apply.
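The lookup burden described above can be checked mechanically. This is a small sketch in Python that uses only the keyword lists from the two tables in this issue; `vocabularies_for` is a hypothetical helper name, and the array keyword set is the "array" column of the proposed table.

```python
# Current vocabulary membership, copied from the first table in this issue.
CURRENT_VOCABULARIES = {
    "core": {"$id", "$schema", "$ref", "$anchor", "$dynamicRef",
             "$dynamicAnchor", "$vocabulary", "$comment", "$defs"},
    "applicator": {"prefixItems", "items", "contains", "additionalProperties",
                   "properties", "patternProperties", "dependentSchemas",
                   "propertyNames", "if", "then", "else", "allOf", "anyOf",
                   "oneOf", "not"},
    "validation": {"type", "const", "enum", "multipleOf", "maximum",
                   "exclusiveMaximum", "minimum", "exclusiveMinimum",
                   "maxLength", "minLength", "pattern", "maxItems", "minItems",
                   "uniqueItems", "maxContains", "minContains", "maxProperties",
                   "minProperties", "required", "dependentRequired"},
    "unevaluated": {"unevaluatedItems", "unevaluatedProperties"},
    "meta-data": {"title", "description", "default", "deprecated",
                  "readOnly", "writeOnly", "examples"},
    "format": {"format"},
    "content": {"contentEncoding", "contentMediaType", "contentSchema"},
}

# The "array" column of the proposed table.
ARRAY_KEYWORDS = {"prefixItems", "items", "unevaluatedItems",
                  "maxItems", "minItems", "uniqueItems"}


def vocabularies_for(keywords, vocabularies):
    """Return the sorted names of vocabularies containing any of the keywords."""
    return sorted(name for name, members in vocabularies.items()
                  if members & keywords)


print(vocabularies_for(ARRAY_KEYWORDS, CURRENT_VOCABULARIES))
# ['applicator', 'unevaluated', 'validation']
```

Under the proposed grouping the same query would return a single group, since all six keywords sit in the array column.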

> Having an applicator vocabulary makes some sense. It's the keywords that define structure. - @jdesrosiers

The ones I list under combinatorial don't define structure. These stick out as a "logic" group.

But more to the point here is the example I mentioned earlier. How can you properly define the structure of, say, an array without all of the keywords that pertain to arrays, e.g. both items and maxItems or both contains and minContains? Yet these keyword pairs are currently listed in separate vocabularies.

I think we're making things harder for John & Jane Schema-Author.

> This table would certainly be useful to include on the documentation site - @karenetheridge

I see this as a secondary option, but I think there's value in actually reorganizing the vocabularies themselves.

@karenetheridge
Member

> I think there's value in actually reorganizing the vocabularies themselves

I'm not convinced, given:

> I don't know that the average schema author is going to be creating dialects

@jdesrosiers
Member

> This table would certainly be useful to include on the documentation site;

When we get around to documenting vocabularies, I agree that something like this would be useful.

> I don't know that the average schema author is going to be creating dialects, though. That involves writing meta-schemas. Most schema authors are just going to be taking the base meta-schema.

I completely agree. Most schema authors won't be creating dialects. But, the only reason for anyone to care about vocabularies is if they are creating dialects. They're otherwise a fairly irrelevant concept to the average schema author. What you are describing sounds like documentation concerns. People are definitely not going to be digging through meta-schemas to see what keywords are available to them. The UJS site is already organized very much the way you've broken things down, so I'm not seeing a major problem here.

> But more to the point here is the example I mentioned earlier. How can you properly define the structure of, say, an array without all of the keywords that pertain to arrays, e.g. both items and maxItems or both contains and minContains? Yet these keyword pairs are currently listed in separate vocabularies.

The vocabulary breakdown is definitely not perfect. There are certainly some minor improvements we can make, but no matter what we choose, it will make sense in one circumstance and not in another. This is why I want to move away from keywords being identified by their vocabulary. If keywords are identified independently, everyone can group keywords into vocabularies however works best for them.

I already answered how having items and maxItems in different vocabs makes sense. If you didn't like my answer, that's fine, but I don't know what else I can say. For contains and minContains, I agree that it makes no sense to have these in different vocabularies. They are both validation keywords in my opinion.

What would make sense to me would be a vocab for basic structural JSON definition. It would contain everything you need for basic type definition like properties, items, and type. Dialect authors can start with this as a base and flesh it out with their own custom vocabularies. Not included would be logic keywords like anyOf and validation keywords like minLength. So, while the applicator vocabulary comes close to filling this need, it does miss the mark a bit.
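As a rough illustration of that idea, the sketch below (Python) treats the three keywords named above as a stand-in for such a structural vocabulary and reports every keyword in a schema that falls outside it. The `STRUCTURAL` set and the `outside_structural` helper are hypothetical; a real structural vocabulary would presumably contain more keywords, and the walker only descends into `properties` and `items` and assumes a well-formed schema.

```python
# Assumption: only the three keywords named in the comment above; a real
# structural vocabulary would likely be larger.
STRUCTURAL = {"type", "properties", "items"}


def outside_structural(schema):
    """Collect keywords used anywhere in the schema that are not structural."""
    found = set()
    if not isinstance(schema, dict):
        return found  # boolean schemas carry no keywords
    for keyword, value in schema.items():
        if keyword not in STRUCTURAL:
            found.add(keyword)  # reported, and not descended into
        elif keyword == "properties":
            for subschema in value.values():
                found |= outside_structural(subschema)
        elif keyword == "items":
            found |= outside_structural(value)
    return found


example = {
    "type": "object",
    "properties": {
        "name": {"type": "string", "minLength": 1},
        "tags": {"type": "array", "items": {"anyOf": [{"type": "string"}]}},
    },
}

print(sorted(outside_structural(example)))
# ['anyOf', 'minLength']
```

The two keywords reported are the logic and validation examples mentioned above, which a dialect author would add back through their own vocabularies.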

@gregsdennis
Member Author

> People are definitely not going to be digging through meta-schemas to see what keywords are available to them.

For what it's worth, that's what I did when I first picked up JSON Schema.

@jdesrosiers
Member

> For what it's worth, that's what I did when I first picked up JSON Schema.

Point taken. This was too strong a statement. I'll rephrase to, "In my experience, it's very uncommon for people to be digging through meta-schemas to see what keywords are available to them". I'm willing to accept that it's more common than I think. I just meant to express that I've never heard of anyone doing this until now.

@gregsdennis
Member Author

I'm moving this conversation to a discussion. Will report back here once decided.

gregsdennis moved this to In Discussion in Stable Release Development on May 23, 2024
@gregsdennis
Member Author

This conversation needs to be reframed in the context of not having vocabularies.

I think there is still a benefit to having some grouping of keywords, but it needs to be stated outside of the vocabulary context.

@gregsdennis
Member Author

I'm removing this from the stable release discussion since this really involves vocabs, which are being demoted to a proposal. (I'll open a different issue for a related discussion we need to have.)

3 participants