Skip to content

config-schema: add sample config files to exercise schema tests #6316

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 12 commits into
base: main
Choose a base branch
from

Conversation

jgreitemann
Copy link

@jgreitemann jgreitemann commented Apr 11, 2025

  • Uses the datatest-stable crate to "drive" the schema tests directly by the toml files. Since this crate provides a custom test harness, the test needs to be a separate integration test binary. If the benefit of data-driven tests is not deemed worth either the additional (dev) dependency or the additional test binary, I can easily drop the corresponding commits and add the test function stubs back in.
  • I see two potential avenues to build on this:
    • These tests only check that the sample configs conform to the scheme (or not) but there's no guarantee that the actual settings parsers behave as stipulated by the schema. It shouldn't be hard to take these same sample files and add (again data-driven) tests which actually parse these config files successfully (or not).
    • The default values given by the schema are currently hard-coded separately from the defaults set by lib/src/config/misc.toml and cli/src/config/*.toml. I've fixed at least one case each where the default value given by the schema was wrong (acb1ee5) or missing (e9db6ba), respectively. A test should find such discrepancies, though I'm not sure how best to do this.

Checklist

If applicable:

  • I have updated CHANGELOG.md
  • I have updated the documentation (README.md, docs/, demos/)
  • I have updated the config schema (cli/src/config-schema.json)
  • I have added tests to cover my changes

@jgreitemann jgreitemann requested a review from a team as a code owner April 11, 2025 12:41
@martinvonz
Copy link
Member

This feels like @steadmon's domain. Do you think you can find time to review it?

@steadmon
Copy link
Member

I can give at least a quick scan later today. My main question for @martinvonz or other maintainers is how do we feel about adding new dependencies?

@martinvonz
Copy link
Member

I don't mind it much if the new dependency doesn't make the build much slower or makes the binary much larger (shouldn't apply here since it's only used in tests). Another concern is how stable the dependency is and how long we can expect it to stick around. If it gets abandoned, we may have to fork it or vendor a fork of it. To me, it's mostly about whether it saves us time vs writing it ourselves.

@jgreitemann
Copy link
Author

datatest-stable is maintained by the cargo-nextest folks and seems pretty well maintained. If it were to be abandoned, it shouldn't be hard to replace it with something else or even to go back to manually spelling out the test cases; there's basically no API surface other than the harness! macro that specifies the data directories.

I haven't measured build times, though.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests like this one for user.email or those for operation.hostname rely on the format specified by the schema for those properties. jj does not in fact enforce these formats, nor do they functionally matter AFAICT. Similarly, git does not mind a malformed email address.

So these are cases where the schema is more restrictive than what jj actually accepts. I guess it still makes sense usability-wise to have the schema nudge the user towards best practices and keep the format? If so, I do think it makes sense to keep these tests as they explicitly test the schema; however, if ever these sample-configs were to be used for other purposes (like validating the config parser), this discrepancy between schema and parser would resurface.

Copy link
Member

@steadmon steadmon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the long-delayed review. This looks good to me, just a couple of nitpicks.

…iles

The previous implementation of `assert_no_forgotten_test_files`
hard-coded the name of the `runner` integration test and required all
other source files to appear in matching `mod` declarations. Thus, this
approach cannot handle multiple integration tests.

However, additional integration tests may be desirable
- to support tests using a custom test harness (see upcoming commits)
- to balance the trade-off between test run time and compile time as
  the test suite grows in the future.

The new implementation first uses `taplo` to parse the `[[test]]`
sections of the manifest to identify integration test main modules,
and then searches in those for `mod` declarations. This is then compared
to the list of source files in the tests directory. Like the previous
implementation, the new one does not attempt to recurse into submodules
or to handle directory-style modules; just like before it only treats
source files without a module declaration as an error and relies on the
compiler to complain about the other way around.

When `taplo` is not installed, the check is skipped unless it is running
in CI where we require `taplo` to be available.
The `datatest-stable` crate allows to dynamically instantiate test cases
based on available files. This is applied to `test_config_schema` to
create one test case per config file. As case in point, the test case
for `hints.toml` was missing previously, hence the total number of tests
is up one.

This will become useful when adding more config examples to somewhat
exhaust the schema.

`datatest-stable` uses a custom test harness and thus cannot be used in
the same integration test binary that all of the other test modules run
in. However, if data-driven tests are to be used for other applications,
they can share in the same binary, so the module structure is already
set up to mirror the central "runner" approach.
Adds a bunch of additional sample config toml files. Via the
`datatest_runner`, these each correspond to a test case to check that
the toml is correctly (in-)validated according to the schema.

The `valid/*.toml` files typically define multiple related config
options at once. Where there's some overlap with the default configs in
`cli/src/config`, the aim was to choose different allowed values, e.g.
hex colors, file size in bytes (numeric), etc.

The `invalid/*.toml` files typically only define a single offending
property such as to not obscure individual false negatives. All of the
"invalid" files are still valid toml as the aim is not to test the
`toml_edit` crate or Taplo.

The sample files all contain a Taplo schema directive. This allows them
to be validated against the schema on the fly by Taplo's LSP and derived
IDE plugins to speed up editing and immediately highlight offending
options.

Closes jj-vcs#5695.
These are two more instances where the default values were wrong and in
fact not even consistent with the schema itself.

I've found these by running
```sh
jq -r 'paths(type == "object" and has("default")) as $p | getpath($p).default | tojson as $v | $p | map("\"\(select(. != "properties"))\"") | join(".") as $k | "\($k) = \($v)"' cli/src/config-schema.json | taplo check --schema=file://$PWD/cli/src/config-schema.json -
```
which uses `jq` to filter the default values from the schema definition
to create a rudimentary TOML file containing all the defaults according
to the schema and then uses `taplo` the validate this TOML against the
schema.

This approach could be developed further to also parse the intermediate
TOML file and compare the result with the default config (from parsing
an empty config). That would not only test for self-consistency of the
schema's proclaimed defaults but also for consistency with the actual
defaults as assumed by jj.
While `ui.pager` can be a string which will be tokenized on whitespace,
and argument token array, or a command/env table, the `command` key
within that table currently must be an array. The schema previously
explicitly also allowed it to be a string but that does not actually
work, as exemplified by running:
```sh
$ jj --config-file cli/tests/sample-configs/invalid/ui.pager_command-env_string.toml config list
Config error: Invalid type or value for ui.pager
Caused by: data did not match any variant of untagged enum CommandNameAndArgs
```

`CommandNameAndArgs` should potentially be changed to allow strings.
For now, the schema has been updated to reflect the status quo. A new
sample toml has been added to the `invalid` directory to cover this;
prior to updating the schema, this new test case failed. Once the
behavior is changed to allow string, the file merely needs to be moved
to `valid`.
Anytime an external tool is referenced in the config, the command can be
provided as a string or as a token array. In the latter case, the array
must not be empty; at least the command name must be provided.

The schema didn't previously object to an empty array, though; this has
now been rectified. I've added more sample configs to cover this case.
Those same configs can also be used to illustrate that this is indeed
jj's current behavior:

$ jj --config-file cli/tests/sample-configs/invalid/ui.pager_empty_array.toml show
Config error: Invalid type or value for ui.pager
Caused by: data did not match any variant of untagged enum CommandNameAndArgs
                                                                                                                                                                                  
$ jj --config-file cli/tests/sample-configs/invalid/ui.pager.command_empty_array.toml show
Config error: Invalid type or value for ui.pager
Caused by: data did not match any variant of untagged enum CommandNameAndArgs
                                                                                                                                                                                  
$ jj --config-file cli/tests/sample-configs/invalid/ui.editor_empty_array.toml config edit --user
Config error: Invalid type or value for ui.editor
Caused by: data did not match any variant of untagged enum CommandNameAndArgs
                                                                                                                                                                                  
$ jj --config-file cli/tests/sample-configs/invalid/ui.diff-editor_empty_array.toml split
Error: Failed to load tool configuration
Caused by:
1: Invalid type or value for ui.diff-editor
2: data did not match any variant of untagged enum CommandNameAndArgs
                                                                                                                                                                                  
$ jj --config-file cli/tests/sample-configs/invalid/ui.merge-editor_empty_array.toml resolve
Error: Failed to load tool configuration
Caused by:
1: Invalid type or value for ui.merge-editor
2: data did not match any variant of untagged enum CommandNameAndArgs
                                                                                                                                                                                  
$ jj --config-file cli/tests/sample-configs/invalid/ui.diff.tool_empty_array.toml diff
Config error: Invalid type or value for ui.diff.tool
Caused by: data did not match any variant of untagged enum CommandNameAndArgs
                                                                                                                                                                                  
$ jj --config-file cli/tests/sample-configs/invalid/fix.tools.command_empty_array.toml fix
Config error: Invalid type or value for fix.tools.black
Caused by: data did not match any variant of untagged enum CommandNameAndArgs
in `command`

As a notable exception, `ui.default-command` *is* allowed to be an empty
array. In that case, `jj` will print a usage message. This is also
covered by a valid sample config.
…ol config

Not sure if that is intentional or should rather be considered a bug,
but currently the "structured" option for specifying an external tool
requires that both the command "command" and the "env" keys are
specified. The "command" key makes sense; for the "env" key it came as
a surprise, but it can be argued that this form should only be used when
environment variables need to be specified and the plain string or array
form should be used otherwise.

Either way, the schema did not accurately reflect the current behavior;
now it does. Two sample configs have been added as schema test cases.
The `CommandNameAndArgs` struct is used in multiple places to specify
external tools. Previously, the schema only allowed for this in
`ui.pager`.

This commit adds a few sample configs which define variables editors and
fix tools as commands with env vars.

The schema has also been updated to make these valid.
A pattern has emerged where a integration tests check for the
availability of an external tool (`git`, `taplo`, `gpg`, ...) and skip
the test (by simply passing it) when it is not available. To check this,
the program is run with the `--version` flag.

Some tests require that the program be available at least when running
in CI, by calling `ensure_running_outside_ci` conditionally on the
outcome. The decision is up to each test, though, the utility merely
returns a `bool`.
@martinvonz
Copy link
Member

I'm happy to trust @steadmon's review of this but we're both googlers, so I'll let another maintainer approve it instead. Maybe @thoughtpolice?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants