Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: DocSlt, documentation examples as sqllogictest tests #5117

Closed
stdrc opened this issue Sep 5, 2022 · 17 comments
Closed

RFC: DocSlt, documentation examples as sqllogictest tests #5117

stdrc opened this issue Sep 5, 2022 · 17 comments
Labels

Comments

@stdrc
Copy link
Member

stdrc commented Sep 5, 2022

Background

Currently we write e2e tests in the form of sqllogictest scripts, and we have a dedicated e2e_test folder in the source tree for this. However, sometimes we may want to write some SQL examples directly in comments near the code to document the behavior of code, so that further modification won't break the tests easily.

Such cases include:

/// Concatenates two arrays with same data type.
/// The behavior is the same as PG.
///
/// Examples:
/// - `select array_cat(array[66], array[123]);` => `[66,123]`
/// - `select array_cat(array[66], null::int[]);` => `[66]`
/// - `select array_cat(null::int[], array[123]);` => `[123]`
fn concat_array(left: DatumRef, right: DatumRef) -> Datum {

https://github.com/risingwavelabs/risingwave/blob/5f8fd83064691677ea91d18c1a960d920c2f3a07/src/expr/src/vector_op/agg/functions.rs#L128-L133

https://github.com/risingwavelabs/risingwave/blob/5f8fd83064691677ea91d18c1a960d920c2f3a07/src/frontend/src/binder/query.rs#L48-L78

Design

Like rustdoc's documentation tests, this RFC introduces DocSlt, to allow running slt tests in doc comments.

Developers can write examples in slt syntax like below:

/// Concatenates two arrays with same data type.
/// The behavior is the same as PG.
///
/// Examples:
///
/// ```slt
/// query T
/// select array_cat(array[66], array[123]);
/// ----
/// {66,123}
///
/// query T
/// select array_cat(array[66], null::int[]);
/// ----
/// {66}
///
/// query T
/// select array_cat(null::int[], array[123]);
/// ----
/// {123}
/// ```

Then use the following commands to generate slt files from such documentation and run those slt tests:

./risedev docslt -p risingwave_expr # extract SQL examples in slt syntax from `risingwave_expr`
./risedev slt-generated -p 4566 -d dev
# or ./risedev slt -p 4566 -d dev 'e2e_test/generated/**/*.slt'

The generated slt files should be added into source version control, just like the yaml files of planner tests.

The generation is in granularity of file, which means all the docslt blocks in one source file are placed in one slt file.

You can use slt-setup and slt-teardown blocks to control the setup and teardown process, like this:

//! To setup and teardown docslt environment, you can use `slt-setup`
//! and `slt-teardown` code blocks:
//!
//! ```slt-setup
//! statement ok
//! create table t (a int);
//! ```
//!
//! ```slt-teardown
//! statement ok
//! drop table t;
//! ```

All setup blocks in one source file will be added at the beginning of generated slt file. Similarly, all teardown blocks will be added at the end.

I've already implemented a demo for this RFC on branch rc/docslt, you can see the changes here: main...rc/docslt.

Please feel free to give any comments!

Future Optimizations

No response

Discussions

  1. Do we need a simplified DSL to write such tests instead of using full slt syntax?
  2. Any better approaches to achieve setup and teardown?
  3. Is there any problem to generate slt files in source file granularity?

Q&A

No response

@github-actions github-actions bot added this to the release-0.1.13 milestone Sep 5, 2022
@stdrc stdrc added the component/test Test related issue. label Sep 5, 2022
@stdrc stdrc changed the title RFC: **docslt**, documentation examples as sqllogictest tests RFC: DocSlt, documentation examples as sqllogictest tests Sep 5, 2022
@xxchan
Copy link
Member

xxchan commented Sep 5, 2022

I propose a simple alternative: just say // Refer to e2e_test/streaming/array_agg.slt for examples 😄

IMO what you want to do is to introduce an additional tool to keep doc example in sync with e2e tests. My point is that e2e tests + references when needed is already good enough, considering:

  • There are not many SQL examples in docs.
  • Some of them are very simple. Some of them are for illustration only (e.g., those on is_correlated) instead of precise ones which can be tested.
  • The docs are developer-oriented, instead of user-oriented

@neverchanje
Copy link
Contributor

Agree with xxchan. I feel it is a cool idea, but the ROI is not high enough given its engineering complexity.

@StrikeW
Copy link
Contributor

StrikeW commented Sep 6, 2022

I think it is not a good idea to maintain test cases in code comments. Those examples in the comment are just for illustration purposes. And to make DocSlt cases in sync with the latest version code you need to add those DocSlt cases to CI pipeline too.

@stdrc
Copy link
Member Author

stdrc commented Sep 6, 2022

  • There are not many SQL examples in docs.

Agree with this, so the ROI is indeed not high enough, like Tao said.

  • Some of them are very simple. Some of them are for illustration only (e.g., those on is_correlated) instead of precise ones which can be tested.

Actually what I want is exactly to turn these examples for illustration into e2e tests, not the reverse. I'm not encouraging people to write e2e tests in doc comments, just like rustdoc doesn't encourage people to write unittests or integration tests in doctest.

So I was thinking about introducing a simplified DSL for such tests, like:

=> create table t (a int);
=> insert into t values (1);
=> select * from t;
1

so the tests won't be as verbose as slt tests.

My point is that, we can have a more convenient way to document a function that handles some specific SQL cases, not to test the function. And since we can document it with SQL examples, why not execute it to force the correctness? I think the motivation is the same as Rust doctests.

  • The docs are developer-oriented, instead of user-oriented

Developer experience matters! Yes, a ref link to e2e tests is good, but it's apparently not that convenient. E2e tests are intended to test all possible cases, including corner cases and complex ones, however IMO they are too verbose to serve as examples. OTOH, they're also too distant from the code, devs may not be aware of that they break the e2e tests until the code modification is completed (think of small refactoring that seems do no harm).

@stdrc
Copy link
Member Author

stdrc commented Sep 6, 2022

IMO what you want to do is to introduce an additional tool to keep doc example in sync with e2e tests.

This is not my purpose. I want to make SQL examples in documentation tested, not to sync these examples with e2e tests. Converting the examples to e2e tests is the method, not the purpose.

@TennyZhuang
Copy link
Contributor

Agree with xxchan. I feel it is a cool idea, but the ROI is not high enough given its engineering complexity.

I'm sure its engineering complexity is very low. I can even finish it in one day :)

@TennyZhuang
Copy link
Contributor

Weak +1, I think it's a good idea just like I think rust doctest is very cool, and we've found a way to finish it with only a few effort.

The SQL examples in comments are very WYSIWYG, and I think it's more clear than some long nerual language comments.

@TennyZhuang
Copy link
Contributor

I propose a simple alternative: just say // Refer to e2e_test/streaming/array_agg.slt for examples

We can't refer to a specified statements in this way (line number will be changed quickly and unmaintainable). We may have to create too many slt files if we use this way.

@TennyZhuang
Copy link
Contributor

The engineering complexity is very low, in brief:

  1. Extract all members in workspace by cargo metadata
  2. Extract every crate's doccomments by cargo rustdoc --output-format json
  3. Filter the doc comments by language (e.g. only accept docslt)
  4. Generate some slt files and just run them using sqllogictest binary.

@TennyZhuang
Copy link
Contributor

I think if we always think things in the ROI way, we even can't get risedev :)

@stdrc
Copy link
Member Author

stdrc commented Sep 6, 2022

The engineering complexity is very low

Yep, I already finished most parts of the tool last night with less than 200 LoC.

@TennyZhuang TennyZhuang removed this from the release-0.1.13 milestone Sep 6, 2022
@xxchan
Copy link
Member

xxchan commented Sep 6, 2022

I think if we always think things in the ROI way, we even can't get risedev :)

Request @skyzh for comments 😄

I propose a simple alternative: just say // Refer to e2e_test/streaming/array_agg.slt for examples

We can't refer to a specified statements in this way (line number will be changed quickly and unmaintainable). We may have to create too many slt files if we use this way.

Too many slt files isn't bad IMO. Or what about considering subtest name label in slt? risinglightdb/sqllogictest-rs#81 (comment) 😄

@skyzh
Copy link
Contributor

skyzh commented Sep 6, 2022

Most e2e test now needs complex set-ups (like a few session variables). If we add more such variables in the future, I guess it would be hard to find and modify each single case in rustdoc.

I don't have a clear answer for now whether we need docslt. It would be good to have a try and decide whether to migrate later. Some measures could be taken to reduce the complexity of docslt. e.g., we only allow testing simple projections / computation in slt.

@xxchan
Copy link
Member

xxchan commented Sep 6, 2022

we only allow testing simple projections / computation in slt.

They (implementations of functions) are just the most useful use cases of docslt I can imagine. 🤔

@xxchan
Copy link
Member

xxchan commented Sep 6, 2022

why not execute it to force the correctness? I think the motivation is the same as Rust doctests.

  • The docs are developer-oriented, instead of user-oriented

Developer experience matters!

(Off topic) Fun fact: only @BugenZhao @ZENOTME @wangrunji0408 and me has written doctests, and most of then are ignored 😄

@BugenZhao
Copy link
Member

My doubt is that, unlike the doc test of Rust, we'll have no intuitive way to guarantee that the docslt really covers the below codes. For example, some changes on the optimizer make the same SQL never be dispatched to the original expression anymore, but we can't check it out immediately.

@stdrc
Copy link
Member Author

stdrc commented Sep 14, 2022

Since #5246 is merged, you can now write SQL examples in sqllogictest syntax in doc comments. The tests will be run on CI to force the correctness. Looking forward to bug reports and improvement suggestions. I also added some documentation for this tool in developer guide, feel free to check it out.😋

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

8 participants