[Proposal] Ensure consistency between code and documentation. #9173

Closed
ongchi opened this issue Feb 9, 2024 · 4 comments
Labels
documentation Improvements or additions to documentation enhancement New feature or request

Comments

@ongchi
Contributor

ongchi commented Feb 9, 2024

Is your feature request related to a problem or challenge?

There are currently two documentation hosting targets for DataFusion: docs.rs (several crates) and arrow.apache.org.

  1. docs.rs - All crates published to crates.io are automatically documented on docs.rs. Doc tests can also serve as working examples and ensure the correctness of documentation.
  2. User Guide - The documentation hosted on arrow.apache.org is generated by Sphinx from manually maintained sources. It's a great resource for getting an overview of the project and understanding how DataFusion works and how it is generally used.

Parts of the documentation could be shared between the two. For example, built-in functions should also be listed in the Expression API section of the user guide. However, each target maintains its own documentation source separately, and the two are not fully consistent with each other.

Describe the solution you'd like

Merge the relevant parts of the Sphinx source into Rust doc comments.
Then extract the doc comments from rustdoc's JSON output (rfcs#2963 - nightly toolchain required) and generate Markdown files from them. Finally, include these files in Sphinx by doctree.

A utility to generate markdown files from doc comments is required. It should not take much effort by utilizing rustdoc-json and rustdoc-types.
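
As a rough illustration of the Markdown-generation step only, here is a minimal sketch using just the standard library. `DocItem` is a hypothetical stand-in for whatever the extractor would read out of rustdoc's JSON (it is not a type from rustdoc-types), and `render_markdown`, the heading layout, and the placeholder doc text are all assumptions made for the example.

```rust
/// Hypothetical stand-in for one item read from rustdoc's JSON output.
/// Only the fields this sketch needs are modeled.
struct DocItem {
    name: String,
    kind: String,
    /// `None` when the item has no doc comment.
    docs: Option<String>,
}

/// Render one Markdown section per documented item under a module heading.
fn render_markdown(module_path: &str, items: &[DocItem]) -> String {
    let mut out = format!("# `{}`\n", module_path);
    for item in items {
        // Skip items without doc comments; there is nothing to publish.
        if let Some(docs) = &item.docs {
            out.push_str(&format!(
                "\n## `{}` ({})\n\n{}\n",
                item.name,
                item.kind,
                docs.trim()
            ));
        }
    }
    out
}

fn main() {
    // Placeholder items; real input would come from the rustdoc JSON file.
    let items = vec![
        DocItem {
            name: "example_fn".to_string(),
            kind: "function".to_string(),
            docs: Some("Placeholder doc comment.".to_string()),
        },
        DocItem {
            name: "undocumented_fn".to_string(),
            kind: "function".to_string(),
            docs: None,
        },
    ];
    print!("{}", render_markdown("datafusion_expr::expr_fn", &items));
}
```

The generated text could then be written to a `.md` file that Sphinx picks up; parsing the JSON itself is left out here on purpose.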

Describe alternatives you've considered

Create a doc folder shared between the Rust and Sphinx sources, and merge the relevant parts into it. Then include the external files via the doc attribute in Rust, or via doctree in Sphinx.

It is really annoying to have to find the right file when writing docs during development, and I don't want to do that.🤪

Additional context

No response

@alamb
Contributor

alamb commented Feb 9, 2024

Thank you @ongchi -- I think this is a really great idea. Thank you

Here is a similar issue from @andygrove #7951 that also has a PR #7956

As a pragmatic matter, here is my suggestion for incrementally implementing this without having to make a massive PR that will be a conflict magnet.

First, we accept that there will be a period of time where we have a split set of documentation (one auto-generated, and one static).

Then, build the new automatic documentation system you describe based on the functions in https://github.com/apache/arrow-datafusion/tree/main/datafusion/functions (it is a subset at the moment). That way we can get the tooling and pattern sorted out. Then, as we migrate the rest of the functions over, we can migrate the documentation as well.

@ongchi
Contributor Author

ongchi commented Feb 13, 2024

Here is a quick (and dirty) proof of concept of the doc comments extractor.

# Under arrow-datafusion project folder
comment-extract \
    --package "datafusion-expr" \
    --module-path "datafusion_expr::expr_fn" \
    --kind function

Example output looks like this:
https://gist.github.com/ongchi/ad5b256ddcf0dc5560e910e765e2c225

@alamb
Contributor

alamb commented Feb 14, 2024

I took a quick look and https://gist.github.com/ongchi/ad5b256ddcf0dc5560e910e765e2c225 looks 👌 very nice -- thanks @ongchi

@alamb
Contributor

alamb commented Aug 20, 2024

I believe we now achieve this goal using the doctest! macro -- for example

// Instructions for Documentation Examples
//
// The following commands test the examples from the user guide as part of
// `cargo test --doc`
//
// # Adding new tests:
//
// Simply add code like this to your .md file and ensure your md file is
// included in the lists below.
//
// ```rust
// <code here will be tested>
// ```
//
// Note that sometimes it helps to author the doctest as a standalone program
// first, and then copy it into the user guide.
//
// # Debugging Test Failures
//
// Unfortunately, the line numbers reported by doctest do not correspond to the
// line numbers in the .md files. Thus, if a doctest fails, use the name of
// the test to find the relevant file in the list below, and then find the
// example in that file to fix.
//
// For example, if `user_guide_expressions(line 123)` fails,
// go to `docs/source/user-guide/expressions.md` to find the relevant problem.
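
For readers unfamiliar with how a Markdown file ends up being run by `cargo test --doc`, here is a minimal sketch of the idea. It swaps in the standard `#[doc = include_str!(...)]` attribute instead of the `doctest!` macro mentioned above, so that it needs no external crate. The relative path and the struct name are hypothetical, and test names produced this way will not follow the `user_guide_expressions` pattern from the comment.

```rust
// Assumption: this would live in a crate's `src/lib.rs`, and the path below
// (hypothetical) points at a Markdown file that contains ```rust blocks.
//
// `cfg(doctest)` is only set while rustdoc collects doctests, so ordinary
// builds drop this item before the file is ever read.
#[cfg(doctest)]
#[doc = include_str!("../docs/source/user-guide/expressions.md")]
pub struct UserGuideExpressions; // hypothetical name, exists only to carry the docs

fn main() {
    // In a normal build the item above is compiled out.
    println!("doctest cfg active: {}", cfg!(doctest));
}
```

Every ```rust block in the included file then becomes a doctest attached to the struct, which is also why reported line numbers do not line up with the `.md` file.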

@alamb alamb closed this as completed Aug 20, 2024
@alamb alamb added the documentation Improvements or additions to documentation label Aug 20, 2024
2 participants