
Debug TiDB Cloud Documentation: Import Sample Data to TiDB Cloud #15740

Open
Tracked by #15480
qiancai opened this issue Dec 16, 2023 · 10 comments
Labels
2024-tidb-docs-dash This issue or PR is included in the 2024 TiDB Docs Dash event. tidb-docs-dash-bonus Indicates that the issue or PR with bonus points

Comments

@qiancai
Collaborator

qiancai commented Dec 16, 2023

This issue is a sub-issue of Debug TiDB Cloud Documentation: Summary Issue (pingcap/docs#15480). The purpose of this sub-issue is to verify and debug the Import Sample Data to TiDB Cloud document.

You can follow the instructions provided in #15480 to verify and debug the instructions in this document.

  1. After finishing your verification, please add your verification result to this sub-issue as a comment. The result can be the issues you encounter, the mistakes you find, or any other findings. If everything looks fine, you can also say so in a comment.
  2. For any issues you find during the verification, you are welcome to create a pull request (PR) to fix them directly. In the PR description, please indicate which issue the PR resolves (for example, fix #15740). To learn how to create a pull request, see the TiDB Documentation Contributing Guide.

Note: Currently, the TiDB Cloud documentation is available in English only and is stored in the release-7.5 branch of pingcap/docs so that it can reuse the TiDB SQL documentation. Hence, to create a pull request for the TiDB Cloud documentation, make sure that your PR is based on the release-7.5 branch.

Your contribution to testing and verifying the documentation is highly appreciated!

@minaelee

minaelee commented Jan 9, 2024

/assign

@minaelee

minaelee commented Jan 10, 2024

General notes:
My main impression is that this document seems to have an identity crisis - is it a tutorial, or is it a reference?

In some places it seems to be a tutorial, such as where it instructs the reader to use the sample data, the sample bucket URI, and so on. It offers a hands-on experience to achieve a preset goal without trying to explore every possible option.

In other places, it seems to be a reference that provides details about every aspect of a topic, such as where it covers importing into pre-created tables or importing from Amazon S3/GCS, even though the sample data comes only from AWS.

I would strongly consider splitting this into two documents:

  1. Make the original document, 'Import Sample Data (SQL File)', into a reference. Like the other documents in this group whose titles start with "Import...", it should consistently give all the details for each option instead of explaining some and skipping others. I would rename this file 'Import SQL from Amazon S3 or GCS' to match the other documents in its group, which are named 'Import CSV File from Amazon S3 or GCS' and 'Import Apache Parquet Files from Amazon S3 or GCS' in the navigation sidebar.

  2. A tutorial, explicitly referred to as a tutorial, titled 'Try Out SQL Import' (along the lines of the more general 'Try Out' guides in the 'Getting Started' section), added as a subdocument under 'Import SQL from Amazon S3 or GCS'. Perhaps more tutorials could be added for the other import options as well; in that case, I would advise moving the tutorials into their own dedicated section under the 'Import Data' section.

If this document is not split into two, then I have additional comments regarding making the single document more consistently one way or the other.

@minaelee

In Step 2, this block of text seems unnecessary:

Data format: select SQL File. TiDB Cloud supports importing compressed files in the following formats: .gzip, .gz, .zstd, .zst and .snappy. If you want to import compressed SQL files, name the files in the ${db_name}.${table_name}.${suffix}.sql.${compress} format, in which ${suffix} is optional and can be any integer such as '000001'. For example, if you want to import the trips.000001.sql.gz file to the bikeshare.trips table, you can rename the file as bikeshare.trips.000001.sql.gz. Note that you only need to compress the data files, not the database or table schema files. The Snappy compressed file must be in the official Snappy format. Other variants of Snappy compression are not supported.

The information is repeated in the import UI, as well as on the Naming Conventions for Data Import page that is linked from the UI. That makes at least four places with the same information, meaning four separate places that must be updated whenever it changes.

I suggest removing the block entirely (everything after "Data format: select SQL File") and, optionally, leaving a link to the Naming Conventions for Data Import page instead, for example:

Data format: select SQL File. For information about naming conventions, see Naming conventions for data import.

@minaelee

At the end of Part 2:

If the region of the bucket is different from your cluster, confirm the compliance of cross region. Click Next.

The "Click Next" directive is confusing for readers whose bucket is in the same region as their cluster, because they will not see a Next button to click. It confused me until I realized that it was not a standalone instruction but was connected to the previous one. I suggest connecting the two sentences: "If the region of the bucket is different from your cluster, confirm the compliance of the cross region, then click Next."

@minaelee

minaelee commented Jan 10, 2024

In Step 3, it should be clearer to the user that, if they are using the sample data, they should choose the import from S3 option. This is another example of where the document does not know whether it is a tutorial or a reference. Here, it acts like a reference, saying:

You can choose to import into pre-created tables, or import schema and data from the source.

It then gives detailed information about each option, but gives no explicit instruction about which one to choose to someone who is following along with the sample data.

Also in Step 3:

When the data import progress shows Completed, you have successfully imported the sample data and the database schema to your database in TiDB Cloud.
Once the cluster finishes the data importing process, you will get the sample data in your database.

Additional direction would be useful here: tell the user that, after the Import Task window shows that the task is completed, they can click the "Explore your data by Chat2Query" button to run test queries in the terminal.

@rpaik
Member

rpaik commented Jan 10, 2024

@minaelee thanks for your contributions!

@dveeden
Contributor

dveeden commented Jan 10, 2024

I agree that splitting this into a reference and tutorial would be good.

@okJiang
Member

okJiang commented Jan 10, 2024

My main impression is that this document seems to have an identity crisis - is it a tutorial, or is it a reference?

I agree. In fact, this problem also exists in a large number of other documents.

You have done a great job, and all of the suggestions are very valuable and practical. Thank you very much. As a developer working on related features, I support all of your suggestions.👍 @minaelee

@Frank945946
Contributor

Thank you for your suggestions. They are very specific and targeted. We will make improvements to both the UI and the documentation based on your feedback. @minaelee

@hfxsd
Collaborator

hfxsd commented Jan 10, 2024

Hi @minaelee, thank you sincerely for your valuable feedback! We are truly impressed by your technical writing expertise.

I wholeheartedly agree with all of your insightful suggestions. Would you be so kind as to create a pull request (PR) to update the documentation as per your recommendations?

Your contributions are greatly appreciated, and we look forward to implementing your enhancements.

@hfxsd hfxsd assigned hfxsd and unassigned hfxsd Jan 22, 2024
@rpaik rpaik added the tidb-docs-dash-bonus Indicates that the issue or PR with bonus points label Jan 25, 2024
Development

No branches or pull requests

7 participants