- If your TiDB cluster is hosted by AWS (the Dev Tier is hosted by AWS by default), fill in the following parameters:
+ If your TiDB cluster is hosted by AWS (the Developer Tier is hosted by AWS by default), fill in the following parameters:
- **Data Source Type**: `AWS S3`.
- **Bucket URL**: enter the sample data URL `s3://tidbcloud-samples/data-ingestion/`.
@@ -151,7 +156,13 @@ We provide Capital Bikeshare sample data for you to easily import data and run s
4. Click **Import**.
- The data import process will take 5 to 10 minutes. When the data import progress bar shows **Success**, you successfully import the sample data and the database schema in your database.
+ A warning message about the database resource consumption is displayed. For a newly created cluster, you can ignore the warning message.
+
+5. Click **Confirm**.
+
+ TiDB Cloud starts validating whether it can access the sample data in the specified bucket URL. After the validation is completed and successful, the import task starts automatically.
+
+The data import process will take 5 to 10 minutes. When the data import progress bar shows **Success**, you successfully import the sample data and the database schema in your database.
## Step 4. Query data
diff --git a/tidb-cloud/tidb-cloud-sql-tuning-overview.md b/tidb-cloud/tidb-cloud-sql-tuning-overview.md
new file mode 100644
index 0000000000000..e2a1f29063661
--- /dev/null
+++ b/tidb-cloud/tidb-cloud-sql-tuning-overview.md
@@ -0,0 +1,117 @@
+---
+title: SQL Tuning Overview
+summary: Learn about how to tune SQL performance in TiDB Cloud.
+---
+
+# SQL Tuning Overview
+
+This document introduces how to tune SQL performance in TiDB Cloud. To get the best SQL performance, you can do the following:
+
+- Tune SQL performance. There are many ways to optimize SQL performance, such as analyzing query statements, optimizing execution plans, and optimizing full table scans.
+- Optimize schema design. Depending on your business workload type, you may need to optimize the schemas to avoid transaction conflicts or hotspots.
+
+## Tune SQL performance
+
+To improve the performance of SQL statements, consider the following principles.
+
+- Minimize the scope of the scanned data. It is a best practice to scan only the minimum scope of data and avoid scanning all the data.
+- Use appropriate indexes. For each column in the `WHERE` clause of a SQL statement, make sure that there is a corresponding index. Otherwise, the query scans the full table and results in poor performance. See the sketch after this list.
+- Use appropriate Join types. Depending on the size and correlation of each table in the query, it is very important to choose the right Join type. Generally, the cost-based optimizer in TiDB automatically chooses the optimal Join type. However, in some cases, you may need to specify the Join type manually. For details, see [Explain Statements That Use Joins](/explain-joins.md).
+- Use appropriate storage engines. It is recommended to use the TiFlash storage engine for Hybrid Transactional and Analytical Processing (HTAP) workloads. See [HTAP Queries](https://docs.pingcap.com/tidb/stable/dev-guide-hybrid-oltp-and-olap-queries).
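+
+For example, the following is a minimal sketch that applies the indexing principle to a hypothetical `orders` table; the table, column, and index names are only for illustration:
+
+```sql
+-- Without an index on `customer_id`, the following query scans the full table.
+CREATE TABLE orders (
+    id BIGINT PRIMARY KEY,
+    customer_id BIGINT,
+    amount DECIMAL(10, 2),
+    created_at DATETIME
+);
+
+-- Add an index on the column used in the `WHERE` clause.
+CREATE INDEX idx_orders_customer_id ON orders (customer_id);
+
+-- Verify that the optimizer now uses the index instead of a full table scan.
+EXPLAIN SELECT id, amount FROM orders WHERE customer_id = 1024;
+```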
+
+TiDB Cloud provides several tools to help you analyze slow queries on a cluster. The following sections describe several approaches to optimize slow queries.
+
+### Use Statement on the Diagnosis tab
+
+The TiDB Cloud console provides a **[Statement](/tidb-cloud/tune-performance.md#statement-analysis)** sub-tab on the **Diagnosis** tab. It collects the execution statistics of SQL statements of all databases on the cluster. You can use it to identify and analyze SQL statements that consume a long time in total or in a single execution.
+
+Note that on this sub-tab, SQL queries with the same structure (even if the query parameters are different) are grouped into the same SQL statement. For example, `SELECT * FROM employee WHERE id IN (1, 2, 3)` and `select * from EMPLOYEE where ID in (4, 5)` are both part of the same SQL statement `select * from employee where id in (...)`.
+
+You can view some key information in **Statement**.
+
+- SQL statement overview: including SQL digest, SQL template ID, the time range currently viewed, the number of execution plans, and the database where the execution takes place.
+- Execution plan list: if a SQL statement has more than one execution plan, the list is displayed. You can select different execution plans and the details of the selected execution plan are displayed at the bottom of the list. If there is only one execution plan, the list will not be displayed.
+- Execution plan details: shows the details of the selected execution plan. It collects the execution plans of such SQL type and the corresponding execution time from several perspectives to help you get more information. See [Execution plan in details](https://docs.pingcap.com/tidb/stable/dashboard-statement-details#statement-execution-details-of-tidb-dashboard) (area 3 in the image below).
+
+
+
+In addition to the information in the **Statement** dashboard, there are also some SQL best practices for TiDB Cloud as described in the following sections.
+
+### Check the execution plan
+
+You can use [`EXPLAIN`](/explain-overview.md) to check the execution plan calculated by TiDB for a statement during compiling. In other words, TiDB estimates hundreds or thousands of possible execution plans and selects an optimal execution plan that consumes the least resource and executes the fastest.
+
+If the execution plan selected by TiDB is not optimal, you can use `EXPLAIN` or [`EXPLAIN ANALYZE`](/sql-statements/sql-statement-explain-analyze.md) to diagnose it.
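+
+The following is a minimal sketch, assuming a hypothetical table `t` with a secondary index on column `b`:
+
+```sql
+-- `EXPLAIN` only shows the plan chosen by the optimizer; the statement is not executed.
+EXPLAIN SELECT * FROM t WHERE b = 100;
+
+-- `EXPLAIN ANALYZE` executes the statement and also reports the actual execution time
+-- and row counts of each operator, so you can see whether the estimation deviates
+-- from the actual data distribution.
+EXPLAIN ANALYZE SELECT * FROM t WHERE b = 100;
+```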
+
+### Optimize the execution plan
+
+After the original query text is parsed by the `parser` and basic validity checks are performed, TiDB first makes some logically equivalent changes to the query. For more information, see [SQL Logical Optimization](/sql-logical-optimization.md).
+
+Through these equivalence changes, the query can become easier to handle in the logical execution plan. After the equivalence changes, TiDB gets a query plan structure that is equivalent to the original query, and then gets a final execution plan based on the data distribution and the specific execution overhead of an operator. For more information, see [SQL Physical Optimization](/sql-physical-optimization.md).
+
+Also, TiDB can choose to enable the execution plan cache to reduce the creation overhead of the execution plan when executing the `PREPARE` statement, as introduced in [Prepared Execution Plan Cache](/sql-prepared-plan-cache.md).
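+
+The following is a minimal sketch of how a prepared statement is reused, assuming a hypothetical table `t`; when the plan cache is enabled, subsequent executions can reuse the cached plan:
+
+```sql
+-- Prepare the statement once. Its execution plan can be cached.
+PREPARE stmt FROM 'SELECT * FROM t WHERE id = ?';
+
+-- Execute the prepared statement repeatedly with different parameters.
+SET @id = 1;
+EXECUTE stmt USING @id;
+SET @id = 2;
+EXECUTE stmt USING @id;
+
+-- Release the prepared statement when it is no longer needed.
+DEALLOCATE PREPARE stmt;
+```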
+
+### Optimize full table scan
+
+The most common reason for slow SQL queries is that the `SELECT` statements perform a full table scan or use an incorrect index. You can use `EXPLAIN` or `EXPLAIN ANALYZE` to view the execution plan of a query and locate the cause of the slow execution. There are [three methods](https://docs.pingcap.com/tidb/stable/dev-guide-optimize-sql) that you can use to optimize such queries; a sketch follows the list.
+
+- Use secondary index
+- Use covering index
+- Use primary index
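+
+As an illustration of the covering index method, the following sketch uses a hypothetical `books` table; a covering index contains all the columns that the query reads, so TiDB can answer the query from the index alone:
+
+```sql
+-- The query reads only `title` and `price`, filtered by `type`.
+-- A covering index on (type, title, price) avoids reading the table rows.
+CREATE INDEX idx_books_type_title_price ON books (type, title, price);
+
+-- Check that the plan reads only the index (for example, an IndexReader operator)
+-- instead of performing a TableFullScan.
+EXPLAIN SELECT title, price FROM books WHERE type = 'novel';
+```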
+
+### DML best practices
+
+See [DML best practices](https://docs.pingcap.com/tidb/stable/dev-guide-optimize-sql-best-practices#dml-best-practices).
+
+### DDL best practices when selecting primary keys
+
+See [Guidelines to follow when selecting primary keys](https://docs.pingcap.com/tidb/stable/dev-guide-create-table#guidelines-to-follow-when-selecting-primary-key).
+
+### Index best practices
+
+[Best practices for indexing](https://docs.pingcap.com/tidb/stable/dev-guide-index-best-practice) include best practices for creating indexes and using indexes.
+
+The speed of creating indexes is conservative by default, and the index creation process can be accelerated by [modifying variables](https://docs.pingcap.com/tidb/stable/dev-guide-optimize-sql-best-practices#add-index-best-practices) in some scenarios.
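+
+For example, the following is a minimal sketch of accelerating index creation by adjusting the DDL re-organization variables; the exact values depend on your cluster resources and workload:
+
+```sql
+-- Increase the concurrency of the re-organization phase of index creation.
+SET @@global.tidb_ddl_reorg_worker_cnt = 8;
+
+-- Increase the batch size used when back-filling the index data.
+SET @@global.tidb_ddl_reorg_batch_size = 1024;
+```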
+
+
+
+## Optimize schema design
+
+If you still cannot get better performance based on SQL performance tuning, you may need to check your schema design and data read model to avoid transaction conflicts and hotspots.
+
+### Transaction conflicts
+
+For more information on how to locate and resolve transaction conflicts, see [Troubleshoot Lock Conflicts](https://docs.pingcap.com/tidb/stable/troubleshoot-lock-conflicts#troubleshoot-lock-conflicts).
+
+### Hotspot issues
+
+You can analyze hotspot issues using [Key Visualizer](/tidb-cloud/tune-performance.md#key-visualizer).
+
+You can use Key Visualizer to analyze the usage patterns of TiDB clusters and troubleshoot traffic hotspots. This page provides a visual representation of the TiDB cluster's traffic over time.
+
+You can observe the following information in Key Visualizer. You may need to understand some [basic concepts](https://docs.pingcap.com/tidb/stable/dashboard-key-visualizer#basic-concepts) first.
+
+- A large heat map that shows the overall traffic over time
+- The detailed information about a coordinate of the heat map
+- The identification information such as tables and indexes that is displayed on the left side
+
+In Key Visualizer, there are [four common heat map results](https://docs.pingcap.com/tidb/stable/dashboard-key-visualizer#common-heatmap-types).
+
+- Evenly distributed workload: desired result
+- Alternating brightness and darkness along the X-axis (time): need to check the resources at peak times
+- Alternating brightness and darkness along the Y-axis: need to check the degree of hotspot aggregation generated
+- Bright diagonal lines: need to check the business model
+
+In both cases of alternating brightness and darkness along the X-axis and the Y-axis, you need to address the read and write pressure.
+
+For more information about SQL performance optimization, see [SQL Optimization](https://docs.pingcap.com/tidb/stable/sql-faq#sql-optimization) in SQL FAQs.
diff --git a/tidb-cloud/tidb-cloud-tune-performance-overview.md b/tidb-cloud/tidb-cloud-tune-performance-overview.md
new file mode 100644
index 0000000000000..d44304f7f1804
--- /dev/null
+++ b/tidb-cloud/tidb-cloud-tune-performance-overview.md
@@ -0,0 +1,127 @@
+---
+title: Overview for Analyzing and Tuning Performance
+summary: Learn about how to analyze and tune SQL performance in TiDB Cloud.
+---
+
+# Overview for Analyzing and Tuning Performance
+
+This document describes steps to help you analyze and tune SQL performance in TiDB Cloud.
+
+## User response time
+
+User response time indicates how long an application takes to return the results of a request to users. As you can see from the following sequential timing diagram, the time of a typical user request contains the following:
+
+- The network latency between the user and the application
+- The processing time of the application
+- The network latency during the interaction between the application and the database
+- The service time of the database
+
+The user response time is affected by various subsystems on the request chain, such as network latency and bandwidth, the number and request types of concurrent users, and the CPU and I/O usage of the server. To optimize the entire system effectively, you need to first identify the bottlenecks in user response time.
+
+To get a total user response time within a specified time range (`ΔT`), you can use the following formula:
+
+Total user response time in `ΔT` = Average TPS (Transactions Per Second) x Average user response time x `ΔT`.
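+
+For example, assuming an average TPS of 1,000 and an average user response time of 10 ms, the total user response time in a 60-second window is 1,000 x 0.01 s x 60 = 600 s.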
+
+
+
+## Relationship between user response time and system throughput
+
+User response time consists of the service time, queuing delay, and coherency delay needed to complete a user request.
+
+```
+User Response time = Service time + Queuing delay + Coherency delay
+```
+
+- Service time: the time a system consumes on certain resources when processing a request, for example, the CPU time that a database consumes to complete a SQL request.
+- Queuing delay: the time a system waits in a queue for service of certain resources when processing a request.
+- Coherency delay: the time a system communicates and collaborates with other concurrent tasks, so that it can access shared resources when processing a request.
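+
+For example, if a request consumes 2 ms of CPU time in the database (service time), waits 1 ms in a queue for the CPU (queuing delay), and waits another 1 ms for a lock held by a concurrent transaction (coherency delay), the resulting response time on the database side is 4 ms.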
+
+System throughput indicates the number of requests that a system can complete per second. User response time and throughput are usually inversely related. When the throughput increases, the system resource utilization and the queuing latency for a requested service increase accordingly. Once resource utilization exceeds a certain inflection point, the queuing latency increases dramatically.
+
+For example, for a database system running OLTP workloads, after its CPU utilization exceeds 65%, the CPU queueing scheduling latency increases significantly. This is because concurrent requests of a system are not completely independent, which means that these requests can collaborate and compete for shared resources. For example, requests from different users might perform mutually exclusive locking operations on the same data. When the resource utilization increases, the queuing and scheduling latency increases too, which prevents shared resources from being released in time and in turn prolongs the wait for shared resources by other tasks.
+
+## Troubleshoot bottlenecks in user response time
+
+There are several pages on the TiDB Cloud console that help you troubleshoot user response time.
+
+- **Overview**: on this tab, you can view TiDB metrics such as total QPS, latency, connections, request QPS, request duration, storage size, CPU, IO Read, and IO Write.
+- **Diagnosis**:
+
+ - **Statement** enables you to directly observe SQL execution on the page, and easily locate performance problems without querying the system tables. You can click a SQL statement to further view the execution plan of the query for troubleshooting and analysis. For more information about SQL performance tuning, see [SQL Tuning Overview](/tidb-cloud/tidb-cloud-sql-tuning-overview.md).
+ - **Key Visualizer** helps you observe TiDB's data access patterns and data hotspots.
+
+If you require additional metrics, you can contact the [PingCAP support team](/tidb-cloud/tidb-cloud-support.md).
+
+If you experience latency and performance issues, refer to the steps in the following sections for analysis and troubleshooting.
+
+### Bottlenecks outside the TiDB cluster
+
+Observe the Latency (P80) value on the **Overview** tab. If this value is much lower than the P80 value of the user response time, you can determine that the main bottleneck might be outside the TiDB cluster. In this case, you can use the following steps to troubleshoot the bottleneck.
+
+1. Check the TiDB version on the left side of the [Overview tab](/tidb-cloud/monitor-tidb-cluster.md). If it is v6.0.0 or earlier, it is recommended to contact the [PingCAP support team](/tidb-cloud/tidb-cloud-support.md) to confirm whether the prepared plan cache, Raft-engine, and TiKV AsyncIO features can be enabled. Enabling these features, along with application-side tuning, can significantly improve throughput performance and reduce latency and resource utilization.
+2. If necessary, you can increase the TiDB token limit to increase the throughput.
+3. If the prepared plan cache feature is enabled, and you use JDBC on the user side, it is recommended to use the following configuration:
+
+ ```
+ useServerPrepStmts=true&cachePrepStmts=true&prepStmtCacheSize=1000&prepStmtCacheSqlLimit=20480&useConfigs=maxPerformance
+ ```
+
+ If you do not use JDBC and want to take full advantage of the prepared plan cache feature of the current TiDB cluster, you need to cache the prepared statement objects on the client side so that the calls to StmtPrepare and StmtClose are not repeated for every query. This reduces the number of commands sent for each query from three to one. It requires some development effort, depending on your performance requirements and the amount of client-side changes. You can consult the [PingCAP support team](/tidb-cloud/tidb-cloud-support.md) for help.
+
+### Bottlenecks in the TiDB cluster
+
+If you determine that the performance bottleneck is within a TiDB cluster, it is recommended that you do the following:
+
+- Optimize slow SQL queries.
+- Resolve hotspot issues.
+- Scale out the cluster to expand the capacity.
+
+#### Optimize slow SQL queries
+
+For more information about SQL performance tuning, see [SQL Tuning Overview](/tidb-cloud/tidb-cloud-sql-tuning-overview.md).
+
+#### Resolve hotspot issues
+
+You can view hotspot issues on the [Key Visualizer tab](/tidb-cloud/tune-performance.md#key-visualizer). The following screenshot shows a sample heat map. The horizontal coordinate of the map is the time, and the vertical coordinate is the table and index. Brighter color indicates higher traffic. You can toggle the display of read or write traffic in the toolbar.
+
+
+
+The following screenshot shows an example of a write hotspot. A bright diagonal line (diagonal up or diagonal down) appears in the write flow graph, and the write traffic appears only at the end of the line. It becomes a stepped pattern as the number of table Regions grows. It indicates that there is a write hotspot in the table. When a write hotspot occurs, you need to check whether you are using an auto-increment primary key, a table with no primary key, or a time-dependent `INSERT` statement or index.
+
+
+
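+If you confirm that a write hotspot is caused by an auto-increment primary key or by a table without a primary key, a common mitigation in TiDB is to scatter the writes across Regions. The following is a minimal sketch; the table and column names are only for illustration:
+
+```sql
+-- For a new table whose primary key does not need to be sequential,
+-- `AUTO_RANDOM` scatters the generated primary key values.
+CREATE TABLE orders (
+    id BIGINT AUTO_RANDOM,
+    created_at DATETIME,
+    PRIMARY KEY (id) CLUSTERED
+);
+
+-- For a table without a primary key (or with a non-clustered primary key),
+-- `SHARD_ROW_ID_BITS` scatters the implicit row IDs.
+CREATE TABLE app_logs (
+    msg VARCHAR(255),
+    created_at DATETIME
+) SHARD_ROW_ID_BITS = 4;
+```
+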
+A read hotspot is generally represented in the heat map as a bright horizontal line, usually caused by a small table that is queried frequently, as shown in the following screenshot.
+
+
+
+Hover over the highlighted block to see which table or index has high traffic, as shown in the following screenshot.
+
+
+
+#### Scale out
+
+On the cluster [Overview](/tidb-cloud/monitor-tidb-cluster.md) page, check the storage space, CPU utilization, and TiKV IO rate metrics. If any of them are reaching the upper limit for a long time, it is possible that the current cluster size cannot meet the business requirements. It is recommended to contact the [PingCAP support team](/tidb-cloud/tidb-cloud-support.md) to confirm if you need to scale out the cluster.
+
+#### Other issues
+
+If the previous methods cannot resolve the performance issue, you can contact the [PingCAP support team](/tidb-cloud/tidb-cloud-support.md) for help. It is recommended to provide the following information to speed up the troubleshooting process.
+
+- The cluster ID
+- The issue interval and a comparable normal interval
+- The problem phenomenon and expected behavior
+- The business workload characteristics, such as read or write ratios and primary behavior
+
+## Summary
+
+In general, you can use the following optimization methods to analyze and resolve performance issues.
+
+| Action | Effect |
+|:--|:--|
+| Prepared plan cache + JDBC | Throughput performance will be greatly improved, latency will be significantly reduced, and the average TiDB CPU utilization will be significantly reduced. |
+| Enable AsyncIO and Raft-engine in TiKV | There will be some improvement in throughput performance. You need to contact the [PingCAP support team](/tidb-cloud/tidb-cloud-support.md) to enable it. |
+| Clustered Index | Throughput performance will be greatly improved. |
+| Scale out TiDB nodes |Throughput performance will be greatly improved. |
+| Client-side optimization: splitting 1 JVM into 3 | Throughput performance improves significantly, and splitting further might continue to improve the throughput capacity. |
+| Limit the network latency between the application and the database | High network latency can lead to decreased throughput and increased latency. |
+
+In the future, TiDB Cloud will introduce more observable metrics and self-diagnostic services. They will provide you with a more comprehensive understanding of performance metrics and operational advice to improve your experience.
diff --git a/tidb-cloud/troubleshoot-import-access-denied-error.md b/tidb-cloud/troubleshoot-import-access-denied-error.md
new file mode 100644
index 0000000000000..cad734d9dad9c
--- /dev/null
+++ b/tidb-cloud/troubleshoot-import-access-denied-error.md
@@ -0,0 +1,172 @@
+---
+title: Troubleshoot Access Denied Errors during Data Import from Amazon S3
+summary: Learn how to troubleshoot access denied errors when importing data from Amazon S3 to TiDB Cloud.
+---
+
+# Troubleshoot Access Denied Errors during Data Import from Amazon S3
+
+This document describes how to troubleshoot access denied errors that might occur when you import data from Amazon S3 into TiDB Cloud.
+
+After you click **Import** on the **Data Import Task** page of the TiDB Cloud console and confirm the import process, TiDB Cloud starts validating whether it can access your data in your specified bucket URL. If you see an error message with the keyword `AccessDenied`, an access denied error has occurred.
+
+To troubleshoot the access denied errors, perform the following checks in the AWS Management Console.
+
+## Check the policy of the IAM role
+
+1. In the AWS Management Console, go to **IAM** > **Access Management** > **Roles**.
+2. In the list of roles, find and click the role you have created for the target TiDB cluster. The role summary page is displayed.
+3. In the **Permission policies** area of the role summary page, a list of policies is displayed. Take the following steps for each policy:
+ 1. Click the policy to enter the policy summary page.
+ 2. On the policy summary page, click the **{}JSON** tab to check the permission policy. Make sure that the `Resource` fields in the policy are correctly configured.
+
+The following is a sample policy.
+
+```
+{
+ "Version": "2012-10-17",
+ "Statement": [
+ {
+ "Sid": "VisualEditor0",
+ "Effect": "Allow",
+ "Action": [
+ "s3:GetObject",
+ "s3:GetObjectVersion"
+ ],
+ "Resource": "arn:aws:s3:::tidb-cloud-source-data/mydata/*"
+ },
+ {
+ "Sid": "VisualEditor1",
+ "Effect": "Allow",
+ "Action": [
+ "s3:ListBucket",
+ "s3:GetBucketLocation"
+ ],
+ "Resource": "arn:aws:s3:::tidb-cloud-source-data"
+ },
+ {
+ "Sid": "AllowKMSkey",
+ "Effect": "Allow",
+ "Action": [
+ "kms:Decrypt"
+ ],
+ "Resource": "arn:aws:kms:ap-northeast-1:105880447796:key/c3046e91-fdfc-4f3a-acff-00597dd3801f"
+ }
+ ]
+}
+```
+
+In this sample policy, pay attention to the following:
+
+- In `"arn:aws:s3:::tidb-cloud-source-data/mydata/*"`, `"arn:aws:s3:::tidb-cloud-source-data"` is a sample S3 bucket ARN, and `/mydata/*` is a directory that you can customize in your S3 bucket root level for data storage. The directory needs to end with `/*`, for example, `"
//*"`. If `/*` is not added, the `AccessDenied` error occurs.
+
+- If you have enabled AWS Key Management Service key (SSE-KMS) with customer-managed key encryption, make sure the following configuration is included in the policy. `"arn:aws:kms:ap-northeast-1:105880447796:key/c3046e91-fdfc-4f3a-acff-00597dd3801f"` is a sample KMS key of the bucket.
+
+ ```
+ {
+ "Sid": "AllowKMSkey",
+ "Effect": "Allow",
+ "Action": [
+ "kms:Decrypt"
+ ],
+ "Resource": "arn:aws:kms:ap-northeast-1:105880447796:key/c3046e91-fdfc-4f3a-acff-00597dd3801f"
+ }
+ ```
+
+ If the objects in your bucket have been copied from another encrypted bucket, the KMS key value needs to include the keys of both buckets. For example, `"Resource": ["arn:aws:kms:ap-northeast-1:105880447796:key/c3046e91-fdfc-4f3a-acff-00597dd3801f","arn:aws:kms:ap-northeast-1:495580073302:key/0d7926a7-6ecc-4bf7-a9c1-a38f0faec0cd"]`.
+
+If your policy is not correctly configured as the preceding example shows, correct the `Resource` fields in your policy and try importing data again.
+
+> **Tip:**
+>
+> If you have updated the permission policy multiple times and still get the `AccessDenied` error during data import, you can try to revoke active sessions. Go to **IAM** > **Access Management** > **Roles**, click your target role to enter the role summary page. On the role summary page, find **Revoke active sessions** and click the button to revoke active sessions. Then, retry the data import.
+>
+> Note that this might affect your other applications.
+
+## Check the bucket policy
+
+1. In the AWS Management Console, open the Amazon S3 console, and then go to the **Buckets** page. A list of buckets is displayed.
+2. In the list, find and click the target bucket. The bucket information page is displayed.
+3. Click the **Permissions** tab, and then scroll down to the **Bucket policy** area. By default, this area has no policy value. If this area contains a policy that denies access, the `AccessDenied` error might occur during data import.
+
+If you see such a deny policy, check whether the policy relates to the current data import. If yes, delete it from the area and retry the data import.
+
+## Check the trust entity
+
+1. In the AWS Management Console, go to **IAM** > **Access Management** > **Roles**.
+2. In the list of roles, find and click the role you have created for the target TiDB cluster. The role summary page is displayed.
+3. On the role summary page, click the **Trust relationships** tab, and you will see the trusted entities.
+
+The following is a sample trust entity:
+
+```
+{
+ "Version": "2012-10-17",
+ "Statement": [
+ {
+ "Effect": "Allow",
+ "Principal": {
+ "AWS": "arn:aws:iam::380838443567:root"
+ },
+ "Action": "sts:AssumeRole",
+ "Condition": {
+ "StringEquals": {
+ "sts:ExternalId": "696e6672612d617069a79c22fa5740944bf8bb32e4a0c4e3fe"
+ }
+ }
+ }
+ ]
+}
+```
+
+In the sample trust entity:
+
+- `380838443567` is the TiDB Cloud Account ID. Make sure that this field in your trust entity matches your TiDB Cloud Account ID.
+- `696e6672612d617069a79c22fa5740944bf8bb32e4a0c4e3fe` is the TiDB Cloud External ID. Make sure that this field in your trusted entity matches your TiDB Cloud External ID.
+
+## Check the Object Ownership
+
+1. In the AWS Management Console, open the Amazon S3 console, and then go to the **Buckets** page. A list of buckets is displayed.
+2. In the list of buckets, find and click the target bucket. The bucket information page is displayed.
+3. On the bucket information page, click the **Permissions** tab, and then scroll down to the **Object Ownership** area. Make sure that the "Object Ownership" configuration is "Bucket owner enforced".
+
+ If the configuration is not "Bucket owner enforced", the `AccessDenied` error occurs, because your account does not have enough permissions for all objects in this bucket.
+
+To handle the error, click **Edit** in the upper-right corner of the **Object Ownership** area and change the ownership to "Bucket owner enforced". Note that this might affect your other applications that are using this bucket.
+
+## Check your bucket encryption type
+
+There is more than one way to encrypt an S3 bucket. When you try to access the objects in a bucket, the role you have created must have the permission to access the encryption key for data decryption. Otherwise, the `AccessDenied` error occurs.
+
+To check the encryption type of your bucket, take the following steps:
+
+1. In the AWS Management Console, open the Amazon S3 console, and then go to the **Buckets** page. A list of buckets is displayed.
+2. In the list of buckets, find and click the target bucket. The bucket information page is displayed.
+3. On the bucket information page, click the **Properties** tab, scroll down to the **Default encryption** area, and then check the configurations in this area.
+
+There are two types of server-side encryption: Amazon S3-managed key (SSE-S3) and AWS Key Management Service (SSE-KMS). For SSE-S3, no further check is needed because this encryption type does not cause access denied errors. For SSE-KMS, you need to check the following:
+
+- If the AWS KMS key ARN in the area is displayed in black without an underline, the AWS KMS key is an AWS-managed key (aws/s3).
+- If the AWS KMS key ARN in the area is displayed in blue with a link, click the key ARN to open the key information page. Check the left navigation bar to see the specific encryption type. It might be an AWS managed key (aws/s3) or a customer managed key.
+
+
+**For the AWS managed key (aws/s3) in SSE-KMS**
+
+In this situation, if the `AccessDenied` error occurs, the reason might be that the key is read-only and cross-account permission grants are not allowed. See the AWS article [Why are cross-account users getting Access Denied errors when they try to access S3 objects encrypted by a custom AWS KMS key](https://aws.amazon.com/premiumsupport/knowledge-center/cross-account-access-denied-error-s3/) for details.
+
+To solve the access denied error, click **Edit** in the upper-right corner of the **Default encryption** area, and change the AWS KMS key to "Choose from your AWS KMS keys" or "Enter AWS KMS key ARN", or change the server-side encryption type to "AWS S3 Managed Key (SSE-S3)". In addition to this method, you can also create a new bucket and use a customer managed key or the SSE-S3 encryption method.
+
+
+
+**For the customer-managed key in SSE-KMS**
+
+To solve the `AccessDenied` error in this situation, click the key ARN or manually find the key in KMS. On the key information page, find the **Key users** area and click **Add** in the upper-right corner of the area to add the role you have used to import data to TiDB Cloud. Then, try importing data again.
+
+
+
+> **Note:**
+>
+> If the objects in your bucket have been copied from an existing encrypted bucket, you also need to include the key of the source bucket in the AWS KMS key ARN. This is because the objects in your bucket use the same encryption method as the source objects. For more information, see the AWS document [Using default encryption with replication](https://docs.aws.amazon.com/AmazonS3/latest/userguide/bucket-encryption.html).
+
+## Check the AWS article for instructions
+
+If you have performed all the checks above and still get the `AccessDenied` error, you can check the AWS article [How do I troubleshoot 403 Access Denied errors from Amazon S3](https://aws.amazon.com/premiumsupport/knowledge-center/s3-troubleshoot-403/) for more instructions.
diff --git a/tidb-cloud/use-htap-cluster.md b/tidb-cloud/use-htap-cluster.md
index 137b3528f0653..233c10e5b3268 100644
--- a/tidb-cloud/use-htap-cluster.md
+++ b/tidb-cloud/use-htap-cluster.md
@@ -5,15 +5,15 @@ summary: Learn how to use HTAP cluster in TiDB Cloud.
# Use an HTAP Cluster
-[HTAP](https://en.wikipedia.org/wiki/Hybrid_transactional/analytical_processing) means Hybrid Transactional/Analytical Processing. The HTAP cluster in TiDB Cloud is composed of [TiKV](https://tikv.org), a row-based storage engine designed for transactional processing, and [TiFlash](https://docs.pingcap.com/tidb/stable/tiflash-overview)beta, a columnar storage designed for analytical processing. Your application data is first stored in TiKV and then replicated to TiFlashbeta via the Raft consensus algorithm. So it is real time replication from the row store to the columnar store.
+[HTAP](https://en.wikipedia.org/wiki/Hybrid_transactional/analytical_processing) means Hybrid Transactional/Analytical Processing. The HTAP cluster in TiDB Cloud is composed of [TiKV](https://tikv.org), a row-based storage engine designed for transactional processing, and [TiFlash](https://docs.pingcap.com/tidb/stable/tiflash-overview), a columnar storage designed for analytical processing. Your application data is first stored in TiKV and then replicated to TiFlash via the Raft consensus algorithm. So it is real-time replication from the row store to the columnar store.
-With TiDB Cloud, you can create an HTAP cluster easily by specifying one or more TiFlashbeta nodes according to your HTAP workload. If the TiFlashbeta node count is not specified when you create the cluster or you want to add more TiFlashbeta nodes, you can change the node count by [scaling the cluster](/tidb-cloud/scale-tidb-cluster.md).
+With TiDB Cloud, you can create an HTAP cluster easily by specifying one or more TiFlash nodes according to your HTAP workload. If the TiFlash node count is not specified when you create the cluster or you want to add more TiFlash nodes, you can change the node count by [scaling the cluster](/tidb-cloud/scale-tidb-cluster.md).
> **Note:**
>
-> A Developer Tier cluster has one TiFlashbeta node by default and you cannot change the number.
+> A Developer Tier cluster has one TiFlash node by default and you cannot change the number.
-TiKV data is not replicated to TiFlashbeta by default. You can select which table to replicate to TiFlashbeta using the following SQL statement:
+TiKV data is not replicated to TiFlash by default. You can select which table to replicate to TiFlash using the following SQL statement:
{{< copyable "sql" >}}
@@ -21,7 +21,7 @@ TiKV data is not replicated to TiFlashbeta by default. You can select
ALTER TABLE table_name SET TIFLASH REPLICA 1;
```
-The number of replicas count must be smaller than the number of TiFlashbeta nodes. Setting the number of replicas to `0` means deleting the replica in TiFlashbeta.
+The number of replicas must be no larger than the number of TiFlash nodes. Setting the number of replicas to `0` means deleting the replica in TiFlash.
To check the replication progress, use the following command:
@@ -31,13 +31,13 @@ To check the replication progress, use the following command:
SELECT * FROM information_schema.tiflash_replica WHERE TABLE_SCHEMA = '' and TABLE_NAME = '';
```
-## Use TiDB to read TiFlashbeta replicas
+## Use TiDB to read TiFlash replicas
-After data is replicated to TiFlashbeta, you can use one of the following three ways to read TiFlashbeta replicas to accelerate your analytical computing.
+After data is replicated to TiFlash, you can use one of the following three ways to read TiFlash replicas to accelerate your analytical computing.
### Smart selection
-For tables with TiFlashbeta replicas, the TiDB optimizer automatically determines whether to use TiFlashbeta replicas based on the cost estimation. For example:
+For tables with TiFlash replicas, the TiDB optimizer automatically determines whether to use TiFlash replicas based on the cost estimation. For example:
{{< copyable "sql" >}}
@@ -55,7 +55,7 @@ explain analyze select count(*) from test.t;
+--------------------------+---------+---------+--------------+---------------+----------------------------------------------------------------------+--------------------------------+-----------+------+
```
-`cop[tiflash]` means that the task will be sent to TiFlashbeta for processing. If your queries have not selected a TiFlashbeta replica, try to update the statistics using the `analyze table` statement, and then check the result using the `explain analyze` statement.
+`cop[tiflash]` means that the task will be sent to TiFlash for processing. If your queries have not selected a TiFlash replica, try to update the statistics using the `analyze table` statement, and then check the result using the `explain analyze` statement.
### Engine isolation
@@ -77,4 +77,4 @@ Manual hint can force TiDB to use specified replicas for one or more specific ta
select /*+ read_from_storage(tiflash[table_name]) */ ... from table_name;
```
-To learn more about TiFlashbeta, refer to the documentation [here](https://docs.pingcap.com/tidb/stable/tiflash-overview/).
+To learn more about TiFlash, refer to the documentation [here](https://docs.pingcap.com/tidb/stable/tiflash-overview/).
diff --git a/tidb-lightning/tidb-lightning-backends.md b/tidb-lightning/tidb-lightning-backends.md
index ae64aa177abd3..756ce53154d75 100644
--- a/tidb-lightning/tidb-lightning-backends.md
+++ b/tidb-lightning/tidb-lightning-backends.md
@@ -1,5 +1,5 @@
---
-title: TiDB Lightning Import Mode
+title: TiDB Lightning Import Modes
summary: Learn how to choose different import modes of TiDB Lightning.
aliases: ['/docs/dev/tidb-lightning/tidb-lightning-tidb-backend/','/docs/dev/reference/tools/tidb-lightning/tidb-backend/','/tidb/dev/tidb-lightning-tidb-backend','/docs/dev/loader-overview/','/docs/dev/reference/tools/loader/','/docs/dev/load-misuse-handling/','/docs/dev/reference/tools/error-case-handling/load-misuse-handling/','/tidb/dev/load-misuse-handling','/tidb/dev/loader-overview/']
---
diff --git a/tidb-lightning/tidb-lightning-configuration.md b/tidb-lightning/tidb-lightning-configuration.md
index 98d1dccc8c29f..90799a83fb9a4 100644
--- a/tidb-lightning/tidb-lightning-configuration.md
+++ b/tidb-lightning/tidb-lightning-configuration.md
@@ -6,7 +6,7 @@ aliases: ['/docs/dev/tidb-lightning/tidb-lightning-configuration/','/docs/dev/re
# TiDB Lightning Configuration
-This document provides samples for global configuration, task configuration, and TiKV Importer configuration in TiDB Lightning, and describes the usage of command-line parameters.
+This document provides samples for global configuration and task configuration, and describes the usage of command-line parameters.
## Configuration files
@@ -116,8 +116,8 @@ driver = "file"
# keep-after-success = false
[tikv-importer]
-# "local": The default mode. It applies to large dataset import, for example, greater than 1 TiB. However, during the import, downstream TiDB is not available to provide services.
-# "tidb": You can use this mode for small dataset import, for example, smaller than 1 TiB. During the import, downstream TiDB is available to provide services.
+# "local": Physical import mode, used by default. It applies to large dataset import, for example, greater than 1 TiB. However, during the import, downstream TiDB is not available to provide services.
+# "tidb": Logical import mode. You can use this mode for small dataset import, for example, smaller than 1 TiB. During the import, downstream TiDB is available to provide services.
# backend = "local"
# Whether to allow importing data to tables with data. The default value is `false`.
# When you use parallel import mode, you must set it to `true`, because multiple TiDB Lightning instances are importing the same table at the same time.
@@ -125,12 +125,13 @@ driver = "file"
# The listening address of tikv-importer when backend is "importer". Change it to the actual address.
addr = "172.16.31.10:8287"
-# Action to do when trying to insert a duplicated entry in the "tidb" backend.
+# Action to do when trying to insert a duplicated entry in logical import mode.
# - replace: use new entry to replace the existing entry
# - ignore: keep the existing entry, and ignore the new entry
# - error: report error and quit the program
# on-duplicate = "replace"
-# Whether to detect and resolve duplicate records (unique key conflict) when the backend is 'local'.
+
+# Whether to detect and resolve duplicate records (unique key conflict) in physical import mode.
# The following resolution algorithms are supported:
# - record: only records duplicate records to the `lightning_task_info.conflict_error_v1` table on the target TiDB. Note that the
# required version of the target TiKV is no earlier than v5.2.0; otherwise it falls back to 'none'.
@@ -139,16 +140,19 @@ addr = "172.16.31.10:8287"
# - remove: records all duplicate records to the lightning_task_info database, like the 'record' algorithm. But it removes all duplicate records from the target table to ensure a consistent
# state in the target TiDB.
# duplicate-resolution = 'none'
-# The number of KV pairs sent in one request in the "local" backend.
+# The number of KV pairs sent in one request in physical import mode.
# send-kv-pairs = 32768
-# The directory of local KV sorting in the "local" backend. If the disk
+# The directory of local KV sorting in physical import mode. If the disk
# performance is low (such as in HDD), it is recommended to set the directory
# on a different disk from `data-source-dir` to improve import speed.
# sorted-kv-dir = ""
-# The concurrency that TiKV writes KV data in the "local" backend.
+# The concurrency that TiKV writes KV data in physical import mode.
# When the network transmission speed between TiDB Lightning and TiKV
# exceeds 10 Gigabit, you can increase this value accordingly.
# range-concurrency = 16
+# Limits the bandwidth in which TiDB Lightning writes data into each TiKV
+# node in physical import mode. 0 by default, which means no limit.
+# store-write-bwlimit = "128MiB"
[mydumper]
# Block size for file reading. Keep it longer than the longest string of the data source.
@@ -289,10 +293,12 @@ max-allowed-packet = 67_108_864
# Private key of this service. Default to copy of `security.key-path`
# key-path = "/path/to/lightning.key"
-# When data importing is complete, tidb-lightning can automatically perform
-# the Checksum, Compact and Analyze operations. It is recommended to leave
-# these as true in the production environment.
-# The execution order: Checksum -> Analyze
+# In physical import mode, when data importing is complete, tidb-lightning can
+# automatically perform the Checksum and Analyze operations. It is recommended
+# to leave these as true in the production environment.
+# The execution order: Checksum -> Analyze.
+# In logical import mode, Checksum and Analyze are not needed, and they are always
+# skipped in the actual operation.
[post-restore]
# Specifies whether to perform `ADMIN CHECKSUM TABLE ` for each table to verify data integrity after importing.
# The following options are available:
@@ -327,85 +333,6 @@ switch-mode = "5m"
log-progress = "5m"
```
-### TiKV Importer
-
-```toml
-# TiKV Importer configuration file template.
-
-# Log file.
-log-file = "tikv-importer.log"
-# Log level: trace, debug, info, warn, error, off.
-log-level = "info"
-
-# Listening address of the status server. Prometheus can scrape metrics from this address.
-status-server-address = "0.0.0.0:8286"
-
-[server]
-# The listening address of tikv-importer. tidb-lightning needs to connect to
-# this address to write data.
-addr = "0.0.0.0:8287"
-# Size of the thread pool for the gRPC server.
-grpc-concurrency = 16
-
-[metric]
-# These settings are relevant when using Prometheus Pushgateway. Normally you should let Prometheus
-# to scrape metrics from the status-server-address.
-# The Prometheus client push job name.
-job = "tikv-importer"
-# The Prometheus client push interval.
-interval = "15s"
-# The Prometheus Pushgateway address.
-address = ""
-
-[rocksdb]
-# The maximum number of concurrent background jobs.
-max-background-jobs = 32
-
-[rocksdb.defaultcf]
-# Amount of data to build up in memory before flushing data to the disk.
-write-buffer-size = "1GB"
-# The maximum number of write buffers that are built up in memory.
-max-write-buffer-number = 8
-
-# The compression algorithms used in different levels.
-# The algorithm at level-0 is used to compress KV data.
-# The algorithm at level-6 is used to compress SST files.
-# The algorithms at level-1 to level-5 are unused for now.
-compression-per-level = ["lz4", "no", "no", "no", "no", "no", "lz4"]
-
-[rocksdb.writecf]
-# (same as above)
-compression-per-level = ["lz4", "no", "no", "no", "no", "no", "lz4"]
-
-[security]
-# The path for TLS certificates. Empty string means disabling secure connections.
-# ca-path = ""
-# cert-path = ""
-# key-path = ""
-
-[import]
-# The directory to store engine files.
-import-dir = "/mnt/ssd/data.import/"
-# Number of threads to handle RPC requests.
-num-threads = 16
-# Number of concurrent import jobs.
-num-import-jobs = 24
-# Maximum duration to prepare Regions.
-#max-prepare-duration = "5m"
-# Split Regions into this size according to the importing data.
-#region-split-size = "512MB"
-# Stream channel window size. The stream will be blocked on channel full.
-#stream-channel-window = 128
-# Maximum number of open engines.
-max-open-engines = 8
-# Maximum upload speed (bytes per second) from Importer to TiKV.
-# upload-speed-limit = "512MB"
-# Minimum ratio of available space on the target store: `store_available_space`/`store_capacity`.
-# Importer pauses uploading SST if the availability ratio of the target store is less than this
-# value, to allow enough time for PD to balance Regions.
-min-available-ratio = 0.05
-```
-
## Command line parameters
### Usage of `tidb-lightning`
@@ -417,7 +344,7 @@ min-available-ratio = 0.05
| -d *directory* | Directory or [external storage URL](/br/backup-and-restore-storages.md) of the data dump to read from | `mydumper.data-source-dir` |
| -L *level* | Log level: debug, info, warn, error, fatal (default = info) | `lightning.log-level` |
| -f *rule* | [Table filter rules](/table-filter.md) (can be specified multiple times) | `mydumper.filter` |
-| --backend *backend* | [Delivery backend](/tidb-lightning/tidb-lightning-backends.md) (`local`, `importer`, or `tidb`) | `tikv-importer.backend` |
+| --backend *[backend](/tidb-lightning/tidb-lightning-overview.md)* | Select an import mode. `local` refers to the physical import mode; `tidb` refers to the logical import mode. | `tikv-importer.backend` |
| --log-file *file* | Log file path. By default, it is `/tmp/lightning.log.{timestamp}`. If set to '-', it means that the log files will be output to stdout. | `lightning.log-file` |
| --status-addr *ip:port* | Listening address of the TiDB Lightning server | `lightning.status-port` |
| --importer *host:port* | Address of TiKV Importer | `tikv-importer.addr` |
@@ -457,15 +384,3 @@ This tool can execute various actions given one of the following parameters:
The *tablename* must either be a qualified table name in the form `` `db`.`tbl` `` (including the backquotes), or the keyword "all".
Additionally, all parameters of `tidb-lightning` described in the section above are valid in `tidb-lightning-ctl`.
-
-## Usage of `tikv-importer`
-
-| Parameter | Explanation | Corresponding setting |
-|:----|:----|:----|
-| -C, --config *file* | Reads configuration from *file*. If not specified, the default configuration would be used. | |
-| -V, --version | Prints program version | |
-| -A, --addr *ip:port* | Listening address of the TiKV Importer server | `server.addr` |
-| --status-server *ip:port* | Listening address of the status server | `status-server-address` |
-| --import-dir *dir* | Stores engine files in this directory | `import.import-dir` |
-| --log-level *level* | Log level: trace, debug, info, warn, error, off | `log-level` |
-| --log-file *file* | Log file path | `log-file` |
diff --git a/tidb-lightning/tidb-lightning-faq.md b/tidb-lightning/tidb-lightning-faq.md
index c557a4dc2488b..99412ff71f490 100644
--- a/tidb-lightning/tidb-lightning-faq.md
+++ b/tidb-lightning/tidb-lightning-faq.md
@@ -137,13 +137,7 @@ tidb-lightning-ctl --config tidb-lightning.toml --fetch-mode
The TiDB Lightning toolset is best used with a 10-Gigabit network card. 1-Gigabit network cards are *not recommended*, especially for `tikv-importer`.
-1-Gigabit network cards can only provide a total bandwidth of 120 MB/s, which has to be shared among all target TiKV stores. TiDB Lightning can easily saturate all bandwidth of the 1-Gigabit network and bring down the cluster because PD is unable to be contacted anymore. To avoid this, set an *upload speed limit* in [Importer's configuration](/tidb-lightning/tidb-lightning-configuration.md#tikv-importer):
-
-```toml
-[import]
-# Restricts the total upload speed to TiKV to 100 MB/s or less
-upload-speed-limit = "100MB"
-```
+1-Gigabit network cards can only provide a total bandwidth of 120 MB/s, which has to be shared among all target TiKV stores. TiDB Lightning can easily saturate all bandwidth of the 1-Gigabit network and bring down the cluster because PD is unable to be contacted anymore.
## Why TiDB Lightning requires so much free space in the target TiKV cluster?
diff --git a/tidb-lightning/tidb-lightning-overview.md b/tidb-lightning/tidb-lightning-overview.md
index 0237a71b3e9d4..36701cfe4ccd4 100644
--- a/tidb-lightning/tidb-lightning-overview.md
+++ b/tidb-lightning/tidb-lightning-overview.md
@@ -6,54 +6,40 @@ aliases: ['/docs/dev/tidb-lightning/tidb-lightning-overview/','/docs/dev/referen
# TiDB Lightning Overview
-[TiDB Lightning](https://github.com/pingcap/tidb-lightning) is a tool used for fast full import of large amounts of data into a TiDB cluster. You can download TiDB Lightning from [here](/download-ecosystem-tools.md).
+[TiDB Lightning](https://github.com/pingcap/tidb-lightning) is a tool used for importing data at TB scale to TiDB clusters. It is often used for initial data import to TiDB clusters.
-Currently, TiDB Lightning can mainly be used in the following two scenarios:
+TiDB Lightning supports the following file formats:
-- Importing **large amounts** of **new** data **quickly**
-- Restore all backup data
+- Files exported by [Dumpling](/dumpling-overview.md)
+- CSV files
+- [Apache Parquet files generated by Amazon Aurora](/migrate-aurora-to-tidb.md)
-Currently, TiDB Lightning supports:
+TiDB Lightning can read data from the following sources:
-- Importing files exported by [Dumpling](/dumpling-overview.md), CSV files, and [Apache Parquet files generated by Amazon Aurora](/migrate-aurora-to-tidb.md).
-- Reading data from a local disk or from the Amazon S3 storage. For details, see [External Storages](/br/backup-and-restore-storages.md).
+- Local
+- [Amazon S3](/br/backup-and-restore-storages.md#s3-url-parameters)
+- [Google Cloud Storage](/br/backup-and-restore-storages.md#gcs-url-parameters)
## TiDB Lightning architecture

-The complete import process is as follows:
+TiDB Lightning supports two import modes, configured by `backend`. The import mode determines the way data is imported into TiDB.
-1. Before importing, `tidb-lightning` switches the TiKV cluster to "import mode", which optimizes the cluster for writing and disables automatic compaction.
+- [Physical Import Mode](/tidb-lightning/tidb-lightning-physical-import-mode.md): TiDB Lightning first encodes data into key-value pairs and stores them in a local temporary directory, then uploads these key-value pairs to each TiKV node, and finally calls the TiKV Ingest interface to insert data into TiKV's RocksDB. If you need to perform initial import, consider physical import mode, which has higher import speed.
-2. `tidb-lightning` creates the skeleton of all tables from the data source.
+- [Logical Import Mode](/tidb-lightning/tidb-lightning-backends.md#tidb-backend): TiDB Lightning first encodes the data into SQL statements and then runs these SQL statements directly for data import. If the cluster into which you import data is in production, or if the target table already contains data, use the logical import mode.
-3. Each table is split into multiple continuous *batches*, so that data from a huge table (200 GB+) can be imported incrementally and concurrently.
+| Import mode | Physical Import Mode | Logical Import Mode |
+|:---|:---|:---|
+| Speed | Fast (100~500 GiB/hour) | Low (10~50 GiB/hour)|
+| Resource consumption| High | Low |
+| Network bandwidth consumption | High | Low |
+| ACID compliance during import | No | Yes |
+| Target tables | Must be empty | Can contain data |
+| TiDB cluster version | >= 4.0.0 | All |
+| Whether the TiDB cluster can provide service during import | [Limited service](/tidb-lightning/tidb-lightning-physical-import-mode.md#limitations) | Yes |
-4. For each batch, `tidb-lightning` creates an *engine file* to store KV pairs. `tidb-lightning` then reads the data source in parallel, transforms each row into KV pairs according to the TiDB rules, and writes these KV pairs into the local files for temporary storage.
-
-5. Once a complete engine file is written, `tidb-lightning` divides and schedules these data and imports them into the target TiKV cluster.
-
- There are two kinds of engine files: *data engines* and *index engines*, each corresponding to two kinds of KV pairs: the row data and secondary indices. Normally, the row data are entirely sorted in the data source, while the secondary indices are out of order. Because of this, the data engines are uploaded as soon as a batch is completed, while the index engines are imported only after all batches of the entire table are encoded.
-
-6. After all engines associated to a table are imported, `tidb-lightning` performs a checksum comparison between the local data source and those calculated from the cluster, to ensure there is no data corruption in the process; tells TiDB to `ANALYZE` all imported tables, to prepare for optimal query planning; and adjusts the `AUTO_INCREMENT` value so future insertions will not cause conflict.
-
- The auto-increment ID of a table is computed by the estimated *upper bound* of the number of rows, which is proportional to the total file size of the data files of the table. Therefore, the final auto-increment ID is often much larger than the actual number of rows. This is expected since in TiDB auto-increment is [not necessarily allocated sequentially](/mysql-compatibility.md#auto-increment-id).
-
-7. Finally, `tidb-lightning` switches the TiKV cluster back to "normal mode", so the cluster resumes normal services.
-
-If the target cluster of data import is v3.x or earlier versions, you need to use the Importer-backend to import data. In this mode, `tidb-lightning` sends the parsed KV pairs to `tikv-importer` via gRPC and `tikv-importer` imports the data.
-
-TiDB Lightning also supports using TiDB-backend for data import. In this mode, `tidb-lightning` transforms data into `INSERT` SQL statements and directly executes them on the target cluster. See [TiDB Lightning Backends](/tidb-lightning/tidb-lightning-backends.md) for details.
-
-## Restrictions
-
-- If you use TiDB Lightning together with TiFlash:
-
- No matter a table has TiFlash replica(s) or not, you can import data to that table using TiDB Lightning. Note that this might slow the TiDB Lightning procedure, which depends on the NIC bandwidth on the lightning host, the CPU and disk load of the TiFlash node, and the number of TiFlash replicas.
-
-- If you use TiDB Lightning together with TiDB:
-
- TiDB Lightning does not support importing `charset=GBK` tables to TiDB clusters earlier than v5.4.0.
-
-- For Apache Parquet files, TiDB Lightning currently only accepts Amazon Aurora Parquet files.
+
+The preceding performance data is for reference only, to compare the import performance of the two modes. The actual import speed is affected by various factors such as hardware configuration, table schema, and the number of indexes.
+
diff --git a/tidb-lightning/tidb-lightning-physical-import-mode-usage.md b/tidb-lightning/tidb-lightning-physical-import-mode-usage.md
index d8319e0ee25d8..61704f5b0f1cd 100644
--- a/tidb-lightning/tidb-lightning-physical-import-mode-usage.md
+++ b/tidb-lightning/tidb-lightning-physical-import-mode-usage.md
@@ -5,7 +5,7 @@ summary: Learn how to use the physical import mode in TiDB Lightning.
# Use Physical Import Mode
-This document introduces how to use the physical import mode in TiDB Lightning, including writing the configuration file and tuning performance.
+This document introduces how to use the physical import mode in TiDB Lightning, including writing the configuration file, tuning performance, and configuring disk quota.
## Configure and use the physical import mode
@@ -37,6 +37,10 @@ duplicate-resolution = 'remove'
# The directory of local KV sorting.
sorted-kv-dir = "./some-dir"
+# Limits the bandwidth in which TiDB Lightning writes data into each TiKV
+# node in physical import mode. 0 by default, which means no limit.
+# store-write-bwlimit = "128MiB"
+
[tidb]
# The information of the target cluster. The address of any tidb-server from the cluster.
host = "172.16.31.1"
@@ -126,6 +130,53 @@ mysql> select table_name,index_name,key_data,row_data from conflict_error_v1 lim
You can manually identify the records that need to be retained and insert these records into the table.
+## Import data into a cluster in production
+
+Starting from TiDB Lightning v6.2.0, you can import data into a cluster in production using physical import mode. TiDB Lightning implements a new mechanism to limit the impact of the import on the online application.
+
+With the new mechanism, TiDB Lightning does not pause the global scheduling, but only pauses scheduling for the region that stores the target table data. This significantly reduces the impact of the import on the online application.
+
+
+TiDB Lightning does not support importing data into a table that already contains data.
+
+The TiDB cluster must be v6.1.0 or a later version. For earlier versions, TiDB Lightning keeps the old behavior, which pauses scheduling globally and severely impacts the online application during the import.
+
+
+By default, TiDB Lightning pauses the cluster scheduling for the minimum range possible. However, under the default configuration, the cluster performance still might be affected by fast import. To avoid this, you can configure the following options to control the import speed and other factors that might impact the cluster performance:
+
+```toml
+[tikv-importer]
+# Limits the bandwidth in which TiDB Lightning writes data into each TiKV node in physical import mode.
+store-write-bwlimit = "128MiB"
+
+[tidb]
+# Use smaller concurrency to reduce the impact of Checksum and Analyze on the transaction latency.
+distsql-scan-concurrency = 3
+
+[cron]
+# Prevent TiKV from switching to import mode.
+switch-mode = '0'
+```
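+
+Roughly speaking, in the preceding example, `store-write-bwlimit` caps the write bandwidth to each TiKV node, `distsql-scan-concurrency` lowers the concurrency of the Checksum and Analyze phases to reduce their impact on transaction latency, and setting `switch-mode` to `'0'` prevents TiDB Lightning from periodically switching TiKV to import mode. Adjust these values according to the capacity of your cluster and the latency requirements of your application.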
+
+You can measure the impact of data import on an online application by simulating the application with TPCC while importing data into the TiDB cluster using TiDB Lightning. The test results are as follows:
+
+| Concurrency | TPM | P99 | P90 | AVG |
+| ----- | --- | --- | --- | --- |
+| 1 | 20%~30% | 60%~80% | 30%~50% | 30%~40% |
+| 8 | 15%~25% | 70%~80% | 35%~45% | 20%~35% |
+| 16 | 20%~25% | 55%~85% | 35%~40% | 20%~30% |
+| 64 | No significant impact | No significant impact | No significant impact | No significant impact |
+| 256 | No significant impact | No significant impact | No significant impact | No significant impact |
+
+The percentage in the preceding table indicates the impact of data import on TPCC results.
+
+* For the TPM column, the number indicates the percentage of TPM decrease.
+* For the P99, P90, and AVG columns, the number indicates the percentage of latency increase.
+
+The test results show that the smaller the concurrency, the larger the impact of data import on TPCC results. When the concurrency is 64 or more, the impact of data import on TPCC results is negligible.
+
+Therefore, if your TiDB cluster runs a latency-sensitive application with low concurrency, it is strongly recommended **not** to use TiDB Lightning to import data into the cluster, because the import might significantly affect the online application.
+
## Performance tuning
**The most direct and effective ways to improve import performance of the physical import mode are as follows:**
@@ -170,3 +221,30 @@ If the table is large, Lightning will split the table into multiple batches of 1
After the file data is read, Lightning needs to do some post-processing, such as encoding and sorting the data locally. The concurrency of these operations is controlled by `region-concurrency`. The default value is the number of CPU cores. You can leave this configuration in the default value. It is recommended to deploy Lightning on a separate server from other components. If you must deploy Lightning together with other components, you need to lower the value of `region-concurrency` according to the load.
The [`num-threads`](/tikv-configuration-file.md#num-threads) configuration of TiKV can also affect the performance. For new clusters, it is recommended to set `num-threads` to the number of CPU cores.
+
+## Configure disk quota <span class="version-mark">New in v6.2.0</span>
+
+> **Warning:**
+>
+> Disk quota is still an experimental feature. It is **NOT** recommended that you use it in the production environment.
+
+When you import data in physical import mode, TiDB Lightning creates a large number of temporary files on the local disk to encode, sort, and split the original data. When the local disk space is insufficient, TiDB Lightning reports an error and exits because of write failure.
+
+To avoid this situation, you can configure disk quota for TiDB Lightning. When the size of the temporary files exceeds the disk quota, TiDB Lightning pauses the process of reading the source data and writing temporary files. TiDB Lightning prioritizes writing the sorted key-value pairs to TiKV. After deleting the local temporary files, TiDB Lightning continues the import process.
+
+To enable disk quota, add the following configuration to your configuration file:
+
+```toml
+[tikv-importer]
+# MaxInt64 by default, which is 9223372036854775807 bytes.
+disk-quota = "10GB"
+backend = "local"
+
+[cron]
+# The interval of checking disk quota. 60 seconds by default.
+check-disk-quota = "30s"
+```
+
+`disk-quota` limits the storage space used by TiDB Lightning. The default value is MaxInt64, which is 9223372036854775807 bytes. This value is much larger than the disk space you might need for the import, so leaving it as the default value is equivalent to not setting the disk quota.
+
+`check-disk-quota` is the interval at which the disk quota is checked. The default value is 60 seconds. When TiDB Lightning checks the disk quota, it acquires an exclusive lock for the relevant data, which blocks all the import threads. Therefore, if TiDB Lightning checked the disk quota before every write, writes would be significantly slowed down (as slow as a single-threaded write). To keep writes efficient, the disk quota is not checked before every write; instead, TiDB Lightning pauses all the import threads and checks the disk quota once every `check-disk-quota` interval. This means that if `check-disk-quota` is set to a large value, the disk space used by TiDB Lightning might exceed the quota you set, which makes the disk quota ineffective. Therefore, it is recommended to set `check-disk-quota` to a small value. The appropriate value depends on the environment in which TiDB Lightning is running: the faster TiDB Lightning writes temporary files, the smaller the value of `check-disk-quota` should be.
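+
+As an illustration only (the disk size and check interval below are assumptions, not values recommended by this document), a tighter configuration for a host with limited local disk space might look like this:
+
+```toml
+[tikv-importer]
+backend = "local"
+# Assume the disk that holds sorted-kv-dir has roughly 250 GB of free space;
+# leave some headroom below that amount for other temporary files.
+disk-quota = "200GB"
+
+[cron]
+# Check the quota more frequently than the 60-second default because, in this
+# assumed environment, temporary files are written quickly.
+check-disk-quota = "10s"
+```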
diff --git a/tidb-lightning/tidb-lightning-physical-import-mode.md b/tidb-lightning/tidb-lightning-physical-import-mode.md
index f62120b7043a4..bfd59efbf8a14 100644
--- a/tidb-lightning/tidb-lightning-physical-import-mode.md
+++ b/tidb-lightning/tidb-lightning-physical-import-mode.md
@@ -11,23 +11,26 @@ Before you use the physical import mode, make sure to read [Requirements and res
## Implementation
-1. Before importing data, TiDB Lightning automatically switches the TiKV nodes to "import mode", which improves write performance and stops PD scheduling and auto-compaction.
+1. Before importing data, TiDB Lightning automatically switches the TiKV nodes to "import mode", which improves write performance and stops auto-compaction. TiDB Lightning determines whether to pause global scheduling according to the TiDB cluster version.
-2. `tidb-lightning` creates table schemas in the target database and fetches the metadata.
+    - When the TiDB cluster is v6.1.0 or later and TiDB Lightning is v6.2.0 or later, TiDB Lightning pauses scheduling only for the region that stores the target table data. After the import is completed, TiDB Lightning resumes scheduling.
+    - When the TiDB cluster is earlier than v6.1.0 or TiDB Lightning is earlier than v6.2.0, TiDB Lightning pauses global scheduling.
-3. Each table is divided into multiple contiguous **blocks**, so that Lightning can import data data from large tables (200 GB+) in parallel.
+2. TiDB Lightning creates table schemas in the target database and fetches the metadata.
-4. `tidb-lightning` prepares an "engine file" for each block to handle key-value pairs. `tidb-lightning` reads the SQL dump in parallel, converts the data source to key-value pairs in the same encoding as TiDB, sorts the key-value pairs and writes them to a local temporary storage file.
+3. Each table is divided into multiple contiguous **blocks**, so that TiDB Lightning can import data from large tables (greater than 200 GB) in parallel.
-5. When an engine file is written, `tidb-lightning` starts to split and schedule data on the target TiKV cluster, and then imports data to TiKV cluster.
+4. TiDB Lightning prepares an "engine file" for each block to handle key-value pairs. TiDB Lightning reads the SQL dump in parallel, converts the data source to key-value pairs in the same encoding as TiDB, sorts the key-value pairs and writes them to a local temporary storage file.
+
+5. When an engine file is written, TiDB Lightning starts to split and schedule data on the target TiKV cluster, and then imports data to TiKV cluster.
The engine file contains two types of engines: **data engine** and **index engine**. Each engine corresponds to a type of key-value pairs: row data and secondary index. Normally, row data is completely ordered in the data source, and the secondary index is unordered. Therefore, the data engine files are imported immediately after the corresponding block is written, and all index engine files are imported only after the entire table is encoded.
-6. After all engine files are imported, `tidb-lightning` compares the checksum between the local data source and the downstream cluster, and ensures that the imported data is not corrupted. Then `tidb-lightning` analyzes the new data (`ANALYZE`) to optimize the future operations. Meanwhile, `tidb-lightning` adjusts the `AUTO_INCREMENT` value to prevent conflicts in the future.
+6. After all engine files are imported, TiDB Lightning compares the checksum between the local data source and the downstream cluster, and ensures that the imported data is not corrupted. Then TiDB Lightning analyzes the new data (`ANALYZE`) to optimize future operations. Meanwhile, TiDB Lightning adjusts the `AUTO_INCREMENT` value to prevent conflicts in the future.
The auto-increment ID is estimated by the **upper bound** of the number of rows, and is proportional to the total size of the table data file. Therefore, the auto-increment ID is usually larger than the actual number of rows. This is normal because the auto-increment ID [is not necessarily contiguous](/mysql-compatibility.md#auto-increment-id).
-7. After all steps are completed, `tidb-lightning` automatically switches the TiKV nodes to "normal mode", and the TiDB cluster can provide services normally.
+7. After all steps are completed, TiDB Lightning automatically switches the TiKV nodes back to "normal mode". If global scheduling was paused, TiDB Lightning also resumes it. After that, the TiDB cluster can provide services normally.
## Requirements and restrictions
@@ -57,7 +60,7 @@ It is recommended that you allocate CPU more than 32 cores and memory greater th
### Limitations
-- Do not use physical import mode to import data to TiDB clusters in production. It has severe performance implications.
+- Do not use physical import mode to directly import data to TiDB clusters in production. It has severe performance implications. If you need to do so, refer to [Import data into a cluster in production](/tidb-lightning/tidb-lightning-physical-import-mode-usage.md#import-data-into-a-cluster-in-production).
- Do not use multiple TiDB Lightning instances to import data to the same TiDB cluster by default. Use [Parallel Import](/tidb-lightning/tidb-lightning-distributed-import.md) instead.
- When you use multiple TiDB Lightning to import data to the same target, do not mix the backends. That is, do not use physical import mode and logical import mode at the same time.
- A single Lightning process can import a single table of 10 TB at most. Parallel import can use 10 Lightning instances at most.
diff --git a/tidb-limitations.md b/tidb-limitations.md
index 994e27ac08103..083671216f456 100644
--- a/tidb-limitations.md
+++ b/tidb-limitations.md
@@ -75,3 +75,7 @@ This document describes the common usage limitations of TiDB, including the maxi
| Type | Upper limit |
|:----------|:----------|
| The maximum number of SQL statements in a single transaction | When the optimistic transaction is used and the transaction retry is enabled, the default upper limit is 5000, which can be modified using [`stmt-count-limit`](/tidb-configuration-file.md#stmt-count-limit). |
+
+## Limitations on TiKV version
+
+If the TiDB component in your cluster is v6.2.0 or later, the version of the TiKV component must also be v6.2.0 or later.
diff --git a/tiflash/tiflash-configuration.md b/tiflash/tiflash-configuration.md
index 5a21bc5a39169..a8be4fe6e2ae8 100644
--- a/tiflash/tiflash-configuration.md
+++ b/tiflash/tiflash-configuration.md
@@ -76,8 +76,9 @@ delta_index_cache_size = 0
## DTFile format
## * format_version = 1, the old format, deprecated.
## * format_version = 2, the default format for versions < v6.0.0.
- ## * format_version = 3, the default format for versions >= v6.0.0, which provides more data validation features.
- # format_version = 3
+ ## * format_version = 3, the default format for v6.0.0 and v6.1.x, which provides more data validation features.
+ ## * format_version = 4, the default format for v6.2.0 and later versions, which reduces write amplification and background task resource consumption
+ # format_version = 4
[storage.main]
## The list of directories to store the main data. More than 90% of the total data is stored in
@@ -201,6 +202,11 @@ delta_index_cache_size = 0
# Compression level of the TiFlash storage engine. The default value is 1. It is recommended that you set this value to 1 if dt_compression_method is LZ4, -1 (smaller compression rate, but better read performance) or 1 if dt_compression_method is zstd, and 9 if dt_compression_method is LZ4HC.
dt_compression_level = 1
+ ## New in v6.2.0. Use the thread pool to handle read requests from the storage engine. The default value is false.
+ ## Warning: This is still an experimental feature. It is NOT recommended that you use it in the production environment.
+
+ # dt_enable_read_thread = false
+
## Security settings take effect starting from v4.0.5.
[security]
## New in v5.0. This configuration item enables or disables log redaction. If the configuration value
diff --git a/tikv-configuration-file.md b/tikv-configuration-file.md
index f6c1b1415b17d..d9841619f730a 100644
--- a/tikv-configuration-file.md
+++ b/tikv-configuration-file.md
@@ -400,6 +400,7 @@ Configuration items related to storage.
+ When API V2 is used, you are expected to set `storage.enable-ttl = true` at the same time. Because API V2 supports the TTL feature, you must turn on `enable-ttl` explicitly. Otherwise, it will be in conflict because `storage.enable-ttl` defaults to `false`.
+ When API V2 is enabled, you need to deploy at least one tidb-server instance to reclaim expired data. Note that this tidb-server instance cannot provide read or write services. To ensure high availability, you can deploy multiple tidb-server instances.
+ Client support is required for API V2. For details, see the corresponding instruction of the client for the API V2.
+ + Since v6.2.0, Change Data Capture (CDC) for RawKV is supported using the component [TiKV-CDC](https://github.com/tikv/migration/tree/main/cdc).
+ Default value: `1`
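+
+For reference, a minimal sketch of the relevant `[storage]` settings when API V2 is enabled (shown only to illustrate the requirement above, not a complete TiKV configuration) might be as follows:
+
+```toml
+[storage]
+# API V2 requires TTL to be enabled explicitly; otherwise the configuration conflicts.
+api-version = 2
+enable-ttl = true
+```
+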
> **Warning:**
@@ -1389,11 +1390,6 @@ Configuration items related to `rocksdb.defaultcf.titan`.
+ Determines whether to optimize the read performance. When `level-merge` is enabled, there is more write amplification.
+ Default value: `false`
-### `gc-merge-rewrite`
-
-+ Determines whether to use the merge operator to write back blob indexes for Titan GC. When `gc-merge-rewrite` is enabled, it reduces the effect of Titan GC on the writes in the foreground.
-+ Default value: `false`
-
## raftdb
Configuration items related to `raftdb`