pd-control: add scheduler status (pingcap#10371)
CabinfeverB authored Sep 16, 2022
1 parent 7ebc7d5 commit 86b311d
Showing 3 changed files with 24 additions and 4 deletions.
2 changes: 1 addition & 1 deletion best-practices/pd-scheduling-best-practices.md
@@ -297,4 +297,4 @@ If a TiKV node fails, PD defaults to setting the corresponding node to the **dow

In practice, if a node failure is considered unrecoverable, you can take the node offline immediately. PD then replenishes the replicas on other nodes sooner, which reduces the risk of data loss. In contrast, if a node is considered recoverable but cannot be recovered within 30 minutes, you can temporarily set `max-store-down-time` to a larger value to avoid unnecessary replica replenishment and wasted resources after the timeout.

In TiDB v5.2.0, TiKV introduces the mechanism of slow TiKV node detection. By sampling the requests in TiKV, this mechanism works out a score ranging from 1 to 100. A TiKV node with a score higher than or equal to 80 is marked as slow. You can add [`evict-slow-store-scheduler`](/pd-control.md#scheduler-show--add--remove--pause--resume--config) to detect and schedule slow nodes. If only one TiKV is detected as slow, and the slow score reaches the upper limit (100 by default), the leader in this node will be evicted (similar to the effect of `evict-leader-scheduler`).
In TiDB v5.2.0, TiKV introduces the mechanism of slow TiKV node detection. By sampling the requests in TiKV, this mechanism works out a score ranging from 1 to 100. A TiKV node with a score higher than or equal to 80 is marked as slow. You can add [`evict-slow-store-scheduler`](/pd-control.md#scheduler-show--add--remove--pause--resume--config--describe) to detect and schedule slow nodes. If only one TiKV is detected as slow, and the slow score reaches the upper limit (100 by default), the leader in this node will be evicted (similar to the effect of `evict-leader-scheduler`).
9 changes: 7 additions & 2 deletions pd-configuration-file.md
@@ -231,6 +231,11 @@ Configuration items related to scheduling
+ The number of Region scheduling tasks performed at the same time
+ Default value: `2048`
### `enable-diagnostic` <span class="version-mark">New in v6.3.0</span>
+ Controls whether to enable the diagnostic feature. When it is enabled, PD records the state during scheduling to help diagnose scheduling issues. Enabling it might slightly slow down scheduling and consume more memory when there are many stores.
+ Default value: `false`
### `hot-region-schedule-limit`
+ Controls the hot Region scheduling tasks that are running at the same time. It is independent of the Region scheduling.
@@ -276,7 +281,7 @@ Configuration items related to scheduling
+ Determines whether to enable the merging of cross-table Regions
+ Default value: `true`
### `region-score-formula-version` <span class="version-mark">New in v5.0</span>
+ Controls the version of the Region score formula
+ Default value: `v2`
@@ -297,7 +302,7 @@ Configuration items related to scheduling
+ Default value: `10m`
> **Note:**
>
> The information about hot Regions is updated every three minutes. If the interval is set to less than three minutes, updates during the interval might be meaningless.
### `hot-regions-reserved-days` <span class="version-mark">New in v5.4.0</span>
17 changes: 16 additions & 1 deletion pd-control.md
@@ -750,7 +750,7 @@ Usage:
}
```
### `scheduler [show | add | remove | pause | resume | config]`
### `scheduler [show | add | remove | pause | resume | config | describe]`
Use this command to view and control the scheduling policy.
@@ -770,8 +770,23 @@ Usage:
>> scheduler resume balance-region-scheduler // Continue to run the balance-region scheduler
>> scheduler resume all // Continue to run all schedulers
>> scheduler config balance-hot-region-scheduler // Display the configuration of the balance-hot-region scheduler
>> scheduler describe balance-region-scheduler // Display the running state and related diagnostic information of the balance-region scheduler
```
### `scheduler describe balance-region-scheduler`
Use this command to view the running state and related diagnostic information of the `balance-region-scheduler`.
Since TiDB v6.3.0, PD provides the running state and brief diagnostic information for `balance-region-scheduler` and `balance-leader-scheduler`. Other schedulers and checkers are not supported yet. To enable this feature, you can modify the [`enable-diagnostic`](/pd-configuration-file.md#enable-diagnostic-new-in-v630) configuration item using `pd-ctl`.
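As a rough sketch of the workflow described above, the following shell snippet prints the two `pd-ctl` invocations involved: turning on `enable-diagnostic`, then describing the scheduler. It assumes a `pd-ctl` binary on `PATH`, a placeholder PD endpoint of `http://127.0.0.1:2379`, and that `enable-diagnostic` can be changed with `config set` like other scheduling items. The `run` helper and the `DRY_RUN` variable are illustrative only and are not part of `pd-ctl`.

```shell
# Placeholder PD endpoint (an assumption); replace it with your cluster's address.
PD_ADDR="${PD_ADDR:-http://127.0.0.1:2379}"

# Illustrative helper: by default it only prints the pd-ctl command line.
# Set DRY_RUN=0 to actually invoke pd-ctl (assumed to be on PATH).
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "pd-ctl -u $PD_ADDR $*"
  else
    pd-ctl -u "$PD_ADDR" "$@"
  fi
}

# Enable the diagnostic feature (assumed to be settable online via `config set`).
run config set enable-diagnostic true

# View the running state and diagnostic information of the scheduler.
run scheduler describe balance-region-scheduler
```

Printing the commands by default keeps the sketch safe to paste into a terminal; nothing touches the cluster until `DRY_RUN=0` is set.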
The state of the scheduler can be one of the following:
- `disabled`: the scheduler is unavailable or removed.
- `paused`: the scheduler is paused.
- `scheduling`: the scheduler is generating scheduling operators.
- `pending`: the scheduler cannot generate scheduling operators. For a scheduler in the `pending` state, brief diagnostic information is returned. The brief information describes the state of stores and explains why these stores cannot be selected for scheduling.
- `normal`: there is no need to generate scheduling operators.
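The five states listed above can be turned into a small lookup for operator tooling. This is a minimal sketch: the `explain_state` function is hypothetical, and the state string is passed in by hand because the exact output format of `scheduler describe` is not assumed here.

```shell
# Hypothetical helper: map a scheduler state to a short explanation.
# The wording follows the state list above; unknown states return an error.
explain_state() {
  case "$1" in
    disabled)   echo "scheduler is unavailable or removed" ;;
    paused)     echo "scheduler is paused" ;;
    scheduling) echo "scheduler is generating scheduling operators" ;;
    pending)    echo "scheduler cannot generate operators; check the store diagnostics" ;;
    normal)     echo "no scheduling operators are needed" ;;
    *)          echo "unknown state: $1"; return 1 ;;
  esac
}

# Example call with a state copied from the command output.
explain_state pending
```

Only `pending` calls for a closer look, because it is the one state for which PD returns diagnostic information about why stores cannot be selected.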
### `scheduler config balance-leader-scheduler`
Use this command to view and control the `balance-leader-scheduler` policy.