add tiup no-sudo doc #19753

Merged
merged 21 commits into from
Feb 19, 2025
1 change: 1 addition & 0 deletions TOC.md
@@ -496,6 +496,7 @@
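- TiUP component documentation
- [Run a local test cluster with tiup-playground](/tiup/tiup-playground.md)
- [Deploy and maintain a production cluster with tiup-cluster](/tiup/tiup-cluster.md)
- [Deploy and maintain a production cluster with tiup-cluster in no-sudo mode](/tiup/tiup-cluster-no-sudo-mode.md)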
- [Customize an offline mirror with tiup-mirror](/tiup/tiup-mirror.md)
- [Run TPC-C/TPC-H stress tests with tiup-bench](/tiup/tiup-bench.md)
- [TiDB Operator](/tidb-operator-overview.md)
198 changes: 198 additions & 0 deletions tiup/tiup-cluster-no-sudo-mode.md
@@ -0,0 +1,198 @@
---
title: Deploy and Maintain an Online TiDB Cluster Using TiUP No-sudo Mode
summary: Describes how to deploy and maintain an online TiDB cluster using TiUP no-sudo mode.
---

# Deploy and Maintain an Online TiDB Cluster Using TiUP No-sudo Mode

This document describes how to use TiUP no-sudo mode to deploy an online TiDB cluster.

> **Note:**
>
> For CentOS, only CentOS 8 and later versions are supported.

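## Prepare the user and configure SSH mutual trust

1. Taking the `tidb` user as an example, log in to each target deployment machine as `root` and run the following command to create a regular user `tidb`. In no-sudo mode, you do not need to configure passwordless sudo for the `tidb` user, that is, you do not need to add `tidb` to `sudoers`.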

```bash
adduser tidb
```

2. On each target deployment machine, enable `systemd user` mode for the `tidb` user (required step).

    1. As the `tidb` user, set the `XDG_RUNTIME_DIR` environment variable:

```bash
mkdir -p ~/.bashrc.d
echo "export XDG_RUNTIME_DIR=/run/user/$(id -u)" > ~/.bashrc.d/systemd
source ~/.bashrc.d/systemd
```

    2. As `root`, start the user service:

```shell
    $ systemctl start user@1000.service  # 1000 is the UID of the tidb user; run "id -u tidb" to get it
    $ systemctl status user@1000.service
    user@1000.service - User Manager for UID 1000
       Loaded: loaded (/usr/lib/systemd/system/user@.service; static; vendor preset>
Active: active (running) since Mon 2024-01-29 03:30:51 EST; 1min 7s ago
Main PID: 3328 (systemd)
Status: "Startup finished in 420ms."
Tasks: 6
Memory: 6.1M
       CGroup: /user.slice/user-1000.slice/user@1000.service
├─dbus.service
│ └─3442 /usr/bin/dbus-daemon --session --address=systemd: --nofork >
├─init.scope
│ ├─3328 /usr/lib/systemd/systemd --user
│ └─3335 (sd-pam)
└─pulseaudio.service
└─3358 /usr/bin/pulseaudio --daemonize=no --log-target=journal
```

    Run `systemctl --user` as the `tidb` user. If no error is reported, `systemd user` mode has started successfully.

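3. On the control machine, use `ssh-keygen` to generate a key pair, and copy the public key to the `tidb` user on each of the other deployment machines to set up SSH mutual trust.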
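
The two steps above that depend on a machine-specific value or on several hosts can be scripted. The following is a minimal sketch, not part of TiUP, assuming bash plus OpenSSH's `ssh-keygen` and `ssh-copy-id` on the control machine; the function names `user_manager_unit` and `setup_ssh_trust` are hypothetical. `user_manager_unit` derives the unit name from the account's UID so that `1000` is not hard-coded, and `setup_ssh_trust` generates an RSA key pair only if the key file does not exist yet, then copies the public key to the given user on each host. The key path `~/.ssh/id_rsa` in the usage example is assumed to match the default identity file of `tiup cluster`; if you use a different key, pass it to TiUP with the `-i` (identity file) option.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the preparation steps; not part of TiUP.

# Print the systemd user-manager unit for an account, e.g. "user@1000.service".
user_manager_unit() {
  local uid
  uid="$(id -u "$1")" || return 1
  printf 'user@%s.service\n' "$uid"
}

# Args: key path, remote user, then one or more hosts.
# Create an RSA key pair at the key path if it does not exist yet, then copy
# the public key to <remote user>@<host> for every host.
setup_ssh_trust() {
  local key="$1" user="$2"
  shift 2
  if [ ! -f "$key" ]; then
    mkdir -p -m 700 "$(dirname "$key")"
    ssh-keygen -t rsa -b 4096 -N '' -f "$key" || return 1
  fi
  local host
  for host in "$@"; do
    ssh-copy-id -i "${key}.pub" "${user}@${host}" || return 1
  done
}
```

For example, run `systemctl start "$(user_manager_unit tidb)"` as `root` on each target machine, and `setup_ssh_trust ~/.ssh/id_rsa tidb 192.168.124.27` on the control machine, listing every deployment host.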

## Prepare the topology file

1. Run the following TiUP command to generate a topology file:

```bash
tiup cluster template > topology.yaml
```

2. Edit the topology file.

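    Compared with the standard mode, TiUP in no-sudo mode requires adding the line `systemd_mode: "user"` to the `global` section of `topology.yaml`. The `systemd_mode` parameter specifies whether to use `systemd user` mode. If it is not set, it defaults to `system`, which requires sudo privileges. In addition, in no-sudo mode you cannot use the `/data` directory as `deploy_dir` or `data_dir` because of permission issues; choose a path that a regular user can access instead. The following example uses relative paths: `deploy_dir` resolves under the `tidb` user's home directory to `/home/tidb/data/tidb-deploy`, and a relative `data_dir` is placed under each instance's deployment directory (for example, `/home/tidb/data/tidb-deploy/tikv-20160/data/tidb-data`, consistent with the paths in the check output below).
    The rest of the topology file is the same as in previous versions.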

```yaml
global:
user: "tidb"
systemd_mode: "user"
ssh_port: 22
deploy_dir: "data/tidb-deploy"
data_dir: "data/tidb-data"
arch: "amd64"
...
```
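
The `disk` item in the sample check output below shows where TiUP places a relative `data_dir`: for user `blackcat`, the TiKV data directory is `/home/blackcat/data/tidb-deploy/tikv-20160/data/tidb-data`, that is, inside the instance's own deployment directory. A minimal sketch of that mapping, useful for confirming that every resulting path is writable by the `tidb` user, assuming global-level `deploy_dir`/`data_dir` only (instance-level overrides are not modeled), instances named `<component>-<port>` as in that output, and the usual TiUP layout of `<data_dir>/<instance>` for an absolute `data_dir`. `resolve_instance_dirs` is a hypothetical name, not TiUP code.

```shell
#!/usr/bin/env bash
# Hypothetical sketch (not TiUP code) of how global deploy_dir/data_dir values
# map to per-instance paths. Instance-level overrides are not modeled.
# Args: user home, global deploy_dir, global data_dir, instance (e.g. tikv-20160).
resolve_instance_dirs() {
  local home="$1" deploy="$2" data="$3" instance="$4"
  local inst_deploy inst_data
  # A relative deploy_dir is taken relative to the user's home directory;
  # each instance then gets its own subdirectory.
  case "$deploy" in
    /*) inst_deploy="${deploy}/${instance}" ;;
    *)  inst_deploy="${home}/${deploy}/${instance}" ;;
  esac
  # A relative data_dir is placed inside the instance's deployment directory,
  # while an absolute one gets a per-instance subdirectory.
  case "$data" in
    /*) inst_data="${data}/${instance}" ;;
    *)  inst_data="${inst_deploy}/${data}" ;;
  esac
  printf 'deploy_dir=%s\ndata_dir=%s\n' "$inst_deploy" "$inst_data"
}
```

For example, `resolve_instance_dirs /home/tidb data/tidb-deploy data/tidb-data tikv-20160` prints `deploy_dir=/home/tidb/data/tidb-deploy/tikv-20160` and `data_dir=/home/tidb/data/tidb-deploy/tikv-20160/data/tidb-data`.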

## Manually fix failed check items

Running `tiup cluster check topology.yaml --user tidb` reports some failed check items, for example:

```bash
Node Check Result Message
---- ----- ------ -------
192.168.124.27 thp Fail THP is enabled, please disable it for best performance
192.168.124.27 command Pass numactl: policy: default
192.168.124.27 os-version Pass OS is CentOS Stream 8
192.168.124.27 network Pass network speed of ens160 is 10000MB
192.168.124.27 disk Warn mount point / does not have 'noatime' option set
192.168.124.27 disk Fail multiple components tikv:/home/blackcat/data/tidb-deploy/tikv-20160/data/tidb-data,tikv:/home/blackcat/data/tidb-deploy/tikv-20161/data/tidb-data are using the same partition 192.168.124.27:/ as data dir
192.168.124.27 selinux Pass SELinux is disabled
192.168.124.27 cpu-cores Pass number of CPU cores / threads: 16
192.168.124.27 cpu-governor Warn Unable to determine current CPU frequency governor policy
192.168.124.27 swap Warn swap is enabled, please disable it for best performance
192.168.124.27 memory Pass memory size is 9681MB
192.168.124.27 service Fail service firewalld is running but should be stopped
```
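
When the check reports many items, it can help to list only the failed ones before fixing them by hand. A minimal sketch, assuming the output is plain text with the whitespace-separated column layout shown above (Node, Check, Result, Message); `failed_checks` is a hypothetical helper, not part of TiUP.

```shell
#!/usr/bin/env bash
# Hypothetical filter (not part of TiUP): read "tiup cluster check" output on
# stdin and print "<node> <check>" for every row whose Result column is "Fail".
failed_checks() {
  awk '$3 == "Fail" { print $1, $2 }'
}
```

For example, `tiup cluster check topology.yaml --user tidb | failed_checks`. For the sample output above, this keeps only the `thp`, `disk`, and `service` rows that failed.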

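In no-sudo mode, the `tidb` user has no sudo privileges, so `tiup cluster check topology.yaml --apply --user tidb` cannot automatically fix the failed check items because of insufficient permissions. Instead, log in to each deployment machine as `root` and perform the following operations manually.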

1. Install the `numactl` tool.

```shell
    yum -y install numactl
```

2. Disable swap.

```shell
    # Turn off all active swap areas (this does not persist across reboots)
    swapoff -a
```

3. Disable Transparent Huge Pages (THP).

```shell
    # Takes effect immediately; the setting is not persistent across reboots
    echo never > /sys/kernel/mm/transparent_hugepage/enabled
```

4. Start the `irqbalance` service.

```shell
systemctl start irqbalance
```

5. Stop the firewall and disable it from starting at boot.

```shell
systemctl stop firewalld.service
systemctl disable firewalld.service
```

6. Modify the `sysctl` parameters.

```shell
echo "fs.file-max = 1000000">> /etc/sysctl.conf
echo "net.core.somaxconn = 32768">> /etc/sysctl.conf
echo "net.ipv4.tcp_tw_recycle = 0">> /etc/sysctl.conf
echo "net.ipv4.tcp_syncookies = 0">> /etc/sysctl.conf
echo "vm.overcommit_memory = 1">> /etc/sysctl.conf
echo "vm.swappiness = 0">> /etc/sysctl.conf
sysctl -p
```

7. Configure `limits.conf` for the `tidb` user.

```shell
cat << EOF >>/etc/security/limits.conf
tidb soft nofile 1000000
tidb hard nofile 1000000
tidb soft stack 32768
tidb hard stack 32768
tidb soft core unlimited
tidb hard core unlimited
EOF
```
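
Running the `echo` and `cat << EOF` append commands above more than once adds duplicate lines, and the THP and swap changes are easy to lose track of. The following sketch, assuming bash and the standard Linux paths `/sys/kernel/mm/transparent_hugepage/enabled` and `/proc/swaps`, shows one way to append a line only when it is missing and to check the THP and swap state afterwards. The function names `append_once`, `thp_mode`, and `active_swap_count` are hypothetical and not part of TiUP; re-running `tiup cluster check` remains the final confirmation.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of TiUP) for re-runnable fixes and quick checks.

# Append LINE to FILE only if FILE does not already contain exactly that line
# (whole-line, fixed-string match, so whitespace must be identical).
append_once() {
  local line="$1" file="$2"
  grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Print the active THP mode, i.e. the bracketed word in a line such as
# "always madvise [never]". Reads the kernel's sysfs file by default.
thp_mode() {
  local file="${1:-/sys/kernel/mm/transparent_hugepage/enabled}"
  sed -n 's/.*\[\([a-z]*\)].*/\1/p' "$file"
}

# Print the number of active swap areas; /proc/swaps has one header line
# followed by one line per active swap area.
active_swap_count() {
  local file="${1:-/proc/swaps}"
  tail -n +2 "$file" | grep -c . || true
}
```

For example, as `root`, `append_once "vm.swappiness = 0" /etc/sysctl.conf` can replace the corresponding `echo` line, and after the fixes `thp_mode` should print `never` and `active_swap_count` should print `0`. On kernels 4.12 and later, `net.ipv4.tcp_tw_recycle` no longer exists, so `sysctl -p` may report an error for that line while still applying the other settings.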

## Deploy the cluster

To use the `tidb` user prepared in the preceding steps instead of creating a new user, add `--user tidb` when running the `deploy` command:

```shell
tiup cluster deploy mycluster v8.1.0 topology.yaml --user tidb
```

Start the cluster:

```shell
tiup cluster start mycluster
```

Scale out the cluster:

```shell
tiup cluster scale-out mycluster scale.yaml --user tidb
```

Scale in the cluster:

```shell
tiup cluster scale-in mycluster -N 192.168.124.27:20160
```

Upgrade the cluster:

```shell
tiup cluster upgrade mycluster v8.2.0
```

## FAQ

1. Starting `user@1000.service` fails with the error `Failed to fully start up daemon: Permission denied`

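    This might be because `pam_systemd.so` is missing from your `/etc/pam.d/system-auth.ued` file. The following command checks whether `/etc/pam.d/system-auth.ued` already contains a configuration for the `pam_systemd.so` module and, if it does not, appends `session optional pam_systemd.so` to the end of the file.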

```shell
grep 'pam_systemd.so' /etc/pam.d/system-auth.ued || echo 'session optional pam_systemd.so' >> /etc/pam.d/system-auth.ued
```
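
The `grep` above also matches a commented-out line such as `#session optional pam_systemd.so`, in which case nothing is appended even though the module is not active. A slightly stricter sketch, assuming bash and that a leading `#` (optionally after whitespace) marks a comment, as in PAM configuration files; `ensure_pam_systemd` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append "session optional pam_systemd.so" to a PAM file
# unless an uncommented line already references pam_systemd.so.
ensure_pam_systemd() {
  local file="$1"
  # Active line: first non-blank character is not '#', and it mentions the module.
  if grep -Eq '^[[:space:]]*[^#[:space:]].*pam_systemd\.so' "$file"; then
    return 0
  fi
  printf '%s\n' 'session optional pam_systemd.so' >> "$file"
}
```

For example, as `root`: `ensure_pam_systemd /etc/pam.d/system-auth.ued`.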
1 change: 1 addition & 0 deletions tiup/tiup-cluster-topology-reference.md
@@ -40,6 +40,7 @@ summary: Introduces the topology file used when deploying or scaling out a TiDB cluster with TiUP

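- `user`: the user that starts the deployed cluster. Default value: "tidb". If the user specified in the `<user>` field does not exist on the target machine, TiUP automatically tries to create it
- `group`: the user group that the user belongs to when the user is created automatically. Defaults to the value of the `<user>` field. If the specified group does not exist, it is created automatically
- `systemd_mode`: the systemd mode used on the target machines during cluster deployment. Default value: `system`. If set to `user`, sudo privileges are not used on the target machines, which means TiUP runs in no-sudo mode.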
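- `ssh_port`: the SSH port used to connect to the target machines. Default value: 22
- `enable_tls`: whether to enable TLS for the cluster. After TLS is enabled, connections between components and between clients and components must use the generated TLS certificates. Default value: false
- `listen_host`: the default listening IP address. If it is empty, each instance automatically sets it to `::` or `0.0.0.0` depending on whether its `host` field contains `:`. This configuration item was introduced in tiup-cluster v1.14.0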