make backup and restore more universal and secure #1276
Conversation
manifests/backup/restore-gcs.yaml (Outdated)
```yaml
  name: demo2-restore
  namespace: test2
spec:
  from:
```
For restore, this field should be `to`.
manifests/backup/restore-gcs.yaml (Outdated)
```yaml
  gcs:
    projectId: <your-project-id>
    secretName: gcs-secret
  storageType: gcs
```
Putting `storageType`, `storageClassName` and `storageSize` together is confusing. Is the PV a temporary storage? Maybe we can rename either the storage backend or the temporary PV.
Also, I think the backup path is specific to each storage backend, so it is better to put it under the corresponding storage backend.
How about this?

```yaml
spec:
  from: # storage backend info for the backup file
    type: gcs
    gcs:
      projectId: <your-project-id>
      secretName: gcs-secret
      path: gcs://test1-demo1/backup-2019-11-11T16:06:05Z.tgz
  to: # downstream tidb cluster info
    host: 10.0.0.2
    port: 4000
    user: root
    secretName: backup-demo2-tidb-secret
  tempStorage: # used temporarily to download the backup file
    storageClassName: local-storage
    size: 1Gi
```
I agree that putting `storageType`, `storageClassName` and `storageSize` together is confusing. These three fields all start with the word storage, which makes people think they are related, but there is no relationship between them. Therefore I intend to remove the `storageType` field; here is the related issue #768. I prefer the revised definition below.

```yaml
spec:
  to:
    host: 10.0.0.2
    port: 4000
    user: root
    secretName: backup-demo2-tidb-secret
  gcs:
    projectId: <your-project-id>
    secretName: gcs-secret
    # "<bucket-name>/<path-to-backup-file>"
    path: test1-demo1/backup-2019-11-11T16:06:05Z.tgz
  storageClassName: local-storage
  storageSize: 1Gi
```

What do you think? @DanielZhangQD @shuijing198799 @weekface
Using `from` and `to` is clearer for users.
I agree to remove the `storageType` attribute; `gcs:` already indicates the type.

```yaml
spec:
  storageClassName: local-storage
  storageSize: 1Gi
  from:
    gcs:
      projectId: <your-project-id>
      secretName: gcs-secret
      path: gcs://test1-demo1/backup-2019-11-11T16:06:05Z.tgz
  to:
    host: 10.0.0.2
    port: 4000
    user: root
    secretName: backup-demo2-tidb-secret
```
I think that defining a remote storage configuration (here it is `gcs`) in the restore spec already indicates that the backup data comes from there, so there is no need to use `from` to express this relationship. In addition, we should keep the `spec` as simple as possible to reduce the cost of use for users.
Since a storage class and size are defined, users may think the backup data lives in the storage defined by `storageClassName` and `storageSize`. Adding `to` makes it less confusing and more straightforward, while still simple enough.
I think there is no need to add `from`/`to`: users know whether they are doing a `backup` or a `restore`, so even without a `from`/`to` indication they know what the tidb cluster is used for.
For `storageClassName: local-storage` and `storageSize: 1Gi`, I prefer to make it clear in the doc that they are used for temporary storage, whether or not we put them under `tempStorage`.

```yaml
spec:
  cluster: # or tidb
    host: 10.0.0.2
    port: 4000
    user: root
    secretName: backup-demo2-tidb-secret
  gcs:
    projectId: <your-project-id>
    secretName: gcs-secret
    # "<bucket-name>/<path-to-backup-file>"
    path: test1-demo1/backup-2019-11-11T16:06:05Z.tgz
  storageClassName: local-storage
  storageSize: 1Gi
```
What if users provide multiple cloud storage configurations when we omit the `storageType` field?
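As an aside on the question above: if `storageType` were dropped, the ambiguity could be resolved by validation that rejects specs configuring more than one backend. A minimal sketch, assuming simplified stand-in types rather than the actual tidb-operator API:

```go
package main

import (
	"errors"
	"fmt"
)

// StorageProvider is a hypothetical stand-in for the spec's storage backend
// section, with one optional block per supported backend.
type StorageProvider struct {
	S3  *S3StorageProvider
	Gcs *GcsStorageProvider
}

type S3StorageProvider struct{ Bucket string }
type GcsStorageProvider struct{ ProjectId string }

// resolveStorageType infers the backend from whichever block is set and
// rejects specs that configure zero or multiple backends.
func resolveStorageType(sp StorageProvider) (string, error) {
	var types []string
	if sp.S3 != nil {
		types = append(types, "s3")
	}
	if sp.Gcs != nil {
		types = append(types, "gcs")
	}
	switch len(types) {
	case 0:
		return "", errors.New("no storage provider configured")
	case 1:
		return types[0], nil
	default:
		return "", fmt.Errorf("multiple storage providers configured: %v", types)
	}
}

func main() {
	st, err := resolveStorageType(StorageProvider{Gcs: &GcsStorageProvider{ProjectId: "demo"}})
	fmt.Println(st, err) // gcs <nil>
}
```

With a check like this, the storage type can be inferred safely whenever exactly one provider block is set.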
```yaml
    host: 10.0.0.1
    port: 4000
    user: root
    secretName: backup-demo1-tidb-secret
```
Use `passwordSecretName`?
I don't think it makes sense to include the name of the key in the `secretName`. What if the secret stores more stuff?
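For illustration, keeping `secretName` generic and reading the password from a well-known key could look like the sketch below; the `tidbPasswordKey` value and the helper are assumptions for this example, not the operator's actual code:

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// tidbPasswordKey is an assumed well-known key name; the real constant lives
// in the operator's constants package.
const tidbPasswordKey = "password"

// passwordFromSecret reads the password from a fixed key, so the secret can
// hold other entries as well and the spec field can stay a plain secretName.
func passwordFromSecret(secret *corev1.Secret) (string, error) {
	data, ok := secret.Data[tidbPasswordKey]
	if !ok {
		return "", fmt.Errorf("secret %s/%s is missing key %q", secret.Namespace, secret.Name, tidbPasswordKey)
	}
	return string(data), nil
}

func main() {
	s := &corev1.Secret{Data: map[string][]byte{tidbPasswordKey: []byte("s3cr3t")}}
	pw, err := passwordFromSecret(s)
	fmt.Println(pw, err)
}
```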
```diff
-// GetTidbUserAndPassword get the tidb user and password from specific secret
-func GetTidbUserAndPassword(ns, name, tidbSecretName string, secretLister corelisters.SecretLister) (user, password, reason string, err error) {
+// GenerateTidbPasswordEnv generate the password EnvVar
+func GenerateTidbPasswordEnv(ns, name, tidbSecretName string, secretLister corelisters.SecretLister) ([]corev1.EnvVar, string, error) {
```
I think we can follow the k8s built-in controllers' behavior of not checking whether the referenced resources exist.
The purpose of the check is to fail fast when there is a problem. At the same time, the failure information and reason are written into the backup CR status, which helps upper-layer applications integrated with us diagnose the problem quickly. Kubernetes only discovers problems when it hits them in the execution layer and then records them on the underlying pod object, which is not very straightforward for diagnosing problems from the upper-layer application.
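A minimal sketch of this fail-fast idea, using a simplified condition type rather than the actual Backup CR status types:

```go
package main

import (
	"fmt"
	"time"
)

// BackupCondition is a hypothetical, simplified version of the condition
// entries recorded in the Backup CR status.
type BackupCondition struct {
	Type               string
	Status             string
	Reason             string
	Message            string
	LastTransitionTime time.Time
}

type BackupStatus struct {
	Conditions []BackupCondition
}

// recordPrecheckFailure writes the reason and message into the CR status as
// soon as a referenced secret or parameter is found to be invalid, instead of
// waiting for the backup job itself to fail.
func recordPrecheckFailure(status *BackupStatus, reason string, err error) {
	status.Conditions = append(status.Conditions, BackupCondition{
		Type:               "Failed",
		Status:             "True",
		Reason:             reason,
		Message:            err.Error(),
		LastTransitionTime: time.Now(),
	})
}

func main() {
	var st BackupStatus
	recordPrecheckFailure(&st, "GetTidbSecretFailed", fmt.Errorf("secret test1/backup-demo1-tidb-secret not found"))
	fmt.Printf("%+v\n", st.Conditions[0])
}
```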
```diff
 	secret, err := secretLister.Secrets(ns).Get(gcsSecretName)
 	if err != nil {
-		err := fmt.Errorf("backup %s/%s get gcs secret %s failed, err: %v", ns, name, gcsSecretName, err)
+		err := fmt.Errorf("get gcs secret %s/%s failed, err: %v", ns, gcsSecretName, err)
 		return certEnv, "GetGcsSecretFailed", err
 	}

 	keyStr, exist := CheckAllKeysExistInSecret(secret, constants.GcsCredentialsKey)
```
This is already checked in `GenerateGcsCertEnvVar`.
`GenerateGcsCertEnvVar` does not check whether the secret exists and contains the specified key.
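For reference, a check of this kind could look like the sketch below; the function name mirrors `CheckAllKeysExistInSecret`, but the implementation and the example key names are assumptions:

```go
package main

import (
	"fmt"
	"strings"

	corev1 "k8s.io/api/core/v1"
)

// checkAllKeysExistInSecret verifies that every required key is present in
// the secret before building env vars, and reports the missing keys so the
// failure reason written to the CR status is self-explanatory.
func checkAllKeysExistInSecret(secret *corev1.Secret, keys ...string) (string, bool) {
	var missing []string
	for _, key := range keys {
		if _, ok := secret.Data[key]; !ok {
			missing = append(missing, key)
		}
	}
	if len(missing) > 0 {
		return fmt.Sprintf("missing keys %s in secret %s/%s", strings.Join(missing, ","), secret.Namespace, secret.Name), false
	}
	return "", true
}

func main() {
	// Example key names only; the real constants live in the operator.
	s := &corev1.Secret{Data: map[string][]byte{"access_key": []byte("a")}}
	msg, ok := checkAllKeysExistInSecret(s, "access_key", "secret_key")
	fmt.Println(ok, msg)
}
```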
```diff
 	secret, err := secretLister.Secrets(ns).Get(s3SecretName)
 	if err != nil {
-		err := fmt.Errorf("backup %s/%s get s3 secret %s failed, err: %v", ns, name, s3SecretName, err)
+		err := fmt.Errorf("get s3 secret %s/%s failed, err: %v", ns, s3SecretName, err)
 		return certEnv, "GetS3SecretFailed", err
 	}

 	keyStr, exist := CheckAllKeysExistInSecret(secret, constants.S3AccessKey, constants.S3SecretKey)
```
This is already checked in `GenerateS3CertEnvVar`.
ditto
```diff
@@ -506,7 +506,7 @@ type S3StorageProvider struct {
 	Provider S3StorageProviderType `json:"provider"`
 	// Region in which the S3 compatible bucket is located.
 	Region string `json:"region,omitempty"`
-	// Bucket in which to store the Backup.
+	// Bucket in which to store the backup data.
```
Do we not support specifying a prefix?
Do you mean specifying a bucket prefix and adding a random string to construct a complete bucket name?
Users may want to store the backup file under a specific directory (prefix), even though we have generated a prefix composed of the cluster name and namespace.
In this PR, users must specify the bucket name, since backup does not work the same as before: there is no cluster and namespace context, so the bucket name will not be generated automatically.
The bucket name is different from the prefix. I'm wondering if we should provide another parameter, `prefix`, so users can store the backup file in a bucket under a specific directory.
I have added one in #1280.
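For illustration, an optional prefix parameter could be combined with the bucket and the generated file name roughly as follows; the function and parameter names here are hypothetical, not the API added in #1280:

```go
package main

import (
	"fmt"
	"path"
)

// backupObjectPath sketches how an optional prefix could be inserted between
// the bucket and the generated backup file name, so users can keep backup
// files under a specific directory inside the bucket.
func backupObjectPath(bucket, prefix, fileName string) string {
	if prefix == "" {
		return path.Join(bucket, fileName)
	}
	return path.Join(bucket, prefix, fileName)
}

func main() {
	fmt.Println(backupObjectPath("test1-demo1", "nightly", "backup-2019-11-11T16:06:05Z.tgz"))
	// Output: test1-demo1/nightly/backup-2019-11-11T16:06:05Z.tgz
}
```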
```go
	// BackupPath is the location of the backup.
	BackupPath string `json:"backupPath"`
	// Type is the backup type for tidb cluster.
	Type BackupType `json:"backupType,omitempty"`
```
To improve UX, I think we can allow users to omit this field when there is only one storage provider.
The `BackupType` field is currently unused. Do you mean `StorageType`?
Co-Authored-By: Tennix <[email protected]>
cmd/backup-manager/app/cmd/backup.go (Outdated)
```diff
 	cmd.Flags().StringVarP(&bo.Namespace, "namespace", "n", "", "Backup CR's namespace")
 	cmd.Flags().StringVarP(&bo.Host, "host", "h", "", "Tidb cluster access address")
 	cmd.Flags().Int32VarP(&bo.Port, "port", "P", bkconstants.DefaultTidbPort, "Port number to use for connecting tidb cluster")
 	cmd.Flags().StringVarP(&bo.Bucket, "bucket", "k", "", "Bucket in which to store the backup data")
 	cmd.Flags().StringVarP(&bo.Password, bkconstants.TidbPasswordKey, "p", "", "Password to use when connecting to tidb cluster")
 	cmd.Flags().StringVarP(&bo.User, "user", "u", "", "User for login tidb cluster")
-	cmd.Flags().StringVarP(&bo.StorageType, "storageType", "S", "", "Backend storage type")
+	cmd.Flags().StringVarP(&bo.StorageType, "storageType", "s", "", "Backend storage type")
 	cmd.Flags().StringVarP(&bo.BackupName, "backupName", "b", "", "Backup CRD object name")
 	util.SetFlagsFromEnv(cmd.Flags(), bkconstants.BackupManagerEnvVarPrefix)
```
Actually, backup-manager can get the Backup CR from kube-apiserver, so you only need to pass the name and namespace of the Backup CR; all the other info can be retrieved from that CR.
Yes, backup-manager can assemble its own parameters from the backup & restore CRs. But here I hope the parameter-checking logic can be done at the operator level, instead of waiting until the backup & restore job is executed to find that some parameters are problematic. We treat the backup-manager startup parameters obtained from the operator as guaranteed to be legal, so backup-manager only needs to focus on processing backup and restore tasks and updating the status of the backup CR.
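A rough sketch of the operator-side approach described above: validate the CR first, then render a complete flag list for the backup-manager job so it can trust its inputs. The types and the `backup` subcommand layout are illustrative assumptions; only the flag names come from the diff above:

```go
package main

import (
	"fmt"
	"strings"
)

// BackupSpecView is a hypothetical, already-validated view of the Backup CR
// that the operator holds when it creates the backup job.
type BackupSpecView struct {
	Namespace, Name     string
	Host, User          string
	Port                int32
	StorageType, Bucket string
}

// backupManagerArgs renders a complete, known-good flag list for the
// backup-manager container, so the job never has to re-validate its inputs.
func backupManagerArgs(b BackupSpecView) []string {
	return []string{
		"backup",
		fmt.Sprintf("--namespace=%s", b.Namespace),
		fmt.Sprintf("--backupName=%s", b.Name),
		fmt.Sprintf("--host=%s", b.Host),
		fmt.Sprintf("--port=%d", b.Port),
		fmt.Sprintf("--user=%s", b.User),
		fmt.Sprintf("--storageType=%s", b.StorageType),
		fmt.Sprintf("--bucket=%s", b.Bucket),
	}
}

func main() {
	args := backupManagerArgs(BackupSpecView{
		Namespace: "test1", Name: "demo1-backup",
		Host: "10.0.0.1", Port: 4000, User: "root",
		StorageType: "gcs", Bucket: "test1-demo1",
	})
	fmt.Println(strings.Join(args, " "))
}
```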
```yaml
          type: string
        cluster:
          description: Cluster represents the tidb cluster to be restored.
        backupType:
```
There is a `backupType` in `backupSpec`; why add one more here?
This file is generated automatically
…estore-more-universal
LGTM
…estore-more-universal
I have tested backup and restore locally and it works as expected. @tennix @weekface @DanielZhangQD @shuijing198799 PTAL again
/run-e2e-tests
…estore-more-universal
/run-e2e-tests
LGTM
/run-e2e-tests
LGTM
What problem does this PR solve?
Resolves #1217 and #1253.
What is changed and how does it work?
Check List
Tests
Code changes
Side effects
Related changes
Does this PR introduce a user-facing change?: