Issues with scylla manager backups using Minio provider. #259

Closed
prasus opened this issue Nov 23, 2020 · 2 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.

prasus commented Nov 23, 2020

Greetings! We have a Scylla cluster deployed with the Scylla Operator on our self-hosted Kubernetes cluster, and it works fine. However, I am facing issues when creating backups with Scylla Manager. We use Minio as the backup target, with the agent configuration below:

cat /mnt/scylla-agent-config/scylla-manager-agent.yaml 
s3:
  access_key_id: admin
  secret_access_key: mypassword
  provider: Minio
  endpoint: http://minio.mycompany.com:9000

It looks like Scylla Manager assumes it is running on AWS even though provider: Minio is set in the agent config. The check-location output below tries to fetch the EC2 instance identity and reports provider=AWS:

bash-4.2# scylla-manager-agent check-location -L s3:scylla-backups --debug
{"L":"DEBUG","T":"2020-11-23T05:39:39.920Z","N":"rclone","M":"AWS failed to fetch instance identity: Get http://169.254.169.254/latest/dynamic/instance-identity/document: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)"}
{"L":"INFO","T":"2020-11-23T05:39:39.920Z","N":"rclone","M":"registered s3 provider [name=s3, upload_concurrency=2, provider=AWS, chunk_size=50M]"}
{"L":"INFO","T":"2020-11-23T05:39:39.920Z","N":"rclone","M":"registered gcs provider [name=gcs, chunk_size=50M]"}
{"L":"ERROR","T":"2020-11-23T05:40:39.922Z","N":"rclone","M":": error reading destination directory: NoCredentialProviders: no valid providers in chain. Deprecated.\n\tFor verbose messaging see aws.Config.CredentialsChainVerboseErrors"}
{"L":"INFO","T":"2020-11-23T05:40:39.923Z","N":"rclone","M":"S3 bucket cluster-backups: Waiting for checks to finish"}
{"L":"INFO","T":"2020-11-23T05:40:39.923Z","N":"rclone","M":"S3 bucket cluster-backups: Waiting for transfers to finish"}
FAILED: no providers - attach IAM Role to EC2 instance or put your access keys to s3 section of /etc/scylla-manager-agent/scylla-manager-agent.yaml and restart agent
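
To double-check which provider the agent actually registered, the structured log lines can be parsed rather than read by eye. This is only a minimal sketch: the two sample lines are copied from the output above, and the helper name registered_options is made up for illustration, not part of any Scylla tool.

```python
import json
import re

# Two lines copied verbatim from the check-location output above.
LOG_LINES = [
    '{"L":"INFO","T":"2020-11-23T05:39:39.920Z","N":"rclone","M":"registered s3 provider [name=s3, upload_concurrency=2, provider=AWS, chunk_size=50M]"}',
    '{"L":"INFO","T":"2020-11-23T05:39:39.920Z","N":"rclone","M":"registered gcs provider [name=gcs, chunk_size=50M]"}',
]


def registered_options(lines):
    """Return {backend name: {option: value}} for each 'registered ... provider' message."""
    result = {}
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON lines such as the final "FAILED: ..." line
        if not isinstance(entry, dict):
            continue
        match = re.search(r"registered \w+ provider \[(.*)\]", entry.get("M", ""))
        if not match:
            continue
        # The bracketed part is a ", "-separated list of key=value pairs.
        options = dict(pair.split("=", 1) for pair in match.group(1).split(", "))
        result[options["name"]] = options
    return result


opts = registered_options(LOG_LINES)
print(opts["s3"]["provider"])  # prints: AWS
```

With the config shown earlier I would have expected Minio here, so the value AWS suggests the s3 section of that file was not applied to this run.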

The Scylla pods can reach the Minio cluster, and the backup seems to work for small system keyspaces such as system_schema and system_traces, but it fails when uploading non-system keyspaces with larger data sizes.

$ sctool task progress -c bda13859-e979-4d5f-852b-671cac448106 backup/11a47266-f991-4831-a4aa-2be43b255868
Arguments:	-L s3:scylla-backups --retention 5
Status:		RUNNING (uploading data)
Start time:	23 Nov 20 09:00:05 UTC
Duration:	32s
Progress:	0%
Snapshot Tag:	sm_20201123090007UTC
Datacenters:	
  - dus6

╭───────────────┬──────────┬──────────┬─────────┬──────────────┬────────╮
│ Host          │ Progress │     Size │ Success │ Deduplicated │ Failed │
├───────────────┼──────────┼──────────┼─────────┼──────────────┼────────┤
│ 10.98.218.215 │       0% │ 63.86GiB │  451KiB │       289KiB │     0B │
│ 10.99.185.170 │       0% │ 63.85GiB │  434KiB │       205KiB │     0B │
│ 10.99.208.120 │       0% │ 63.82GiB │  408KiB │       184KiB │     0B │
╰───────────────┴──────────┴──────────┴─────────┴──────────────┴────────╯
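
To put the 0% in perspective, here is a rough calculation of how much of each host's data was handled when the table was captured. It is a sketch only: the rows are copied from the table above, and treating Success plus Deduplicated as the handled amount is my assumption, not necessarily how sctool computes the Progress column.

```python
# Binary size units as printed by sctool in the table above.
UNITS = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3}


def to_bytes(text):
    """Convert a size such as '63.86GiB' or '0B' to a number of bytes."""
    # Check the plain 'B' suffix last, since every other unit also ends in 'B'.
    for unit in ("KiB", "MiB", "GiB", "B"):
        if text.endswith(unit):
            return float(text[: -len(unit)]) * UNITS[unit]
    raise ValueError(f"unrecognized size: {text!r}")


def handled_percent(size, success, dedup):
    """Percentage of the snapshot handled, assuming handled = success + deduplicated."""
    return 100 * (to_bytes(success) + to_bytes(dedup)) / to_bytes(size)


# (host, size, success, deduplicated) copied from the progress table.
ROWS = [
    ("10.98.218.215", "63.86GiB", "451KiB", "289KiB"),
    ("10.99.185.170", "63.85GiB", "434KiB", "205KiB"),
    ("10.99.208.120", "63.82GiB", "408KiB", "184KiB"),
]

for host, size, success, dedup in ROWS:
    print(f"{host}: {handled_percent(size, success, dedup):.5f}% handled")
```

On every host the handled amount is well under 1 MiB out of roughly 64 GiB, which is consistent with only the small system tables having been uploaded at that point.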

Below are the Scylla Manager logs produced during a backup task (sctool backup -c bda13859-e979-4d5f-852b-671cac448106 -L s3:scylla-backups --retention 5 --interval '24h'). As the logs show, the backup fails for the user_ks keyspace, which holds about 75G of data.
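
Since the manager log is long, a small filter helps isolate the retry entries from the routine upload messages. This is a minimal sketch, not an official tool: the sample lines are copied from the log in this report, and the helper name retry_events is made up for illustration.

```python
import json
from collections import Counter

# Lines copied verbatim from the Scylla Manager log in this report.
SAMPLE = [
    '{"L":"INFO","T":"2020-11-23T09:00:52.856Z","N":"cluster.client","M":"HTTP retry backoff","operation":"JobProgress","wait":"858.297602ms","error":"EOF","_trace_id":"66XW6y9TTi-kJB20_nBO3g"}',
    '{"L":"INFO","T":"2020-11-23T09:00:53.729Z","N":"cluster.client","M":"HTTP retry backoff","operation":"JobProgress","wait":"1.828290475s","error":"dial tcp 10.98.218.215:10001: connect: connection refused","_trace_id":"66XW6y9TTi-kJB20_nBO3g"}',
    '{"L":"INFO","T":"2020-11-23T09:00:55.103Z","N":"scheduler","M":"Task deleted","cluster_id":"bda13859-e979-4d5f-852b-671cac448106","task_type":"backup","task_id":"11a47266-f991-4831-a4aa-2be43b255868","_trace_id":"R3LntSdOSiWoH4ej3us2LQ"}',
    '{"L":"INFO","T":"2020-11-23T09:00:55.116Z","N":"cluster.client","M":"HTTP retry backoff","operation":"JobStop","wait":"987.095507ms","error":"dial tcp 10.98.218.215:10001: connect: connection refused"}',
    '{"L":"INFO","T":"2020-11-23T09:00:58.194Z","N":"cluster.client","M":"HTTP retry backoff","operation":"JobStop","wait":"4.158342298s","error":"agent [HTTP 500] job not found"}',
]


def retry_events(lines):
    """Return (timestamp, operation, wait, error) for every 'HTTP retry backoff' entry."""
    events = []
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore anything that is not a JSON log line
        if isinstance(entry, dict) and entry.get("M") == "HTTP retry backoff":
            events.append((entry["T"], entry["operation"], entry["wait"], entry["error"]))
    return events


events = retry_events(SAMPLE)
for ts, operation, wait, error in events:
    print(ts, operation, wait, error)

# Count how often each distinct error shows up among the retries.
print(Counter(error for *_, error in events).most_common())
```

In this sample the retries are all against the agent on 10.98.218.215:10001, first with EOF and connection refused, then with "job not found" once the agent answers again.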

{"L":"INFO","T":"2020-11-23T09:00:13.677Z","N":"backup.upload","M":"Uploading table snapshot","host":"10.99.185.170","keyspace":"system_schema","table":"columns","location":"s3:scylla-backups","_trace_id":"66XW6y9TTi-kJB20_nBO3g"}
{"L":"INFO","T":"2020-11-23T09:00:13.677Z","N":"backup.upload","M":"Uploading table snapshot","host":"10.98.218.215","keyspace":"system_schema","table":"columns","location":"s3:scylla-backups","_trace_id":"66XW6y9TTi-kJB20_nBO3g"}
{"L":"INFO","T":"2020-11-23T09:00:13.677Z","N":"backup.upload","M":"Uploading table snapshot","host":"10.99.208.120","keyspace":"system_schema","table":"columns","location":"s3:scylla-backups","_trace_id":"66XW6y9TTi-kJB20_nBO3g"}
{"L":"INFO","T":"2020-11-23T09:00:14.643Z","N":"backup.upload","M":"Uploading table snapshot","host":"10.98.218.215","keyspace":"system_schema","table":"computed_columns","location":"s3:scylla-backups","_trace_id":"66XW6y9TTi-kJB20_nBO3g"}
{"L":"INFO","T":"2020-11-23T09:00:15.474Z","N":"backup.upload","M":"Uploading table snapshot","host":"10.98.218.215","keyspace":"system_schema","table":"dropped_columns","location":"s3:scylla-backups","_trace_id":"66XW6y9TTi-kJB20_nBO3g"}
{"L":"INFO","T":"2020-11-23T09:00:15.637Z","N":"http","M":"HTTP","from":"10.244.50.0:25836","method":"GET","uri":"/api/v1/cluster/bda13859-e979-4d5f-852b-671cac448106/task/backup/11a47266-f991-4831-a4aa-2be43b255868","status":200,"bytes":298,"duration":"2ms","_trace_id":"d2sl3rsqRfa-6f8LS45YXw"}
{"L":"INFO","T":"2020-11-23T09:00:15.643Z","N":"http","M":"HTTP","from":"10.244.50.0:25836","method":"GET","uri":"/api/v1/cluster/bda13859-e979-4d5f-852b-671cac448106/task/backup/11a47266-f991-4831-a4aa-2be43b255868/latest","status":200,"bytes":11887,"duration":"5ms","_trace_id":"t9yRyF3hQnaEWFxxsHffKA"}
{"L":"INFO","T":"2020-11-23T09:00:16.345Z","N":"backup.upload","M":"Uploading table snapshot","host":"10.98.218.215","keyspace":"system_schema","table":"functions","location":"s3:scylla-backups","_trace_id":"66XW6y9TTi-kJB20_nBO3g"}
{"L":"INFO","T":"2020-11-23T09:00:16.393Z","N":"backup.upload","M":"Uploading table snapshot","host":"10.99.185.170","keyspace":"system_schema","table":"computed_columns","location":"s3:scylla-backups","_trace_id":"66XW6y9TTi-kJB20_nBO3g"}
{"L":"INFO","T":"2020-11-23T09:00:16.395Z","N":"backup.upload","M":"Uploading table snapshot","host":"10.99.208.120","keyspace":"system_schema","table":"computed_columns","location":"s3:scylla-backups","_trace_id":"66XW6y9TTi-kJB20_nBO3g"}
{"L":"INFO","T":"2020-11-23T09:00:16.772Z","N":"backup.upload","M":"Uploading table snapshot","host":"10.98.218.215","keyspace":"system_schema","table":"indexes","location":"s3:scylla-backups","_trace_id":"66XW6y9TTi-kJB20_nBO3g"}
{"L":"INFO","T":"2020-11-23T09:00:16.772Z","N":"backup.upload","M":"Uploading table snapshot","host":"10.99.208.120","keyspace":"system_schema","table":"dropped_columns","location":"s3:scylla-backups","_trace_id":"66XW6y9TTi-kJB20_nBO3g"}
{"L":"INFO","T":"2020-11-23T09:00:16.926Z","N":"backup.upload","M":"Uploading table snapshot","host":"10.99.185.170","keyspace":"system_schema","table":"dropped_columns","location":"s3:scylla-backups","_trace_id":"66XW6y9TTi-kJB20_nBO3g"}
{"L":"INFO","T":"2020-11-23T09:00:17.083Z","N":"backup.upload","M":"Uploading table snapshot","host":"10.99.208.120","keyspace":"system_schema","table":"functions","location":"s3:scylla-backups","_trace_id":"66XW6y9TTi-kJB20_nBO3g"}
{"L":"INFO","T":"2020-11-23T09:00:17.195Z","N":"backup.upload","M":"Uploading table snapshot","host":"10.98.218.215","keyspace":"system_schema","table":"keyspaces","location":"s3:scylla-backups","_trace_id":"66XW6y9TTi-kJB20_nBO3g"}
{"L":"INFO","T":"2020-11-23T09:00:17.314Z","N":"backup.upload","M":"Uploading table snapshot","host":"10.99.185.170","keyspace":"system_schema","table":"functions","location":"s3:scylla-backups","_trace_id":"66XW6y9TTi-kJB20_nBO3g"}
{"L":"INFO","T":"2020-11-23T09:00:17.593Z","N":"backup.upload","M":"Uploading table snapshot","host":"10.99.185.170","keyspace":"system_schema","table":"indexes","location":"s3:scylla-backups","_trace_id":"66XW6y9TTi-kJB20_nBO3g"}
{"L":"INFO","T":"2020-11-23T09:00:17.822Z","N":"backup.upload","M":"Uploading table snapshot","host":"10.99.185.170","keyspace":"system_schema","table":"keyspaces","location":"s3:scylla-backups","_trace_id":"66XW6y9TTi-kJB20_nBO3g"}
{"L":"INFO","T":"2020-11-23T09:00:19.105Z","N":"backup.upload","M":"Uploading table snapshot","host":"10.99.208.120","keyspace":"system_schema","table":"indexes","location":"s3:scylla-backups","_trace_id":"66XW6y9TTi-kJB20_nBO3g"}
{"L":"INFO","T":"2020-11-23T09:00:19.311Z","N":"backup.upload","M":"Uploading table snapshot","host":"10.99.208.120","keyspace":"system_schema","table":"keyspaces","location":"s3:scylla-backups","_trace_id":"66XW6y9TTi-kJB20_nBO3g"}
{"L":"INFO","T":"2020-11-23T09:00:21.472Z","N":"backup.upload","M":"Uploading table snapshot","host":"10.98.218.215","keyspace":"system_schema","table":"scylla_tables","location":"s3:scylla-backups","_trace_id":"66XW6y9TTi-kJB20_nBO3g"}
{"L":"INFO","T":"2020-11-23T09:00:21.473Z","N":"backup.upload","M":"Uploading table snapshot","host":"10.99.185.170","keyspace":"system_schema","table":"scylla_tables","location":"s3:scylla-backups","_trace_id":"66XW6y9TTi-kJB20_nBO3g"}
{"L":"INFO","T":"2020-11-23T09:00:21.830Z","N":"backup.upload","M":"Uploading table snapshot","host":"10.98.218.215","keyspace":"system_schema","table":"tables","location":"s3:scylla-backups","_trace_id":"66XW6y9TTi-kJB20_nBO3g"}
{"L":"INFO","T":"2020-11-23T09:00:24.879Z","N":"backup.upload","M":"Uploading table snapshot","host":"10.99.185.170","keyspace":"system_schema","table":"tables","location":"s3:scylla-backups","_trace_id":"66XW6y9TTi-kJB20_nBO3g"}
{"L":"INFO","T":"2020-11-23T09:00:24.879Z","N":"backup.upload","M":"Uploading table snapshot","host":"10.99.208.120","keyspace":"system_schema","table":"scylla_tables","location":"s3:scylla-backups","_trace_id":"66XW6y9TTi-kJB20_nBO3g"}
{"L":"INFO","T":"2020-11-23T09:00:26.323Z","N":"backup.upload","M":"Uploading table snapshot","host":"10.98.218.215","keyspace":"system_schema","table":"triggers","location":"s3:scylla-backups","_trace_id":"66XW6y9TTi-kJB20_nBO3g"}
{"L":"INFO","T":"2020-11-23T09:00:27.149Z","N":"backup.upload","M":"Uploading table snapshot","host":"10.99.208.120","keyspace":"system_schema","table":"tables","location":"s3:scylla-backups","_trace_id":"66XW6y9TTi-kJB20_nBO3g"}
{"L":"INFO","T":"2020-11-23T09:00:29.638Z","N":"backup.upload","M":"Uploading table snapshot","host":"10.99.185.170","keyspace":"system_schema","table":"triggers","location":"s3:scylla-backups","_trace_id":"66XW6y9TTi-kJB20_nBO3g"}
{"L":"INFO","T":"2020-11-23T09:00:30.335Z","N":"backup.upload","M":"Uploading table snapshot","host":"10.98.218.215","keyspace":"system_schema","table":"types","location":"s3:scylla-backups","_trace_id":"66XW6y9TTi-kJB20_nBO3g"}
{"L":"INFO","T":"2020-11-23T09:00:31.529Z","N":"backup.upload","M":"Uploading table snapshot","host":"10.99.208.120","keyspace":"system_schema","table":"triggers","location":"s3:scylla-backups","_trace_id":"66XW6y9TTi-kJB20_nBO3g"}
{"L":"INFO","T":"2020-11-23T09:00:34.896Z","N":"backup.upload","M":"Uploading table snapshot","host":"10.99.185.170","keyspace":"system_schema","table":"types","location":"s3:scylla-backups","_trace_id":"66XW6y9TTi-kJB20_nBO3g"}
{"L":"INFO","T":"2020-11-23T09:00:35.209Z","N":"backup.upload","M":"Uploading table snapshot","host":"10.98.218.215","keyspace":"system_schema","table":"view_virtual_columns","location":"s3:scylla-backups","_trace_id":"66XW6y9TTi-kJB20_nBO3g"}
{"L":"INFO","T":"2020-11-23T09:00:35.454Z","N":"backup.upload","M":"Uploading table snapshot","host":"10.98.218.215","keyspace":"system_schema","table":"views","location":"s3:scylla-backups","_trace_id":"66XW6y9TTi-kJB20_nBO3g"}
{"L":"INFO","T":"2020-11-23T09:00:36.830Z","N":"backup.upload","M":"Uploading table snapshot","host":"10.99.208.120","keyspace":"system_schema","table":"types","location":"s3:scylla-backups","_trace_id":"66XW6y9TTi-kJB20_nBO3g"}
{"L":"INFO","T":"2020-11-23T09:00:38.241Z","N":"http","M":"HTTP","from":"10.244.50.0:8074","method":"GET","uri":"/api/v1/cluster/bda13859-e979-4d5f-852b-671cac448106/task/backup/11a47266-f991-4831-a4aa-2be43b255868","status":200,"bytes":298,"duration":"1ms","_trace_id":"gXsYt34QRE-jU9U75kAuow"}
{"L":"INFO","T":"2020-11-23T09:00:38.245Z","N":"backup.upload","M":"Uploading table snapshot","host":"10.99.185.170","keyspace":"system_schema","table":"view_virtual_columns","location":"s3:scylla-backups","_trace_id":"66XW6y9TTi-kJB20_nBO3g"}
{"L":"INFO","T":"2020-11-23T09:00:38.250Z","N":"http","M":"HTTP","from":"10.244.50.0:8074","method":"GET","uri":"/api/v1/cluster/bda13859-e979-4d5f-852b-671cac448106/task/backup/11a47266-f991-4831-a4aa-2be43b255868/latest","status":200,"bytes":13288,"duration":"8ms","_trace_id":"581IQd9rR4iSuzjaK8akWg"}
{"L":"INFO","T":"2020-11-23T09:00:38.657Z","N":"backup.upload","M":"Uploading table snapshot","host":"10.99.185.170","keyspace":"system_schema","table":"views","location":"s3:scylla-backups","_trace_id":"66XW6y9TTi-kJB20_nBO3g"}
{"L":"INFO","T":"2020-11-23T09:00:39.995Z","N":"backup.upload","M":"Snapshot already uploaded skipping","host":"10.98.218.215","table":"events","_trace_id":"66XW6y9TTi-kJB20_nBO3g"}
{"L":"INFO","T":"2020-11-23T09:00:39.995Z","N":"backup.upload","M":"Snapshot already uploaded skipping","host":"10.98.218.215","table":"node_slow_log","_trace_id":"66XW6y9TTi-kJB20_nBO3g"}
{"L":"INFO","T":"2020-11-23T09:00:39.995Z","N":"backup.upload","M":"Snapshot already uploaded skipping","host":"10.98.218.215","table":"node_slow_log_time_idx","_trace_id":"66XW6y9TTi-kJB20_nBO3g"}
{"L":"INFO","T":"2020-11-23T09:00:39.995Z","N":"backup.upload","M":"Snapshot already uploaded skipping","host":"10.98.218.215","table":"sessions","_trace_id":"66XW6y9TTi-kJB20_nBO3g"}
{"L":"INFO","T":"2020-11-23T09:00:39.995Z","N":"backup.upload","M":"Snapshot already uploaded skipping","host":"10.98.218.215","table":"sessions_time_idx","_trace_id":"66XW6y9TTi-kJB20_nBO3g"}
{"L":"INFO","T":"2020-11-23T09:00:39.995Z","N":"backup.upload","M":"Snapshot already uploaded skipping","host":"10.98.218.215","table":"contact_count","_trace_id":"66XW6y9TTi-kJB20_nBO3g"}
{"L":"INFO","T":"2020-11-23T09:00:39.995Z","N":"backup.upload","M":"Uploading table snapshot","host":"10.98.218.215","keyspace":"user_ks","table":"event_by_recipient","location":"s3:scylla-backups","_trace_id":"66XW6y9TTi-kJB20_nBO3g"}
{"L":"INFO","T":"2020-11-23T09:00:40.516Z","N":"backup.upload","M":"Uploading table snapshot","host":"10.99.208.120","keyspace":"system_schema","table":"view_virtual_columns","location":"s3:scylla-backups","_trace_id":"66XW6y9TTi-kJB20_nBO3g"}
{"L":"INFO","T":"2020-11-23T09:00:41.069Z","N":"backup.upload","M":"Uploading table snapshot","host":"10.99.208.120","keyspace":"system_schema","table":"views","location":"s3:scylla-backups","_trace_id":"66XW6y9TTi-kJB20_nBO3g"}
{"L":"INFO","T":"2020-11-23T09:00:42.808Z","N":"backup.upload","M":"Snapshot already uploaded skipping","host":"10.99.185.170","table":"events","_trace_id":"66XW6y9TTi-kJB20_nBO3g"}
{"L":"INFO","T":"2020-11-23T09:00:42.808Z","N":"backup.upload","M":"Snapshot already uploaded skipping","host":"10.99.185.170","table":"node_slow_log","_trace_id":"66XW6y9TTi-kJB20_nBO3g"}
{"L":"INFO","T":"2020-11-23T09:00:42.808Z","N":"backup.upload","M":"Snapshot already uploaded skipping","host":"10.99.185.170","table":"node_slow_log_time_idx","_trace_id":"66XW6y9TTi-kJB20_nBO3g"}
{"L":"INFO","T":"2020-11-23T09:00:42.808Z","N":"backup.upload","M":"Snapshot already uploaded skipping","host":"10.99.185.170","table":"sessions","_trace_id":"66XW6y9TTi-kJB20_nBO3g"}
{"L":"INFO","T":"2020-11-23T09:00:42.808Z","N":"backup.upload","M":"Snapshot already uploaded skipping","host":"10.99.185.170","table":"sessions_time_idx","_trace_id":"66XW6y9TTi-kJB20_nBO3g"}
{"L":"INFO","T":"2020-11-23T09:00:42.808Z","N":"backup.upload","M":"Snapshot already uploaded skipping","host":"10.99.185.170","table":"contact_count","_trace_id":"66XW6y9TTi-kJB20_nBO3g"}
{"L":"INFO","T":"2020-11-23T09:00:42.808Z","N":"backup.upload","M":"Uploading table snapshot","host":"10.99.185.170","keyspace":"user_ks","table":"event_by_recipient","location":"s3:scylla-backups","_trace_id":"66XW6y9TTi-kJB20_nBO3g"}
{"L":"INFO","T":"2020-11-23T09:00:45.980Z","N":"backup.upload","M":"Snapshot already uploaded skipping","host":"10.99.208.120","table":"events","_trace_id":"66XW6y9TTi-kJB20_nBO3g"}
{"L":"INFO","T":"2020-11-23T09:00:45.980Z","N":"backup.upload","M":"Snapshot already uploaded skipping","host":"10.99.208.120","table":"node_slow_log","_trace_id":"66XW6y9TTi-kJB20_nBO3g"}
{"L":"INFO","T":"2020-11-23T09:00:45.980Z","N":"backup.upload","M":"Snapshot already uploaded skipping","host":"10.99.208.120","table":"node_slow_log_time_idx","_trace_id":"66XW6y9TTi-kJB20_nBO3g"}
{"L":"INFO","T":"2020-11-23T09:00:45.980Z","N":"backup.upload","M":"Snapshot already uploaded skipping","host":"10.99.208.120","table":"sessions","_trace_id":"66XW6y9TTi-kJB20_nBO3g"}
{"L":"INFO","T":"2020-11-23T09:00:45.980Z","N":"backup.upload","M":"Snapshot already uploaded skipping","host":"10.99.208.120","table":"sessions_time_idx","_trace_id":"66XW6y9TTi-kJB20_nBO3g"}
{"L":"INFO","T":"2020-11-23T09:00:45.980Z","N":"backup.upload","M":"Snapshot already uploaded skipping","host":"10.99.208.120","table":"contact_count","_trace_id":"66XW6y9TTi-kJB20_nBO3g"}
{"L":"INFO","T":"2020-11-23T09:00:45.980Z","N":"backup.upload","M":"Uploading table snapshot","host":"10.99.208.120","keyspace":"user_ks","table":"event_by_recipient","location":"s3:scylla-backups","_trace_id":"66XW6y9TTi-kJB20_nBO3g"}
{"L":"INFO","T":"2020-11-23T09:00:52.856Z","N":"cluster.client","M":"HTTP retry backoff","operation":"JobProgress","wait":"858.297602ms","error":"EOF","_trace_id":"66XW6y9TTi-kJB20_nBO3g"}
{"L":"INFO","T":"2020-11-23T09:00:53.729Z","N":"cluster.client","M":"HTTP retry backoff","operation":"JobProgress","wait":"1.828290475s","error":"dial tcp 10.98.218.215:10001: connect: connection refused","_trace_id":"66XW6y9TTi-kJB20_nBO3g"}
{"L":"INFO","T":"2020-11-23T09:00:53.886Z","N":"http","M":"HTTP","from":"10.244.50.0:19947","method":"GET","uri":"/api/v1/cluster/bda13859-e979-4d5f-852b-671cac448106/task/backup/11a47266-f991-4831-a4aa-2be43b255868","status":200,"bytes":298,"duration":"3ms","_trace_id":"_GnDCABkQRiZ3fo7wgpkaQ"}
{"L":"INFO","T":"2020-11-23T09:00:53.892Z","N":"http","M":"HTTP","from":"10.244.50.0:19947","method":"GET","uri":"/api/v1/cluster/bda13859-e979-4d5f-852b-671cac448106/task/backup/11a47266-f991-4831-a4aa-2be43b255868/latest","status":200,"bytes":13642,"duration":"5ms","_trace_id":"-9__RnDBTgaGoZn4ov1pfw"}
{"L":"INFO","T":"2020-11-23T09:00:55.089Z","N":"http","M":"HTTP","from":"10.244.48.152:46436","method":"GET","uri":"/api/v1/clusters","status":200,"bytes":218,"duration":"11ms","_trace_id":"4_1SYqG6SIOTOdJDiaTJNA"}
{"L":"INFO","T":"2020-11-23T09:00:55.094Z","N":"http","M":"HTTP","from":"10.244.48.152:46436","method":"GET","uri":"/api/v1/cluster/bda13859-e979-4d5f-852b-671cac448106/tasks?all=true&type=repair","status":200,"bytes":470,"duration":"4ms","_trace_id":"fqynXHCyRe2X0MHFvqMKeA"}
{"L":"INFO","T":"2020-11-23T09:00:55.098Z","N":"http","M":"HTTP","from":"10.244.48.152:46436","method":"GET","uri":"/api/v1/cluster/bda13859-e979-4d5f-852b-671cac448106/tasks?all=true&type=backup","status":200,"bytes":404,"duration":"2ms","_trace_id":"XuyU1bDZRuK1HbzkePXBgg"}
{"L":"INFO","T":"2020-11-23T09:00:55.103Z","N":"scheduler","M":"Task deleted","cluster_id":"bda13859-e979-4d5f-852b-671cac448106","task_type":"backup","task_id":"11a47266-f991-4831-a4aa-2be43b255868","_trace_id":"R3LntSdOSiWoH4ej3us2LQ"}
{"L":"INFO","T":"2020-11-23T09:00:55.103Z","N":"scheduler","M":"Task execution canceled","cluster_id":"11a47266-f991-4831-a4aa-2be43b255868","task_type":"backup","task_id":"11a47266-f991-4831-a4aa-2be43b255868","run_id":"366ccf18-2d6a-11eb-9696-420623b63dde","_trace_id":"R3LntSdOSiWoH4ej3us2LQ"}
{"L":"INFO","T":"2020-11-23T09:00:55.103Z","N":"http","M":"HTTP","from":"10.244.48.152:46436","method":"DELETE","uri":"/api/v1/cluster/bda13859-e979-4d5f-852b-671cac448106/task/backup/11a47266-f991-4831-a4aa-2be43b255868","status":0,"bytes":0,"duration":"3ms","_trace_id":"R3LntSdOSiWoH4ej3us2LQ"}
{"L":"INFO","T":"2020-11-23T09:00:55.106Z","N":"scheduler","M":"Task scheduled","cluster_id":"bda13859-e979-4d5f-852b-671cac448106","task_type":"repair","task_id":"6178f81e-4007-40ab-8b88-d2d9b2ed06fc","run_id":"651c3f19-2d6a-11eb-9697-420623b63dde","activation":"2020-11-23T09:01:25.099Z","_trace_id":"Cbp0fiN5QEatT1t9bW52OQ"}
{"L":"INFO","T":"2020-11-23T09:00:55.106Z","N":"http","M":"HTTP","from":"10.244.48.152:46436","method":"PUT","uri":"/api/v1/cluster/bda13859-e979-4d5f-852b-671cac448106/task/repair/6178f81e-4007-40ab-8b88-d2d9b2ed06fc","status":200,"bytes":329,"duration":"2ms","_trace_id":"Cbp0fiN5QEatT1t9bW52OQ"}
{"L":"ERROR","T":"2020-11-23T09:00:55.103Z","N":"backup.upload","M":"Failed to fetch job info","host":"10.99.185.170","keyspace":"user_ks","table":"event_by_recipient","job_id":1606107736,"error":"giving up after 1 attempts: context canceled","_trace_id":"66XW6y9TTi-kJB20_nBO3g","errorStack":"github.com/scylladb/mermaid/pkg/scyllaclient.(*retryableOperation).submit\n\tgithub.jparrowsec.cn/scylladb/mermaid@/pkg/scyllaclient/retry.go:60\ngithub.jparrowsec.cn/scylladb/mermaid/pkg/scyllaclient.retryableTransport.Submit\n\tgithub.jparrowsec.cn/scylladb/mermaid@/pkg/scyllaclient/retry.go:54\ngithub.jparrowsec.cn/scylladb/mermaid/pkg/scyllaclient/internal/agent/client/operations.(*Client).JobProgress\n\tgithub.jparrowsec.cn/scylladb/mermaid@/pkg/scyllaclient/internal/agent/client/operations/operations_client.go:213\ngithub.jparrowsec.cn/scylladb/mermaid/pkg/scyllaclient.(*Client).RcloneJobProgress\n\tgithub.jparrowsec.cn/scylladb/mermaid@/pkg/scyllaclient/client_rclone.go:88\ngithub.jparrowsec.cn/scylladb/mermaid/pkg/service/backup.(*worker).waitJob\n\tgithub.jparrowsec.cn/scylladb/mermaid@/pkg/service/backup/worker_upload.go:207\ngithub.jparrowsec.cn/scylladb/mermaid/pkg/service/backup.(*worker).uploadDataDir\n\tgithub.jparrowsec.cn/scylladb/mermaid@/pkg/service/backup/worker_upload.go:157\ngithub.jparrowsec.cn/scylladb/mermaid/pkg/service/backup.(*worker).uploadSnapshotDir\n\tgithub.jparrowsec.cn/scylladb/mermaid@/pkg/service/backup/worker_upload.go:135\ngithub.jparrowsec.cn/scylladb/mermaid/pkg/service/backup.(*worker).uploadHost\n\tgithub.jparrowsec.cn/scylladb/mermaid@/pkg/service/backup/worker_upload.go:57\ngithub.jparrowsec.cn/scylladb/mermaid/pkg/service/backup.(*worker).Upload.func2\n\tgithub.jparrowsec.cn/scylladb/mermaid@/pkg/service/backup/worker_upload.go:27\ngithub.jparrowsec.cn/scylladb/mermaid/pkg/service/backup.hostsInParallel.func1\n\tgithub.jparrowsec.cn/scylladb/mermaid@/pkg/service/backup/parallel.go:80\ngithub.jparrowsec.cn/scylladb/mermaid/pkg/util/p
arallel.Run.func1\n\tgithub.jparrowsec.cn/scylladb/mermaid@/pkg/util/parallel/parallel.go:62\nruntime.goexit\n\truntime/asm_amd64.s:1357\n"}
{"L":"ERROR","T":"2020-11-23T09:00:55.103Z","N":"backup.upload","M":"Failed to fetch job info","host":"10.98.218.215","keyspace":"user_ks","table":"event_by_recipient","job_id":1606107741,"error":"giving up after 2 attempts: context canceled","_trace_id":"66XW6y9TTi-kJB20_nBO3g","errorStack":"github.com/scylladb/mermaid/pkg/scyllaclient.(*retryableOperation).submit\n\tgithub.jparrowsec.cn/scylladb/mermaid@/pkg/scyllaclient/retry.go:60\ngithub.jparrowsec.cn/scylladb/mermaid/pkg/scyllaclient.retryableTransport.Submit\n\tgithub.jparrowsec.cn/scylladb/mermaid@/pkg/scyllaclient/retry.go:54\ngithub.jparrowsec.cn/scylladb/mermaid/pkg/scyllaclient/internal/agent/client/operations.(*Client).JobProgress\n\tgithub.jparrowsec.cn/scylladb/mermaid@/pkg/scyllaclient/internal/agent/client/operations/operations_client.go:213\ngithub.jparrowsec.cn/scylladb/mermaid/pkg/scyllaclient.(*Client).RcloneJobProgress\n\tgithub.jparrowsec.cn/scylladb/mermaid@/pkg/scyllaclient/client_rclone.go:88\ngithub.jparrowsec.cn/scylladb/mermaid/pkg/service/backup.(*worker).waitJob\n\tgithub.jparrowsec.cn/scylladb/mermaid@/pkg/service/backup/worker_upload.go:207\ngithub.jparrowsec.cn/scylladb/mermaid/pkg/service/backup.(*worker).uploadDataDir\n\tgithub.jparrowsec.cn/scylladb/mermaid@/pkg/service/backup/worker_upload.go:157\ngithub.jparrowsec.cn/scylladb/mermaid/pkg/service/backup.(*worker).uploadSnapshotDir\n\tgithub.jparrowsec.cn/scylladb/mermaid@/pkg/service/backup/worker_upload.go:135\ngithub.jparrowsec.cn/scylladb/mermaid/pkg/service/backup.(*worker).uploadHost\n\tgithub.jparrowsec.cn/scylladb/mermaid@/pkg/service/backup/worker_upload.go:57\ngithub.jparrowsec.cn/scylladb/mermaid/pkg/service/backup.(*worker).Upload.func2\n\tgithub.jparrowsec.cn/scylladb/mermaid@/pkg/service/backup/worker_upload.go:27\ngithub.jparrowsec.cn/scylladb/mermaid/pkg/service/backup.hostsInParallel.func1\n\tgithub.jparrowsec.cn/scylladb/mermaid@/pkg/service/backup/parallel.go:80\ngithub.jparrowsec.cn/scylladb/mermaid/pkg/util/p
arallel.Run.func1\n\tgithub.jparrowsec.cn/scylladb/mermaid@/pkg/util/parallel/parallel.go:62\nruntime.goexit\n\truntime/asm_amd64.s:1357\n"}
{"L":"ERROR","T":"2020-11-23T09:00:55.103Z","N":"backup.upload","M":"Failed to fetch job info","host":"10.99.208.120","keyspace":"user_ks","table":"event_by_recipient","job_id":1606108188,"error":"giving up after 1 attempts: context canceled","_trace_id":"66XW6y9TTi-kJB20_nBO3g","errorStack":"github.com/scylladb/mermaid/pkg/scyllaclient.(*retryableOperation).submit\n\tgithub.jparrowsec.cn/scylladb/mermaid@/pkg/scyllaclient/retry.go:60\ngithub.jparrowsec.cn/scylladb/mermaid/pkg/scyllaclient.retryableTransport.Submit\n\tgithub.jparrowsec.cn/scylladb/mermaid@/pkg/scyllaclient/retry.go:54\ngithub.jparrowsec.cn/scylladb/mermaid/pkg/scyllaclient/internal/agent/client/operations.(*Client).JobProgress\n\tgithub.jparrowsec.cn/scylladb/mermaid@/pkg/scyllaclient/internal/agent/client/operations/operations_client.go:213\ngithub.jparrowsec.cn/scylladb/mermaid/pkg/scyllaclient.(*Client).RcloneJobProgress\n\tgithub.jparrowsec.cn/scylladb/mermaid@/pkg/scyllaclient/client_rclone.go:88\ngithub.jparrowsec.cn/scylladb/mermaid/pkg/service/backup.(*worker).waitJob\n\tgithub.jparrowsec.cn/scylladb/mermaid@/pkg/service/backup/worker_upload.go:207\ngithub.jparrowsec.cn/scylladb/mermaid/pkg/service/backup.(*worker).uploadDataDir\n\tgithub.jparrowsec.cn/scylladb/mermaid@/pkg/service/backup/worker_upload.go:157\ngithub.jparrowsec.cn/scylladb/mermaid/pkg/service/backup.(*worker).uploadSnapshotDir\n\tgithub.jparrowsec.cn/scylladb/mermaid@/pkg/service/backup/worker_upload.go:135\ngithub.jparrowsec.cn/scylladb/mermaid/pkg/service/backup.(*worker).uploadHost\n\tgithub.jparrowsec.cn/scylladb/mermaid@/pkg/service/backup/worker_upload.go:57\ngithub.jparrowsec.cn/scylladb/mermaid/pkg/service/backup.(*worker).Upload.func2\n\tgithub.jparrowsec.cn/scylladb/mermaid@/pkg/service/backup/worker_upload.go:27\ngithub.jparrowsec.cn/scylladb/mermaid/pkg/service/backup.hostsInParallel.func1\n\tgithub.jparrowsec.cn/scylladb/mermaid@/pkg/service/backup/parallel.go:80\ngithub.jparrowsec.cn/scylladb/mermaid/pkg/util/p
arallel.Run.func1\n\tgithub.jparrowsec.cn/scylladb/mermaid@/pkg/util/parallel/parallel.go:62\nruntime.goexit\n\truntime/asm_amd64.s:1357\n"}
{"L":"INFO","T":"2020-11-23T09:00:55.116Z","N":"cluster.client","M":"HTTP retry backoff","operation":"JobStop","wait":"987.095507ms","error":"dial tcp 10.98.218.215:10001: connect: connection refused"}
{"L":"ERROR","T":"2020-11-23T09:00:55.143Z","N":"backup.upload","M":"Upload dir failed","host":"10.99.208.120","from":"data:user_ks/event_by_recipient-c0f53250257811eb833b000000000005/snapshots/sm_20201123090007UTC","to":"s3:scylla-backups/backup/sst/cluster/bda13859-e979-4d5f-852b-671cac448106/dc/dus6/node/96fa6c3e-e071-4a25-955e-2e477f9db3b6/keyspace/user_ks/table/event_by_recipient/c0f53250257811eb833b000000000005","error":"context canceled; clear job stats: giving up after 1 attempts: context canceled","_trace_id":"66XW6y9TTi-kJB20_nBO3g"}
{"L":"ERROR","T":"2020-11-23T09:00:55.143Z","N":"backup.upload","M":"Upload dir failed","host":"10.99.185.170","from":"data:user_ks/event_by_recipient-c0f53250257811eb833b000000000005/snapshots/sm_20201123090007UTC","to":"s3:scylla-backups/backup/sst/cluster/bda13859-e979-4d5f-852b-671cac448106/dc/dus6/node/15a8db72-ea1c-4d42-8449-5a24020358a2/keyspace/user_ks/table/event_by_recipient/c0f53250257811eb833b000000000005","error":"context canceled; clear job stats: giving up after 1 attempts: context canceled","_trace_id":"66XW6y9TTi-kJB20_nBO3g"}
{"L":"ERROR","T":"2020-11-23T09:00:55.143Z","N":"backup.upload","M":"Uploading snapshot files failed on host","host":"10.99.185.170","error":"upload snapshot: copy \"data:user_ks/event_by_recipient-c0f53250257811eb833b000000000005/snapshots/sm_20201123090007UTC\" to \"s3:scylla-backups/backup/sst/cluster/bda13859-e979-4d5f-852b-671cac448106/dc/dus6/node/15a8db72-ea1c-4d42-8449-5a24020358a2/keyspace/user_ks/table/event_by_recipient/c0f53250257811eb833b000000000005\": context canceled; clear job stats: giving up after 1 attempts: context canceled","_trace_id":"66XW6y9TTi-kJB20_nBO3g","errorStack":"github.com/scylladb/mermaid/pkg/service/backup.(*worker).uploadHost\n\tgithub.jparrowsec.cn/scylladb/mermaid@/pkg/service/backup/worker_upload.go:58\ngithub.jparrowsec.cn/scylladb/mermaid/pkg/service/backup.(*worker).Upload.func2\n\tgithub.jparrowsec.cn/scylladb/mermaid@/pkg/service/backup/worker_upload.go:27\ngithub.jparrowsec.cn/scylladb/mermaid/pkg/service/backup.hostsInParallel.func1\n\tgithub.jparrowsec.cn/scylladb/mermaid@/pkg/service/backup/parallel.go:80\ngithub.jparrowsec.cn/scylladb/mermaid/pkg/util/parallel.Run.func1\n\tgithub.jparrowsec.cn/scylladb/mermaid@/pkg/util/parallel/parallel.go:62\nruntime.goexit\n\truntime/asm_amd64.s:1357\n"}
{"L":"ERROR","T":"2020-11-23T09:00:55.143Z","N":"backup.upload","M":"Uploading snapshot files failed on host","host":"10.99.208.120","error":"upload snapshot: copy \"data:user_ks/event_by_recipient-c0f53250257811eb833b000000000005/snapshots/sm_20201123090007UTC\" to \"s3:scylla-backups/backup/sst/cluster/bda13859-e979-4d5f-852b-671cac448106/dc/dus6/node/96fa6c3e-e071-4a25-955e-2e477f9db3b6/keyspace/user_ks/table/event_by_recipient/c0f53250257811eb833b000000000005\": context canceled; clear job stats: giving up after 1 attempts: context canceled","_trace_id":"66XW6y9TTi-kJB20_nBO3g","errorStack":"github.com/scylladb/mermaid/pkg/service/backup.(*worker).uploadHost\n\tgithub.jparrowsec.cn/scylladb/mermaid@/pkg/service/backup/worker_upload.go:58\ngithub.jparrowsec.cn/scylladb/mermaid/pkg/service/backup.(*worker).Upload.func2\n\tgithub.jparrowsec.cn/scylladb/mermaid@/pkg/service/backup/worker_upload.go:27\ngithub.jparrowsec.cn/scylladb/mermaid/pkg/service/backup.hostsInParallel.func1\n\tgithub.jparrowsec.cn/scylladb/mermaid@/pkg/service/backup/parallel.go:80\ngithub.jparrowsec.cn/scylladb/mermaid/pkg/util/parallel.Run.func1\n\tgithub.jparrowsec.cn/scylladb/mermaid@/pkg/util/parallel/parallel.go:62\nruntime.goexit\n\truntime/asm_amd64.s:1357\n"}
{"L":"INFO","T":"2020-11-23T09:00:55.425Z","N":"http","M":"HTTP","from":"10.244.48.152:46436","method":"GET","uri":"/api/v1/clusters","status":200,"bytes":218,"duration":"2ms","_trace_id":"Y0fhx4T7TfSM9dxR0jPfuQ"}
{"L":"INFO","T":"2020-11-23T09:00:55.428Z","N":"http","M":"HTTP","from":"10.244.48.152:46436","method":"GET","uri":"/api/v1/cluster/bda13859-e979-4d5f-852b-671cac448106/tasks?all=true&type=repair","status":200,"bytes":470,"duration":"2ms","_trace_id":"05Vy-Q8jTuiUWdbdr-JMmw"}
{"L":"INFO","T":"2020-11-23T09:00:55.430Z","N":"http","M":"HTTP","from":"10.244.48.152:46436","method":"GET","uri":"/api/v1/cluster/bda13859-e979-4d5f-852b-671cac448106/tasks?all=true&type=backup","status":200,"bytes":3,"duration":"1ms","_trace_id":"HQC1pwMcS0Gb2-8Zg-IPoA"}
{"L":"INFO","T":"2020-11-23T09:00:55.434Z","N":"scheduler","M":"Task scheduled","cluster_id":"bda13859-e979-4d5f-852b-671cac448106","task_type":"repair","task_id":"6178f81e-4007-40ab-8b88-d2d9b2ed06fc","run_id":"654e3f3c-2d6a-11eb-9698-420623b63dde","activation":"2020-11-23T09:01:25.430Z","_trace_id":"Z-gGPQJST3K4L09Dnai06w"}
{"L":"INFO","T":"2020-11-23T09:00:55.434Z","N":"http","M":"HTTP","from":"10.244.48.152:46436","method":"PUT","uri":"/api/v1/cluster/bda13859-e979-4d5f-852b-671cac448106/task/repair/6178f81e-4007-40ab-8b88-d2d9b2ed06fc","status":200,"bytes":328,"duration":"3ms","_trace_id":"Z-gGPQJST3K4L09Dnai06w"}
{"L":"INFO","T":"2020-11-23T09:00:56.104Z","N":"cluster.client","M":"HTTP retry backoff","operation":"JobStop","wait":"2.066256709s","error":"dial tcp 10.98.218.215:10001: connect: connection refused"}
{"L":"INFO","T":"2020-11-23T09:00:58.194Z","N":"cluster.client","M":"HTTP","host":"10.98.218.215:10001","method":"POST","uri":"/agent/rclone/job/stop","duration":"22ms","status":500,"bytes":88,"dump":"HTTP/1.1 500 Internal Server Error\r\nContent-Length: 88\r\nContent-Type: application/json\r\nDate: Mon, 23 Nov 2020 09:00:58 GMT\r\n\r\n{\"input\":{\"jobid\":1606107741},\"message\":\"job not found\",\"path\":\"job/stop\",\"status\":500}\n"}
{"L":"INFO","T":"2020-11-23T09:00:58.194Z","N":"cluster.client","M":"HTTP retry backoff","operation":"JobStop","wait":"4.158342298s","error":"agent [HTTP 500] job not found"}
{"L":"INFO","T":"2020-11-23T09:01:02.353Z","N":"cluster.client","M":"HTTP","host":"10.98.218.215:10001","method":"POST","uri":"/agent/rclone/job/stop","duration":"0ms","status":500,"bytes":88,"dump":"HTTP/1.1 500 Internal Server Error\r\nContent-Length: 88\r\nContent-Type: application/json\r\nDate: Mon, 23 Nov 2020 09:01:02 GMT\r\n\r\n{\"input\":{\"jobid\":1606107741},\"message\":\"job not found\",\"path\":\"job/stop\",\"status\":500}\n"}
{"L":"INFO","T":"2020-11-23T09:01:02.353Z","N":"cluster.client","M":"HTTP retry backoff","operation":"JobStop","wait":"8.994378311s","error":"agent [HTTP 500] job not found"}
{"L":"INFO","T":"2020-11-23T09:01:11.349Z","N":"cluster.client","M":"HTTP","host":"10.98.218.215:10001","method":"POST","uri":"/agent/rclone/job/stop","duration":"0ms","status":500,"bytes":88,"dump":"HTTP/1.1 500 Internal Server Error\r\nContent-Length: 88\r\nContent-Type: application/json\r\nDate: Mon, 23 Nov 2020 09:01:11 GMT\r\n\r\n{\"input\":{\"jobid\":1606107741},\"message\":\"job not found\",\"path\":\"job/stop\",\"status\":500}\n"}
{"L":"INFO","T":"2020-11-23T09:01:11.349Z","N":"cluster.client","M":"HTTP retry backoff","operation":"JobStop","wait":"13.161211945s","error":"agent [HTTP 500] job not found"}
{"L":"INFO","T":"2020-11-23T09:01:24.511Z","N":"cluster.client","M":"HTTP","host":"10.98.218.215:10001","method":"POST","uri":"/agent/rclone/job/stop","duration":"0ms","status":500,"bytes":88,"dump":"HTTP/1.1 500 Internal Server Error\r\nContent-Length: 88\r\nContent-Type: application/json\r\nDate: Mon, 23 Nov 2020 09:01:24 GMT\r\n\r\n{\"input\":{\"jobid\":1606107741},\"message\":\"job not found\",\"path\":\"job/stop\",\"status\":500}\n"}
{"L":"INFO","T":"2020-11-23T09:01:24.511Z","N":"cluster.client","M":"HTTP retry backoff","operation":"JobStop","wait":"32.585929886s","error":"agent [HTTP 500] job not found"}
{"L":"INFO","T":"2020-11-23T09:01:25.430Z","N":"scheduler","M":"Task started","cluster_id":"bda13859-e979-4d5f-852b-671cac448106","task_type":"repair","task_id":"6178f81e-4007-40ab-8b88-d2d9b2ed06fc","run_id":"654e3f3c-2d6a-11eb-9698-420623b63dde","_trace_id":"s_cuk45mQqanEDEvSvVnpQ"}
{"L":"INFO","T":"2020-11-23T09:01:25.744Z","N":"repair","M":"Initializing repair","cluster_id":"bda13859-e979-4d5f-852b-671cac448106","task_id":"6178f81e-4007-40ab-8b88-d2d9b2ed06fc","run_id":"654e3f3c-2d6a-11eb-9698-420623b63dde","target":{"units":[{"keyspace":"user_ks","tables":["contact_count","event_by_sender","event_by_recipient"],"all_tables":true}],"dc":["dus6"],"token_ranges":"dcpr","continue":true},"_trace_id":"s_cuk45mQqanEDEvSvVnpQ"}
{"L":"INFO","T":"2020-11-23T09:01:25.923Z","N":"cluster.client","M":"Measuring datacenter latencies","dcs":["dus6"],"_trace_id":"s_cuk45mQqanEDEvSvVnpQ"}
{"L":"INFO","T":"2020-11-23T09:01:25.928Z","N":"repair.worker","M":"Initialising repair","host":"10.99.185.170","_trace_id":"s_cuk45mQqanEDEvSvVnpQ"}
{"L":"INFO","T":"2020-11-23T09:01:25.933Z","N":"repair.worker","M":"Detected row-level repair","host":"10.99.185.170","_trace_id":"s_cuk45mQqanEDEvSvVnpQ"}
{"L":"INFO","T":"2020-11-23T09:01:25.936Z","N":"repair.worker","M":"Initialising repair","host":"10.99.208.120","_trace_id":"s_cuk45mQqanEDEvSvVnpQ"}
{"L":"INFO","T":"2020-11-23T09:01:25.938Z","N":"repair.worker","M":"Detected row-level repair","host":"10.99.208.120","_trace_id":"s_cuk45mQqanEDEvSvVnpQ"}
{"L":"INFO","T":"2020-11-23T09:01:25.940Z","N":"repair.worker","M":"Initialising repair","host":"10.98.218.215","_trace_id":"s_cuk45mQqanEDEvSvVnpQ"}
{"L":"INFO","T":"2020-11-23T09:01:25.942Z","N":"repair.worker","M":"Detected row-level repair","host":"10.98.218.215","_trace_id":"s_cuk45mQqanEDEvSvVnpQ"}
{"L":"INFO","T":"2020-11-23T09:01:25.944Z","N":"repair","M":"Repairing unit","unit":{"keyspace":"user_ks","tables":["contact_count","event_by_sender","event_by_recipient"],"all_tables":true},"_trace_id":"s_cuk45mQqanEDEvSvVnpQ"}
{"L":"INFO","T":"2020-11-23T09:01:25.993Z","N":"repair.worker","M":"Repairing","host":"10.99.185.170","_trace_id":"s_cuk45mQqanEDEvSvVnpQ"}
{"L":"INFO","T":"2020-11-23T09:01:26.531Z","N":"repair.worker","M":"Repairing","host":"10.99.185.170","shard":0,"percent_complete":0,"_trace_id":"s_cuk45mQqanEDEvSvVnpQ"}
{"L":"INFO","T":"2020-11-23T09:01:48.550Z","N":"http","M":"HTTP","from":"10.244.50.0:15127","method":"GET","uri":"/api/v1/cluster/bda13859-e979-4d5f-852b-671cac448106/task/backup/11a47266-f991-4831-a4aa-2be43b255868","status":404,"bytes":132,"duration":"2ms","error":"resource not found: load task \"11a47266-f991-4831-a4aa-2be43b255868\": not found","_trace_id":"4QeiMQ-HRc2nkzogZHpDbA"}
{"L":"ERROR","T":"2020-11-23T09:01:55.103Z","N":"scheduler","M":"Task did not stop in time","cluster_id":"bda13859-e979-4d5f-852b-671cac448106","task_type":"backup","task_id":"11a47266-f991-4831-a4aa-2be43b255868","run_id":"366ccf18-2d6a-11eb-9696-420623b63dde","wait":"1m0s","_trace_id":"66XW6y9TTi-kJB20_nBO3g"}
{"L":"ERROR","T":"2020-11-23T09:01:55.104Z","N":"scheduler","M":"Task ended with error","cluster_id":"bda13859-e979-4d5f-852b-671cac448106","task_type":"backup","task_id":"11a47266-f991-4831-a4aa-2be43b255868","run_id":"366ccf18-2d6a-11eb-9696-420623b63dde","status":"ERROR","cause":"stop task in 1m0s","_trace_id":"66XW6y9TTi-kJB20_nBO3g"}

Environment:

  • Platform: Self Hosted
  • Kubernetes version: 1.18.8
  • Scylla version: 4.0.0
  • Scylla Manager version: 2.1
  • Scylla Manager Agent version: 2.2.0
  • Scylla-operator version: e.g.: 3.0

Any recommendations for fixing this would be much appreciated! 🙂 🙏🏿

@prasus prasus added the kind/bug Categorizes issue or PR as related to a bug. label Nov 23, 2020
zimnx (Collaborator) commented Nov 23, 2020

It looks like your scylla-manager-agent is crashing, most likely due to resource limits. The Manager log shows calls to the agent failing with EOF and then with connection refused:

{"L":"INFO","T":"2020-11-23T09:00:52.856Z","N":"cluster.client","M":"HTTP retry backoff","operation":"JobProgress","wait":"858.297602ms","error":"EOF","_trace_id":"66XW6y9TTi-kJB20_nBO3g"}
{"L":"INFO","T":"2020-11-23T09:00:53.729Z","N":"cluster.client","M":"HTTP retry backoff","operation":"JobProgress","wait":"1.828290475s","error":"dial tcp 10.98.218.215:10001: connect: connection refused","_trace_id":"66XW6y9TTi-kJB20_nBO3g"}

Can you provide agent logs?
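
For anyone triaging a similar Manager log, the pattern above (an `EOF` followed by `connection refused` on the agent port) can be picked out with a short script. This is only a rough sketch: the function name `agent_failures` and the marker list are illustrative and not part of Scylla Manager, and it assumes the log is one JSON object per line with the `M`, `operation`, and `error` fields seen in the lines quoted in this issue.

```python
import json
from collections import Counter

# Error substrings that suggest the agent endpoint went away.
# Assumption: taken from the log lines quoted in this issue, not an official list.
AGENT_DOWN_MARKERS = ("EOF", "connection refused")


def agent_failures(log_lines):
    """Count 'HTTP retry backoff' entries whose error suggests an unreachable agent.

    Returns a Counter keyed by (operation, marker).
    """
    counts = Counter()
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not a JSON log line
        if not isinstance(entry, dict) or entry.get("M") != "HTTP retry backoff":
            continue
        error = entry.get("error", "")
        for marker in AGENT_DOWN_MARKERS:
            if marker in error:
                counts[(entry.get("operation"), marker)] += 1
                break
    return counts


# Three lines copied from the Manager log in this issue.
SAMPLE_LINES = [
    r'{"L":"INFO","T":"2020-11-23T09:00:52.856Z","N":"cluster.client","M":"HTTP retry backoff","operation":"JobProgress","wait":"858.297602ms","error":"EOF","_trace_id":"66XW6y9TTi-kJB20_nBO3g"}',
    r'{"L":"INFO","T":"2020-11-23T09:00:53.729Z","N":"cluster.client","M":"HTTP retry backoff","operation":"JobProgress","wait":"1.828290475s","error":"dial tcp 10.98.218.215:10001: connect: connection refused","_trace_id":"66XW6y9TTi-kJB20_nBO3g"}',
    r'{"L":"INFO","T":"2020-11-23T09:00:58.194Z","N":"cluster.client","M":"HTTP retry backoff","operation":"JobStop","wait":"4.158342298s","error":"agent [HTTP 500] job not found"}',
]

if __name__ == "__main__":
    for (operation, marker), n in agent_failures(SAMPLE_LINES).items():
        print(f"{operation}: {marker} x{n}")
```

The `job not found` retries are deliberately not counted: the agent does answer those requests, so they are consistent with it having restarted and lost the job rather than being down.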

prasus (Author) commented Nov 23, 2020

Thank you @zimnx 👍🏿 As discussed, the backup task works fine after increasing the resources allocated to the scylla-manager-agent sidecar container.

It would be great if there were a way to specify the scylla-manager-agent resources in the cluster CRD. I will open an enhancement request for that. Thanks!
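
In case it helps others confirm the same diagnosis before raising limits, here is a minimal sketch that inspects the JSON printed by `kubectl get pod <pod-name> -o json` and lists containers that have restarted, with the reason for their last termination (for example `OOMKilled`). The helper name `restarted_containers` and the sample pod below are made up for illustration; the fields it reads (`status.containerStatuses[].restartCount` and `lastState.terminated.reason`) are the standard Kubernetes pod status fields.

```python
def restarted_containers(pod):
    """Return (name, restart_count, last_termination_reason) for restarted containers.

    `pod` is the dict parsed from `kubectl get pod <pod-name> -o json`.
    """
    results = []
    for status in pod.get("status", {}).get("containerStatuses", []):
        restarts = status.get("restartCount", 0)
        if restarts == 0:
            continue
        # lastState.terminated is only present if the container has exited before.
        terminated = status.get("lastState", {}).get("terminated") or {}
        results.append((status["name"], restarts, terminated.get("reason")))
    return results


# Hypothetical pod status, trimmed to the fields used above.
# The container names and values are assumptions for illustration only.
SAMPLE_POD = {
    "status": {
        "containerStatuses": [
            {"name": "scylla", "restartCount": 0, "lastState": {}},
            {
                "name": "scylla-manager-agent",
                "restartCount": 3,
                "lastState": {"terminated": {"reason": "OOMKilled", "exitCode": 137}},
            },
        ]
    }
}

if __name__ == "__main__":
    for name, restarts, reason in restarted_containers(SAMPLE_POD):
        print(f"{name}: restarted {restarts} time(s), last termination reason: {reason}")
```

To run it against a real pod, parse the `kubectl` output with `json.load` and pass the resulting dict to `restarted_containers`.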
