
Backups generate a lot of operations #4091

Closed
kperreau opened this issue Oct 29, 2024 · 4 comments


kperreau commented Oct 29, 2024

Hello,

I wanted to set up ScyllaDB backups using Helm config (Kubernetes), and I noticed a huge number of operations. In just one night, my R2 bucket registered over 500k operations (reads and writes) without actually generating a backup, seemingly just due to health checks from Scylla Manager.

I also noticed many files created in this format: scylla-manager-agent-1001119299/test.

Am I doing something wrong in my configuration, or is it normal for Scylla to generate such a high number of operations? Or could this be a bug?

Scylla/Scylla Manager version: 6.1.1
Helm chart version: v1.14.0

Helm Values:

# Backup configuration
backups:
  - name: "daily-backup"
    cron: "0 4 * * *"  # Schedule daily backups at 4:00 AM
    timezone: "UTC"  # Timezone for the cron schedule
    keyspace:
      - "mykeyspace"  # Keyspace to back up
    location:
      - "s3:scylla-backup"  # Backup to Cloudflare R2
    rateLimit:
      - "100"
    retention: 3  # Retain the last 3 backups
    numRetries: 3  # Retry failed backups 3 times
    uploadParallel:
      - "3"  # Run the backup uploads in parallel on 3 nodes
    snapshotParallel:
      - "2"  # Take snapshots in parallel on 2 nodes
    startDate: now

repairs:
  - name: "cluster-repair-everyday"
    cron: "0 5 * * *"  # Schedule daily repairs at 5:00 AM
    intensity: "1"
    keyspace:
      - "mykeyspace"  # Keyspace to repair
    startDate: now
    timezone: "UTC"  # Timezone for the cron schedule

Tasks:

Cluster: scylla/scylla (e4fd7590-3970-45e3-bb83-ab7021319cea)
+--------------------------------+-------------+--------+----------+---------+-------+------------------------+------------+--------+------------------------+
| Task                           | Schedule    | Window | Timezone | Success | Error | Last Success           | Last Error | Status | Next                   |
+--------------------------------+-------------+--------+----------+---------+-------+------------------------+------------+--------+------------------------+
| backup/daily-backup            | 0 4 * * *   |        | UTC      | 0       | 0     |                        |            | NEW    | 30 Oct 24 04:00:00 UTC |
| healthcheck/alternator         | @every 15s  |        | UTC      | 473     | 0     | 29 Oct 24 14:19:38 UTC |            | DONE   | 29 Oct 24 14:19:53 UTC |
| healthcheck/cql                | @every 15s  |        | UTC      | 471     | 0     | 29 Oct 24 14:19:32 UTC |            | DONE   | 29 Oct 24 14:19:47 UTC |
| healthcheck/rest               | @every 1m0s |        | UTC      | 116     | 0     | 29 Oct 24 14:18:46 UTC |            | DONE   | 29 Oct 24 14:19:46 UTC |
| repair/cluster-repair-everyday | 0 5 * * *   |        | UTC      | 1       | 0     | 29 Oct 24 12:34:14 UTC |            | DONE   | 30 Oct 24 05:00:00 UTC |
+--------------------------------+-------------+--------+----------+---------+-------+------------------------+------------+--------+------------------------+

A large number of objects were created in about 30 minutes:

[screenshot: R2 bucket object listing]

I also noticed a lot of backup-related log entries once I enabled the backup configuration:

[screenshot: Scylla Manager logs]

@Michal-Leszczynski (Collaborator)

Hi @kperreau, during the location check SM does indeed create and delete some temporary files in the backup location, but it does so only when a backup task is added, updated, or executed. So I believe this is more of a ScyllaDB Operator issue.

@rzetelskik should this issue be transferred to the Operator repo?
Also, isn't it a duplicate? I vaguely remember a similar issue in the past.
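The explanation above matches the object names reported earlier (`scylla-manager-agent-<id>/test`). As a quick sanity check, one could tally how many keys in a bucket listing match that location-check pattern; a minimal sketch in Python, assuming a key listing has already been fetched from the bucket (the sample keys below are hypothetical):

```python
import re

# Keys matching Scylla Manager's location-check pattern, as observed
# in the bucket: scylla-manager-agent-<random digits>/test
LOCATION_CHECK_KEY = re.compile(r"^scylla-manager-agent-\d+/test$")

def count_location_check_objects(keys):
    """Count bucket keys left behind by SM's backup-location check."""
    return sum(1 for k in keys if LOCATION_CHECK_KEY.match(k))

# Hypothetical listing (e.g. obtained via an S3 ListObjectsV2 call):
keys = [
    "scylla-manager-agent-1001119299/test",
    "scylla-manager-agent-482910133/test",
    "some-real-backup-object",  # anything else is counted as real data
]
print(count_location_check_objects(keys))  # 2
```

If nearly all keys in the listing match the pattern, the operations are coming from repeated location checks rather than actual backup uploads.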

@rzetelskik (Member)

> Also, isn't it a duplicate? I vaguely remember a similar issue in the past.

We had an issue with tasks being recreated in a hot loop, but it was fixed recently in scylladb/scylla-operator#1827.
The fix will be part of the 1.15 release.

It's hard to say whether that's the case here without looking at the manager-controller logs.

@kperreau could you attach a must-gather archive?
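If hot-loop task recreation were the culprit, the reported numbers would imply a steady churn rate. A rough back-of-envelope sketch (the 500k operation count comes from the report above; the duration and per-check operation cost are assumptions):

```python
# Rough estimate: how often would tasks have to be recreated to produce
# ~500,000 bucket operations overnight? The per-update cost is a guess.
total_ops = 500_000            # reported R2 operations in one night
hours = 8                      # "one night", an assumption
ops_per_location_check = 4     # assumed reads+writes per tmp-file check

updates = total_ops / ops_per_location_check
updates_per_second = updates / (hours * 3600)
print(round(updates_per_second, 2))  # 4.34
```

Several task updates per second, sustained all night, is consistent with a controller reconciling in a tight loop rather than with a once-daily backup schedule.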

@kperreau (Author)

Here are the logs from the manager-controller when I enable the backup configuration (not very verbose):

2024-10-30 12:48:28.909	I1030 11:48:28.909004       1 manager/controller.go:148] "Hit conflict, will retry in a bit" Key="scylla/scylla" Error="can't update status: Operation cannot be fulfilled on scyllaclusters.scylla.scylladb.com \"scylla\": the object has been modified; please apply your changes to the latest version and try again"
2024-10-30 12:48:28.900	I1030 11:48:28.900867       1 manager/status.go:94] "Updating status" ScyllaCluster="scylla/scylla"
2024-10-30 12:48:28.732	I1030 11:48:28.732143       1 manager/status.go:101] "Status updated" ScyllaCluster="scylla/scylla"
2024-10-30 12:48:28.712	I1030 11:48:28.712352       1 manager/status.go:94] "Updating status" ScyllaCluster="scylla/scylla"
2024-10-30 12:48:28.567	I1030 11:48:28.567449       1 manager/controller.go:148] "Hit conflict, will retry in a bit" Key="scylla/scylla" Error="can't update status: Operation cannot be fulfilled on scyllaclusters.scylla.scylladb.com \"scylla\": the object has been modified; please apply your changes to the latest version and try again"
2024-10-30 12:48:28.563	I1030 11:48:28.563087       1 manager/status.go:94] "Updating status" ScyllaCluster="scylla/scylla"
2024-10-30 12:48:28.424	I1030 11:48:28.424313       1 manager/status.go:101] "Status updated" ScyllaCluster="scylla/scylla"
2024-10-30 12:48:28.405	I1030 11:48:28.405135       1 manager/status.go:94] "Updating status" ScyllaCluster="scylla/scylla"

@Michal-Leszczynski (Collaborator)

@kperreau please open an issue in the Scylla Operator repo and make sure to include the must-gather archive.
