Timeout while listing DO volume snapshots #295

Closed
PrasadG193 opened this issue Apr 6, 2020 · 5 comments

@PrasadG193

What did you do? (required. The issue will be closed when not provided.)

I tried to create a VolumeSnapshot that references an existing VolumeSnapshotContent, using a manifest like this:

apiVersion: snapshot.storage.k8s.io/v1alpha1
kind: VolumeSnapshot
metadata:
  name: new-snapshot
spec:
  snapshotClassName: do-block-storage
  snapshotContentName: snapcontent-clone

What did you expect to happen?

I expected the VolumeSnapshot resource to be created without any error and its status to be set to readyToUse: true.

Instead, creation failed with the following error in the VolumeSnapshot status:

status:
  creationTime: null
  error:
    message: 'Failed to check and update snapshot: failed to list snapshot content
      snapcontent-c4b22fc4-eadc-4d84-bde8-1bccfb3073d4-clone: "rpc error: code = Aborted
      desc = ListSnapshots listing volume snapshots has failed: Get https://api.digitalocean.com/v2/snapshots?page=13&per_page=20&resource_type=volume:
      context deadline exceeded"'
    time: "2020-04-04T16:38:53Z"
  readyToUse: false
  restoreSize: null

I assume this is due to a timeout being reached while querying the snapshot resource, since we have a large number of snapshots (around 270+). Once we cleaned up the existing snapshots, the resource was created successfully.
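To make the timeout hypothesis concrete, here is a minimal sketch of the arithmetic. It takes per_page=20 from the failing URL and the figure of roughly 270 snapshots from the report. It assumes that pages are fetched one after another under a single 60-second deadline, which is not confirmed anywhere in this thread, and the per-page latencies are purely illustrative. The function names are hypothetical and not part of any driver or library.

```python
import math

PER_PAGE = 20      # per_page value seen in the failing request URL
TIMEOUT_S = 60.0   # ASSUMPTION: one shared 1-minute deadline for the whole listing


def pages_needed(total, per_page=PER_PAGE):
    """Number of list requests required to walk `total` snapshots."""
    return math.ceil(total / per_page)


def list_with_deadline(total, latency_s, timeout_s=TIMEOUT_S, per_page=PER_PAGE):
    """Simulate sequential page fetches that each take `latency_s` seconds.

    Returns (page, ok): the last page attempted and whether the full
    listing finished before the deadline.
    """
    elapsed = 0.0
    last_page = pages_needed(total, per_page)
    for page in range(1, last_page + 1):
        elapsed += latency_s
        if elapsed > timeout_s:
            # The deadline expired while this page was in flight.
            return page, False
    return last_page, True


if __name__ == "__main__":
    print(pages_needed(270))              # requests needed for ~270 snapshots
    print(list_with_deadline(270, 1.0))   # illustrative fast pages
    print(list_with_deadline(270, 5.0))   # illustrative slow pages
```

Under these assumptions, about 270 snapshots need 14 requests, so each page has a budget of a little over 4 seconds. The listing only times out when individual pages are slow, which is why per-page latency is the number worth measuring.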

Configuration (MUST fill this out):

manifests used to reproduce the issue:

volumesnapshotcontent.yaml

apiVersion: snapshot.storage.k8s.io/v1alpha1
kind: VolumeSnapshotContent
metadata:
  name: snapcontent-clone
spec:
  csiVolumeSnapshotSource:
    driver: dobs.csi.digitalocean.com
    restoreSize: 0
    snapshotHandle: xxx-xxx-xx-xx     # Copied from any existing snapshot content 
  deletionPolicy: Delete
  snapshotClassName: do-block-storage
  volumeSnapshotRef:
    apiVersion: snapshot.storage.k8s.io/v1alpha1
    kind: VolumeSnapshot
    name: new-snapshot
    namespace: default

volumesnapshot.yaml

apiVersion: snapshot.storage.k8s.io/v1alpha1
kind: VolumeSnapshot
metadata:
  name: new-snapshot
spec:
  snapshotClassName: do-block-storage
  snapshotContentName: snapcontent-clone

  • CSI Version: v1.2.0
  • Kubernetes Version: v1.16.6
  • Cloud provider/framework version, if applicable (such as Rancher): DigitalOcean managed Kubernetes

@PrasadG193 changed the title from "Timeout while listing volume snapshot" to "Timeout while listing DO volume snapshots" on Apr 6, 2020
@timoreimann
Contributor

Hi @PrasadG193 👋

Just to clarify: were you able to verify that the timeout continued to occur even after several minutes? I'd like to rule out that this was a temporary problem with the public snapshot API.

Thank you!

@PrasadG193
Author

Hey @timoreimann,
I have not changed the default csi-snapshotter timeout, which I believe is 1m:
https://github.com/kubernetes-csi/external-snapshotter/blob/master/cmd/csi-snapshotter/main.go#L64

@timoreimann
Contributor

In a quick test, I was able to list ~400 snapshots from the DO API in approximately 10 seconds. This means that in general, the API should be fast enough.

We need to inspect the particular cluster that you made those requests from. I think you also opened a DO support ticket, so let's continue further coordination from there and eventually post the final results here.
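One way to compare the cluster against the quick test above is to time each page of the listing from a machine or pod inside that cluster. The following is a rough sketch only: the endpoint and query parameters are taken from the error message, while the `DIGITALOCEAN_TOKEN` environment variable, the function names, and the assumption that the response body carries the results under a `snapshots` key are illustrative and should be checked against the DO API reference.

```python
import json
import os
import time
import urllib.request

API = "https://api.digitalocean.com/v2/snapshots"  # endpoint from the error message


def fetch_page(page, per_page=20):
    """Fetch one page of volume snapshots.

    ASSUMPTIONS: the token is read from a hypothetical DIGITALOCEAN_TOKEN
    environment variable, and the results are under the "snapshots" key.
    """
    url = f"{API}?page={page}&per_page={per_page}&resource_type=volume"
    req = urllib.request.Request(
        url, headers={"Authorization": f"Bearer {os.environ['DIGITALOCEAN_TOKEN']}"}
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp).get("snapshots", [])


def time_listing(fetch=fetch_page, per_page=20, clock=time.monotonic):
    """Walk pages until a short page is returned.

    Returns (total_items, timings), where timings holds the seconds
    spent on each page request, in order.
    """
    timings, total, page = [], 0, 1
    while True:
        start = clock()
        items = fetch(page, per_page)
        timings.append(clock() - start)
        total += len(items)
        if len(items) < per_page:
            return total, timings
        page += 1


if __name__ == "__main__":
    total, timings = time_listing()
    print(f"{total} snapshots, {len(timings)} pages, {sum(timings):.1f}s total")
    print(f"slowest page: {max(timings):.1f}s")
```

Running the same script from inside and outside the affected cluster would show whether the per-page latency differs between the two, which is the comparison the test above leaves open.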

@timoreimann
Contributor

I'm going to close this one out since you also filed a DO support ticket. Feel free to continue the discussion there.

For what it's worth, we recently made several improvements and fixes to the CSI driver, including for snapshots. They don't explain your performance issues, though they may affect your usage of snapshots in other regards (#299 seems relevant in particular).

We plan on doing a release soon. Keep an eye on the DOKS change log for when the fixes will be available in our managed offering. Let us know if you have more questions.

@PrasadG193
Author

Sure. Thanks @timoreimann
