ceph-csi pool parameter not passed correctly #8550

Closed
ryanmickler opened this issue Jul 28, 2020 · 15 comments
Labels: theme/docs (Documentation issues and enhancements), theme/storage

Comments

ryanmickler (Contributor) commented Jul 28, 2020

Nomad version

0.11.3

Operating system and Environment details

os: ubuntu bionic
ceph-csi: quay.io/cephcsi/cephcsi:v2.1.2-amd64

Issue

The CSI volume parameter pool is apparently not passed to the ceph-csi node plugin during volume mount; I'm receiving "missing required parameter pool".

The problem is potentially related to NodeStageVolume in the CSI spec.

A quick inspection suggests that NodeStageVolume (which mounts the volume to a staging path on the node)
https://github.com/ceph/ceph-csi/blob/47d5b60af8d48574ff6d11ca37dbff5a6f56815b/internal/rbd/nodeserver.go#L116
calls genVolFromVolumeOptions on line 171:
https://github.com/ceph/ceph-csi/blob/47d5b60af8d48574ff6d11ca37dbff5a6f56815b/internal/rbd/nodeserver.go#L171
Inside that function, we hit the "missing required parameter pool" error here:
https://github.com/ceph/ceph-csi/blob/be9e7cf956c378227ff43e0194410468919766b7/internal/rbd/rbd_util.go#L694
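For illustration, a minimal Go sketch (not ceph-csi's actual code; the helper name and map handling here are simplified assumptions) of the kind of check that produces this error when the "pool" key is absent from the key/value map handed to the node plugin:

// Minimal sketch, not ceph-csi's actual code: the node plugin derives its
// rbd options from the key/value map it receives and fails fast when the
// "pool" key is absent. The error text mirrors the log output below.
package main

import (
	"errors"
	"fmt"
)

// volOptionsFromContext stands in for genVolFromVolumeOptions, simplified.
func volOptionsFromContext(volContext map[string]string) (string, error) {
	pool, ok := volContext["pool"]
	if !ok {
		return "", errors.New("missing required parameter pool")
	}
	return pool, nil
}

func main() {
	// Simulates a request whose context carries clusterID but not pool.
	_, err := volOptionsFromContext(map[string]string{"clusterID": "ceph"})
	fmt.Println(err) // missing required parameter pool
}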

Reproduction steps

Job file

csi-test-job.hcl

job "csi-test" {

    update {
        max_parallel = 3
        stagger = "30s"
    }

    region = "${region}"
    datacenters = ["${datacenter}"]
    type = "service"

    group "group" {

        count = 1
        volume "group_csi_volume" {
            type = "csi"
            read_only = false
            source = "csi-test-0"
        }

        task "task" {

            driver = "docker"

            resources {
                cpu    = 500
                memory = 1024
                network {
                    mbits = 1
                }
            }

            volume_mount {
                volume      = "group_csi_volume"
                destination = "/mnt/foo"
                read_only   = false
            }

            config {
                image = "alpine/latest"
                command = "sleep"
                args = [ "infinity" ]
            }
        }
    }
}

Using the Terraform nomad_volume resource (I also tried uploading the raw HCL via the Nomad CLI; no change).
ceph-csi-volume.tf

data "nomad_plugin" "ceph-csi" {
  plugin_id        = "ceph-csi"
  wait_for_healthy = true
}
resource "nomad_volume" "csi_volume" {
  type            = "csi"
  plugin_id       = data.nomad_plugin.ceph-csi.plugin_id
  volume_id       = "..."
  name            = "..."
  external_id     = "..."
  access_mode     = "single-node-writer"
  attachment_mode = "file-system"
  parameters = {
    # String representing a Ceph cluster
    clusterID = "..."
    # Ceph pool into which the RBD image shall be created
    pool = var.pool
  }
  secrets = {
    userID = "..."
    userKey = "..."
  }
  context = {}
}

ceph-csi-plugin-controller-job.hcl

...
    group "ceph-csi" {
        count = 1

        task "plugin" {

            driver = "docker"

            resources {
                cpu    = 200
                memory = 500

                network {
                    mbits = 1
                    port "metrics" {}
                }
            }

            # /etc/ceph-csi-config/config.json
            template {
                data = <<CONFG
[
    {
        "clusterID": "ceph",
        "monitors": [
            "..."
        ]
    }
]
CONFG
                destination   = "new/config.json"
                change_mode   = "restart"
            }

            config {
                image = "quay.io/cephcsi/cephcsi:v2.1.2-amd64"

                args = [
                    "--type=rbd",
                    "--controllerserver=true",
                    "--drivername=rbd.csi.ceph.com",
                    "--logtostderr",
                    "--endpoint=unix://csi/csi.sock",
                    "--metricsport=$${NOMAD_PORT_metrics}",
                    "--nodeid=..."
                ]

                # all CSI node plugins will need to run as privileged tasks
                # so they can mount volumes to the host. controller plugins
                # do not need to be privileged.
                privileged = true

                volumes = [
                    "new/config.json:/etc/ceph-csi-config/config.json",
                ]
            }

            service {
                name = "ceph-csi"
                port = "metrics"
                tags = [ "prometheus" ]
            }

            csi_plugin {
                id        = "ceph-csi"
                type      = "controller"
                mount_dir = "/csi" 
            }
        }
    }

ceph-csi-node.hcl

...
type = "system"

   group "ceph" {

        task "plugin" {

            driver = "docker"

            resources {
                cpu    = 200
                memory = 500

                network {
                    mbits = 1
                    port "metrics" {}
                }
            }

            # /etc/ceph-csi-config/config.json
            template {
                data = <<CONFG
[
    {
        "clusterID": "ceph",
        "monitors": [
            "..."
        ]
    }
]
CONFG
                destination   = "new/config.json"
                change_mode   = "restart"
            }

            config {
                image = "quay.io/cephcsi/cephcsi:v2.1.2-amd64"

                args = [
                    "--type=rbd",
                    # Name of the driver
                    "--drivername=rbd.csi.ceph.com",
                    "--logtostderr",
                    "--nodeserver=true",
                    "--endpoint=unix://csi/csi.sock",
                    # Unique ID distinguishing this instance of Ceph CSI among other instances, 
                    # when sharing Ceph clusters across CSI instances for provisioning
                    "--instanceid=...",
                    # This node's ID
                    "--nodeid=...", 
                    # TCP port for liveness metrics requests (/metrics)
                    "--metricsport=$${NOMAD_PORT_metrics}",
                ]

                # all CSI node plugins will need to run as privileged tasks
                # so they can mount volumes to the host. controller plugins
                # do not need to be privileged.
                privileged = true

                volumes = [
                    "new/config.json:/etc/ceph-csi-config/config.json",
                ]
                mounts = [
                    {
                        type = "tmpfs"
                        target = "/tmp/csi/keys"
                        readonly = false
                        tmpfs_options {
                            size = 1000000 # size in bytes
                        }
                    },
                ]
            }

            service {
                name = "ceph-csi"
                port = "metrics"
                tags = [ "prometheus" ]
            }

            csi_plugin {
                id        = "ceph-csi"
                type      = "node"
                mount_dir = "/csi" 
            }
        }
    }

Nomad Client logs

From the ceph-csi node container:
E0728 04:47:20.003465 1 utils.go:163] ID: 23 Req-ID: csi-test-0 GRPC error: rpc error: code = Internal desc = missing required parameter pool

Split off from #8212.
This specific error was first mentioned in #7668 (comment).

ryanmickler changed the title from "ceph-csi pool parameter not passed to NodeStageVolume" to "ceph-csi pool parameter not passed correctly" Jul 28, 2020
ryanmickler (Contributor Author) commented:

@tgross: I tried to include enough of the spec to give a complete picture; let me know if more is required.

tgross (Member) commented Jul 28, 2020

Thanks @ryanmickler. I'll take a look into this later in the week.

ryanmickler (Contributor Author) commented Aug 5, 2020

I believe I've found the problem.

First, here:

type CSIVolume struct {
we can see that both the parameters and context maps are part of the structure. In the HCL, I am passing pool in as a parameter.

Next, here:

req := &csi.NodeStageVolumeRequest{
only the context map is passed through to the NodeStageVolumeRequest as VolumeContext; the parameters block is ignored.

(It's not clear to me what should be a parameter and what should go in context.)

And given the bug you fixed in 901664f, context should now be passed properly into NodeStageVolume.
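For context, a hedged Go sketch using the CSI spec's Go bindings (github.com/container-storage-interface/spec/lib/go/csi); all values are placeholders. The node-side RPC carries only VolumeContext, PublishContext, and Secrets, while the parameters map is a controller-side CreateVolume input, which is why pool needs to ride along in context for a volume registered with Nomad:

// Hedged sketch; values are placeholders, not a real Nomad request.
// Only VolumeContext (plus PublishContext and Secrets) is delivered to the
// node plugin's NodeStageVolume, so keys like "pool" must be present there.
package main

import (
	"fmt"

	"github.com/container-storage-interface/spec/lib/go/csi"
)

func main() {
	req := &csi.NodeStageVolumeRequest{
		VolumeId:          "csi-test-0",
		StagingTargetPath: "/csi/staging/csi-test-0",
		Secrets:           map[string]string{"userID": "...", "userKey": "..."},
		// These are the only free-form key/value pairs the node plugin sees.
		VolumeContext: map[string]string{
			"clusterID": "ceph",
			"pool":      "mypool", // placeholder pool name
		},
	}
	fmt.Println(req.VolumeContext["pool"]) // mypool
}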

This was patched in https://github.com/hashicorp/nomad/releases/tag/v0.12.0-beta2.
So I guess the following should work in Nomad v0.12.1?

data "nomad_plugin" "ceph-csi" {
  plugin_id        = "ceph-csi"
  wait_for_healthy = true
}
resource "nomad_volume" "csi_volume" {
  type            = "csi"
  plugin_id       = data.nomad_plugin.ceph-csi.plugin_id
  volume_id       = "..."
  name            = "..."
  external_id     = "..."
  access_mode     = "single-node-writer"
  attachment_mode = "file-system"
  parameters = {}
  secrets = {
    userID = "..."
    userKey = "..."
  }
  context = {
     # String representing a Ceph cluster
    clusterID = "..."
    # Ceph pool in which the RBD image exists/shall be created
    pool = var.pool
  }
}

ryanmickler (Contributor Author) commented:

Update: well, upgrading to Nomad 0.12.1 and passing pool in the context map seems to get past the original error I was having.

Now the issues I'm having are related to how strictly external_id needs to be formatted for the ceph-csi driver. I'll raise these in a new issue if they turn out to be related to Nomad.
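For anyone hitting the same wall, a hedged Go sketch of the volume-handle layout the ceph-csi RBD driver appears to expect (the layout is an assumption based on ceph-csi's internal volume-ID helpers at the time and may differ between releases; all values are placeholders):

// Hedged sketch only: the layout below is an assumption about how ceph-csi
// composes/parses RBD volume handles (encoding version, clusterID length,
// clusterID, pool ID, image UUID) and may not match every release.
package main

import "fmt"

func composeVolumeID(clusterID string, poolID int64, imageUUID string) string {
	// 4 hex chars for the encoding version, 4 for the clusterID length,
	// then the clusterID itself, 16 hex chars for the pool ID, and the UUID.
	return fmt.Sprintf("%04x-%04x-%s-%016x-%s", 1, len(clusterID), clusterID, poolID, imageUUID)
}

func main() {
	// Placeholder values only.
	fmt.Println(composeVolumeID("ceph", 2, "00000000-1111-2222-bbbb-cacacacacaca"))
	// 0001-0004-ceph-0000000000000002-00000000-1111-2222-bbbb-cacacacacaca
}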

I'd be happy to close this, but there's probably some amount of documentation to preserve here as others are likely to hit this error.

tgross (Member) commented Aug 10, 2020

Update: well, upgrading to Nomad 0.12.1 and passing pool in the context map seems to get past the original error I was having.

Darn, I wish I'd dug into this sooner, because then I would have caught the version where we'd fixed that. Sorry about that.

I'd be happy to close this, but there's probably some amount of documentation to preserve here as others are likely to hit this error.

Yeah, we don't currently have a great way of documenting the details of what each CSI plugin expects as inputs, and I'd worry about getting stale relative to what the upstream projects are doing. There's clearly some Nomad-specific documentation to be had that doesn't quite belong in the Nomad docs but doesn't fit anywhere else either.

There's an integrations/ directory that seems like it might be a good fit for example volume specs, or maybe the demos/ directory. We're having a bit of an internal discussion about that.

I'm going to close this issue for now but rest assured we're tracking a discussion about what we can do around documentation improvements.

tgross closed this as completed Aug 10, 2020
tgross added the theme/docs (Documentation issues and enhancements) label and removed the stage/needs-investigation label Aug 10, 2020
RickyGrassmuck (Contributor) commented Aug 10, 2020

@tgross I love the idea of having an examples directory containing CSI deployment specs. If this is something the public can contribute to, I would be happy to send over the OpenStack spec we have, along with any others we may use.

ryanmickler (Contributor Author) commented:

RickyGrassmuck (Contributor) commented:

@ryanmickler yup, that's the one. It's currently broken until 0.12.2 is released but I've tested it with the patch that fixes it and it works great.

ryanmickler (Contributor Author) commented:

Perhaps we could get started on a branch where we put our example configs together?

RickyGrassmuck (Contributor) commented:

Sure, I would be happy to contribute to that!

I have one created for the Cinder CSI driver already, and I did some work getting the iSCSI CSI driver working, which I suspect will work after the 0.12.2 release.

I'd like to hear from @tgross about the internal discussion re: the structure and location of these examples so that we can get this going as smoothly as possible. It's a good amount of work reverse-engineering these CSI drivers to deploy them on Nomad (or any orchestrator not named K8s, lol), so having documented examples of their configs coming in from the community would be fantastic.

ryanmickler (Contributor Author) commented Aug 12, 2020

Absolutely. Let's wait for feedback.

P.S. I also use Nomad on OpenStack, but I've been needing to use CephFS, which I don't think Cinder can support.

For that, we'll need the Manila CSI plugin (https://github.com/kubernetes/cloud-provider-openstack/blob/master/docs/using-manila-csi-plugin.md), which I assume your config will help with.

tgross (Member) commented Aug 12, 2020

Thanks folks! I've opened #8651 to add a directory at ./demo/csi/ to collect these.

RickyGrassmuck (Contributor) commented:

@tgross Awesome, appreciate it!

@ryanmickler I just opened #8662, adding an example for the Cinder CSI driver, if you want to take a look at it.

ryanmickler (Contributor Author) commented:

Great, I just added #8664.

This should get people started.

github-actions bot commented Nov 3, 2022

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

github-actions bot locked as resolved and limited conversation to collaborators Nov 3, 2022