In a LXD cluster (MicroCloud), concurrently creating instances backed by ceph fails the first time because multiple cluster members try to create the readonly snapshot on the shared ceph pool.
Steps to reproduce
Here's the minimal reproducer. It assumes a MicroCloud made out of 3 nodes, micro1, micro2 and micro3, which means the default profile is configured to put the rootfs of the instances onto ceph:
root@micro1:~# lxc profile show default
name: default
description: Default LXD profile
config: {}
devices:
  eth0:
    name: eth0
    network: default
    type: nic
  root:
    path: /
    pool: remote
    type: disk
used_by: []
Verify the downloaded image was never turned into an image volume on the storage pool:
root@micro1:~# FINGERPRINT="$(lxc image info ubuntu-minimal-daily:24.04 | awk '/^Fingerprint:/ {print $2}')"
root@micro1:~# lxc image list -fcsv -cF | grep -xF "${FINGERPRINT}"
46942e5befec5812ca67d893456cf2e1d77b5a84d52854e9892d62e9d41c5d3a
root@micro1:~# lxc storage volume list remote | grep -F "${FINGERPRINT}" && echo "the image should NOT exist in the pool"
Concurrently create instances using that downloaded image for the first time:
root@micro1:~# lxc init ubuntu-minimal-daily:24.04 c1 --target micro1 & lxc init ubuntu-minimal-daily:24.04 c2 --target micro2 & lxc init ubuntu-minimal-daily:24.04 c3 --target micro3
[1] 12916
[2] 12917
Creating c2
Creating c3
Creating c1
Error: Failed instance creation: Failed creating instance from image: Error inserting volume "46942e5befec5812ca67d893456cf2e1d77b5a84d52854e9892d62e9d41c5d3a" for project "default" in pool "remote" of type "images" into database "UNIQUE constraint failed: index 'storage_volumes_unique_storage_pool_id_node_id_project_id_name_type'"
Retrieving image: Unpacking image: 100% (642.75MB/s)Error: Failed instance creation: Failed creating instance from image: Failed to run: rbd --id admin --cluster ceph --image-feature layering --image-feature striping --image-feature exclusive-lock --image-feature object-map --image-feature fast-diff --image-feature deep-flatten clone lxd_remote/image_46942e5befec5812ca67d893456cf2e1d77b5a84d52854e9892d62e9d41c5d3a_ext4@readonly lxd_remote/container_c2: exit status 2 (2025-04-03T20:35:58.256+0000 7fc2ff7fe640 -1 librbd::image::RefreshRequest: failed to locate snapshot: readonly
2025-04-03T20:35:58.256+0000 7fc2ff7fe640 -1 librbd::image::OpenRequest: failed to find snapshot readonly
2025-04-03T20:35:58.256+0000 7fc2eeffd640 -1 librbd::image::CloneRequest: 0x557ca7f989b0 handle_open_parent: failed to open parent image: (2) No such file or directory
rbd: clone error: (2) No such file or directory)
[1]- Exit 1 lxc init ubuntu-minimal-daily:24.04 c1 --target micro1
[2]+ Exit 1 lxc init ubuntu-minimal-daily:24.04 c2 --target micro2
root@micro1:~#
root@micro1:~# lxc list
+------+---------+------+------+-----------+-----------+----------+
| NAME | STATE | IPV4 | IPV6 | TYPE | SNAPSHOTS | LOCATION |
+------+---------+------+------+-----------+-----------+----------+
| c3 | STOPPED | | | CONTAINER | 0 | micro3 |
+------+---------+------+------+-----------+-----------+----------+
The above errors show 2 issues. The UNIQUE constraint failed one is about another bug, but the failed to open parent image one is the one I'm reporting here: each member independently tries to turn the downloaded image into an image volume on the shared pool, and whichever member clones before the @readonly snapshot exists fails as shown. A rough sketch of that sequence follows.
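The rbd invocations below are an approximation inferred from the clone error above, not the literal commands LXD runs; they only illustrate where the race sits:

rbd create --size 10G lxd_remote/image_${FINGERPRINT}_ext4                        # create the image volume
# ...unpack the downloaded tarball/squashfs into it...
rbd snap create lxd_remote/image_${FINGERPRINT}_ext4@readonly                     # the step every member races to perform
rbd snap protect lxd_remote/image_${FINGERPRINT}_ext4@readonly
rbd clone lxd_remote/image_${FINGERPRINT}_ext4@readonly lxd_remote/container_c1   # fails if @readonly doesn't exist yet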
At this point, it's possible to see that the remote pool now has the image, as micro3 was able to take the downloaded tarball/squashfs and turn it into an RBD volume with a @readonly snapshot.
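The listing itself isn't captured above, but commands along these lines (reusing the FINGERPRINT variable from earlier, and assuming the remote pool maps to the lxd_remote RBD pool shown in the error output) would confirm the image volume and its snapshot exist:

root@micro1:~# lxc storage volume list remote | grep -F "${FINGERPRINT}"
root@micro1:~# rbd --id admin --cluster ceph snap ls "lxd_remote/image_${FINGERPRINT}_ext4"

From here on, creating new instances concurrently no longer runs into the bug, as the needed image is already in the remote pool.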
Yes, this is similar to the other issue mentioned, except it's likely affecting all remote storage pools that try to use the downloaded image to create an image volume.
I can try with Pure next week to confirm this.
I suspect we need to create some /internal endpoints that allow us to perform cluster-wide operations for DB records and remote pool operations on the leader (which then allows the use of sync.Mutex).
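A minimal sketch of that idea, assuming a hypothetical /internal/cluster/image-volumes endpoint served only by the leader (none of the names below are real LXD API; the DB and storage calls are placeholders):

// Hypothetical sketch: serialise "ensure the image volume exists" on the
// cluster leader so only one member ever creates the RBD image and its
// @readonly snapshot. All names here are illustrative, not the real LXD API.
package main

import (
	"fmt"
	"net/http"
	"sync"
)

// ensureImageMu guards the create-if-missing sequence for image volumes on
// remote (shared) pools. Because the endpoint is only served by the leader,
// a process-local mutex is enough to serialise the whole cluster.
var ensureImageMu sync.Mutex

// ensureImageVolumeHandler would back something like
// POST /internal/cluster/image-volumes on the leader.
func ensureImageVolumeHandler(w http.ResponseWriter, r *http.Request) {
	pool := r.URL.Query().Get("pool")
	fingerprint := r.URL.Query().Get("fingerprint")

	ensureImageMu.Lock()
	defer ensureImageMu.Unlock()

	// If another member already created the image volume, return without
	// touching the pool.
	if imageVolumeExists(pool, fingerprint) {
		w.WriteHeader(http.StatusOK)
		return
	}

	// Otherwise create the RBD image, unpack the downloaded image into it,
	// take the @readonly snapshot and insert the DB record. This only ever
	// runs once per image because of the mutex above.
	if err := createImageVolume(pool, fingerprint); err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	w.WriteHeader(http.StatusCreated)
}

// imageVolumeExists and createImageVolume stand in for the real DB lookup
// and storage-driver calls.
func imageVolumeExists(pool, fingerprint string) bool { return false }

func createImageVolume(pool, fingerprint string) error {
	fmt.Println("creating image volume", pool, fingerprint)
	return nil
}

func main() {
	http.HandleFunc("/internal/cluster/image-volumes", ensureImageVolumeHandler)
	_ = http.ListenAndServe("127.0.0.1:8444", nil)
}

Non-leader members would call such an endpoint before cloning, so the clone only ever runs once the @readonly snapshot exists.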
Please confirm
Distribution
Ubuntu
Distribution version
24.04
Output of "snap list --all lxd core20 core22 core24 snapd"
# snap list --all lxd core20 core22 core24 snapd
Name    Version         Rev    Tracking       Publisher   Notes
core22  20250210        1802   latest/stable  canonical✓  base
core24  20241217        739    latest/stable  canonical✓  base
lxd     5.21.3-c5ae129  33110  5.21/stable    canonical✓  in-cohort
snapd   2.67.1          23771  latest/stable  canonical✓  snapd
Output of "lxc info" or system info if it fails
Information to attach
- Any relevant kernel output (dmesg)
- Container log (lxc info NAME --show-log)
- Container configuration (lxc config show NAME --expanded)
- Main daemon log (/var/log/lxd/lxd.log or /var/snap/lxd/common/lxd/logs/lxd.log)
- Output of the client with --debug
- Output of the daemon with --debug (or use lxc monitor while reproducing the issue)