
Wrong disk size in metrics for btrfs backend instances #15265

Open
edlerd opened this issue Mar 26, 2025 · 8 comments

@edlerd
Contributor

edlerd commented Mar 26, 2025

Distribution

snap

Distribution version

6.3

Output of "snap list --all lxd core20 core22 core24 snapd"

lxd     6.3-d704dcb     32918  latest/stable  canonical✓  -

Issue description

For an instance on a btrfs pool with a limited main disk size, the lxd_filesystem_size_bytes value reported by the GET /1.0/metrics endpoint wrongly contains the total storage pool size.

With other storage pool drivers, such as zfs or dir, the size in the metrics result correctly reflects the instance limit, not the total pool size.

I suspect this is an issue with the btrfs integration.

See also canonical/lxd-ui#1155
Might be related to #8468

Steps to reproduce

  1. Create a storage pool with the btrfs driver and a size of 5G.
  2. Create an instance on the pool.
  3. Restrict the instance's main disk size to 2G.
  4. Call the GET /1.0/metrics endpoint.
  5. See that lxd_filesystem_size_bytes for the instance is 5G, when it should be 2G (a minimal sketch of this query is shown below).
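
For step 4, here is a minimal sketch of querying GET /1.0/metrics over the local unix socket and filtering the metric in question. The socket path is an assumption for the snap install; running lxc query /1.0/metrics should show the same output.

```go
// Sketch only: fetch /1.0/metrics over the LXD unix socket and print the
// lxd_filesystem_size_bytes samples. The socket path is assumed for the snap
// install; adjust it for other installations.
package main

import (
	"bufio"
	"context"
	"fmt"
	"net"
	"net/http"
	"strings"
)

func main() {
	socket := "/var/snap/lxd/common/lxd/unix.socket" // assumed snap socket path

	client := &http.Client{
		Transport: &http.Transport{
			// Dial the unix socket instead of TCP; the host part of the URL is ignored.
			DialContext: func(ctx context.Context, _, _ string) (net.Conn, error) {
				return (&net.Dialer{}).DialContext(ctx, "unix", socket)
			},
		},
	}

	resp, err := client.Get("http://lxd/1.0/metrics")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// Print only the metric discussed in this issue.
	scanner := bufio.NewScanner(resp.Body)
	for scanner.Scan() {
		if strings.HasPrefix(scanner.Text(), "lxd_filesystem_size_bytes") {
			fmt.Println(scanner.Text())
		}
	}
	if err := scanner.Err(); err != nil {
		panic(err)
	}
}
```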
@tomponline
Member

Yes, I was thinking #8468 sounded similar.

@edlerd
Contributor Author

edlerd commented Mar 27, 2025

Yes, I was thinking #8468 sounded similar.

Though I now realize #8468 is about disk usage, while this one is about the disk total. Both issues might share a similar cause, as both relate to disk reporting on btrfs pools.

gabrielmougard self-assigned this Apr 2, 2025
@gabrielmougard
Contributor

I think I have identified the cause of this issue. The problem stems from how different filesystems expose quota information to the kernel's VFS layer. While ZFS integrates quota limits directly into its filesystem statistics (so statfs calls correctly report the quota-limited size), BTRFS reports the entire pool's statistics regardless of any quotas applied to specific subvolumes. Currently, our metrics code relies on the standard filesystem statistics, which works correctly for ZFS but not for BTRFS.

I'm working on a fix that will specifically handle the BTRFS case by directly querying the BTRFS quota information for the container's volume and using that value for the reported filesystem size instead of the raw pool size (we could parse the output of btrfs qgroup show -f --raw <path>. Should I put this logic directly in the getFSStats() function, or should it be part of an exported BTRFS driver method?). @tomponline what do you think?
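
As a rough sketch of that idea (illustration only, not the actual change): the limit could be read from the max_rfer column that btrfs qgroup show prints when the -r flag is passed. Column names and layout vary between btrfs-progs versions, so this assumes the classic output format.

```go
// Illustrative sketch: read the qgroup limit (max_rfer) for a btrfs subvolume
// by shelling out to btrfs-progs. Assumes the classic column layout with a
// "max_rfer" header; newer btrfs-progs versions may format this differently.
package main

import (
	"fmt"
	"os/exec"
	"strconv"
	"strings"
)

// btrfsQGroupLimit is a hypothetical helper returning the max_rfer limit in
// bytes for the qgroup of the subvolume at path. A value of "none" (no limit
// set) will fail to parse here and would need extra handling in a real fix.
func btrfsQGroupLimit(path string) (uint64, error) {
	out, err := exec.Command("btrfs", "qgroup", "show", "-r", "--raw", "-f", path).CombinedOutput()
	if err != nil {
		return 0, fmt.Errorf("btrfs qgroup show failed: %w (%s)", err, strings.TrimSpace(string(out)))
	}

	col := -1
	for _, line := range strings.Split(strings.TrimSpace(string(out)), "\n") {
		fields := strings.Fields(line)

		// Locate the max_rfer column from the header row first.
		if col == -1 {
			for i, f := range fields {
				if f == "max_rfer" {
					col = i
				}
			}
			continue
		}

		// Skip the "---- ----" separator row, then read the first data row.
		if len(fields) > col && !strings.HasPrefix(fields[0], "-") {
			return strconv.ParseUint(fields[col], 10, 64)
		}
	}

	return 0, fmt.Errorf("no qgroup limit found for %q", path)
}

func main() {
	// Hypothetical subvolume path, for illustration only.
	size, err := btrfsQGroupLimit("/var/snap/lxd/common/lxd/storage-pools/default/containers/c1")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("qgroup max_rfer bytes:", size)
}
```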

@tomponline
Member

@gabrielmougard which code path is currently the problem? Link please :)

@gabrielmougard
Contributor

In the getFSStats() function, when filesystem.StatVFS is called (that would be my strong guess):

https://github.com/canonical/lxd/blob/05ca2853d6b307725da6c3479f23bda347c43ca4/lxd/instance/drivers/driver_lxc.go#L8593C4-L8593C44

@gabrielmougard
Contributor

gabrielmougard commented Apr 7, 2025

Obviously, we have the same issue in getFilesystemMetrics() if the instance is a VM:

statfs, err := filesystem.StatVFS(stats.Mountpoint)

@tomponline
Member

@gabrielmougard can you give me an lxc CLI example of the incorrect output and the related instance config, as I'm not following currently? Thanks

@gabrielmougard
Contributor

Sure! Also, I want to stress that this idea of mine is a guess for now, as I still need to log the output of statfs with the reproducer scenario. But the guess seems to be corroborated by https://lore.kernel.org/linux-btrfs/[email protected]/T/

If I understand correctly, statfs() only allows the kernel to report two numbers to describe space usage: total blocks and free blocks. The only space tracking we have at the subvolume level is qgroups, hence the idea of using btrfs qgroup show ... instead of a statfs call when the pool is backed by btrfs.
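
To illustrate that limitation (a sketch, not LXD code; the path is hypothetical): statfs on a subvolume of a quota-limited btrfs pool still returns the pool-wide block counts.

```go
// Sketch of why statfs cannot see subvolume quotas: it only reports block
// counts for the whole filesystem, so on btrfs the pool-wide totals come back
// for every subvolume regardless of any qgroup limit.
package main

import (
	"fmt"

	"golang.org/x/sys/unix"
)

func main() {
	// Hypothetical instance subvolume path, for illustration only.
	path := "/var/snap/lxd/common/lxd/storage-pools/default/containers/c1"

	var st unix.Statfs_t
	if err := unix.Statfs(path, &st); err != nil {
		panic(err)
	}

	// Total and free space as seen by statfs: these describe the whole
	// filesystem (the btrfs pool), not the subvolume's qgroup limit.
	total := st.Blocks * uint64(st.Bsize)
	free := st.Bfree * uint64(st.Bsize)
	fmt.Printf("statfs total=%d bytes free=%d bytes\n", total, free)
}
```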
