Consider zstd for compression of shipped artifacts #1660
Looking at our pipelines, the compression step takes around 30m:
So we could possibly save 6-8m per run just on that. This could then compound because each of our CI runs may or may not run
Another thing to mention here is that in my tests I used `zstd -19`. We could experiment with different levels to see what the differences are in size versus speed, but I assumed we wanted to increase the size as little as possible, so I used `-19`.
Huh, I was expecting more drastic differences in compression/decompression times. IMO spending an extra 6-8 minutes for 8% smaller images is worth it.
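The level-sweep experiment discussed above can be sketched roughly as follows. Note this uses a synthetic `artifact.raw` stand-in; the real `metal`/`qemu` artifacts are multi-GB images, so absolute numbers will differ, and you'd wrap each command in `time` to reproduce the duration comparison:

```shell
# Compare a few zstd levels against xz -9 (what we ship today).
# artifact.raw is a synthetic stand-in; substitute a real image
# to get meaningful size/speed tradeoffs.
set -e
seq 1 200000 > artifact.raw

for level in 1 10 19; do
    # -T0 = use all cores, -k = keep input, -q/-f = quiet/overwrite
    zstd -q -f -k -T0 "-$level" artifact.raw -o "artifact.raw.$level.zst"
    echo "zstd -$level: $(stat -c %s "artifact.raw.$level.zst") bytes"
done

xz -9 -k -f -T0 artifact.raw
echo "xz -9: $(stat -c %s artifact.raw.xz) bytes"
```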
I implemented this to investigate it as an option for coreos/fedora-coreos-tracker#1660 so figured I may as well post the code up for inclusion.
Some more data:
If we went with something like level 10, we'd get a 90% speedup in compression, which I think would take our
Based on the discussion in coreos/fedora-coreos-tracker#1660 level 10 seems to give us a good speedup versus size tradeoff.
We'll experiment with this in `rawhide` to see what kind of real world gains we get from using zstd compression. Also see if there are any bugs that crop up. This is to further the discussion in coreos/fedora-coreos-tracker#1660
This was discussed in today's community meeting:
I did some additional testing on the live qemu file.
Another note: zstd was not installed in my f39 toolbox by default.
As a corollary, I also did some testing yesterday with the qemu image. The data sizes, compressed and uncompressed (cols 4 & 5), were equivalent. No surprise there. Where the results differed for me was the decompression time: mine was consistently double that. Were you passing any additional command-line switches?
Simply running
Given that the image files tested are very large,
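For concreteness, the decompression comparison being discussed can be sketched as below, with no switches beyond `-d`/`-k`/`-f` (`sample.img` is a synthetic stand-in for the real, much larger, qemu image):

```shell
# Round-trip timing sketch: compress once, then time plain decompression.
set -e
seq 1 200000 > sample.img
zstd -q -f -k -19 sample.img -o sample.img.zst
xz -9 -k -f sample.img

# The comparison itself: -d = decompress, -k = keep input, -f = overwrite.
time xz -d -k -f sample.img.xz
time zstd -d -q -f sample.img.zst -o sample.img.out
cmp sample.img sample.img.out
```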
So... with the recent xz news, a lot of trust was lost in that project. Apart from the other benefits listed in this ticket, switching to zstd would now also avoid forcing people to use xz to use our artifacts if they're not comfortable with that.
We have been using zstd with FCOS images in podman machine for a couple of months now. Lots of positive comments about the quicker decompression.
Based on the discussion in coreos/fedora-coreos-tracker#1660 level 10 seems to give us a good speedup versus size tradeoff.
Testing decompression speeds locally on a 1.6G qcow2, I get 17.8s for xz and 0.99s for zstd. Weird that the decompression difference in #1660 (comment) between xz and zstd isn't much larger. As I think was mentioned in the last community meeting where we discussed this, it's possibly hardware-related.
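One way to check the hardware theory is zstd's built-in benchmark mode, which compresses and decompresses entirely in memory and so factors disk I/O out of machine-to-machine comparisons (shown here on a synthetic file):

```shell
# zstd -b runs in RAM and reports MB/s for both directions,
# isolating CPU speed from disk and filesystem effects.
set -e
seq 1 200000 > sample.bin
# -b10 -e19: benchmark every level from 10 through 19;
# -i1: spend at least 1 second per measurement.
zstd -b10 -e19 -i1 sample.bin
```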
I did some investigation into zstd as our default compression algorithm. I set the compression level of zstd to `19` and xz to `9` (what we use today). Here is what I see for times on compress and decompress using xz of the `metal` and `qemu` artifacts:

and here is what I see for zstd:

and here is what the difference in sizes looks like:
To summarize:
So we get about a 20% speedup in compression and 30% speedup in decompression with the tradeoff of 8-9% larger compressed files.