Zchunk repodata in Rawhide

In my last post, I mentioned that we were hoping to get Zchunk metadata into Fedora 30, and I am pleased to announce that this feature is ready for preliminary testing. Last month, Daniel Mach reviewed and accepted the zchunk patches for librepo, libdnf and createrepo_c, and last week Kevin Fenzi turned on zchunk metadata generation in Rawhide.

If you install librepo and libdnf from my COPR (Rawhide only), you will download zchunk metadata if it’s available. Please note that, at the moment, only primary.xml, filelists.xml and other.xml are zchunked.

Once we’re convinced that we’re not going to break anybody’s install, we will see about getting these packages pushed to Rawhide.

Bugs

There are a couple of current known bugs:

  • The zchunked metadata wasn’t being compressed with Fedora’s repodata zdicts, so primary.xml.zck and other.xml.zck were roughly double the size of their gzip counterparts (filelists.xml.zck was about the same size, so overall download size was about 10-15% larger). This has been fixed as of December 27th’s metadata. Zchunk metadata is now roughly 10% smaller than the equivalent gzip metadata, excluding any zchunk savings.
  • DNF reset the download progress bar multiple times when downloading zchunked metadata, and the final sum wasn’t accurate. I believe this was due to how librepo was reporting the zchunk download. This has now been fixed.
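As a rough sanity check on the size figures in the first bullet, the arithmetic can be sketched in Python. The shares used here are back-calculated from the 10-15% figure, not measured, and the function name is made up for illustration.

```python
def overall_growth(doubled_share: float) -> float:
    """Relative growth of the total download when the files that make up
    `doubled_share` of the gzip total double in size and the rest stay
    the same size."""
    new_total = 2 * doubled_share + (1 - doubled_share)
    return new_total - 1.0


# If primary.xml and other.xml together were 10-15% of the gzip total,
# doubling them grows the whole download by that same 10-15%.
for share in (0.10, 0.15):
    print(f"doubled share {share:.0%} -> total grows {overall_growth(share):.0%}")
```

In other words, the overall increase equals the share of the download that doubled, which is consistent with filelists.xml making up most of the metadata.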

Testers wanted

If you are willing to take a risk with your Rawhide system, we’d appreciate some more people kicking the tires and making sure that, at minimum, we haven’t broken anything. To test, download a backup copy of librepo and libdnf, and then install librepo and libdnf from my COPR. Then install, update or remove packages and verify that it works as expected. I’ve tested with dnf, microdnf and PackageKit, but it should work out of the box with anything that uses libdnf as a backend.

If anything goes significantly wrong (e.g. dnf stops working), first try setting zchunk=False in /etc/yum.conf, and, if that doesn’t work, downgrade librepo and libdnf to your backup copies.
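For anyone who would rather script the first fallback step, here is a minimal Python sketch. It assumes the option belongs in the [main] section of the INI-style config file, and it works on the file's text rather than touching the file itself. The function name is hypothetical. Note that Python's configparser drops comments when it rewrites a file, so editing by hand may be preferable on a real system.

```python
import configparser
import io


def disable_zchunk(conf_text: str) -> str:
    """Return conf_text with zchunk=False set in [main].

    Assumption: the zchunk option lives in the [main] section.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names exactly as written
    parser.read_string(conf_text)
    if not parser.has_section("main"):
        parser.add_section("main")
    parser.set("main", "zchunk", "False")
    out = io.StringIO()
    parser.write(out, space_around_delimiters=False)
    return out.getvalue()


# Example config text, invented for illustration.
example = "[main]\ngpgcheck=1\ninstallonly_limit=3\n"
print(disable_zchunk(example))
```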

If, after enabling this, you run into any new bugs, for the moment please report them in bugzilla against zchunk (even if it’s librepo or libdnf that crash).

Updated 2018/12/28 as the first bug has been fixed

Updated 2019/01/01 as the second bug has been fixed

Zchunk update


Zchunk 1.0

Eight months ago, I started working on zchunk, and it’s now almost ready for its 1.0 release. Once zchunk 1.0 is released, we will offer a stability guarantee. Only additions to the API will be allowed, and the ABI will always be backwards-compatible. Files created by older versions of zchunk will be readable by newer versions, and files created by newer versions will be readable by older versions.

There is one important caveat to the last item: the zchunk format supports mandatory feature flags. It is possible that an older version of zchunk doesn’t support a certain feature flag, and, if so, that version of zchunk will be unable to open files that contain the new flag.

As of version 0.9.12, zchunk also supports optional feature flags that provide extra information about the zchunk file. If a newer version of zchunk sets an optional flag, and the file is read by an older version that doesn’t recognize that particular flag, it will ignore the optional flag data and continue reading the file. This feature was requested at Flock this year, and I’m glad it will be available when zchunk 1.0 is released.
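The difference between the two kinds of flags can be sketched as reader-side logic. This is an illustration only: the bit values, names and exception type below are invented and do not reflect zchunk's actual header layout or API.

```python
# Hypothetical flag bits this (old) reader understands. Not zchunk's real values.
KNOWN_MANDATORY = 0b0011
KNOWN_OPTIONAL = 0b0001


class UnsupportedFeature(Exception):
    """Raised when a file needs a mandatory feature the reader lacks."""


def check_flags(mandatory: int, optional: int) -> int:
    """Return the optional flags this reader understands.

    Unknown mandatory bits make the file unreadable; unknown optional
    bits are silently ignored.
    """
    unknown = mandatory & ~KNOWN_MANDATORY
    if unknown:
        raise UnsupportedFeature(f"unknown mandatory flag bits: {unknown:#b}")
    return optional & KNOWN_OPTIONAL


# A newer writer set an optional bit (0b0100) that this reader doesn't know:
# the file still opens, and only the recognized optional bit is kept.
print(bin(check_flags(mandatory=0b0001, optional=0b0101)))

# A newer writer set a mandatory bit (0b0100) that this reader doesn't know:
# the reader must refuse the file.
try:
    check_flags(mandatory=0b0100, optional=0)
except UnsupportedFeature as err:
    print("cannot open:", err)
```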

Coverity coverage and CI

In September, we got Coverity to scan zchunk as part of its open source project support, and eliminated 15 potential bugs that it identified. New releases will continue to be scanned by Coverity. Thanks, Stephen Gallagher, for the suggestion.

I’ve also set up a Jenkins instance for continuous integration. Every commit is run through a series of tests to verify that we’re not breaking anything.

Fedora metadata integration

One of the features that we hoped to get into Fedora 29 was Zchunk metadata: zchunk-compressed repository metadata for DNF. It was a stretch, and we were unable to get the zchunk patches reviewed upstream in time for Fedora 29’s release. The goal now is to get the patches accepted in time for Fedora 30.

We have patches for libdnf, librepo, createrepo_c and libsolv. In a point for inter-distribution cooperation, Michael Schroeder from SUSE merged the libsolv patch first, but the other patches are still awaiting review. Now that Fedora 29 is out the door, I’m hoping we’ll be able to get this done quickly.

Unless we hit major problems, I anticipate that Zchunk metadata will be a Fedora 30 feature. A huge thank you to Neal Gompa and Igor Gnatenko for their help with this.

Future features

I have a couple of ideas bouncing around in my head for other places where zchunk might be useful, and I figured I should commit them to (digital) paper here.

zchunk-compressed RPMs

Deltarpms offer amazing space savings, but are very limited. In order to take advantage of a deltarpm, you must not only want a specific new version of the RPM, but also have a specific old version of the RPM installed. Because zchunk doesn’t look at any old versions at compression time, the same zchunk-compressed RPM can be used whichever old version you have installed (or even if you have no old version installed).
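To make the contrast with deltarpms concrete, here is a toy Python sketch of the client-side idea: the new file is published once with a list of chunk checksums, and each client only fetches the chunks it doesn't already have. The chunk contents, function names and the use of SHA-256 are all assumptions for illustration; this does not model the real zchunk index or any RPM format change.

```python
import hashlib


def chunk_id(chunk: bytes) -> str:
    """Identify a chunk by its checksum (SHA-256 chosen for this sketch)."""
    return hashlib.sha256(chunk).hexdigest()


def chunks_to_download(new_index: list[str], local_chunks: list[bytes]) -> list[str]:
    """Return the IDs in new_index that are not available locally."""
    have = {chunk_id(c) for c in local_chunks}
    return [cid for cid in new_index if cid not in have]


# One published index for the new package, whatever the client has installed.
new_rpm = [b"header", b"libfoo 2.0", b"docs"]
new_index = [chunk_id(c) for c in new_rpm]

# Hypothetical client states.
old_a = [b"header", b"libfoo 1.0", b"docs"]
old_b = [b"header", b"libfoo 1.5", b"docs v0"]

for label, local in (("old version A", old_a),
                     ("old version B", old_b),
                     ("nothing installed", [])):
    needed = chunks_to_download(new_index, local)
    print(f"{label}: download {len(needed)} of {len(new_index)} chunks")
```

The same index serves all three clients: the first downloads one chunk, the second two, and the third all three, with no per-version delta generated on the server side.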

To make this work, we would need to create some new feature flags in zchunk, and it would require some changes in the RPM format itself (the third rail, I know), but it is possible and could provide us with significant download savings without having to generate deltarpms at all.

zchunk-compressed container images

This would require that container registries and container management systems both support zchunk, but would allow pull operations to be significantly smaller. I don’t know much about how podman or docker actually pull their images, but I’ve been pulling updated images over 3G and it’s not much fun.

Other ideas?

We would welcome ports of zchunk to other languages. At Flock, a few people were keen on doing a Rust implementation, and I think that’s a brilliant idea.

If you have any other ideas for new features (or find a bug), please create an issue. I won’t guarantee that we’ll implement your new ideas, but we’ll take a look at them and see if they’re feasible. Obviously, pull requests greatly increase the chances that your idea will become part of zchunk.