Introducing zchunk

Introducing zchunk, a new file format that’s highly delta-able (is that even a word?) while still maintaining good compression. The format has been heavily influenced by both zsync and casync, but attempts to address the weaknesses (for at least some use cases) of both. I’ll cover the background behind this in a later post.

Like casync and zsync, zchunk works by dividing a file into independently compressed “chunks”. Using only standard web protocols, the zchunk utility zckdl downloads just the chunks of your file that have changed, reusing any duplicate chunks from a file you specify on your local filesystem.
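
For example, fetching a new version of a file while reusing chunks from an old local copy looks something like this (the paths and URL are placeholders, and the -s flag is how I understand zckdl’s source option, so check zckdl --help if it doesn’t match your version):

```sh
# Download updates.xml.zck, fetching only the chunks that aren't
# already present in the local file passed via -s
zckdl -s old/updates.xml.zck https://example.com/repodata/updates.xml.zck
```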

Zchunk is a completely new compression format with its own extension, .zck. By default it uses zstd internally, but, because each chunk is compressed separately, a zchunk file can’t be decompressed with the standard zstd utilities. Instead, a zchunk file is decompressed with the unzck utility and compressed with the zck utility.
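
Basic usage looks something like the following; the filename is a placeholder, and I’m assuming the default invocations here, so see each tool’s --help for the details:

```sh
# Compress updates.xml, producing updates.xml.zck
zck updates.xml

# Decompress updates.xml.zck back into updates.xml
unzck updates.xml.zck
```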

Zchunk also supports the use of a common dictionary to help increase compression. Since chunks may be quite small but contain repeated data, you can use a zstd dictionary to encode the most common data. The dictionary must be the same for every version of your file; otherwise the chunks won’t match. For our test case, Fedora’s update metadata, using a dictionary reduces the size of the file by almost 40%.
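
As a sketch, you could train the dictionary with the stock zstd tool and then pass it to zck. The training step below is standard zstd; the -D flag on zck is my assumption for how the dictionary is supplied, so check zck --help for the real option:

```sh
# Train a 100KB zstd dictionary on several old versions of the file
# (sample paths are placeholders)
zstd --train samples/updates-*.xml -o updates.dict --maxdict=102400

# Compress using the shared dictionary (flag name is an assumption)
zck -D updates.dict updates.xml
```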

So what’s the final damage? In testing, a zchunk file with an average chunk size of a few kilobytes and a 100KB dictionary ends up roughly 23% larger than a zstd file at the same compression level, but almost 10% smaller than the equivalent gzip file. Results will obviously vary with chunk size, but zchunk generally beats gzip in size while providing efficient deltas over both rsync and standard HTTP.

The zchunk file format should be considered fixed, in the sense that any further changes will be backwards-compatible. The API for creating and decompressing a .zck file is essentially finished, while the API for downloading a .zck file still needs some work.

Future features include embedded signatures, separate streams, and proper utilities.

zchunk-0.4.0 is available for download, and, if you’re running Fedora or RHEL, there’s a COPR that also includes a zchunk-enabled createrepo_c (don’t get too excited, as there’s no code in dnf/librepo yet to download the .zck metadata).

Development is currently on GitHub.

Updated 05/03/2018 to point to new repository location

Migrating from wordpress.com to Hugo

Refreshing changes

When I started this blog back in 2009, I chose to publish it on WordPress because it was easy to use and maintain. I hosted it on wordpress.com’s free tier, and it has worked well enough for me since then, but when it came time to move the blog off of wordpress.com and onto something self-hosted, I wasn’t convinced that WordPress was still the best solution for me.

As a system administrator, my biggest concern with WordPress is security. When our school’s website switched from some ’90s-era framework to WordPress a couple of years ago, it wasn’t long before our site was compromised. We moved from a web host to a DigitalOcean instance running the latest version of Fedora and a system copy of WordPress (both kept up to date), which has, at least for now, kept our site from being compromised again, but that’s one more service we have to keep our eyes on.

The problem is that, as nice as it is to have a pretty GUI for writing posts and the like, any public server that accepts changes is a potential security hole. Hugo works on a completely different basis: instead of creating and editing posts online, you build the site from text files in a git repository, and Hugo publishes static web pages, greatly reducing the attack surface. There are some costs (I think I’m going to go without public comments, at least for the moment), but I think it’s well worth it.
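
For anyone who hasn’t seen Hugo before, the workflow looks roughly like this (the site and post names are placeholders):

```sh
# Create the site skeleton and put it under version control
hugo new site myblog
cd myblog
git init

# Each post is just a markdown file under content/posts
hugo new posts/refreshing-changes.md

# Render the whole site to static files in public/, ready to be
# served by any plain web server
hugo
```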

The migration consisted of a number of steps. First, I exported my site from wordpress.com, created a local WordPress instance, and imported the site into it; I did this because wordpress.com doesn’t allow custom plugins unless you’re prepared to pay large amounts of money. Second, I used the WordPress-to-Hugo exporter plugin to export the site to Hugo. I had already configured a new Hugo site, so I only copied the content/posts directory across from the export.

Finally came the time-consuming process of checking each post and changing how pictures are embedded. I’ve followed the steps on Hugo’s image processing page to automatically generate smaller versions of my images to show in the blog, each hyperlinked to the full-size image, and I’ve also checked each URL to make sure it matches the URL on the old site. I’m currently about halfway through my posts, and it sure has been interesting to see some of the things I wrote about over the last nine years.
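
As an illustration, a minimal shortcode along the lines of what Hugo’s image processing page describes might look like this (the 800px width is a placeholder, and the image is assumed to be a page resource):

```go-html-template
{{/* Look up the image named in the shortcode's first argument */}}
{{ $original := .Page.Resources.GetMatch (.Get 0) }}

{{/* Generate a smaller version for the post body */}}
{{ $thumb := $original.Resize "800x" }}

{{/* Show the thumbnail, hyperlinked to the full-size image */}}
<a href="{{ $original.RelPermalink }}">
  <img src="{{ $thumb.RelPermalink }}"
       width="{{ $thumb.Width }}" height="{{ $thumb.Height }}">
</a>
```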

Once I’m finished updating my posts, I’m going to pay wordpress.com ($13 a year, I think) to redirect everything on my old site to this new one. Then it’s a matter of updating my site information in the major search engines’ webmaster tools, and the conversion should be done.

The source for this site is published at https://github.com/jdieter/jdieter.net.