On binary delta algorithms

Yesterday I was reading this update (warning: still subscriber-only) on the Courgette delta algorithm that will be used by Google to push Chrome updates.

The article focused on potential patent problems with the Courgette method, but also described how the Courgette method works and also suggested that deltarpm (which I maintain in Fedora) could learn from it.

While not wanting to argue that particular point (I would love to get a hold of the code and see how much it would actually help our deltarpms), I do want to correct a couple of misconceptions about deltarpm.

Courgette, deltarpm and bsdiff (the tool) are all based on the bsdiff algorithm, which is very efficient. The problem comes when dealing with binaries. One small change in the source code will normally result in many small changes through the binary as the locations of pointers change.

(In the graphs provided, location two is a byte of data that has changed, while locations 3-6 and 21-24 represent two relative jump pointers to the same location. In the new binary, the location has changed by 74 bytes.)

The first graph is that of a naive delta using the bsdiff algorithm.

Note that the delta is four bytes of not-very-compressible data (out of 12). Not so great.

The second graph is roughly what courgette does with the same data. It does some disassembly to convert the relative pointers into labels. When applying the delta, courgette will disassemble the old file, apply the delta, and then reassemble the resulting not-quite-machine code into machine code.

Note that we’re down to a one-byte delta! Wonderful!

Now, the final example is how both the bsdiff tool and deltarpm work. They use the algorithm Colin Percival describes here which has an interesting twist when dealing with binaries.

When the locations change in a binary, they tend to do so in a consistent way. As deltarpm is doing its comparison, it adds an extra step. If less than half of the bytes in a segment have changed, deltarpm will do a byte-wise subtraction of the old binary’s bytes from the new binary’s bytes. This is then stored in an “add block”, which will mostly contain zeros.

While this does give us a block the size of the old binary, this block is very highly compressible (especially using bzip2).

While we may not be getting the compression levels that we might using courgette, we are doing far better than we would be using a naive bsdiff algorithm.

While courgette obviously has the advantage of size, it’s primary disadvantage is that it is very archicture-specific. The algorithm must know how to both assemble and disassemble binaries for each architecture it’s built for.

The advantage of the “add block” method is that it is generic to all architectures, but its disadvantage is that it won’t reach the same delta levels as courgette.

Having said that, I still haven’t had a chance to do any direct comparisons, which may tell us how much better courgette is than deltarpm.

(Updated: Tried to make it clear that bsdiff (the tool) doesn’t just use a naive bsdiff algorithm)

I hate virtual machines (was I hate NFS)

(Please note that you’ll probably want to read the previous post before this one)

So, I set up a new virtual machine running Fedora rather than CentOS 5.4 and migrated the services over to it. We did see an improvement, but just not enough. I went into the computer room during break, and several students had gray screens for Firefox and OpenOffice.org.

So I’ve switched us back over to the original configuration (running NFS off of the real servers). I have to admit that I’m quite curious as to what the load will be tomorrow when everyone logs in.

Thanks to those who commented on my last post. The general consensus seems to be that this just isn’t the best area to use a virtual machine.

EDIT:

We’ve been running the new system for a few days now and it’s much more responsive. Logins never take longer than 30 seconds, and none of the students are getting gray windows. Load during breaks now ranges from 7 to 20. I’d still love to see a much lower load, but at least we’re back to a reasonably fast system.