Building a Fedora module

Dresden

Today is the last day of Flock in Dresden, and it has been a really good Flock! On Thursday, I got the opportunity to build my first module for Fedora Modularity, and I just wanted to document the experience. As things go, it was quite easy, but there were a couple of things that tripped me up.

Please note that the tooling is still being worked on, so if you’re reading this much after it was published, some things will have probably changed.

Background

First, for those who haven’t been following, Modularity is a system that allows the user to select different streams of software in the same release. Traditionally in Fedora, there’s only one available version of any given package. There are situations where this can be limiting (say, if you need a particular version of a certain Python framework, or if you want to test drive the latest release of a package without upgrading your whole system to something unstable), and Modularity tries to fill this gap.

I thought this might be useful for LizardFS in Fedora. The current stable version of LizardFS is 3.12.0 (available in all active Fedora releases and in EPEL), but a release candidate for 3.13.0 has been published, and I thought it would be useful for those who actively want to test LizardFS to be able to install it.

On Thursday, I went to the Expert Helpdesk: Module Creation session and noticed that the room was fairly full. I was a bit concerned that I might not get the expert help that I needed, but that worry was put to rest when I found out that I was the only person in the room that hadn’t built a module before.

A certain Modularity developer (who shall remain unnamed to protect the guilty) volunteered me to go to the front and plug my laptop into the projector so the room full of experts could watch and advise me as I built my first module. No pressure at all.

I was advised that for my suggested use-case, I should just keep packaging the stable releases in Fedora as usual, but create a devel module stream for the unstable releases.

The process

The first step in building a module is to read the documentation. If you’re like me, you’ll read the first step, see that it talks about getting your package into Fedora, and quickly move on to the next step. It turns out, though, that you do need to go back and read that section, because you’ll need to create a new dist-git branch for each module stream you plan to offer.

Branching

One thing that isn’t (at the time of writing) documented is that, when requesting a branch, you must specify a service level, even though it seems these aren’t going to be used. The service level’s end date has to fall on a 12-01 or a 06-01, so I just made something up.

To request my branches, I ran:

fedpkg --module-name=lizardfs request-branch devel --sl rawhide:2020-12-01
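The module definition itself lives in a separate repository under the modules namespace, so that needs to be requested too. As far as I remember, the request looks something like the line below, but check fedpkg request-repo --help rather than trusting my memory:

# requests a new repo in the "modules" namespace of dist-git
fedpkg request-repo --namespace modules lizardfs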

Mohan approved my requests in record time (there are advantages to having everybody watching you), and I was ready to work with both dist-git and the modulemd git repo. I pushed the release candidate to my new devel branch in dist-git, and then it was time to create the module definition.

Creating the module definition

Step two, creating the module definition, should have been straightforward, but there was talk about using fedmod to automate the process. I installed it, ran into problems getting it working, and we decided it would be easier to just write the modulemd file by hand. I used the minimum template here as a starting point, saving it as lizardfs.yaml in the modulemd git repo. It was pretty simple to set up, and the only slightly tricky thing to remember is that the license is the license of the module definition itself, not of the package it builds.
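For the curious, the result looked roughly like the sketch below. I’m reconstructing it from memory, so treat the summary, description and rationale text as placeholders and the exact values as illustrative rather than canonical:

document: modulemd
version: 2
data:
  # summary and description below are placeholder text
  summary: LizardFS distributed filesystem (development stream)
  description: >-
    Pre-release builds of LizardFS, such as 3.13.0-rc1, for people who want
    to test the next version without waiting for a stable release.
  license:
    module:
      # the license of this modulemd file, not of LizardFS itself
      - MIT
  dependencies:
    - buildrequires:
        platform: []
      requires:
        platform: []
  components:
    rpms:
      lizardfs:
        rationale: The package this module stream delivers.
        # points at the dist-git branch created above
        ref: devel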

One thing I didn’t do, but really need to, is set up some profiles for LizardFS. A profile is a group of packages that fits a particular use case, making it easier for the end user to install exactly what they need.
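Something along these lines, added under the data: section of the modulemd, is what I have in mind; the subpackage split here is from memory, so double-check the actual RPM names before copying it:

  profiles:
    # hypothetical profiles; verify the subpackage names
    default:
      rpms:
        - lizardfs-client
    server:
      rpms:
        - lizardfs-master
        - lizardfs-chunkserver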

Building the module

Once I had pushed the modulemd, it was time to build! I pushed the build and waited… and waited… The way my module was defined, it was going to be built for every Fedora release that has modules (currently F28 (Server) and F29), so it took a while. Our time was running out, and we had a walking tour and scavenger hunt in Dresden (a brilliant idea, huge thanks to the organizers!), so that was the end of the workshop.
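For reference, submitting the build is a single command run from the modulemd checkout, at least with the fedpkg I had at the time:

fedpkg module-build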

While on the scavenger hunt (our team placed second, thanks to some very competitive members of the team), I got a notification that the module build had failed. I found that I was missing a dependency, so I fixed that and rebuilt, but ran into a rather strange problem: it tried to redo the first build rather than doing a new one.

It turns out that, as documented, you need to push an empty commit to the modulemd git repo in order for it to build from the latest dist-git repo.
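So the rebuild dance ends up looking something like this (from memory, so verify before pasting):

# in the modulemd git repo
git commit --allow-empty -m "Rebuild against updated dist-git"
git push
fedpkg module-build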

I had another failed build because lizardfs-3.13.0 doesn’t build on 32-bit architectures, so I wrote a small patch to fix that, pushed another empty commit to the modulemd git repo, and finally managed to build my very first module! Cue slow clap.

Thoughts

The module building process is actually quite simple and being able to build a module stream for all active Fedora releases simplifies the workload quite a bit. I’m seriously thinking about retiring LizardFS in Rawhide and only providing it via modules (with the stable stream being the default stream that you would get if you ran dnf install lizardfs-client).
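If I go that route, the user-facing difference is small. Something like the following is what I have in mind, though the devel stream is just my plan at this point, not anything that exists yet:

# default (stable) stream, same as today
dnf install lizardfs-client

# opt in to the development stream instead
dnf module enable lizardfs:devel
dnf install lizardfs-client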

The workflow is still a bit clunky, especially with having to manage both dist-git and the modulemd git repo, and I’d love to see that simplified, if possible, but it’s not nearly as difficult as I thought it might be.

A huge thanks to everyone (even Stephen) who put the time and effort into making modules work! I think they have a lot of potential in making the distribution far more relevant to developers and users alike. And a huge thanks to all the experts at the session. I know how hard it is to sit and watch someone make mistakes as they try to use something you’ve created, and you all were very patient with me.

Small file performance on distributed filesystems - Round 2

View from a bench

Last year, I ran some benchmarks on the GlusterFS, CephFS and LizardFS distributed filesystems, with some interesting results. I had a request to redo the test after a LizardFS RC was released with a FUSE3 client, since it is supposed to give better small file performance.

I did have a request last time to include RozoFS, but, after a brief glance at the documentation, it looks like it requires a minimum of four servers, and I only had three available. I also looked at OrangeFS (originally PVFS2), but it doesn’t seem to provide replication, and, in preliminary testing, it was over ten times slower than the alternatives. NFS was tested and its results are included as a baseline.

I once again used compilebench, which was designed to emulate real-life disk usage by creating a kernel tree, reading all the files in the tree, simulating a compile of the tree, running make clean, and finally deleting the tree.
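Each client run was an invocation along these lines; the directory and counts are illustrative rather than the exact values I used, so see compilebench --help for the real knobs:

compilebench -D /mnt/dfs/client1 -i 10 -r 30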

The test was much the same as last time, but with one important difference. Last time, the clients ran on the same machines as the servers, and LizardFS benefited hugely from that because its “prefer local chunkserver” feature skips the network completely when there’s a copy on the local server. This time around, the clients ran on completely separate machines from the servers. That removes LizardFS’s advantage, but I believe it better reflects how distributed filesystems are generally used.

I would like to quickly note that there was very little speed difference between LizardFS’s FUSE2 and FUSE3 clients. The numbers included are from the FUSE3 client, but they only differed by a few percentage points from the FUSE2 client.

A huge thank you to my former employer, the Lebanon Evangelical School for Boys and Girls, for allowing me to use their lab for my test. The test was run on nine machines, three running as servers and six running the clients. The three servers operated as distributed data servers with three replicas per file. Each client machine ran five clients, giving us a simulated 30 clients.

All of the data was stored on XFS partitions on SSDs for speed, except for CephFS, which used an LVM partition with Bluestore. After running the benchmarks with one distributed filesystem, it was shut down and its data deleted, so each distributed filesystem had the same disk space available to it.

The NFS server was set up to export its shares async (for speed). The LizardFS clients used the recommended mount options, while the other clients just used the defaults (the recommended small-file options for GlusterFS caused the test to hang). CephFS was mounted using the kernel module rather than the FUSE filesystem.
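For reference, the only non-default server-side tweak was the async export; the relevant /etc/exports line looked something like this (the path and network are placeholders):

# async trades safety for speed
/srv/nfs  192.168.1.0/24(rw,async)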

Before running the 30 clients simultaneously, I ran the test ten times in a row on a single client to get a single-client baseline. So let’s look at single-client performance first (click for the full-size chart):

So, apart from the simulated “make clean”, CephFS dominated these tests. It even managed to beat out NFS on everything except clean and delete, and delete was within a couple of seconds. LizardFS and GlusterFS were close in most of the tests with LizardFS taking a slight lead. GlusterFS, though, was much slower than the alternatives when it came to the delete test, which is consistent with last year’s test.

Next, let’s look at multiple-client performance. With these tests, I ran 30 clients simultaneously, and, for the first four tests, summed up their speeds to give me the total speed that the server was giving the clients. Because deletions were running simultaneously, I averaged the time for the final test.

Ok, just wow. If you’re reading and writing large numbers of small files, NFS is probably still going to be your best bet. It was over five times faster than the competition at writing and over twice as fast at reading. The compile phase is where things started to change, with both CephFS and LizardFS beating NFS, and LizardFS then took a huge lead in the clean and delete tests. Interestingly, it took only 50% longer for LizardFS to delete 30 clients’ files than a single client’s files.

After CephFS’s amazing performance in single-client mode, I was looking forward to some incredible results, but it really didn’t scale as well as I had hoped, though it remained competitive with the other distributed filesystems. Once again, LizardFS has shown that it’s really hard to beat when it comes to metadata operations, but its aggregate read and write performance was disappointing. And, once again, GlusterFS really struggled with this test. I wish it had worked with the small-file performance tuning enabled, as we might have seen better results.