Where’s my blankey? (aka IPv6 on a flat network)

Familiarity is comforting

At the Lebanon Evangelical School – Loueizeh, we try to push the envelope when it comes to testing new technology.  Our website makes efficient use of some of the latest CSS technology from 1996, and we have just started experimenting with this new-fangled IPv6 technology, also from the mid 90’s.  Cue Macarena single…

<tl;dr> If you’re on a flat network and want to match MAC addresses to IPv6 addresses, ditch ISC’s DHCPv6 server, and instead use dhcpy6d </tl;dr>

After years of asking, our ISP has finally given us an IPv6 /48 prefix, so I finally have the privilege of setting up an IPv6 network to coexist with our IPv4 network.  Yay!  As an aside, I think we’re one of the first organizations in Lebanon to be working with IPv6.  A couple of years ago, I asked our old ISP about getting an IPv6 address and was told that they didn’t have any.  There were plenty of IPv4 addresses to go around, so why bother?  But I digress…

Our current IPv4 topology is… flat.  Very flat.  We use a single 10.x.y.z internal IP range, given out by two DHCP servers, where x is a fixed number, y is the category of the device and z is the individual address.  The categories allow us to divide devices into guest, staff and student groups.  We are very aware that MAC addresses can be easily spoofed, so these categories are primarily used for traffic shaping and nothing security related.  I am aware of (and would like to start working with) VLANs, but not all of our networking equipment supports them.

So our primary requirement for IPv6 is that we are able to set up corresponding categories to match our IPv4 categories (y in 10.x.y.z).  Another requirement is that we use some form of static IP addresses for logging purposes and that these static addresses be somehow tied to a device’s corresponding IPv4 address, mainly for convenience.

My first plan was to use SLAAC, which allows us to use 16 bits to specify categories (double what we have with our IPv4 addresses), while still allowing the client 64 bits to use in coming up with its own address.  Unfortunately, SLAAC falls down for us on both categories and static addresses.  To implement categories properly, we’d need to divide our network into VLANs, and we still have a few unmanaged switches that don’t know how to handle them.  And, while some devices just use their MAC addresses to generate their unique address, others (especially the more modern mobile OS’s which automatically implement RFC4941) will generate random unique addresses, which causes problems if we want to track who’s doing what.

Plan B was to use DHCPv6, which isn’t that much of a stretch, seeing as we’re already using DHCP on our IPv4 network.  Granted, since Android doesn’t work with DHCPv6 (really, Google?), that means all of the Android devices on our network will be stuck on IPv4, but, since we’re still in the experimental stage, I’m ok with that.  With DHCPv6 I can choose which address each device gets, allowing me to specify both categories and static (randomly generated) IP addresses.  So I started putting together an ISC DHCPv6 server, looked at the examples, and… “Where the heck am I supposed to put the MAC address?”

You see, in their infinite wisdom, the IETF decided that MAC addresses were no longer sufficient to tell the DHCPv6 server what IP address we should get.  No, now we get to use these new things called DUIDs which last forever like MAC addresses, but stay the same for all interfaces on a system, unlike MAC addresses.  Woo-hoo!  There are some small exceptions, especially with a computer or phone as opposed to an embedded device.  The DUID will not be automatically be the same if you dual boot.  Or if you wipe the OS.  Or, if like us, you make extensive use of netbooting for system maintenance.  But other than that, DUIDs will last forever.

The best part is that there are multiple ways of generating a DUID (including straight from the MAC address), but the server doesn’t have any say in telling the clients how it wants its DUIDs generated.  That means that clients like NetworkManager will quite happily generate a DUID based on /etc/machine-id, which is itself randomly generated the first time a system is booted.  Not very useful if you want to pass the same IP address whether a system is being netbooted into maintenance mode or booting normally off of the hard drive.

Now, all this wouldn’t be insurmountable, except for the fact that I currently have a database of MAC addresses matched with IPv4 addresses (complete with categories), but, because DUIDs don’t necessarily have anything to do with the MAC address, I have no idea how to match that same device to the IPv6 address I want to give it.

Apparently ISC’s DHCPv6 server has implemented a relatively new specification that allows link-local addresses to be sent to the DHCPv6 server through a relay, but our network is already flat, so I shouldn’t need to screw around with a relay.

There’s also some ambiguity about the Confirm message in DHCPv6.  The specific section says:

When the server receives a Confirm message, the server determines whether the addresses in the Confirm message are appropriate for the link to which the client is attached. If all of the addresses in the Confirm message pass this test, the server returns a status of Success. If any of the addresses do not pass this test, the server returns a status of NotOnLink.

The question is, if I’ve changed the static IP address for a client, is the old address still appropriate for the link?  As the sysadmin, my answer is, “No, please discontinue use of the old address immediately.”  Unfortunately, the ISC DHCPv6 server disagrees with me and will happily confirm the old addresses until the cows come home.

After a bit of searching, I’ve found a solution that I’m actually quite happy with.  The Leibniz Institute for Solid State and Materials Research in Dresden has released a DHCPv6 server called dhcpy6d written in Python that allows you to match based on MAC address, DUID or a combination of both.  There have been a few bugs in it, and it’s not yet handling Confirm messages the way I’d like it to, but upstream has been very responsive, and I’m looking forward to having an IPv6 system that works for us.

We’re not quite up and running 100% yet, but I hope to be there by the second week of November.  Of course, I had hoped to get some work done during this long weekend, but God had other plans (at least, judging by the lightning that destroyed quite a bit of my networking equipment at home just before school let out for the weekend).

Solving the mystery of the disappearing bluetooth device

This is a true[1] story

One of the features my laptop comes with is Bluetooth, which I’ve found to be quite handy considering all the highly important uses I have for Bluetooth (using Bluetooth tethering on my phone when traveling, controlling my presentations with my phone, using a Wii-mote for playing SuperTuxKart portable Bluetooth controller with built-in accelerometer to analyze the consistency of the matrices used when rendering three-dimensional objects onto a two-dimensional field).

About three months ago, I started to run into problems. Not the easy kind of problem where “BUG: unable to handle kernel paging request at 0000ffffd15ea5e” brings the laptop to an abrupt stop, but instead the kind of problem that causes real trouble.

My Bluetooth module starts to randomly reset itself. I’ll be working merrily, trying to connect my phone or the… portable Bluetooth controller… and, halfway through the process, it will hang. Kernel logs show that the Bluetooth module has been unplugged from the USB bus and then reconnected. Which, when you think about it, makes a whole lot of sense, given that the Bluetooth module is built into the WiFi card which is screwed onto the motherboard.

When faced with kernel logs that boggle the mind, the most logical thing to do is downgrade the kernel. I know that I was able to successfully… analyze the matrices used for, oh, whatever it was… back at the beginning of June, which means I had working Bluetooth on June 1. Let’s see what kernel was latest then, download and install it, boot from it, and…

kernel: usb 8-4: USB disconnect, device number 3
kernel: usb 8-4: new full-speed USB device number 4 using ohci-pci

#$@&%*!

Ok, the hardware must be dying.  Stupid Atheros card.  No idea why it’s just the Bluetooth and not the WiFi as well, but we’re in Ireland and I’m on eBay, so I’ll just order another one.  Made by a different company.  A week later, a slightly used Ralink combo card shows up. I plug it in, fire her up, and…

kernel: usb 8-4: USB disconnect, device number 3
kernel: ohci-pci 0000:00:13.0: HC died; cleaning up
kernel: ohci-pci 0000:00:13.0: frame counter not updating; disabled

Double #$@&%*! Now the Bluetooth module is completely gone and the only way to get it back is to reboot. Grrrrr.

At this point I’ve got a hammer in my hand, my laptop in front of me, and the only thing keeping me from submitting a video for a new OnePlus One is my wife warning me that we’re not going to be buying me a new laptop any time this decade.

So I take a deep breath, calmly return the hammer to the toolbox (no, dear, I have no idea how that dent got on the toolbox), and decide to instead go down the road less traveled. I open up Fedora’s bugzilla and start preparing my bug report, taking special care to only use words that I’d be willing to say in front of my children. “…so the Bluetooth module keeps getting disconnected. It’s almost like the USB bus is cutting its power for some stupid…”

Wait a minute! Just before we traveled to Ireland, I remember experimenting with PowerTOP. And PowerTOP has this cool feature that allows you to automatically enable all power saving options on boot. And I might have enabled it. So I check, and, yes I have turned on autosuspend for my Bluetooth module. I turn it off, try to connect my… portable Bluetooth controller… and it works, first time. I do some… matrix analysis… with it and everything continues to work perfectly.

So I am an idiot. I close the page with the half-finished bug report and go to admit to my wife that I just wasted €20 on a WiFi card that I didn’t really need.  And, uh, if any Atheros or Ralink people read this, well, I’m sorry for any negative thoughts I may have had about your WiFi cards.

[1] Well, mostly true, anyway. Some of the details might be mildly exaggerated.