KDump on Gentoo
I’ve been trying to learn a little more about kernel debugging (part of my quest to learn a little more about btrfs), and I figured I’d post a quick howto on getting kernel crash dumps captured on Gentoo.
Certainly a more elegant solution than this could be devised with more automation. It wouldn’t take much to set up, and turning it into a package might be nice. Maybe a project for the future if I manage to generate enough panics.
You’ll find a lot of posts about kdump on Google, but most are distro-specific and leave out how to actually make it work if it isn’t already configured. The best post I’ve found is this one, on which this procedure is largely based.
To keep things simple I’m going to just use a single kernel for the system and recovery, which creates just a few limitations on your kernel configuration.
- emerge kexec-tools – you won’t get anywhere without this.
- Check your kernel configuration for the following settings:
CONFIG_KEXEC=y (technically not needed, but what’s the point)
- Edit your grub.conf and add crashkernel=64M to your boot line (64M is enough for up to around 12GB of system RAM)
- Create /etc/local.d/kdump.start containing (don’t forget to chmod it a+x):
kexec -p /[path-to-kernel] --append="root=[root-device] single irqpoll maxcpus=1 reset_devices"
That’s it. Note that your kernel has to be reachable when this script runs, and the typical Gentoo config leaves /boot unmounted, so you’ll either need to remove noauto from the /boot entry in your fstab or place a copy of your kernel elsewhere.
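A third option is to have the script mount /boot just long enough to load the kernel. Here's a rough sketch of what /etc/local.d/kdump.start could look like in that case. It assumes /boot has an fstab entry and that mountpoint (from util-linux) is installed; the run helper, the DRY_RUN switch and the function name are my own inventions for illustration. Once kexec -p has copied the kernel into the reserved memory, the file itself is no longer needed, so unmounting afterwards should be safe.

```shell
#!/bin/sh
# Sketch of /etc/local.d/kdump.start for a system where /boot is noauto.
# Set DRY_RUN=1 to print the commands instead of running them.
run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# load_crash_kernel KERNEL ROOT_DEVICE
# (both arguments are yours to fill in, same as in the kexec line above)
load_crash_kernel() {
    kernel=$1
    root_dev=$2
    mounted_here=0

    # Mount /boot only if it is not already mounted.
    if ! mountpoint -q /boot 2>/dev/null; then
        run mount /boot && mounted_here=1
    fi

    run kexec -p "$kernel" \
        --append="root=$root_dev single irqpoll maxcpus=1 reset_devices"
    status=$?

    # Leave /boot the way we found it.
    [ "$mounted_here" -eq 1 ] && run umount /boot
    return "$status"
}
```

To use it, add a final line that calls load_crash_kernel with your own kernel path and root device. Running it once by hand with DRY_RUN=1 set shows what it would do without touching anything.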
I didn’t get this working with an initramfs. This is supposed to be possible, but the more complexity, the trickier it gets.
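Before rebooting, it can be worth double-checking the kernel configuration step. The snippet below is a minimal sketch that lists which kdump-related options are not built in. The option list is an assumption on my part, taken from the kernel's own kdump documentation rather than from anything Gentoo-specific (CONFIG_RELOCATABLE is what lets a single kernel serve as both the system and the crash kernel), and the function name is made up.

```shell
#!/bin/sh
# Print every kdump-related option that is not set to =y in a kernel config.
# The option list is an assumption based on the kernel's kdump documentation.
missing_kdump_options() {
    config_file=$1
    for opt in CONFIG_KEXEC CONFIG_CRASH_DUMP CONFIG_PROC_VMCORE CONFIG_RELOCATABLE; do
        grep -q "^${opt}=y\$" "$config_file" || echo "$opt"
    done
}

# Example (prints nothing if all four options are enabled):
#   missing_kdump_options /usr/src/linux/.config
```

If the running kernel was built with CONFIG_IKCONFIG_PROC, you could also run zcat /proc/config.gz into a temporary file and check that instead.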
With these changes, whenever you get a kernel panic or lockup (hard or soft, if the kernel is set to detect them), the system will use kexec to run the kernel in crash mode, relocated to a reserved area of memory. The rest of RAM will be untouched. When the system boots up, log in and copy /proc/vmcore to a file. This is your crash dump. Then reboot your system to get back to a normal configuration; you shouldn’t continue to operate in this state.
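If you'd rather not wait for a real panic to find out whether any of this works, you can check that the crash kernel is in place and then crash the box on purpose. The sketch below reads two sysfs files the kernel exposes for kexec, /sys/kernel/kexec_crash_loaded and /sys/kernel/kexec_crash_size; the function name and its optional arguments are my own, added so it can be pointed at other files for testing.

```shell
#!/bin/sh
# crash_kernel_ready [LOADED_FILE] [SIZE_FILE]
# Succeeds only if memory is reserved and a crash kernel is loaded.
crash_kernel_ready() {
    loaded_file=${1:-/sys/kernel/kexec_crash_loaded}
    size_file=${2:-/sys/kernel/kexec_crash_size}

    size=$(cat "$size_file" 2>/dev/null || echo 0)
    loaded=$(cat "$loaded_file" 2>/dev/null || echo 0)
    size=${size:-0}
    loaded=${loaded:-0}

    if [ "$size" -eq 0 ]; then
        echo "no memory reserved; check crashkernel= on the boot line"
        return 1
    fi
    if [ "$loaded" -ne 1 ]; then
        echo "memory reserved, but no crash kernel loaded"
        return 1
    fi
    echo "crash kernel loaded, $size bytes reserved"
}

# To force a test panic (as root; this crashes the machine immediately,
# so save your work and sync first):
#   crash_kernel_ready && echo c > /proc/sysrq-trigger
```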
The obvious improvement to this is to create a script and run it with init= and have it copy the core file for you, then reboot…
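I haven't run this for real, but such a script might look something like the sketch below. It assumes the root filesystem comes up read-only and has room for a file roughly the size of your RAM, and that reboot -f is available (it is with sysvinit). The run helper, the DRY_RUN switch, the function name and the dump file naming are all made up for illustration.

```shell
#!/bin/sh
# Sketch of a script to run via init= in the crash kernel.
# Set DRY_RUN=1 to print the commands instead of running them.
run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# kdump_init DUMP_DIR
kdump_init() {
    dump_dir=$1

    run mount -t proc proc /proc
    run mount -o remount,rw /

    # /proc/vmcore only exists when booted as the crash kernel.
    if [ -n "${DRY_RUN:-}" ] || [ -e /proc/vmcore ]; then
        run mkdir -p "$dump_dir"
        run cp /proc/vmcore "$dump_dir/vmcore-$(date +%Y%m%d-%H%M%S)"
        run sync
    fi

    # Skip the normal shutdown sequence; nothing else is running.
    run reboot -f
}
```

To try it, save it somewhere on the root filesystem, add a final line such as kdump_init /var/crash (a hypothetical directory), and put init= with the script's path in the kexec --append string in place of single. Keep init= out of your normal grub boot line, or every boot will end in a reboot.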