KDump on Gentoo
I’ve been trying to learn a little more about kernel debugging (part of my quest to learn a little more about btrfs), and I figured I’d post a quick howto on getting kernel crash dumps captured on Gentoo.
Certainly a more elegant solution than this could be devised with more automation. It wouldn’t take much to set up, and turning it into a package might be nice. Maybe a project for the future if I manage to generate enough panics.
There are a lot of posts about kdump you’ll find on Google, but most are distro-specific and leave out how to actually make it work if it isn’t already configured. The best post I’ve found is this one, which this procedure is largely based on.
To keep things simple I’m going to just use a single kernel for the system and recovery, which creates just a few limitations on your kernel configuration.
- emerge kexec-tools – you won’t get anywhere without this.
- Check your kernel configuration for the following settings:
CONFIG_KEXEC=y
(technically not needed, but what’s the point)
CONFIG_SYSFS=y
CONFIG_DEBUG_INFO=y
CONFIG_CRASH_DUMP=y
CONFIG_PROC_VMCORE=y
CONFIG_RELOCATABLE=y
- Edit your grub.conf and add crashkernel=64M to your kernel boot line (enough for up to around 12GB of system RAM)
- Create /etc/local.d/kdump.start containing (don’t forget to chmod it a+x):
#!/bin/bash
kexec -p /[path-to-kernel] --append="root=[root-device] single irqpoll maxcpus=1 reset_devices"
That’s it. Note that the kernel image has to be readable when this script runs, and the typical Gentoo config leaves /boot unmounted, so you’ll either need to remove noauto from your fstab or place a copy of your kernel elsewhere.
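A third option is to have the script mount /boot only for as long as it takes to load the kernel. The following is a rough sketch of that idea, not something from the original procedure: the KERNEL and ROOT_DEV defaults are placeholder values, and it assumes /boot has an fstab entry so that a bare "mount /boot" works.

```shell
#!/bin/bash
# Placeholder values (assumptions): substitute your own kernel path and root device.
KERNEL="${KERNEL:-/boot/vmlinuz}"
ROOT_DEV="${ROOT_DEV:-/dev/sda1}"

# Load the crash kernel, mounting /boot temporarily if the image is not readable.
load_crash_kernel() {
    local mounted_here=0 rc
    if [ ! -r "$KERNEL" ]; then
        # Relies on an fstab entry for /boot (assumption).
        mount /boot || return 1
        mounted_here=1
    fi
    kexec -p "$KERNEL" --append="root=$ROOT_DEV single irqpoll maxcpus=1 reset_devices"
    rc=$?
    # kexec has copied the image into reserved memory, so /boot can go away again.
    if [ "$mounted_here" -eq 1 ]; then
        umount /boot
    fi
    return "$rc"
}
```

To use it from /etc/local.d/kdump.start, add a final line that calls load_crash_kernel.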
I didn’t get this working with an initramfs. It is supposed to be possible, but the more complexity you add, the trickier it gets.
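One thing worth ruling out before testing is a missing kernel option. The helper below is a minimal sketch that checks a kernel config file for the options in the checklist above; the function name is made up for illustration, and the path in the usage note is the usual Gentoo location, which may differ on your system.

```shell
#!/bin/bash
# Options taken from the checklist above.
REQUIRED_OPTS="CONFIG_KEXEC CONFIG_SYSFS CONFIG_DEBUG_INFO CONFIG_CRASH_DUMP CONFIG_PROC_VMCORE CONFIG_RELOCATABLE"

# Print every required option that is not set to "y" in the given config file.
# Returns 0 when nothing is missing, 1 otherwise.
missing_kdump_opts() {
    local config="$1" opt missing=0
    for opt in $REQUIRED_OPTS; do
        if ! grep -q "^${opt}=y\$" "$config"; then
            echo "$opt"
            missing=1
        fi
    done
    return "$missing"
}
```

For example, missing_kdump_opts /usr/src/linux/.config prints nothing when all six options are enabled. If your kernel was built with CONFIG_IKCONFIG_PROC, you can instead decompress /proc/config.gz to a temporary file and check that.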
With these changes, whenever you get a kernel panic or lockup (hard or soft, if the kernel is set to detect them), the system will use kexec to run the kernel in crash mode, relocated to a reserved area of memory. The rest of RAM will be untouched. When the system boots up, log in and copy /proc/vmcore to a file – this is your crash dump. Then reboot your system to get back to a normal configuration – you shouldn’t continue to operate in this state.
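The copy step can be wrapped in a small helper so each dump gets its own timestamped file. This is only a sketch: the save_vmcore name and the /var/crash default are assumptions, and the destination needs enough free space, since the dump can be about as large as system RAM.

```shell
#!/bin/bash
# Copy a crash dump to a timestamped file and print the resulting path.
# $1: dump source (defaults to /proc/vmcore)
# $2: destination directory (defaults to /var/crash, an assumed location)
save_vmcore() {
    local src="${1:-/proc/vmcore}" dest_dir="${2:-/var/crash}" dest
    if [ ! -r "$src" ]; then
        echo "no crash dump found at $src" >&2
        return 1
    fi
    mkdir -p "$dest_dir" || return 1
    dest="$dest_dir/vmcore-$(date +%Y%m%d-%H%M%S)"
    # Flush to disk before reporting success, since a reboot follows.
    cp "$src" "$dest" && sync && echo "$dest"
}
```

In the crash kernel, running save_vmcore with no arguments copies /proc/vmcore into /var/crash. To test the whole chain, cat /sys/kernel/kexec_crash_loaded should print 1 once the crash kernel is loaded, and on kernels built with CONFIG_MAGIC_SYSRQ, running echo c > /proc/sysrq-trigger as root forces a crash on purpose.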
The obvious improvement to this is to create a script and run it with init= and have it copy the core file for you, then reboot…
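A rough, untested sketch of what such an init= script might look like follows. Everything specific in it is an assumption: DUMP_DEV is a hypothetical spare partition for dumps, DUMP_DIR must already exist on the root filesystem (which the kernel will typically have mounted read-only), and the script would be referenced by adding init=/path/to/this/script to the --append string in place of single.

```shell
#!/bin/bash
# Hypothetical init= script for the crash kernel.
# DUMP_DEV and DUMP_DIR are placeholders (assumptions); adjust to your system.
export PATH=/sbin:/bin:/usr/sbin:/usr/bin
DUMP_DEV="${DUMP_DEV:-/dev/sdb1}"
DUMP_DIR="${DUMP_DIR:-/mnt/dump}"

capture_and_reboot() {
    # Nothing is mounted for us when running as init, so mount /proc first.
    mount -t proc proc /proc
    if mount "$DUMP_DEV" "$DUMP_DIR"; then
        cp /proc/vmcore "$DUMP_DIR/vmcore-$(date +%Y%m%d-%H%M%S)"
        sync
        umount "$DUMP_DIR"
    fi
    # No regular init is running, so force the reboot directly.
    reboot -f
}

# capture_and_reboot   # uncomment when installing this as the init= target
```

The call is left commented out so the file can be read and tried piecemeal; as written, a failed mount of the dump device still falls through to the reboot rather than leaving the machine stuck in the crash kernel.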
This article looks like an ideal candidate for the new and shiny official Gentoo Wiki: http://wiki.gentoo.org 🙂
eliasp
November 14, 2011 at 12:04 pm
Well, my wiki skills aren’t super-sharp, but I started a page at:
http://wiki.gentoo.org/wiki/KernelCrashDumps
The formatting clearly needs more cleanup, and if somebody beats me to it I’ll be sure to treat it as a learning opportunity…
rich0
November 14, 2011 at 2:49 pm
Thanks for the wiki cleanup – looks nice!
rich0
November 15, 2011 at 10:29 pm
Doesn’t work for me 😦
Przemysław Czerepaniak
October 30, 2012 at 6:57 am
Good article, but how can I use kdump if my root is on LVM on RAID? Is there any quick solution without writing an entire init script by hand?
sss123next
March 12, 2013 at 3:05 am
Hmm, in that situation I’d probably use an initrd for your root, and a script to capture a dump and store it somewhere. I’m not sure I’d really want a recovery kernel messing with raid+lvm though there is no reason it couldn’t. You might try dumping the core to some other device. You can always boot to bash in an initramfs and then mount a usb flash drive or something like that manually.
An initramfs like dracut should let you build with full bash and have it drop to a shell. That isn’t what it was designed for, but it should work. Dracut is also modular and you can have it include anything you want when it is built (just be sure to grab dependencies or build static).
Just a few random thoughts…
rich0
March 12, 2013 at 7:46 am
Any tips on why it would load the rescue kernel without complaints, but not even try to start it when there’s a panic? I can load and start the kernel fine with the -l and -e options, but -p doesn’t seem to trigger…
tlhonmey
July 14, 2016 at 12:49 pm