Debug Linux Kernel Panics
In the Linux kernel, errors are usually divided into hard panics, such as kernel panics, and soft panics, such as Oopses. It's important to analyze their dumps and understand the information they report.
Kernel panic
A kernel panic is the action taken by the operating system when it encounters an internal fatal error from which it cannot safely recover. The system stops running immediately, so that it does not lose data or damage itself, and then halts or reboots.
A kernel panic may occur as a result of a hardware failure or of a software bug that leaves the operating system unable to continue after an error has occurred. When the system is in an unstable state, rather than risking security breaches and data corruption, the operating system stops to prevent further damage. This also makes the error easier to diagnose, and the system may restart automatically.
Oops
An Oops is a Linux kernel error that may affect system reliability. It occurs when the kernel detects a fault or an exception in kernel code, similar to a segmentation fault in user space, and reports the anomaly.
When an Oops occurs, the kernel dumps a message on the console that includes information such as:
- a description of the error
- the processor status and the code that was executing
- the CPU register contents at the time of the fault
The offending process or thread that triggered the Oops gets killed without releasing its locks or cleaning up its data structures, but in some cases the system can keep running. In other cases the system cannot resume normal operation, and the kernel has to stop immediately, turning the Oops into a kernel panic.
An Oops is a serious but non-fatal error; however, once it has occurred, the system can no longer be fully trusted.
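Whether an Oops escalates into a panic, and whether a panic reboots the machine automatically, is controlled by the kernel.panic_on_oops and kernel.panic sysctls described in the kernel's sysctl admin guide. The Python sketch below reads them from /proc/sys/kernel and interprets the values; it assumes a Linux host with procfs mounted at /proc, and the helper names are hypothetical.

```python
from pathlib import Path


def describe_panic_timeout(value):
    """Interpret kernel.panic: 0 = wait forever, < 0 = reboot now, > 0 = reboot after N seconds."""
    if value == 0:
        return "loop forever (no automatic reboot)"
    if value < 0:
        return "reboot immediately"
    return f"reboot after {value} seconds"


def describe_panic_on_oops(value):
    """Interpret kernel.panic_on_oops: 0 = try to continue, non-zero = panic."""
    return "panic on Oops" if value else "try to continue after an Oops"


def read_sysctl(name):
    """Read /proc/sys/kernel/<name> as an int; return None if it cannot be read."""
    try:
        return int((Path("/proc/sys/kernel") / name).read_text().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    panic = read_sysctl("panic")
    oops = read_sysctl("panic_on_oops")
    if panic is not None:
        print("kernel.panic:", describe_panic_timeout(panic))
    if oops is not None:
        print("kernel.panic_on_oops:", describe_panic_on_oops(oops))
```

On a machine that is not Linux, or where procfs is not available, read_sysctl simply returns None and nothing is printed.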
System.map
In Linux, the System.map file is a symbol table used by the kernel.
A symbol table is a lookup table that maps symbol names to their addresses in memory. A symbol name may be the name of a variable or the name of a function.
The System.map is required when the address of a symbol name, or the symbol name of an address, is needed. It is especially useful for debugging kernel panics and kernel Oopses. When CONFIG_KALLSYMS is enabled, the kernel does the address-to-name translation itself, so tools like ksymoops are not required.
System.map is generated at each kernel build, so addresses may change from one build to another; you need the System.map of the same kernel that reported the panic or Oops.
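To make the lookup concrete, here is a Python sketch of what resolving an address against System.map involves: parse the "<address> <type> <name>" lines, keep the code symbols, and find the closest symbol at or below the address. The sample entries are illustrative: the address of ocfs2_truncate_file comes from the GDB disassembly later in this post, while its type letter and the neighbouring next_function symbol are assumptions.

```python
import bisect

TEXT_TYPES = {"T", "t", "W", "w"}  # text (T/t) and weak (W/w) symbols


def load_system_map(lines):
    """Parse System.map lines ("<hex address> <type> <name>") into sorted (address, name) pairs."""
    symbols = []
    for line in lines:
        parts = line.split()
        if len(parts) == 3 and parts[1] in TEXT_TYPES:
            symbols.append((int(parts[0], 16), parts[2]))
    symbols.sort()
    return symbols


def resolve(symbols, address):
    """Return "name+0xoffset" for the nearest symbol at or below address, or None."""
    addresses = [addr for addr, _ in symbols]
    i = bisect.bisect_right(addresses, address) - 1
    if i < 0:
        return None
    base, name = symbols[i]
    return f"{name}+{address - base:#x}"


sample = [
    "ffffffff838958a0 t ocfs2_truncate_file",  # illustrative entry
    "ffffffff83896e00 t next_function",        # hypothetical neighbouring symbol
]
print(resolve(load_system_map(sample), 0xffffffff83896c21))  # ocfs2_truncate_file+0x1381
```

In practice you would pass the lines of the real System.map file instead of sample. With CONFIG_KALLSYMS the kernel already prints name+offset, so this is mainly useful for raw addresses.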
How to Find Kernel panics and Oops
Compile the kernel with debug information by editing the kernel configuration file .config in the root of the tree and setting CONFIG_DEBUG_INFO=y. Then, start the syslog daemon, syslogd.
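Before building, it can help to check that the options mentioned above are actually enabled. The sketch below parses .config lines in the format written by the kernel's Kconfig tools (CONFIG_NAME=value, and "# CONFIG_NAME is not set" for disabled options); the function names, and the choice to also check CONFIG_KALLSYMS, are assumptions for illustration.

```python
def config_options(lines):
    """Map CONFIG_* names to their values; disabled options ("# CONFIG_X is not set") map to "n"."""
    options = {}
    for raw in lines:
        line = raw.strip()
        if line.startswith("# CONFIG_") and line.endswith(" is not set"):
            options[line[2:-len(" is not set")]] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
    return options


def missing_debug_options(lines, wanted=("CONFIG_DEBUG_INFO", "CONFIG_KALLSYMS")):
    """Return the wanted options that are not set to y."""
    options = config_options(lines)
    return [name for name in wanted if options.get(name) != "y"]
```

For example, calling missing_debug_options on an opened .config file from the root of the kernel tree returns an empty list when both options are enabled.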
To get an Oops message to analyze, you can either:
- search for errors in the “open” tab of the Syzbot website
- or create a sample kernel module that triggers an Oops and test it. There are multiple guides online.
Let’s pick one of the latest open bugs on Syzbot at the time of this writing: https://syzkaller.appspot.com/bug?extid=b93b65ee321c97861072
The Oops generates the following report log:
netlink: 'syz.0.0': attribute type 61 has an invalid length.
loop0: detected capacity change from 0 to 32768
=======================================================
WARNING: The mand mount option has been deprecated and
and is ignored by this kernel. Remove the mand
option from the mount to silence this warning.
=======================================================
JBD2: Ignoring recovery information on journal
ocfs2: Mounting device (7,0) on (node local, slot 0) with ordered data mode.
loop0: detected capacity change from 32768 to 0
syz.0.0: attempt to access beyond end of device
loop0: rw=0, sector=17058, nr_sectors = 1 limit=0
(syz.0.0,5105,0):ocfs2_assign_bh:2416 ERROR: status = -5
(syz.0.0,5105,0):ocfs2_inode_lock_full_nested:2511 ERROR: status = -5
(syz.0.0,5105,0):ocfs2_prepare_inode_for_write:2262 ERROR: status = -5
(syz.0.0,5105,0):ocfs2_file_write_iter:2441 ERROR: status = -5
loop0: detected capacity change from 0 to 32767
OCFS2: ERROR (device loop0): int ocfs2_validate_inode_block(struct super_block *, struct buffer_head *): Invalid dinode #17058: signature = DE01
On-disk corruption discovered. Please run fsck.ocfs2 once the filesystem is unmounted.
OCFS2: File system is now read-only.
(syz.0.0,5105,0):ocfs2_assign_bh:2416 ERROR: status = -30
(syz.0.0,5105,0):ocfs2_inode_lock_full_nested:2511 ERROR: status = -30
(syz.0.0,5105,0):ocfs2_inode_lock_tracker:2695 ERROR: status = -30
(syz.0.0,5105,0):ocfs2_xattr_get:1335 ERROR: status = -30
(syz.0.0,5105,0):ocfs2_truncate_file:460 ERROR: bug expression: le64_to_cpu(fe->i_size) != i_size_read(inode)
(syz.0.0,5105,0):ocfs2_truncate_file:460 ERROR: Inode 17058, inode i_size = 0 != di i_size = 108086391056891904, i_flags = 0xeef82700
------------[ cut here ]------------
kernel BUG at fs/ocfs2/file.c:460!
Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI
CPU: 0 UID: 0 PID: 5105 Comm: syz.0.0 Not tainted 6.11.0-syzkaller-08068-g1ec6d097897a #0
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
RIP: 0010:ocfs2_truncate_file+0x1381/0x1560 fs/ocfs2/file.c:454
Code: 40 01 00 00 48 c7 c6 57 c7 11 8e ba cc 01 00 00 48 c7 c1 60 54 49 8c 4d 89 e8 4d 89 f9 50 41 56 e8 04 32 18 00 48 83 c4 10 90 <0f> 0b e8 c8 b6 37 08 f3 0f 1e fa 65 44 8b 3d 98 6b 7a 7c bf 07 00
RSP: 0018:ffffc90002ddf280 EFLAGS: 00010282
RAX: f6e02d0dd3845c00 RBX: ffff88801237542c RCX: f6e02d0dd3845c00
RDX: ffffc9000b6e1000 RSI: 0000000000004984 RDI: 0000000000004985
RBP: ffffc90002ddf4b0 R08: ffffffff81746dac R09: 1ffff11003f8519a
R10: dffffc0000000000 R11: ffffed1003f8519b R12: 1ffff1100246ea84
R13: 00000000000042a2 R14: 0180000000000000 R15: 0000000000000000
FS: 00007f29d21ff6c0(0000) GS:ffff88801fc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f29d24b84b8 CR3: 0000000011ee4000 CR4: 0000000000350ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
ocfs2_setattr+0x1217/0x1f50 fs/ocfs2/file.c:1209
notify_change+0xbca/0xe90 fs/attr.c:503
do_truncate+0x220/0x310 fs/open.c:65
handle_truncate fs/namei.c:3395 [inline]
do_open fs/namei.c:3778 [inline]
path_openat+0x2e1e/0x3590 fs/namei.c:3933
do_filp_open+0x235/0x490 fs/namei.c:3960
do_sys_openat2+0x13e/0x1d0 fs/open.c:1415
do_sys_open fs/open.c:1430 [inline]
__do_sys_creat fs/open.c:1506 [inline]
__se_sys_creat fs/open.c:1500 [inline]
__x64_sys_creat+0x123/0x170 fs/open.c:1500
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f29d237def9
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f29d21ff038 EFLAGS: 00000246 ORIG_RAX: 0000000000000055
RAX: ffffffffffffffda RBX: 00007f29d2535f80 RCX: 00007f29d237def9
RDX: 0000000000000000 RSI: 000000000000000a RDI: 0000000020000240
RBP: 00007f29d23f0b76 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000000000 R14: 00007f29d2535f80 R15: 00007ffc3f5b0328
</TASK>
Analyze the Oops dump
Now, let's walk through the information reported in the crash.
First of all, the opening lines give an overview of the bug: they briefly describe what went wrong and hint at where the issue might be. In this case, probably, a netlink attribute requires an exact length for some types.
netlink: 'syz.0.0': attribute type 61 has an invalid length.
loop0: detected capacity change from 0 to 32768
Then comes the Oops line, which contains the error code value in hex:
Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI
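Each bit of the error code has its own meaning:
- bit 0: 0 means no page was found, 1 means a protection fault
- bit 1: 0 means a read access, 1 means a write access
- bit 2: 0 means kernel mode, 1 means user mode
The [#1] that follows is the Oops counter: it tells how many Oopses have occurred since boot, so #1 is the first one.
Note that these bits describe x86 page faults. This particular Oops is an invalid opcode, raised by the ud2 instruction (the <0f> 0b bytes on the Code: line) that BUG() emits, so its error code is simply 0000.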
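The following Python sketch pulls the description, error code and [#N] counter out of an Oops line and decodes bits 0 to 2 as listed above; the bit decoding is only meaningful for page fault Oopses. The regular expression assumes the "Oops: [description: ]XXXX [#N]" layout seen in this report and in older reports without a description, and the helper names are hypothetical.

```python
import re

# x86 page fault error code bits: (meaning when 0, meaning when 1)
PF_BITS = {
    0: ("no page found", "protection fault"),
    1: ("read", "write"),
    2: ("kernel mode", "user mode"),
}

OOPS_LINE = re.compile(
    r"Oops: (?:(?P<desc>.*?): )?(?P<code>[0-9a-f]{4}) \[#(?P<count>\d+)\]"
)


def parse_oops_line(line):
    """Return (description, error code, Oops counter) from an "Oops:" line, or None."""
    m = OOPS_LINE.search(line)
    if m is None:
        return None
    return m.group("desc"), int(m.group("code"), 16), int(m.group("count"))


def decode_pf_error_code(code):
    """Describe bits 0-2 of a page fault error code."""
    return [PF_BITS[bit][(code >> bit) & 1] for bit in sorted(PF_BITS)]


print(parse_oops_line("Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI"))
# ('invalid opcode', 0, 1)
print(decode_pf_error_code(0x6))
# ['no page found', 'write', 'user mode']
```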
Then comes the line with the CPU information:
CPU: 0 UID: 0 PID: 5105 Comm: syz.0.0 Not tainted 6.11.0-syzkaller-08068-g1ec6d097897a #0
It contains multiple fields:
- CPU: 0 is the number of the CPU where the error occurred
- UID: 0 is the user ID of the process (0 is root)
- PID: 5105 is the process ID
- Comm: syz.0.0 is the name of the command that led to the error
- Not tainted denotes that the kernel was not tainted at the time of the event. If it was, the line would show Tainted: followed by characters indicating why the kernel got tainted earlier (for example, a proprietary module was loaded, a warning occurred, or an externally built module was loaded); these flags are defined in the kernel/panic.c file. The list of taint flags is available in the official documentation.
- 6.11.0-syzkaller-08068-g1ec6d097897a is the kernel release, and #0 is the build number
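A small Python sketch can split the CPU line into these fields. The regular expression is an assumption based on the format shown here: UID is optional because older kernels do not print it, the command name is assumed to contain no spaces, and the tainted line used in the tests is illustrative rather than taken from this report.

```python
import re

CPU_LINE = re.compile(
    r"CPU: (?P<cpu>\d+)"
    r"(?: UID: (?P<uid>\d+))?"                   # not printed by older kernels
    r" PID: (?P<pid>\d+)"
    r" Comm: (?P<comm>\S+)"                      # assumes no spaces in the command name
    r" (?P<taint>Not tainted|Tainted:[A-Z ]*?)"  # taint letters padded with spaces
    r" (?P<release>\d\S*)"                       # kernel release starts with a digit
)


def parse_cpu_line(line):
    """Return the CPU line fields as a dict, or None if the line does not match."""
    m = CPU_LINE.search(line)
    if m is None:
        return None
    taint = m.group("taint")
    return {
        "cpu": int(m.group("cpu")),
        "uid": int(m.group("uid")) if m.group("uid") is not None else None,
        "pid": int(m.group("pid")),
        "comm": m.group("comm"),
        "taint_flags": [] if taint == "Not tainted" else taint[len("Tainted:"):].split(),
        "release": m.group("release"),
    }
```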
RIP is the instruction pointer register, which holds the address of the instruction being executed, so the RIP line indicates the instruction that caused the crash:
RIP: 0010:ocfs2_truncate_file+0x1381/0x1560 fs/ocfs2/file.c:454
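In particular, 0010 is the value of the code segment (CS) register, ocfs2_truncate_file+0x1381/0x1560 is the symbol followed by the offset of the instruction within the function and the function's length (symbol+offset/length), and fs/ocfs2/file.c:454 is the associated file path and line number.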
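The RIP line can be parsed the same way. This Python sketch assumes the symbolized format Syzbot uses, with the file:line suffix treated as optional since raw console output may lack it; a user-space RIP such as 0033:0x7f29d237def9 carries no symbol and is deliberately not matched. The function name is hypothetical.

```python
import re

RIP_LINE = re.compile(
    r"RIP: (?P<cs>[0-9a-f]{4}):"
    r"(?P<symbol>[\w.]+)\+(?P<offset>0x[0-9a-f]+)/(?P<length>0x[0-9a-f]+)"
    r"(?: (?P<file>\S+):(?P<line>\d+))?"  # optional source location from symbolization
)


def parse_rip_line(line):
    """Split a kernel RIP line into segment, symbol, offset, length and source location."""
    m = RIP_LINE.search(line)
    if m is None:
        return None
    return {
        "cs": m.group("cs"),
        "symbol": m.group("symbol"),
        "offset": int(m.group("offset"), 16),
        "length": int(m.group("length"), 16),
        "file": m.group("file"),
        "line": int(m.group("line")) if m.group("line") else None,
    }
```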
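Finally, the call trace lists the functions that were called before the Oops occurred, starting from the most recent one. Frames marked [inline] were inlined by the compiler into the caller listed below them.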
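The call trace can also be turned into a list of frames programmatically. The Python sketch below assumes the Syzbot-style format shown in the report (one frame per line, optional +offset/length, optional file:line, optional [inline]) and stops at the first line that is not a frame; it does not handle the "? " prefix some kernels print for unreliable frames. The names are hypothetical.

```python
import re
from collections import namedtuple

Frame = namedtuple("Frame", "func location inline")

FRAME_LINE = re.compile(
    r"(?P<func>[\w.]+)"
    r"(?:\+0x[0-9a-f]+/0x[0-9a-f]+)?"  # optional +offset/length
    r"(?: (?P<loc>\S+:\d+))?"          # optional file:line
    r"(?P<inline> \[inline\])?"        # frame inlined into the caller below it
)


def parse_call_trace(lines):
    """Collect frames after "Call Trace:" until the first line that is not a frame."""
    frames, in_trace = [], False
    for line in lines:
        text = line.strip()
        if text == "Call Trace:":
            in_trace = True
        elif in_trace and text != "<TASK>":
            m = FRAME_LINE.fullmatch(text)
            if m is None:  # e.g. "RIP: ..." or "</TASK>" ends the kernel frames
                break
            frames.append(Frame(m.group("func"), m.group("loc"), m.group("inline") is not None))
    return frames
```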
Debug the Oops dump
When debugging a crash, it can be very useful to disassemble the built kernel with GDB, the GNU Project Debugger.
Follow this post to set up GDB with the specific vmlinux provided on the Syzbot report page.
After setting up GDB, take the function from the RIP value (ocfs2_truncate_file+0x1381/0x1560) and disassemble it with GDB:
(gdb) disassemble ocfs2_truncate_file
Dump of assembler code for function ocfs2_truncate_file:
0xffffffff838958a0 <+0>: endbr64
0xffffffff838958a4 <+4>: push %rbp
0xffffffff838958a5 <+5>: mov %rsp,%rbp
0xffffffff838958a8 <+8>: push %r15
0xffffffff838958aa <+10>: push %r14
...
End of assembler dump.
Now, add the offset from the RIP line to the start address of the function to find the address of the offending instruction; adding the length instead gives the address where the function ends:
0xffffffff838958a0 + 0x1381 = 0xffffffff83896c21
0xffffffff838958a0 + 0x1560 = 0xffffffff83896e00
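The arithmetic is simple enough to script. This Python sketch takes the start address printed by GDB and the offset/length from the RIP line; the sanity check that the offset falls inside the function is an addition of this sketch, and the function name is hypothetical.

```python
def fault_address(func_start, offset, length):
    """Return func_start + offset after checking that the offset lies inside the function."""
    if not 0 <= offset < length:
        raise ValueError("offset is outside the function")
    return func_start + offset


start = 0xffffffff838958a0  # start of ocfs2_truncate_file, from the GDB disassembly
print(hex(fault_address(start, 0x1381, 0x1560)))  # 0xffffffff83896c21
print(hex(start + 0x1560))  # end of the function: 0xffffffff83896e00
```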
Then scroll down in the disassembled code until you find that address, and use the list command to map it back to the C source:
(gdb) list *0xffffffff83896C21
0xffffffff83896c21 is in ocfs2_truncate_file (fs/ocfs2/file.c:454).
449 in fs/ocfs2/file.c
This points to the source location the error is coming from.
Now, you can open fs/ocfs2/file.c in the kernel tree at the commit associated with the bug report.
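The commit can be read from the kernel release string: Syzbot builds follow the git describe convention, where the -g<hash> suffix is the abbreviated commit id (here 1ec6d097897a). A minimal Python sketch that extracts it, with a hypothetical function name:

```python
import re


def commit_from_release(release):
    """Return the abbreviated commit hash from a "-g<hash>" suffix, or None if absent."""
    m = re.search(r"-g([0-9a-f]{7,40})\b", release)
    return m.group(1) if m else None


print(commit_from_release("6.11.0-syzkaller-08068-g1ec6d097897a"))  # 1ec6d097897a
```

With the hash in hand, running git show 1ec6d097897a:fs/ocfs2/file.c in a kernel checkout that contains that commit prints the file as it was when the kernel was built.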
Other resources
Debugging Analysis of Kernel panics and Kernel oopses using System Map