linux System table

Notes on the Linux kernel symbol table.

1.Symbols
In the context of programming, a symbol is the building block of a program: it is a variable name or a function name. It should be no surprise that the kernel has symbols, just like the programs you write. The difference is, of course, that the kernel is a very complicated piece of code and has many, many global symbols.

2.Kernel Symbol Table

The kernel doesn’t use symbol names like BytesRead(). It’s much happier knowing a variable or function name by the variable or function’s address, like c0343f20. Humans, on the other hand, do not appreciate addresses like c0343f20. We prefer to use symbol names like BytesRead(). Normally, this doesn’t present much of a problem. The kernel is mainly written in C, so the compiler/linker allows us to use symbol names when we code and allows the kernel to use addresses when it runs. Everyone is happy.

There are situations, however, where we need to know the address of a symbol (or the symbol for an address). This is done with a symbol table, and is very similar to how gdb can give you the function name from an address (or an address from a function name). A symbol table is a listing of all symbols along with their addresses. Here is an example of a symbol table:

   c03441a0 B dmi_broken
   c03441a4 B is_sony_vaio_laptop
   c03441c0 b dmi_ident
   c0344200 b pci_bios_present
   c0344204 b pirq_table
   c0344208 b pirq_router
   c034420c b pirq_router_dev
   c0344220 b ascii_buffer
   c0344224 b ascii_buf_bytes

You can see that the variable named dmi_broken is at the kernel address c03441a0.
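
As a rough illustration of the "address type name" layout shown above, here is a minimal user-space sketch in C that splits one such line into its three fields. The names struct ksym and parse_ksym_line, and the 128-byte name limit, are made up for this example; they are not kernel APIs.

```c
#include <assert.h>
#include <stdio.h>

/* One entry of a symbol table in the "address type name" text format.
 * (Hypothetical helper type for this sketch, not a kernel structure.) */
struct ksym {
    unsigned long long addr; /* symbol address, parsed from hex */
    char type;               /* type letter, e.g. 'B' or 'b' */
    char name[128];          /* symbol name (size limit is an assumption) */
};

/* Parse one line such as "c03441a0 B dmi_broken".
 * Returns 1 on success and 0 if the line does not match the format. */
int parse_ksym_line(const char *line, struct ksym *out)
{
    assert(line != NULL && out != NULL);
    return sscanf(line, "%llx %c %127s",
                  &out->addr, &out->type, out->name) == 3;
}
```

Feeding it the first line of the listing fills in addr = 0xc03441a0, type = 'B' and name = "dmi_broken".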

3.System.map

There are 2 files that are used as a kernel symbol table:

System.map
/proc/kallsyms

Every time you compile a new kernel, the addresses of various symbol names are bound to change.

The difference between “/proc/kallsyms” and “System.map”:

All global symbols are listed in /proc/kallsyms. It is a “proc file” that is created on the fly when a kernel boots up. Actually, it’s not really a disk file; it’s a representation of kernel data which is given the illusion of being a disk file. If you don’t believe me, try finding the file size of /proc/kallsyms. Therefore, it will always be correct for the kernel that is currently running.

However, System.map is an actual file on your filesystem. When you compile a new kernel, your old System.map has wrong symbol information. A new System.map is generated with each kernel compile and you need to replace the old copy with your new copy.

3.1.System.map

“System.map” is a file (produced via nm) containing the symbol names and addresses of the Linux kernel binary, vmlinux.

Its primary use is in debugging. If a kernel “oops” message appears, the utility ksymoops can be used to decode the message into something useful for developers.

// Command that generates System.map:
scripts/mksysmap:
$NM -n $1 | grep -v '\( [aNUw] \)\|\(__crc_\)\|\( \$[adt]\)\|\( .L\)' > $2

System.map:

01000000 A phys_startup_32                                                                                                 
c1000000 T _text
c10000d4 t default_entry
c1167000 B __bss_stop
c1177000 b .brk.pagetables
...

The second column is the symbol type, given as an nm type letter.
[Figure: table of symbol types]
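
To make the type column more concrete, here is a small sketch in C that maps the type letters appearing in the listings on this page to a short description. The descriptions follow the usual nm conventions as I understand them, the function names are invented for this example, and only a subset of letters is covered.

```c
#include <assert.h>
#include <ctype.h>

/* Describe an nm-style type letter. Only the letters seen in the listings
 * above are handled; everything else is reported as "other". */
const char *ksym_type_description(char type)
{
    switch (tolower((unsigned char)type)) {
    case 'a': return "absolute address";
    case 'b': return "uninitialized data (BSS)";
    case 'd': return "initialized data";
    case 'r': return "read-only data";
    case 't': return "text (code)";
    default:  return "other";
    }
}

/* For the letters handled above, uppercase usually means a global symbol
 * and lowercase a local one. Some other letters do not follow this rule. */
int ksym_is_global(char type)
{
    assert(isalpha((unsigned char)type));
    return isupper((unsigned char)type) != 0;
}
```

For example, "T _text" is a global code symbol, while "t default_entry" is a local one.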

3.2./proc/kallsyms

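/proc/kallsyms contains the symbols of dynamically loaded modules as well as of the statically linked code, whereas System.map is the symbol table of the static code only. /proc/kallsyms contains the kernel's function symbols (including those not exported with EXPORT_SYMBOL) and global variables (those exported with EXPORT_SYMBOL). Whether other data symbols are listed depends on CONFIG_KALLSYMS_ALL.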

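3.2.1.How it is generated:

scripts/mksysmap generates System.map (see the command above)
kernel/kallsyms.c implements /proc/kallsyms
scripts/kallsyms.c processes the symbols of vmlinux (.tmp_vmlinux) and generates kallsyms.S (.tmp_kallsyms.S); the kernel build then links kallsyms.S (the kernel symbol table) into the kernel image (uImage)

After the kernel boots, kernel/kallsyms.c reads this built-in symbol table to produce /proc/kallsyms.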

3.2.2.Viewing /proc/kallsyms

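To keep kernel symbol addresses from leaking, the kernel provides the /proc/sys/kernel/kptr_restrict interface, so that ordinary users other than root cannot view symbol addresses directly. Its value can be 0, 1 or 2. For details, see "kptr_restrict for hiding kernel pointers".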

Note: kptr_restrict affects many of the places where the kernel exports addresses and symbol table information, such as /proc/modules.
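
A hedged sketch in C of what this means for a reader of /proc/kallsyms: when addresses are hidden, every line shows an all-zero address. The summaries of the three values below paraphrase the older Documentation/sysctl/kernel.txt text as I understand it; newer kernels add further conditions, so treat them as an approximation. Both function names are invented for this example.

```c
#include <assert.h>
#include <stdio.h>

/* Rough summary of a kptr_restrict value (paraphrased, approximate;
 * newer kernels apply additional checks even for value 0). */
const char *kptr_restrict_summary(int value)
{
    switch (value) {
    case 0:  return "no restriction";
    case 1:  return "hidden unless the reader has CAP_SYSLOG";
    case 2:  return "always hidden";
    default: return "unknown value";
    }
}

/* Returns 1 if a kallsyms-style line shows an all-zero address, which is
 * what a reader sees when kernel pointers are hidden from them. */
int ksym_address_hidden(const char *line)
{
    unsigned long long addr;

    assert(line != NULL);
    if (sscanf(line, "%llx", &addr) != 1)
        return 0; /* not a symbol line at all */
    return addr == 0;
}
```

A tool that resolves addresses from /proc/kallsyms could run such a check first and ask for root privileges when it fails, instead of silently working with zeros.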

For the code and documentation, see:

kernel/kallsyms.c
lib/vsprintf.c
Documentation/sysctl/kernel.txt
Documentation/printk-formats.txt

Comparing System.map with /proc/kallsyms shows that the same symbol name has different addresses in the two, as shown below:

cat System.map | grep start_kernel
ffffffff81d2b5f0 T x86_64_start_kernel
ffffffff81d2bb33 T start_kernel
ffffffff81d2ee1b T xen_start_kernel
cat /proc/kallsyms | grep start_kernel
ffffffffa4d2b5f0 T x86_64_start_kernel
ffffffffa4d2bb33 T start_kernel
ffffffffa4d2ee1b T xen_start_kernel

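The Linux kernel can use ASLR (address space layout randomization), so the run-time addresses in /proc/kallsyms differ from the link-time addresses in System.map. At the moment only the base address can be randomized, which is why all three symbols above are shifted by the same offset (0x23000000). See the description of CONFIG_RANDOMIZE_BASE in arch/xxx/Kconfig in the kernel sources for details.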
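
Because only the base moves, one symbol that appears in both files is enough to translate any other System.map address. A minimal sketch in C, with hypothetical helper names, assuming the kernel was shifted upward as in the listing above:

```c
#include <assert.h>
#include <stdint.h>

/* Offset between the link-time address (System.map) and the run-time
 * address (/proc/kallsyms) of the same symbol.
 * Assumption: the run-time address is the higher of the two. */
uint64_t kaslr_offset(uint64_t sysmap_addr, uint64_t runtime_addr)
{
    assert(runtime_addr >= sysmap_addr);
    return runtime_addr - sysmap_addr;
}

/* Translate a System.map address into the running kernel's addresses. */
uint64_t sysmap_to_runtime(uint64_t sysmap_addr, uint64_t offset)
{
    return sysmap_addr + offset;
}
```

With start_kernel from the two listings, kaslr_offset(0xffffffff81d2bb33, 0xffffffffa4d2bb33) gives 0x23000000, and adding that offset to the System.map address of x86_64_start_kernel yields ffffffffa4d2b5f0, matching /proc/kallsyms.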

4.EXPORT_SYMBOL

EXPORT_SYMBOL exposes a function or variable to other modules. Suppose you wrote a function in one kernel module and want to call it from another module: you have to export that function, and the Linux kernel provides EXPORT_SYMBOL for this. For details, see the EXPORT_SYMBOL documentation.

You can think of kernel symbols as visible at three different levels in the kernel source code:

“static”, and therefore visible only within their own source file
“external”, and therefore potentially visible to any other code built into the kernel itself
“exported”, and therefore visible and available to any loadable module.

The kernel uses two macros to export symbols:

EXPORT_SYMBOL exports the symbol to any loadable module
EXPORT_SYMBOL_GPL exports the symbol only to GPL-licensed modules.

Besides reading the kernel source to find out whether a symbol is exported, is there an easier way to tell? Sure! Every exported symbol has a companion symbol prefixed with __ksymtab_, e.g.

ffffffff81a4ef00 r __ksymtab_printk
ffffffff81a4eff0 r __ksymtab_jiffies
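
The check can be automated. Below is a minimal user-space sketch in C that scans kallsyms-style text for an entry named __ksymtab_<name>. The function name symbol_is_exported and the buffer sizes are assumptions made for this example, and the input is expected to be well-formed "address type name" lines.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if `listing` (kallsyms-style text, one symbol per line)
 * contains an entry named "__ksymtab_<name>", else 0. */
int symbol_is_exported(const char *listing, const char *name)
{
    char wanted[160];
    const char *p = listing;

    assert(listing != NULL && name != NULL);
    snprintf(wanted, sizeof(wanted), "__ksymtab_%s", name);

    while (*p) {
        unsigned long long addr;
        char type, sym[160];

        /* Compare the whole name so "print" does not match "printk". */
        if (sscanf(p, "%llx %c %159s", &addr, &type, sym) == 3 &&
            strcmp(sym, wanted) == 0)
            return 1;

        p = strchr(p, '\n'); /* advance to the next line */
        if (!p)
            break;
        p++;
    }
    return 0;
}
```

Given the two lines above, it reports printk and jiffies as exported and any other name as not exported.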

How to access a non-exported symbol

For each symbol in the kernel, we have an entry in /proc/kallsyms, and we have addresses for all of them. Since we are in the kernel, we can see any bit we want to see! Just read from that address. Let’s take resume_file as an example. Source code comes first:

#include <linux/module.h>
#include <linux/kallsyms.h>
#include <linux/string.h>
  
MODULE_LICENSE("GPL");
MODULE_DESCRIPTION("Access non-exported symbols");
MODULE_AUTHOR("Stephen Zhang");
  
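static int __init lkm_init(void)
{
    const char *sym_name = "resume_file";
    /* Note: since Linux 5.7 kallsyms_lookup_name() is itself no longer
     * exported to modules, so this only builds on older kernels. */
    unsigned long sym_addr = kallsyms_lookup_name(sym_name);
    char filename[256];

    /* kallsyms_lookup_name() returns 0 when the symbol is not found. */
    if (!sym_addr)
        return -ENOENT;

    /* strncpy() does not terminate the buffer when the source is too
     * long, so terminate it explicitly. */
    strncpy(filename, (char *)sym_addr, sizeof(filename) - 1);
    filename[sizeof(filename) - 1] = '\0';

    printk(KERN_INFO "[%s] %s (0x%lx): %s\n", __this_module.name, sym_name, sym_addr, filename);

    return 0;
}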
  
static void __exit lkm_exit(void)
{
}
  
module_init(lkm_init);
module_exit(lkm_exit);

Here, instead of parsing /proc/kallsyms to find a symbol’s address, we use kallsyms_lookup_name() to do it. Then, we just treat the address as a char *, which matches resume_file (a character buffer holding the path), and read it using strncpy().
Let’s see what happens when we run:

sudo insmod lkm_hello.ko
dmesg | tail -n 1
[lkm_hello] resume_file (0xffffffff81c17140): /dev/sda6
grep resume_file /proc/kallsyms
ffffffff81c17140 d resume_file

Yep! We did it! And we see that the symbol address returned by kallsyms_lookup_name() is exactly the same as the one in /proc/kallsyms.

5.Oops

What is the most common bug in the Linux kernel? The segfault. Except here, the notion of a segfault is much more complicated and can be, as you can imagine, much more serious. When the kernel dereferences an invalid pointer, it’s not called a segfault – it’s called an “oops”. An oops indicates a kernel bug and should always be reported and fixed.

Note that an oops is not the same thing as a segfault. Your program (usually) cannot recover from a segfault. The kernel doesn’t necessarily have to be in an unstable state when an oops occurs. The Linux kernel is very robust; the oops may just kill the current process and leave the rest of the kernel in a good, solid state.

5.1.The difference between oops and panic
An oops is not a kernel panic. In a panic, the kernel cannot continue; the system grinds to a halt and must be restarted. An oops may cause a panic if a vital part of the system is destroyed. An oops in a device driver, for example, will almost never cause a panic.

When an oops occurs, the system will print out information that is relevant to debugging the problem, like the contents of all the CPU registers and the location of page descriptor tables. In particular, the value of EIP (the instruction pointer; RIP on x86-64) is printed.

5.2.Interpreting oopses with System.map
The System.map is required when the address of a symbol name, or the symbol name of an address, is needed. It is especially useful for debugging kernel panics and kernel oopses. The kernel does the address-to-name translation itself when CONFIG_KALLSYMS is enabled so that tools like ksymoops are not required.
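
The address-to-name translation itself can be sketched in a few lines of C: since System.map is sorted by address, the symbol containing an address is the last entry whose address is not greater than it. This is only an illustration of the idea, not the code used by the kernel or by ksymoops; struct sym_entry and resolve_address are names invented for this example.

```c
#include <assert.h>
#include <stddef.h>

/* One System.map entry (hypothetical helper type for this sketch). */
struct sym_entry {
    unsigned long long addr;
    const char *name;
};

/* Binary search in a table sorted by ascending address: returns the last
 * entry whose address is <= addr, or NULL if addr is below the first one. */
const struct sym_entry *resolve_address(const struct sym_entry *table,
                                        size_t n, unsigned long long addr)
{
    size_t lo = 0, hi = n; /* search window is [lo, hi) */

    assert(table != NULL || n == 0);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (table[mid].addr <= addr)
            lo = mid + 1;
        else
            hi = mid;
    }
    /* lo now counts the entries with an address <= addr. */
    return lo == 0 ? NULL : &table[lo - 1];
}
```

With the System.map excerpt from section 3.1 (_text at c1000000, default_entry at c10000d4, __bss_stop at c1167000), the address c10000e0 resolves to default_entry with an offset of 0xc, which is the "symbol+offset" form seen in oops output.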

References:
https://www.jianshu.com/p/289f10ccef2d