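How to do computations with addresses at compile/linking time? Answers

【Question title】: How to do computations with addresses at compile/linking time?
【Posted】: 2015-07-11 19:20:13
【Question description】:

I wrote some code to initialize the IDT, which stores each 32-bit address in two non-adjacent 16-bit halves. The IDT can be stored anywhere; you tell the CPU where it is by running the LIDT instruction.

This is the code that initializes the table: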

void idt_init(void) {
    /* Unfortunately, we can't write this as loops. The first option,
     * initializing the IDT with the addresses, here looping over it, and
     * reinitializing the descriptors didn't work because assigning
     * a uintptr_t (from (uintptr_t) handler_func) to a descr (a.k.a.
     * uint64_t), according to the compiler, "isn't computable at load
     * time."
     * The second option, storing the addresses as a local array, simply is
     * inefficient (took 0.020ms more when profiling with the "time" command
     * line program!).
     * The third option, storing the addresses as a static local array,
     * consumes too much space (the array will probably never be used again
     * during the whole kernel runtime).
     * But IF my argument against the third option will be invalidated in
     * the future, THEN it's the best option I think. */

    /* Initialize descriptors of exception handlers. */
    idt[EX_DE_VEC] = idt_trap(ex_de);
    idt[EX_DB_VEC] = idt_trap(ex_db);
    idt[EX_NMI_VEC] = idt_trap(ex_nmi);
    idt[EX_BP_VEC] = idt_trap(ex_bp);
    idt[EX_OF_VEC] = idt_trap(ex_of);
    idt[EX_BR_VEC] = idt_trap(ex_br);
    idt[EX_UD_VEC] = idt_trap(ex_ud);
    idt[EX_NM_VEC] = idt_trap(ex_nm);
    idt[EX_DF_VEC] = idt_trap(ex_df);
    idt[9] = idt_trap(ex_res);  /* unused Coprocessor Segment Overrun */
    idt[EX_TS_VEC] = idt_trap(ex_ts);
    idt[EX_NP_VEC] = idt_trap(ex_np);
    idt[EX_SS_VEC] = idt_trap(ex_ss);
    idt[EX_GP_VEC] = idt_trap(ex_gp);
    idt[EX_PF_VEC] = idt_trap(ex_pf);
    idt[15] = idt_trap(ex_res);
    idt[EX_MF_VEC] = idt_trap(ex_mf);
    idt[EX_AC_VEC] = idt_trap(ex_ac);
    idt[EX_MC_VEC] = idt_trap(ex_mc);
    idt[EX_XM_VEC] = idt_trap(ex_xm);
    idt[EX_VE_VEC] = idt_trap(ex_ve);

    /* Initialize descriptors of reserved exceptions.
     * Thankfully we compile with -std=c11, so declarations within
     * for-loops are possible! */
    for (size_t i = 21; i < 32; ++i)
        idt[i] = idt_trap(ex_res);

    /* Initialize descriptors of hardware interrupt handlers (ISRs). */
    idt[INT_8253_VEC] = idt_int(int_8253);
    idt[INT_8042_VEC] = idt_int(int_8042);
    idt[INT_CASC_VEC] = idt_int(int_casc);
    idt[INT_SERIAL2_VEC] = idt_int(int_serial2);
    idt[INT_SERIAL1_VEC] = idt_int(int_serial1);
    idt[INT_PARALL2_VEC] = idt_int(int_parall2);
    idt[INT_FLOPPY_VEC] = idt_int(int_floppy);
    idt[INT_PARALL1_VEC] = idt_int(int_parall1);
    idt[INT_RTC_VEC] = idt_int(int_rtc);
    idt[INT_ACPI_VEC] = idt_int(int_acpi);
    idt[INT_OPEN2_VEC] = idt_int(int_open2);
    idt[INT_OPEN1_VEC] = idt_int(int_open1);
    idt[INT_MOUSE_VEC] = idt_int(int_mouse);
    idt[INT_FPU_VEC] = idt_int(int_fpu);
    idt[INT_PRIM_ATA_VEC] = idt_int(int_prim_ata);
    idt[INT_SEC_ATA_VEC] = idt_int(int_sec_ata);

    for (size_t i = 0x30; i < IDT_SIZE; ++i)
        idt[i] = idt_trap(ex_res);
}

The macros idt_trap and idt_int are defined as follows:

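/* Offset bits 16..31 are still in place after masking with 0xffff0000,
 * so they move up by 0x20 (not 0x30) to land in descriptor bits 48..63. */
#define idt_entry(off, type, priv) \
    (((descr) (uintptr_t) (off) & 0xffff) | ((descr) (KERN_CODE & 0xff) << \
    0x10) | ((descr) ((type) & 0x0f) << 0x28) | ((descr) ((priv) & \
    0x03) << 0x2d) | (descr) 0x800000000000 | \
    ((descr) ((uintptr_t) (off) & 0xffff0000) << 0x20))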

#define idt_int(off) idt_entry(off, 0x0e, 0x00)
#define idt_trap(off) idt_entry(off, 0x0f, 0x00)

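idt is an array of uint64_t, so the results of these macros are implicitly converted to that type. uintptr_t is the type guaranteed to be able to hold a pointer value as an integer, and on 32-bit systems it is usually 32 bits wide. (A 64-bit IDT has 16-byte entries; this code is for 32-bit.)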
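To make the bit layout concrete, here is a minimal sketch of the packing that the macros above are meant to perform, written as an ordinary function evaluated at run time. The helper name pack_gate and the selector value 0x08 used in the note below are illustrative assumptions, not part of the original code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not from the original post): packs one 32-bit
 * gate descriptor into a 64-bit integer. */
static uint64_t pack_gate(uint32_t offset, uint16_t selector,
                          unsigned type, unsigned dpl)
{
    assert(type <= 0x0f && dpl <= 3);
    uint64_t d = 0;
    d |= (uint64_t)(offset & 0xffffu);       /* offset bits 0..15  -> bits 0..15  */
    d |= (uint64_t)selector << 16;           /* code selector      -> bits 16..31 */
    d |= (uint64_t)(type & 0x0fu) << 40;     /* gate type          -> bits 40..43 */
    d |= (uint64_t)(dpl & 0x03u) << 45;      /* privilege level    -> bits 45..46 */
    d |= (uint64_t)1 << 47;                  /* present bit        -> bit 47      */
    d |= (uint64_t)(offset >> 16) << 48;     /* offset bits 16..31 -> bits 48..63 */
    return d;
}
```

With these assumptions, pack_gate(0x12345678, 0x08, 0x0f, 0) yields 0x12348F0000085678: the two halves of the address end up at opposite ends of the descriptor, which is exactly what a single relocation cannot express.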

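I get the warning initializer element is not constant, because of the address manipulation going on.
It is absolutely certain that the addresses are known at link time.
Is there anything I can do to make this work? Making the idt array automatic would work, but that would require the whole kernel to run in the context of one function, which I think would be a hassle.

I can make this work with some extra effort at run time (as Linux 0.01 does), but it bothers me that something technically feasible at link time is actually infeasible.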

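【Comments】:

How often would such an operation give you a predictable and useful/safe result, especially when the address isn't known at compile time?
What would a linker do with a divided address that is not a natural machine address? For example, on a 32-bit architecture, 4 / 2 is not a valid aligned address.
@cad: Well, you can accomplish your goal with a minimal amount of indirection: load the address, then divide by two.
Sorry for being dense, but where exactly does the compile-time initialization take place? The code seems to initialize the array at run time, in the function idt_init(). No?
@Ziffusion: Yes, that confused me at first, too. The OP wants a way to write it that initializes a global, but as written it is inside a function.

Tags: c x86 linker interrupt osdev

【Solution 1】: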

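The main problem is that function addresses are link-time constants, not strictly compile-time constants. The compiler can't just take a 32b binary integer and stick it into the data segment in two separate pieces. Instead, it has to use the object file format to tell the linker where to fill in the final value (+ offset) of which symbol once linking is done. The common cases are an immediate operand of an instruction, a displacement in an effective address, or a value in the data section. (In all of these cases the linker is still just filling in a 32-bit absolute address, so all 3 use the same ELF relocation type. There is a different relocation for the relative displacements of jump/call offsets.)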

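ELF could have been designed to store a symbol reference that gets replaced at link time with a complex function of the address (or at least with its high/low halves, as on MIPS, where lui $t0, %hi(symbol) is paired with a second instruction using %lo(symbol) to build an address constant from two 16-bit immediates). But in fact the only function allowed is addition/subtraction, for use in things like mov eax, [ext_symbol + 16].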
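A minimal sketch of that distinction, assuming GCC or Clang on an ELF target: the first two initializers need only a "symbol plus constant" relocation and are accepted, while the commented-out one asks for a mask of the address and is rejected as non-constant. The names handler, whole, plus16, and low_half are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

static void handler(void) {}
static char buf[32];

/* Accepted: a plain address constant, filled in by one relocation. */
static void (*const whole)(void) = handler;

/* Accepted: symbol address plus a constant offset (still one relocation). */
static char *const plus16 = buf + 16;

/* Rejected as a static initializer, because no relocation can express
 * "low 16 bits of this address":
 *
 *     static uint16_t bad = (uintptr_t)handler & 0xffff;
 */

/* The same split is trivial once the code is actually running. */
static uint16_t low_half(void (*fn)(void))
{
    assert(fn != 0);
    return (uint16_t)((uintptr_t)fn & 0xffffu);
}
```

The rejected line is the situation from the question: the value is fully determined once linking is done, but the object file has no way to ask the linker for it.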

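It is of course possible for your OS kernel binary to contain a static IDT with fully resolved addresses at build time, so that all you need to do at run time is execute a single lidt instruction. However, the standard build toolchain is an obstacle. You probably can't achieve this without post-processing your executable.

For example, you could write it like this, which produces a table with its full padding in the final binary, so the data can be shuffled in place: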

#include <stdint.h>

#define PACKED __attribute__((packed))

// Note, this is the 32-bit format.  64-bit is larger    
typedef union idt_entry {

    // we will postprocess the linker output to have this format
    // (or convert at runtime)
    struct PACKED runtime {   // from OSdev wiki
       uint16_t offset_1; // offset bits 0..15
       uint16_t selector; // a code segment selector in GDT or LDT
       uint8_t zero;      // unused, set to 0
       uint8_t type_attr; // type and attributes, see below
       uint16_t offset_2; // offset bits 16..31
    } rt;

    // linker output will be in this format
    struct PACKED compiletime {
       void *ptr; // offset bits 0..31
       uint8_t zero;
       uint8_t type_attr;
       uint16_t selector; // to be swapped with the high16 of ptr
    } ct;
} idt_entry;

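// #define idt_ct_entry(off, type, priv) { .ptr = off, .type_attr = type, .selector = priv }
// generate a trap-gate entry in compile-time format
// type_attr 0x8f = present bit (0x80) | 32-bit trap gate (0x0f); 0x0f alone leaves the gate marked not-present
// selector 0x00 is a placeholder: a real kernel needs its code segment selector here
#define idt_ct_trap(off) { .ct = { .ptr = off, .type_attr = 0x8f, .selector = 0x00 } }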

extern void ex_de();  // these are the raw interrupt handlers, written in ASM
extern void ex_db();  // they have to save/restore *all* registers, and end with  iret, rather than the usual C ABI.

// it might be easier to use asm macros to create this static data, 
// just so it can be in the same file and you don't need cross-file prototypes / declarations
// (but all the same limitations about link-time constants apply)
static idt_entry idt[] = {
    idt_ct_trap(ex_de),
    idt_ct_trap(ex_db),
    // ...
};

// having this static probably takes less space than instructions to write it on the fly
// but not much more.  It would be easy to make a lidt function that took a struct pointer.
static const struct PACKED  idt_ptr {
  uint16_t len;  // encoded as bytes - 1, so 0xffff means 65536
  void *ptr;
} idt_ptr = { sizeof(idt) - 1, idt };


/****** functions *********/

// inline
void load_static_idt(void) {
  asm volatile ("lidt  %0"
               : // no outputs
               : "m" (idt_ptr));
  // memory operand, instead of writing the addressing mode ourself, allows a RIP-relative addressing mode in 64bit mode
  // also allows it to work with -masm=intel or not.
}

// Do this once at run-time
// **OR** run this to pre-process the binary, after link time, as part of your build
void idt_convert_to_runtime(void) {
#ifdef DEBUG
  static char already_done = 0;  // make sure this only runs once
  if (already_done)
    return;  // placeholder: a second call would undo the swap, so a real kernel should report an error here
  already_done = 1;
#endif
  const int count = sizeof idt / sizeof idt[0];
  for (int i=0 ; i<count ; i++) {
    uint16_t tmp1 = idt[i].rt.selector;
    uint16_t tmp2 = idt[i].rt.offset_2;
    idt[i].rt.offset_2 = tmp1;
    idt[i].rt.selector = tmp2;
    // or do this swap in fewer insns with SSE or MMX pshufw, but using vector instructions before setting up the IDT may be insane.
  }
}

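This does compile. See a diff of the -m32 and -m64 asm output on the Godbolt compiler explorer. Look at the layout of the data section (note that .value is a synonym for .short, i.e. 16 bits). (But note that the IDT format is different in 64-bit mode.)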

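I think I have the size calculation right (bytes - 1), as described in http://wiki.osdev.org/Interrupt_Descriptor_Table. The minimum is 100h bytes long (encoded as 0xff). See also https://en.wikibooks.org/wiki/X86_Assembly/Global_Descriptor_Table. (The lgdt size/pointer works the same way, although the table itself has a different format.)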
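As a small worked example of the "bytes - 1" rule, here is a sketch that computes the limit field for a 32-bit IDT with a given number of 8-byte gates. The names GATE_BYTES and idt_limit are hypothetical helpers for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { GATE_BYTES = 8 };   /* size of one gate descriptor in 32-bit mode */

/* Limit field loaded by lidt: the table size in bytes, minus one. */
static uint16_t idt_limit(size_t n_entries)
{
    assert(n_entries >= 1 && n_entries <= 256);   /* x86 has 256 vectors */
    return (uint16_t)(n_entries * GATE_BYTES - 1);
}
```

For the 32 exception vectors alone this gives 32 * 8 - 1 = 0xff, and for a full table of 256 entries it gives 0x7ff.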

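Another option: instead of having the IDT statically in the data section, put it in the bss section, and store the data as immediate constants in the function that initializes it (or in an array read by that function).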

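Either way, that function (and its data) can live in a .init section whose memory you reuse once it has run. (Linux does this to reclaim memory from code and data that are needed only once, at startup.) This gives you the best tradeoff: a small binary (since 32b addresses are smaller than 64b IDT entries) and no run-time memory wasted on the code that sets up the IDT. The CPU time of a small loop that runs once at startup is negligible. (The version on Godbolt is fully unrolled because I only have 2 entries, and it embeds the addresses as 32-bit immediates in each instruction, even with -Os. With a large enough table (just copy/paste a line a few times) you get a compact loop even at -O3. The threshold is lower for -Os.)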
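A rough sketch of the "array read by a function" variant described above, assuming GCC/Clang (for the packed attribute). The handler names ex_a and ex_b, the selector value 0x08, and the helper names are placeholders; in a real kernel the pointer array and idt_fill could additionally be placed in an init section so their memory can be reclaimed, which is not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define KERN_CODE_SEL  0x08u   /* assumed kernel code selector */
#define TRAP_GATE_ATTR 0x8fu   /* present, DPL 0, 32-bit trap gate */

typedef struct __attribute__((packed)) {
    uint16_t offset_lo;   /* handler address bits 0..15  */
    uint16_t selector;
    uint8_t  zero;
    uint8_t  type_attr;
    uint16_t offset_hi;   /* handler address bits 16..31 */
} gate;

static void ex_a(void) {}   /* stand-ins for the real asm handlers */
static void ex_b(void) {}

/* One pointer per entry: plain link-time constants, so this is legal. */
static void (*const handlers[])(void) = { ex_a, ex_b };
enum { N_HANDLERS = sizeof handlers / sizeof handlers[0] };

/* No initializer, so the table itself occupies no space in the binary. */
static gate idt_bss[N_HANDLERS];

static void idt_fill(void)
{
    assert(sizeof(gate) == 8);
    for (size_t i = 0; i < N_HANDLERS; i++) {
        uintptr_t addr = (uintptr_t)handlers[i];
        idt_bss[i].offset_lo = (uint16_t)(addr & 0xffffu);
        idt_bss[i].selector  = KERN_CODE_SEL;
        idt_bss[i].zero      = 0;
        idt_bss[i].type_attr = TRAP_GATE_ATTR;
        idt_bss[i].offset_hi = (uint16_t)((addr >> 16) & 0xffffu);
    }
}
```

On a 32-bit target the pointer array costs 4 bytes per entry in the binary instead of 8, at the price of this one short loop at startup.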

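Without memory-reuse hacks, a tight loop that rewrites the 64b entries in place is probably the way to go. Doing it at build time would be even better, but then you need a custom tool that runs the transformation on the kernel binary.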
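The core of such a build-time tool could look roughly like the sketch below. It works on raw bytes rather than on the union, so it does not depend on the host's struct layout. How the tool finds the table inside the kernel image (file offset and entry count) is left open and would have to come from the symbol table or the linker script; idt_fixup_bytes is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Converts n_entries 8-byte entries from the "compile-time" layout
 * (32-bit pointer, zero, type_attr, selector) to the hardware layout by
 * swapping bytes 2..3 (high half of the pointer) with bytes 6..7 (selector). */
static void idt_fixup_bytes(uint8_t *table, size_t n_entries)
{
    assert(table != NULL);
    for (size_t i = 0; i < n_entries; i++) {
        uint8_t *e = table + i * 8;
        for (int k = 0; k < 2; k++) {
            uint8_t tmp = e[2 + k];
            e[2 + k] = e[6 + k];
            e[6 + k] = tmp;
        }
    }
}
```

Running it twice restores the original bytes, so the tool must be applied exactly once per build.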

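Storing the data in immediates sounds good in theory, but the code for each entry would probably total more than 64b, because it can't loop. The code that splits an address in two would have to be fully unrolled (or placed in a function and called). Even if you had a loop to store all the fields that are the same for multiple entries, each pointer would need a mov r32, imm32 to get the address into a register, followed by mov word [idt+i + 0], ax / shr eax, 16 / mov word [idt+i + 6], ax. That is a lot of machine-code bytes.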

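【Comments】:

I updated the question. Could you please take another look and update your answer according to the changes I made to the question? I started a small bounty, so... ;-)
@cad: Maybe this will help.
Great explanation, multiple approaches, a very sweet union solution (damn, I always forget about them)... it has everything.
sizeof idt / sizeof idt[0] isn't quite right. It would prevent the maximum number of descriptors from being placed in the IDT (if it ever gets that big). To avoid that problem, and given that a true size of 0 doesn't exist, the length should have 1 subtracted from it. The same applies to the structure loaded into the GDTR by LGDT. The size is the size minus 1, so that a maximum of 65536 can be represented.
@MichaelPetch: Thanks, updated it to max_index instead of len.

【Solution 2】:

One approach is to use an intermediate jump table located at a fixed address. You can initialize the idt with the addresses of locations in this table (which will be compile-time constants). Each location in the jump table contains a jump instruction to the actual isr routine.

The dispatch to an isr is then indirect, as follows: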

trap -> jump to intermediate address in the idt -> jump to isr

One way to create a jump table at a fixed address is as follows.

Step 1: Put the jump table in a section

// this is a jump table at a fixed address
void jump(void) __attribute__((section(".si.idt")));

void jump(void) {
    asm("jmp isr0"); // can also be asm("call ...") depending on need
    asm("jmp isr1");
    asm("jmp isr2");
}

Step 2: Instruct the linker to locate the section at a fixed address

SECTIONS
{
  .so.idt 0x600000 :
  {
    *(.si.idt)
  }
}

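Put this in your linker script right after the .text section. This ensures that the executable code in the table ends up in an executable memory region.

You can instruct the linker to use your script with the --script option in your Makefile, as follows.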

LDFLAGS += -Wl,--script=my_script.lds

The following macro gives you the address of the location that contains the jump (or call) instruction to the corresponding isr.

// initialize the idt at compile time with const values
// you can find a cleaner way to generate offsets
#define JUMP_ADDR(off)  ((char*)0x600000 + 4 + ((off) * 5))
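The arithmetic in JUMP_ADDR can be spelled out as below. This is only a sketch under two assumptions that the answer does not state explicitly: that the leading 4 skips a 4-byte function prologue the compiler emits at the start of jump(), and that the assembler encodes every jmp in its 5-byte form (opcode E9 plus a 32-bit displacement). The names JUMP_BASE, PROLOGUE_SIZE, JMP_REL32_SIZE, and jump_slot are illustrative.

```c
#include <assert.h>
#include <stdint.h>

enum {
    JUMP_BASE      = 0x600000,  /* section address from the linker script */
    PROLOGUE_SIZE  = 4,         /* assumed size of the prologue of jump() */
    JMP_REL32_SIZE = 5          /* E9 opcode + 4-byte displacement */
};

/* Address of the index-th jmp instruction inside the table. */
static uintptr_t jump_slot(unsigned index)
{
    assert(index < 256);
    return (uintptr_t)JUMP_BASE + PROLOGUE_SIZE
         + (uintptr_t)index * JMP_REL32_SIZE;
}
```

Under these assumptions the first three slots sit at 0x600004, 0x600009, and 0x60000e. If the compiler emits a different prologue, or the assembler picks a shorter jump encoding, the offsets no longer line up, so it is worth checking the disassembly of the section.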

You would then initialize the idt with the modified macros, as follows.

// your real idt will be initialized as follows

#define idt_entry(addr, type, priv) \
    ( \
        ((descr) (uintptr_t) (addr) & 0xffff) | \
        ((descr) (KERN_CODE & 0xff) << 0x10) | \
        ((descr) ((type) & 0x0f) << 0x28) | \
        ((descr) ((priv) & 0x03) << 0x2d) | \
        ((descr) 0x1 << 0x2F) | \
        ((descr) ((uintptr_t) (addr) & 0xffff0000) << 0x20) \
    )

#define idt_int(off)    idt_entry(JUMP_ADDR(off), 0x0e, 0x00)
#define idt_trap(off)   idt_entry(JUMP_ADDR(off), 0x0f, 0x00)

descr idt[] =
{
    ...
    idt_trap(ex_de),
    ...
    idt_int(int_casc),
    ...
};
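To check that descriptors built from fixed addresses really are compile-time constants, the sketch below packs two entries from plain integers and verifies one of them with static_assert. JUMP_SLOT, GATE32, demo_idt, and the selector value 0x08 are hypothetical stand-ins for JUMP_ADDR, idt_entry, idt, and KERN_CODE, and the slot layout (base 0x600000, 4-byte prologue, 5-byte jumps) is assumed from the answer.

```c
#include <assert.h>
#include <stdint.h>

#define JUMP_SLOT(i)  (0x600000u + 4u + (i) * 5u)

/* 32-bit gate: offset low half, selector, present bit + type, offset high half. */
#define GATE32(addr, sel, type) \
    ( ((uint64_t)((addr) & 0xffffu)) \
    | ((uint64_t)(sel) << 16) \
    | ((uint64_t)(0x80u | ((type) & 0x0fu)) << 40) \
    | ((uint64_t)(((addr) >> 16) & 0xffffu) << 48) )

/* Accepted as a static initializer: every operand is an integer constant. */
static const uint64_t demo_idt[] = {
    GATE32(JUMP_SLOT(0u), 0x08u, 0x0fu),   /* trap gate      -> slot 0 */
    GATE32(JUMP_SLOT(1u), 0x08u, 0x0eu),   /* interrupt gate -> slot 1 */
};

/* Evaluated by the compiler; no linker involvement is needed. */
static_assert(GATE32(JUMP_SLOT(0u), 0x08u, 0x0fu) == 0x00608F0000080004ULL,
              "gate for slot 0 is a compile-time constant");
```

The price is the extra jump on every interrupt, plus the need to keep the table layout and the constants in sync by hand.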

Below is a working demo, which shows dispatch from instructions at a fixed address to isrs that have non-fixed addresses.

#include <stdio.h>

// dummy isrs for demo
void isr0(void) {
    printf("==== isr0\n");
}

void isr1(void) {
    printf("==== isr1\n");
}

void isr2(void) {
    printf("==== isr2\n");
}

// this is a jump table at a fixed address
void jump(void) __attribute__((section(".si.idt")));

void jump(void) {
    asm("jmp isr0"); // can be asm("call ...")
    asm("jmp isr1");
    asm("jmp isr2");
}

// initialize the idt at compile time with const values
// you can find a cleaner way to generate offsets
#define JUMP_ADDR(off)  ((char*)0x600000 + 4 + ((off) * 5))

// dummy idt for demo
// see below for the real idt
char* idt[] =
{
    JUMP_ADDR(0),
    JUMP_ADDR(1),
    JUMP_ADDR(2),
};

int main(int argc, char* argv[]) {
    int trap;
    char* addr = idt[trap = argc - 1];
    printf("==== idt[%d]=%p\n", trap, addr);
    asm("jmp *%0\n" : :"m"(addr));
}

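【Comments】:

Nice idea, +1. Technically applicable, but I don't like the extra level of indirection. And I'm not sure whether this mechanism is good for future maintenance.
In main, you could declare addr as a function pointer and then dereference it. Tail-call optimization should turn that into an indirect jump. (Except that it won't, because main has to return 0 if there is no explicit return.) You could at least use an "rm" constraint, to allow a register-indirect jump. Interesting idea, though, to add a level of indirection through a jump table at a constant address, so that the IDT entries can truly be compile-time constants.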