Professional Documents
Culture Documents
More about the idtentry macro you can read in the third part of the do_error_trap(regs, error_code, str, trapnr, signr); \
}
https://0xax.gitbooks.io/linux-insides/content/Interrupts/linux-interrupts-3.html chapter.
Ok, now we saw the preparation before an exception handler will be executed and now
time to look on the handlers. First of all let's look on the following handlers: Note on the ## tokens. This is special feature - GCC macro Concatenation which
concatenates two given strings. For example, first DO_ERROR in our example will expands to
divide_error the:
overflow
invalid_op dotraplinkage void do_divide_error(struct pt_regs *regs, long error_code)
coprocessor_segment_overrun {
...
invalid_TSS }
segment_not_present
stack_segment
alignment_check We can see that all functions which are generated by the DO_ERROR macro just make a call
to the do_error_trap function from the arch/x86/kernel/traps.c. Let's look on
All these handlers defined in the arch/x86/kernel/traps.c source code file with the implementation of the do_error_trap function.
DO_ERROR macro:
Trap handlers
DO_ERROR(X86_TRAP_DE, SIGFPE, "divide error", divide_error
DO_ERROR(X86_TRAP_OF, SIGSEGV, "overflow", overflow)
DO_ERROR(X86_TRAP_UD, SIGILL, "invalid opcode", invalid_op) The do_error_trap function starts and ends from the two following functions:
DO_ERROR(X86_TRAP_OLD_MF, SIGFPE, "coprocessor segment overrun", coprocessor_
DO_ERROR(X86_TRAP_TS, SIGSEGV, "invalid TSS", invalid_TSS) enum ctx_state prev_state = exception_enter();
DO_ERROR(X86_TRAP_NP, SIGBUS, "segment not present", segment_not_ ...
DO_ERROR(X86_TRAP_SS, SIGBUS, "stack segment", stack_segmen
...
DO_ERROR(X86_TRAP_AC, SIGBUS, "alignment check", alignment_ch
...
exception_exit(prev_state);
As we can see the DO_ERROR macro takes 4 parameters: from the include/linux/context_tracking.h. The context tracking in the Linux kernel
subsystem which provide kernel boundaries probes to keep track of the transitions
Vector number of an interrupt;
between level contexts with two basic initial contexts: user or kernel . The
Signal number which will be sent to the interrupted process;
exception_enter function checks that context tracking is enabled. After this if it is enabled,
String which describes an exception; the exception_enter reads previous context and compares it with the CONTEXT_KERNEL . If
Exception handler entry point. the previous context is user , we call context_tracking_exit function from the
kernel/context_tracking.c which inform the context tracking subsystem that a processor is
This macro defined in the same source code file and expands to the function with the
exiting user mode and entering the kernel mode:
do_handler name:
if (!context_tracking_is_enabled())
#define DO_ERROR(trapnr, signr, str, name) \ return 0;
dotraplinkage void do_##name(struct pt_regs *regs, long error_code) \
{ \ prev_ctx = this_cpu_read(context_tracking.state);
if (prev_ctx != CONTEXT_KERNEL)
context_tracking_exit(prev_ctx); First of all it calls the notify_die function which defined in the kernel/notifier.c. To get
notified for kernel panic, kernel oops, Non-Maskable Interrupt or other events the caller
return prev_ctx;
needs to insert itself in the notify_die chain and the notify_die function does it. The
Linux kernel has special mechanism that allows kernel to ask when something happens and
If previous context is non user , we just return it. The pre_ctx has enum ctx_state type this mechanism called notifiers or notifier chains . This mechanism used for example
which defined in the include/linux/context_tracking_state.h and looks as: for the USB hotplug events (look on the drivers/usb/core/notify.c), for the memory
hotplug (look on the include/linux/memory.h, the hotplug_memory_notifier macro, etc...),
enum ctx_state { system reboots, etc. A notifier chain is thus a simple, singly-linked list. When a Linux kernel
CONTEXT_KERNEL = 0, subsystem wants to be notified of specific events, it fills out a special notifier_block
CONTEXT_USER,
structure and passes it to the notifier_chain_register function. An event can be sent
CONTEXT_GUEST,
} state; with the call of the notifier_call_chain function. First of all the notify_die function fills
die_args structure with the trap number, trap string, registers and other values:
The atomic_notifier_call_chain function calls each function in a notifier chain in turn and
returns the value of the last notifier function called. If the notify_die in the
do_error_trap does not return NOTIFY_STOP we execute conditional_sti function from
the arch/x86/kernel/traps.c that checks the value of the interrupt flag and enables interrupt
depends on it:
The die function defined in the arch/x86/kernel/dumpstack.c source code file, prints
static inline void conditional_sti(struct pt_regs *regs)
{
useful information about stack, registers, kernel modules and caused kernel oops. If we
if (regs->flags & X86_EFLAGS_IF) came from the userspace the do_trap_no_signal function will return -1 and the
local_irq_enable(); execution of the do_trap function will continue. If we passed through the
} do_trap_no_signal function and did not exit from the do_trap after this, it means that
previous context was - user . Most exceptions caused by the processor are interpreted by
more about local_irq_enable macro you can read in the second part of this chapter. The Linux as error conditions, for example division by zero, invalid opcode, etc. When an
next and last call in the do_error_trap is the do_trap function. First of all the do_trap exception occurs the Linux kernel sends a signal to the interrupted process that caused the
function defined the tsk variable which has task_struct type and represents the current exception to notify it of an incorrect condition. So, in the do_trap function we need to
interrupted process. After the definition of the tsk , we can see the call of the send a signal with the given number ( SIGFPE for the divide error, SIGILL for a illegal
do_trap_no_signal function: instruction, etc.). First of all we save error code and vector number in the current interrupts
process with the filling thread.error_code and thread_trap_nr :
struct task_struct *tsk = current;
tsk->thread.error_code = error_code;
if (!do_trap_no_signal(tsk, trapnr, str, regs, error_code)) tsk->thread.trap_nr = trapnr;
return;
After this we make a check do we need to print information about unhandled signals for
The do_trap_no_signal function makes two checks: the interrupted process. We check that show_unhandled_signals variable is set, that
unhandled_signal function from the kernel/signal.c will return unhandled signal(s) and
Did we come from the Virtual 8086 mode;
printk rate limit:
Did we come from the kernelspace.
#ifdef CONFIG_X86_64
if (v8086_mode(regs)) { if (show_unhandled_signals && unhandled_signal(tsk, signr) &&
... printk_ratelimit()) {
} pr_info("%s[%d] trap %s ip:%lx sp:%lx error:%lx",
tsk->comm, tsk->pid, str,
if (!user_mode(regs)) { regs->ip, regs->sp, error_code);
... print_vma_addr(" in ", regs->ip);
} pr_cont("\n");
}
return -1; #endif
We will not consider first case because the long mode does not support the Virtual 8086 And send a given signal to interrupted process:
mode. In the second case we invoke fixup_exception function which will try to recover a
fault and die if we can't:
force_sig_info(signr, info ?: SEND_SIG_PRIV, tsk);
if (!fixup_exception(regs)) {
tsk->thread.error_code = error_code; This is the end of the do_trap . We just saw generic implementation for eight different
tsk->thread.trap_nr = trapnr; exceptions which are defined with the DO_ERROR macro. Now let's look at other exception
die(str, regs, error_code); handlers.
}
Double fault And after this we fill the interrupted process with the vector number of the Double fault
exception and error code as we did it in the previous handlers:
The next exception is #DF or Double fault . This exception occurs when the processor
detected a second exception while calling an exception handler for a prior exception. We tsk->thread.error_code = error_code;
set the trap gate for this exception in the previous part: tsk->thread.trap_nr = X86_TRAP_DF;
set_intr_gate_ist(X86_TRAP_DF, &double_fault, DOUBLEFAULT_STACK); Next we print useful information about the double fault (PID number, registers content):
#ifdef CONFIG_DOUBLEFAULT
Note that this exception runs on the DOUBLEFAULT_STACK Interrupt Stack Table which has
df_debug(regs, error_code);
index - 1 :
#endif
#define DOUBLEFAULT_STACK 1
And die:
The double_fault is handler for this exception and defined in the arch/x86/kernel/traps.c.
for (;;)
The double_fault handler starts from the definition of two variables: string that describes die(str, regs, error_code);
exception and interrupted process, as other exception handlers:
That's all.
static const char str[] = "double fault";
struct task_struct *tsk = current;
Device not available exception handler
The handler of the double fault exception split on two parts. The first part is the check
The next exception is the #NM or Device not available . The Device not available
which checks that a fault is a non-IST fault on the espfix64 stack. Actually the iret
exception can occur depending on these things:
instruction restores only the bottom 16 bits when returning to a 16 bit segment. The
espfix feature solves this problem. So if the non-IST fault on the espfix64 stack we The processor executed an x87 FPU floating-point instruction while the EM flag in
modify the stack to make it look like General Protection Fault : control register cr0 was set;
The processor executed a wait or fwait instruction while the MP and TS flags of
struct pt_regs *normal_regs = task_pt_regs(current); register cr0 were set;
...
exception_exit(prev_state);
General protection fault exception handler
The next exception is the #GP or General protection fault . This exception occurs when
In the next step we check that FPU is not eager: the processor detected one of a class of protection violations called general-protection
violations . It can be:
BUG_ON(use_eager_fpu());
Exceeding the segment limit when accessing the cs , ds , es , fs or gs segments;
Loading the ss , ds , es , fs or gs register with a segment selector for a system
When we switch into a task or interrupt we may avoid loading the FPU state. If a task will
segment.;
use it, we catch Device not Available exception exception. If we loading the FPU state
during task switching, the FPU is eager. In the next step we check cr0 control register on Violating any of the privilege rules;
the EM flag which can show us is x87 floating point unit present (flag clear) or not (flag and other...
set):
The exception handler for this exception is the do_general_protection from the
arch/x86/kernel/traps.c. The do_general_protection function starts and ends as other
#ifdef CONFIG_MATH_EMULATION
exception handlers from the getting of the previous context:
if (read_cr0() & X86_CR0_EM) {
struct math_emu_info info = { };
prev_state = exception_enter();
conditional_sti(regs); ...
exception_exit(prev_state);
info.regs = regs;
math_emulate(&info);
exception_exit(prev_state); After this we enable interrupts if they were disabled and check that we came from the
return; Virtual 8086 mode:
}
#endif
conditional_sti(regs);
If the x87 floating point unit not presented, we enable interrupts with the if (v8086_mode(regs)) {
conditional_sti , fill the math_emu_info (defined in the local_irq_enable();
handle_vm86_fault((struct kernel_vm86_regs *) regs, error_code);
arch/x86/include/asm/math_emu.h) structure with the registers of an interrupt task and call
goto exit;
math_emulate function from the arch/x86/math-emu/fpu_entry.c. As you can understand
}
from function's name, it emulates X87 FPU unit (more about the x87 we will know in the
special chapter). In other way, if X86_CR0_EM flag is clear which means that x87 FPU unit is
As long mode does not support this mode, we will not consider exception handling for this
presented, we call the fpu__restore function from the arch/x86/kernel/fpu/core.c which
case. In the next step check that previous mode was kernel mode and try to fix the trap. If
copies the FPU registers from the fpustate to the live hardware registers. After this FPU
we can't fix the current general protection fault exception we fill the interrupted process
instructions can be used:
with the vector number and error code of the exception and add it to the notify_die
chain:
fpu__restore(¤t->thread.fpu);
if (!user_mode(regs)) {
if (fixup_exception(regs))
goto exit;
tsk->thread.error_code = error_code;
tsk->thread.trap_nr = X86_TRAP_GP;
Links
if (notify_die(DIE_GPF, "general protection fault", regs, error_code,
X86_TRAP_GP, SIGSEGV) != NOTIFY_STOP) Interrupt descriptor Table
die("general protection fault", regs, error_code);
iret instruction
goto exit;
} GCC macro Concatenation
kernel panic
kernel oops
If we can fix exception we go to the exit label which exits from exception state: Non-Maskable Interrupt
hotplug
exit:
interrupt flag
exception_exit(prev_state);
long mode
signal
If we came from user mode we send SIGSEGV signal to the interrupted process from user
mode as we did it in the do_trap function: printk
coprocessor
That's all.
Conclusion
It is the end of the fifth part of the Interrupts and Interrupt Handling chapter and we saw
implementation of some interrupt handlers in this part. In the next part we will continue to
dive into interrupt and exception handlers and will see handler for the Non-Maskable
Interrupts, handling of the math coprocessor and SIMD coprocessor exceptions and many
many more.
Please note that English is not my first language, And I am really sorry for any
inconvenience. If you find any mistakes please send me PR to linux-insides.