Initialization (1)
Taku Shimosawa
Agenda
• Initialization Phase of the Linux Kernel
• Turning on the paging feature
• Calling *init functions
• And miscellaneous things related to initialization
1. vmlinux
This is the Linux kernel
vmlinux
• Main kernel binary
• Runs with the final CPU state
• Protected Mode in x86_32 (i386)
• Long Mode in x86_64
• And so on…
• Runs in the virtual memory space
• Above PAGE_OFFSET (default: 0xc0000000) (32-bit)
• Above __START_KERNEL_map (default: 0xff…f80000000) (64-bit)
• i.e. All the absolute addresses in the binary are virtual ones
• Entry points
Architecture  Name        Location                          Name (secondary)
x86_32        startup_32  arch/x86/kernel/head_32.S         startup_32_smp
x86_64        startup_64  arch/x86/kernel/head_64.S         secondary_startup_64
ARM           stext       arch/arm/kernel/head[_nommu].S    secondary_startup
ARM64         stext       arch/arm64/kernel/head.S          secondary_holding_pen,
                                                            secondary_entry
PPC           _stext      arch/powerpc/kernel/head_32.S*    (__secondary_start)
[Figure: i386 virtual address space (0x00000000 to 0xFFFFFFFF), with up to ~896 MB of physical LOWMEM mapped at PAGE_OFFSET (0xC0000000); x86_64 virtual address space (from 0x0000000000000000), with PAGE_OFFSET at 0xFFFF880000000000]
Initialization Overview
• arch/*/boot/: Booting Code
(Preparing CPU states, Gathering HW information, Decompressing vmlinux etc.)
• arch/*/kernel/head*.S, head*.c (vmlinux): Low-level Initialization
(Switching to virtual memory world, Getting prepared for C programs)
• init/main.c (rest_init): Creating the “init” process, and letting it do the rest of the initialization
(Setting up multiprocessing, scheduling)
• init/main.c (kernel_init), “init” (PID=1): Performing final initialization and “Exec”ing the “init” user process
• kernel/sched/idle.c (cpu_idle_loop): “Swapper” (PID=0) now sleeps
2. Towards Virtual
Memory
Enabling paging
• The early part is executed with paging off.
• Physical address space
• vmlinux is assumed to be executed with paging on.
• The addresses in the binary are not physical addresses.
• The first big job in vmlinux is enabling paging
• Creating a (transitional) page table
• Setting the CPU to use the page table, and to enable paging
• Jumping to the entry point in C (compiled in the virtual address space)
Identity Map
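• At first, the goal page table cannot be used on its own
• Changing the PC and enabling paging are (at least on x86) separate instructions, so right after paging is enabled the PC still holds a physical address.
[Figure: Enable Paging; the PC still points into the physical range, which the goal map does not cover: Page Fault!]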
Identity Map
• Therefore, an identity map is created in addition to the (goal) map.
(1) Create an initial page table
(2) Enable paging, and jump to a virtual address
(3) Zap the low mapping
[Figure: the PC jumps from the identity-mapped (physical) range to the virtual range]
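The three steps can be illustrated with a toy model in plain C. This is a user-space sketch, not kernel code: the single-level table, the 4 KiB page size, the 16-page address space, and every `toy_*` name are assumptions made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12            /* assumed 4 KiB pages */
#define TOY_NPAGES     16            /* toy virtual address space: 16 pages */
#define TOY_UNMAPPED   UINT32_MAX

/* Hypothetical single-level page table: virtual page -> physical page. */
struct toy_pgtable {
    uint32_t phys_page[TOY_NPAGES];
};

/* Start with nothing mapped. */
void toy_init(struct toy_pgtable *pt)
{
    for (int i = 0; i < TOY_NPAGES; i++)
        pt->phys_page[i] = TOY_UNMAPPED;
}

/* Map one virtual page to one physical page. */
void toy_map(struct toy_pgtable *pt, uint32_t vpage, uint32_t ppage)
{
    if (vpage < TOY_NPAGES)
        pt->phys_page[vpage] = ppage;
}

/* Remove a mapping ("zap"). */
void toy_unmap(struct toy_pgtable *pt, uint32_t vpage)
{
    if (vpage < TOY_NPAGES)
        pt->phys_page[vpage] = TOY_UNMAPPED;
}

/*
 * Translate a virtual address. Returns false when the page is not
 * mapped, which stands in for a page fault.
 */
bool toy_translate(const struct toy_pgtable *pt, uint32_t vaddr,
                   uint32_t *paddr)
{
    uint32_t vpage = vaddr >> TOY_PAGE_SHIFT;
    uint32_t offset = vaddr & ((1u << TOY_PAGE_SHIFT) - 1);

    if (vpage >= TOY_NPAGES || pt->phys_page[vpage] == TOY_UNMAPPED)
        return false;
    *paddr = (pt->phys_page[vpage] << TOY_PAGE_SHIFT) | offset;
    return true;
}
```

Suppose the kernel is loaded at physical page 1 and linked at virtual page 9. With only the goal map (9 to 1), translating the PC value 0x1238 fails. After adding the identity map (1 to 1), both 0x1238 and 0x9238 resolve to physical 0x1238, so the jump can happen, and `toy_unmap(&pt, 1)` afterwards removes the low mapping.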
3. Initialization
At last, we have come here!
Initialization (start_kernel)
• A lot of *_init functions!
• Furthermore, some init functions call other init functions.
• At least 80 functions are called in this function.
• These slides pick up some topics from the initialization functions
Special directives
• What are these?
asmlinkage __visible void __init start_kernel(void) {
…
}
• “I’m curious!”.
asmlinkage
• asmlinkage
• Ensures the symbol is not mangled
• (in x86_32) Ensures all the parameters are passed on the stack
#ifdef __cplusplus
#define CPP_ASMLINKAGE extern "C"
#else
#define CPP_ASMLINKAGE
#endif
#ifndef asmlinkage
#define asmlinkage CPP_ASMLINKAGE
#endif
include/linux/linkage.h
#ifdef CONFIG_X86_32
#define asmlinkage CPP_ASMLINKAGE __attribute__((regparm(0)))
arch/x86/include/asm/linkage.h
__visible
• (Effective in gcc >=4.6)
#if GCC_VERSION >= 40600
/*
 * Tell the optimizer that something else uses this function or
 * variable.
 */
#define __visible __attribute__((externally_visible))
#endif
include/linux/compiler-gcc4.h
commit 9a858dc7cebce01a7bb616bebb85087fa2b40871
author Andi Kleen <ak@linux.intel.com> Mon Sep 17 21:09:15 2012
committer Linus Torvalds <torvalds@linux-foundation.org> Mon Sep 17 22:00:38 2012
gcc 4.6+ has support for a externally_visible attribute that prevents the
optimizer from optimizing unused symbols away. Add a __visible macro to
use it with that compiler version or later.
__init (1)
• To mark code (text) and data as only necessary during initialization
#define __init __section(.init.text) __cold notrace
#define __initdata __section(.init.data)
#define __initconst __constsection(.init.rodata)
#define __exitdata __section(.exit.data)
#define __exit_call __used __section(.exitcall.exit)
(include/linux/init.h)
#ifndef __cold
#define __cold __attribute__((__cold__))
#endif
(include/linux/compiler-gcc4.h)
#ifndef __section
# define __section(S) __attribute__ ((__section__(#S)))
#endif
...
#define notrace __attribute__((no_instrument_function))
(include/linux/compiler.h)
__init (2)
• The init* sections are gathered into a contiguous memory area
. = ALIGN(PAGE_SIZE);
.init.begin : AT(ADDR(.init.begin) - LOAD_OFFSET) {
	__init_begin = .; /* paired with __init_end */
}
...
INIT_TEXT_SECTION(PAGE_SIZE)
#ifdef CONFIG_X86_64
:init
#endif
INIT_DATA_SECTION(16)
....
. = ALIGN(PAGE_SIZE);
...
.init.end : AT(ADDR(.init.end) - LOAD_OFFSET) {
	__init_end = .;
}
arch/x86/kernel/vmlinux.lds.S
[Figure: memory layout from __init_begin through init.text, init.data, … up to __init_end]
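The same idea can be tried in user space. The sketch below places a few variables in one named section with the same stringifying trick as `__section`, and then walks the section between two boundary symbols. It assumes GCC or Clang on an ELF target with a GNU-compatible linker, which defines `__start_<name>` and `__stop_<name>` for sections whose names are valid C identifiers; the kernel instead defines `__init_begin` and `__init_end` explicitly in its linker script. The section name `demo_init` and all `demo_*` and `DEMO_*` names are hypothetical.

```c
#include <stddef.h>

/* Same stringifying trick as the kernel's __section(S). */
#define demo_section(S) __attribute__((__section__(#S)))

/*
 * Place one unsigned long in the (hypothetical) "demo_init" section.
 * __used__ keeps the compiler from dropping the unreferenced variable.
 */
#define DEMO_INIT_DATA(ident, val) \
    static unsigned long ident \
        __attribute__((__used__)) demo_section(demo_init) = (val)

DEMO_INIT_DATA(demo_a, 10);
DEMO_INIT_DATA(demo_b, 20);
DEMO_INIT_DATA(demo_c, 30);

/* Boundary symbols provided by the linker (assumption: GNU ld or lld, ELF). */
extern unsigned long __start_demo_init[];
extern unsigned long __stop_demo_init[];

/* Number of entries the linker gathered into the section. */
size_t demo_init_count(void)
{
    size_t n = 0;

    for (const unsigned long *p = __start_demo_init; p < __stop_demo_init; p++)
        n++;
    return n;
}

/* Sum of all entries, found only through the section boundaries. */
unsigned long demo_init_sum(void)
{
    unsigned long sum = 0;

    for (const unsigned long *p = __start_demo_init; p < __stop_demo_init; p++)
        sum += *p;
    return sum;
}
```

Because everything in the section is contiguous, code that knows only the two boundary symbols can treat the whole range as one unit, which is what makes it possible to handle the init area as a single block later.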
__init (3)
• And, they are discarded (freed) after initialization
• free_initmem is called from kernel_init
void free_initmem(void)
{
free_init_pages("unused kernel",
(unsigned long)(&__init_begin),
(unsigned long)(&__init_end));
}
arch/x86/mm/init.c
void free_initmem(void)
{
...
poison_init_mem(__init_begin, __init_end - __init_begin);
if (!machine_is_integrator() && !machine_is_cintegrator())
free_initmem_default(-1);
}
arch/arm/mm/init.c
head32.c, head64.c
• Before calling start_kernel, i386_start_kernel or x86_64_start_kernel is called in x86
• Located in arch/x86/kernel/head{32,64}.c
• No underscore between head and 32!
• x86 (32-bit)
• Reserve BIOS memory (in conventional memory)
• x86 (64-bit)
• Erase the identity map
• Clear BSS, copy boot information from the low memory
• And reserve BIOS memory
memblock
• Data Structure (include/linux/memblock.h)
• memblock (struct memblock; “memblock” is a global variable)
• memory (memblock_type): array of memblock_region
• reserved (memblock_type): array of memblock_region
• memblock_region
• base, size, flags[, nid]
Reserving in memblock
• Reserving adds the region to the region array in the “reserved” type
static int __init_memblock memblock_reserve_region(phys_addr_t base,
phys_addr_t size,
int nid,
unsigned long flags)
{
struct memblock_type *_rgn = &memblock.reserved;
...
return memblock_add_region(_rgn, base, size, nid, flags);
}
• ARM
• arm_memblock_init
• Also called by setup_arch (8/80)
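The two-array layout and the effect of reserving can be modelled in a few lines of user-space C. This is a simplified sketch, not the kernel implementation: it uses a fixed-size array, appends without sorting or merging, and omits flags and nid. All `toy_*` names and `TOY_MAX_REGIONS` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_REGIONS 8   /* hypothetical fixed capacity */

struct toy_region {
    uint64_t base;
    uint64_t size;
};

/* One region array, like memblock_type. */
struct toy_type {
    size_t cnt;
    struct toy_region regions[TOY_MAX_REGIONS];
};

/* Two arrays: all memory, and the reserved parts of it. */
struct toy_memblock {
    struct toy_type memory;
    struct toy_type reserved;
};

/*
 * Append a region to one array. Returns 0 on success, -1 when the
 * array is full. (Sorting and merging of neighbours are left out.)
 */
int toy_add_region(struct toy_type *type, uint64_t base, uint64_t size)
{
    if (type->cnt >= TOY_MAX_REGIONS)
        return -1;
    type->regions[type->cnt].base = base;
    type->regions[type->cnt].size = size;
    type->cnt++;
    return 0;
}

/* Registering memory adds to the "memory" array ... */
int toy_memblock_add(struct toy_memblock *mb, uint64_t base, uint64_t size)
{
    return toy_add_region(&mb->memory, base, size);
}

/* ... while reserving adds to the "reserved" array. */
int toy_memblock_reserve(struct toy_memblock *mb, uint64_t base, uint64_t size)
{
    return toy_add_region(&mb->reserved, base, size);
}
```

Reserving therefore never removes anything from the memory array; a region is free exactly when it is covered by memory and not covered by reserved.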
Resizing, or reallocation.
• Memblock uses slab for resizing if available
• The number of e820 entries may be more than 128
• However, slab only becomes available at kmem_cache_init, called by mm_init (25/80), so it cannot be used at this point.
• Memblock tries to allocate by itself by finding an area that is in memory and not reserved (memory && !reserved).
static int __init_memblock memblock_double_array(struct memblock_type *type,
phys_addr_t new_area_start,
phys_addr_t new_area_size)
{
…
addr = memblock_find_in_range(new_area_start + new_area_size,
memblock.current_limit,
new_alloc_size, PAGE_SIZE);
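The "memory && !reserved" search can be sketched as follows. This is a simplified user-space model under stated assumptions: it searches bottom-up, expects `align` to be a power of two, and returns 0 when nothing fits. The `demo_*` names are hypothetical and the function is not the kernel's memblock_find_in_range.

```c
#include <stddef.h>
#include <stdint.h>

struct demo_range {
    uint64_t base;
    uint64_t size;
};

/* Round x up to a multiple of align (align must be a power of two). */
static uint64_t demo_align_up(uint64_t x, uint64_t align)
{
    return (x + align - 1) & ~(align - 1);
}

/*
 * Find the lowest aligned address in [start, end) where `size` bytes
 * lie inside a memory range and overlap no reserved range.
 * Returns 0 if no such area exists.
 */
uint64_t demo_find_free(const struct demo_range *memory, size_t nmem,
                        const struct demo_range *reserved, size_t nres,
                        uint64_t start, uint64_t end,
                        uint64_t size, uint64_t align)
{
    for (size_t i = 0; i < nmem; i++) {
        uint64_t lo = memory[i].base > start ? memory[i].base : start;
        uint64_t hi = memory[i].base + memory[i].size;
        uint64_t cand;
        int moved = 1;

        if (hi > end)
            hi = end;
        cand = demo_align_up(lo, align);

        /* Skip past every reserved range that the candidate overlaps. */
        while (moved && cand + size <= hi) {
            moved = 0;
            for (size_t j = 0; j < nres; j++) {
                uint64_t rlo = reserved[j].base;
                uint64_t rhi = rlo + reserved[j].size;

                if (cand < rhi && cand + size > rlo) {
                    cand = demo_align_up(rhi, align);
                    moved = 1;
                    break;
                }
            }
        }
        if (!moved && cand + size <= hi)
            return cand;
    }
    return 0;
}
```

For example, with memory covering 0x1000 to 0x10000 and reserved ranges at 0x1000 to 0x3000 and 0x4000 to 0x5000, a 0x1000-byte request lands at 0x3000, while a 0x2000-byte request has to skip both reserved ranges and lands at 0x5000.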
3. Initialization
Okay, okay.
start_kernel
• What’s the first initialization function called?
smp_setup_processor_id() (at least 2.6.18 ~ 3.2)
lockdep_init() (3.3 ~)
commit 73839c5b2eacc15cb0aa79c69b285fc659fa8851
Author: Ming Lei <tom.leiming@gmail.com>
Date: Thu Nov 17 13:34:31 2011 +0800
[ 0.000000] WARNING: lockdep init error! Arch code didn't call lockdep_init() early enough?
[ 0.000000] Call stack leading to lockdep invocation was:
[ 0.000000] [<c00164bc>] save_stack_trace_tsk+0x0/0x90
[ 0.000000] [<ffffffff>] 0xffffffff
#ifdef CONFIG_X86_64
BUILD_BUG_ON(offsetof(union irq_stack_union, stack_canary) != 40);
#endif
get_random_bytes(&canary, sizeof(canary));
tsc = __native_read_tsc();
canary += tsc + (tsc << 32UL);
current->stack_canary = canary;
#ifdef CONFIG_X86_64
this_cpu_write(irq_stack_union.stack_canary, canary);
#else
this_cpu_write(stack_canary.canary, canary);
#endif
}
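The arithmetic in the fragment above can be isolated as a small pure function. This is only an illustration of the mixing step, assuming 64-bit values as on x86_64; `demo_mix_canary` is a hypothetical name, and the random input is passed in by the caller instead of coming from get_random_bytes.

```c
#include <stdint.h>

/*
 * Mix a random value with a timestamp counter reading, mirroring
 * "canary += tsc + (tsc << 32)" from the fragment.
 * Unsigned arithmetic wraps around on overflow, which is intended.
 */
uint64_t demo_mix_canary(uint64_t random_value, uint64_t tsc)
{
    uint64_t canary = random_value;

    canary += tsc + (tsc << 32);
    return canary;
}
```

Adding the shifted copy spreads the low, fast-changing bits of the counter into the upper half of the value, so the canary still varies in both halves even if little randomness is available this early in boot.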
cpumask
• A bit map
typedef struct cpumask { DECLARE_BITMAP(bits, NR_CPUS); } cpumask_t;
include/linux/cpumask.h
#define DECLARE_BITMAP(name,bits) \
unsigned long name[BITS_TO_LONGS(bits)]
include/linux/types.h
[Figure: a bitmap of NR_CPUS bits]
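A user-space re-creation of the two macros shows how the bitmap is sized and indexed. This is a sketch under stated assumptions: `DEMO_NR_CPUS` is an arbitrary value chosen so that the bitmap spans more than one word, and the `demo_*` helpers are simplified stand-ins (not atomic, no range checks) for the kernel's cpumask operations.

```c
#include <limits.h>
#include <stdbool.h>

#define DEMO_NR_CPUS 130   /* hypothetical CPU count */

#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
/* Number of unsigned longs needed to hold nr bits, rounded up. */
#define DEMO_BITS_TO_LONGS(nr) \
    (((nr) + DEMO_BITS_PER_LONG - 1) / DEMO_BITS_PER_LONG)
#define DEMO_DECLARE_BITMAP(name, bits) \
    unsigned long name[DEMO_BITS_TO_LONGS(bits)]

struct demo_cpumask {
    DEMO_DECLARE_BITMAP(bits, DEMO_NR_CPUS);
};

/* Set the bit for one CPU: pick the word, then the bit inside it. */
void demo_cpumask_set_cpu(unsigned int cpu, struct demo_cpumask *m)
{
    m->bits[cpu / DEMO_BITS_PER_LONG] |= 1UL << (cpu % DEMO_BITS_PER_LONG);
}

/* Clear the bit for one CPU. */
void demo_cpumask_clear_cpu(unsigned int cpu, struct demo_cpumask *m)
{
    m->bits[cpu / DEMO_BITS_PER_LONG] &= ~(1UL << (cpu % DEMO_BITS_PER_LONG));
}

/* Test whether a CPU is in the mask. */
bool demo_cpumask_test_cpu(unsigned int cpu, const struct demo_cpumask *m)
{
    return (m->bits[cpu / DEMO_BITS_PER_LONG] >>
            (cpu % DEMO_BITS_PER_LONG)) & 1UL;
}

/* Count the CPUs in the mask. */
unsigned int demo_cpumask_weight(const struct demo_cpumask *m)
{
    unsigned int n = 0;

    for (unsigned int cpu = 0; cpu < DEMO_NR_CPUS; cpu++)
        if (demo_cpumask_test_cpu(cpu, m))
            n++;
    return n;
}
```

With 64-bit longs, 130 CPUs need three words; the unused high bits of the last word are simply never touched by these helpers.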
smp_processor_id
• Returns the core ID (in the kernel)
• On ARM (and on x86 in the old days)
• Stored alongside “current” (in thread_info)
• Located at the end (lowest address) of the current kernel stack, found by masking the stack pointer
• In x86
• Located in the per-cpu area.
#define raw_smp_processor_id() (this_cpu_read(cpu_number))
arch/x86/include/asm/smp.h
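The stack-based lookup can be imitated in user space. The sketch assumes the classic layout in which a thread_info-like structure sits at the lowest address of a THREAD_SIZE-aligned stack area, so masking any address inside the stack yields the structure. `DEMO_THREAD_SIZE`, the structure layout, and all `demo_*` names are hypothetical, and the "stack" is just an aligned global buffer.

```c
#include <stdint.h>

#define DEMO_THREAD_SIZE 8192u   /* hypothetical stack size, power of two */

/* Stand-in for thread_info: only the CPU number is modelled. */
struct demo_thread_info {
    unsigned int cpu;
};

/* The info structure shares the stack area and sits at its lowest address. */
union demo_thread_union {
    struct demo_thread_info info;
    unsigned char stack[DEMO_THREAD_SIZE];
};

/* A fake kernel stack, aligned to its own size so that masking works. */
union demo_thread_union demo_stack
    __attribute__((aligned(DEMO_THREAD_SIZE)));

/* Mask off the low bits of a stack address to find the info structure. */
struct demo_thread_info *demo_thread_info_from_sp(uintptr_t sp)
{
    return (struct demo_thread_info *)
        (sp & ~((uintptr_t)DEMO_THREAD_SIZE - 1));
}

/* Read the CPU number through the stack pointer. */
unsigned int demo_smp_processor_id(uintptr_t sp)
{
    return demo_thread_info_from_sp(sp)->cpu;
}
```

Any address inside `demo_stack.stack`, for example one near its high end where a real stack pointer would start, maps back to `demo_stack.info`. The per-cpu approach used on x86 avoids this masking and reads a variable that is private to each CPU.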
Next
• Topics and the rest of initialization
• Setup parameters (early_param() etc.)
• Initcalls
• Multiprocessor support
• Per-cpus
• SMP boot (secondary boot)
• SMP alternatives
• And other alternatives
• And Others?
• Modules?