CS 498 Lecture 4 An Overview of Linux Kernel Structure

Jennifer Hou Department of Computer Science University of Illinois at Urbana-Champaign
Reading: Chapter 1-2, The Linux Networking Architecture: Design and Implementation of Network Protocols in the Linux Kernel

- Overview of the Kernel Structure
- Activities in the Linux Kernel
- Locking
- Kernel Modules
- /proc File System
- Memory Management
- Timing


Structure of Linux Kernel
[Figure: layered block diagram of the Linux kernel. Applications and tools in user space enter the kernel through system calls. Kernel components shown: process management, memory management, file systems, device drivers, and network. Labels recoverable from the diagram: scheduler and architecture-specific code; virtual memory and memory manager; files, directories, file system types, and block devices (hard disk, CD, floppy); device access and character devices; network functionality, network protocols, network drivers, and network adapter. The layers are marked as software support and hardware support.]





Overview of the Kernel Structure
Process management
- The scheduler handles all the active, waiting, and blocked processes.

Memory management
- Is responsible for allocating memory to each process and for protecting allocated memory against access by other processes.

File system
- In UNIX, almost everything is handled over the file system interface.
- Device drivers can be addressed as files.
- The /proc file system allows us to access data and parameters in the kernel.

Overview of the Kernel Structure
Device drivers
- Abstract from the underlying hardware and allow us to access the hardware through well-defined APIs.

Network
- Incoming packets are asynchronous events and have to be collected and identified before a process can handle them.
- Most network operations cannot be allocated to a specific process.

Features of Linux Kernel
Is a monolithic kernel
- The entire functionality is contained in one kernel.
- In contrast, in microkernels (e.g., the Mach kernel and Windows NT), only memory management and IPC are contained in the kernel. The remaining functionality is moved to independent processes/threads running outside the kernel.
- Advantage (+): resources are accessed directly from within the kernel, avoiding expensive system calls and context switches.
- Disadvantage (-): the OS becomes quite complex.

Features of Linux Kernel
A cure is the use of kernel modules
- Linux allows kernel modules to be dynamically loaded into (and removed from) the kernel at run time.
- This is achieved with the use of well-defined interfaces, e.g., register_netdev(), register_chrdev(), register_blkdev().
- Run-time performance is preserved because modules run in protected kernel mode, like the rest of the kernel.

Activities in the Linux Kernel

Activities – Processes and System Calls
Processes operate exclusively in the user address space, and can only access the memory allocated to them.

Violation leads to exceptions.

When a process wants to access devices or use functionality in the kernel, it issues a system call.

The control is transferred to the kernel, which executes the system call on behalf of the user process.

Processes can be interrupted voluntarily (wait on semaphore or sleep) or involuntarily (interrupt).
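
The transfer of control described above can be observed from user space. The following is a minimal sketch, assuming a Linux system with glibc: it requests the process ID through the generic syscall() entry point rather than through the usual getpid() wrapper. The helper name raw_getpid() is made up for this example.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: enter the kernel through the generic syscall()
 * interface and let it execute the getpid system call on our behalf. */
static long raw_getpid(void)
{
    long pid = syscall(SYS_getpid);

    assert(pid > 0);    /* this call is not expected to fail */
    return pid;
}
```

The value returned by raw_getpid() should match what getpid() reports, since both end up in the same kernel routine.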

Other Forms of Activities
- Hardware interrupts
- Software interrupts
- Tasklets

Interrupts – Hardware IRQs
- Peripherals use hardware interrupts to inform the OS of events (e.g., a packet has arrived at the network adapter); an interrupt handling routine is then called.
- The handling routine for a specific interrupt is registered with request_irq() and de-registered with free_irq().
- Fast interrupts
  - have a very short handling routine (that cannot be interrupted).
  - are specified by the flag SA_INTERRUPT in request_irq().
- Slow interrupts
  - have a longer handling routine and can be interrupted by other interrupts during their execution.
- in_irq() (include/asm/hardirq.h) can be used to check whether or not the current activity is an interrupt-handling routine.

Not every operation that needs to be executed in an interrupt can be completed in a few instructions (e.g., handling a packet that arrives at a network adapter). To keep interrupt handling short, the routine is usually divided into two parts:

- Top half: handles the most important tasks (e.g., copying the arrived packet to a kernel buffer queue, where it waits for detailed handling later).
- Bottom half: handles non-time-critical operations. It is scheduled for execution right after the top half has finished (e.g., when a packet arrives, the bottom half is run as the software interrupt NET_RX_SOFTIRQ).
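
The split into two halves can be illustrated with a small user-space model. This is only a sketch under stated assumptions: the queue, its depth RX_QUEUE_LEN, and the names top_half(), bottom_half() and softirq_pending are invented for the example and are not kernel APIs. The top half only enqueues the packet and marks work as pending; the bottom half later drains the queue.

```c
#include <assert.h>
#include <stddef.h>

#define RX_QUEUE_LEN 8              /* hypothetical queue depth */

static int rx_queue[RX_QUEUE_LEN];  /* stores packet lengths only */
static size_t rx_head, rx_count;
static int softirq_pending;         /* models "the bottom half was requested" */
static long bytes_processed;        /* result of the deferred work */

/* Top half: do the minimum (enqueue) and request the bottom half.
 * Returns 0 on success, -1 if the queue is full and the packet is dropped. */
static int top_half(int packet_len)
{
    assert(packet_len >= 0);
    if (rx_count == RX_QUEUE_LEN)
        return -1;
    rx_queue[(rx_head + rx_count) % RX_QUEUE_LEN] = packet_len;
    rx_count++;
    softirq_pending = 1;
    return 0;
}

/* Bottom half: runs later and drains the queue.
 * Returns the number of packets handled in this pass. */
static int bottom_half(void)
{
    int handled = 0;

    if (!softirq_pending)
        return 0;
    softirq_pending = 0;
    while (rx_count > 0) {
        bytes_processed += rx_queue[rx_head];
        rx_head = (rx_head + 1) % RX_QUEUE_LEN;
        rx_count--;
        handled++;
    }
    return handled;
}
```

Calling top_half() twice and then bottom_half() once handles both packets in a single deferred pass, which is the point of the design: the time-critical part stays short, and the expensive part is batched.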

Software Interrupts
- When a system call or a hardware interrupt terminates, the scheduler calls do_softirq(), which schedules software interrupts for execution.
- A maximum of 32 software interrupts can be defined in Linux. NET_RX_SOFTIRQ and NET_TX_SOFTIRQ are two of them.
- Multiple software interrupts can run concurrently, and hence need to be reentrant.
- in_softirq() (include/asm/softirq.h) can be used to check whether or not the current activity is a software interrupt.

Tasklets
- A more formal mechanism of scheduling software interrupts (and other tasks).
- The macro DECLARE_TASKLET(name, func, data) declares a tasklet:
  - name: a name for the tasklet_struct data structure.
  - func: the tasklet's handling routine.
  - data: a pointer to private data (cast to unsigned long) to be passed to func().
- tasklet_schedule() schedules a tasklet for execution.
- tasklet_disable() stops a tasklet from running, even if it has been scheduled for execution.
- tasklet_enable() reactivates a deactivated tasklet.

Tasklet Example
    #include <linux/interrupt.h>

    /* Handling routine of new tasklet */
    void test_func(unsigned long);

    /* Data of new tasklet */
    char test_data[] = "Hello, I am a test tasklet";

    DECLARE_TASKLET(test_tasklet, test_func, (unsigned long) &test_data);

    void test_func(unsigned long data)
    {
        /* The log level is a string prefix, not a separate argument */
        printk(KERN_DEBUG "%s\n", (char *) data);
    }

    ...
    tasklet_schedule(&test_tasklet);
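
How scheduling, disabling and enabling interact can be sketched with a user-space model. Everything prefixed with model_ is hypothetical and only mimics the behaviour described for tasklets: a tasklet that is scheduled several times before it runs executes once, and a disabled tasklet stays pending until it is enabled again.

```c
#include <assert.h>

struct model_tasklet {
    void (*func)(unsigned long);    /* handling routine */
    unsigned long data;             /* private data passed to func */
    int scheduled;                  /* set by schedule, cleared when func runs */
    int disable_count;              /* greater than 0 means "disabled" */
};

static void model_tasklet_schedule(struct model_tasklet *t)
{
    t->scheduled = 1;               /* scheduling twice still runs it once */
}

static void model_tasklet_disable(struct model_tasklet *t)
{
    t->disable_count++;
}

static void model_tasklet_enable(struct model_tasklet *t)
{
    assert(t->disable_count > 0);   /* enable must pair with a disable */
    t->disable_count--;
}

/* One pass of the (modelled) softirq: run the tasklet if it is scheduled
 * and not disabled. Returns 1 if the handler ran, 0 otherwise. */
static int model_run_pending(struct model_tasklet *t)
{
    if (!t->scheduled || t->disable_count > 0)
        return 0;
    t->scheduled = 0;
    t->func(t->data);
    return 1;
}

/* Example handler: adds its data argument to a counter. */
static int run_counter;
static void count_runs(unsigned long data)
{
    run_counter += (int)data;
}
```

A tasklet initialised as { count_runs, 1, 0, 0 } and scheduled while disabled does not run in model_run_pending() until model_tasklet_enable() has been called.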


Locking -- spinlock
A mechanism for busy-wait locks.
- spin_lock_init(&my_spinlock): initializes the spinlock.
- spin_lock(spinlock_t *my_spinlock): tries to set the spinlock my_spinlock. If it is not free, the caller busy-waits (tests repeatedly) until the lock is released.
- spin_unlock(spinlock_t *my_spinlock): releases the lock.
- spin_is_locked(spinlock_t *my_lock): returns the current state of the lock (a non-zero value means the lock is set).
- spin_trylock(spinlock_t *my_lock): sets the spinlock if it is currently unlocked and returns a non-zero value; otherwise it returns zero immediately without waiting.

Spinlock Example
    #include <linux/spinlock.h>

    spinlock_t my_spinlock;
    spin_lock_init(&my_spinlock);

    /* One thread */
    spin_lock(&my_spinlock);
    /* Critical section */
    spin_unlock(&my_spinlock);
    ...
    /* Another thread */
    spin_lock(&my_spinlock);
    /* Critical section */
    spin_unlock(&my_spinlock);
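
The busy-wait idea behind a spinlock can be sketched in user space with C11 atomics. This is not the kernel implementation; the model_ names are invented for the example. It only shows the core technique: an atomic exchange that either acquires the lock or keeps spinning.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

typedef struct {
    atomic_bool locked;
} model_spinlock_t;

static void model_spin_lock_init(model_spinlock_t *l)
{
    atomic_init(&l->locked, false);
}

static int model_spin_is_locked(model_spinlock_t *l)
{
    return atomic_load_explicit(&l->locked, memory_order_relaxed);
}

/* Busy-wait until the exchange observes "unlocked" and sets "locked". */
static void model_spin_lock(model_spinlock_t *l)
{
    while (atomic_exchange_explicit(&l->locked, true, memory_order_acquire))
        ;   /* spin */
}

/* Single attempt: non-zero if the lock was acquired, zero otherwise. */
static int model_spin_trylock(model_spinlock_t *l)
{
    return !atomic_exchange_explicit(&l->locked, true, memory_order_acquire);
}

static void model_spin_unlock(model_spinlock_t *l)
{
    assert(model_spin_is_locked(l));    /* unlocking a free lock is a bug */
    atomic_store_explicit(&l->locked, false, memory_order_release);
}
```

Because waiting burns CPU cycles, this style of lock only makes sense for short critical sections, which is also how spinlocks are used in the kernel.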

Read-Write Spinlocks
Some data structures, such as the list of registered network devices (dev_base), do not change frequently but are subject to many read accesses. A read-write spinlock improves run-time performance in this case.
- read_lock(): if there is no lock or only read locks, the critical section can be entered immediately. If there is a write lock, we have to wait.
- read_unlock(): a read activity leaves the critical section. If a write activity is waiting and there is no other read activity, it gains access.
- write_lock(): if there is a (read or write) lock, we have to wait; otherwise, we set an exclusive lock.
- write_unlock(): releases the exclusive lock.
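
The reader/writer rules above can be captured in a single atomic counter. The following user-space sketch is an assumption-laden model, not the kernel's rwlock: a positive state counts readers, zero means free, and -1 means one writer holds the lock. All model_ names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

/* state > 0: number of readers; state == 0: free; state == -1: one writer */
typedef struct {
    atomic_int state;
} model_rwlock_t;

static void model_rwlock_init(model_rwlock_t *l)
{
    atomic_init(&l->state, 0);
}

/* Readers may enter as long as no writer holds the lock. */
static int model_read_trylock(model_rwlock_t *l)
{
    int s = atomic_load(&l->state);

    while (s >= 0) {
        /* On failure, s is reloaded with the current value. */
        if (atomic_compare_exchange_weak(&l->state, &s, s + 1))
            return 1;
    }
    return 0;
}

/* A writer needs the lock to be completely free. */
static int model_write_trylock(model_rwlock_t *l)
{
    int expected = 0;

    return atomic_compare_exchange_strong(&l->state, &expected, -1);
}

static void model_read_lock(model_rwlock_t *l)
{
    while (!model_read_trylock(l))
        ;   /* spin while a writer is active */
}

static void model_write_lock(model_rwlock_t *l)
{
    while (!model_write_trylock(l))
        ;   /* spin while any reader or writer is active */
}

static void model_read_unlock(model_rwlock_t *l)
{
    int prev = atomic_fetch_sub(&l->state, 1);

    assert(prev > 0);   /* must have been held by a reader */
    (void)prev;
}

static void model_write_unlock(model_rwlock_t *l)
{
    int expected = -1;
    int ok = atomic_compare_exchange_strong(&l->state, &expected, 0);

    assert(ok);         /* must have been held by the writer */
    (void)ok;
}
```

Two readers can hold the lock at the same time, while a writer is admitted only after the last reader has left. That is why this scheme pays off for data that is read often and changed rarely.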

Kernel Modules

Kernel Modules
- Each kernel module implements init_module() and cleanup_module().
- To load a kernel module into the kernel space manually, use insmod modulename.o [arguments]. In turn, the following system calls are called:
  - sys_create_module() allocates memory to accommodate the module in the kernel space.
  - sys_get_kernel_syms() returns the kernel's symbol table to resolve the missing references within the module to kernel symbols.
  - sys_init_module() copies the module's object code into the kernel address space and calls the module's init_module().
- Example: insmod wvlan_cs eth=1 network_name="mywavelan"

Kernel Modules
- rmmod modulename removes the specified module from the kernel address space. In turn, the system call sys_delete_module() is called, which in turn calls cleanup_module().
- lsmod lists all currently loaded modules and their dependencies and reference counts.
- modinfo gives information about a module. The information is set by macros such as MODULE_DESCRIPTION and MODULE_AUTHOR in the module's source.
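
The load/unload life cycle can be sketched as a user-space model. The structure and the model_insmod()/model_rmmod() functions below are invented for illustration and stand in for what insmod and rmmod trigger: loading calls the module's init routine, and unloading calls the cleanup routine but is refused while the reference count is non-zero.

```c
#include <assert.h>
#include <stddef.h>

struct model_module {
    const char *name;
    int (*init)(void);      /* plays the role of init_module() */
    void (*cleanup)(void);  /* plays the role of cleanup_module() */
    int loaded;
    int use_count;          /* reference count, as shown by lsmod */
};

/* Returns 0 on success, -1 if already loaded or if init fails. */
static int model_insmod(struct model_module *m)
{
    assert(m != NULL && m->init != NULL && m->cleanup != NULL);
    if (m->loaded)
        return -1;
    if (m->init() != 0)     /* a non-zero result aborts the load */
        return -1;
    m->loaded = 1;
    return 0;
}

/* Returns 0 on success, -1 if not loaded or still in use. */
static int model_rmmod(struct model_module *m)
{
    assert(m != NULL);
    if (!m->loaded || m->use_count > 0)
        return -1;
    m->cleanup();
    m->loaded = 0;
    return 0;
}

/* A demo "module" that only counts how often its routines are called. */
static int demo_init_calls, demo_cleanup_calls;

static int demo_init(void)
{
    demo_init_calls++;
    return 0;
}

static void demo_cleanup(void)
{
    demo_cleanup_calls++;
}
```

A module described by { "demo", demo_init, demo_cleanup, 0, 0 } can be loaded once, cannot be removed while use_count is set, and runs demo_cleanup() exactly once when it is finally removed.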

Passing Module Parameters
- MODULE_PARM(var, type) designates the variable var as a parameter of the module; a value can be assigned to this parameter during loading. Possible types are:
  - b: byte
  - h: short (two bytes)
  - i: integer
  - l: long
  - s: string
- MODULE_PARM_DESC(var, desc) adds a description (desc) for the parameter var.
- MODULE_DESCRIPTION(desc) contains a description of the module.
- EXPORT_SYMBOL(name) exports a function or variable of the kernel and adds it to the symbol table.

The Proc File System

The Proc File System
All files in /proc are virtual files, and are generated to export the kernel information in the user space. The files and directories are based on proc_dir_entry.

proc_dir_entry Structure
    struct proc_dir_entry {
        unsigned short low_ino;     /* inode number; automatically filled by proc_register */
        unsigned short namelen;     /* length of the file or directory name */
        const char *name;           /* a pointer to the name of the file */
        mode_t mode;                /* the file's mode */
        nlink_t nlink;              /* the number of links to this file (default = 1) */
        uid_t uid;
        gid_t gid;
        unsigned long size;         /* length of the file as shown when the directory is displayed */
        struct inode_operations *proc_iops;
        struct file_operations *proc_fops;
        get_info_t *get_info;       /* called as get_info(buffer, start, off, count) */
        struct module *owner;
        struct proc_dir_entry *next, *parent, *subdir;  /* pointers to link the proc directory structure */
        void *data;                 /* a pointer to private data */
        read_proc_t *read_proc;     /* called as read_proc(buffer, start, off, count, eof, data) */
        write_proc_t *write_proc;   /* called as write_proc(file, buffer, count, data) */
        atomic_t count;             /* use count */
        int deleted;                /* delete flag */
        kdev_t rdev;
    };

Handling of /proc Entries.
- create_proc_entry(name, mode, parent) creates a file called name in the proc directory and returns a pointer to its proc_dir_entry structure. The name is relative to /proc/.

    test_entry = create_proc_entry("test", 0600, proc_net);
    test_entry->nlink = 1;
    test_entry->data = (void *) &test_data;
    test_entry->read_proc = test_read_proc;
    test_entry->write_proc = test_write_proc;

- remove_proc_entry(name, parent) removes the proc file specified in name.

Handling of /proc Entries
- proc_mkdir(name, parent) creates a directory in the proc directory and returns a pointer to its proc_dir_entry structure.
- create_proc_info_entry(name, mode, base, get_info) creates the proc file name and uses the function get_info() to serve read accesses. (The related create_proc_read_entry() takes a read_proc callback and a data pointer instead.)

    test_entry = create_proc_info_entry("test", 0600, proc_net, test_get_info);
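
What a get_info()-style read callback has to do can be sketched in user space. The signature below follows the argument list given for get_info in proc_dir_entry (buffer, start, off, count); the body is an assumption based on the usual pattern for such callbacks, and MODEL_PAGE_SIZE and the message text are made up. The callback formats its output into the buffer, points *start at the requested offset, and returns how many bytes are available from there.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

#define MODEL_PAGE_SIZE 4096    /* assumed size of the buffer handed in */

/* Model of a read callback: writes the file content into buffer and
 * returns the number of bytes available at *start (at most length). */
static int test_get_info(char *buffer, char **start, off_t offset, int length)
{
    int len = snprintf(buffer, MODEL_PAGE_SIZE, "test entry\n");

    assert(length >= 0);
    if (offset < 0 || offset >= len) {
        *start = buffer + len;  /* nothing left to read */
        return 0;
    }
    *start = buffer + offset;
    len -= (int)offset;
    if (len > length)
        len = length;           /* the reader asked for less than we have */
    return len;
}
```

A reader that consumes the file in pieces calls the function repeatedly with a growing offset until it returns 0.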

Memory Management

Reserving/Releasing Memory In the Kernel
- kmalloc(size, priority): attempts to reserve a contiguous memory area of size bytes in kernel memory.
  - GFP_KERNEL: is used when the requesting activity can be interrupted (may sleep) during the reservation.
  - GFP_ATOMIC: is used when the memory request must be atomic, e.g., in interrupt context.
- kfree(objp): releases the memory space reserved at address objp.

Reserving/Releasing Memory In the Kernel
- copy_from_user(to, from, count) copies count bytes from the address from in the user address space to the address to in the kernel address space.
- copy_to_user(to, from, count) copies count bytes from the address from in the kernel address space to the address to in the user address space.
- access_ok() checks whether a user-space address range is valid for the process to access. It does not guarantee that the corresponding pages currently reside in physical memory.
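
The contract of these functions can be illustrated with a user-space model. The array model_user_space stands in for the address range of the user process, and the model_ functions are hypothetical. The one behaviour taken from the real interface is the return convention: copy_from_user() returns the number of bytes that could not be copied, so 0 means success.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stands in for the address range that belongs to the user process. */
static char model_user_space[64];

/* Non-zero if [addr, addr + count) lies entirely inside the user range. */
static int model_access_ok(const void *addr, size_t count)
{
    uintptr_t p = (uintptr_t)addr;
    uintptr_t lo = (uintptr_t)model_user_space;
    uintptr_t hi = lo + sizeof(model_user_space);

    return p >= lo && p <= hi && count <= (size_t)(hi - p);
}

/* Returns the number of bytes NOT copied (0 on success). */
static unsigned long model_copy_from_user(void *to, const void *from,
                                          unsigned long count)
{
    assert(to != NULL);
    if (!model_access_ok(from, count))
        return count;           /* invalid range: nothing is copied */
    memcpy(to, from, count);
    return 0;
}
```

Kernel code typically treats a non-zero result as an error and reports it to the caller instead of using the destination buffer.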

Memory Caches
Linux allows us to create a cache of memory spaces of a specific size: a slab cache.

- kmem_cache_create(name, size, offset, flags, ctor, dtor) creates a slab cache of memory spaces, each size bytes long.
  - name points to a string containing the name of the slab cache.
  - offset is usually set to 0.
  - flags specifies additional options, e.g., SLAB_HWCACHE_ALIGN (aligns to the size of the first-level cache in the CPU).
  - ctor, dtor specify a constructor and a destructor used to initialize or clean up the reserved memory spaces.
- Example:

    skbuff_head_cache = kmem_cache_create("skbuff_head_cache", sizeof(struct sk_buff), 0, SLAB_HWCACHE_ALIGN, skb_headerinit, NULL);

Memory Caches
- kmem_cache_destroy(cachep): releases the slab cache cachep.
- kmem_cache_shrink(cachep): is called by the kernel when the kernel itself requires memory space and has to reduce the cache.
- kmem_cache_alloc(cachep, flags): is used to request a memory space from the slab cache cachep. If the slab cache is empty, then kmalloc() is used to reserve new memory space.
- kmem_cache_free(cachep, ptr): frees the memory space that begins at ptr and gives it back to the cache cachep.
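
The idea behind a slab cache, keeping released objects of one fixed size for quick reuse, can be sketched in user space. This is a simplified model and not the kernel's slab allocator: the model_ names are invented, malloc() plays the role of the fallback allocation, and the constructor is run on every allocation because the free list reuses the object's own memory for its link.

```c
#include <assert.h>
#include <stdlib.h>

/* A released object is reused as a node of the free list. */
struct free_obj {
    struct free_obj *next;
};

struct model_cache {
    size_t size;                /* size of every object in this cache */
    void (*ctor)(void *obj);    /* optional constructor, may be NULL */
    struct free_obj *free_list; /* released objects ready for reuse */
};

static struct model_cache *model_cache_create(size_t size, void (*ctor)(void *))
{
    struct model_cache *c = malloc(sizeof(*c));

    if (c == NULL)
        return NULL;
    /* Every object must be able to hold the free-list link. */
    c->size = size < sizeof(struct free_obj) ? sizeof(struct free_obj) : size;
    c->ctor = ctor;
    c->free_list = NULL;
    return c;
}

/* Take an object from the cache; fall back to malloc() if it is empty. */
static void *model_cache_alloc(struct model_cache *c)
{
    void *obj;

    assert(c != NULL);
    if (c->free_list != NULL) {
        obj = c->free_list;
        c->free_list = c->free_list->next;
    } else {
        obj = malloc(c->size);
    }
    if (obj != NULL && c->ctor != NULL)
        c->ctor(obj);
    return obj;
}

/* Give an object back to the cache instead of releasing its memory. */
static void model_cache_free(struct model_cache *c, void *ptr)
{
    struct free_obj *f = ptr;

    f->next = c->free_list;
    c->free_list = f;
}

/* Release all cached objects and the cache itself. */
static void model_cache_destroy(struct model_cache *c)
{
    while (c->free_list != NULL) {
        struct free_obj *f = c->free_list;

        c->free_list = f->next;
        free(f);
    }
    free(c);
}
```

An object that was just returned with model_cache_free() is handed out again by the next model_cache_alloc(), which avoids a round trip through the general-purpose allocator for frequently used structures.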
