
Linux Kernel Networking

Raoul Rivas

Kernel vs Application Programming

Kernel programming:

No memory protection

We share memory with devices, scheduler

Sometimes no preemption

Can hog the CPU

Concurrency is difficult

No libraries

printf, fopen

No security descriptors

In Linux no access to files

Direct access to hardware

Application programming:

Memory protection

Segmentation fault

Preemption

Scheduling isn't our responsibility

Signals (Control-C)

Libraries

Security descriptors

In Linux everything is a file descriptor

Access to hardware as files



Outline

User Space and Kernel Space

Running Context in the Kernel

Locking

Deferring Work

Linux Network Architecture

Sockets, Families and Protocols

Packet Creation

Fragmentation and Routing

Data Link Layer and Packet Scheduling

High Performance Networking



System Calls

A system call is issued through a software interrupt (INT 0x80 on 32-bit x86)

syscall(number, arguments)

The kernel runs in a different address space

Data must be copied back and forth

copy_to_user(), copy_from_user()

Never directly dereference any pointer from user space
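
[Figure: User Space: write(ptr, size); becomes syscall(WRITE, ptr, size) and raises INT 0x80. Kernel Space: the syscall table dispatches to sys_write(), which calls copy_from_user() on ptr. Addresses shown in the figure: 0xFFFF50, 0x011075.]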
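
The rule about never dereferencing a user pointer directly can be illustrated with a small user-space sketch. Everything here is hypothetical: `user_space`, `copy_from_user_sim` and `sys_write_sim` are stand-ins invented for this example, not kernel APIs. The only property borrowed from the real copy_from_user() is the return convention (number of bytes that could not be copied, 0 on success).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the user half of the address space. */
#define USER_SPACE_SIZE 64
static unsigned char user_space[USER_SPACE_SIZE];

/*
 * Copy n bytes from "user" offset uoff into the kernel buffer dst.
 * Returns the number of bytes NOT copied (0 on success).
 */
static size_t copy_from_user_sim(void *dst, size_t uoff, size_t n)
{
    assert(dst != NULL);
    /* Reject ranges that leave the user region; written to avoid overflow. */
    if (uoff > USER_SPACE_SIZE || n > USER_SPACE_SIZE - uoff)
        return n;
    memcpy(dst, user_space + uoff, n);
    return 0;
}

/* Toy write handler: validates and copies before touching the data. */
static long sys_write_sim(size_t uoff, size_t size)
{
    unsigned char kbuf[USER_SPACE_SIZE];

    if (size > sizeof(kbuf) || copy_from_user_sim(kbuf, uoff, size) != 0)
        return -1; /* the real kernel would return -EFAULT */
    return (long)size;
}
```

A call such as sys_write_sim(0, 5) succeeds, while sys_write_sim(62, 5) is rejected because the range runs past the end of the simulated user region.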

Context

Context: the entity on whose behalf the kernel is running code

Process context and kernel context are preemptible

Interrupts cannot sleep and should be small

They are all concurrent

Process context and kernel context have a PID:

struct task_struct *current


             Kernel Context   Process Context   Interrupt Context
Preemptible  Yes              Yes               No
PID          Itself           Application PID   No
Can Sleep?   Yes              Yes               No
Example      Kernel Thread    System Call       Timer Interrupt

Race Conditions

Process context, kernel context and interrupts run concurrently

How to protect critical zones from race conditions?

Spinlocks

Mutex

Semaphores

Reader-Writer Locks (Mutex, Semaphores)

Reader-Writer Spinlocks

Inside Locking Primitives

Spinlock
//spinlock_lock:
disable_interrupts();
while (test_and_set(&locked)); //spin until we atomically take the lock
//critical region
//spinlock_unlock:
locked = false; //release the lock first
enable_interrupts();
Mutex
//mutex_lock:
if (locked == true)
{
    enqueue(this);
    yield();
}
locked = true;
//critical region
//mutex_unlock:
if (!isEmpty(waitqueue))
{
    wakeup(dequeue());
}
else locked = false;
We can't sleep while the spinlock is locked!
We can't use a mutex in an interrupt because interrupts can't sleep!
THE MUTEX SLEEPS, THE SPINLOCK SPINS...
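
As a rough illustration of the "spinlock spins" half of that statement, here is a minimal user-space sketch built on C11 atomics. The `spin_sketch_*` names are made up for this example and this is not the kernel's spinlock_t. It also leaves out interrupt disabling, which only makes sense inside the kernel.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space spinlock; not the kernel's spinlock_t. */
typedef struct {
    atomic_flag locked;
} spin_sketch_t;

static void spin_sketch_init(spin_sketch_t *l)
{
    assert(l);
    atomic_flag_clear(&l->locked);
}

/* Try once; returns true if we took the lock. */
static bool spin_sketch_trylock(spin_sketch_t *l)
{
    /* test-and-set is atomic: it returns the previous value of the flag. */
    return !atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire);
}

static void spin_sketch_lock(spin_sketch_t *l)
{
    while (!spin_sketch_trylock(l))
        ; /* busy-wait: the caller never sleeps */
}

static void spin_sketch_unlock(spin_sketch_t *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}
```

The key point is that checking and taking the lock happen in one atomic step. A separate read followed by a write would let two contexts enter the critical region together.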

When to use what?

                   Mutex   Spinlock
Short Lock Time            X
Long Lock Time     X
Interrupt Context          X
Sleeping           X

Usually functions that handle memory, user space or devices and scheduling can sleep

kmalloc (with GFP_KERNEL), copy_to_user, schedule

wake_up_process and printk do not sleep



Linux Kernel Modules

Extensibility

Ideally you don't want to patch but build a kernel module

Separate compilation

Runtime linkage

Entry and exit functions

Run in process context

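LKM "Hello-World"
/* On 2.6 and later kernels kbuild already defines MODULE and __KERNEL__ */
#define MODULE
#define LINUX
#define __KERNEL__
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/init.h>

/* Entry function: runs when the module is loaded */
static int __init myinit(void)
{
    printk(KERN_ALERT "Hello, world\n");
    return 0; /* 0 means the module loaded successfully */
}

/* Exit function: runs when the module is removed */
static void __exit myexit(void)
{
    printk(KERN_ALERT "Goodbye, world\n");
}

module_init(myinit);
module_exit(myexit);
MODULE_LICENSE("GPL");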

The Kernel Loop

The Linux kernel uses the concept of jiffies to measure time

Inside the kernel there is a loop to measure time and preempt tasks

A jiffy is the period at which the timer in this loop is triggered

Varies from system to system: 100 Hz, 250 Hz, 1000 Hz.

Use the constant HZ to get the value.

The schedule function is the function that preempts tasks

[Figure: the timer fires every 1/HZ seconds and runs tick_periodic: add_timer(1 jiffy), jiffies++, scheduler_tick(); the loop leads to schedule().]
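
Since a jiffy lasts 1/HZ seconds, converting a duration to jiffies is plain arithmetic. The sketch below is a user-space illustration with hypothetical helper names (`ms_to_jiffies_sketch`, `tick_after`), with the tick rate passed in as a parameter rather than read from HZ. The wraparound-safe comparison follows the same signed-difference idea as the kernel's time_after() macro.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: milliseconds to ticks for a given tick rate, rounding up. */
static unsigned long ms_to_jiffies_sketch(unsigned long ms, unsigned long hz)
{
    assert(hz > 0);
    return (ms * hz + 999) / 1000;
}

/* True if tick a is later than tick b, even when the counter has wrapped. */
static bool tick_after(unsigned long a, unsigned long b)
{
    /* The signed difference stays correct across one wraparound. */
    return (long)(b - a) < 0;
}
```

For example, 10 ms is 1 jiffy at 100 Hz but 3 jiffies at 250 Hz (2.5 rounded up), which is why code should never hard-code a tick count.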

Deferring Work / Two Halves

Kernel timers are used to create timed events

They use jiffies to measure time

Timers are interrupts

We can't do much in them!

Solution: divide the work in two parts

Use the timer handler to signal a thread. (TOP HALF)

Let the kernel thread do the real job. (BOTTOM HALF)

Timer handler (TOP HALF, interrupt context):
wake_up(thread);

Thread (BOTTOM HALF, kernel context):
while (1)
{
    do_work();
    schedule();
}
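
The split can be mimicked in user space to show the division of labour. This is only a sketch under simplifying assumptions: `top_half`, `bottom_half`, `pending` and `handled_total` are hypothetical names, and a counter stands in for waking a kernel thread.

```c
#include <assert.h>
#include <stdatomic.h>

static atomic_int pending;   /* events signalled but not yet handled */
static int handled_total;    /* work completed by the bottom half */

/* Top half: runs in "interrupt context", so it only records the event. */
static void top_half(void)
{
    atomic_fetch_add(&pending, 1);
}

/* Bottom half: runs later in a thread and may take as long as it needs. */
static int bottom_half(void)
{
    int n = atomic_exchange(&pending, 0); /* claim all pending events at once */

    assert(n >= 0);
    handled_total += n;                   /* stand-in for the real job */
    return n;
}
```

Three timer events followed by one run of the bottom half handle all three in a single pass; a second run finds nothing to do.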

Linux Kernel Map

Linux Network Architecture

[Figure: the network stack as layers. Network side: Socket Access; Protocol Families (INET, UNIX); Socket Splice and Network Storage (NFS, SMB, iSCSI); Protocols (UDP, TCP, IP); Network Interface (802.11, ethernet); Network Device Driver. File side: File Access; VFS; Logical Filesystem (EXT4).]

Socket Access

Contains the system call functions like socket, connect, accept, bind

Implements the POSIX socket interface

Independent of protocols or socket types

Responsible for mapping socket data structures to integer handles

Calls the underlying layer functions

sys_socket() → sock_create

[Figure: sys_socket calls socket create, stores the new socket in the handle table and returns an integer handle.]
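
The mapping from socket structures to integer handles can be pictured as a table indexed by the handle. The following is a simplified sketch with hypothetical names (`socket_sketch`, `handle_install`, `handle_lookup`, `handle_release`). The kernel's real descriptor table is per-process and considerably more involved.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_HANDLES 8

struct socket_sketch { int family; };            /* hypothetical socket object */
static struct socket_sketch *handle_table[MAX_HANDLES];

/* Store sock in the lowest free slot and return its integer handle, or -1. */
static int handle_install(struct socket_sketch *sock)
{
    assert(sock != NULL);
    for (int fd = 0; fd < MAX_HANDLES; fd++) {
        if (handle_table[fd] == NULL) {
            handle_table[fd] = sock;
            return fd;
        }
    }
    return -1; /* table full */
}

/* Translate a handle back to its socket; NULL if the handle is invalid. */
static struct socket_sketch *handle_lookup(int fd)
{
    if (fd < 0 || fd >= MAX_HANDLES)
        return NULL;
    return handle_table[fd];
}

static void handle_release(int fd)
{
    if (fd >= 0 && fd < MAX_HANDLES)
        handle_table[fd] = NULL;
}
```

Applications only ever see the integer, so the layer can validate every handle before touching a socket structure.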

Protocol Families

Implements different socket families: INET, UNIX

Extensible through the use of pointers to functions and modules.

Allocates memory for the socket

Calls net_proto_family → create for family-specific initialization

[Figure: the net_proto_family table with entries such as AF_LOCAL and AF_UNIX; the entry *pf points to a create function such as inet_create.]
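
Extensibility through pointers to functions can be sketched as a small registry: each family registers a create function, and the generic layer dispatches through it. All names below (`family_sketch`, `register_family`, `sock_create_sketch`, `inet_create_sketch`, the `FAM_*` ids) are illustrative assumptions, not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

#define NPROTO_SKETCH 4                 /* hypothetical number of families */
enum { FAM_UNIX = 1, FAM_INET = 2 };    /* illustrative ids for this sketch */

struct sock_sketch { int family; int initialised; };

struct family_sketch {
    int family;
    int (*create)(struct sock_sketch *sk);   /* family-specific initialisation */
};

static const struct family_sketch *families[NPROTO_SKETCH];

/* A module would call this when it is loaded. Returns 0 on success. */
static int register_family(const struct family_sketch *pf)
{
    assert(pf != NULL);
    if (pf->family < 0 || pf->family >= NPROTO_SKETCH || families[pf->family])
        return -1;
    families[pf->family] = pf;
    return 0;
}

/* Generic layer: look the family up and dispatch through its pointer. */
static int sock_create_sketch(int family, struct sock_sketch *sk)
{
    if (family < 0 || family >= NPROTO_SKETCH || families[family] == NULL)
        return -1;                      /* family not registered */
    sk->family = family;
    return families[family]->create(sk);
}

static int inet_create_sketch(struct sock_sketch *sk)
{
    sk->initialised = 1;
    return 0;
}

static const struct family_sketch inet_family = { FAM_INET, inet_create_sketch };
```

The generic code never names a specific family, so a new one can be added by registering another table entry.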

Socket Splice

Unix uses the abstraction of files as first-class objects

Linux supports sending entire files between file descriptors.

A descriptor can be a socket

Unix also supports network file systems

NFS, Samba, Coda, Andrew

The socket splice is responsible for handling these abstractions

Protocols

Families have multiple protocols

INET: TCP, UDP

Protocol functions are stored in proto_ops

Some functions are not used in that protocol so they point to dummies

Some functions are the same across many protocols and can be shared

[Figure: a socket points to its proto_ops. inet_stream_ops holds inet_bind, inet_listen, inet_stream_connect; inet_dgram_ops holds inet_bind, NULL, inet_dgram_connect.]
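
The idea of shared functions and dummies in an operations table can be shown with a toy version. The struct and function names here (`ops_sketch`, `shared_bind`, `no_listen` and so on) are hypothetical, and the return values are arbitrary markers chosen for the example.

```c
#include <assert.h>
#include <stddef.h>

struct ops_sketch {
    int (*bind)(int port);
    int (*listen)(int backlog);
    int (*connect)(int port);
};

static int shared_bind(int port)      { return port > 0 ? 0 : -1; } /* used by both */
static int stream_listen(int backlog) { (void)backlog; return 0; }
static int stream_connect(int port)   { (void)port; return 1; }     /* marker: stream path */
static int dgram_connect(int port)    { (void)port; return 2; }     /* marker: datagram path */
static int no_listen(int backlog)     { (void)backlog; return -1; } /* dummy: unsupported */

static const struct ops_sketch stream_ops = { shared_bind, stream_listen, stream_connect };
static const struct ops_sketch dgram_ops  = { shared_bind, no_listen,     dgram_connect  };

/* Callers go through the table and never need to know the protocol. */
static int do_listen(const struct ops_sketch *ops, int backlog)
{
    assert(ops != NULL && ops->listen != NULL);
    return ops->listen(backlog);
}
```

Pointing an unused slot at a dummy that returns an error, rather than leaving it NULL, lets callers skip the NULL check.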

Packet Creation

At the sending function, the buffer is packetized.

Packets are represented by the sk_buff data structure

Contains pointers to the:

transport layer header

link-layer header

received timestamp

device we received it on

Some fields can be NULL


[Figure: tcp_sendmsg → tcp_transmit_skb → ip_queue_xmit. The user buffer (char*) becomes a struct sk_buff, which then becomes a struct sk_buff with a TCP header.]
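
How a header gets added in front of data that is already in the buffer can be sketched with a much simplified buffer type. `skb_sketch` and its helpers are hypothetical and only loosely modelled on the kernel's skb_put() and skb_push(). The real sk_buff has many more fields.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BUF_SIZE 128

struct skb_sketch {
    unsigned char buf[BUF_SIZE];
    unsigned char *data;   /* start of the packet so far */
    unsigned char *tail;   /* end of the packet so far */
};

/* Leave headroom at the front for headers added later. */
static void skb_sketch_init(struct skb_sketch *skb, size_t headroom)
{
    assert(headroom <= BUF_SIZE);
    skb->data = skb->tail = skb->buf + headroom;
}

static size_t skb_sketch_len(const struct skb_sketch *skb)
{
    return (size_t)(skb->tail - skb->data);
}

/* Append n bytes of payload at the tail; returns where to write them. */
static unsigned char *skb_sketch_put(struct skb_sketch *skb, size_t n)
{
    unsigned char *p = skb->tail;

    assert(n <= (size_t)(skb->buf + BUF_SIZE - skb->tail));
    skb->tail += n;
    return p;
}

/* Prepend an n-byte header in the headroom; returns the new start. */
static unsigned char *skb_sketch_push(struct skb_sketch *skb, size_t n)
{
    assert(n <= (size_t)(skb->data - skb->buf));
    skb->data -= n;
    return skb->data;
}
```

Because each layer only moves the data pointer backwards into the headroom, the payload is written once and never copied as headers are added.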

Fragmentation and Routing

Fragmentation is performed inside ip_fragment

If the packet does not have a route it is filled in by ip_route_output_flow

The routing mechanisms used are:

Route Cache

Forwarding Information Base

Slow Routing

[Figure: flowchart with yes/no decisions linking ip_route_output_flow, Route cache, FIB, Slow routing, forward, ip_forward (packet forwarding), ip_fragment and dev_queue_xmit (queue packet).]
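
The arithmetic behind IP fragmentation can be worked through in a few lines. This sketch only plans the fragments (offsets and lengths). It is not how ip_fragment is written, and `frag_sketch` and `plan_fragments` are hypothetical names. It relies on the IPv4 rule that fragment offsets are expressed in 8-byte units, so every fragment except the last carries a multiple of 8 payload bytes.

```c
#include <assert.h>
#include <stddef.h>

struct frag_sketch {
    unsigned offset_units;  /* fragment offset in 8-byte units */
    unsigned len;           /* payload bytes in this fragment */
    int more_fragments;     /* 1 unless this is the last fragment */
};

/*
 * Split payload_len bytes for a link with the given MTU and IP header size.
 * Returns the number of fragments written to out, or 0 if they do not fit
 * in max_out entries (or there is nothing to send).
 */
static size_t plan_fragments(unsigned payload_len, unsigned mtu, unsigned hdr_len,
                             struct frag_sketch *out, size_t max_out)
{
    assert(out != NULL && mtu >= hdr_len + 8);
    /* Every fragment but the last must carry a multiple of 8 bytes. */
    unsigned per_frag = ((mtu - hdr_len) / 8) * 8;
    size_t n = 0;
    unsigned done = 0;

    while (done < payload_len) {
        if (n == max_out)
            return 0;
        unsigned left = payload_len - done;
        unsigned len = left < per_frag ? left : per_frag;

        out[n].offset_units = done / 8;
        out[n].len = len;
        out[n].more_fragments = (done + len < payload_len);
        done += len;
        n++;
    }
    return n;
}
```

With a 4000-byte payload, a 1500-byte MTU and a 20-byte header, each fragment carries at most 1480 bytes, giving three fragments of 1480, 1480 and 1040 bytes at offsets 0, 185 and 370.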

Data Link Layer

The Data Link Layer is responsible for packet scheduling

dev_queue_xmit is responsible for enqueuing packets for transmission in the qdisc of the device

Then, in process context, it tries to send the packet

If the device is busy we schedule the send for a later time

dev_hard_start_xmit is responsible for sending to the device

[Figure: dev_queue_xmit(sk_buff) → device qdisc enqueue; device qdisc dequeue → dev_hard_start_xmit().]

Case Study: iNET

iNET is an EDF (Earliest Deadline First) packet scheduler

Each packet has a deadline specified in the TOS field

We implemented it as a Linux Kernel Module

We implement a packet scheduler at the qdisc level.

Replace qdisc enqueue and dequeue functions

Enqueued packets are put in a heap sorted by deadline

[Figure: enqueue(sk_buff) → deadline heap → dequeue(sk_buff) → HW.]
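
An earliest-deadline-first queue of this kind can be sketched as a binary min-heap keyed on the deadline. This is a generic illustration, not the iNET source: `pkt_sketch`, `edf_heap`, `edf_enqueue` and `edf_dequeue` are hypothetical names, and a real qdisc would store sk_buff pointers rather than small structs.

```c
#include <assert.h>
#include <stddef.h>

#define HEAP_CAP 16

struct pkt_sketch { unsigned deadline; int id; };

struct edf_heap {
    struct pkt_sketch items[HEAP_CAP];
    size_t count;
};

static void swap_pkt(struct pkt_sketch *a, struct pkt_sketch *b)
{
    struct pkt_sketch t = *a; *a = *b; *b = t;
}

/* enqueue: insert and sift up so the earliest deadline stays at the root. */
static int edf_enqueue(struct edf_heap *h, struct pkt_sketch p)
{
    if (h->count == HEAP_CAP)
        return -1;                       /* queue full: caller drops the packet */
    size_t i = h->count++;
    h->items[i] = p;
    while (i > 0 && h->items[(i - 1) / 2].deadline > h->items[i].deadline) {
        swap_pkt(&h->items[(i - 1) / 2], &h->items[i]);
        i = (i - 1) / 2;
    }
    return 0;
}

/* dequeue: remove the root (earliest deadline) and sift down. */
static int edf_dequeue(struct edf_heap *h, struct pkt_sketch *out)
{
    assert(out != NULL);
    if (h->count == 0)
        return -1;
    *out = h->items[0];
    h->items[0] = h->items[--h->count];
    size_t i = 0;
    for (;;) {
        size_t l = 2 * i + 1, r = l + 1, m = i;
        if (l < h->count && h->items[l].deadline < h->items[m].deadline) m = l;
        if (r < h->count && h->items[r].deadline < h->items[m].deadline) m = r;
        if (m == i) break;
        swap_pkt(&h->items[i], &h->items[m]);
        i = m;
    }
    return 0;
}
```

Both operations take O(log n) time, whereas a sorted list needs O(n) per insertion.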

High-Performance Network Stacks

Minimize copying

Zero copy technique

Page remapping

Use good data structures

iNET v0.1 used a list instead of a heap

Optimize the common case

Branch optimization

Avoid process migration or cache misses

Avoid dynamic assignment of interrupts to different CPUs

Combine operations within the same layer to minimize passes over the data

Checksum + data copying
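
Combining the checksum with the copy means the data is read only once. Below is a minimal sketch of the idea using the 16-bit Internet checksum (one's-complement sum of big-endian 16-bit words). The function name `copy_and_checksum` is hypothetical, and the 32-bit accumulator assumes packet-sized buffers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Copy len bytes from src to dst and return the 16-bit Internet checksum
 * (one's complement of the one's-complement sum) of the data, in one pass.
 * The 32-bit accumulator is sufficient for packet-sized buffers.
 */
static uint16_t copy_and_checksum(uint8_t *dst, const uint8_t *src, size_t len)
{
    assert(dst != NULL && src != NULL);
    uint32_t sum = 0;
    size_t i = 0;

    for (; i + 1 < len; i += 2) {
        dst[i] = src[i];
        dst[i + 1] = src[i + 1];
        sum += ((uint32_t)src[i] << 8) | src[i + 1];  /* big-endian 16-bit word */
    }
    if (i < len) {                                     /* odd trailing byte */
        dst[i] = src[i];
        sum += (uint32_t)src[i] << 8;
    }
    while (sum >> 16)                                  /* fold carries back in */
        sum = (sum & 0xFFFF) + (sum >> 16);
    return (uint16_t)~sum;
}
```

Doing the two steps separately would pull every byte through the cache twice. Here each byte is loaded once and used for both the copy and the sum.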



High-Performance Network Stacks

Cache/Reuse as much as you can

Headers, SLAB allocator

Hierarchical Design + Information Hiding

Data encapsulation

Separation of concerns

Interrupt Moderation/Mitigation

Receive packets in timed intervals only (e.g. ATM)

Packet Mitigation

Similar but at the packet level



Conclusion

The Linux kernel has 3 main contexts: Kernel, Process and Interrupt.

Use spinlocks for interrupt context and mutexes if you plan to sleep holding the lock

Implement a module; avoid patching the kernel main tree

To defer work implement two halves. Timers + Threads

Socket families are implemented through pointers to functions (net_proto_family and proto_ops)

Packets are represented by the sk_buff structure

Packet scheduling is done at the qdisc level in the Link Layer



