Professional Documents
Culture Documents
Abhilash Jindal
Agenda
• Vocabulary (OSTEP Ch. 4)
• What is a process? System calls? Scheduler? Address space?
• Memory management (OSTEP Ch. 13-17)
• How to manage and isolate memory? What are memory APIs? How are they
implemented?
• Processes in action (xv6 Ch. 3: system calls, x86 protection, trap handlers)
• Process control block, user stack<>kernel stack, sys call handling
• Scheduling (xv6 Ch5: context switching, OSTEP Ch. 6-9)
• Response time, throughput, fairness
Process is a running program
• Load program from disk to memory
• Exactly how we loaded OS
• Give control to the process. Jump cs, eip
Processes can ask OS to do work for them
System calls
50
• + 5 0 = (move pointer to 30)
30 • + 3 0 = (move pointer to 10)
10 • + 1 0 = (move pointer to 20)
20
• + 2 0 = (move pointer to 10)
10
Sharing the calculator
• Steps to share the calculator:
20 10
• 20 + 10 = 30 + 30 = 60
10 70
• Write 60 in notebook, remember that we were done till
30 20 30, give calculator
50 40
• 10 + 70 = 80
30 20
• Write 80 in notebook, remember that we were done till
10 10 70, give the calculator back
20 50
10 10
Memory isolation and
management
OSTEP Ch. 13-17
Abhilash Jindal
Process Address Space
Code, Heap, Stack
Stack
free
Heap
Code
Process address space
Function calling in action ebp
Stack
esp
02.s 0x3c
0x..
_foo: 0x38
pushl %ebp Save caller’s base pointer 0
movl %esp, %ebp ebp = esp
movl 8(%ebp), %eax eax = *(ebp + 8) 0x34
addl 12(%ebp), %eax eax = eax + *(ebp + 12)
popl %ebp Restore caller’s base pointer 0x30
retl change eip to return address
Program
free
malloc/free
Virtual Memory
allocator
Code
Process address space
OS memory allocator
• sbrk(int increment) increments process’ Stack
break. increment can be negative.
free
Program Program
ptr=malloc(size_t size);
free(ptr);
fi
Memory allocator
Fragmented heap over time
• Assumptions
• Do not apriori know allocation size and order
• Cannot move memory once it is allocated. Program might have the pointer
to it.
• Goals
• Quickly satisfy variable-sized memory allocation requests. How to track free
memory?
• Minimize fragmentation
Memory (de)allocation patterns
sptr = malloc(100)
optr = malloc(100)
free(sptr)
free(ptr)
free(optr)
Which block to allocate?
Example: malloc(15)
• Best t
• Slow. need to search the whole list
• First t
• Faster. (xv6: umalloc.c)
• Fragmentation
• Example: malloc(25)
fi
fi
In which order to maintain lists?
• (De)allocation order
ptr
• No splitting, coalescing
• Internal fragmentation: allocates more than asked
Head-4
• Wastes memory: If no allocation from object size, have
unnecessary reserved space for it
Head-8
fi
fi
fi
fi
Slab allocator
Used in Linux Kernel
• Knows the size of allocations made by the Linux kernel. Keep segregated lists
for each such struct.
• The language runtime manages memory. Programmer need not call free.
• Largely prevents memory leaks, dangling references, double free, null
pointer dereference
Process 2
0x00300000
Process 1 free
0x00200000
esp OS stack
OS
eip OS code
0x00100000 0x00100000
0x000C0000 Code
VGA display
0x000A0000 Process 1 address space
Bootloader 0x00007C00
0x00000000 Due to address translation, compiler need
not worry where the program will be loaded!
Segmentation
Stack • Place each segment
independently to Process 2 stack
not map free space Process 1 stack
Process 2
0x00300000 • Share code
Process 1
0x00200000 free segments to save
OS space
0x00100000 Process 2 heap
• Mark non-
Heap
writeable Process 1 heap
Process 1,2 code
0x00200000
Code
Process 1 address OS
0x00100000
0x00000000
0x00000
Many segments can be initialised in GDT
… … … …
… … … …
Process 3 heap
Process 2 heap
Process 3 code
Process 1 heap
Process 1,2 code
0x00200000
OS
0x00100000
0x00000000
External fragmentation
• After many processes start and exit,
memory might become
“fragmented” (similar to disk)
• Copying is expensive
• Growing heap becomes not
possible
Processes in action
xv6 Ch. 3: system calls, x86 protection, trap handlers
Memory isolation and address space
Stack
0x00400000
Process 2
0x00300000
Process 1 free
0x00200000
OS
0x00100000
Heap
Code
Process 1 address space
0x00000000
Protection
• Process cannot modify OS code since it is not in process’ address space
• Process cannot modify GDT and IDT entries since GDT, IDT are not in its address space
• What stops process from calling lgdt and lidt instructions?
• Ring 0: kernel mode. Ring 3: user mode
• lgdt and lidt are privileged instructions only callable from “ring 0”
How does hardware know
the current privilege level?
• Two LSBs of %cs determine
“current privilege level” (CPL)
• Process cannot change GDT entries, IDT entries since they are
not in address space of the process! Process cannot call lgdt,
lidt since it will run in ring 3. 0x00000000
fl
fl
scheduler() {
Understanding swtch …
sp
%e ags=FL_IF
%cs=UCODE
pinit(){ swtch(p->context);
%eip=0
p = allocproc(); } 0
memmove(p->offset, _binary_initcode_start,); swtch: 0
p->tf->ds,es,ss = (SEG_UDATA<<3) | DPL_USR; movl 4(%esp), %eax %ds
p->tf->cs = (SEG_UCODE<<3) | DPL_USR; movl %eax, %esp %es
…
p->tf->eflags = FL_IF; movl $0, %eax
%edi
p->tf->eip = 0; ret
%eip=trapret p->tf
} .globl trapret p->context
allocproc() { trapret:
eip Process code
sp = (char*)(STARTPROC + (PROCSIZE<<12)); popal
sp -= sizeof *p->tf; popl %gs p->o set
p->tf = (struct trapframe*)sp; popl %fs
sp -= sizeof *p->context; popl %es
p->context = (struct context*)sp; popl %ds
p->context->eip = (uint)trapret; addl $0x8, %esp p->context
esp Return address
return p; iret
}
fl
ff
Stack
Interrupt handling revisited …
param 1
vectors.S
%eip
eip .globl vector0
for(;;) ebp %ebp_old
; vector0:
local var 1
pushl $0 local var 2 esp
pushl $0 %e ags
jmp alltraps IDT %cs
trap frame
trap.c %eip
trapasm.S …
void 0
alltraps:
trap(struct trapframe *tf) … 0
pushal
{ %eax
pushl %esp GDT
switch(tf->trapno){ %ecx
call trap …
case T_IRQ0 + IRQ_TIMER: …
addl $4, %esp
ticks++; cs …
%edi
popal tf
cprintf("Tick! %d\n\0", ticks);
addl $0x8, %esp %eip
lapiceoi();
iret
..
return
fl
Interrupt handling problem
• We cannot trust %esp of the process. Hardware might write
(%e ags, %cs, %eip) into another process’ address space
or into OS memory.
void seginit(void) {
c->gdt[SEG_UCODE] = SEG(.., STARTPROC, (PROCSIZE-1) << 12, DPL_USER);
c->gdt[SEG_UDATA] = SEG(.., STARTPROC, (PROCSIZE-1) << 12, DPL_USER);
}
void scheduler(void) {
static struct proc* allocproc(void) { // pick RUNNABLE process p
sp = (char*)(STARTPROC + (PROCSIZE>>12)); switchuvm(p); 1’s kstack
p->kstack = sp - KSTACKSIZE; swtch(p->context);
Process
} }
1
void switchuvm(struct proc *p) {
mycpu()->gdt[SEG_TSS] = SEG16(STS_T32A, &mycpu()->ts,
sizeof(mycpu()->ts)-1, 0);
OS memory
mycpu()->ts.ss0 = SEG_KDATA << 3;
mycpu()->ts.esp0 = (uint)p->kstack + KSTACKSIZE;
ltr(SEG_TSS << 3);
}
Interrupt handling
trap frame
vector0: 0
; pushl %ds..
pushl $0 %ds
pushal
%es
movw $(SEG_KDATA<<3), %ax pushl $0
%fs
movw %ax, %ds.. jmp alltraps IDT
%gs
pushl %esp %cs, %eip %eax
call trap …
%ecx
addl $4, %esp …
popal %edi
GDT
popl %ds.. tf
…
addl $0x8, %esp %eip esp
cs …
iret
TR Process code
TSS …
%ss0, %esp0 …
SS
…
fl
Protection so far
• Can processes read/write each other’s memory?
• No, since the other process’ memory is not addressable by (not in the address space of) the
process
• Memory mapped IO
• Can directly read from / write to
IO devices if OS keeps it in
process address space
Port-mapped IO protection
inb(0x1F7), outb(0x1F2, 1), ..
trap frame
0
• tvinit sets DPL of IDT entry for T_SYSCALL to 3 %ds
• trap calls syscall if interrupt vector is T_SYSCALL %es
%fs
• syscall.c %gs
%eax
• syscall reads eax from trap frame to nd which syscall is %ecx
made and calls that particular call …
%edi
• Return value of sys call is set in trap frame’s eax tf
%eip esp
• initproc.S
Process code
• Calls three system calls: open “console” le, write “hello
world”, close
fl
fi
fi
%ss
%esp
Visualising syscall handling p19-syscall %e ags
%cs
trap frame
%eip
# sys_open("console", O_WRONLY)
eip pushl $1
int fetchint(uint addr, int *ip) { 0
if(addr >= p->sz || addr+4 > p->sz) T_SYSCALL
pushl $console %ds..
return -1;
pushl $0 %eax=SYS_open
%eax=fd
*ip = *(int*)(addr + p->offset); %ecx
movl $SYS_open, %eax
} …
int $T_SYSCALL
%edi
pushl %eax
int argint(int n, int *ip) { tf
int sys_open(void) { %eip
return fetchint((myproc()->tf->esp)
int fd, omode;
+ 4 + 4*n, ip);
if(argint(1, &omode) < 0) {
} esp
return -1;
void syscall(void) { 1
} *console
int num = curproc->tf->eax;
..
curproc->tf->eax = syscalls[num](); p->sz 0
return fd;
}
} Process code
p->o set
fl
ff
Syscall parameter checking and translation
int fetchint(uint addr, int *ip) {
if(addr >= p->sz || addr+4 > p->sz)
return -1;
*ip = *(int*)(addr + p->offset);
}
GDT
1
• Syscall parameters must be in process’ CS3: UCODE (CPL=3) DPL=3
*console
address space:
DS3: UDATA (RPL=3) DPL=3 0 p->sz
• Check virtual address (VA) is within limit
CS0: KCODE (CPL=0) DPL=0
• Add base to VA to get physical address (PA) Process code
DS0: KDATA (RPL=0) DPL=0 p->o set
• Alternative design: use process’ DS to read
syscall parameters
• Calls pinit twice to start two processes and then calls scheduler Process
2
• proc.c
1’s kstack
• pinit calls allocproc which sets up processes like earlier with the
added di erence that it calls kalloc to nd empty 1MB of space Process
1
• kalloc.c
• Maintains free list in chunks of 1MB. kalloc returns the rst OS memory
element from free list. There is no coalescing and splitting since
our OS always asks for 1MB.
ff
fi
fi
Code walkthrough: context switching
p18-sched
• main.c
• Calls scheduler to run the rst RUNNABLE process. When giving control, we also
remember “scheduler context” to come back to scheduler. Earlier, we were only
running one process and were never coming back to the scheduler.
• trap.c
• Calls yield when timer interrupt happens
• proc.c
• yield calls sched which switches control from the process to the scheduler
• scheduler looks for the next RUNNABLE process and switches to it
fi
TF.%eip=0
%eip=trapret
p2->kstack
Context switching in action: giving control
%ebp..%edi
p18-sched
Process 2 stack, heap
void scheduler(void) { Process 2 code
.globl swtch
eip struct proc *p; struct cpu *c = mycpu(); TF.%eip=0
swtch:
for(;;){ %eip=trapret
movl 4(%esp), %eax
p1->kstack
%ebp..%edi
movl 8(%esp), %edx for(p = ptable.proc; p < &ptable.proc[NPROC]; p++){
OS stack
} p1->ctx
movl %edx, %esp
&(c->scheduler)
popl %edi }
%eip=scheduler
popl %esi }
%ebp..%edi
popl %ebx
void yield(void) { void trap(struct trapframe *tf) {
OS heap
popl %ebp c->scheduler
struct proc *p = myproc(); ..
ret p1->ctx
p->state = RUNNABLE; if(tf->trapno == T_IRQ0+IRQ_TIMER)
p2->ctx
swtch(&p->context, c->scheduler); yield();
} } OS code
TF.%eip=0
%eip=trapret
p2->kstack
Context switching in action: taking control
%ebp..%edi
p18-sched
Process 2 stack, heap
.globl swtch void scheduler(void) { Process 2 code
swtch: struct proc *p; struct cpu *c = mycpu(); TF.%eip=123
p1->kstack
c->scheduler
movl 8(%esp), %edx for(p = ptable.proc; p < &ptable.proc[NPROC]; p++){
&(p1->ctx)
pushl %ebp if(p->state != RUNNABLE)
%eip=yield
pushl %ebx continue; %ebp..%edi esp
pushl %esi p->state = RUNNING; Process 1 stack, heap
pushl %edi switchuvm(p); eip Process 1 code
movl %esp, (%eax) swtch(&(c->scheduler), p->context);
OS stack
} p1->ctx
movl %edx, %esp
&(c->scheduler)
popl %edi }
%eip=scheduler
popl %esi }
%ebp..%edi
popl %ebx
void yield(void) { void trap(struct trapframe *tf) {
OS heap
popl %ebp c->scheduler
struct proc *p = myproc(); ..
ret p1->ctx
p->state = RUNNABLE; if(tf->trapno == T_IRQ0+IRQ_TIMER)
p2->ctx
swtch(&p->context, c->scheduler); yield();
} } OS code
Scheduling policies
OSTEP Ch. 7-9
Scheduling problem
Computer operators
1960s: Batch jobs
Batch job characteristics
fi
Shortest time to completion first
CPU 1 1 2 3 3 3 2 2 2 3 3 3 1 1 2 3 3 3 2
Disk 1 1 1 1 1 1 1 1 1 1
Multi-level feedback queue (MLFQ)
• Strict priority using multiple queues
• (A > C) => A runs
• (A = B) => Round robin
• Goal: IO bound (interactive) processes shall
take higher priority over CPU intensive
processes
CPU 1 1 2 3 3 3 2 2 2 3 3 3 1 1 2 3 3 3 2
Disk 1 1 1 1 1 1 1 1 1 1
• If the kernel was updating its data structures, it needs to be careful to not
run into races
struct buf {
Race conditions
..
struct buf *qnext; // disk queue
Example: Processes reading a block in ide.c uchar data[BSIZE];
};
idequeue
static struct buf *idequeue;
• Processes in action: Process control block, user stack<>kernel stack, sys call
handling, system calls, protection