
Multi process-Multi Threaded

Amir Averbuch Nezer J. Zaidenberg

References from APUE 2e

- select and pselect: Ch. 14.5
- Processes and forking: Ch. 8
- Threads: Ch. 11 + 12

Doing things in parallel


Many times we are faced with a system that must handle multiple requests in parallel:

- Multiple inputs in multiple terminals (or sockets, or sessions, etc.)
- Processing multiple requests by a server
- Handling several transactions, without hanging if one transaction takes too long
- Doing things while waiting for something else (I/O, computation, etc.)

Busy waiting

Busy waiting (v): a process that keeps asking the kernel "do I have something to do?" (Do I have I/O? Did I wait long enough?)

Busy waiting is usually a bad idea. Several APIs provide alternatives.

Doing things with single process


The first solution was to do things in a single process, with an API that allows for concurrency.

- The select(2) API is the most common, and it is the API most widely used today.
- Other APIs include:
  - aio_XXX API (and kaio_XXX)
  - Various forms of graceful multi-tasking
  - Signals



The situation:

- Inputs come over several file descriptors (in the UNIX OS an open terminal, a communication socket, and actual file I/O are all done over file descriptors).
- Output may be written to several interfaces and it may take time to write (less frequent).
- Waiting for exceptions on file descriptors (practically non-existent).
- Usually it takes very little time to process input or output.


select(2) API

int select(int nfds, fd_set *readfds, fd_set *writefds, fd_set *exceptfds, struct timeval *timeout);

- nfds: the first nfds file descriptors are checked in each set. It should therefore equal the highest fd used + 1 (since fds start from zero).
- fd_sets: actually bit arrays. The OS provides facilities to manipulate them (FD_ZERO, FD_SET, FD_ISSET).
- timeout: return with a timeout after the given amount of time.

Select example

#include <stdio.h>
#include <sys/select.h>

int main(void)
{
    struct timeval tv;
    fd_set readfds;

    tv.tv_sec = 10;                 /* wait up to 10 seconds */
    tv.tv_usec = 0;
    FD_ZERO(&readfds);
    FD_SET(0, &readfds);            /* fd 0 is stdin */

    select(1, &readfds, NULL, NULL, &tv);   /* nfds = highest fd + 1 */
    if (FD_ISSET(0, &readfds)) {
        char c = getc(stdin);
        printf("%c was pressed\n", c);
    } else
        printf("timeout\n");
    return 0;
}

Problems with select


- Only file descriptors can be handled. (In Windows ONLY sockets can be handled.)
- One cannot wait for a file descriptor together with a semaphore, a computation, a mutex, etc.
- Unfairness with large sets on some implementations.
- select modifies its arguments (the timeval and the fd_sets), and no assumption can be made about how they are modified (e.g. how much time is left in the timeval).
- Many modern UNIX OSs support poll(2), a select replacement. On such systems select is usually implemented using poll. (But poll is not available everywhere!)
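Because select(2) may overwrite both the fd_set and the timeval, a loop around it normally rebuilds them before every call. The following is a minimal sketch of that habit, not a definitive implementation; `read_until_quiet` is a hypothetical helper name, and the 256-byte buffer is an arbitrary choice.

```c
#include <assert.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical helper: keep reading from fd until it stays quiet for
 * `seconds`. Returns the number of bytes read, or -1 on error. */
long read_until_quiet(int fd, int seconds)
{
    long total = 0;

    assert(fd >= 0 && fd < FD_SETSIZE);
    for (;;) {
        fd_set readfds;
        struct timeval tv;
        char buf[256];

        /* Rebuild both arguments on every iteration: select() may have
         * changed them during the previous call. */
        FD_ZERO(&readfds);
        FD_SET(fd, &readfds);
        tv.tv_sec = seconds;
        tv.tv_usec = 0;

        int rc = select(fd + 1, &readfds, NULL, NULL, &tv);
        if (rc < 0)
            return -1;          /* error (including EINTR) */
        if (rc == 0)
            return total;       /* timeout: nothing arrived in time */

        ssize_t n = read(fd, buf, sizeof buf);
        if (n < 0)
            return -1;
        if (n == 0)
            return total;       /* end of file */
        total += n;
    }
}
```

Calling it with a timeout of 0 seconds turns the wait into a single non-blocking check per iteration.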

When to use select(2)

Use select(2) when the following occur:

- You need to handle multiple inputs.
- Inputs can be treated sequentially.
- Treating an individual request is very short.
- All inputs are file descriptors.
- You wish to avoid threading or multiple processes.

If inputs come in a different format it may be possible to use another API. However, the same considerations apply.


What not to use instead of select


- Async I/O methods, unless you know what you are doing. (Use poll if you like, but take note of portability issues.)
- Busy waiting.
- An extra thread or process to do what select can do just fine.

select and pselect


Some implementations of UNIX also include the pselect(2) system call, which is similar to select(2). pselect(2) has almost the same parameters, with two differences:

- Wait times are given in nanoseconds (struct timespec) instead of microseconds (struct timeval).
- A signal mask is given, naming the signals to be blocked while waiting.
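The two differences can be seen in a short sketch. It assumes a single-threaded process and a POSIX.1-2001 or later system; `wait_readable_ns` is a hypothetical helper name, and blocking SIGCHLD is only an example choice of mask.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/select.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: wait up to `nanos` nanoseconds for fd to become
 * readable, with SIGCHLD blocked for the duration of the wait only.
 * Returns 1 if readable, 0 on timeout, -1 on error. */
int wait_readable_ns(int fd, long nanos)
{
    fd_set readfds;
    struct timespec ts;
    sigset_t mask;

    assert(nanos >= 0);
    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);

    /* Difference 1: the timeout is a timespec (seconds + nanoseconds). */
    ts.tv_sec = nanos / 1000000000L;
    ts.tv_nsec = nanos % 1000000000L;

    /* Difference 2: a signal mask that is in force while waiting.
     * Start from the current mask and add SIGCHLD to it. */
    if (sigprocmask(SIG_BLOCK, NULL, &mask) != 0)
        return -1;
    sigaddset(&mask, SIGCHLD);

    int rc = pselect(fd + 1, &readfds, NULL, NULL, &ts, &mask);
    if (rc < 0)
        return -1;
    return rc > 0 ? 1 : 0;
}
```

Unlike select(2), pselect(2) takes the timeout as a const pointer and does not modify it.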

Running multiple tasks

Thread, Process, Task - definitions


- Process (n): a running program, with its own memory protected by the OS from other processes.
- Thread (n): a "mini-program" or "program within a program"; a separate running environment inside a process.
  - A single process may contain numerous threads.
  - Threads have memory protection from other processes (including threads in those processes) but not from other threads in the same process.
- Task (n): will be used to refer to either a thread or a process.
- Every process we know of has at least one thread: the main() thread.

Multi tasking methods


- Cooperative (graceful) multi-tasking: each task specifies when it agrees to be moved off the CPU in favor of another task.
  - The lwp library is an example.
  - Does not exist in today's mainstream OSs. MacOS classic and Windows 3.11 are examples.
- Pre-emptive multi-tasking: the kernel decides which process receives the CPU and when. The kernel moves tasks into the running scope.

Multi tasking definition


- Context switch (v): the act of swapping processes.
- Scheduler (n): the part of the OS kernel that is responsible for pre-empting tasks and putting new tasks to execute.

Multi-process programming

- Multiple tasks run in different processes.
- Task switching is managed by the OS in pre-emptive multi-tasking.
- Each process has its own memory space (heap, stack, global variables, process environment).
- Information and synchronization should be delivered from process to process using a multi-process communication API (such as Unix domain sockets).

Creating new process : Fork(2)


- fork(2) creates a new process identical to the current one except for the return value of fork(2).
- Other methods to invoke a new process under UNIX:
  - system(3) (run an executable)
  - execXXX (a function family that replaces the current process image with a new one)

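Fork example + why is "hello world" printed twice

#include <stdio.h>
#include <unistd.h>

int main(void)
{
    printf("hello world");   /* copied into the stdout buffer, not written yet */
    fork();                  /* the child gets a copy of the buffer */
    printf("\n");
    fflush(stdout);          /* each process flushes its own copy */
    return 0;
}

- printf(3) works with buffers (that we can fflush(3) later).
- The first printf(3) just copied stuff to the buffer.
- fork(2) duplicated the process (including the buffer).
- Both buffers were flushed.

This example doesn't work on every system (because printf(3) and flushing implementations are not standard and depend on the C library and compiler versions), but when it does work it's KEWL!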

How to pass information between process

- Using network sockets
- Using Unix domain sockets
- Using Sys V/POSIX IPC (message queues)
- Using shared memory
- Using RPC (or COM/CORBA/RMI etc.)
- File locking semaphore kludge
- Linux unmapped shared memory kludge
- Platform specific APIs (Linux sendfile, Sun Doors etc.)

In this course

- We will discuss network and Unix domain sockets as means to deliver information.
- We will discuss file locking via fcntl(2) as a means to implement semaphores.
- Other methods are described in APUE.
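As a rough illustration of the file-locking idea, the sketch below treats an exclusive fcntl(2) lock over a whole file as a binary semaphore shared between processes. The names `sem_file_acquire` and `sem_file_release` are hypothetical, and the course's own implementation may differ.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Apply a lock of the given type (F_WRLCK or F_UNLCK) to the whole file.
 * F_SETLKW blocks until the lock can be granted. */
static int set_whole_file_lock(int fd, short type)
{
    struct flock fl = {0};

    assert(fd >= 0);
    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;               /* 0 means "up to end of file" */
    return fcntl(fd, F_SETLKW, &fl);
}

/* Hypothetical names: "down" and "up" on a file used as a semaphore.
 * fd must be open for writing. Both return 0 on success, -1 on error. */
int sem_file_acquire(int fd) { return set_whole_file_lock(fd, F_WRLCK); }
int sem_file_release(int fd) { return set_whole_file_lock(fd, F_UNLCK); }
```

These locks belong to the process rather than to a thread, so the technique synchronizes processes only; it is also released automatically when the process closes the file or exits.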

Waiting for process to die

- A process will usually run unaffected by other processes it spawned.
- When a process terminates it returns a return code (the int from int main()) to its parent process.
- The parent process usually (unless we do something smart) ignores it.
- A parent process can wait for a specific child process (or for any child process) to terminate using the wait(2) and waitpid(2) API.

Zombie process

- A process that terminates, but whose parent has not received its termination status (usually meaning something is wrong with the parent), remains in the system as a zombie process.
- Orphaned processes are adopted by init (process number 1), which always waits for its children to die.
- A process can die and notify its parent about its exit status using exit(3), which ends in the _exit(2) system call. Calling it terminates the calling process.
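The exit status round trip can be sketched as follows: a child exits with a code and the parent collects it with waitpid(2), which also prevents the child from lingering as a zombie. `run_child_and_reap` is a hypothetical helper name used only for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that exits with `code`, then wait for
 * it in the parent. Returns the child's exit status, or -1 on failure. */
int run_child_and_reap(int code)
{
    int status;

    assert(code >= 0 && code <= 255);   /* only 8 bits reach the parent */

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(code);    /* child: terminate without flushing copied stdio buffers */

    /* Parent: collecting the status removes the child from the process table. */
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The child uses _exit(2) rather than exit(3) so that any stdio buffer inherited from the parent is not written out a second time.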

Network sockets example for IPC


Beej's Guide to Network Programming provides a helpful tutorial on how to communicate between two processes on a single host. This guide will be described in the recitation. e/bgnet.html

Select - revisited

When a child process terminates, the parent process receives a signal (SIGCHLD), which causes select(2) to abort, returning -1 with errno set to EINTR. If you code a multi-process application and use select you should usually ignore this return status and call select again (or mask SIGCHLD and use pselect(2)).
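One way to "ignore" EINTR is simply to call select(2) again. The sketch below assumes a SIGCHLD handler is installed elsewhere (with the default disposition the signal does not interrupt select at all); `select_retry` is a hypothetical name.

```c
#include <assert.h>
#include <errno.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical helper: wait up to `seconds` for fd to become readable,
 * restarting the wait whenever a signal interrupts it.
 * Returns 1 if readable, 0 on timeout, -1 on a real error. */
int select_retry(int fd, int seconds)
{
    assert(fd >= 0 && fd < FD_SETSIZE);
    for (;;) {
        fd_set readfds;
        struct timeval tv;

        /* Arguments are rebuilt each time because select() may modify them. */
        FD_ZERO(&readfds);
        FD_SET(fd, &readfds);
        tv.tv_sec = seconds;
        tv.tv_usec = 0;

        int rc = select(fd + 1, &readfds, NULL, NULL, &tv);
        if (rc >= 0)
            return rc > 0 ? 1 : 0;
        if (errno != EINTR)
            return -1;
        /* EINTR: a signal such as SIGCHLD arrived; just wait again. */
    }
}
```

Note that each retry restarts the full timeout, so the total wait can exceed `seconds`; code that needs a hard deadline would have to track the remaining time itself.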

Problems with multi process


- Because processes are memory protected it is relatively hard to sync and pass information between multiple processes.
- Using APIs forces on us the constraints inherent in each API.
- Process overhead, especially process creation overhead, is heavy.
- Context switching is expensive.

Software engineering : when to use multi-process environment


- Multiple tasks should be handled simultaneously and select is not suitable.
- Processes are created infrequently (or preferably, only once).
- Relatively low number of processes overall.
- IPC is not needed frequently.
- You want process memory protection.

When not to use processes


- When we can (reasonably) do the job in one process.
- When lots of information is transferred.
- When high performance is needed and you don't know what you are doing (context switching is expensive).
- In almost any case where a thread would be just as good, much simpler, and won't hurt us.

User threads Multi-thread programming


- Processes are managed in separate memory spaces by the OS, which requires us to use IPC to transfer information between processes.
- Threads are mini-processes. They share the heap, process environment and global variable scope, but each thread has a stack of its own.
- Using threads, the entire heap is shared memory! (Actually the entire process!)

Threads API

- The POSIX 95 threads API is now common on all UNIX OSs and should be used whenever threads are needed on a UNIX OS, for all new applications.
- Legacy applications may use a different threads API (usually predating POSIX 95) such as Solaris threads. Those APIs are usually almost identical to the POSIX API.
- Microsoft Windows has a similar API.

In this course

- We will cover the POSIX threads API.
- We will briefly discuss the Microsoft Windows threads API.
- We will give an example of a cross-platform thread class.

Creating a new thread

- The call gets a function pointer to serve as the thread's main function.
- Threads can be manipulated (waited for, prioritized) in a similar way to processes, but only internally (from within the same process).

Critical section

Very often we reach a situation where two tasks need access to the same memory area.

Allowing access to both tasks will very often result in corrupt reads or writes:

- When both try to write
- When one writes and one reads
- No problem with two reads

This can happen with processes and shared memory. It occurs very frequently with threads.

Memory corruption

When two tasks try to access the same memory space, we would like to guarantee that:

- After a read, either the new or the old state of the memory is returned (not a mishmash).
- After multiple writes, the state from one of the writes resides completely in memory (not a mishmash of two writes).

Failing that, we have memory corruption.

Handling critical section

- Elimination (preferred method)
- Locking / mutex / semaphores etc.
- Risk memory overrun: DO NOT DO IT. Even if you are 100% sure you know what you are doing!!!! (And if you do, consult someone, think again, and consult somebody else too!)


POSIX 95 provides two main forms of sync:

- Mutex, or mutual exclusion: a device that serves to lock other threads out of a critical section while I (I am a thread) am using it.
- Cond: a sort of reverse mutex; a device that serves to lock myself (I am a thread) out of a critical section while another thread prepares it for use.
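A mutex around a critical section might look like the sketch below: several threads increment one shared counter, and the lock makes each increment indivisible. The struct and function names are hypothetical, and the limit of 16 threads is an arbitrary choice for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>

struct counter {
    pthread_mutex_t lock;
    long value;         /* shared state protected by lock */
    long per_thread;    /* how many increments each thread performs */
};

static void *add_many(void *arg)
{
    struct counter *c = arg;

    for (long i = 0; i < c->per_thread; i++) {
        pthread_mutex_lock(&c->lock);
        c->value++;                     /* the critical section */
        pthread_mutex_unlock(&c->lock);
    }
    return NULL;
}

/* Hypothetical helper: run nthreads threads (at most 16) that each add
 * per_thread to a shared counter. Returns the final value, or -1. */
long count_with_mutex(int nthreads, long per_thread)
{
    pthread_t tids[16];
    struct counter c;
    int started = 0;

    assert(nthreads > 0 && nthreads <= 16);
    c.value = 0;
    c.per_thread = per_thread;
    if (pthread_mutex_init(&c.lock, NULL) != 0)
        return -1;

    while (started < nthreads &&
           pthread_create(&tids[started], NULL, add_many, &c) == 0)
        started++;
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);

    pthread_mutex_destroy(&c.lock);
    return started == nthreads ? c.value : -1;
}
```

Without the lock and unlock calls the final value would often be lower than expected, because two threads can read the same old value before either writes back.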

Deadlock (software engineering bug)


- Deadlock: a state where two resources are required in order to do something, and each resource is protected by a mutex.
- Two tasks each lock one mutex and wait for the other mutex to become available.
- Both tasks hang and no work is done.
- It is up to the software engineer to avoid deadlocks.
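One common way to avoid this particular deadlock is to make every task take the two mutexes in the same global order. The sketch below orders them by address; this is one technique among several, not the only answer, and the `account` struct and function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

struct account {
    pthread_mutex_t lock;
    long balance;
};

/* Always lock the account with the lower address first, so two threads
 * can never each hold one lock while waiting for the other. */
static void lock_both(struct account *a, struct account *b)
{
    struct account *first = (uintptr_t)a < (uintptr_t)b ? a : b;
    struct account *second = first == a ? b : a;

    pthread_mutex_lock(&first->lock);
    pthread_mutex_lock(&second->lock);
}

void transfer(struct account *from, struct account *to, long amount)
{
    assert(from != to);
    lock_both(from, to);
    from->balance -= amount;
    to->balance += amount;
    pthread_mutex_unlock(&to->lock);
    pthread_mutex_unlock(&from->lock);
}

struct job { struct account *from, *to; int rounds; };

static void *transfer_many(void *arg)
{
    struct job *j = arg;

    for (int i = 0; i < j->rounds; i++)
        transfer(j->from, j->to, 1);
    return NULL;
}

/* Two threads transfer in opposite directions; returns the combined
 * balance afterwards (2000 if nothing was lost), or -1 on failure. */
long opposite_transfers_total(int rounds)
{
    struct account x, y;
    pthread_t t1, t2;
    long total = -1;

    pthread_mutex_init(&x.lock, NULL);
    pthread_mutex_init(&y.lock, NULL);
    x.balance = y.balance = 1000;

    struct job xy = { &x, &y, rounds }, yx = { &y, &x, rounds };
    if (pthread_create(&t1, NULL, transfer_many, &xy) == 0) {
        int second_started = pthread_create(&t2, NULL, transfer_many, &yx) == 0;
        pthread_join(t1, NULL);
        if (second_started) {
            pthread_join(t2, NULL);
            total = x.balance + y.balance;
        }
    }
    pthread_mutex_destroy(&x.lock);
    pthread_mutex_destroy(&y.lock);
    return total;
}
```

If `transfer` instead locked `from` first and `to` second, the two threads in this example could each grab one mutex and wait forever for the other.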

Recursive mutex

- What happens if a thread locks a mutex and then, by some chain of events, re-locks it?
  - By no means should the thread be blocked (deadlocked) by itself.
- When the thread unlocks the mutex (which was locked twice), is it unlocked, or should it be unlocked twice?
- Different implementations have different answers. Linux requires equal numbers of locks and unlocks while the default Solaris behavior is to unlock all locks.
- The default behavior can be changed (for Linux or Solaris) by specifying that the mutex is or is not recursive. Recursive = the Linux interpretation.
- Do not rely on the default: POSIX leaves re-locking a default mutex undefined, and with a plain (non-recursive) mutex the thread can deadlock on itself, so request a recursive mutex explicitly when re-locking is intended.

Using recursive mutexes is usually a deprecated way to write code (since programmers reading the code tend to think the mutex is unlocked while in practice it is still locked). But programmers do it anyway.
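With POSIX threads, recursive behavior is requested through a mutex attribute. The sketch below locks a recursive mutex several times from one thread and then counts how many unlocks succeed before the library reports that the thread no longer owns it; `unlocks_needed` is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>

/* Hypothetical helper: lock a recursive mutex `locks` times from the
 * calling thread, then unlock until the call fails.
 * Returns the number of successful unlocks, or -1 on setup failure. */
int unlocks_needed(int locks)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t m;
    int unlocks = 0;

    assert(locks > 0);
    if (pthread_mutexattr_init(&attr) != 0)
        return -1;
    if (pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE) != 0 ||
        pthread_mutex_init(&m, &attr) != 0) {
        pthread_mutexattr_destroy(&attr);
        return -1;
    }
    pthread_mutexattr_destroy(&attr);

    /* Re-locking from the owning thread only raises the lock count. */
    for (int i = 0; i < locks; i++)
        if (pthread_mutex_lock(&m) != 0)
            return -1;

    /* Each unlock lowers the count; once it reaches zero the mutex is
     * free, and a further unlock by a non-owner returns an error. */
    while (pthread_mutex_unlock(&m) == 0)
        unlocks++;

    pthread_mutex_destroy(&m);
    return unlocks;
}
```

The count of successful unlocks equals the count of locks, which is the "equal numbers of locks and unlocks" rule described above.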


- Cond is a reverse mutex, i.e. unlike a mutex, which is available on first use and then blocks until released, a cond blocks when first acquired and is released when a second thread signals it.
- Cond is typically used in a producer-consumer environment when the consumer is ready to consume before the producer is ready to produce.
- The consumer waits on the cond. The producer releases it when something is available.
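In the POSIX API a cond is always used together with a mutex and a flag that says whether the awaited event has happened. The following is a minimal sketch of a one-item hand-off under that convention; the `mailbox` struct, the function names and the value 42 are all made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>

struct mailbox {
    pthread_mutex_t lock;
    pthread_cond_t ready;
    int has_item;       /* the condition the consumer waits for */
    int item;
};

static void *producer(void *arg)
{
    struct mailbox *mb = arg;

    pthread_mutex_lock(&mb->lock);
    mb->item = 42;                      /* "produce" something */
    mb->has_item = 1;
    pthread_cond_signal(&mb->ready);    /* release the waiting consumer */
    pthread_mutex_unlock(&mb->lock);
    return NULL;
}

/* Hypothetical helper: start a producer and consume the single item it
 * delivers. Returns the item, or -1 if the producer could not start. */
int consume_one(void)
{
    struct mailbox mb;
    pthread_t tid;
    int item;

    pthread_mutex_init(&mb.lock, NULL);
    pthread_cond_init(&mb.ready, NULL);
    mb.has_item = 0;
    mb.item = 0;

    if (pthread_create(&tid, NULL, producer, &mb) != 0)
        return -1;

    pthread_mutex_lock(&mb.lock);
    /* Loop on the flag: the wait may return spuriously, and the producer
     * may already have finished before we got here. */
    while (!mb.has_item)
        pthread_cond_wait(&mb.ready, &mb.lock);
    assert(mb.has_item);
    item = mb.item;
    pthread_mutex_unlock(&mb.lock);

    pthread_join(tid, NULL);
    pthread_cond_destroy(&mb.ready);
    pthread_mutex_destroy(&mb.lock);
    return item;
}
```

pthread_cond_wait releases the mutex while the consumer sleeps and re-acquires it before returning, which is why the producer can take the lock in the meantime.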

Pthread create
int pthread_create(pthread_t *thread, const pthread_attr_t *attr, void *(*start_routine)(void *), void *arg);

Arguments for pthread_create

- 1st argument is the thread id (so that we can later do stuff with the thread).
- 2nd argument is used for creation attributes (can be safely ignored in this course).
- 3rd argument is the thread start routine (the thread's int main()).
- 4th argument is the thread function's arg (the thread's argc/argv).
- More on that in the recitation.
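The four arguments can be seen together in a small sketch that passes a value to a thread and waits for it with pthread_join. `square` and `square_in_thread` are hypothetical names chosen for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>

/* Thread start routine: receives the 4th argument of pthread_create(). */
static void *square(void *arg)
{
    int *n = arg;

    *n = *n * *n;
    return n;           /* handed back to whoever joins the thread */
}

/* Hypothetical helper: compute n*n in a new thread. Returns -1 if the
 * thread could not be created or joined. */
int square_in_thread(int n)
{
    pthread_t tid;              /* 1st argument: receives the thread id */
    void *result = NULL;

    /* 2nd argument NULL = default attributes; 3rd = start routine;
     * 4th = the pointer passed to the start routine. */
    if (pthread_create(&tid, NULL, square, &n) != 0)
        return -1;
    if (pthread_join(tid, &result) != 0)    /* wait for the thread to end */
        return -1;

    assert(result == &n);       /* the routine returned its own argument */
    return n;
}
```

Passing the address of a local variable is safe here only because the creating function joins the thread before returning.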

Windows create thread

HANDLE WINAPI CreateThread(
    __in_opt  LPSECURITY_ATTRIBUTES lpThreadAttributes,
    __in      SIZE_T dwStackSize,
    __in      LPTHREAD_START_ROUTINE lpStartAddress,
    __in_opt  LPVOID lpParameter,
    __in      DWORD dwCreationFlags,
    __out_opt LPDWORD lpThreadId
);

Comparison of windows and UNIX threads functions

- Windows 1st, 2nd and 5th arguments are contained in the UNIX 2nd argument (the thread attributes).
- Windows 3rd and 4th arguments correspond to the UNIX 3rd and 4th arguments (the thread function and its argument).
- Windows 6th argument corresponds to the UNIX 1st argument (the thread id).

Different OSs have different APIs but the same principles rule everywhere (including embedded OSs, realtime OSs, mainframe, cellphone OSs etc.).

Threads benefits

- Threads provide an easy method for multiprogramming (because we have an easier time passing information).
- Threads are lighter to create and delete than processes.
- Threads have easy access to other threads' variables (and thus don't need an IPC protocol to be written).
- Context switching is usually cheaper than between processes.
- Threads are cool and sexy.

Problems when using threads


- No need to do IPC means all problems with locking and unlocking are up to the programmer.
- All seasoned programmers have several horror stories about chasing bugs past midnight in a dreaded threaded environment!
- Context switching makes it more efficient to use a single thread than multiple threads.
- Because threads are cool and sexy, thread use is often overdone. De-threading is a common task in many mature applications.

Common misconception about thread stacks


- Each thread requires its own stack in order to enter functions, define automatic variables, etc.
- So the OS gives a new stack to each thread.
- But the OS has no memory protection between threads, period.
- That means that if we create a pointer and point it at a thread's local stack scope, other threads can change what it points to with no locking.

As always, when people do things that will only confuse other programmers, this is deprecated.

Thread safety and re-entrant code

- Consider the function strtok(3). Besides the fact that this function is one of the worst atrocities devised by mankind, it is also non-reentrant.
- This function uses a char * in the global scope so that subsequent calls can be made with NULL as the first argument.
- Consider what happens to this function when multiple threads use it simultaneously.

Calling strtok from two threads


- The first thread calls strtok. It gives a char pointer, which is saved in strtok's static char *.
- The second thread calls strtok. It overwrites the first char pointer.
- The first thread calls strtok with NULL.
- Poetic justice? Just what the caller deserves?

Strtok example contd.


- The global char * is a CRITICAL SECTION in the sense that it may not be used by two different threads at once.
- So the second call to strtok would ruin it for the first caller.
- A different function was offered that doesn't use a global buffer: strtok_r(). This function takes an external buffer.
- Similarly ctime() now has ctime_r(), localtime() has localtime_r(), etc.

Remove the critical section

#include <string.h>

char *strtok(char *str, const char *sep);
char *strtok_r(char *str, const char *sep, char **last);
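Because strtok_r() keeps its position in a caller-supplied pointer, two tokenizations can be in progress at once, whether in two threads or nested in one. A small sketch using nested tokenization follows; `count_words_in_lines` is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>

/* Hypothetical helper: count space-separated words across all lines of
 * `text`. The buffer is modified in place, as with strtok(). */
int count_words_in_lines(char *text)
{
    int words = 0;
    char *line_state;   /* position of the outer (line) tokenization */
    char *word_state;   /* position of the inner (word) tokenization */

    assert(text != NULL);
    for (char *line = strtok_r(text, "\n", &line_state);
         line != NULL;
         line = strtok_r(NULL, "\n", &line_state)) {
        /* The inner loop has its own state, so it does not disturb the
         * outer one; plain strtok() would lose its place here. */
        for (char *word = strtok_r(line, " ", &word_state);
             word != NULL;
             word = strtok_r(NULL, " ", &word_state))
            words++;
    }
    return words;
}
```

The same property makes it safe for different threads to tokenize different strings at the same time, as long as each uses its own state pointer.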

Compiling multi-threaded code


- Multi-threaded code requires several compile-time considerations.
- Usually a compile/link switch is needed (-lpthread or -pthread on UNIX platforms, /MT or /MTd on Microsoft Windows).
- Linking multi-threaded and non-multi-threaded code together may result in link or runtime errors on different platforms.

Software engineering : when to use threads


- We cannot do things in a single thread efficiently.
- Multi-processing is required.
- Lots of data is shared between threads.
- We don't need OS memory protection.
- We think the new thread is absolutely necessary.

Common mal usage of threads


- Creating a new thread for every request received in a server.
  - It is expensive to create and delete threads.
- Creating multiple threads for each running transaction on a server (such as a DB).
  - Often causes starvation (the OS doesn't know which thread needs the CPU).
  - Reduces overall performance.
- Instead: create a thread pool of worker threads that share a work queue.
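The pool idea can be sketched in a few lines: a fixed number of worker threads repeatedly take the next job from one shared queue. To keep the example short it assumes all jobs are queued before the workers start, so no cond is needed; a real server pool would also wait for new work. All names and the pool size of 4 are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>

#define WORKERS 4       /* fixed pool size: threads are created once */

struct pool {
    pthread_mutex_t lock;
    const int *jobs;    /* the shared work queue */
    int next, count;    /* jobs[next..count) are still pending */
    long sum;           /* combined result of all handled jobs */
};

static void *worker(void *arg)
{
    struct pool *p = arg;

    for (;;) {
        /* Take one job from the queue under the lock. */
        pthread_mutex_lock(&p->lock);
        if (p->next == p->count) {
            pthread_mutex_unlock(&p->lock);
            return NULL;                /* queue drained: worker exits */
        }
        int job = p->jobs[p->next++];
        pthread_mutex_unlock(&p->lock);

        long result = (long)job * job;  /* "handle the request" unlocked */

        pthread_mutex_lock(&p->lock);
        p->sum += result;
        pthread_mutex_unlock(&p->lock);
    }
}

/* Hypothetical helper: square every job using the pool and return the
 * sum of the squares, or -1 if no worker could be started. */
long sum_squares_with_pool(const int *jobs, int count)
{
    struct pool p = { .jobs = jobs, .next = 0, .count = count, .sum = 0 };
    pthread_t tids[WORKERS];
    int started = 0;

    assert(count >= 0);
    pthread_mutex_init(&p.lock, NULL);
    while (started < WORKERS &&
           pthread_create(&tids[started], NULL, worker, &p) == 0)
        started++;
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    pthread_mutex_destroy(&p.lock);
    return started > 0 ? p.sum : -1;
}
```

The number of threads stays at WORKERS no matter how many jobs arrive, which avoids both the creation cost per request and an unbounded number of runnable threads.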

Common mal use of threads 2


- Too many "this little thread only does this" threads.
- Impossible to design reasonable locking and unlocking state machines.
- Once the number of threads goes up, the many thread-to-thread interfaces with their locking and unlocking are guaranteed to cause bugs.
- Only create threads when things must be done in parallel and no other thread can reasonably do the task.

Single process:

- Should provide the best overall performance.
- Easiest to program.
- May be hard to design, specifically if it needs to handle inputs from multiple source types.
- May be prone to hanging on a specific request.
- Should be preferred whenever the complexity arising from multiplicity is not severe.

Multi process:

- Uses the OS to create processes, swap processes and context switch, thus adding load.
- IPC makes it hard to program.
- Usually easy to design if the process tasks are easily separated.
- Should be preferred when IPC is minimal and we wish to have better control over memory access in each process.

Multi thread:

- Uses the OS to create threads and context switch, adding load. However, not as much as with processes, because threads are lighter.
- Easy to program and to pass information between threads, but also dangerous.
- Usually hard to design so as to avoid deadlocks, bottlenecks, etc.
- Should be preferred when lots of IPC is needed.
- Dangerous: novice programmers reading and writing to unprotected memory segments.

Common multi-threaded design patterns

Producer - Consumer

Producer:

- Produce something.
- Put it in the queue.
- Inform the consumer it is ready.

Consumer:

- Wait on the queue.
- Take stuff from the queue.
- Consume it.
- Return to waiting on the queue.


Producer - Consumer

- Producer-Consumer is typically used with handler threads. Some thread does some work and puts it out for the other thread(s) to consume.
- Sometimes a series of producer-consumer stages defines a single transaction.
- Real world examples: handling requests in a web server, a DB server, or many other servers that get requests over a single pipe and have several handling threads.


- Converting non-reentrant code to reentrant code is sometimes a tedious task.
- Calls from multiple threads may enter a non-reentrant scope from many places.
- If we use locking and forget to release the mutex we may suffer from deadlocks. (Sometimes releasing the mutex is not as trivial as it sounds, because legacy code tends to have many surprises in store, such as break, continue, goto and other goodies.)
- Guard or Scope Mutex is a class that wraps a mutex implementation.
  - The class destructor releases the mutex.
  - By using the C++ destructor mechanism we ensure that when we leave the critical segment the mutex will be released.

Scope Mutex header

class CScopeMutex {
public:
    CScopeMutex(Cmutex& mutex);
    ~CScopeMutex() { unlock(); }    // leaving the scope releases the mutex
    void wait();
    void signal();
    inline void lock() { wait(); }
    inline void unlock() { signal(); }
private:
    Cmutex& Mutex;
};


- Using select is very easy and is very often required.
- We cannot wait on a socket and a cond using select.
- Instead we will use a socket buffer.
- We will read 1 byte from the socket buffer (using select(2), of course) when we wish to wait for the cond.
- We will write 1 byte when we wish to release the cond.


Signal header
class Csignal {
private:
    int fd[2];
    char buf;
    void InitSignal();
public:
    Csignal();
    virtual ~Csignal();
    Csignal(const Csignal& other) { InitSignal(); }
    int send();
    int signal() { return send(); }
    int wait();
    int GetWaitFD();
};

Signal implementation
Csignal::Csignal()
{
    InitSignal();
    buf = 42;
}

void Csignal::InitSignal()
{
    // a connected pair of sockets: write on fd[0], read on fd[1]
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fd) == -1)
        THROW_SOCKETERROR;
}

Csignal::~Csignal()
{
    close(fd[0]);
    close(fd[1]);
}

Signal example
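int Csignal::send()
{
    // release: write one byte into the socket pair
    if (::send(fd[0], &buf, sizeof(char), 0) != sizeof(char))
        THROW_ERRNO;
    return 1;
}

int Csignal::wait()
{
    // wait: block until one byte can be read
    char res;
    if (recv(fd[1], &res, sizeof(char), 0) != 1)
        THROW_ERRNO;
    return 1;
}

int Csignal::GetWaitFD()
{
    return fd[1];   // this fd can be passed to select(2)
}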

Further reading and examples


Several libraries exist on the net to manage OS services and provide infrastructure and design patterns for an efficient multi-platform environment. Examples include:

- NSPR (Netscape Portable Runtime)
- ICE
- ACE (which I prefer)

Example code : Thread wrapper class

- This class will create a thread using Windows threads, Solaris threads or POSIX threads.
- The class has an Execute method to be inherited and overridden by derived classes. (The derived class is a thread.)
- I will only discuss the thread creation. A real world implementation should also include:
  - Methods to wait for termination, suspend, prioritize
  - Attributes to get status (started, stopped, terminated) and return code
  - Queues for working threads
  - Etc. etc.

Cthread : header file

class CThread {
public:
    CThread(CMutexedSignal* FinishedSignal = NULL);
    virtual ~CThread();
    virtual void * Execute() = 0;
    int CreateThread();
    void WaitFor();
    void Terminate();
    thread_t Thread;
    CMutexedSignal* FinishedSignal;
    bool Started;
    bool Finished;
};

Cthread function body

int CThread::CreateThread()
{
#ifdef WIN32
    HANDLE Thread;
    DWORD ID;
    Thread = ::CreateThread(NULL, 0, call_start, (void*)this, 0, &ID);
    this->Thread = Thread;
    return (int)(Thread == NULL);
#else
    return ctf_thread_create(&Thread, call_start, this);
#endif
}

Call_start - Thread main()

#ifndef WIN32
extern "C" {
static void * call_start(void * This)
#else
DWORD WINAPI call_start(LPVOID This)
#endif
{
    if (This) {
        ((CThread *)This)->Started = true;
        ((CThread *)This)->Execute();
        ((CThread *)This)->Finished = true;
        if (((CThread *)This)->FinishedSignal)
            ((CThread *)This)->FinishedSignal->signal();
    }
    return NULL;
}
#ifndef WIN32
}
#endif

inline int ctf_thread_create(pthread_t *thread, void* (*start_routine)(void *), void* arg)
{
#ifdef Solaris_threads
    return thr_create(NULL, (size_t)0, start_routine, arg, 0, thread);
#else // POSIX THREADS
    return pthread_create(thread, NULL, start_routine, arg);
#endif
}