You are on page 1of 54

Pthreads Programming: A Hands-on Introduction

Adrien Lamothe

Monday, July 23, 2007 8:30 am !2:00 noon "oom #!3$%!&0

Copyright 2007 Adrien Lamothe All Rights Reser ed

We're going to cover fundamental concepts of multi-threaded programming and Pthreads. We'll play around with some simple C programs, that utilize Pthreads, to learn the basics and to illustrate some important concepts. Due to time limitation, we may not be able to tal about some of the finer points of Pthreads programming. !his is not a comprehensive tutorial. !he goal is to get you "uic ly familiar with multi-threaded programming concepts and Pthreads programming. #f you haven't wor ed with threads before, this tutorial should serve as a good starting point. #f you are more e$perienced, you may or may not benefit from this tutorial% there will be some discussion that may interest those of you with nowledge and e$perience of the topic. !his set of slides will serve as our guide. &nce we get to the e$ample programs the activity will also involve actually running and changing the programs, in order to have some fun and see first-hand how Pthreads actually wor .

Concurrent systems and multi-threaded programming have been with us a number of years.

'ecent developments (clusters, multi-core microprocessors) have led to renewed interest in concurrent programming.

New Multicore Processors

*un +icrosystems ,iagara - ./ cores #0+ Cell 0roadband 1ngine (C01) - up to 2 cores Paralla$ Propeller Chip - 3 cores #ntel Core / Duo, 4uad Core - / and 5 cores 6+D Dual Core (soon to release "uad core)

#n the past, support for concurrent programming was uneven among vendors. Concurrent programming environments and tools were mostly confined to specialized applications.

'eal-time operating systems, in particular, strongly supported concurrent programming, due in some cases to a lac of memory protection. +any real-time systems also demand pea responsiveness.

Why Bother with Concurrent Programming?

!he current trend in microprocessor design is parallelism.

Cloc speed wasn't the answer, and 7hit the wall8.

+ulti-core CP9s increase both performance and energy efficiency.

Des top and noteboo computers are now parallel machines.

Why Bother with Concurrent Programming?

#f your software application demands high performance and:or responsiveness, you should consider enabling it to ta e advantage of the parallelism inherent in today's computer systems.

Why Bother with Concurrent Programming?

,ot all software benefits from concurrency. Profile your application. Consider the cost:benefits.

!he learning curve is not as bad as you may thin .

What types of software benefit?

!he following conditions mar an application as a candidate for concurrency;

<arge numbers of asynchronous events.

0loc ing or waiting is fre"uent and tas s independent of those in the wait state are critical.

Program is compute-intensive (7CP9 0ound8) and it is possible to decompose the wor .

=our program needs to feel responsive to users. 1ven a simple program with a graphical user interface can benefit from running the >9# in a separate thread. !here is a nice e$ample of this in the &''eilly Python Coo boo , where a ! in er >9# runs in a separate thread.

Concurrent Software Architecture

!wo main approaches to concurrent programming;

Process +odel !hreading +odel

Process Model
6pplication distributed across multiple processes.

Processes wor in parallel, running on separate e$ecution units (CP9s or cores.)

9ni$ a great platform for Process +odel applications, due to a good set of interprocess functionality (shared memory, semaphores, process control)

hreading Model
Parallelism achieved within a single process.

6dvantages; <ess time to conte$t switch. Cleaner synta$ (in my opinion). Creating a thread is faster than creating a process. <ess memory usage than multiple processes. 6 threaded process is easier to manage.

!ser Space "s# $ernel Space hreads

9ser *pace !hreads (79ser !hreads8); Controlled by a scheduler lin ed into the process. Coordination with ernel scheduler re"uired. 7Dynamic8 languages (Perl, Python, 'uby) use 9ser !hreads (each with their own semantics.) ?ava utilizes 9ser !hreads (called 7green threads8.)

@ernel *pace !hreads (7@ernel !hreads8); Controlled only by ernel scheduler. When supported well by the &*, the better way to implement threading.

hreads in %inu&
&riginal <inu$!hreads library was bro en and not fully P&*#A compliant.

,e$t >eneration P&*#A !hreads (,>P!) was offered as a replacement for <inu$!hreads.

,ative P&*#A !hread <ibrary (,P!<) was introduced in /BB., along with ernel modification. 4uic ly became the new <inu$ ernel library. <inu$ now supports Pthreads in the ernel. 9lrich Drepper and #ngo +olnar of 'ed Cat were the heroes behind ,P!<.

'endor Support

!hreading support varied among vendors.

#n D22E, a P&*#A standard, PDBB..Dc ( nown as 7Pthreads8) was issued, to standardize multi-threaded programming. !his standard has largely remained the same, with some minor changes.

Fendor support has improved (but # still don't trust themG)

<inu$ supports threading well. Cey, this is &*C&,G Who cares about proprietary operating systemsHG (especially those posing as open) *o let's Iust use <inu$ (&.@., maybe 0*D or +ac &* A.) # feel better alreadyG

Current rends in Multi( hreaded Programming &pen+P ( is a tool it that aims to simplify multi-threaded programming in C:CJJ and K&'!'6,.

*ome compilers now have switches that cause generation of parallel code, without re"uiring any special programming by the developer. *un +icrosystems has this in their C:CJJ compiler. *un's motto is 7let the compiler handle the complications of producing multi-threaded code.8

Current rends in Multi( hreaded Programming

Cas ell (www.has is a language that aims to simplify multithreaded programming. !his afternoon, in DD.L from D;.B to E;BB pm, a Cas ell tutorial will be presented.

!ransactional +emory is a techni"ue to Iournal memory #:&, which promises to alleviate some of the potential pitfalls of multi-threaded programming. !here will be a short tal about this on Wednesday, 2;.Bam-2;5Epm, Portland 0allroom.

!he >,9 organization has developed a library for C, called >,9 Pth (for >,9 Portable !hreads, see Pth was a foundation of the abandoned ,>P! proIect.

>,9 also has their own &pen+P proIect (called >&+P,) which appears to be inactive at this time.

!here are other &pen+P implementations.

Current rends in Multi( hreaded Programming

#ntel Corporation has created a tool it, a CJJ class library, to simplify mutli-threaded programming and also has a debugger and profiler for threaded software. #ntel plans to release a new open-source product for multi-threaded development at this &*C&, (!uesday, 3;.B am - D/;BB noon, DD.2-D5B.)

6 new &''eilly boo has Iust been released, titled #ntel !hreading 0uilding 0loc s.

)ntel hreading Building Blocks * BB+

#ntel has tied together a number of different parallel programming concepts and implemented them in !00.

!00 is only available for #ntel processors and those that emulate them (i.e. 6+D). #ntel may open-source !00, in which case you will be able to rebuild it for different platforms.

#f #ntel open-sources !00, be sure to chec whether critical performance functionality has been moved into the #ntel compilers. 6lso carefully chec licensing.

Do your own testing and benchmar ing of !00.

)ntel hreading Building Blocks * BB+

!00 contains different types of mute$es, for different situations.

!00 seems to do a good Iob supporting manipulation of comple$ data structures.

!00 seems to be aimed at parallel mathematical operations.

Comple$ity doesn't go away, it Iust gets e$pressed differently. #n terms of coding, !00 is in some ways more comple$ than wor ing with Pthreads.

!00 is not standards compliant. #t is a proprietary 6P#.


Why Pthreads?
Pthreads is a good way to start learning about concurrent programming.

Pthreads is an international standard. =1*, !C6! #* 6 >&&D !C#,>G

<inu$ supports Pthreads well and is getting better. !he recently released /.L.// <inu$ ernel improves thread performance.

Why Pthreads?

Pthreads get the Iob done. 1$amples of successful Pthreads applications; 6pache +y*4<

Pthreads are a central component of MyS,%

+y*4< has created a CJJ class to handle threads (class !CD). !he !CD class is used to handle all client connections, and for other purposes (replication, delayed insert). !he &''eilly 9nderstanding +y*4< #nternals boo describes the !CD class in some detail.

MyS,% is a strong endorsement for Pthreads

+y*4< is perhaps the strongest endorsement for Pthreads. Pthreads get the Iob done, and do it well, for this high-performance database.

What book should ) read to learn Pthreads?

!he &''eilly Pthreads Programming boo does an outstanding Iob of teaching Pthreads and fundamental concurrent programming concepts. While it may seem dated, most of the material is still relevant.

Pthread Attributes
creation scheduling synchronization (mute$es, read:write loc s, semaphores, condition variables) data access ( eys) signal delivery and handling cancellation M destruction

-ow .o ) !se Pthreads?

!he bare essentials; #include <pthread.h> pthread_t // pthread thread handle pthread_create() pthread_detach() -or- pthread_join() pthread_cancel() pthread_exit() Keep in mind that the main() function is also a thread! It is su ject to the same eha!iour as the other threads.

#include <pthread.h> pthread_t m"_thread# pthread_attr_t thd_attri # pthread_create($m"_thread% $thd_attri % $m"_function% $parameter)# !oid m"_function(& parameter) ' // do some stuff... ( 'arameters are (assed by re)erence, so i) you need to (ass multi(le (arameters to your )unction, you must (lace them in a structure and (ass a (ointer to it. *he thread handle +my,thread- is used )or all subse.uent re)erences to the thread. *he attribute (arameter +2nd (arameter- can be /0LL, or a (thread,attr,t 1ariable that has been initiali2ed and set. *his is how the (riority and schedulin3 attributes o) a thread are set.

// )he important header is pthread.h% #include <pthread.h> // ,-._/0_)123456 controls ho7 man" threads our examples 7ill create. #define NUM_OF_THREADS #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include <stdio.h> <stdli .h> <strin8.h> <unistd.h> <si8nal.h> <fcntl.h> <s"s/t"pes.h> <s"s/stat.h> <s"s/soc9et.h> <netd .h> <s"s/un.h> <s"s/ipc.h> <s"s/shm.h> <netinet/in.h> <s"slo8.h> ut let*s include a unch of stuff+

#include :our_include_file.h: !oid m"_function(int &)# int main() ' int num er_of_threads ; ,-._/0_)123456# pthread_t thread_handle!NUM_OF_THREADS"# int thread_id<,-._/0_)123456=# int i# for (i ; ># i < num er_of_threads# i??) ' thread_id<i= ; i# pthread_create$%thread_handle!i"& NU''& $()id *+ ,-_functi)n& $int *+ %thread_id!i"+# // 7e should call pthread_detach() here% ut 7e don*t to sho7 the // pro8ram 7ill 7or9 an"7a" ( ut ma" end up not releasin8 s"stem resources.) ( for(##) ' // if 7e don*t 8o into this loop% the pro8ram 7ill exit // child threads ha!e completed their jo s. ( ( // end main() !oid m"_function(int &m"_thread_id) ' printf(:1ello from thread #@i!An:% &m"_thread_id)# ( // end m"_function() efore the

Ncc -pthread -o pthreadOtest pthread.c - or Ncc -lpthread -o pthreadOtest pthread.c !hen, try running it; N.:pthreadOtest

Basic Synchroni1ation
pthreadOIoin() Calling thread waits for Ioined thread to terminate.

6lso has the effect of freeing thread resources.

#f you don't call pthreadOIoin(), you need to call pthreadOdetach(), which also frees thread resources.

#include :our_include_file.h: !oid m"_function(int &)# int main() ' int num er_of_threads ; ,-._/0_)123456# pthread_t thread_handle<,-._/0_)123456=# int thread_id<,-._/0_)123456=# int i# for (i ; ># i < num er_of_threads# i??) ' thread_id<i= ; i# pthread_create($thread_handle<i=% ,-BB% (!oid &) m"_function% (int &) $thread_id<i=)# pthread_.)in$thread_handle!i"& NU''+# ( // in this example% the calls to pthread_join() 7ait for the child threads // to complete% so 7e don*t put the main thread into an endless loop. ( // end main() !oid m"_function(int &m"_thread_id) ' printf(:1ello from thread #@i!An:% &m"_thread_id)# ( // end m"_function()

hreads can work together

Comple$ numeric calculation is a good use for threads. We'll create a simple calculation (and pretend it is a comple$ one), to illustrate this.

/// ,o7% the threads 7ill collecti!el" perform a calculation #include :our_include_file.h: !oid m"_function(int &)# int result ; ># int main() ' int num er_of_threads ; ,-._/0_)123456# pthread_t thread_handle<,-._/0_)123456=# int inte8er_!alue<,-._/0_)123456=# int i# for (i ; ># i < num er_of_threads# i??) ' inte8er_!alue<i= ; i# pthread_create($thread_handle<i=% ,-BB% (!oid &) m"_function% (int &) $inte8er_!alue<i=)# pthread_.)in$thread_handle!i"& NU''+# ( printf(:)he calculation result is+ @iAn:% result)# ( // end main() !oid m"_function(int &m"_inte8er_!alue) ' re/ult 01 *,-_inte2er_(alue# ( // end m"_function()

Pthread Barrier Synchroni1ation

6nother type of synchronization is the Pthread 0arrier;

pthreadObarrierOt :: data type pthreadObarrierOinit() pthreadObarrierOwait() pthreadObarrierOdestroy()

/// Ce no7 use a Dthread arrier to s"nchroniEe the threads #include :our_include_file.h: !oid m"_function(int &)# int result ; ># pthread_3arrier_t 3arrier# int main() ' int num er_of_threads ; ,-._/0_)123456# pthread_t thread_handle<,-._/0_)123456=# int inte8er_!alue<,-._/0_)123456=# int i# lon8 expected_result# pthread_3arrier_init$ %3arrier& NU''& $ nu,3er_)f_thread/ 0 4 + +# for (i ; >% expected_result ; ># i < num er_of_threads# expected_result ?; i% i?? ) ' inte8er_!alue<i= ; i# pthread_create($thread_handle<i=% ,-BB% (!oid &) m"_function% (int &) $inte8er_!alue<i=)# pthread_detach$thread_handle!i"& NU''+# ( pthread_3arrier_5ait$%3arrier+# printf(:)he calculation expected result is+ @iAn:% expected_result)# printf(:)he calculation actual result is+ @iAn:% result)#

pthreads/3(2#c *cont#+
pthread_3arrier_de/tr)-$%3arrier+# pthread_exit(,-BB)# ( // end main() !oid m"_function(int &m"_inte8er_!alue) ' result ?; &m"_inte8er_!alue# pthread_3arrier_5ait$%3arrier+# ( // end m"_function()

Pthread Synchroni1ation with Mute&es

!he Pthread +ute$;

pthreadOmute$Ot :: data type pthreadOmute$Oinit() pthreadOmute$Oloc () pthreadOmute$Ounloc () pthreadOmute$Odestroy()

// Ce ha!en*t protected the result !aria le in the pre!ious examples. // ,o7% 7e 7ill protect result, usin8 a Dthread .utex loc9. #include :our_include_file.h: !oid m"_function(int &)# lon8 result ; ># pthread_ arrier_t arrier# pthread_,ute6_t )ur_,ute6# int main() ' int num er_of_threads ; ,-._/0_)123456# pthread_t thread_handle<,-._/0_)123456=# int inte8er_!alue<,-._/0_)123456=# int i# lon8 expected_result# pthread_ arrier_init( $ arrier% ,-BB% ( num er_of_threads ? F ) )# pthread_,ute6_init$%)ur_,ute6& NU''+# for (i ; >% expected_result ; ># i < num er_of_threads# expected_result ?; i% i?? ) ' inte8er_!alue<i= ; i# pthread_create($thread_handle<i=% ,-BB% (!oid &) m"_function% (int &) $inte8er_!alue<i=)# pthread_detach(thread_handle<i=)# ( // end for pthread_ arrier_7ait($ arrier)#

pthreads/3(3#c *cont#+
printf(:)he calculation expected result is+ @iAn:% expected_result)# printf(:)he calculation actual result is+ @iAn:% result)# pthread_ arrier_destro"($ arrier)# pthread_,ute6_de/tr)-$%)ur_,ute6+# pthread_exit(,-BB)# ( // end main() !oid m"_function(int &m"_inte8er_!alue) ' pthread_,ute6_l)c7$%)ur_,ute6+# result ?; &m"_inte8er_!alue# pthread_,ute6_unl)c7$%)ur_,ute6+# pthread_ arrier_7ait($ arrier)# pthread_exit((!oid&) >)# ( // end m"_function()

Signal -andling with Pthreads

!hreads have their own signal mas s. 6ll threads share the sigaction structure of the process. *ynchronously generated signals are delivered to specific threads (i.e. e$ceptions are handled by the offending thread, a pthreadO ill() signal is delivered to the targeted thread. 6synchronously generated signals are delivered to a thread that can handle the signal. !hreads can define their own signal handlers, but their usefulness is limited by not being able to loc mute$es. #n the end, it is best to simply handle signals as usual.

Shared .ata
*hared data is the bugaboo of parallel programming. =ou'll encounter constant warnings to avoid e$cessive shared data reads. >et over it. 6ll computer systems share resources, in different areas. =es, you should minimize shared data access, but at some point your threads will need to share some data. While on the subIect of shared data...

)BM Cell Broadband 4ngine *CB45 or the 6Cell7+

!he C01 has a main e$ecution core and up to 3 7*ynergistic Processing 1lements8 (*P1s). *P1s are e$ecution cores. What ma es them different and special, is they each have their own memory. !his alleviates much of the trouble associated with shared memory. C01 has very fast data buses, to facilitate message passing and broadband data transmission. # find C01 the most interesting new processor architecture.

)BM Cell Broadband 4ngine *CB45 or the 6Cell7+

)BM Cell Broadband 4ngine *CB45 or the 6Cell7+ #0+ has ta en a hybrid approach to programming C01, adding a library with functions applying to the *P1's. 1ach *P1 has dedicated memory. !his greatly reduces the need for threads to share physical memory.

System Considerations for Concurrency8Parallelism

!o achieve the best results from concurrency, the entire computer system needs to be designed with as much parallelism as possible. !his means things such as;

Parallel networ interface controllers *imultaneous memory access Dedicated memory for e$ecution cores

1ven so, there are always going to be bottlenec s, there is no way to escape this. !he way to minimize the effect of bottlenec s is to increase processing speed at bottlenec points. !hus, system design becomes an e$ercise in balancing bandwidth and concurrency.

Problems to a"oid when programming threads

'ace Conditions Priority #nversion Deadly 1mbrace

!hread unsafe libraries (due to reentrancy and thread cancellation)

'unning out of process resources

hings to do when working with Pthreads

@eep things as clean and simple as possible +inimize shared data

9ther opics *time permitting+

!hread *cheduling !hread Cancellation !est harnesses for our programs Debugging with >,9 gdb Condition Fariables Process resources and thread attributes Cleanup functions !hread cancellation *cheduling Pthreads *etting process resources

. P hours is only enough time to introduce the topic and loo at some simple Pthreads e$amples. Pthreads programming is "uite different than se"uential programming. #t involves thin ing a bit differently about what is happening with your code and is also a bit tric y. +any people are comparing concurrent programming to obIect-oriented programming, in that it re"uires a change of thin ing to accommodate the paradigm shift. # thin this is a good analogy. Despite uneven implementation among vendors, Pthreads is proven technology that powers some of the most pervasive and successful infrastructure software.

Summary *cont#+
&f course, Pthreads isn't for everyone. #f you enIoy (out of either pleasure or necessity) the advantages of wor ing with dynamic languages, then by all means continue wor ing with them. =our programs can still benefit from the threading mechanisms of those languages. *ome people have developed Python e$tensions, written in C, that allow them to access Pthreads from their Python programs. !hose of you developing server software, demanding high performance, should definitely ta e a close loo at Pthreads. Download +y*4< source code and see how they have harnessed Pthreads. #f your software is mathematically intensive, you should ta e a loo at #ntel !hreading 0uilding 0loc s (!00).

!han" you #or coming$ en%oy the rest o# the sho&'

Adrien Lamothe adrien(adriens&e)'com