You are on page 1of 5

# Is Parallel Programming Hard?

"Is Parallel Programming Hard?" I believe that at the right level of abstraction, it is not and in fact can be as easy as sequential programming. Note, however, that "at the right level of abstraction" should be considered with care. Pragmatically there are some problems today with taking advantage of the "easiness" of parallel programming. Let me describe what I consider the three main problems and why I expect these can and will be solved.

Parallel Thinking: The first problem is that programmers have been trained for years in sequential thinking. Most of us have been taught over and over since our very earliest days of programming how to think about programming in a sequential way. This permeates just about everything we do in code development, from the abstract description and analysis of algorithms, to the coding of these algorithms, and from debugging and verifying code, to the way that we read code. Wheat from Chaff: The second problem is that most existing parallel programming environments do not properly separate the wheat from the chaff. Parallel programming today often involves a huge number of decisions that have little to do with the inherent parallelism at hand and more to do with specifics of the particular programming model, compiler, or machine type. Since the various choices are not cleanly separated, the programmer is left with the sense that parallel programming is a hugely complicated task. Determinism: The third problem is a lack of support for deterministic parallelism and overuse of non-deterministic parallelism. By deterministic parallelism I mean a parallel program for which the results and partial results are the same independent of how the computation is scheduled or the relative timing of instruction among processors. In non-deterministic parallelism, the results depend on such scheduling and timing.

These three problems are not unrelated, as I will return to at the end, but let me first address each in more detail. Sequential Vs. Parallel Thinking When we think of a summation in mathematics we often write it using the Sum symbol. In studying mathematics most of us do not think of it as a loop but rather as just the sum of elements---we implicitly understand that addition is associative and the order does not matter. But yet many programming courses spend weeks describing how to write loops to take sums of elements and similar tasks. Guy Steele pointed out in a talk that once you put an x=0 in your code, you are already doomed to think sequentially. I argue that there

are many problems in which we are taught to do in some sequential way, but when thought about abstractly the parallel solution is natural. Quick sort is my favorite example of an algorithm that is naturally extremely parallel but rarely taught as such. Here is the code for Quick sort from Aho, Hopcroft and Ullman's textbook "The Design and Analysis of Computer Algorithms", a book for us old timers:
procedure QUICKSORT(S): if S contains at most one element then return S else begin choose an element a randomly from S; let S1, S2, S3 be the sequences for elements in S less than equal to, and greater than a, respectively; return (QUICKSORT(S1) followed by S2 followed by QUICKSORT(S3) end

The interesting property of the pseudo code is that whether they intended to or not the authors wrote a highly parallel description of the algorithm. In the code there are two forms of parallelism. Firstly there is parallelism from the recursive calls to Quick sort. The code does not indicate that these calls need to be executed sequentially---it only says how the results should be put together. Secondly, there is parallelism from the fact that when selecting the elements in S1, similarly S2 and S3, we can compare all the keys to the pivot a in parallel. In combination these two forms of parallelism lead to an algorithm that is extremely parallel---in a theoretical sense it can effectively take advantage of O(n log n) processors. This means that in theory when sorting 20 million keys we could effectively take advantage of over a million processors. Practice might be less generous, but we would be happy with enough parallelism for 100,000 or 10,000 processors. Yet most of our classes teach Quick sort without pointing out it is parallel. We carefully go through the sequential execution and analysis and argue it takes O(n log n) expected time. Most students are left clueless of the amount of parallelism in the algorithm, and certainly on how to analyze the parallelism. For example, what if one only took advantage of one or the other form of parallelism, how much is left. Separating the Wheat from the Chaff In the early days of sequential programming people apparently programmed with gotos, important languages did not have recursion requiring the user to use a stack, and it was reasonably common to go into assembly code. I'm often surprised to go back and read some of the algorithm papers from 40 years ago. They typically spend pages describing various details we would never spend time on today (e.g. a memory allocation). The typical programmer today almost never thinks of gotos, simulating recursion on how to implement memory management. It seems that parallel programming today has some of the same issues as sequential programming had in the early days. In the interest of performance there has been

significant emphasis on low-level details and often these low level details are not properly separated from the high-level ideas in parallelism. Various libraries and languages extensions require the user, for example, to specify stack sizes for forked tasks, implement their own schedulers or specify scheduling policies, or understand mapping of memory to processors. When presented with such level of detail it both makes programming seem much harder and also obscures the core ideas in the rest. The programmer might be left believing that specifying stack size or scheduling policy when forking a task is an inherent necessity of parallel programming. Deterministic vs. Non-Deterministic Parallelism Surely the most important property of a program is correctness. Therefore a major concern about parallel programming should be that it will make it harder to write and maintain correct programs. Indeed it would be quite unfortunate if parallel programs were less reliable. Most programmers who have done some coding of parallel or concurrent applications, however, have experience with how much more difficult it can be to convince oneself that such a program is correct or to debug the program. Correctness of of parallel programs is therefore a legitimate concern. The problem with developing correct parallel programs is mostly to do with nondeterminism introduced by the different interleaving of instructions in parallel streams. Different executions might interleave the instructions differently depending on many factors including what else is running on the machine, how many processors are being used, the initial state of the machine, the layout of the input in memory and its effect on cache or page misses, or varying delays in communication networks. A program might work correctly on odd numbers of processors but incorrectly on even. It might seem to work on Tuesdays but not Wednesdays. Such non-determinacy makes regression testing of code of questionable utility. It also makes debugging much more difficult since a bug might not be repeatable. Non-determinacy is not only a problem with testing and debugging programs, but in verifying their logic. Whether the verification is done in ones head, on the back of an envelope, in a lemma in a paper, or using automated verification tools, the nondeterminism requires considering all feasible interleaving. Many bugs have been found in published algorithms for concurrent data structures, and it is well known in the verification community that verifying programs with arbitrary interleaving is much more difficult than verifying a sequential program. Does this mean that we are doomed? The answer is no. The fact of the matter is that programs or parts of programs that don't involve non-determinacy from the environment do not need to be non-deterministic. Furthermore it is known how to have a programming model and/or development environment enforce determinacy. If we consider the Quick sort code above, although it is parallel, the logic is such that it will always return the same answer independent of how it is run. The programmer need not consider interleaving.

As described in a previous post on this blog one way to do get deterministic programs is to have a serial semantics to the program. In this approach the programmer reasons about the correctness with regards to a particular serial ordering of the program (typically a depth-first traversal of the fork-join structure). The language then makes a guarantee that if the program does not have race conditions, then any feasible interleaving of the execution will return the same result as this sequential ordering. In the case of Cilk++ race conditions can be checked using a race detector. This is perhaps not a perfect solution since the races can only be detected on the test cases, but is as good as any regression testing. Also it is much easier to verify that a program or program fragment does not have race conditions than to verify correctness in general. In Summary: It is all About Abstraction Although presented separately, I believe the three problems I described are really three faces of the same problem. The problem is one of abstraction---both identifying the right abstractions and working at the right level of abstraction. In the case of sequential vs. parallel thinking, I believe that at the right level of abstraction they are really not that much different. In both the example of summation and of Quick sort when expressed at the right level the code is either sequential or parallel. This is not to say that all algorithms are naturally parallel when abstracted. In the case of merging two sequences, for example, the standard algorithm that starts at the front of each sequence is inherently sequential. On the other hand there is a very simple and efficient divide-and-conquer parallel algorithm. To convert programmers to parallel thinking will require some effort and time, but it is actually an opportunity to simultaneously raise the level of abstraction at which programmers approach problems. In the case of separating the wheat from the chaff, it is really about identifying the right abstractions for parallel programming and avoiding or depreciating other non-essential issues. Of course there is not universal agreement on what the important issues are, but it is important to error in the direction of fewer core ideas than more ad hoc ideas even at the risk of loosing some performance. My opinion is that as a programming language and runtime system C++ does an excellent job at separating the wheat from the chaff. It introduces a small set of parallel constructs each with few if any options and presents a simple model for analyzing performance and correctness. Indeed I think that divide-andconquer parallelism, parallel loops, analyzing performance in terms of work and span, and analyzing correctness using sequential semantics are perhaps the four most important concepts in deterministic parallel programming. In the case of non-determinism, I also believe this is an issue of abstraction. Under the hood the program is always going to be non-deterministic since there is no reasonable way to run a parallel program in lock-step in exactly the same way on every run. Beyond the technical problems, such lock-step execution would likely be terribly inefficient. The idea is therefore to abstract away the non-deterministic behavior so that the programmer only sees deterministic behavior and any non-determinacy is hidden in the run-time system. Daring programmers might expose non-determinacy at some levels but then

encapsulate it into deterministic components. Even in non-deterministic environments where a program is interacting with concurrent request from the outside world, the majority of the program should be deterministic and non-determinism should be limited to what is needed by the task. I think this topic needs more study and many parallel programmers need significantly more education on the implications of non-determinacy, and approaches to deterministic parallel programming. In summary I believe that when parallel programming is brought to the right level of abstraction it is not inherently difficult. It requires parallel thinking; simple programming constructs, and limited use of non-determinism.