
64-bit Insider

Volume I, Issue 14

The 64-bit Advantage

The computer industry is changing, and 64-bit technology is the next, inevitable step. The 64-bit Insider newsletter will help you adopt this technology by providing tips and tricks for a successful port.

Development and migration of 64-bit technology is not as complicated as the 16-bit to 32-bit transition. However, as with any new technology, several areas do require close examination and consideration. The goal of the 64-bit Insider newsletter is to identify potential migration issues and provide viable, effective solutions to these issues. With a plethora of Web sites already focused on 64-bit technology, the intention of this newsletter is not to repeat previously published information. Instead, it will focus on 64-bit issues that are somewhat isolated yet extremely important to understand. It will also connect you to reports and findings from 64-bit experts.

Introduction to optimization for multi-core and hyper-threaded processors

As 64-bit processors become more widespread, so do multi-core processors. Many PCs today already have multi-core 64-bit processors installed, and in servers multiple multi-core processors are often present. To add to the mix, many of these processors also support hyper-threading. How can these technologies help your performance? What are the differences between them? How can you be sure that your applications take advantage of them effectively? This newsletter will start to answer these questions and will point you to sources of further information.

Multi-processor, multi-core, & hyper-threading: What are they?
Normally a processor reads instructions from RAM into one or more caches on the processor itself before loading each instruction into the execution core and executing it. A processor's speed is often measured by the number of "clock cycles" it completes per second. However, it is also true that the more instructions a processor can execute per clock cycle, the faster it will be.

With a single-processor system, the processor can normally read a single instruction into the execution core per clock cycle. However, many instructions take more than a single clock cycle to execute because of memory latency: a "load" instruction, for example, must read data from memory, which can take many clock cycles to arrive at the processor. The result is that the processor is idle much of the time, so instructions are executed and their results committed to memory at a rate of much less than one instruction per clock cycle.

Fig. 1. Operation of a single-core processor with no hyper-threading

Hyper-threading is an Intel technology that enables the processor to execute alternative instructions when it would normally be idle. It is still a single processor, but it presents itself as two logical processors so that the operating system can schedule separate "threads" of instructions for execution on each logical processor. Now, when one of the instructions for the first logical processor pauses while it waits for data to arrive from main memory, the execution core can execute instructions that were scheduled for the other logical processor.

Fig. 2. Operation of a single-core processor with hyper-threading

A multi-core processor also presents itself as two logical processors to the operating system. However, this time there really are two execution cores on the processor. These two cores can execute separate "threads" of instructions in a truly simultaneous fashion, but they are located on a single processor die and therefore use up less space. Unlike a situation where there are two processors in a system, the two execution cores share some of the same cache, which can be very helpful if the two threads of execution are executing the same instructions and working on the same set of data.

Fig. 3. Operation of a multi-core processor with hyper-threading

How do I take advantage of these technologies?


The operating system can easily take advantage of multiple processors because it deals mostly with whole processes: it can schedule one process to run on one processor and another process to run on another. However, if there is only one process executing, how can it distribute the work done by that process among multiple logical or physical processors?

A key term that appears several times in the descriptions of the different types of processors above is "threads". Creating several threads of execution means writing a program that at some point splits into two separate sequences of code, in such a way that both pieces of code continue to run at the same time. A program written in this way is called a multi-threaded program, and it explicitly demarcates independent sequences of instructions that the operating system can run on different processors.

Multi-threaded programming has been around for a long time. It involves creating several
“threads” of execution inside your application that all run at the same time. Most modern
applications use threads today, even if you are not fully aware of them. For example,
ASP.NET web applications normally consist of a single thread of execution for each
HTTP request that is received. Multiple simultaneous requests mean multiple
simultaneous threads of execution.

Threads also allow you to create highly responsive GUI applications that perform long, compute-intensive work but still respond appropriately to user input. For example, threads allow a virus scanner to paint its progress bar correctly on the desktop while it simultaneously scans the hard disk.

Adding threads to your applications


To add threads to your application you need to use some mechanism to tell the operating
system that you wish to do so. Running processes are scheduled by the operating system.
That is, they are given time on the processor depending on how many other processes are
running on the operating system. Similarly, multiple threads of execution within a
process must be scheduled by the operating system.

There are two ways to add threads to your program. One is to use a specialized API like
the Windows API or the System.Threading namespace of the .NET Framework to create
and control threads manually. The other is to use compiler directives as defined by the
OpenMP standard to have the threads created for you automatically.

Threading in .NET
The .NET Framework provides a set of classes to enable multithreaded programming. In its most basic form, starting a thread is just a matter of creating a Thread object, passing it a delegate for the method that will do the work of the thread, and then calling its Start() method. When the Start() method returns, there are two threads of execution. Have a look at the following example:
using System.Threading;

class ThreadingExample
{
    static void Main(string[] args)
    {
        ThreadStart work = new ThreadStart(printEvens);
        Thread thread = new Thread(work);
        thread.Start();

        printOdds();

        System.Console.WriteLine("Done.");
    }

    private static void printEvens()
    {
        for (int i = 0; i <= 10; i += 2)
        {
            System.Console.WriteLine(i);
            Thread.Sleep(100);
        }
    }

    private static void printOdds()
    {
        for (int i = 1; i < 10; i += 2)
        {
            System.Console.WriteLine(i);
            Thread.Sleep(100);
        }
    }
}

All .NET applications start running in what's called the "main" thread, which continues to run normally until the Main() method finishes. This simple program creates a new thread from within Main() and has it display the even numbers from 0 to 10. While that new thread is running, the main thread is still running too: it displays the odd numbers from 1 to 9 and then finishes. When both threads have finished their work, the application terminates.

Here is an example of what might be printed by this program:

1
0
3
2
5
4
7
6
8
9
10
Done.

This is just a sample output, because each time the program runs it may print something else; sometimes the message "Done." is not even the last thing printed. This demonstrates nicely a characteristic of multithreaded programming that can cause a lot of difficulty for more complicated tasks. Without additional synchronization, it is not possible to tell beforehand in what order multiple threads will execute. Also, if the threads share data (global variables, for example), then care needs to be taken to coordinate access to that data. Look out for a future edition of this whitepaper, in which we will cover synchronization in more detail.

Threading with OpenMP


Sometimes you do not need a lot of control over multiple threads all doing different
things. In fact, in most cases you simply have a tedious procedure to complete that takes a
long time and could be done faster if different parts of the procedure could be executed
on different threads. Splitting up a procedure so that different parts can be executed in
parallel is called parallelization.

This situation is so common, in fact, that a standard called OpenMP was developed for parallelizing pieces of code without writing complex thread-management logic. OpenMP consists of a set of compiler directives called pragmas, plus specialized functions, and it is used more often than not to split up the work done in C++ for loops.

Let’s look at an example.


void multiply_vectors(float a[][COLS], float b[], float result[])
{
    #pragma omp parallel for
    for (int i = 0; i < ROWS; i++)
    {
        result[i] = 0;
        for (int j = 0; j < COLS; j++)
            result[i] += a[i][j] * b[j];
    }
}

In this example, we have some C++ code that multiplies a matrix and a vector. If these
objects are very large then it might make sense to perform some of the matrix
multiplication in different threads. The great thing about OpenMP is that we can test this
theory with just a single line of code!

The pragma in this sample code tells the compiler to create a set of threads before the for
loop begins and to distribute the iterations of the loop evenly among all the threads.

Just like in the API example we have potential problems when data is shared between the
OpenMP threads. In this example, there is no problem because there are no dependencies
between the iterations of the loop and every iteration writes to a different part of the
result array.

There are many options in OpenMP to configure some aspects of the parallelism. For
example, you can specify things like how many threads are created, how many iterations
are given to each thread, and how to share data between the threads so as to avoid
conflicts. However, flexibility is currently limited. For example, you would find it
difficult to use OpenMP to manage elements of your user interface.

OpenMP is supported by Visual C++ 2005 from Microsoft and by the Intel C++ compiler. Another advantage of OpenMP is that it is portable: it is a standard understood by many different compilers on different platforms, and compilers that do not understand OpenMP can simply ignore it.

Summary
To take advantage of the multiple physical and logical processors available in 64-bit
systems and in some 32-bit systems you need to understand the concept of threads and
you need to implement threads in your own application.
APIs exist for most languages on Windows that allow you to create threads in your own programs. Two common APIs are the one in the .NET Framework and the standard Windows API. Also, a standard called OpenMP defines pragmas that can be used to create threads in a more declarative fashion.

In a future newsletter we will look at the issues that surround synchronizing multiple
threads and how to identify and resolve those issues.

URLs

What is Multicore?
http://en.wikipedia.org/wiki/Multicore

What is hyperthreading?
http://en.wikipedia.org/wiki/Hyper-threading

Reap the Benefits of Multithreading without All the Work
http://msdn.microsoft.com/msdnmag/issues/05/10/OpenMP/

Multithreading for Rookies
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dndllpro/html/msdn_threads.asp

