
Multithreading, Superscalar, Intel's HT
Contents

 Using ILP support to exploit thread-level parallelism

 Performance and efficiency in advanced multiple-issue processors
Threads

 A thread is a basic unit of CPU utilization.

 A thread is a separate process with its own instructions and data.

 A thread may represent a process that is part of a parallel program consisting of multiple processes, or it may represent an independent program.
Threads

 It comprises a thread ID, a program counter, a register set, and a stack.

 It shares its code section, data section, and other operating-system resources, such as open files and signals, with other threads belonging to the same process.

 A traditional process has a single thread of control. If a process has multiple threads of control, it can perform more than one task at a time.
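These properties are visible in any threading library. As a minimal sketch (all names here are illustrative), the Python snippet below runs three threads that share the process's data section while each keeps its own stack-local variables:

```python
import threading

shared = []  # data section: visible to every thread in the process

def worker(tag):
    # 'local' lives on this thread's private stack; 'shared' is common
    local = f"task-{tag}"
    shared.append(local)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(shared))  # ['task-0', 'task-1', 'task-2']
```

All three workers append into the same list without any copying, which is exactly the shared-address-space behavior described above.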
Threads

 Many software packages that run on modern desktop PCs are multithreaded.

 For example, a word processor may have:
a thread for displaying graphics,
another thread for responding to keystrokes from the user, and
a third thread for performing spelling and grammar checking in the background.
Threads

 Threads also play a vital role in remote procedure call (RPC) systems.

 RPC allows interprocess communication by providing a communication mechanism similar to ordinary function or procedure calls.

 Many operating-system kernels are multithreaded; several threads operate in the kernel, and each thread performs a specific task, such as managing devices or handling interrupts.
Multithreading

 Benefits:

1. Responsiveness: Multithreading an interactive application may allow a program to continue running even if part of it is blocked, thereby increasing responsiveness to the user.
For example, a multithreaded web browser could still allow user interaction in one thread while an image is being loaded in another thread.
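The browser scenario can be sketched with a background thread standing in for the image download (a toy model with illustrative names; the sleeps simulate network latency and keystroke handling):

```python
import threading
import time

image_loaded = threading.Event()

def load_image():
    time.sleep(0.2)        # stand-in for a slow network download
    image_loaded.set()

threading.Thread(target=load_image).start()

# the interactive thread keeps servicing "keystrokes" meanwhile
keystrokes_handled = 0
while not image_loaded.is_set():
    keystrokes_handled += 1
    time.sleep(0.01)

print(keystrokes_handled)  # interaction continued during the load
```

A single-threaded version would have to block on the download and handle zero keystrokes until it finished.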

2. Resource sharing: By default, threads share the memory and the resources of the process to which they belong. The benefit of sharing code and data is that it allows an application to have several different threads of activity within the same address space.
Multithreading

 Benefits:

3. Economy: Allocating memory and resources for process creation is costly. Since threads share the resources of the process to which they belong, they provide a more cost-effective solution.

4. Utilization of multiprocessor architectures: In a multiprocessor architecture, threads may run in parallel on different processors. A single-threaded process can only run on one CPU, no matter how many are available. Multithreading on a multi-CPU machine increases concurrency.
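A thread pool sized to the machine's CPU count illustrates the idea. This is only a sketch: note that in CPython the global interpreter lock limits CPU-bound *threads* to one core at a time, so truly parallel CPU work there typically uses processes instead; the scheduling principle is the same either way.

```python
import os
from concurrent.futures import ThreadPoolExecutor

def work(n):
    return n * n

# On a multiprocessor these tasks may be scheduled across CPUs;
# a single-threaded program could use only one CPU regardless.
with ThreadPoolExecutor(max_workers=os.cpu_count() or 2) as pool:
    results = list(pool.map(work, range(5)))

print(results)  # [0, 1, 4, 9, 16]
```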
Multithreading Models

 Support for threads may be provided either at the user level or at the kernel level.

 User threads are supported above the kernel and are managed without kernel support, whereas kernel threads are supported and managed directly by the operating system.
Multithreading Models

 Many-to-One Model:

 The many-to-one model maps many user-level threads to one kernel thread.

 Thread management is done by the thread library in user space, so it is efficient.

 Only one thread can access the kernel at a time, hence multiple threads are unable to run in parallel on multiprocessors.
Multithreading Models

 One-to-One Model:

 The one-to-one model maps each user thread to a kernel thread.

 It provides more concurrency than the many-to-one model and allows multiple threads to run in parallel on multiprocessors.

 The main drawback of this model is that creating a user thread requires creating the corresponding kernel thread.

 The overhead of creating kernel threads can burden the performance of an application.
Multithreading Models

 Many-to-Many Model:

 The many-to-many model multiplexes many user-level threads to a smaller or equal number of kernel threads.

 The number of kernel threads may be specific to either a particular application or a particular machine.

 Developers can create as many user threads as necessary, and the corresponding kernel threads can run in parallel on a multiprocessor.
Multithreading: ILP Support to Exploit Thread-Level Parallelism

 Although ILP increases system performance, it can be quite limited or hard to exploit in some applications. Furthermore, there may be parallelism occurring naturally at a higher level in the application.

For example, an online transaction-processing system has parallelism among its multiple queries and updates. These queries and updates can be processed mostly in parallel, since they are largely independent of one another.
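The independence of such queries is what makes them trivially parallel. A toy sketch (the query IDs and "rows" are purely illustrative stand-ins for real database work):

```python
import threading

def run_query(qid, results):
    # stand-in for one independent transaction or query
    results[qid] = f"rows-for-{qid}"

results = {}
queries = [threading.Thread(target=run_query, args=(q, results))
           for q in ("q1", "q2", "q3")]
for t in queries:
    t.start()
for t in queries:
    t.join()

print(sorted(results))  # each query completed without waiting on the others
```

Because no query reads another's data, no ordering or locking between them is needed; this is the thread-level parallelism the text describes.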
Multithreading: ILP Support to Exploit Thread-Level Parallelism

 This higher-level parallelism is called thread-level parallelism (TLP) because it is logically structured as separate threads of execution.

 ILP exploits parallel operations within a loop or straight-line code.

 TLP is represented by the use of multiple threads of execution that run in parallel.
Multithreading: ILP Support to Exploit Thread-Level Parallelism

 Thread-level parallelism is an important alternative to instruction-level parallelism.

 In many applications, thread-level parallelism occurs naturally (for example, in many server applications).

 If software is written from scratch, expressing the parallelism is much easier.

 But for established applications written without parallelism in mind, there can be significant challenges, and it can be extremely costly to rewrite them to exploit thread-level parallelism.
Multithreading: ILP Support to Exploit Thread-Level Parallelism

 There are two main approaches to multithreading:

 Fine-grained multithreading

 Coarse-grained multithreading
Multithreading: ILP Support to Exploit Thread-Level Parallelism

 Fine-grained multithreading:

 It switches between threads on each instruction, causing the execution of multiple threads to be interleaved.

 This interleaving is often done in a round-robin fashion.

 To make fine-grained multithreading practical, the CPU must be able to switch threads on every clock cycle.
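The round-robin interleaving can be sketched as a tiny scheduler simulation (illustrative only; a real CPU does this in hardware, every cycle):

```python
from itertools import cycle

# two hypothetical instruction streams
threads = {"T0": ["i0", "i1", "i2"], "T1": ["j0", "j1", "j2"]}

schedule = []
order = cycle(list(threads))      # round-robin over thread IDs
while any(threads.values()):
    tid = next(order)             # switch threads every "cycle"
    if threads[tid]:
        schedule.append((tid, threads[tid].pop(0)))

print(schedule)
# [('T0', 'i0'), ('T1', 'j0'), ('T0', 'i1'),
#  ('T1', 'j1'), ('T0', 'i2'), ('T1', 'j2')]
```

The resulting schedule alternates strictly between the two streams, which is exactly the per-instruction interleaving described above.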


Multithreading: ILP Support to Exploit Thread-Level Parallelism

 Coarse-grained multithreading:

 It was invented as an alternative to fine-grained multithreading.

 Coarse-grained multithreading switches threads only on costly stalls, such as long-latency cache misses.

 Because switches occur only on costly stalls, thread switching does not need to be nearly free.

 The main difference between fine-grained and coarse-grained multithreading is that in fine-grained multithreading the threads issue instructions in round-robin fashion, while in coarse-grained multithreading a thread issues instructions until a stall occurs.
SCALAR PROCESSOR

 A scalar processor is classified as a SISD (single instruction, single data) processor. A scalar processor processes only one data item at a time.

 In a scalar organization, a single pipelined functional unit exists for:
 • integer operations; and
 • floating-point operations.

 Functional unit:
 • the part of the CPU responsible for calculations.
SUPERSCALAR PROCESSOR
 A superscalar processor is a CPU that implements a form of parallelism called instruction-level parallelism within a single processor.
 A superscalar CPU can execute more than one instruction per clock cycle, so at the same clock rate a superscalar processor will be faster than a scalar processor.
 It has the ability to execute instructions in different pipelines:
 • independently and concurrently.
Simultaneous Multithreading (SMT)
 • A mix of superscalar and multithreading techniques.
 • All hardware contexts are active, leading to competition for resources.
 • Multiple instructions are issued from multiple threads in the same cycle.
 • Both TLP and ILP come into play.
 • Issue slots in a cycle are filled with instructions from different threads.
 • Requires decisions about resource organization and resource sharing.
 Hyper-Threading is a technology used by some Intel microprocessors that allows a single microprocessor to act like two separate processors to the operating system and the application programs that use it. HT allows the processor to work more efficiently by processing two sets of instructions at the same time, making it look like two logical processors.
 This enables a processor to perform tasks faster (usually a 25%-40% speed increase) than a non-HT-enabled processor.
 It adds support for multithreaded code, and improves reaction and response times.
 Hyper-threading can boost system performance by up to 30%. For dual-socket systems, hyper-threading can boost performance by up to 15%. For quad-socket (or higher) systems, performance testing with and without hyper-threading enabled is recommended.
 Faster clock speeds are an important way to deliver more computing power. But clock speed is only half the story. The other route to higher performance is to accomplish more work on each clock cycle, and that's where Hyper-Threading Technology comes in.
 As threads are processed, some of the internal components of the core, called execution units (EUs), are frequently idle during each clock cycle.
 By enabling hyper-threading, the execution units can process instructions from two threads simultaneously, which means fewer execution units will be idle during each clock cycle.
 As a result, enabling hyper-threading may significantly boost system performance.
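One directly observable effect of HT is that the operating system sees more logical processors than there are physical cores. A quick check from Python (note: the standard library reports only the logical count; determining the physical core count requires platform-specific tools):

```python
import os

logical = os.cpu_count()  # logical processors visible to the OS
print(f"Logical processors: {logical}")
# On an HT-enabled CPU this is typically twice the physical core count.
```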
I. Run demanding applications simultaneously while
maintaining system responsiveness
II. Keep systems protected, efficient, and manageable
while minimizing impact on productivity
III. Provide headroom for future business growth and new
solution capabilities
 The Architecture State consists of registers, including the general-purpose registers, the control registers, the advanced programmable interrupt controller (APIC) registers, and some machine state registers.
 In 2002, Intel released the first Xeon processor model with Hyper-Threading.
 Several goals were at the heart of the microarchitecture design choices made for the Intel Xeon processor.
 The 1st goal was to minimize the die area cost of implementing HT Technology. The die area cost of the first implementation was less than 5% of the total die area.
 The 2nd goal was to ensure that when one logical processor is stalled, the other logical processor can continue to make forward progress. A logical processor may be temporarily stalled for a variety of reasons, including servicing cache misses, handling branch mispredictions, or waiting for the results of previous instructions.
 This is accomplished by partitioning buffering resources between the logical processors.
 The 3rd goal was to allow a processor running only one active software thread to run at the same speed on a processor with HT Technology as on a processor without this capability.
Advantages:
i. Possible speed increase
ii. Uses less space compared to another physical core

Disadvantages:
i. Single-threaded software cannot take advantage of it
ii. Increased power consumption
iii. Increased heat output
iv. Older operating systems do not support it
Thanks to Intel® HT Technology, businesses can:
I. Improve productivity by doing more simultaneously without
slowing down
II. Provide fast response times for Internet and e-commerce
applications, enhancing customer experiences
III. Increase the number of transactions that can be processed
simultaneously
IV. Utilize existing 32-bit application technologies while
maintaining 64-bit future readiness
 The following documents are referenced in this application note, and provide background or supporting information for understanding the topics presented in this document:

 Intel® 64 and IA-32 Architectures Optimization Reference Manual

 Using Spin-Loops on Intel® Pentium® 4 Processor and Intel® Xeon® Processor, Intel Application Note AP-949
