Objectives achieved
Great work on making it to the end of this session. Here is what we learned:
Throughput is the total amount of data that can be transferred in a given amount of time.
Latency is the amount of delay before that transfer of data begins.
The smallest unit of data is the bit, and the performance of any data transfer, whether it
involves long-term storage devices, short-term RAM, or Internet devices, is measured by the
throughput and latency of this data.
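One way to see how the two measures combine is a simple model in which the total transfer time is the latency plus the data size divided by the throughput. The sketch below assumes that model; the function name `transfer_time_seconds` and the example figures are illustrative and do not describe any particular device.

```python
def transfer_time_seconds(size_bits: float, throughput_bps: float, latency_s: float) -> float:
    """Estimate transfer time under a simple model: wait for the latency,
    then move the data at a constant throughput (bits per second)."""
    if throughput_bps <= 0:
        raise ValueError("throughput must be positive")
    return latency_s + size_bits / throughput_bps


# Hypothetical example: 1 gigabit of data, 100 megabits per second, 20 ms latency.
example = transfer_time_seconds(size_bits=1e9, throughput_bps=100e6, latency_s=0.020)
print(f"{example:.2f} s")
```

For large transfers the throughput term dominates; for many small transfers the latency term dominates.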
Both CPUs and GPUs have similar measures for computation performance. Each is made
up of cores that can do one operation at a time, and these cores have set clock speeds that
determine how often they can perform these operations.
The power a device draws is measured in watts. The greater the wattage, the greater the
amount of heat the device creates. Electricity costs can be a substantial part of the cost
over the lifetime of the device.
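To get a feel for the lifetime electricity cost, wattage can be converted to kilowatt-hours and multiplied by a price per kilowatt-hour. The following is a rough sketch that assumes a constant power draw; the function name and all figures (300 W, 8 hours a day, 5 years, 0.15 per kWh) are hypothetical.

```python
def lifetime_energy_cost(power_watts: float, hours_per_day: float,
                         years: float, price_per_kwh: float) -> float:
    """Cost of electricity for a device drawing a constant power.

    Energy (kWh) = power (kW) x time (h); cost = energy x price per kWh.
    """
    kilowatts = power_watts / 1000.0
    hours = hours_per_day * 365 * years
    return kilowatts * hours * price_per_kwh


# Hypothetical figures: a 300 W device, 8 hours a day, 5 years, 0.15 per kWh.
cost = lifetime_energy_cost(300, 8, 5, 0.15)
print(f"{cost:.2f}")
```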
If you’re wondering how to check your clock speed on Windows, click the Start menu (or press the
Windows key) and type “System Information.” Your CPU’s model name and clock speed will be listed
under “Processor.”
In this case, a “cycle” is the basic unit that measures a CPU’s speed. During each cycle, billions
of transistors within the processor open and close. This is how the CPU executes the
calculations contained in the instructions it receives.
Sometimes, multiple instructions are completed in a single clock cycle; in other cases, one
instruction might be handled over multiple clock cycles. Since different CPU designs handle
instructions differently, it’s best to compare clock speeds within the same CPU brand and
generation.
For example, a CPU with a higher clock speed from five years ago might be outperformed by a
new CPU with a lower clock speed, as the newer architecture deals with instructions more
efficiently.
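A rough way to reason about this is to multiply clock speed by the average number of instructions completed per cycle (IPC). The sketch below assumes that simplified model; IPC is a term introduced here for illustration, and the clock speeds and IPC values are made up rather than measured on real processors.

```python
def instructions_per_second(clock_hz: float, instructions_per_cycle: float) -> float:
    """Rough throughput estimate: cycles per second x instructions per cycle."""
    return clock_hz * instructions_per_cycle


# Hypothetical CPUs: the IPC values are illustrative, not measured.
older_cpu = instructions_per_second(clock_hz=4.0e9, instructions_per_cycle=1.0)
newer_cpu = instructions_per_second(clock_hz=3.5e9, instructions_per_cycle=1.5)

# The newer design wins despite its lower clock speed.
print(newer_cpu > older_cpu)
```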
Recent features like the Intel® Thread Director allow the latest gen Intel processors to
intelligently distribute workloads to multiple cores. That’s one reason why newer processors
often outperform older ones on benchmark tests even when they have similar clock speeds.
Within the same generation of CPUs, a processor with a higher clock speed will generally
outperform a processor with a lower clock speed across many applications. This is why it’s
important to compare processors from the same brand and generation. The K-series of Intel®
Core™ processors, for example, denotes a set of chips that are unlocked and allow for
overclocking, indicating that they can achieve more ambitious clock speeds than their peers in
the same generation.
For more information on how to interpret Intel® Core™ processors by their naming conventions,
read our guide.
The impact of clock speed on an individual game depends on the game’s engine and the tools
used to create it. For example, FromSoftware’s Elden Ring uses a proprietary game engine that
leans heavily on single-core performance [3]. On the other hand, Infinity Ward’s Call of Duty:
Modern Warfare 2 is designed to take full advantage of core multi-threading. This allows the
game to hit impressive performance benchmarks even when using older, slower processors,
provided they have enough cores to work with [3].
These examples show that specific benchmarks are the best way to assess CPU performance in a
particular game engine. However, clock speed remains a good general guide to the relative
performance of processors within a product family.
Intel Turbo Boost Technology enhances clock speed dynamically to deal with heavy workloads.
It works without requiring any installation or configuration by the user. The technology judges
the amount of heat the system can tolerate, as well as the number of cores in use, and then boosts
clock speed to the maximum safe level.
Base Processor Frequency and Max Turbo Frequency are two core performance metrics that
refer to different usage scenarios. For high-intensity gaming, the turbo frequency is the more
important metric. Given adequate cooling, this is the speed your CPU will operate at when
dealing with heavy workloads in the most CPU-intensive titles, such as traveling through a
highly detailed environment or calculating AI behavior on an enemy turn in a strategy game.
Overclocking can yield improved FPS [4], even for high-end CPUs like the latest gen Intel®
Core™ processors. Intel® Speed Optimizer provides one-click overclocking for all Intel®
processors. This means experts and beginners alike can optimize overclocking performance
safely.
The base clock (BCLK) sets not only the speed of the CPU, but also the speed of memory, the PCIe
bus, the CPU cache, and more. It’s easier for overclockers to simply adjust the CPU multiplier
than to change the BCLK, because changing the BCLK affects many components at once and can
cause instability.
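The relationship between the two settings can be sketched as core frequency = BCLK × multiplier. This formula is not spelled out in the text above and is stated here as an assumption of the example; the 100 MHz base clock and the multiplier values are illustrative only.

```python
def cpu_frequency_mhz(bclk_mhz: float, multiplier: float) -> float:
    """Core frequency as base clock (BCLK) times the CPU multiplier."""
    return bclk_mhz * multiplier


# Hypothetical settings, not recommendations for any specific processor.
stock = cpu_frequency_mhz(100, 45)           # 4500 MHz
via_multiplier = cpu_frequency_mhz(100, 50)  # 5000 MHz, only the CPU cores change
via_bclk = cpu_frequency_mhz(105, 45)        # 4725 MHz, other BCLK-derived clocks shift too

print(stock, via_multiplier, via_bclk)
```

Raising the multiplier changes only the core frequency, whereas raising the BCLK also moves every other clock derived from it, which is why the multiplier is the easier knob.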
Why Does Clock Speed Matter?
CPU clock speed is a good indicator of overall processor performance. Though applications like
video editing and streaming are known to rely on multi-core performance, many new video
games still benchmark best on CPUs with the highest clock speed.
Clock speed is a useful metric for comparing processor models in the same generation. When
selecting a processor for a new gaming computer, it provides at-a-glance information about the
general performance of products in the same lineup, like the latest gen Intel® Core™ Processor
family. For more specialized use cases, individual benchmarks are usually more appropriate.
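The Performance Equation
The performance equation analyzes execution time as a product of three factors that are
relatively independent of each other.
\[
\frac{\text{time}}{\text{program}} = \frac{\text{instructions}}{\text{program}} \times \frac{\text{clocks}}{\text{instruction}} \times \frac{\text{time}}{\text{clock}}
\]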
This equation remains valid if the time units are changed on both sides of the equation. The left-
hand side and the factors on the right-hand side are discussed in the following sections.
The three factors are, in order, known as the instruction count (IC), clocks per instruction (CPI),
and clock time (CT). CPI is computed as an effective value.
Instruction Count
Computer architects can reduce the instruction count by adding more powerful instructions to the
instruction set. However, this can increase either CPI or clock time, or both.
Clocks Per Instruction
Computer architects can reduce CPI by exploiting more instruction-level parallelism. Adding
more complex instructions often increases CPI.
Clock Time
Clock time depends on transistor speed and the complexity of the work done in a single clock.
Clock time can be reduced when transistor sizes decrease. However, power consumption
increases when clock time is reduced. This increases the amount of heat generated.
Instruction Count
Instruction count (IC) is a dynamic measure: the total number of instruction executions involved
in a program. It is dominated by repetitive operations such as loops and recursions.
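The difference between the instructions written in a program and the instructions actually executed can be illustrated with a single loop. This is a minimal sketch; the function name and the instruction counts (10 setup instructions, a 5-instruction loop body, one million iterations) are hypothetical.

```python
def dynamic_instruction_count(setup_instructions: int, body_instructions: int,
                              iterations: int) -> int:
    """Instruction executions for straight-line setup code followed by one loop."""
    return setup_instructions + body_instructions * iterations


# Hypothetical program: 10 setup instructions and a 5-instruction loop body.
static_count = 10 + 5  # instructions as written in the program
dynamic_count = dynamic_instruction_count(10, 5, 1_000_000)

print(static_count, dynamic_count)
```

The loop body accounts for almost all of the dynamic count, even though it is only a third of the instructions as written.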
Instruction count is affected by the power of the instruction set. Different instruction sets may do
different amounts of work in a single instruction. CISC processor instructions can often
accomplish as much as two or three RISC processor instructions. Some CISC processor
instructions have built-in looping so that they can accomplish as much as several hundred RISC
instruction executions.
For predicting the effects of incremental changes, architects use execution traces of benchmark
programs to get instruction counts. If the incremental change does not change the instruction set
then the instruction count normally does not change. If there are small changes in the instruction
set then trace information can be used to estimate the change in the instruction count.
For comparison purposes, two machines with different instruction sets can be compared based on
compilations of the same high-level language code on the two machines.
Clocks Per Instruction
Clocks per instruction (CPI) is an effective average. It is averaged over all of the instruction
executions in a program.
To compute clocks per instruction as an effective average, instruction executions are grouped
into categories, such as branches, loads, and stores. Frequencies for the categories can be
extracted from execution traces. Knowledge of how the architecture handles each category yields
the clocks per instruction for that category. The effective CPI is then the sum, over all
categories, of each category's frequency multiplied by its clocks per instruction.
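The weighted average can be sketched as follows. The category names, frequencies, and per-category clock counts are hypothetical; real frequencies would come from an execution trace.

```python
from typing import Mapping, Tuple


def effective_cpi(categories: Mapping[str, Tuple[float, float]]) -> float:
    """Weighted average CPI: sum of (frequency x clocks) over instruction categories.

    `categories` maps a category name to (fraction of executions, clocks per instruction).
    """
    total_fraction = sum(fraction for fraction, _ in categories.values())
    if abs(total_fraction - 1.0) > 1e-9:
        raise ValueError("category frequencies must sum to 1")
    return sum(fraction * clocks for fraction, clocks in categories.values())


# Hypothetical instruction mix, for illustration only.
mix = {
    "alu": (0.60, 1),
    "load_store": (0.30, 2),
    "branch": (0.10, 3),
}
cpi = effective_cpi(mix)
print(f"{cpi:.2f}")
```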
Clock Time
Clock time (CT) is the period of the clock that synchronizes the circuits in a processor. It is the
reciprocal of the clock frequency.
For example, a 1 GHz processor has a cycle time of 1.0 ns and a 4 GHz processor has a cycle
time of 0.25 ns.
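The reciprocal relationship is easy to check numerically. In this small sketch the function name is illustrative; it relies on the fact that the reciprocal of a frequency in gigahertz is a period in nanoseconds.

```python
def clock_time_ns(frequency_ghz: float) -> float:
    """Clock period in nanoseconds for a frequency given in gigahertz."""
    if frequency_ghz <= 0:
        raise ValueError("frequency must be positive")
    return 1.0 / frequency_ghz


print(clock_time_ns(1.0))  # 1.0
print(clock_time_ns(4.0))  # 0.25
```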
Clock time is affected by circuit technology and the complexity of the work done in a single
clock. Logic gates do not operate instantly. A gate has a propagation delay that depends on the
number of inputs to the gate (fan in) and the number of other inputs connected to the gate's
output (fan out). Increasing either the fan in or the fan out slows down the propagation time.
Cycle time is set to be the worst-case total propagation time through gates that produce a signal
required in the next cycle. The worst-case total propagation time occurs along one or more signal
paths through the circuitry. These paths are called critical paths.
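Finding the cycle time from path delays can be sketched as taking the maximum of the summed gate delays along each path. The path names and delay values below are hypothetical, and the model ignores wire delay and register setup time.

```python
from typing import Dict, List, Tuple


def cycle_time_ns(paths: Dict[str, List[float]]) -> Tuple[float, str]:
    """Return the worst-case total propagation delay and the name of that path."""
    totals = {name: sum(delays) for name, delays in paths.items()}
    critical = max(totals, key=totals.get)
    return totals[critical], critical


# Hypothetical gate delays in nanoseconds along three signal paths.
paths = {
    "adder_carry": [0.25, 0.25, 0.5],
    "mux_select": [0.125, 0.25],
    "register_bypass": [0.125],
}
period, critical_path = cycle_time_ns(paths)
print(period, critical_path)
```

Shortening any path other than the critical one leaves the cycle time unchanged.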
For the past 35 years, integrated circuit technology has been greatly affected by scaling
equations that tell how individual transistor dimensions should be altered as the overall
dimensions are decreased. The scaling equations predict an increase in speed and a decrease in
power consumption per transistor with decreasing size. Technology has improved so that about
every 3 years, linear dimensions have decreased by a factor of 2. Transistor power consumption
has decreased by a similar factor. Speed increased by a similar factor until about 2005. At that
time, power consumption reached the point where air cooling was not sufficient to keep
processors cool if they ran at the highest possible clock speed.
Problem Statement
Suppose a program (or a program task) takes 1 billion instructions to execute on a processor
running at 2 GHz. Suppose also that 50% of the instructions execute in 3 clock cycles, 30%
execute in 4 clock cycles, and 20% execute in 5 clock cycles. What is the execution time for the
program or task?
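One way to work the problem is to compute the effective CPI as a weighted average, take the clock time as the reciprocal of the clock frequency, and multiply the three factors of the performance equation. The sketch below follows those steps; the variable names are illustrative.

```python
instruction_count = 1_000_000_000   # 1 billion instructions
clock_frequency_hz = 2_000_000_000  # 2 GHz

# (fraction of instructions, clock cycles per instruction) from the problem statement
mix = [(0.50, 3), (0.30, 4), (0.20, 5)]

effective_cpi = sum(fraction * clocks for fraction, clocks in mix)  # 1.5 + 1.2 + 1.0
clock_time_s = 1.0 / clock_frequency_hz                             # 0.5 ns per clock
execution_time_s = instruction_count * effective_cpi * clock_time_s

print(f"effective CPI = {effective_cpi:.1f}")
print(f"execution time = {execution_time_s:.2f} s")
```

With an effective CPI of 3.7 and a clock time of 0.5 ns, the execution time comes to about 1.85 seconds.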