
UNDERSTANDING CPUS

As explained in “Operating Systems Fundamentals,” one of the main functions of the operating system is to
provide the interface between the various application programs running on a computer and the
hardware inside. Central to understanding the hardware is the system architecture of the
computer, which is built around the CPU, or processor. The system architecture includes the
number and type of CPUs in the hardware, and the communication routes, called buses, between
the CPUs and other hardware components, such as memory and disk storage. A bus is a path or
channel between a computer’s CPU and the devices it manages, such as memory and I/O devices.
The CPU is the chip that performs the actual computational and logic work. Most modern PCs
have one such chip, and are referred to as single-processor computers. In reality, to ensure
complete functionality, the CPU requires several support chips, such as chips that help manage
communications with devices and device drivers.
Chip technology continues to develop with the addition of multicore processors. A processor core is
the part of a CPU that reads and executes very basic instructions, such as reading and writing data
from and to memory or executing an arithmetic operation. CPUs were originally created to have only
one core and thus perform only one instruction at a time. A multicore processor has two or more
cores—for example, a dual-core processor contains two cores and a quad-core processor has four. The
most processor cores used in traditional PC desktops and servers is 16 as of this writing, but high-end
CPUs, such as Intel’s Knights Landing, have up to 72 cores. Development continues in this area, and
researchers have so far put as many as 1,000 cores on a single CPU chip.
Some computers have multiple physical CPUs; many have two, and some have as many as 128 or
more. This type of computer is generally referred to as a multiprocessor computer.
BASIC CPU ARCHITECTURE
BASIC CPU ARCHITECTURE : CONTROL UNIT
Control unit—The control unit (CU) is the director
of operations in the CPU. The control unit provides
timing and coordination between the other parts of
the CPU, such as the arithmetic logic unit, registers,
and system bus. For example, when a new
instruction should be executed, the control unit
receives and decodes the instruction and tells the
arithmetic logic unit to execute it.
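The division of labor described above can be sketched as a fetch-decode-execute loop in a few lines of Python. This is a toy model, not a real CPU: the instruction tuples, register names, and two-opcode "ALU" are all hypothetical, chosen only to illustrate how the control unit decodes instructions and dispatches them for execution.

```python
# Toy fetch-decode-execute loop: the control unit fetches and decodes each
# instruction, then dispatches the operation to the ALU.

def run(program, registers):
    for instruction in program:          # fetch the next instruction
        op, dest, src = instruction      # decode it
        if op == "add":                  # dispatch to the ALU
            registers[dest] = registers[dest] + registers[src]
        elif op == "mul":
            registers[dest] = registers[dest] * registers[src]
        else:
            raise ValueError(f"unknown opcode: {op}")
    return registers

regs = run([("add", "R1", "R2"), ("mul", "R1", "R2")], {"R1": 3, "R2": 4})
print(regs)  # {'R1': 28, 'R2': 4}
```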
BASIC CPU ARCHITECTURE : ARITHMETIC LOGIC UNIT
Arithmetic logic unit—The arithmetic logic unit
(ALU) performs the primary task of any CPU, which
is to execute instructions. These might be
arithmetic instructions, such as addition or
multiplication of integers, or logic instructions,
such as binary AND or binary OR instructions. Most
CPUs also contain a floating point unit (FPU) that
performs floating point operations.
BASIC CPU ARCHITECTURE : REGISTERS
Registers—A register is a temporary holding location
on a CPU where data must be placed before the CPU
can use it. There are instruction registers that hold the
instruction the CPU executes, such as add, multiply, or
store. Also, the CPU uses address registers to access
data stored in RAM, and data registers that hold the
data the CPU is currently working with, such as two
numbers used in an add or multiply instruction.
BASIC CPU ARCHITECTURE : SYSTEM BUS
System bus—The system bus is a series of lanes that are used to
communicate between the CPU and other major parts of the
computer, such as RAM and input/output (I/O) devices. There are
actually three types of buses: the control bus, address bus, and data
bus. The control bus carries status signals between the CPU and
other devices. Status signals inform the CPU that a device needs
attention; for example, when an input device has data ready, the CPU
must execute the device driver code to read the data from the device.
The address bus carries address signals to indicate where data
should be read from or written to in the system’s memory. The data
bus carries the actual data that is being read from or written to
system memory.
NOTE

While modern CPUs are much more complex than the simple block diagram in Figure 3-1, most CPUs follow the basic design and contain the elements described.
CPUS CAN BE CLASSIFIED BY SEVERAL HARDWARE
ELEMENTS, THE MOST IMPORTANT OF WHICH ARE:
• Design type
• Speed
• Cache
• Address bus
• Data bus
• Control bus
• CPU scheduling
DESIGN TYPE

Two general CPU designs are used in today’s computers: Complex Instruction Set Computing (CISC) and
Reduced Instruction Set Computing (RISC). The main difference between the two is the number of different
instructions the chip can process and the complexity of the instructions. When a program executes on a
computer, the CPU reads instruction after instruction from the program to perform the tasks specified in the
program. When the CPU has read such an instruction, it carries out the associated operations. The CPU can
process as many as 20 million complex operations per second on the low end, and several billion on the high
end. Clock speed and CPU design are the factors that determine how fast operations are executed. It is
convenient for the programmer to have many instructions available to perform many different operations.
TWO DESIGN TYPES RISC AND CISC

https://www.youtube.com/watch?v=_EKgwOAAWZA
https://www.youtube.com/watch?v=g16wZWKcao4
KEY TERMS

Clock cycle—Clock speed is measured in cycles per second, and one cycle per second is known as 1
hertz. This means that a CPU with a clock speed of 2 gigahertz (GHz) can carry out two billion
cycles per second. The higher the clock speed a CPU has, the faster it can process
instructions.
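The relationship between clock speed, cycles per second, and time per cycle is simple arithmetic, sketched below:

```python
# Cycles per second and time per cycle at a given clock speed.
# 1 Hz = 1 cycle per second, so 1 GHz = one billion cycles per second.

def cycles_per_second(ghz):
    return int(ghz * 1_000_000_000)

def seconds_per_cycle(ghz):
    return 1 / cycles_per_second(ghz)

print(cycles_per_second(2))   # 2000000000 -> two billion cycles per second
print(seconds_per_cycle(2))   # 5e-10 -> each cycle lasts half a nanosecond
```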

Pipelining—Pipelining is the ability of the CPU to overlap work, starting a new task before the previous one has finished. For
example, if you have a series of additions to perform, each requiring four instructions (two loads, an add, and a store), a RISC
processor doesn’t have to wait for all four instructions of one addition to complete before it moves on to the next addition. While the
second load instruction of the first addition is being performed, the RISC processor can be loading the first value for the next addition.
KEY TERMS

While pipelining does occur in CISC CPUs, the varying number of cycles it takes to complete each
instruction makes pipelining more difficult and not as effective, compared to RISC CPUs. Figure
3-2 shows how pipelining and the number of cycles required per instruction can affect execution
time. In the figure, a CISC CPU and RISC CPU are each performing a series of three additions.
With the CISC CPU, only three instructions are required, but each instruction takes four clock
cycles and the first instruction is completed before the second instruction is started. With the
RISC CPU, each addition takes four instructions of one clock cycle each, but using pipelining, the
second addition is started while the first addition is still under way, allowing the RISC CPU to
complete all three additions in only six cycles, as opposed to 12 cycles for the CISC CPU.
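The cycle counts from Figure 3-2 can be reproduced with a simple model. This is a sketch under idealized assumptions: every pipeline stage takes one cycle and a new addition can enter the pipeline each cycle; real pipelines also stall on data hazards, which this model ignores.

```python
# Idealized cycle counts for the Figure 3-2 scenario: three additions,
# each needing four one-cycle steps (load, load, add, store) on the RISC
# CPU, or one four-cycle instruction on the CISC CPU.

def sequential_cycles(operations, cycles_per_operation):
    # No overlap: each operation completes before the next begins.
    return operations * cycles_per_operation

def pipelined_cycles(operations, pipeline_depth):
    # Pipelined: after the first operation drains (pipeline_depth cycles),
    # one more operation completes every cycle.
    return pipeline_depth + (operations - 1)

print(sequential_cycles(3, 4))  # 12 -> CISC, no pipelining
print(pipelined_cycles(3, 4))   # 6  -> RISC, pipelined
```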
KEY TERMS

Hardware versus microcode—Because CISC instructions are so complex, there is actually a small
program inside the chip that must interpret and execute each instruction. This small program is
called microcode. RISC instructions are all executed directly by the CPU hardware, with no
microcode middleman. This approach makes for faster execution of individual instructions.
KEY TERMS
Compiler—A compiler is a computer program that takes a high-level language like C# or Java and turns it into assembly code that is executed by the CPU. Due to the complexity of CISC-based instructions, the compiler has less work to do because the high-level language code need not be broken down into as many assembly language steps. Taking the example of an addition instruction from before, the C# or Java code might look like the following:

X = X + Y;

The compiler then translates that statement into assembly code for a CISC CPU. The assembly code might also be a single line of code:

add X, Y

By comparison, several lines of code might be needed when the compiler translates the statement into assembly code for a RISC CPU:

load R1, X
load R2, Y
add R1, R2
stor R1, X
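The compiler’s job for this one statement can be mimicked with a toy translation function. The output mirrors the RISC assembly above; the function itself is purely illustrative and bears no resemblance to a real compiler.

```python
# Toy "compiler" pass: translate an addition of two variables into the
# four-instruction RISC sequence used in the text (load, load, add, stor).

def compile_addition(dest, src):
    return [
        f"load R1, {dest}",   # fetch the first operand into a register
        f"load R2, {src}",    # fetch the second operand
        "add R1, R2",         # perform the addition in the ALU
        f"stor R1, {dest}",   # write the result back to memory
    ]

for line in compile_addition("X", "Y"):
    print(line)
```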
NOTE

After a high-level language program is compiled to assembly language, another step called
assembly is needed before it can be executed by the CPU. Assembly translates assembly code into
machine code. With machine code, each assembly language statement is translated into a series
of numbers—for example, the statement load R1, X might be translated to something like 09 101
2215. Remember, computers can only interpret numeric values; the software they run translates
numbers into human-readable form, and vice versa.
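The assembly step can be sketched the same way. The opcode and register numbers below are hypothetical, like the “09 101 2215” example in the note; real machine-code encodings are specific to each CPU family.

```python
# Toy assembler: map each mnemonic and operand to a number, producing
# "machine code". The opcode and register tables are invented for
# illustration only.

OPCODES = {"load": 9, "add": 1, "stor": 5}
REGISTERS = {"R1": 101, "R2": 102}

def assemble(statement, symbol_table):
    op, *operands = statement.replace(",", "").split()
    code = [OPCODES[op]]
    for operand in operands:
        # Registers and memory variables both become numeric values.
        code.append(REGISTERS.get(operand, symbol_table.get(operand)))
    return code

print(assemble("load R1, X", {"X": 2215}))  # [9, 101, 2215]
```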
KEY TERMS

Number and usage of registers—As mentioned, a register is a temporary holding location on a


CPU where data must be placed before the CPU can use it. Because so much room is used for
microcode on CISC CPUs, there are far fewer registers than on a RISC chip, which doesn’t use
microcode. The more registers there are, the more simultaneous operations the CPU can
perform, as you saw with pipelining. One of the reasons pipelining is easier with RISC CPUs is
because there are more registers to store data for pipelined instructions. In addition, CISC CPUs
erase their registers after each instruction and require them to be reloaded with each successive
instruction, whereas RISC CPUs can leave data in registers until the register is needed for
another operation.
NOTE

CISC and RISC CPUs continue to be produced. The debate between which is better is muddied by
the inclusion of CISC features in RISC CPUs and RISC features in CISC CPUs. Intel processors are
still considered to be CISC.
SPEED
CPU SPEED

The speed of a CPU defines how fast it can perform operations. There are many ways to indicate
speed, but the most frequently used indicator is the internal clock speed of the CPU. As you may
know, a CPU runs on a very rigid schedule along with the rest of the computer. The clock
provides this schedule to make sure that all the chips know what to expect at a given time. The
internal clock speed tells you how many clock pulses, or ticks, are available per second. Typically,
the CPU performs some action on every tick. The more ticks per second there are, the faster the
CPU executes commands, and the harder the electronics on the CPU must work.
CPU SPEED

The clock speed for a CPU can be lower than 1 million ticks per second (1 megahertz or MHz) or
higher than 4 billion ticks per second (4 gigahertz or GHz). The faster the clock is, the faster the
CPU, and the more expensive the hardware. Also, as more components are needed to make a CPU,
the chip uses more energy to do its work. Part of this energy is converted to heat, causing faster
CPUs to run warmer, which requires more fans in the chassis. Overheating of computer
components in general and CPUs in particular is a constant battle faced by IT departments; it
requires considerable investment in the cooling systems of data centers.
CPU SPEED

In addition to performing fast operations inside the CPU, the chips must be able to communicate
with the other chips in the computer. This is where the external clock speed of the CPU comes in.
While a CPU may run internally at a speed of 3 GHz, it typically uses a lower clock speed to
communicate with the rest of the computer. The reason for the lower speed is again cost, to a
large extent. It would be extremely expensive to make every component in the computer run as
fast as the CPU. Therefore, the other components in the computer typically run at a reduced clock
rate. Usually, the external clock speed is one-half, one-third, one-fourth, or one-eighth the speed
of the internal CPU clock.
https://www.youtube.com/watch?v=FZGugFqdr60
CACHE

If a CPU wants to get a few numbers out of memory, and its internal clock speed is four times faster than its external
clock speed, it obviously must wait on the external clock, which could be very inefficient. To avoid this problem,
modern CPUs have cache memory built into the chip. Cache memory works by providing extremely fast access to data
so the CPU doesn’t have to wait for main RAM. While the CPU is executing program code, instructions or data that are
most likely to be used next are fetched from main memory and placed in cache memory. When the CPU needs the next
bytes of data or the next instruction, it looks in cache first. If the information cannot be found, the CPU then fetches it
from main memory. The more often the CPU can find the data in cache, the faster the program will execute. There are
different levels of cache, with each successive level becoming larger, but slower:
CACHE : LEVEL 1

Level 1 cache—Level 1 (L1) cache is the fastest of the cache types; usually it runs at the same
speed as the CPU, so the CPU won’t have to wait for data if it can be found in L1 cache. However,
L1 cache is the least plentiful—typically 8 to 32 KB per processor core—so it can’t hold much
data. L1 cache is usually divided into two parts: instruction cache and data cache. L1 cache is
always an integral part of the CPU chip on modern CPUs.
CACHE : LEVEL 2

Level 2 cache—Level 2 (L2) cache is somewhat slower than L1 cache, but much larger. Many
CPUs today have 256 KB of L2 cache per processor core. The combination of L1 cache and L2
cache greatly increases the chances that the data the CPU needs will be located in cache, so the
CPU will not have to access much slower main memory. L2 cache is also an integral part of the
CPU chip on modern CPUs.
CACHE : LEVEL 3

Level 3 cache—Level 3 (L3) cache, until the last several years, was not part of the CPU chip, but
was instead a part of the motherboard. This meant that L3 cache could be fairly large, but
considerably slower than L1 and L2 cache. On the more advanced CPUs, L3 cache is part of the
CPU and is shared among the CPU cores. L3 cache can often be found in sizes of 8 MB, 16 MB, and
greater.
CACHE : LEVEL 4

Level 4 cache—Level 4 (L4) cache, if it exists, will usually be found on the motherboard. If a CPU
has L1, L2, and L3 cache and is installed on a motherboard that has built-in cache, the cache on
the motherboard will become L4 cache. If a CPU only has L1 and L2 cache, the motherboard
cache will become L3 cache. The exception is the high-end version of some CPUs that are starting
to come with on-board L4 cache. Some of these high-end CPUs have as much as 128 MB of L4
cache that is shared among the CPU cores.
CACHE

The amount of cache, especially for larger CPUs, determines the speed of the CPU. In many cases,
up to 95 percent of the data a CPU needs to transfer to and from memory is present in one of the
caches when the CPU needs it. A specialized piece of hardware called the cache controller
predicts what data will be needed, and makes that data available in cache before it is needed.
Most modern CPUs can also use the cache to write data to memory and ensure that the CPU will
not have to wait when it wants to write results to memory. You can see that intelligent, fast cache
controllers and large amounts of cache are important components for increasing the speed of a
CPU.
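The effect of the hit rate on memory performance can be estimated with the standard average-memory-access-time formula. The 1 ns cache latency and 100 ns RAM latency below are round illustrative numbers, not measurements of any particular CPU.

```python
# Average memory access time (AMAT): the better the hit rate, the closer
# the average gets to the cache's own latency.

def amat(hit_time_ns, hit_rate, miss_penalty_ns):
    return hit_time_ns + (1 - hit_rate) * miss_penalty_ns

# With the 95 percent hit rate cited above, a 1 ns cache, and 100 ns RAM:
print(amat(1.0, 0.95, 100.0))  # 6.0 ns on average
# A perfect cache would give 1 ns; no cache at all, roughly 100 ns.
```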
ADDRESS BUS

The address bus is an internal communications pathway that specifies the source and target addresses for
memory reads and writes. It is instrumental in the transfer of data to and from computer memory. The address
bus typically runs at the external clock speed of the CPU. The address, like all data in the computer, is in digital
form and is conveyed as a series of bits. The width of the address bus is the number of bits that can be used to
address memory. A wider bus means the computer can address more memory, and therefore store more data or
larger, more complex programs. For example, a 16-bit address bus can address 64 kilobytes, or KB (65,536
bytes) of memory, and a 32-bit address bus can address roughly 4 billion bytes, or 4 gigabytes (GB) of memory.
Modern processors have a 64-bit address bus, which in principle can address 2^64 bytes, or 16 exabytes (EB) of memory, although current CPUs implement fewer physical address lines and support far less.
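The relationship between address bus width and addressable memory is simply 2 raised to the bus width, assuming byte addressing:

```python
# Addressable memory for a given address bus width: each extra address
# line doubles the number of distinct byte addresses.

def addressable_bytes(bus_width_bits):
    return 2 ** bus_width_bits

print(addressable_bytes(16))  # 65536 -> 64 KB
print(addressable_bytes(32))  # 4294967296 -> about 4 GB
print(addressable_bytes(64))  # 2**64 -> 16 exabytes in principle
```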
DATA BUS

The data bus allows computer components, such as the CPU, display adapter, and main memory, to share
information. The number of bits in the data bus indicates how many bits of data can be transferred from
memory to the CPU, or vice versa, in a single operation. A CPU with an external clock speed of 1 GHz will have
1 billion ticks per second to the external bus. If this CPU has a 16-bit data bus, it could theoretically transfer 2
GB (2,000,000,000 bytes) of data to and from memory every second. (One byte consists of 8 bits, so 1 billion x
16 bits / 8 bits per byte = 2 GB per second.) A CPU with an external clock speed of 1 GHz and a 64-bit data
bus could transfer as much as 8 GB per second (1 billion x 64 bits/8 bits per byte). That is four times as much
data in the same time period, so in theory, the CPU will work four times as fast.
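The throughput arithmetic above can be written as a one-line function. These are theoretical peaks only; real transfers are slowed by bus contention and wait states.

```python
# Peak data bus transfer rate: ticks per second times bits moved per tick,
# divided by 8 bits per byte.

def peak_bytes_per_second(external_clock_hz, data_bus_bits):
    return external_clock_hz * data_bus_bits // 8

print(peak_bytes_per_second(1_000_000_000, 16))  # 2000000000 -> 2 GB/s
print(peak_bytes_per_second(1_000_000_000, 64))  # 8000000000 -> 8 GB/s
```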
DATA BUS

There are a couple of catches here. First, the software must be able to instruct the CPU to use all of the
data bus, and the rest of the computer must be fast enough to keep up with the CPU. Most CPUs work
internally with the same number of bits as on the data bus. In other words, a CPU with a 64-bit data
bus can typically perform operations on 64 bits of data at a time. Almost all CPUs can also be
instructed to work with chunks of data narrower than the data bus width, but in this case the CPU is
not as efficient because the same number of clock cycles is required to perform an operation, whether
or not all bits are used. All Windows versions from Windows XP forward include a 64-bit version, and
starting with Windows Server 2008 R2, all server versions are 64-bit only.
CONTROL BUS

The CPU is kept informed of the status of the computer’s resources and devices, such as the memory and disk
drives, by information transported on the control bus. The most basic information transported across the
control bus indicates whether a particular resource is active and can be accessed. If a disk drive becomes active,
for example, the disk controller provides this information to the CPU over the control bus. Other information
that may be transported over the control bus includes whether a particular function is for input or output.
Memory read and write status is transported on this bus, as well as interrupt requests (IRQs). An interrupt
request is a request to the processor to “interrupt” whatever it is doing to take care of a process, such as a read
from a disk drive, which in turn might be interrupted by another process, such as a write into memory.
CPU SCHEDULING

CPU scheduling determines which process to execute when multiple processes are waiting to run. For
example, if you have three applications open on your computer, each application must be scheduled to get CPU
time. The CPU switches between the applications very quickly based on factors like priority, so users don’t
typically notice that this switching, or time slicing, is occurring. CPU scheduling is not a function built into the
CPU; rather, it is a function of the operating system. However, the architecture of the CPU can greatly facilitate
a system’s ability to efficiently schedule multiple processes. Recall from Chapter 1 that a process is a program
that’s loaded into memory and run by the CPU. Most PC operating systems of the 1970s and 1980s were
basically single threaded, meaning they could only schedule the process to run as a whole.
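Time slicing can be sketched as a round-robin loop over the ready processes. The process names and burst lengths below are made up for illustration; real schedulers also weigh factors like priority, as noted above.

```python
# Round-robin time slicing: each ready process gets one time slice in turn;
# unfinished processes go to the back of the queue.

from collections import deque

def round_robin(burst_times, time_slice):
    """Return the order in which processes run, one entry per slice."""
    queue = deque(burst_times.items())
    schedule = []
    while queue:
        name, remaining = queue.popleft()
        schedule.append(name)               # this process gets the CPU
        remaining -= time_slice
        if remaining > 0:                   # not finished: requeue it
            queue.append((name, remaining))
    return schedule

print(round_robin({"editor": 3, "browser": 2, "player": 1}, 1))
# ['editor', 'browser', 'player', 'editor', 'browser', 'editor']
```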
CPU SCHEDULING

Beginning with the Windows NT operating system, the use of CPU scheduling algorithms began to
evolve to allow multithreading, which is the ability to run two or more parts of a process, known as
threads, at the same time. A thread is the smallest piece of computer code that can be independently
scheduled for execution. For example, if a user is running a word processor, one thread might accept
input from the keyboard and format it on the screen while another thread does a spell check as the
user types. Switching between threads takes a considerable number of CPU instructions to
accomplish, so it was only practical to begin including this feature in OSs when CPUs became powerful
enough to support it. Modern CPUs with multiple cores are designed specifically for multithreading,
so switching between threads is extremely efficient when compared to the operation in older CPUs.
Some Intel CPUs contain a feature called Hyper-Threading.
CPU SCHEDULING

Hyper-Threading (HT) allows two threads to run on each CPU core simultaneously. In some
ways, this feature doubles the amount of work a CPU can do. When monitoring a CPU with a
program such as Task Manager, each CPU core is actually seen as two logical processors, so a
4-core CPU will be reported as having 8 logical processors.
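Python’s standard library can illustrate both points. On a Hyper-Threaded 4-core CPU, os.cpu_count() typically reports 8 logical processors (the exact number depends on the machine this runs on). The two threads mirror the word-processor example above; the work they do here is a trivial placeholder.

```python
# Query the logical processor count, then run two threads concurrently,
# as in the word-processor example: one "formats input" while the other
# "spell checks".

import os
import threading

print("logical processors:", os.cpu_count())

results = []

def format_input():
    results.append("formatted")      # placeholder for formatting work

def spell_check():
    results.append("spell checked")  # placeholder for spell-check work

threads = [threading.Thread(target=format_input),
           threading.Thread(target=spell_check)]
for t in threads:
    t.start()
for t in threads:
    t.join()                         # wait for both threads to finish

print(sorted(results))  # ['formatted', 'spell checked']
```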
