
COMPUTER SYSTEM &

NETWORK

(COMP 23)
ENCODED BY: DONDON LEDAMA

PREPARED BY: MARL T. GONZALEZ


SCHOOL ADMINISTRATOR

LESSON I:

Introduction

f.c. ledesma avenue, san carlos city, negros occidental
Tel. #: (034) 312-6189 / (034) 729-4327
Computer System Components
Recent advances in microelectronic technology have made computers an integral
part of our society. Each step in our everyday lives may be influenced by computer
technology: we awake to a digital alarm clock's beaming of preselected music at the
right time, drive to work in a digital-processor-controlled automobile, work in an
extensively automated office, shop for computer-coded grocery items, and return to rest
in the computer-regulated heating and cooling environment of our homes. It may not be
necessary to understand the detailed operating principles of a jet plane or an automobile
in order to use and enjoy the benefits of these technical marvels. But a fair
understanding of the operating principles, capabilities, and limitations of digital computers
is necessary if we are to use them efficiently. This book is designed to give
such an understanding of the operating principles of digital computers. This chapter will
begin by describing the organization of a general-purpose digital computer system and
then will briefly trace the evolution of computers.
The diagram shows a general view of how desktop and workstation computers are
organized. Different systems have different details, but in general all computers consist of
components (processor, memory, controllers, video) connected together with a bus.
Physically, a bus consists of many parallel wires, usually printed (in copper) on the main
circuit board of the computer. Data signals, clock signals, and control signals are sent on
the bus back and forth between components. A particular type of bus follows a carefully
written standard that describes the signals that are carried on the wires and what the
signals mean. The PCI standard (for example) describes the PCI bus used on most current
PCs.

The processor continuously executes the machine cycle, carrying out machine
instructions one by one. Most instructions perform an arithmetic, logical, or control
operation. A machine operation often involves access to main storage or involves an I/O
controller. If so, the machine operation puts data and control signals on the bus and
may wait for data and control signals to return. Some machine operations take place
entirely inside the processor (the bus is not involved); these operations are very fast.
Input/output Controllers

The way in which devices connected to a bus cooperate is another part of a bus
standard.

Input/output controllers receive input and output requests from the central
processor, and then send device-specific control signals to the device they control. They
also manage the data flow to and from the device. This frees the central processor from
involvement with the details of controlling each device. I/O controllers are needed only
for those I/O devices that are part of the system.

Often the I/O controllers are part of the electronics on the main circuit board (the
mother board) of the computer. Sometimes an uncommon device requires its own
controller which must be plugged into a connector (an expansion slot) on the mother
board.

Main Memory

Main memory (also called main storage or just memory) holds the bit patterns of
machine instructions and the bit patterns of data. Memory chips and the electronics that
control them are concerned only with saving bit patterns and returning them when
requested. No distinction is made between bit patterns that are intended as instructions
and bit patterns that are intended as data.

In practice, data and instructions are often placed in different sections of memory,
but this is a matter of software organization, not a hardware requirement. Also, most
computers have special sections of memory that permanently hold programs (firmware
stored in ROM), and other sections that are permanently used for special purposes.

The amount of memory on a system is often described in terms of:
Kilobyte: 2^10 bytes = 1024 bytes
Megabyte: 2^20 bytes = 1024 kilobytes
Gigabyte: 2^30 bytes = 1024 megabytes
Terabyte: 2^40 bytes = 1024 gigabytes

These days (Winter 2005) the amount of main memory in a new desktop computer
ranges from 256 megabytes to 1 gigabyte. Hard disks and other secondary storage
devices hold tens or hundreds of gigabytes. Backup storage comes in sizes as large as
several terabytes.

Addresses

Each byte of main storage has an address. Most modern processors use 32-bit
addresses, so there are 2^32 possible addresses. Think of main storage as if it were an
array:

    byte[0x00000000 ... 0xFFFFFFFF] mainStorage;

A main storage address is an index into memory. A 32-bit address is the address of a
single byte. Thirty-two wires of the bus contain an address (there are many more bus
wires for timing and control).

Sometimes people talk about addresses like 0x2000, which looks like a pattern of
just 16 bits. But this is just an abbreviation for the full 32-bit address; the actual
address is 0x00002000.

The first MIPS processors (designed in 1985) used 32-bit addresses. From 1991 to
the present, top-end MIPS processors use 64-bit addresses. The MIPS32 chip is a
modern chip designed for embedded applications. It uses 32-bit addresses, since
embedded applications often don't need 64 bits.
Recent processor chips from AMD and Intel have 64-bit addresses, although 32-bit
versions are still available.

The assembly language of this course is for the MIPS32 chip, so we will use 32-bit
addresses. The assembly language of the 64-bit MIPS chips is similar.

The MIPS has an address space of 2^32 bytes. A gigabyte is 2^30 bytes, so the MIPS
has 4 gigabytes of address space. Ideally, all of these memory locations would be
implemented using memory chips (usually called RAM). RAM costs about $200 per
gigabyte. Installing the maximum amount of memory as RAM would cost about $800.
This might be more than you want to spend. Hard disk storage costs much less:
about $50 per gigabyte (Winter 2005).

On modern computers, the full address space is present no matter how much RAM
has been installed. This is done by keeping some parts of the full address space on disk
and some parts in RAM. The RAM, the hard disk, some special electronics, and the
operating system work together to provide the full 32-bit address space. To a user or an
applications programmer it looks as if all 2^32 bytes of main memory are present.

This method of providing the full address space by using a combination of RAM memory
and the hard disk is called virtual memory. The word virtual means "appearing to
exist, but not really there." Some computer geeks have a virtual social life.

Cache Memory

Disk access is slow compared to RAM access. Potentially, using a combination of
real memory and disk memory to implement the address space could greatly slow down
program execution. However, with clever electronics and a good operating system,
virtual memory is only slightly slower than physical memory.

Computer systems also have cache memory. Cache memory is very fast RAM that is
inside (or close to) the processor. It duplicates sections of main storage that are heavily
used by the currently running programs. The processor does not have to use the

system bus to get or store data in cache memory. Access to cache memory is much
faster than to normal main memory.

Like virtual memory, cache memory is invisible to most programs. It is an
electronic detail below the level of abstraction provided by assembly language.
Hardware keeps the cache up to date and in sync with main storage. Your programs are
unaware that there is cache memory and virtual memory; they just see "main
memory". Application programs don't contain instructions that say "store this in cache
memory" or "get this from virtual memory". They only refer to the contents of
main memory at a particular address. The hardware makes sure that the program gets
or stores the correct byte, no matter where it really is.

Contents of Memory

The memory system merely stores bit patterns. That some of these patterns
represent integers, that some represent characters, and that some represent
instructions (and so on) is of no concern to the electronics. How these patterns are used
depends on the programs that use them. A word processor program, for example, is
written to process patterns that represent characters. A spreadsheet program processes
patterns that represent numbers.

Of course, most programs process several types of data, and must keep track of how
each is used. Often programs keep the various uses of memory in separate sections,
but that is a programming convention, not a requirement of electronics.
Any byte in main storage can contain any 8-bit pattern. No byte of main storage can
contain anything but an 8-bit pattern. There is nothing in the memory system of a
computer that says what a pattern represents.
Computer System Organization

Before we look at the C language, let us look at the overall organization of computing
systems. Figure 1.1 shows a block diagram of a typical computer system. Notice it is
divided into two major sections: hardware and software.

Computer Hardware

The physical machine, consisting of electronic circuits, is called the hardware. It
consists of several major units: the Central Processing Unit (CPU), Main Memory,
Secondary Memory, and Peripherals.

The CPU is the major component of a computer; the ``electronic brain'' of the machine.
It consists of the electronic circuits needed to perform operations on the data. Main
Memory is where programs that are currently being executed as well as their data are
stored. The CPU fetches program instructions in sequence, together with the required
data, from Main Memory and then performs the operation specified by the instruction.
Information may be both read from and written to any location in Main Memory so the
devices used to implement this block are called random access memory chips (RAM).
The contents of Main Memory (often simply called memory) are both temporary (the
programs and data reside there only when they are needed) and volatile (the contents
are lost when power to the machine is turned off).

The Secondary Memory provides more long term and stable storage for both programs
and data. In modern computing systems this Secondary Memory is most often
implemented using rotating magnetic storage devices, more commonly called disks
(though magnetic tape may also be used); therefore, Secondary Memory is often
referred to as the disk. The physical devices making up Secondary Memory, the disk
drives, are also known as mass storage devices because relatively large amounts of
data and many programs may be stored on them.

The disk drives making up Secondary Memory are one form of Input/Output (I/O)
device since they provide a means for information to be brought into (input) and taken
out of (output) the CPU and its memory. Other forms of I/O devices which transfer
information between humans and the computer are represented by the Peripherals box
in Figure 1.1. These Peripherals include devices such as terminals -- a keyboard (and
optional mouse) for input and a video screen for output -- high-speed printers, and
possibly floppy disk drives and tape drives for permanent, removable storage of data
and programs. Other I/O devices may include high-speed optical scanners, plotters,
multi-user and graphics terminals, networking hardware, etc. In general, these devices
provide the physical interface between the computer and its environment by allowing
humans or even other machines to communicate with the computer.

Computer Software -- The Operating System


Hardware is called ``hard'' because, once it is built, it is relatively difficult to change.
However, the hardware of a computer system, by itself, is useless. It must be given
directions as to what to do, i.e. a program. These programs are called software;
``soft'' because it is relatively easy to change both the instructions in a particular
program as well as which program is being executed by the hardware at any given
time. When a computer system is purchased, the hardware comes with a certain
amount of software which facilitates the use of the system. Other software to run on the
system may be purchased and/or written by the user. Some major vendors of computer
systems include: IBM, DEC, HP, AT&T, Sun, Compaq, and Apple.

The remaining blocks in Figure 1.1 are typical software layers provided on most
computing systems. This software may be thought of as having a hierarchical, layered
structure, where each layer uses the facilities of layers below it. The four major blocks
shown in the figure are the Operating System, Utilities, User Programs and
Applications.

The primary responsibility of the Operating System (OS) is to ``manage'' the
``resources'' provided by the hardware. Such management includes assigning areas of
memory to different programs which are to be run, assigning one particular program to
run on the CPU at a time, and controlling the peripheral devices. When a program is
called upon to be executed (its operations performed), it must be loaded, i.e. moved
from disk to an assigned area of memory. The OS may then direct the CPU to begin
fetching instructions from this area. Other typical responsibilities of the OS include
Secondary Storage management (assignment of space on the disk), a piece of software
called the file system, and Security (protecting the programs and data of one user from
activities of other users that may be on the same system).

Many mainframe machines normally use proprietary operating systems, such as VM and
CMS (IBM) and VAX/VMS and TOPS-20 (DEC). More recently, there has been a move toward
standardized operating systems, and most workstations and desktops typically use UNIX
(AT&T and other versions). A widely used operating system for IBM PC and compatible
personal computers is DOS (Microsoft). Apple Macintosh machines are distinguished by
an easy to use proprietary operating system with graphical icons.

Utility Programs
The layer above the OS is labeled Utilities and consists of several programs which are
primarily responsible for the logical interface with the user, i.e. the ``view'' the user
has when interacting with the computer. (Sometimes this layer and the OS layer below
are considered together as the operating system). Typical utilities include such
programs as shells, text editors, compilers, and (sometimes) the file system.

A shell is a program which serves as the primary interface between the user and the
operating system. The shell is a ``command interpreter'': it prompts the user to
enter commands for tasks which the user wants done, reads and interprets what the
user enters, and directs the OS to perform the requested task. Such commands may
call for the execution of another utility (such as a text editor or compiler) or a user
program or application, the manipulation of the file system, or some system operation
such as logging in or out. There are many variations on the types of shells available,
from relatively simple command line interpreters (DOS) or more powerful command line
interpreters (the Bourne Shell, sh, or C Shell, csh in the Unix environment), to more
complex, but easy to use graphical user interfaces (the Macintosh or Windows). You
should become familiar with the particular shell(s) available on the computer you are
using, as it will be your primary means of access to the facilities of the machine.

A text editor (as opposed to a word processor) is a program for entering programs
and data and storing them in the computer. This information is organized as a unit
called a file similar to a file in an office filing cabinet, only in this case it is stored on the
disk. (Word processors are more complex than text editors in that they may
automatically format the text, and are more properly considered applications than
utilities). There are many text editors available (for example vi and emacs on Unix
systems) and you should familiarize yourself with those available on your system.

As was mentioned earlier, in today's computing environment, most programming is
done in high level languages (HLL) such as C. However, the computer hardware cannot
understand these languages directly. Instead, the CPU executes programs coded in a
lower level language called the machine language. A utility called a compiler is a
program which translates the HLL program into a form understandable to the hardware.
Again, there are many variations in compilers provided (for different languages, for
example) as well as facilities provided with the compilers (some may have built-in text

editors or debugging features). Your system manuals can describe the features
available on your system.

Finally, another important utility (or task of the operating system) is to manage the file
system for users. A file system is a collection of files in which a user keeps programs,
data, text material, graphical images, etc. The file system provides a means for the
user to organize files, giving them names and gathering them into directories (or
folders) and to manage their file storage. Typical operations which may be done with
files include creating, destroying, renaming, and copying files.

COMPUTER EVOLUTION
500 B.C.
The 2/5 Abacus is invented by the Chinese. Find out
more about the Abacus at "Abacus: The Art of Calculating
with Beads" by Luis Fernandes.

1 A.D.
The Antikythera Device is built, a mechanism that mimicked the actual movements of
the sun, moon, and planets -- past, present, and future. This technology was then
lost for millennia.

1623
Wilhelm Schickard builds the first "automatic calculator", the
"Calculating Clock", which was used for computing astronomical
tables.
Wilhelm Schickard (born 1592 in Herrenberg, died 1635 in
Tübingen) built the first automatic calculator in 1623.

Contemporaries called this machine the Calculating Clock. It precedes the less versatile
Pascaline of Blaise Pascal by twenty years and the calculator of Gottfried Leibniz by nearly fifty.
Schickard's letters to Johannes Kepler show how to use the machine for calculating
astronomical tables. The machine could add and subtract six-digit numbers, and
indicated an overflow of this capacity by ringing a bell; to aid more complex
calculations, a set of Napier's bones was mounted on it. The designs were lost until
the twentieth century; a working replica was finally constructed in 1960.

Schickard's machine, however, was not programmable. The first design of a
programmable computer came roughly 200 years later (Charles Babbage). And the first
working program-controlled machine was completed more than 300 years later (Konrad
Zuse's Z3, 1941).

The Schickard crater on the moon is named after Schickard.

1642
Blaise Pascal, a French religious philosopher and mathematician, builds the first
practical mechanical calculating machine, thereby etching his name in history to be
resurrected later as the name of a now-arcane programming language.

1830
The "Analytical Engine" is designed by Charles Babbage.

1850

The Japanese refine the Abacus into the 1/5 design, with one bead on the top deck and
five on the bottom deck.

1890
The U.S. Census Bureau adopts the Hollerith Punch Card, Tabulating
Machine and Sorter to compile results of the 1890 census, reducing an
almost 10-year process to 2 ½ years, saving the government a
whopping $5 million. Inventor Herman Hollerith, a Census Bureau
statistician, forms the Tabulating Machine Company in 1896. The TMC
eventually evolved into IBM.

1930
The Abacus is again changed, to the 1/4 design.

1939
The first semi-electronic digital computing device is constructed by John Atanasoff.
The "Mark I" Automatic Sequence Controlled Calculator, the first fully automatic
calculator, is begun at Harvard by mathematician Howard Aiken. Its designed purpose
was to generate ballistic tables for Navy artillery.

1941
German inventor Konrad Zuse produces the Z3 for use in aircraft and missile design but
the German government misses the boat and does not support him. There is some
debate as to whether the Mark I or the Z3 came first.

1943
English mathematician Alan Turing (bio by Andrew Hodges) begins operation of his secret
computer for the British military. It was used by cryptographers to break secret German
military codes. It was the first vacuum tube computer but its existence was not made
public until decades later.

1946

ENIAC (Electronic Numerical Integrator and Computer), the first credited all-electronic
computer, is completed at the University of Pennsylvania. It used thousands of vacuum
tubes.

1951
Seymour Cray gets his Master's degree in Applied Mathematics and soon after joins
Engineering Research Associates, where he starts working on the 1100 series
computers for what ended up being Univac.

Remington's Univac I (Universal Automatic Computer), using a Teletype keyboard and
printer for user interaction, becomes the first commercially available computer. It
could handle both numerical and alphabetic data.

1957
Bill Norris and friends start Control Data Corporation (CDC), bring Seymour Cray on
board, and begin building large-scale scientific computers.

1958
The first "integrated circuit" is designed by American Jack Kilby. It included resistors,
capacitors, and transistors on a single wafer chip.

1960
Digital Equipment delivers PDP-1, an interactive computer with CRT and keyboard. Its big
screen inspires MIT students to write the world's first computer game.

1963
Sketchpad, first WYSIWYG interactive drawing tool, is published by Ivan Sutherland as his
MIT doctoral thesis.

1965
Sutherland demonstrates first VR head-mounted 3-D display.
Ted Nelson coins the terms hypertext and hypermedia in a paper at the Association for
Computing Machinery's 20th national conference.

1968
Doug Engelbart demonstrates the first mouse.

1970
First four nodes are established on Arpanet, precursor of the Internet and World Wide
Web.

1971
IBM introduces the 3270 mainframe terminal; its character-based interface becomes the
standard for business applications.
The first "microprocessor" is produced by American engineer Marcian E. Hoff.

1972
First GUI appears as part of Xerox Parc's Smalltalk programming environment.
Seymour Cray incorporates Cray Research.

1974
Xerox PARC researchers create the Alto, the first computer to use the WIMP interface.
The Altair 8800 microcomputer appears, based on Intel's 8080 processor; its interface
uses toggle switches and LEDs.

1975
Bill Gates and Paul Allen create and license the first microcomputer version of Basic,
for the Altair; it loads via paper tape.
1977
Tandy (Radio Shack) produces the first practical personal computer, using a cassette tape
drive for programs and storage.
Apple ships Apple II, with integrated keyboard, 16-color graphics, and command-line disk
operating system.

1978
At Apple Computer, Steve Jobs proposes a "next generation"
business machine with graphical user interface. It becomes the Lisa
project.
Dan Bricklin and Bob Frankston's VisiCalc, with its text-based spreadsheet
interface, becomes the personal computer's first killer app; it runs on the Apple II.

1981
IBM releases the PC with 4.77 MHz, MS-DOS, command line interfaces,
and monochrome block graphics.

1984
Apple ships the Macintosh, the first mass-market computer with a monochrome desktop
GUI, plug and play, and suite of GUI productivity applications.

1985
Microsoft ships Windows 1.0, its first graphical environment.

1990
Microsoft announces Windows 3.0; adds 3-D look and feel, Program
Manager and File Manager.

1992

Apple announces Newton PDA with pen-based user interface.

1993
Early Web Browsers: ECP Web browser for Macintosh released. NCSA releases Marc
Andreessen's Mosaic Web browser for X Window.

1995
Microsoft introduces Bob, the industry's first "Social User Interface", featuring
animated "assistants." Bob bombs. Watch "Remembering the Bob" at Tech TV.
Microsoft ships Windows 95, regarded by many as the release that
offers features comparable with Apple's Mac. It's the fastest-selling
operating system ever shipped.

1997
Microsoft Active Desktop integrates the Web with Windows.
Netscape Communicator and Constellation combine Web and desktop GUI.
Microsoft invests $150,000,000 in Apple Computer.

1998
Windows 98 released.
A good portion of the world is still using the abacus; maybe 2 people are using the
TRS-80.

Defining the Terms Architecture, Design, and Implementation

Introduction

Over the past 10 years many practitioners and researchers have sought to define
software architecture. At the SEI, we use the following definition:

The software architecture of a program or computing system is the structure or structures
of the system, which comprise software elements, the externally visible properties of
those elements, and the relationships among them.

However, we are interested not only in understanding the term “software
architecture” but in clarifying the difference between architecture and other related terms
such as “design” and “implementation.” The lack of a clear distinction among these terms
is the cause of much muddy thinking, imprecise communication, and wasted, overlapping
effort. For example, “architecture” is often used as a mere synonym for “design”
(sometimes preceded with the adjective “high-level”). And many people use the term
“architectural patterns” as a synonym for “design patterns.”

Confusion also stems from the use of the same specification language for both
architectural and design specifications. For example, UML is often used as an architectural
description language. In fact, UML has become the industry de facto standard for
describing architectures, although it was specifically designed to manifest detailed design
decisions (and this is still its most common use). This merely contributes to the confusion,
since a designer using UML has no way (within UML) of distinguishing architectural
information from other types of information.

Confusion also exists with respect to the artifacts of design and implementation. UML class
diagrams, for instance, are a prototypical artifact of the design phase. Nonetheless, class
diagrams may accumulate enough detail to allow code generation of very detailed
programs, an approach that is promoted by CASE tools such as Rational Rose and System
Architect. Using the same specification language further blurs the distinction between
artifacts of the design (class diagrams) and artifacts of the implementation (source
code). Having a unified specification language is, in many ways, a good thing. But a user
of this unified language is given little help in knowing if a proposed change is
“architectural” or not.

Why are we interested in such distinctions? Naturally, a well-defined language improves
our understanding of the subject matter. With time, terms that are used interchangeably
lose their meaning, resulting inevitably in ambiguous descriptions given by developers,
and significant effort is wasted in discussions of the form “by design I mean…and by
architecture I mean…”

Seeking to separate architectural design from other design activities, definers of software
architecture in the past have stressed the following:

1. “Architecture is concerned with the selection of architectural elements, their
interaction, and the constraints on those elements and their interactions…Design is
concerned with the modularization and detailed interfaces of the design elements,
their algorithms and procedures, and the data types needed to support the
architecture and to satisfy the requirements.”

2. Software architecture is “concerned with issues...beyond the algorithms and data
structures of the computation.”

3. “Architecture…is specifically not about…details of implementations (e.g., algorithms
and data structures.)…Architectural design involves a richer collection of
abstractions than is typically provided by OOD” (object-oriented design).

In suggesting typical “architectures” and “architectural styles,” existing definitions consist
of examples and offer anecdotes rather than providing clear and unambiguous notions. In
practice, the terms “architecture,” “design,” and “implementation” appear to connote
varying degrees of abstraction in the continuum between complete details
(“implementation”), few details (“design”), and the highest form of abstraction
(“architecture”). But the amount of detail alone is insufficient to characterize the
differences, because architecture and design documents often contain detail that is not
explicit in the implementation (e.g., design constraints, standards, performance goals).
Thus, we would expect a distinction between these terms to be qualitative and not merely
quantitative.

The ontology that we provide below can serve as a reference point for these discussions.

The Intension/Locality Thesis

To elucidate the relationship between architecture, design, and implementation, we
distinguish at least two separate interpretations for abstraction in our context:

1. Intensional (vs. extensional) design specifications are “abstract” in the sense that
they can be formally characterized by the use of logic variables that range over an
unbounded domain. For example, a layered architectural pattern does not restrict
the architect to a specific number of layers; it applies equally well to 2 layers or 12
layers.

2. Non-local (vs. local) specifications are “abstract” in the sense that they apply to all
parts of the system (as opposed to being limited to some part thereof).

Both of these interpretations contribute to the distinction among architecture, design, and
implementation, summarized as the “intension/locality thesis”:

1. Architectural specifications are intensional and non-local

2. Design specifications are intensional but local

3. Implementation specifications are both extensional and local

Table 1 summarizes these distinctions.


Table 1. The Intension/Locality Thesis
Architecture Intensional Non-local

Design Intensional Local

Implementation Extensional Local

Implications

What are the implications of such definitions? They give us a firm basis for determining
what is architectural (and hence crucial for the achievement of a system’s quality attribute
requirements) and what is not.

Consider the concept of a strictly layered architecture (an architecture in which each layer
is allowed to use only the layer immediately below it). How do we know that the
architectural style “layered” is really architectural? To answer that, we need to determine
whether this style is intensional and whether it is local or non-local. First of all, are there
an unbounded number of implementations that qualify as layered? Clearly there are.
Secondly, is the layered style local or non-local? To answer that, we need only consider a
violation of the style, where a layer depends on a layer above it, or several layers below
it. Since this would be a violation wherever it occurred, the notion of a layered
architecture must be non-local.

What about a design pattern, such as the factory pattern? This is intensional, because
there may be an unbounded number of realizations of a factory design pattern within a
system. But is it local or non-local? One may use a design pattern in some corner of the
system and not use it (or even violate it) in a different portion of the same system. So
design patterns are local.

Similarly, it is simple to show that the term “implementation” refers only to artifacts that
are extensional and local.

Conclusions

Since the inception of architecture as a distinct field of study, there has been much
confusion about what the term “architecture” means. Similarly, the distinction between
architecture and other forms of design artifacts has never been clear. The
intension/locality thesis provides a foundation for determining the meaning of the terms
architecture, design, and implementation that accords not only with intuition but also with
best industrial practices. A more formal and complete treatment of this topic can be found
in our paper, “Architecture, Design, Implementation.” But what are the consequences of
precisely knowing the differences among these terms? Is this an exercise in definition for
definition’s sake? We think not. Among others, these distinctions facilitate

1. determining what constitutes a uniform program (e.g., a collection of modules that
satisfy the same architectural specifications)

2. determining what information goes into architecture documents and what goes into
design documents

3. determining what to examine and what not to examine in an architectural
evaluation or a design walkthrough

4. understanding the distinction between local and non-local rules (i.e., between the
design rules that are enforced throughout a project versus those that are of a more
limited domain, because the architectural rules define the fabric of the system and
how it will meet its quality attribute requirements, and the violation of architectural
rules typically has more far-reaching consequences than the violation of a local
rule)

Furthermore, in the industrial practice of software architecture, many statements that are
said to be “architectural” are in fact local (e.g., both tasks A and B execute on the same
node, or task A controls B). Instead, a truly architectural statement would be, for
instance, for each pair of tasks A,B that satisfy some property X, A and B will execute on
the same node and the property Control(A,B) holds.

More generally, for each specification we should be able to determine whether it is a
design statement, describing a purely local phenomenon (and hence of secondary interest
in architectural documentation, discussion, or analysis), or whether it is an instance of an
underlying, more general rule. This is a powerful piece of information.

How do you understand the difference between architecture and design?

· I'd say that architecture is a view of software that's at a higher level than design,
i.e. more abstract and less connected with the actual implementation. The
architecture gives structure to the design elements, while the design elements give
structure to the implemented code.

· The software architecture of a program or computing system is the structure or
structures of the system, which comprise software elements, the externally visible
properties of those elements, and the relationships among them.
Design -- The process of defining the architecture, components, interfaces, and
other characteristics of a system or component.
So, Design is a process of producing an instance of Software architecture. Software
architecture is a domain of knowledge about abstract models and organization.
Software architecture is not a low-level design.

· I would add that architecture is design, but not all design is architecture.

LESSON II Combinational Logic

Introduction
Digital electronics is classified into combinational logic and sequential logic. Combinational
logic output depends only on the input levels, whereas sequential logic output depends on
stored levels as well as the input levels.

The memory elements are devices capable of storing binary information. The binary
information stored in the memory elements at any given time defines the state of the
sequential circuit. The inputs and the present state of the memory elements determine the
outputs. A memory element's next state is also a function of the external inputs and the
present state. A sequential circuit is thus specified by a time sequence of inputs, outputs,
and internal states.

There are two types of sequential circuits. Their classification depends on the timing of
their signals:

· Synchronous sequential circuits

· Asynchronous sequential circuits


Asynchronous sequential circuit
This is a system whose outputs depend upon the order in which its input variables
change and can be affected at any instant of time.

Gate-type asynchronous systems are basically combinational circuits with feedback
paths. Because of the feedback among logic gates, the system may, at times, become
unstable. Consequently, they are not often used.

Synchronous sequential circuits


This type of system uses storage elements called flip-flops, which are allowed to change
their binary value only at discrete instants of time. Synchronous sequential circuits use
logic gates and flip-flop storage devices. Sequential circuits have a clock signal as one of
their inputs. All state transitions in such circuits occur only when the clock value is
either 0 or 1, or at the rising or falling edge of the clock, depending on the type
of memory elements used in the circuit. Synchronization is achieved by a timing device
called a clock pulse generator. Clock pulses are distributed throughout the system in
such a way that the flip-flops are affected only upon the arrival of the synchronization
pulse. Synchronous sequential circuits that use clock pulses at the inputs are called
clocked sequential circuits. They are stable and their timing can easily be broken down
into independent discrete steps, each of which is considered separately.

A clock signal is a periodic square wave that indefinitely switches from 0 to 1 and from
1 to 0 at fixed intervals. The clock cycle time (or clock period) is the time interval between
two consecutive rising or falling edges of the clock.

Clock Frequency = 1 / clock cycle time (measured in cycles per second, or Hz)

Example: Clock cycle time = 10 ns → clock frequency = 100 MHz
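The example above is a direct application of the formula; a quick numeric check in Python (the 10 ns figure is the one from the text):

```python
# Clock frequency from clock period: f = 1 / T.
clock_cycle_time_s = 10e-9              # 10 ns clock cycle time
clock_frequency_hz = 1 / clock_cycle_time_s
print(clock_frequency_hz / 1e6, "MHz")  # 100.0 MHz
```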


Concept of Sequential Logic
A sequential circuit, as seen above, is combinational logic with some feedback
to maintain its current value, like a memory cell. To understand the basics, let's
consider the basic feedback logic circuit below, which is a simple NOT gate whose
output is connected to its input. The effect is that the output oscillates between HIGH and
LOW (i.e. 1 and 0). The oscillation frequency depends on gate delay and wire delay.
Assuming a wire delay of 0 and a gate delay of 10 ns, the oscillation frequency would
be 1 / (on time + off time) = 1 / 20 ns = 50 MHz.
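The oscillation can be sketched as a discrete-time simulation, assuming zero wire delay and the 10 ns gate delay from the text: the node inverts once per gate delay, so one HIGH phase plus one LOW phase gives a 20 ns period.

```python
# Discrete-time sketch of the single-inverter feedback loop.
gate_delay_ns = 10
out = 0
trace = []
for step in range(6):                  # simulate six gate delays
    out = 1 - out                      # the NOT gate flips its own output
    trace.append((step * gate_delay_ns, out))
period_ns = 2 * gate_delay_ns          # one HIGH phase + one LOW phase
freq_mhz = 1000 / period_ns
print(trace)     # [(0, 1), (10, 0), (20, 1), (30, 0), (40, 1), (50, 0)]
print(freq_mhz)  # 50.0
```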

The basic idea of the feedback is to store or hold a value, but in the
above circuit the output keeps toggling. We can overcome this problem with the circuit
below, which cascades two inverters so that the feedback is in phase, thus
avoiding toggling. The equivalent circuit is the same as a buffer with its output
connected to its input.

But there is a problem here too: each gate output value is stable, but what will it be?
In other words, the buffer output cannot be known; there is no way to tell. If we could
know or set the value, we would have a simple 1-bit storage/memory element.

The circuit below is the same as the inverters connected back to back, with provision to
set the state of each gate (a NOR gate with both inputs shorted acts as an inverter). I am
not going to explain the operation, as it is clear from the truth table. S is called Set
and R is called Reset.

S R Q Q+
0 0 0 0
0 0 1 1
0 1 X 0
1 0 X 1
1 1 X 0

There still seems to be a problem with the above configuration: we cannot control
when the input is sampled; in other words, there is no enable signal. Normally, input
enable signals can be of two types:

· Level sensitive (Latch)

· Edge sensitive (Flip-Flop)

Level Sensitive: The circuit below is a modification of the above one to have a level
sensitive enable input. Enable, when LOW, masks the inputs S and R. When HIGH, it
presents S and R to the sequential logic input (the two NOR gates above). Thus
Enable, when HIGH, transfers the inputs S and R to the sequential cell transparently, so
this kind of sequential circuit is called a transparent latch. The memory element we get is
an RS latch with active-high Enable.

Edge Sensitive: The circuit below is a cascade of two level sensitive memory
elements, with a phase shift in the enable input between the first memory element and
the second. The first RS latch (i.e. the first memory element) will be
enabled when the CLK input is HIGH and the second RS latch will be enabled when CLK
is LOW. The net effect is that the input RS is moved to Q and Q' when CLK changes state
from HIGH to LOW; this HIGH-to-LOW transition is called the falling edge. So the edge
sensitive element we get is called a negative-edge RS flip-flop.

Now that we know the sequential circuit basics, let's look at each of these elements in
detail, in accordance with what is taught in colleges.

Latches and Flip-Flops

There are two types of sequential circuits.

· Asynchronous Circuits.

· Synchronous Circuits.

As seen in the last section, latches and flip-flops are one and the same, with a slight
variation: latches have a level sensitive control signal input and flip-flops have an edge
sensitive control signal input. Flip-flops and latches which use these control signals are
called synchronous circuits; if they don't use clock inputs, they are called
asynchronous circuits.
RS Latch

The RS latch has two inputs, S and R. S is called Set and R is called Reset. The S input is
used to produce HIGH on Q (i.e. store binary 1 in the flip-flop). The R input is used to
produce LOW on Q (i.e. store binary 0 in the flip-flop). Q' is Q's complementary output,
so it always holds the opposite value of Q. The output of the RS latch depends on the
current as well as previous inputs or state, and its state (value stored) can change as
soon as its inputs change. The circuit and the truth table of the RS latch are shown below.
(This circuit is as we saw in the last page, but arranged to look beautiful :-).)

S R Q Q+
0 0 0 0
0 0 1 1
0 1 X 0
1 0 X 1
1 1 X 0

The operation has to be analyzed with the 4 input combinations together with the 2
possible previous states.

· When S = 0 and R = 0: If we assume Q = 1 and Q' = 0 as the initial condition, then
the output Q after the input is applied would be Q = (R + Q')' = 1 and Q' = (S + Q)' = 0.
Assuming Q = 0 and Q' = 1 as the initial condition, the output Q after the input is
applied would be Q = (R + Q')' = 0 and Q' = (S + Q)' = 1. So it is clear that when
both S and R inputs are LOW, the output is retained as before the application of the
inputs (i.e. there is no state change).

· When S = 1 and R = 0: If we assume Q = 1 and Q' = 0 as the initial condition, then
the output Q after the input is applied would be Q = (R + Q')' = 1 and Q' = (S + Q)' = 0.
Assuming Q = 0 and Q' = 1 as the initial condition, the output Q after the input is
applied would be Q = (R + Q')' = 1 and Q' = (S + Q)' = 0. So in simple words, when
S is HIGH and R is LOW, the output Q is HIGH.

· When S = 0 and R = 1: If we assume Q = 1 and Q' = 0 as the initial condition, then
the output Q after the input is applied would be Q = (R + Q')' = 0 and Q' = (S + Q)' = 1.
Assuming Q = 0 and Q' = 1 as the initial condition, the output Q after the input is
applied would be Q = (R + Q')' = 0 and Q' = (S + Q)' = 1. So in simple words, when
S is LOW and R is HIGH, the output Q is LOW.

· When S = 1 and R = 1: No matter what state Q and Q' are in, applying 1 at the input
of a NOR gate always results in 0 at its output, which forces both Q and Q' LOW
(i.e. Q = Q'). LOW on both outputs is contradictory, so this case is invalid.

The waveform below shows the operation of NOR gates based RS Latch.
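The four cases above can be checked with a small Python model that iterates the two cross-coupled NOR equations, Q = (R + Q')' and Q' = (S + Q)', until they stop changing. This is only a sketch of the logic, not of real gate timing:

```python
# Minimal model of the NOR-based RS latch: iterate the cross-coupled
# NOR equations from a given previous state until they reach a fixed point.
def nor(a, b):
    return 0 if (a or b) else 1

def rs_latch(s, r, q, qn):
    for _ in range(4):                    # a few passes reach a fixed point
        q_new  = nor(r, qn)               # Q  = (R + Q')'
        qn_new = nor(s, q)                # Q' = (S + Q)'
        if (q_new, qn_new) == (q, qn):
            break
        q, qn = q_new, qn_new
    return q, qn

print(rs_latch(0, 0, 1, 0))   # hold:   S=R=0 keeps (1, 0)
print(rs_latch(1, 0, 0, 1))   # set:    forces (1, 0)
print(rs_latch(0, 1, 1, 0))   # reset:  forces (0, 1)
print(rs_latch(1, 1, 1, 0))   # invalid: both outputs LOW, (0, 0)
```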

It is possible to construct the RS latch using NAND gates (as seen in the Logic gates
section). The only difference is that the NAND gate is the dual form of the NOR gate
(did I say that in the Logic gates section?), so in this case the R = 0 and S = 0
combination becomes the invalid case. The circuit and truth table of the RS latch using
NAND gates are shown below.

S R Q Q+
1 1 0 0
1 1 1 1
0 1 X 0
1 0 X 1
0 0 X 1

If you look closely, there is no control signal (i.e. no clock and no enable), so these
kinds of latches are called asynchronous logic elements. Since all sequential circuits are
built around the RS latch, we will concentrate on synchronous circuits rather than
asynchronous circuits.
RS Latch with Clock
We have seen this circuit earlier with two possible input configurations: one with a level
sensitive input and one with an edge sensitive input. The circuit below shows the level
sensitive RS latch. The control signal "Enable" E is used to gate the inputs S and R to the
RS latch. When Enable E is HIGH, both AND gates act as buffers, so R and S appear at
the RS latch inputs and it functions like a normal RS latch. When Enable E is LOW, it
drives LOW to both inputs of the RS latch. As we saw earlier, when both inputs of a NOR
latch are LOW, the values are retained (i.e. the output does not change).
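The gating can be added to the latch model with one line of masking logic; a sketch, reusing the iterated NOR-latch fixed point from before:

```python
# Gated (level sensitive) RS latch: AND gates mask S and R while Enable
# is LOW, which feeds the NOR latch its "hold" input combination (0, 0).
def nor(a, b):
    return 0 if (a or b) else 1

def rs_core(s, r, q, qn):
    for _ in range(4):                 # iterate to a stable state
        q, qn = nor(r, qn), nor(s, q)
    return q, qn

def gated_rs(enable, s, r, q, qn):
    return rs_core(enable & s, enable & r, q, qn)

print(gated_rs(0, 1, 0, 0, 1))  # Enable LOW: S is masked, Q stays 0
print(gated_rs(1, 1, 0, 0, 1))  # Enable HIGH: S sets Q to 1
```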

Setup and Hold Time

For synchronous flip-flops, we have special requirements for the inputs with respect to
the clock input. They are:

· Setup Time: the minimum time period during which data must be stable before the
clock makes a valid transition. For example, for a posedge triggered flip-flop with a
setup time of 2 ns, the input data (i.e. R and S in the case of an RS flip-flop) should be
stable for at least 2 ns before the clock makes its transition from 0 to 1.

· Hold Time: the minimum time period during which data must be stable after the clock
has made a valid transition. For example, for a posedge triggered flip-flop with a
hold time of 1 ns, the input data (i.e. R and S in the case of an RS flip-flop) should be
stable for at least 1 ns after the clock has made its transition from 0 to 1.

If the data makes a transition within the setup window or the hold window, then the
flip-flop output is not predictable, and the flip-flop enters what is known as a metastable
state, in which its output oscillates between 0 and 1. It takes some time for the flip-flop
to settle down. The whole phenomenon is called metastability.
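A back-of-the-envelope timing check can be written directly from these definitions; the 2 ns setup and 1 ns hold figures are the examples from the text:

```python
# Flag data transitions that land inside a posedge flip-flop's
# setup/hold window around a given clock edge. Times are in ns.
SETUP_NS, HOLD_NS = 2.0, 1.0

def violates_timing(data_change_ns, clock_edge_ns):
    """True if the data transition falls inside the forbidden window."""
    return (clock_edge_ns - SETUP_NS) < data_change_ns < (clock_edge_ns + HOLD_NS)

print(violates_timing(7.0, 10.0))   # changes 3 ns before the edge: safe
print(violates_timing(9.5, 10.0))   # inside the 2 ns setup window: violation
print(violates_timing(10.5, 10.0))  # inside the 1 ns hold window: violation
```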

The waveform below shows the input S (R is not shown), the clock CLK, and the output Q
(Q' is not shown) for a posedge SR flip-flop.

D Latch

The RS latch seen earlier contains an ambiguous state; to eliminate this condition we can
ensure that S and R are never equal. This is done by connecting S and R together through
an inverter. Thus we have the D latch: the same as the RS latch, with the only difference
that there is only one input instead of two (R and S). This input is called D, or the Data
input. The D latch is called a transparent D latch for the reasons explained earlier. Delay
flip-flop or delay latch are other names used. Below are the truth table and circuit of the
D latch.

In real-world designs (ASIC/FPGA designs), only D latches/flip-flops are used.

D Q Q+
1 X 1
0 X 0

Below is the D latch waveform, which is similar to the RS latch one, but with R removed.
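The transparent behavior reduces to a one-line model: while Enable is HIGH the output follows D, and while Enable is LOW the last value is held. A minimal sketch:

```python
# Transparent D latch: output follows D while enabled, holds otherwise.
def d_latch(enable, d, q):
    return d if enable else q

q = 0
q = d_latch(1, 1, q)  # transparent phase: Q follows D -> 1
q = d_latch(0, 0, q)  # opaque phase: Q holds 1 even though D = 0
print(q)  # 1
```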

JK Latch

The ambiguous output state of the RS latch was eliminated in the D latch by joining the
inputs with an inverter, but the D latch has only a single input. The JK latch is similar to
the RS latch in that it has 2 inputs, J and K, as shown in the figure below. The ambiguous
state has been eliminated here: when both inputs are high, the output toggles. The only
difference we see here is the output feedback to the inputs, which is not there in the RS
latch.

J K Q Q+
1 1 0 1
1 1 1 0
1 0 X 1
0 1 X 0

T Latch

When the two inputs of the JK latch are shorted, a T latch is formed. It is called a T
(toggle) latch because, when the input is held HIGH, the output toggles.

T Q Q+
1 0 1
1 1 0
0 1 1
0 0 0
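Both tables follow from the standard characteristic equations: Q+ = J·Q' + K'·Q for the JK element (J = K = 1 toggles, removing the RS invalid case), and, since the T element is a JK with its inputs shorted, Q+ = T xor Q. A quick check:

```python
# Characteristic equations of the JK and T elements.
def jk_next(j, k, q):
    return (j & (1 - q)) | ((1 - k) & q)   # Q+ = J*Q' + K'*Q

def t_next(t, q):
    return jk_next(t, t, q)                # T = JK with J = K = T

print(jk_next(1, 1, 0), jk_next(1, 1, 1))  # toggle: 1 0
print(t_next(0, 1), t_next(1, 1))          # hold 1, then toggle to 0
```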

JK Master Slave Flip-Flop

All the sequential circuits that we have seen in the last few pages have a problem (all
level sensitive sequential circuits have this problem): if the inputs change before the
enable input changes state from HIGH to LOW (assuming HIGH is the ON state and LOW
is the OFF state), then another state transition occurs for the same enable pulse. This
sort of multiple-transition problem is called racing.

If we make the sequential element sensitive to edges instead of levels, we can overcome
this problem, as the input is evaluated only during enable/clock edges.

In the figure above there are two latches; the first latch, on the left, is called the master
latch and the one on the right is called the slave latch. The master latch is positively
clocked and the slave latch is negatively clocked.

Sequential Circuits Design

We saw in the combinational circuits section how to design a combinational circuit from
a given problem: we convert the problem into a truth table, draw the K-map for the
truth table, and finally draw the gate-level circuit. Similarly, we have a flow for
sequential circuit design. The steps are given below.

· Draw the state diagram.

· Draw the state table (excitation table) for each output.

· Draw the K-map for each output.

· Draw the circuit.

The sequential circuit design flow is thus very much the same as the combinational
circuit flow.

State Diagram

The state diagram is constructed using all the states of the sequential circuit in question.
It builds up the relationship between various states and also shows how inputs affect the
states.

To ease following the tutorial, let's consider designing a 2-bit up counter (a binary
counter is one that counts through a binary sequence) using T flip-flops.

Below is the state diagram of the 2-bit binary counter.

State Table

The state table incorporates the excitation table of the flip-flop, i.e. it shows what inputs
need to be applied to get the required outputs. In other words, this table gives the inputs
required to produce the specific outputs.

Q1 Q0   Q1+ Q0+   T1 T0
0  0    0   1     0  1
0  1    1   0     1  1
1  0    1   1     0  1
1  1    0   0     1  1

K-map

The K-map is the same as the combinational circuits K-map; the only difference is that we
draw K-maps for the flip-flop inputs, i.e. T1 and T0 in the above table. From the table we
deduce that we don't need to draw a K-map for T0, as it is high for all the state
combinations. But for T1 we need to draw the K-map as shown below, using SOP.

Circuit

There is nothing special in drawing the circuit; it is the same as drawing any circuit from
the K-map output. Below is the circuit of the 2-bit up counter using T flip-flops.
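The finished design can be sanity-checked in software: with T0 = 1 and (from the K-map) T1 = Q0, and each T flip-flop obeying Q+ = Q xor T, the state should cycle 00, 01, 10, 11 and wrap around.

```python
# Simulate the 2-bit up counter built from T flip-flops.
q1 = q0 = 0
seq = []
for _ in range(5):
    seq.append((q1, q0))
    t1, t0 = q0, 1                 # excitation logic from the K-maps
    q1, q0 = q1 ^ t1, q0 ^ t0      # T flip-flop next state: Q+ = Q xor T
print(seq)  # [(0, 0), (0, 1), (1, 0), (1, 1), (0, 0)]
```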

OPERATING MANUAL FOR

LOGIC BASIC SERIES

ELECTRONIC INDICATORS

Choice of Three Power Sources

1. Batteries
A set of two manganese dioxide lithium batteries will operate this electronic indicator for
approximately 250 hours of normal usage. Because milliampere-hour ratings vary widely
among manufacturers, normal usage time is very hard to predict. The lithium battery used
in this indicator is an IEC standard type CR2450. The indicators are shipped with the
batteries not installed; the batteries should not be installed until battery operation is
desired.

NOTE: This indicator has an "AUTO-OFF" feature to conserve battery life. After 10 minutes
of no activity (no key presses or spindle movement), the gage will turn itself off. This
feature may be disabled if continuous operation is desired; see the "AUTO-OFF On/Off"
instructions in this book.

Installing Batteries

Using a narrow screwdriver, gently pry under the tab on the left side of the plastic bezel
and slide out the battery tray as you turn the indicator face side down. Insert two
batteries, "+" side up, into the tray cavities, then slide the tray back into its bezel slot,
taking care that the batteries stay in proper position.

2. AC Adapter
AC adapters (providing 9 VDC at 30 mA maximum to the indicator from a 115 or 230
VAC, 50/60 Hz line source) may be purchased from CDI. Although other 9 V AC adapters
with a 3/32" (2.5 mm) mini-plug (center +) may be used, CDI adapters are recommended
because they include current limiting to prevent damage from line fluctuations.
For 115 V (USA) operation - Order CDI Part #G11-0012
For 230 V (Europe) operation - Order CDI Part #G11-0014
First insert the mini-plug into the socket on the lower left side of the bezel (see the
drawing on page 2), then plug the adapter into a wall outlet. After turning the indicator
"ON", disable the "AUTO-OFF" feature; see "AUTO OFF On/Off".

3. Data I/O Connector
Power also may be provided through the data I/O connector, for special features or
applications where the indicator is integrated with another piece of equipment. A
ripple-free 5 VDC (4.9 to 5.7 V) regulated voltage source is required. CDI cable
#G13-0034 or a custom variation of another CDI data cable must be used.
Contact CDI for full information.

Button Functions
NOTE: Most functions are active on
release of button(s).

Key Function Controlled


OFF - Press & Release: Turns indicator off.
ON/CLR - Press & Release: Turns indicator on, or clears/resets indicator.
    With HOLD off: Clears display to "0".
    With MAX HOLD on: Clears display to spindle position, leaves HOLD on.
  - Press & Hold (for longer than 5 seconds): Enter/exit display and key test mode.
HOLD - Press & Release: Turns hold function on/off and cancels last selection.
2ND - Press & Hold (for more than 2 seconds, until 2ND is displayed):
    Enables 2ND and 3RD functions such as TR REV (Travel Reverse), IN/MM and
    AUTO OFF.
CHNG - Used with 2ND key to activate selectable resolution.

Display-Operating Prompts & Conditions

Operating Instructions

To Turn
AUTO OFF On/Off
- Press and hold "2ND" until 2ND appears at bottom of display, then release.
- Press and release "OFF" within 3 seconds.

NOTE: An hourglass appears at left side of display if 'AUTO OFF' is active.

To Clear Display to Zero
- Press and release "ON/CLR".

To Verify
DATA I/O FORMAT
To view the current output format:
- Press and hold "2ND" until 2ND appears in the display, then press "ON/CLR" and "2ND"
in sequence. Format information is displayed for about 3 seconds, then the indicator
automatically returns to normal operation. Format information is displayed as:

RS232          = rS232
MTI compatible = SEr
CDI mux BCD    = Cdi
Bypass         = bP

To Use
HOLD
To select the type of HOLD (Freeze, Minimum or Maximum):
- Press and hold "HOLD" until the cursor moves under the desired type of hold (FRZ, MIN
or MAX), then release.
To turn HOLD On/Off:
- Press and release "HOLD".
- MAX HOLD - Holds and displays highest reading.
- MIN HOLD - Holds and displays lowest reading.
- FREEZE HOLD - Freezes display when "HOLD" button is pressed.

NOTE: Pressing the CLR button resets the indicator to spindle position.

To Change
INCH/MILLIMETER
To change from one to the other:
- Press and hold "MOVE/2ND" until 2ND appears at bottom of display, then release.
- Press and release "TOL" within 3 seconds.
NOTE: MM or IN will appear at bottom of display.

To Turn
INDICATOR ON
- Press "ON/CLR" and release when 'clr' appears on the display.

To Turn
INDICATOR OFF
- Press and release "OFF".

To
Reset to DEFAULT
A total reset clears all user settings and returns to factory-set defaults.
1. Press and hold "2ND" until 2ND appears at bottom of display, then release.
2. Press and release "ON/CLR" within 3 seconds.
3. Press and release "CHNG" within 3 seconds.
NOTE: Cannot be done if the Lock feature is on.

To Change
RESOLUTION
- Press and hold "2ND" until 2ND appears at bottom of display, then release.
- Press and release "ON/CLR" within 3 seconds.
- Press and release "HOLD" within 3 seconds.
Use the "CHNG" key to step through the available resolution selections:
1 = .00005" (.001mm)
2 = .0001" (.002mm)
3 = .00025" (.005mm)
4 = .0005" (.01mm)
5 = .001" (.02mm)
Press and release "CHNG" and "2ND" simultaneously to save.
NOTE: Only resolutions coarser than the indicator's resolution as purchased are
available.

To Enter
TEST MODE
Press and hold (for more than 5 seconds) "ON/CLR" to enter ’display and key’
test mode.
To Exit

TEST MODE
Press and hold (for more than 5 seconds) "ON/CLR" to exit ’display and key’ test
mode.

To Change
TRAVEL DIRECTION
- Press and hold "2ND" until 2ND appears at bottom of display then release. - Press and
release "HOLD" within 3 seconds.

Note: Arrow in upper right corner will show positive direction of spindle travel.
NOTE: Most functions are active on release of key(s).

Internal Memory
"LOGIC" Series indicators and remote displays include internal non-volatile memory to
store all factory default and user settings. When the indicator is turned on, user settings
and preset numbers will be the same as when the indicator was turned off.

NOTE: Many of the user settings are stored when the indicator is turned off using the
"OFF" key, or when the indicator turns itself off (AUTO OFF). However, if the indicator is
turned off by removing power (by disconnecting the AC adapter or cutting power through
the Data I/O connector), some or all of the user settings and/or changes may be lost!
Operating Precautions

1. Do not use the bottom of the spindle stroke as a base of measurement reference, as it
is protected with a rubber shock absorber to prevent shock to the internal mechanism.
The spindle should be offset .005"-.010" (0.12-0.25 mm) from the bottom of travel.

2. Use of CDI type MS-10 or similar sturdy stands or fixtures for indicator mounting,
where the base plate and indicator are mounted to a common post, is highly
recommended for accurate and repeatable readings. The indicator must be mounted with
the spindle perpendicular to the reference or base plate. If the indicator is stem-mounted,
protect the indicator from attempted rotation, and from being struck or bumped, to
prevent stem/case mechanical alignment damage. Do not over-tighten the mounting
mechanism, and use clamp mounting rather than set screws if at all possible, to prevent
damage to the stem.

3. The bezel face can be rotated from its normal horizontal position for convenient
viewing. Rotation is limited to 270 degrees and attempts to force it past its internal stop
may damage the indicator.
4. Frequently clean the spindle to prevent sluggish or sticky movement. Dry wiping with a
lint-free cloth usually will suffice, but isopropyl alcohol may be used to remove gummy
deposits. Do not apply any type of lubricant to the spindle. Spindle dust caps and spindle
boots are available for operation in dirty or abrasive environments.
1" Spindle dust cap - Order CDI Part #A21.0131
1" Spindle boot - Order CDI Part #CD170-1
Use a soft cloth dampened with a mild detergent to clean the bezel and front face of the
indicator. Do not use aromatic solvents, as they may cause damage.

5. Extremely high electrical transients - from nearby arc welders, SCR motor/lighting
controls, radio transmitters, etc. - may cause malfunctions of the indicator's internal
circuitry or 'ERROR 1' indications, even though the electronic design was created to
minimize such problems. If at all possible, do not operate the indicator in plant areas
subject to these transients. Turning the indicator 'OFF' for a few seconds, then back 'ON',
from time to time may eliminate any problems. Also, use of an isolated AC line (for AC
adapter operated indicators and AC powered remote displays), or an AC line filter - plus
solid grounding of stands and fixtures - is recommended in these conditions.

Additional Display-Operating Prompts & Conditions

FLASHING DIGIT or +/- sign - Digit or sign affected by the "CHNG" key when setting or
changing preset numbers.
FLASHING READING, with HIGH or LOW displayed - Reading is out of tolerance, to
the high or low side.
ERROR 1 - Spindle speed too fast, high electrical noise, etc.
ERROR 2 - Counter overflow, i.e. counter number (spindle + preset number) out of
counter range.
ERROR 3 - Improper tolerance combination, i.e. both "HIGH" and "LOW" set to '0' or the
same number, or "LOW" set to a higher number than "HIGH". Occurs only when 'TOL' is
on.
ERROR 4 - Display overflow, i.e. number too large to be properly displayed. Moving the
spindle to an acceptable range eliminates the error message.

Data Output

’LOGIC’ Series indicators and remote displays provide users with multiple data output
formats. The cable attached to the indicator when it is turned on determines the output
format in use. Cables for each format can be purchased from CDI. These cables also
provide remote control of ’ON/CLR’ and ’HOLD’ functions, plus +5v regulated power input.
For special applications, an ERROR FLAG output and/or custom cables also can be provided; contact CDI for information.

CAUTION: Use of cables other than those provided or approved by CDI can cause
irreparable damage to the indicator or data output port, and such damage is not covered
by the CDI Limited Warranty.

Standard RS232 Format - Communications protocol is 1200 baud, no parity, 8 data
bits, and 1 stop bit. RS232 can be read by any IBM PC-compatible computer, RS232 serial
printer or other device, provided the device can be set to this protocol. A DB25 pin
adapter may be necessary for non-standard devices. "WINDOWS" terminal and other
communications software, "WEDGE" software, etc., may be used with this format. Cables
Required:
CDI #G03-0018 - For IBM-compatible PC (CDI indicator to DB25F)
CDI #G03-0021 - For CDI serial printer types G19-0001, G19-0002 & G19-0003 (CDI indicator to DB25M)
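As a sketch of receiving this output on a PC: the parsing helper below and the sample line's format are illustrative assumptions, not CDI specifications. Opening the port would use the third-party pyserial package configured for the stated protocol (1200 baud, 8 data bits, no parity, 1 stop bit).

```python
def parse_reading(line: str) -> float:
    """Convert one transmitted ASCII reading to a number."""
    return float(line.strip())

# Opening the port itself needs the third-party 'pyserial' package, e.g.:
#   import serial
#   port = serial.Serial("COM1", baudrate=1200, bytesize=8,
#                        parity=serial.PARITY_NONE, stopbits=1, timeout=1)
#   value = parse_reading(port.readline().decode("ascii"))
sample = "+0.0015\r\n"   # sample transmission; the exact format is an assumption
print(parse_reading(sample))
```

The port name ("COM1") and the reading format shown are placeholders; verify the indicator's actual transmission with a terminal program first.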

MITUTOYO Compatible Format - Use with MITUTOYO compatible printers, collection devices, etc.
Cable Required:
CDI #G03-0019 - CDI indicator to MTI 10 pin

CDI (Multiplexed BCD) Format - Furnished with pigtails on one end.
Cable Required:
CDI #G13-0034 - Also may be used for remote control of 'ON/CLR' or 'HOLD' functions, or external power (+5V regulated) input. (CDI indicator to pigtail wires.)

BYPASS FORMAT - Permits the indicator to be used as a probe for the CDI remote display: bypasses 'raw' unprocessed signals from the detector system directly to the data output connector. In this operating mode, power for the indicator is supplied by the remote display.
Cable Required:
CDI #G13-0022 - CDI indicator to 6-pin DIN

IMPORTANT - The indicator and remote display must be of the same base resolution. If the two are of different base resolutions, you will experience compatibility problems.

Limited Warranty

"PLUS SERIES" INDICATORS ARE WARRANTED FOR A PERIOD OF ONE YEAR AGAINST
DEFECTIVE MATERIALS OR WORKMANSHIP. THIS WARRANTY DOES NOT APPLY TO
PRODUCTS THAT ARE MISHANDLED, MISUSED, ETCHED, STAMPED, OR OTHERWISE
MARKED OR DAMAGED, NOR DOES IT APPLY TO DAMAGE OR ERRONEOUS OPERATION
CAUSED
BY USER TAMPERING OR ATTEMPTS TO MODIFY THE INDICATOR. UNITS FOUND TO BE
DEFECTIVE WITHIN THE WARRANTY PERIOD WILL BE REPAIRED OR REPLACED FREE OF
CHARGE AT THE OPTION OF CDI. A NOMINAL CHARGE WILL BE MADE FOR
NON-WARRANTY REPAIRS, PROVIDED THE UNIT IS NOT DAMAGED BEYOND REPAIR.

Boolean algebra

For a basic introduction to sets, Boolean operations, Venn diagrams, truth tables, and
Boolean applications, see Boolean logic.
For an alternative perspective see Boolean algebras canonically defined.
In abstract algebra, a Boolean algebra is an algebraic structure (a collection of
elements and operations on them obeying defining axioms) that captures essential
properties of both set operations and logic operations. Specifically, it deals with the set
operations of intersection, union, complement; and the logic operations of AND, OR,
NOT.

For example, the logical assertion that a statement a and its negation ¬a cannot both be true,

a ∧ ¬a = 0,

parallels the set-theory assertion that a subset A and its complement A^C have empty intersection,

A ∩ A^C = ∅.

Because truth values can be represented as binary numbers or as voltage levels in logic
circuits, the parallel extends to these as well. Thus the theory of Boolean algebras has
many practical applications in electrical engineering and computer science, as well as in
mathematical logic.

A Boolean algebra is also called a Boolean lattice. The connection to lattices (special
partially ordered sets) is suggested by the parallel between set inclusion, A ⊆ B, and
ordering, a ≤ b. Consider the lattice of all subsets of {x,y,z}, ordered by set inclusion.
This Boolean lattice is a partially ordered set in which, say, {x} ≤ {x,y}. Any two
lattice elements, say p = {x,y} and q = {y,z}, have a least upper bound, here {x,y,z},
and a greatest lower bound, here {y}. Suggestively, the least upper bound (or join or
supremum) is denoted by the same symbol as logical OR, p∨q; and the greatest lower
bound (or meet or infimum) is denoted by the same symbol as logical AND, p∧q.

The lattice interpretation helps in generalizing to Heyting algebras, which are Boolean
algebras freed from the restriction that either a statement or its negation must be true.
Heyting algebras correspond to intuitionistic (constructivist) logic just as Boolean algebras correspond to classical logic.

Formal definition
A Boolean algebra is a set A, supplied with two binary operations ∧ (called AND) and ∨ (called OR), a unary operation ¬ (called NOT) and two distinct elements 0 (called zero) and 1 (called one), such that, for all elements a, b and c of set A, the following axioms hold:

associativity: a ∨ (b ∨ c) = (a ∨ b) ∨ c and a ∧ (b ∧ c) = (a ∧ b) ∧ c

commutativity: a ∨ b = b ∨ a and a ∧ b = b ∧ a

absorption: a ∨ (a ∧ b) = a and a ∧ (a ∨ b) = a

distributivity: a ∨ (b ∧ c) = (a ∨ b) ∧ (a ∨ c) and a ∧ (b ∨ c) = (a ∧ b) ∨ (a ∧ c)

complements: a ∨ ¬a = 1 and a ∧ ¬a = 0

The first three pairs of axioms above (associativity, commutativity and absorption) mean that (A, ∧, ∨) is a lattice. If A is a lattice and one of the above distributivity laws holds, then the other distributivity law can be proven. Thus, a Boolean algebra can also be equivalently defined as a complemented distributive lattice.

From these axioms, one can show that the smallest element 0, the largest element 1,
and the complement ¬a of any element a are uniquely determined. For all a and b in A,
the following identities also follow:
idempotency: a ∨ a = a and a ∧ a = a

boundedness: a ∨ 0 = a, a ∧ 1 = a, a ∨ 1 = 1 and a ∧ 0 = 0

0 and 1 are complements: ¬0 = 1 and ¬1 = 0

De Morgan's laws: ¬(a ∨ b) = ¬a ∧ ¬b and ¬(a ∧ b) = ¬a ∨ ¬b

involution: ¬¬a = a

Examples

· The simplest Boolean algebra has only two elements, 0 and 1, and is defined by the rules:

∧ | 0 1      ∨ | 0 1      a  | 0 1
0 | 0 0      0 | 0 1      ¬a | 1 0
1 | 0 1      1 | 1 1

· It has applications in logic, interpreting 0 as false, 1 as true, ∧ as and, ∨ as or, and ¬ as not. Expressions involving variables and the Boolean operations represent statement forms, and two such expressions can be shown to be equal using the above axioms if and only if the corresponding statement forms are logically equivalent.

· The two-element Boolean algebra is also used for circuit design in electrical
engineering; here 0 and 1 represent the two different states of one bit in a
digital circuit, typically high and low voltage. Circuits are described by
expressions containing variables, and two such expressions are equal for all
values of the variables if and only if the corresponding circuits have the same
input-output behavior. Furthermore, every possible input-output behavior
can be modeled by a suitable Boolean expression.

· The two-element Boolean algebra is also important in the general theory of Boolean algebras, because an equation involving several variables is generally true in all Boolean algebras if and only if it is true in the two-element Boolean algebra (which can always be checked by a trivial brute force algorithm). This can for example be used to show that the following laws (consensus theorems) are generally valid in all Boolean algebras:

· (a ∨ b) ∧ (¬a ∨ c) ∧ (b ∨ c) ≡ (a ∨ b) ∧ (¬a ∨ c)

· (a ∧ b) ∨ (¬a ∧ c) ∨ (b ∧ c) ≡ (a ∧ b) ∨ (¬a ∧ c)

· Starting with the propositional calculus with κ sentence symbols, form the
Lindenbaum algebra (that is, the set of sentences in the propositional calculus
modulo tautology). This construction yields a Boolean algebra. It is in fact the free
Boolean algebra on κ generators. A truth assignment in propositional calculus is
then a Boolean algebra homomorphism from this algebra to {0,1}.

· The power set (set of all subsets) of any given nonempty set S forms a Boolean
algebra with the two operations ∨ := ∪ (union) and ∧ := ∩ (intersection). The
smallest element 0 is the empty set and the largest element 1 is the set S itself.

· The set of all subsets of S that are either finite or cofinite is a Boolean algebra.

· For any natural number n, the set of all positive divisors of n forms a distributive
lattice if we write a ≤ b for a | b. This lattice is a Boolean algebra if and only if n is
square-free. The smallest element 0 of this Boolean algebra is the natural number
1; the largest element 1 of this Boolean algebra is the natural number n.

· Other examples of Boolean algebras arise from topological spaces: if X is a topological space, then the collection of all subsets of X which are both open and closed forms a Boolean algebra with the operations ∨ := ∪ (union) and ∧ := ∩ (intersection).

· If R is an arbitrary ring and we define the set of central idempotents by
A = { e ∈ R : e² = e, ex = xe, ∀x ∈ R }
then the set A becomes a Boolean algebra with the operations e ∨ f := e + f − ef and e ∧ f := ef.

· Certain Lindenbaum–Tarski algebras.
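The examples above can be exercised directly. Since an equation holds in every Boolean algebra exactly when it holds in the two-element algebra, the consensus theorems can be confirmed by the trivial brute-force method mentioned; a minimal Python sketch:

```python
from itertools import product

# The two-element Boolean algebra {0, 1}: AND, OR and NOT on single bits.
def AND(a, b): return a & b
def OR(a, b):  return a | b
def NOT(a):    return 1 - a

# An equation is valid in all Boolean algebras iff it holds here, so brute
# force over the eight assignments verifies both consensus theorems.
for a, b, c in product((0, 1), repeat=3):
    assert AND(AND(OR(a, b), OR(NOT(a), c)), OR(b, c)) == AND(OR(a, b), OR(NOT(a), c))
    assert OR(OR(AND(a, b), AND(NOT(a), c)), AND(b, c)) == OR(AND(a, b), AND(NOT(a), c))

# The power set of S = {x, y, z} under union and intersection is the Boolean
# lattice of subsets discussed above: the join of {x,y} and {y,z} is {x,y,z},
# their meet is {y}.
p, q = frozenset({"x", "y"}), frozenset({"y", "z"})
print(sorted(p | q), sorted(p & q))
```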

Order theoretic properties

Boolean lattice of subsets


Like any lattice, a Boolean algebra (A, ∧, ∨) gives rise to a partially ordered set (A, ≤) by defining
a ≤ b precisely when a = a ∧ b
(which is also equivalent to b = a ∨ b).

In fact one can also define a Boolean algebra to be a distributive lattice (A, ≤) (considered as a partially ordered set) with least element 0 and greatest element 1, within which every element x has a complement ¬x such that
x ∧ ¬x = 0 and x ∨ ¬x = 1

Here ∧ and ∨ are used to denote the infimum (meet) and supremum (join) of two elements. Again, if complements in the above sense exist, then they are uniquely determined.

The algebraic and the order-theoretic perspectives can usually be used interchangeably, and both are of great use for importing results and concepts from universal algebra and order theory. In many practical examples an ordering relation, conjunction, disjunction, and negation are all naturally available, so that it is straightforward to exploit this relationship.

Principle of duality
One can also apply general insights from duality in order theory to Boolean algebras. In particular, the order dual of every Boolean algebra, or, equivalently, the algebra obtained by exchanging ∧ and ∨, is also a Boolean algebra. In general, any law valid for Boolean algebras can be transformed into another valid, dual law by exchanging 0 with 1, ∧ with ∨, and ≤ with ≥.

Other notation
The operators of Boolean algebra may be represented in various ways. Often they are
simply written as AND, OR and NOT. In describing circuits, NAND (NOT AND), NOR (NOT
OR) and XOR (exclusive OR) may also be used. Mathematicians, engineers, and
programmers often use + for OR and · for AND (since in some ways those operations
are analogous to addition and multiplication in other algebraic structures and this
notation makes it very easy to get sum of products form for people who are familiar
with normal algebra) and represent NOT by a line drawn above the expression being
negated. Sometimes, the symbol ~ or ! is used for NOT.

Here we use another common notation with "meet" for AND, "join" for OR, and ¬ for
NOT.

Homomorphisms and isomorphisms


A homomorphism between the Boolean algebras A and B is a function f : A → B such that for all a, b in A:
f(a ∨ b) = f(a) ∨ f(b)
f(a ∧ b) = f(a) ∧ f(b)
f(0) = 0
f(1) = 1
It then follows that f(¬a) = ¬f(a) for all a in A as well. The class of all Boolean algebras,
together with this notion of morphism, forms a category. An isomorphism from A to B is
a homomorphism from A to B which is bijective. The inverse of an isomorphism is also
an isomorphism, and we call the two Boolean algebras A and B isomorphic. From the
standpoint of Boolean algebra theory, they cannot be distinguished; they differ only in
the notation of their elements.
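A concrete homomorphism is evaluation at a fixed point of the underlying set: map each subset to 1 if it contains the point, 0 otherwise. The function name f and the chosen point are illustrative; a brief Python check:

```python
from itertools import combinations

S = ("x", "y")
subsets = [frozenset(c) for r in range(len(S) + 1) for c in combinations(S, r)]

def f(A, point="x"):
    """Evaluation at a fixed point: a Boolean algebra homomorphism P(S) -> {0, 1}."""
    return 1 if point in A else 0

# Check the homomorphism laws on every pair of subsets.
for A in subsets:
    for B in subsets:
        assert f(A | B) == f(A) | f(B)   # preserves join (union -> OR)
        assert f(A & B) == f(A) & f(B)   # preserves meet (intersection -> AND)
assert f(frozenset()) == 0 and f(frozenset(S)) == 1
print("evaluation at 'x' is a homomorphism onto {0, 1}")
```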

Boolean rings, ideals and filters


Every Boolean algebra (A, ∧, ∨) gives rise to a ring (A, +, *) by defining a + b = (a ∧ ¬b) ∨ (b ∧ ¬a) (this operation is called "symmetric difference" in the case of sets and XOR in the case of logic) and a * b = a ∧ b. The zero element of this ring coincides with the 0 of the Boolean algebra; the multiplicative identity element of the ring is the 1 of the Boolean algebra. This ring has the property that a * a = a for all a in A; rings with this property are called Boolean rings.

Conversely, if a Boolean ring A is given, we can turn it into a Boolean algebra by defining x ∨ y = x + y + xy and x ∧ y = xy. Since these two constructions are inverses of each other, we can say that every Boolean ring arises from a Boolean algebra, and vice versa. Furthermore, a map f : A → B is a homomorphism of Boolean algebras if and only if it is a homomorphism of Boolean rings. The categories of Boolean rings and Boolean algebras are equivalent.
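The round trip between the two definitions can be verified mechanically on the two-element algebra, where the ring addition is simply XOR; a minimal sketch:

```python
# The two-element Boolean algebra as a Boolean ring: a + b is the symmetric
# difference built from the algebra operations, and a * b is AND.
def ring_add(a, b):
    return (a & (1 - b)) | (b & (1 - a))   # (a AND NOT b) OR (b AND NOT a)

def ring_mul(a, b):
    return a & b

for a in (0, 1):
    assert ring_mul(a, a) == a             # a * a = a: the ring is Boolean
    for b in (0, 1):
        assert ring_add(a, b) == a ^ b     # symmetric difference is XOR
        # Recovering the algebra: x OR y = x + y + xy (addition mod 2).
        assert ring_add(ring_add(a, b), ring_mul(a, b)) == (a | b)
print("Boolean ring round trip verified on {0, 1}")
```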

An ideal of the Boolean algebra A is a subset I such that for all x, y in I we have x ∨ y in I and for all a in A we have a ∧ x in I. This notion of ideal coincides with the notion of ring ideal in the Boolean ring A. An ideal I of A is called prime if I ≠ A and if a ∧ b in I always implies a in I or b in I. An ideal I of A is called maximal if I ≠ A and if the only ideal properly containing I is A itself. These notions coincide with the ring-theoretic ones of prime ideal and maximal ideal in the Boolean ring A.

The dual of an ideal is a filter. A filter of the Boolean algebra A is a subset p such that for all x, y in p we have x ∧ y in p and for all a in A, if a ∨ x = a then a in p.

Representing Boolean algebras


It can be shown that every finite Boolean algebra is isomorphic to the Boolean algebra
of all subsets of a finite set. Therefore, the number of elements of every finite Boolean
algebra is a power of two.

Stone's celebrated representation theorem for Boolean algebras states that every
Boolean algebra A is isomorphic to the Boolean algebra of all closed-open sets in some
(compact totally disconnected Hausdorff) topological space.

Axiomatics for Boolean algebras


Let the unary functional symbol n be read as 'complement'. In 1933, the American
mathematician Edward Vermilye Huntington (1874–1952) set out the following elegant
axiomatization for Boolean algebra:

1. Commutativity: x + y = y + x.
2. Associativity: (x + y) + z = x + (y + z).

3. Huntington equation: n(n(x) + y) + n(n(x) + n(y)) = x.

Herbert Robbins immediately asked: If the Huntington equation is replaced with its
dual, to wit:
4. Robbins Equation: n(n(x + y) + n(x + n(y))) = x,
do (1), (2), and (4) form a basis for Boolean algebra? Calling (1), (2), and (4) a
Robbins algebra, the question then becomes: Is every Robbins algebra a Boolean
algebra? This question remained open for decades, and became a favorite question of
Alfred Tarski and his students.

In 1996, William McCune at Argonne National Laboratory, building on earlier work by Larry Wos, Steve Winker, and Bob Veroff, answered Robbins's question in the affirmative: every Robbins algebra is a Boolean algebra. Crucial to McCune's proof was the automated reasoning program EQP he designed. For a simplification of McCune's proof, see Dahn (1998).
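Both the Huntington equation and the Robbins equation are easy to confirm in the two-element model, reading + as OR and n as complement. Of course this checks only that one model, not McCune's general theorem; a brute-force sketch:

```python
from itertools import product

# Two-element model: '+' is OR, n(x) is the complement 1 - x.
def n(x):
    return 1 - x

for x, y in product((0, 1), repeat=2):
    # Huntington equation: n(n(x) + y) + n(n(x) + n(y)) = x
    assert (n(n(x) | y) | n(n(x) | n(y))) == x
    # Robbins equation: n(n(x + y) + n(x + n(y))) = x
    assert n(n(x | y) | n(x | n(y))) == x
print("both equations hold in {0, 1}")
```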

Boolean algebra

<mathematics, logic> (After the logician George Boole)

1. Commonly, and especially in computer science and digital electronics, this term is
used to mean two-valued logic.

2. This is in stark contrast with the definition used by pure mathematicians who in the
1960s introduced "Boolean-valued models" into logic precisely because a
"Boolean-valued model" is an interpretation of a theory that allows more than two
possible truth values!

Strangely, a Boolean algebra (in the mathematical sense) is not strictly an algebra, but
is in fact a lattice. A Boolean algebra is sometimes defined as a "complemented
distributive lattice".

Boole's work which inspired the mathematical definition concerned algebras of sets,
involving the operations of intersection, union and complement on sets. Such algebras
obey the following identities where the operators ^, V, - and constants 1 and 0 can be
thought of either as set intersection, union, complement, universal, empty; or as
two-valued logic AND, OR, NOT, TRUE, FALSE; or any other conforming system.
a ^ b = b ^ a        a V b = b V a        (commutative laws)
(a ^ b) ^ c = a ^ (b ^ c)
(a V b) V c = a V (b V c)        (associative laws)
a ^ (b V c) = (a ^ b) V (a ^ c)
a V (b ^ c) = (a V b) ^ (a V c)        (distributive laws)
a ^ a = a        a V a = a        (idempotence laws)
--a = a
-(a ^ b) = (-a) V (-b)
-(a V b) = (-a) ^ (-b)        (de Morgan's laws)
a ^ -a = 0        a V -a = 1
a ^ 1 = a        a V 0 = a
a ^ 0 = 0        a V 1 = 1
-1 = 0        -0 = 1

There are several common alternative notations for the "-" or logical complement
operator.
If a and b are elements of a Boolean algebra, we define a <= b to mean that a ^ b = a,
or equivalently a V b = b. Thus, for example, if ^, V and - denote set intersection, union
and complement then <= is the inclusive subset relation. The relation <= is a partial
ordering, though it is not necessarily a linear ordering since some Boolean algebras
contain incomparable values.

Note that these laws only refer explicitly to the two distinguished constants 1 and 0
(sometimes written as LaTeX \top and \bot), and in two-valued logic there are no
others, but according to the more general mathematical definition, in some systems
variables a, b and c may take on other values as well.

History
The term "Boolean algebra" honors George Boole (1815–1864), a self-educated English
mathematician. The algebraic system of logic he formulated in his 1854 monograph The
Laws of Thought differs from that described above in some important respects. For
example, conjunction and disjunction in Boole were not a dual pair of operations.
Boolean algebra emerged in the 1860s, in papers written by William Jevons and Charles
Peirce. To the 1890 Vorlesungen of Ernst Schröder we owe the first systematic
presentation of Boolean algebra and distributive lattices. The first extensive treatment
of Boolean algebra in English is A. N. Whitehead's 1898 Universal Algebra. Boolean
algebra as an axiomatic algebraic structure in the modern axiomatic sense begins with
a 1904 paper by Edward Vermilye Huntington. Boolean algebra came of age as serious
mathematics with the work of Marshall Stone in the 1930s, and with Garrett Birkhoff's
1940 Lattice Theory. In the 1960s, Paul Cohen, Dana Scott, and others found deep new
results in mathematical logic and axiomatic set theory using offshoots of Boolean
algebra, namely forcing and Boolean-valued models.

Building series-parallel resistor circuits


Once again, when building battery/resistor circuits, the student or hobbyist is faced with several different modes of construction. Perhaps the most popular is the solderless breadboard: a platform for constructing temporary circuits by plugging components and
wires into a grid of interconnected points. A breadboard appears to be nothing but a
plastic frame with hundreds of small holes in it. Underneath each hole, though, is a
spring clip which connects to other spring clips beneath other holes. The connection
pattern between holes is simple and uniform:

Suppose we wanted to construct the following series-parallel combination circuit on a breadboard:

The recommended way to do so on a breadboard would be to arrange the resistors in approximately the same pattern as seen in the schematic, for ease of relation to the schematic. If 24 volts is required and we only have 6-volt batteries available, four may be connected in series to achieve the same effect:

This is by no means the only way to connect these four resistors together to form the circuit shown in the schematic. Consider this alternative layout:

If greater permanence is desired without resorting to soldering or wire-wrapping, one could choose to construct this circuit on a terminal strip (also called a barrier strip, or terminal block). In this method, components and wires are secured by mechanical tension underneath screws or heavy clips attached to small metal bars. The metal bars, in turn, are mounted on a nonconducting body to keep them electrically isolated from each other.

Building a circuit with components secured to a terminal strip isn't as easy as plugging
components into a breadboard, principally because the components cannot be physically
arranged to resemble the schematic layout. Instead, the builder must understand how
to "bend" the schematic's representation into the real-world layout of the strip. Consider
one example of how the same four-resistor circuit could be built on a terminal strip:

Another terminal strip layout, simpler to understand and relate to the schematic,
involves anchoring parallel resistors (R1//R2 and R3//R4) to the same two terminal
points on the strip like this:
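The arithmetic behind this series-parallel circuit is easy to script. The resistor values below are assumptions for illustration only (the schematic in the text gives no component values); the structure is R1//R2 in series with R3//R4:

```python
def series(*rs):
    """Resistors in series: resistances add."""
    return sum(rs)

def parallel(*rs):
    """Resistors in parallel: reciprocal of the sum of reciprocals."""
    return 1.0 / sum(1.0 / r for r in rs)

# Hypothetical values -- the schematic in the text gives none.
R1, R2, R3, R4 = 100.0, 100.0, 220.0, 220.0
total = series(parallel(R1, R2), parallel(R3, R4))   # (R1//R2) + (R3//R4)
print(round(total, 3))   # two equal pairs halve: 50 + 110 = 160.0 ohms
```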

Building more complex circuits on a terminal strip involves the same spatial-reasoning skills, but of course requires greater care and planning. Take for instance this complex circuit, represented in schematic form:

The terminal strip used in the prior example barely has enough terminals to mount all seven resistors required for this circuit! It will be a challenge to determine all the necessary wire connections between resistors, but with patience it can be done. First, begin by installing and labeling all resistors on the strip. The original schematic diagram will be shown next to the terminal strip circuit for reference:

Next, begin connecting components together wire by wire as shown in the schematic.
Over-draw connecting lines in the schematic to indicate completion in the real circuit.
Watch this sequence of illustrations as each individual wire is identified in the schematic,
then added to the real circuit:

Although there are minor variations possible with this terminal strip circuit, the choice of connections shown in this example sequence is both electrically accurate (electrically identical to the schematic diagram) and carries the additional benefit of not burdening any one screw terminal on the strip with more than two wire ends, a good practice in any terminal strip circuit.

An example of a "variant" wire connection might be the very last wire added (step 11),
which I placed between the left terminal of R2 and the left terminal of R3. This last wire
completed the parallel connection between R2 and R3 in the circuit. However, I could
have placed this wire instead between the left terminal of R2 and the right terminal of
R1, since the right terminal of R1 is already connected to the left terminal of R3 (having
been placed there in step 9) and so is electrically common with that one point. Doing
this, though, would have resulted in three wires secured to the right terminal of R1 instead of two, which is a faux pas in terminal strip etiquette. Would the circuit have worked this way? Certainly! It's just that more than two wires secured at a single terminal makes for a "messy" connection: one that is aesthetically displeasing and may place undue stress on the screw terminal.

Integrated Circuits (Chips)

Integrated Circuits are usually called ICs or chips. They are complex circuits
which have been etched onto tiny chips of semiconductor (silicon). The chip is packaged
in a plastic holder with pins spaced on a 0.1" (2.54mm) grid which will fit the holes on

stripboard and breadboards. Very fine wires inside the package link the chip to the pins.

Pin numbers

The pins are numbered anticlockwise around the IC (chip) starting near the notch or dot. The diagram shows the numbering for 8-pin and 14-pin ICs, but the principle is the same for all sizes.
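Because the numbering runs anticlockwise from the notch, pins k and (total + 1 − k) always face each other across the package. A small helper captures this rule (the function name is ours, not a standard):

```python
def dip_opposite_pin(pin: int, total_pins: int) -> int:
    """For a DIL package numbered anticlockwise from the notch,
    pin k sits directly across from pin (total_pins + 1 - k)."""
    if not 1 <= pin <= total_pins:
        raise ValueError("pin out of range")
    return total_pins + 1 - pin

# On an 8-pin IC, pin 1 faces pin 8; on a 14-pin IC, pin 7 faces pin 8.
print(dip_opposite_pin(1, 8))    # 8
print(dip_opposite_pin(7, 14))   # 8
```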

Chip holders (DIL sockets)


ICs (chips) are easily damaged by heat when soldering and their
short pins cannot be protected with a heat sink. Instead we use a chip
holder, strictly called a DIL socket (DIL = Dual In-Line), which can be
safely soldered onto the circuit board. The chip is pushed into the holder
when all soldering is complete.

Chip holders are only needed when soldering so they are not used on breadboards.

Commercially produced circuit boards often have chips soldered directly to the board without a chip holder; usually this is done by a machine which is able to work very quickly. Please don't attempt to do this yourself because you are likely to destroy the chip, and it will be difficult to remove without damage by de-soldering.

Removing a chip from its holder


If you need to remove a chip it can be gently prised out of the holder with a small
flat-blade screwdriver. Carefully lever up each end by inserting the screwdriver blade
between the chip and its holder and gently twisting the screwdriver. Take care to start
lifting at both ends before you attempt to remove the chip, otherwise you will bend and
possibly break the pins.

Static precautions

Antistatic bags for ICs

Many ICs are static sensitive and can be damaged when you touch them because your
body may have become charged with static electricity, from your clothes for example.
Static sensitive ICs will be supplied in antistatic packaging with a warning label and they
should be left in this packaging until you are ready to use them.
It is usually adequate to earth your hands by touching a metal water pipe or window frame before handling the IC, but for the more sensitive (and expensive!) ICs special equipment is available, including earthed wrist straps and earthed work surfaces. You can make an earthed work surface with a sheet of aluminum kitchen foil, using a crocodile clip to connect the foil to a metal water pipe or window frame with a 10k resistor in series.

Datasheets

Datasheets are available for most ICs giving detailed information about their ratings
and functions. In some cases example circuits are shown. The large amount of
information with symbols and abbreviations can make datasheets seem overwhelming to
a beginner, but they are worth reading as you become more confident because they
contain a great deal of useful information for more experienced users designing and
testing circuits.

Sinking and sourcing current


Chip outputs are often said to 'sink' or 'source' current. The terms refer to the direction of the current at the chip's output.

If the chip is sinking current, it is flowing into the output. This means that a device connected between the positive supply (+Vs) and the chip output will be switched on when the output is low (0V).

If the chip is sourcing current, it is flowing out of the output. This means that a device connected between the chip output and the negative supply (0V) will be switched on when the output is high (+Vs).

It is possible to connect two devices to a chip output so that one is on when the output is low and the other is on when the output is high. This arrangement is used in the Level Crossing project to make the red LEDs flash alternately.

The maximum sinking and sourcing currents for a chip output are usually the
same but there are some exceptions, for example 74LS TTL logic chips can sink up to
16mA but only source 2mA.
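The sinking/sourcing rules above can be summarized as a tiny truth model. This is a sketch: the function and the wiring labels are invented for illustration, not part of any standard API.

```python
def device_is_on(wiring: str, output_level: str) -> bool:
    """Model of sinking vs. sourcing at a chip output.

    wiring: 'to_positive' = device between +Vs and the output (chip sinks),
            'to_ground'   = device between the output and 0V (chip sources).
    output_level: 'high' or 'low'.
    """
    if wiring == "to_positive":
        return output_level == "low"    # current flows into the output
    if wiring == "to_ground":
        return output_level == "high"   # current flows out of the output
    raise ValueError("unknown wiring")

# Two devices on one output: exactly one is on at a time, as with the
# alternately flashing LEDs described above.
for level in ("low", "high"):
    print(level, device_is_on("to_positive", level), device_is_on("to_ground", level))
```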
Using diodes to combine outputs
The outputs of chips (ICs) must never be directly
connected together. However, diodes can be used to
combine two or more digital (high/low) outputs from a
chip such as a counter. This can be a useful way of
producing simple logic functions without using logic gates!

The diagram shows two ways of combining outputs using diodes. The diodes must be capable of passing the output current. 1N4148 signal diodes are suitable for low current devices such as LEDs.

For example the outputs Q0 - Q9 of a 4017 1-of-10 counter go high in turn. Using diodes to combine the 2nd (Q1) and 4th (Q3) outputs as shown in the bottom diagram will make the LED flash twice followed by a longer gap. The diodes are performing the function of an OR gate.
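A quick simulation of this diode-OR makes the flash pattern visible. The helper below is an invented illustration of the logic, not a model of the diode electronics:

```python
# Model of the diode-OR on a 4017 counter: outputs Q0..Q9 go high in turn,
# and diodes from Q1 and Q3 drive one LED, so it flashes twice per cycle.
def led_state(active_output: int) -> bool:
    return active_output in (1, 3)      # diode-OR of Q1 and Q3

pattern = [led_state(q) for q in range(10)]
print(pattern)   # True at positions 1 and 3, then a longer gap of False
```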

The 555 and 556 Timers


The 8-pin 555 timer chip is used in many projects, a popular version is the NE555.
Most circuits will just specify '555 timer IC' and the NE555 is suitable for these. The 555
output (pin 3) can sink and source up to 200mA. This is more than most chips and it is
sufficient to supply LEDs, relay coils and low current lamps. To switch larger currents you
can connect a transistor.
The 556 is a dual version of the 555 housed in a 14-pin package. The two timers (A
and B) share the same power supply pins.

Low power versions of the 555 are made, such as the ICM7555, but these should
only be used when specified (to increase battery life) because their maximum output
current of about 20mA (with 9V supply) is too low for many standard 555 circuits. The
ICM7555 has the same pin arrangement as a standard 555.

Logic ICs (chips)


Logic ICs process digital signals and there are many devices, including logic gates,
flip-flops, shift registers, counters and display drivers. They can be split into two groups
according to their pin arrangements: the 4000 series and the 74 series which consists of
various families such as the 74HC, 74HCT and 74LS.

For most new projects the 74HC family is the best choice. The older 4000 series is
the only family which works with a supply voltage of more than 6V. The 74LS and 74HCT
families require a 5V supply so they are not convenient for battery operation.

The table below summarizes the important properties of the most popular logic families:
4000 Series:
  Technology: CMOS
  Power supply: 3 to 15V
  Inputs: Very high impedance. Unused inputs must be connected to +Vs or 0V. Inputs cannot be reliably driven by 74LS outputs unless a 'pull-up' resistor is used (see below).
  Outputs: Can sink and source about 5mA (10mA with 9V supply), enough to light an LED. To switch larger currents use a transistor.
  Fan-out: One output can drive up to 50 CMOS, 74HC or 74HCT inputs, but only one 74LS input.
  Maximum frequency: about 1MHz
  Power consumption of the IC itself: a few µW

74 Series, 74HC:
  Technology: High-speed CMOS
  Power supply: 2 to 6V
  Inputs: Very high impedance. Unused inputs must be connected to +Vs or 0V. Inputs cannot be reliably driven by 74LS outputs unless a 'pull-up' resistor is used (see below).
  Outputs: Can sink and source about 20mA, enough to light an LED. To switch larger currents use a transistor.
  Fan-out: One output can drive up to 50 CMOS, 74HC or 74HCT inputs, but only 10 74LS inputs.
  Maximum frequency: about 25MHz
  Power consumption of the IC itself: a few µW

74 Series, 74HCT:
  Technology: High-speed CMOS, TTL compatible
  Power supply: 5V ±0.5V
  Inputs: Very high impedance. Unused inputs must be connected to +Vs or 0V. Compatible with 74LS (TTL) outputs.
  Outputs: Can sink and source about 20mA, enough to light an LED. To switch larger currents use a transistor.
  Fan-out: One output can drive up to 50 CMOS, 74HC or 74HCT inputs, but only 10 74LS inputs.
  Maximum frequency: about 25MHz
  Power consumption of the IC itself: a few µW

74 Series, 74LS:
  Technology: Low-power Schottky TTL
  Power supply: 5V ±0.25V
  Inputs: 'Float' high to logic 1 if unconnected. 1mA must be drawn out to hold them at logic 0.
  Outputs: Can sink up to 16mA (enough to light an LED), but source only about 2mA. To switch larger currents use a transistor.
  Fan-out: One output can drive up to 10 74LS inputs or 50 74HCT inputs.
  Maximum frequency: about 35MHz
  Power consumption of the IC itself: a few mW

Mixing Logic Families

It is best to build a circuit using just one logic
family, but if necessary the different families may be
mixed providing the power supply is suitable for all of
them. For example mixing 4000 and 74HC requires the
power supply to be in the range 3 to 6V. A circuit
which includes 74LS or 74HCT ICs must have a 5V
supply.
A 74LS output cannot reliably drive a 4000 or
74HC input unless a 'pull-up' resistor of 2.2k is
connected between the +5V supply and the input to correct the slightly different voltage
ranges used for logic 0.
Note that a 4000 series output can drive only one 74LS input.

4000 Series CMOS


This family of logic ICs is numbered from 4000 onwards, and from 4500 onwards.
They have a B at the end of the number (e.g. 4001B) which refers to an improved design
introduced some years ago. Most of them are in 14-pin or 16-pin packages. They use
CMOS circuitry which means they use very little power and can tolerate a wide range of
power supply voltages (3 to 15V) making them ideal for battery powered projects.
CMOS is pronounced 'see-moss' and stands for Complementary Metal Oxide
Semiconductor.
However the CMOS circuitry also means that they are static sensitive. Touching a
pin while charged with static electricity (from your clothes for example) may damage
the IC. In fact most ICs in regular use are quite tolerant and earthing your hands by
touching a metal water pipe or window frame before handling them will be adequate.
ICs should be left in their protective packaging until you are ready to use them. For the
more sensitive (and expensive!) ICs special equipment is available, including earthed
wrist straps and earthed work surfaces.

74 Series: 74LS, 74HC and 74HCT


There are several families of logic ICs numbered from 74xx00 onwards with letters
(xx) in the middle of the number to indicate the type of circuitry, eg 74LS00 and 74HC00.
The original family (now obsolete) had no letters, eg 7400.
The 74LS (Low-power Schottky) family (like the original) uses TTL
(Transistor-Transistor Logic) circuitry which is fast but requires more power than later
families.

The 74HC family has High-speed CMOS circuitry, combining the speed of TTL
with the very low power consumption of the 4000 series. They are CMOS ICs with the
same pin arrangements as the older 74LS family. Note that 74HC inputs cannot be
reliably driven by 74LS outputs because the voltage ranges used for logic 0 are not
quite compatible; use 74HCT instead.

The 74HCT family is a special version of 74HC with 74LS TTL-compatible inputs
so 74HCT can be safely mixed with 74LS in the same system. In fact 74HCT can be
used as low-power direct replacements for the older 74LS ICs in most circuits. The
minor disadvantage of 74HCT is a lower immunity to noise, but this is unlikely to be a
problem in most situations.

Beware that the 74 series is often still called the 'TTL series' even though the latest ICs
do not use TTL!

The CMOS circuitry used in the 74HC and 74HCT series ICs means that they are
static sensitive. Touching a pin while charged with static electricity (from your clothes
for example) may damage the IC. In fact most ICs in regular use are quite tolerant and
earthing your hands by touching a metal water pipe or window frame before handling
them will be adequate. ICs should be left in their protective packaging until you are
ready to use them.

PIC microcontrollers
A PIC is a Programmable Integrated Circuit microcontroller, a
'computer-on-a-chip'. They have a processor and memory to run a program responding
to inputs and controlling outputs, so they can easily achieve complex functions which
would require several conventional ICs.

Programming a PIC microcontroller may seem daunting to
a beginner but there are a number of systems designed to make
this easy. The PICAXE system is an excellent example because it
uses a standard computer to program (and re-program) the
PICs; no specialist equipment is required other than a low-cost
download lead. Programs can be written in a simple version of
BASIC or using a flowchart. The PICAXE programming software
and extensive documentation are available to download free of charge, making the
system ideal for education and users at home.

If you think PICs are not for you because you have never written a computer
program, please look at the PICAXE system! It is very easy to get started using a few
simple BASIC commands and there are a number of projects available as kits which are
ideal for beginners.

Static Timing Analysis is a method of computing the expected timing of a digital
circuit without requiring simulation.

High-performance integrated circuits have traditionally been characterized by the clock
frequency at which they operate. Gauging the ability of a circuit to operate at the
specified speed requires an ability to measure, during the design process, its delay at
numerous steps. Moreover, delay calculation must be incorporated into the inner loop of
timing optimizers at various phases of design, such as logic synthesis, layout
(placement and routing), and in in-place optimizations performed late in the design
cycle. While such timing measurements can theoretically be performed using a rigorous
circuit simulation, such an approach is liable to be too slow to be practical. Static timing
analysis plays a vital role in facilitating the fast and reasonably accurate measurement
of circuit timing. The speedup comes from the use of simplified delay models and from
the fact that the effects of logical interactions between signals are largely
ignored. Nevertheless, it has become a mainstay of design over the last few
decades; one of the earliest descriptions of a static timing approach was published in
the 1970s.

Purpose
In a synchronous digital system, data is supposed to move in lockstep, advancing one
stage on each tick of the clock signal. This is enforced by synchronizing elements such
as flip-flops or latches, which copy their input to their output when instructed to do so
by the clock. To first order, only two kinds of timing errors are possible in such a
system:

· A hold time violation, when a signal arrives too early, and advances one clock
cycle before it should

· A setup time violation, when a signal arrives too late, and misses the time when
it should advance.

The time when a signal arrives can vary due to many reasons - the input data may
vary, the circuit may perform different operations, the temperature and voltage may
change, and there are manufacturing differences in the exact construction of each part.
The main goal of static timing analysis is to verify that despite these possible variations,
all signals will arrive neither too early nor too late, and hence proper circuit operation
can be assured.

Also, since STA is capable of verifying every path, apart from helping locate setup and
hold time violations, it can detect other serious problems like glitches, slow paths and
clock skew.

Definitions
· The critical path is defined as the path between an input and an output with the
maximum delay. Once the circuit timing has been computed by one of the
techniques below, the critical path can easily be found by using a traceback method.

· The arrival time of a signal is the time elapsed for a signal to arrive at a certain
point. The reference, or time 0.0, is often taken as the arrival time of a clock signal.
To calculate the arrival time, delay calculation of all the components of the path will
be required. Arrival times, and indeed almost all times in timing analysis, are
normally kept as a pair of values - the earliest possible time at which a signal can
change, and the latest.

· Another useful concept is required time. This is the latest time at which a signal
can arrive without making the clock cycle longer than desired. The computation of
the required time proceeds as follows. At each primary output, the required times
for rise/fall are set according to the specifications provided to the circuit. Next, a
backward topological traversal is carried out, processing each gate when the
required times at all of its fanouts are known.

· The slack associated with each connection is the difference between the required
time and the arrival time. A positive slack s at a node implies that the arrival time
at that node may be increased by s without affecting the overall delay of the circuit.
Conversely, negative slack implies that a path is too slow, and the path must be sped
up (or the reference signal delayed) if the whole circuit is to work at the desired
speed.
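The arrival-time, required-time, and slack computations defined above can be sketched directly in code. This is a minimal Python illustration on an invented five-node gate graph; the node names, edge delays, and the 5.0-unit output constraint are all hypothetical, chosen only to exercise the forward and backward traversals.

```python
# Sketch of CPM-style static timing analysis on a small invented gate graph.
from collections import defaultdict

# Directed edges: (source, sink, delay). "src" is the clocked input at time 0.
edges = [("src", "g1", 2.0), ("src", "g2", 1.0),
         ("g1", "g3", 1.0), ("g2", "g3", 3.0), ("g3", "out", 1.0)]

succ, pred = defaultdict(list), defaultdict(list)
for u, v, d in edges:
    succ[u].append((v, d))
    pred[v].append((u, d))

order = ["src", "g1", "g2", "g3", "out"]  # a topological order of the graph

# Forward pass: latest arrival time at each node.
arrival = {"src": 0.0}
for n in order[1:]:
    arrival[n] = max(arrival[u] + d for u, d in pred[n])

# Backward pass: required time, starting from the output constraint.
required = {"out": 5.0}  # assumed timing budget at the primary output
for n in reversed(order[:-1]):
    required[n] = min(required[v] - d for v, d in succ[n])

# Slack = required - arrival; zero-slack nodes lie on the critical path.
slack = {n: required[n] - arrival[n] for n in order}
print(arrival)
print(slack)
```

Tracing back through the zero-slack nodes (src, g2, g3, out) recovers the critical path, while g1 has one unit of slack to spare.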

Corners and STA


Quite often, designers will want to qualify their design across many conditions. Behavior
of an electronic circuit is often dependent on various factors in its environment like
temperature or local voltage variations. In such a case either STA needs to be
performed for more than one such set of conditions, or STA must be prepared to work
with a range of possible delays for each component, as opposed to a single value. If the
design works at each extreme condition, then under the assumption of monotonic
behavior, the design is also qualified for all intermediate points.
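Under that monotonicity assumption it is enough to check the two extreme corners. A toy Python sketch of such a check follows; the corner names, per-gate delays, and limits are all invented for illustration.

```python
# Hypothetical per-gate delays at two process corners (values invented).
delays = {"slow": {"g1": 3.0, "g2": 4.0}, "fast": {"g1": 1.5, "g2": 2.0}}
period = 8.0       # clock-period budget: the slow corner limits the setup check
hold_limit = 1.0   # minimum allowed path delay: the fast corner limits the hold check

results = {}
for corner, d in delays.items():
    path = d["g1"] + d["g2"]  # delay of a two-gate path at this corner
    results[corner] = (path, path <= period, path >= hold_limit)
print(results)
```

If both extremes pass, intermediate conditions are assumed to pass as well, which is exactly the assumption the corner method rests on.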

The use of corners in static timing analysis has several limitations. It may be overly
optimistic, since it assumes perfect tracking - if one gate is fast, all gates are assumed
fast, or if the voltage is low for one gate, it's also low for all others. Corners may also be
overly pessimistic, for the worst case corner may seldom occur. In an IC, for example, it
may not be rare to have one metal layer at the thin or thick end of its allowed range,
but it would be very rare for all 10 layers to be at the same limit, since they are
manufactured independently. Statistical STA, which replaces delays with distributions,
and tracking with correlation, is a more sophisticated approach to the same problem.

The most prominent techniques for STA


In static timing analysis, the word static alludes to the fact that this timing analysis is
carried out in an input-independent manner, and purports to find the worst-case delay
of the circuit over all possible input combinations. The computational efficiency (linear in
the number of edges in the graph) of such an approach has resulted in its widespread
use, even though it has some limitations. A method that is commonly referred to as
PERT is popularly used in STA. In fact, PERT is a misnomer, and the so-called PERT
method discussed in most of the literature on timing analysis refers to the CPM (Critical
path method) that is widely used in project management.

While the CPM-based methods are the dominant ones in use today, other methods for
traversing circuit graphs, such as depth-first search, have been used by various timing
analyzers.

Interface Timing Analysis


Many of the common problems in chip designing are related to interface timing between
different components of the design. These can arise because of many factors including
incomplete simulation models, lack of test cases to properly verify interface timing,
requirements for synchronization, incorrect interface specifications, and lack of designer
understanding of a component supplied as a 'black box'. There are specialized CAD tools
designed explicitly to analyze interface timing, just as there are specific CAD tools to
verify that an implementation of an interface conforms to the functional specification
(using techniques such as model checking).

Statistical static timing analysis


Statistical STA (SSTA) is a procedure that is becoming increasingly necessary to handle
the complexities of process and environmental variations in integrated circuits. See
Statistical Analysis and Design of Integrated Circuits for a much more in-depth
discussion of this topic.

LESSON III

Synchronous Sequential Circuits

Flip-flops are synchronous bistable devices. The term synchronous means the output
changes state only when the clock input is triggered. That is, changes in the output
occur in synchronization with the clock.

Flip-flop is a kind of multivibrator. There are three types of multivibrators:

1. Monostable multivibrator (also called one-shot) has only one stable state. It
produces a single pulse in response to a triggering input.

2. Bistable multivibrator exhibits two stable states. It is able to retain the two SET
and RESET states indefinitely. It is commonly used as a basic building block for
counters, registers and memories.

3. Astable multivibrator has no stable state at all. It is used primarily as an oscillator
to generate periodic pulse waveforms for timing purposes.

The three basic categories of bistable elements are emphasized: edge-triggered
flip-flop, pulse-triggered (master-slave) flip-flop, and data lock-out flip-flop. Their
operating characteristics and basic applications will also be discussed.

Edge-Triggered Flip-flops
An edge-triggered flip-flop changes states either at the positive edge (rising edge) or at
the negative edge (falling edge) of the clock pulse on the control input. The three basic
types are introduced here: S-R, J-K and D.

Positive edge-triggered (without bubble at Clock input):

S-R, J-K, and D.

Negative edge-triggered (with bubble at Clock input):


S-R, J-K, and D.

The S-R, J-K and D inputs are called synchronous inputs because data on these
inputs are transferred to the flip-flop's output only on the triggering edge of the clock
pulse. On the other hand, the direct set (SET) and clear (CLR) inputs are called
asynchronous inputs, as they are inputs that affect the state of the flip-flop independent
of the clock. For the synchronous operations to work properly, these asynchronous inputs
must both be kept LOW.

Edge-triggered S-R flip-flop


The basic operation is illustrated below, along with the truth table for this type of
flip-flop. The operation and truth table for a negative edge-triggered flip-flop are the
same as those for a positive except that the falling edge of the clock pulse is the
triggering edge.
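The edge-triggered behavior just described can be modeled in software. This is a minimal Python sketch of a positive edge-triggered S-R flip-flop; the class and method names are invented for illustration, and the invalid S = R = 1 case is flagged with an exception.

```python
class EdgeTriggeredSR:
    """Positive edge-triggered S-R flip-flop (S = R = 1 is invalid)."""
    def __init__(self):
        self.q = 0
        self.prev_clk = 0

    def tick(self, clk, s, r):
        if clk == 1 and self.prev_clk == 0:   # act only on the rising edge
            if s and r:
                raise ValueError("S = R = 1 is invalid for an S-R flip-flop")
            if s:
                self.q = 1                    # SET
            elif r:
                self.q = 0                    # RESET
            # s == r == 0: hold the previous state
        self.prev_clk = clk
        return self.q

ff = EdgeTriggeredSR()
ff.tick(0, 1, 0)   # clock LOW: inputs are ignored, Q stays 0
ff.tick(1, 1, 0)   # rising edge with S = 1: flip-flop SETs
print(ff.q)        # 1
```

Note how the first call changes nothing even though S is HIGH, matching the rule that synchronous inputs only take effect on the triggering edge.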

When S = 1 and R = 0, the flip-flop SETs on the rising clock edge.


Note that the S and R inputs can be changed at any time when the clock input is LOW or
HIGH (except for a very short interval around the triggering transition of the clock)
without affecting the output. This is illustrated in the timing diagram below:

Edge-triggered J-K flip-flop


The J-K flip-flop works very similarly to the S-R flip-flop. The only difference is that this
flip-flop has no invalid state. The outputs toggle (change to the opposite state) when both
J and K inputs are HIGH. The truth table is shown below.

Edge-triggered D flip-flop
The operation of a D flip-flop is much simpler. It has only one input in addition to the
clock. It is very useful when a single data bit (0 or 1) is to be stored. If there is a HIGH
on the D input when a clock pulse is applied, the flip-flop SETs and stores a 1. If there is
a LOW on the D input when a clock pulse is applied, the flip-flop RESETs and stores a 0.
The truth table below summarizes the operation of the positive edge-triggered D flip-flop.
As before, the negative edge-triggered flip-flop works the same except that the falling
edge of the clock pulse is the triggering edge.
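A software model makes the edge-triggered storage behavior easy to see. The sketch below (class name invented) captures D only on a 0-to-1 clock transition and ignores it otherwise.

```python
class EdgeTriggeredD:
    """Positive edge-triggered D flip-flop: Q takes the value of D on each rising edge."""
    def __init__(self):
        self.q = 0
        self.prev_clk = 0

    def tick(self, clk, d):
        if clk == 1 and self.prev_clk == 0:   # rising edge detected
            self.q = d                        # store the data bit
        self.prev_clk = clk
        return self.q

ff = EdgeTriggeredD()
outs = []
for clk, d in [(0, 1), (1, 1), (0, 0), (1, 0)]:
    outs.append(ff.tick(clk, d))
print(outs)   # [0, 1, 1, 0]
```

The third step shows the key property: D has already changed to 0, but Q holds the stored 1 until the next rising edge.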

Pulse-Triggered (Master-Slave) Flip-flops
The term pulse-triggered means that data are entered into the flip-flop on the rising
edge of the clock pulse, but the output does not reflect the input state until the falling
edge of the clock pulse. Because this kind of flip-flop is sensitive to any change of the
input levels while the clock pulse is HIGH, the inputs must be set up prior to the clock
pulse's rising edge and must not be changed before the falling edge. Otherwise,
ambiguous results will occur.

The three basic types of pulse-triggered flip-flops are S-R, J-K and D. Their logic symbols
are shown below. Notice that they do not have the dynamic input indicator at the clock
input but have postponed output symbols at the outputs.

The truth tables for the above pulse-triggered flip-flops are all the same as those for
the edge-triggered flip-flops, except for the way they are clocked. These flip-flops are
also called Master-Slave flip-flops simply because their internal construction is divided
into two sections. The slave section is basically the same as the master section except
that it is clocked on the inverted clock pulse and is controlled by the outputs of the master
section rather than by the external inputs. The logic diagram for a basic master-slave S-R
flip-flop is shown below.

Data Lock-Out Flip-flops
The data lock-out flip-flop is similar to the pulse-triggered (master-slave) flip-flop
except it has a dynamic clock input. The dynamic clock disables (locks out) the data
inputs after the rising edge of the clock pulse. Therefore, the inputs do not need to be
held constant while the clock pulse is HIGH.

The master section of this flip-flop is like an edge-triggered device. The slave section
becomes a pulse-triggered device to produce a postponed output on the falling edge of
the clock pulse.

The logic symbols of S-R, J-K and D data lock-out flip-flops are shown below. Notice they
all have the dynamic input indicator as well as the postponed output symbol.

Again, the above data lock-out flip-flops have the same truth tables as those for the
edge-triggered flip-flops, except for the way they are clocked.

Operating Characteristics
The operating characteristics mentioned here apply to all flip-flops regardless of the
particular form of the circuit. They are typically found in data sheets for integrated
circuits. They specify the performance, operating requirements, and operating limitations
of the circuit.

Propagation Delay Time - is the interval of time required after an input signal has been
applied for the resulting output change to occur.
Set-Up Time - is the minimum interval required for the logic levels to be maintained
constantly on the inputs (J and K, or S and R, or D) prior to the triggering edge of the
clock pulse in order for the levels to be reliably clocked into the flip-flop.

Hold Time - is the minimum interval required for the logic levels to remain on the
inputs after the triggering edge of the clock pulse in order for the levels to be reliably
clocked into the flip-flop.

Maximum Clock Frequency - is the highest rate at which a flip-flop can be reliably
triggered.

Power Dissipation - is the total power consumption of the device.

Pulse Widths - are the minimum pulse widths specified by the manufacturer for the
Clock, SET and CLEAR inputs.

Frequency Division
When a pulse waveform is applied to the clock input of a J-K flip-flop that is connected to
toggle, the Q output is a square wave with half the frequency of the clock input. If more
flip-flops are connected together as shown in the figure below, further division of the clock
frequency can be achieved.

The Q output of the second flip-flop is one-fourth the frequency of the original clock
input. This is because the frequency of the clock is divided by 2 by the first flip-flop, then
divided by 2 again by the second flip-flop. If more flip-flops are connected this way, the
frequency division would be 2 to the power n, where n is the number of flip-flops.
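The divide-by-2^n relationship can be checked with a one-line calculation; the 8 MHz clock below is an arbitrary example value.

```python
# Each toggling flip-flop halves the clock frequency,
# so a chain of n flip-flops divides it by 2**n.
def divided_frequency(clock_hz, n_flip_flops):
    return clock_hz / (2 ** n_flip_flops)

for n in range(1, 4):
    print(n, divided_frequency(8_000_000, n))   # 4 MHz, 2 MHz, 1 MHz
```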

Parallel Data Storage


In digital systems, data are normally stored in groups of bits that represent numbers,
codes, or other information. So, it is common to take several bits of data on parallel lines
and store them simultaneously in a group of flip-flops. This operation is illustrated in the
figure below.

Each of the three parallel data lines is connected to the D input of a flip-flop. Since all
the clock inputs are connected to the same clock, the data on the D inputs are stored
simultaneously by the flip-flops on the positive edge of the clock. Registers, groups of
flip-flops used for data storage, will be explained in more detail in a later chapter.
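As a rough software analogy of parallel storage, all bits of the word are captured together on one clock event. The class and method names below are invented for illustration.

```python
class ParallelRegister:
    """A bank of D flip-flops sharing one clock: stores a whole word on the rising edge."""
    def __init__(self, width):
        self.bits = [0] * width          # one stored bit per flip-flop

    def clock_in(self, data):
        # Rising clock edge: every flip-flop captures its D input simultaneously.
        assert len(data) == len(self.bits)
        self.bits = list(data)
        return self.bits

reg = ParallelRegister(3)
print(reg.clock_in([1, 0, 1]))   # [1, 0, 1]
```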

Counting

Another very important application of flip-flops is in digital counters, which are covered
in detail in the next chapter. A counter that counts from 0 to 3 is illustrated in the
timing diagram on the right. The two-bit binary sequence repeats every four clock
pulses. When it counts to 3, it recycles back to 0 to begin the sequence again.

Flip-flop (electronics)

In digital circuits, the flip-flop, latch, or bistable multivibrator is an electronic circuit
which has two stable states and thereby is capable of serving as one bit of memory. A
flip-flop is controlled by one or two control signals and/or a gate or clock signal. The
output often includes the complement as well as the normal output. As flip-flops are
implemented electronically, they naturally also require power and ground connections.

Flip-flops can be either simple or clocked. Simple flip-flops consist of two cross-coupled
inverting elements – transistors, or NAND or NOR gates – perhaps augmented by some
enable/disable (gating) mechanism. Clocked devices are specially designed for
synchronous (time-discrete) systems and therefore ignore their inputs except at the
transition of a dedicated clock signal (known as clocking, pulsing, or strobing). This
causes the flip-flop to either change or retain its output signal based upon the values of
the input signals at the transition. Some flip-flops change output on the rising edge of
the clock, others on the falling edge.

Clocked flip-flops are typically implemented as master-slave devices* where two basic
flip-flops (plus some additional logic) collaborate to make the compound device
insensitive to spikes and noise between the short clock transitions; they nevertheless
also often include asynchronous clear or set inputs which may be used to change the
current output independent of the clock.

Flip-flops can be further divided into types that have found common applicability in both
asynchronous and clocked sequential systems: the SR ("set-reset"), D ("data"), T
("toggle"), and JK types are the common ones, all of which may be synthesized from
(most) other types by a few logic gates. The behavior of a particular type can be

described by what is termed the characteristic equation, which derives the "next" (i.e.,
after the next clock pulse) output, Qnext, in terms of the input signal(s) and/or the
current output, Q.

The first electronic flip-flop was invented in 1919 by William Eccles and F. W. Jordan
[1]. It was initially called the Eccles-Jordan trigger circuit and consisted of two
active elements (radio-tubes). The name flip-flop was later derived from the sound
produced on a speaker connected with one of the back coupled amplifiers output during
the trigger process within the circuit.

* Early master-slave devices actually remained (half) open between the first and
second edge of a clocking pulse; today most flip-flops are designed so they may be
clocked by a single edge as this gives large benefits regarding noise immunity, without
any significant downsides.

Set-Reset flip-flops (SR flip-flops)


See SR latch.

Toggle flip-flops (T flip-flops)

A circuit symbol for a T-type flip-flop, where > is the clock input, T is the toggle input and
Q is the stored data output.
If the T input is high, the T flip-flop changes state ("toggles") whenever the clock input
is strobed. If the T input is low, the flip-flop holds the previous value. This behavior is
described by the characteristic equation:

Qnext = T XOR Q

(or, without benefit of the XOR operator, the equivalent Qnext = T·Q' + T'·Q)

and can be described in a truth table:

T   Q   Qnext   Comment
0   0   0       hold state
0   1   1       hold state
1   0   1       toggle
1   1   0       toggle

A toggle flip-flop composed of a single RS flip-flop becomes an oscillator when it is
clocked. To achieve toggling, the clock pulse must have exactly the length of half a
cycle. While such a pulse generator can be built, a toggle flip-flop composed of two RS
flip-flops is the easier solution. Thus the toggle flip-flop divides the clock frequency by 2;
i.e. if the clock frequency is 4 MHz, the output frequency obtained from the flip-flop will be
2 MHz. This 'divide by two' feature has application in various types of digital counters.
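The toggle behavior and the resulting divide-by-two output can be sketched with the characteristic equation Qnext = T XOR Q:

```python
# T flip-flop characteristic equation: Qnext = T XOR Q.
def t_next(t, q):
    return t ^ q

# With T held at 1 the output toggles on every clock edge,
# producing a waveform at half the clock frequency.
q, outputs = 0, []
for _ in range(8):          # eight clock edges
    q = t_next(1, q)
    outputs.append(q)
print(outputs)              # [1, 0, 1, 0, 1, 0, 1, 0]
```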

JK flip-flop

JK flip-flop timing diagram


The JK flip-flop augments the behavior of the SR flip-flop by interpreting the S = R = 1
condition as a "flip" command. Specifically, the combination J = 1, K = 0 is a command
to set the flip-flop; the combination J = 0, K = 1 is a command to reset the flip-flop;
and the combination J = K = 1 is a command to toggle the flip-flop, i.e., change its
output to the logical complement of its current value. Setting J = K = 0 does NOT result
in a D flip-flop, but rather, will hold the current state. To synthesize a D flip-flop, simply
set K equal to the complement of J. The JK flip-flop is therefore a universal flip-flop,
because it can be configured to work as an SR flip-flop, a D flip-flop or a T flip-flop.
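These four input combinations can be checked against the standard JK characteristic equation, Qnext = J·Q' + K'·Q, in a few lines of Python:

```python
# JK characteristic equation: Qnext = J*Q' + K'*Q  (bits are 0 or 1).
def jk_next(j, k, q):
    return (j & (1 - q)) | ((1 - k) & q)

q = 0
q = jk_next(1, 0, q)   # J=1, K=0: set    -> 1
q = jk_next(0, 0, q)   # J=0, K=0: hold   -> 1
q = jk_next(1, 1, q)   # J=1, K=1: toggle -> 0
q = jk_next(0, 1, q)   # J=0, K=1: reset  -> 0
print(q)               # 0
```

Setting k = 1 - j in each call reproduces D flip-flop behavior, matching the universality claim above.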

A circuit symbol for a JK flip-flop, where > is the clock input, J and K are data inputs, Q is
the stored data output, and Q' is the inverse of Q.
The characteristic equation of the JK flip-flop is:

Qnext = J·Q' + K'·Q

and the corresponding truth table is:

J   K   Qnext   Comment
0   0   Q       hold state
0   1   0       reset
1   0   1       set
1   1   Q'      toggle

The origin of the name for the JK flip-flop is detailed by P. L. Lindley, a JPL engineer, in
a letter to EDN, an electronics design magazine. The letter is dated June 13, 1968, and
was published in the August edition of the newsletter. In the letter, Mr. Lindley explains
that he heard the story of the JK flip-flop from Dr. Eldred Nelson, who is responsible for
coining the term while working at Hughes Aircraft.

Flip-flops in use at Hughes at the time were all of the type that came to be known as
J-K. In designing a logical system, Dr. Nelson assigned letters to flip-flop inputs as
follows: #1: A & B, #2: C & D, #3: E & F, #4: G & H, #5: J & K. Given the size of the
system that he was working on, Dr. Nelson realized that he was going to run out of
letters, so he decided to use J and K as the set and reset input of each flip-flop in his
system (using subscripts or some such to distinguish the flip-flops), since J and K were
"nice, innocuous letters."

Dr. Montgomery Phister, Jr., an engineer under Dr. Nelson at Hughes, in his book
"Logical Design of Digital Computers" (Wiley,1958) picked up the idea that J and K were
the set and reset input for a "Hughes type" of flip-flop, which he then termed "J-K
flip-flops." He also defined R-S, T, D, and R-S-T flip-flops, and showed how one could
use Boolean Algebra to specify their interconnections so as to carry out complex
functions.

D flip-flop
The D flip-flop can be interpreted as a primitive delay line or zero-order hold, since the
data is posted at the output one clock cycle after it arrives at the input. It is called a
delay flip-flop since the output takes the value present on the D (data) input.

The characteristic equation of the D flip-flop is:

Qnext = D

and the corresponding truth table is:

D   Q   >        Qnext
0   X   Rising   0
1   X   Rising   1

These flip flops are very useful, as they form the basis for shift registers, which are an
essential part of many electronic devices.

The advantage of this circuit over the D-type latch is that it "captures" the signal at the
moment the clock goes high, and subsequent changes of the data line do not matter,
even if the signal line has not yet gone low again.

Master-slave D flip-flop
A master-slave D flip-flop is created by connecting two gated D latches in series, and
inverting the enable input to one of them. It is called master-slave because the second
latch in the series only changes in response to a change in the first (master) latch.

A master slave D flip flop. It responds on the negative edge of the enable input (usually a
clock).
For a positive-edge triggered master-slave D flip-flop, when the clock signal is low
(logical 0) the “enable” seen by the first or “master” D latch (the inverted clock signal)
is high (logical 1). This allows the “master” latch to store the input value when the clock
signal transitions from low to high. As the clock signal goes high (0 to 1) the inverted
“enable” of the first latch goes low (1 to 0) and the value seen at the input to the
master latch is “locked”. Nearly simultaneously, the twice inverted “enable” of the
second or “slave” D latch transitions from low to high (0 to 1) with the clock signal. This
allows the signal captured at the rising edge of the clock by the now “locked” master
latch to pass through the “slave” latch. When the clock signal returns to low (1 to 0),
the output of the “slave” latch is "locked", and the value seen at the last rising edge of
the clock is held while the “master” latch begins to accept new values in preparation for
the next rising clock edge.
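The two-latch structure described above can be modeled in software. This sketch (class names invented) shows why the value captured while the clock was LOW is the one that appears at the output on the rising edge.

```python
class GatedDLatch:
    """Transparent while enable is HIGH; holds its value while enable is LOW."""
    def __init__(self):
        self.q = 0

    def update(self, enable, d):
        if enable:
            self.q = d
        return self.q

class MasterSlaveD:
    """Two gated D latches with complementary enables form an edge-triggered D flip-flop."""
    def __init__(self):
        self.master = GatedDLatch()
        self.slave = GatedDLatch()

    def tick(self, clk, d):
        self.master.update(1 - clk, d)                 # master is open while clock is LOW
        return self.slave.update(clk, self.master.q)   # slave is open while clock is HIGH

ff = MasterSlaveD()
ff.tick(0, 1)        # clock LOW: master captures D = 1, slave holds its old 0
out = ff.tick(1, 0)  # clock HIGH: D has changed, but the slave passes the locked value
print(out)           # 1
```

Even though D is 0 at the moment the clock goes high, the output is 1: the value seen just before the rising edge is what gets stored.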

An implementation of a master-slave D flip-flop that is triggered on the positive edge of
the clock.
By removing the left-most inverter in the above circuit, a D-type flip flop that strobes on
the falling edge of a clock signal can be obtained. This has a truth table like this:

D   Q   >         Qnext
0   X   Falling   0
1   X   Falling   1

Most D-type flip-flops in ICs have the capability to be set and reset, much like an SR
flip-flop. Usually, the illegal S = R = 1 condition is resolved in D-type flip-flops.

Inputs            Outputs
S   R   D   >     Q   Q'
0   1   X   X     0   1
1   0   X   X     1   0
1   1   X   X     1   1

By setting S = R = 0, the flip-flop can be used as described above.

Edge-triggered D flip-flop
A more efficient way to make a D flip-flop is not as easy to understand, but it works the
same way. While the master-slave D flip-flop is also triggered on the edge of a clock, its
components are each triggered by clock levels. The "edge-triggered D flip-flop" does not
have the master-slave properties.

A positive-edge-triggered D flip-flop.

Uses
· A single flip-flop can be used to store one bit, or binary digit, of data.

· Static RAM, which is the primary type of memory used in registers to store
numbers in computers and in many caches, is built out of flip-flops.

· Any one of the flip-flop types can be used to build any of the others. The data
contained in several such flip-flops may represent the state of a sequencer, the
value of a counter, an ASCII character in a computer's memory or any other piece
of information.

· One use is to build finite state machines from electronic logic. The flip-flops
remember the machine's previous state, and digital logic uses that state to
calculate the next state.

· The T flip-flop is useful for constructing various types of counters. Repeated signals
to the clock input will cause the flip-flop to change state once per high-to-low
transition of the clock input, if its T input is "1". The output from one flip-flop can be
fed to the clock input of a second and so on. The final output of the circuit,
considered as the array of outputs of all the individual flip-flops, is a count, in
binary, of the number of cycles of the first clock input, up to a maximum of 2^n - 1,
where n is the number of flip-flops used. See: Counters

· One of the problems with such a counter (called a ripple counter) is that the output
is briefly invalid as the changes ripple through the logic. There are two solutions to
this problem. The first is to sample the output only when it is known to be valid.
The second, more widely used, is to use a different type of circuit called a
synchronous counter. This uses more complex logic to ensure that the outputs of
the counter all change at the same, predictable time. See: Counters

· Frequency division: a chain of T flip-flops as described above will also function to
divide an input in frequency by 2^n, where n is the number of flip-flops used
between the input and the output.
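The counting and frequency-division behavior of a T flip-flop chain can be sketched as follows. This is an illustrative Python model of ours: real hardware stages toggle concurrently, while this model simply passes the signal through the chain in order within each half-cycle.

```python
# Toy ripple counter built from T flip-flops (with T fixed at 1) that
# toggle on the high-to-low transition of their clock input.

class TFlipFlop:
    def __init__(self):
        self.q = 0
        self.prev_clk = 0

    def clock(self, clk):
        if self.prev_clk == 1 and clk == 0:   # high-to-low transition
            self.q ^= 1                       # T = 1: toggle the state
        self.prev_clk = clk
        return self.q

def ripple_count(n_flops, n_pulses):
    """Feed n_pulses clock pulses into a chain of n_flops T flip-flops."""
    flops = [TFlipFlop() for _ in range(n_flops)]
    for _ in range(n_pulses):
        for clk in (1, 0):                    # one full clock pulse
            signal = clk
            for f in flops:                   # each stage clocks the next
                signal = f.clock(signal)
    # Read the count, least-significant flip-flop first.
    return sum(f.q << i for i, f in enumerate(flops))
```

With 3 flip-flops the count wraps after 2^3 = 8 pulses, which is exactly the modulo-2^n counting and divide-by-2^n behavior described in the two bullets above.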

Timing and metastability


A flip-flop in combination with a Schmitt trigger can be used for the implementation of
an arbiter in asynchronous circuits.

Clocked flip-flops are prone to a problem called metastability, which happens when a
data or control input is changing at the instant of the clock pulse. The result is that the
output may behave unpredictably, taking many times longer than normal to settle to its
correct state, or even oscillating several times before settling. Theoretically it can take

infinite time to settle down. In a computer system this can cause corruption of data or a
program crash.

In many cases, metastability in flip-flops can be avoided by ensuring that the data and
control inputs are held constant for specified periods before and after the clock pulse,
called the setup time (tsu) and the hold time (th) respectively. These times are
specified in the data sheet for the device, and are typically between a few nanoseconds
and a few hundred picoseconds for modern devices.

Unfortunately, it is not always possible to meet the setup and hold criteria, because the
flip-flop may be connected to a real-time signal that could change at any time, outside
the control of the designer. In this case, the best the designer can do is to reduce the
probability of error to a certain level, depending on the required reliability of the circuit.
One technique for suppressing metastability is to connect two or more flip-flops in a
chain, so that the output of each one feeds the data input of the next, and all devices
share a common clock. With this method, the probability of a metastable event can be
reduced to a negligible value, but never to zero. The probability of metastability gets
closer and closer to zero as the number of flip-flops connected in series is increased.

So-called metastable-hardened flip-flops are available, which work by reducing the


setup and hold times as much as possible, but even these cannot eliminate the problem
entirely. This is because metastability is more than simply a matter of circuit design.
When the transitions in the clock and the data are close together in time, the flip-flop is
forced to decide which event happened first. However fast we make the device, there is
always the possibility that the input events will be so close together that it cannot detect
which one happened first. It is therefore logically impossible to build a perfectly
metastable-proof flip-flop.

Another important timing value for a flip-flop is the clock-to-output delay (common
symbol in data sheets: tCO) or propagation delay (tP), which is the time the flip-flop
takes to change its output after the clock edge. The time for a high-to-low transition
(tPHL) is sometimes different from the time for a low-to-high transition (tPLH).

When connecting flip-flops in a chain, it is important to ensure that the tCO of the first
flip-flop is longer than the hold time (tH) of the second flip-flop, otherwise the second
flip-flop will not receive the data reliably. The relationship between tCO and tH is
normally guaranteed if both flip-flops are of the same type.

Analysis of Sequential Circuits

The behavior of a sequential circuit is determined from the inputs, the outputs and the
states of its flip-flops. Both the output and the next state are a function of the inputs
and the present state.

The suggested analysis procedure of a sequential circuit is set out in Figure 6 below.

We start with the logic schematic, from which we can derive excitation equations for each
flip-flop input. Then, to obtain next-state equations, we insert the excitation equations
into the characteristic equations. The output equations can be derived from the schematic,
and once we have our output and next-state equations, we can generate the next-state and
output tables as well as state diagrams. When we reach this stage, we use either the table
or the state diagram to develop a timing diagram, which can be verified through simulation.

Figure 6. Analysis procedure of sequential circuits.

Example 1.1. Modulo-4 counter

Derive the state table and state diagram for the sequential circuit shown in Figure 7.

Figure 7. Logic schematic of a sequential circuit.

SOLUTION:

STEP 1: First we derive the Boolean expressions for the inputs of each flip-flop in the
schematic, in terms of the external input Cnt and the flip-flop outputs Q1 and Q0. Since there are

two D flip-flops in this example, we derive two expressions for D1 and D0:
D0 = Cnt ⊕ Q0 = Cnt'*Q0 + Cnt*Q0'
D1 = Cnt'*Q1 + Cnt*Q1'*Q0 + Cnt*Q1*Q0'
These Boolean expressions are called excitation equations since they represent the inputs to
the flip-flops of the sequential circuit in the next clock cycle.

STEP 2: Derive the next-state equations by substituting these excitation equations into
the flip-flop characteristic equations. In the case of D flip-flops, Q(next) = D. Therefore
the next-state equations equal the excitation equations.
Q0(next) = D0 = Cnt'*Q0 + Cnt*Q0'
Q1(next) = D1 = Cnt'*Q1 + Cnt*Q1'*Q0 + Cnt*Q1*Q0'
STEP 3: Now convert these next-state equations into a tabular form called the next-state
table.

Present State    Next State
Q1 Q0            Cnt = 0    Cnt = 1
0 0              0 0        0 1
0 1              0 1        1 0
1 0              1 0        1 1
1 1              1 1        0 0

Each row corresponds to a state of the sequential circuit and each column represents one
set of input values. Since we have two flip-flops, the number of possible states is four -
that is, Q1Q0 can be equal to 00, 01, 10, or 11. These are the present states shown in the table.

For the next state part of the table, each entry defines the value of the sequential circuit in the
next clock cycle after the rising edge of the Clk. Since this value depends on the present state
and the value of the input signals, the next state table will contain one column for each
assignment of binary values to the input signals. In this example, since there is only one input
signal, Cnt, the next-state table shown has only two columns, corresponding to Cnt = 0 and
Cnt = 1.

Note that each entry in the next-state table indicates the values of the flip-flops in the next
state if their value in the present state is in the row header and the input values in the column
header.

Each of these next-state values has been computed from the next-state equations in STEP 2.

STEP 4: The state diagram is generated directly from the next-state table, shown in Figure
8.

Figure 8. State diagram

Each arc is labelled with the values of the input signals that cause the transition from the
present state (the source of the arc) to the next state (the destination of the arc).

In general, the number of states in a next-state table or a state diagram will equal 2^m, where
m is the number of flip-flops. Similarly, the number of arcs will equal 2^m x 2^k, where k is the
number of binary input signals. Therefore, in the state diagram, there must be four states and
eight transitions. Following these transition arcs, we can see that as long as Cnt = 1, the
sequential circuit goes through the states in the following sequence: 0, 1, 2, 3, 0, 1, 2, ....
On the other hand, when Cnt = 0, the circuit stays in its present state until Cnt changes to 1,
at which point the counting continues.

Since this sequence is characteristic of modulo-4 counting, we can conclude that the sequential
circuit in Figure 7 is a modulo-4 counter with one control signal, Cnt, which enables counting
when Cnt = 1 and disables it when Cnt = 0.
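The analysis in STEPs 2 and 3 can be checked mechanically. This Python sketch (ours, not part of the text) evaluates the next-state equations for every present state and input value and rebuilds the next-state table:

```python
# Evaluate the next-state equations of the modulo-4 counter for every
# present state (Q1, Q0) and input value Cnt, rebuilding the table of STEP 3.

def next_state(q1, q0, cnt):
    # Q0(next) = Cnt'*Q0 + Cnt*Q0'   (i.e. Q0 XOR Cnt)
    q0n = (1 - cnt) & q0 | cnt & (1 - q0)
    # Q1(next) = Cnt'*Q1 + Cnt*Q1'*Q0 + Cnt*Q1*Q0'
    q1n = (1 - cnt) & q1 | cnt & (1 - q1) & q0 | cnt & q1 & (1 - q0)
    return q1n, q0n

# Map each present state to its next state for Cnt = 0 and Cnt = 1.
table = {(q1, q0): (next_state(q1, q0, 0), next_state(q1, q0, 1))
         for q1 in (0, 1) for q0 in (0, 1)}
```

The result matches the table above: with Cnt = 1 the circuit steps 00 -> 01 -> 10 -> 11 -> 00, and with Cnt = 0 every state maps to itself.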


Below, we show a timing diagram, representing four clock cycles, which enables us to observe
the behavior of the counter in greater detail.

Figure 9. Timing diagram

In this timing diagram we have assumed that Cnt is asserted in clock cycle 0 at t0 and is
deasserted in clock cycle 3 at time t4. We have also assumed that the counter is in state
Q1Q0 = 00 in clock cycle 0. Note that on the clock's rising edge, at t1, the counter will go
to state Q1Q0 = 01 with a slight propagation delay; in cycle 2, after t2, to Q1Q0 = 10; and in
cycle 3, after t3, to Q1Q0 = 11. Since Cnt becomes 0 at t4, we know that the counter will stay
in state Q1Q0 = 11 in the next clock cycle.

In Example 1.1 we demonstrated the analysis of a sequential circuit that has no outputs by
developing a next-state table and state diagram which describes only the states and the
transitions from one state to the next. In the next example we complicate our analysis by
adding output signals, which means that we have to upgrade the next-state table and the
state diagram to identify the value of output signals in each state.

Example 1.2

Derive the next-state and output table and the state diagram for the sequential circuit
shown in Figure 10.

Figure 10. Logic schematic of a sequential circuit.

SOLUTION:

The input combinational logic in Figure 10 is the same as in Example 1.1, so the
excitation and the next-state equations will be the same as in Example 1.1.

Excitation equations:

D0 = Cnt ⊕ Q0 = Cnt'*Q0 + Cnt*Q0'
D1 = Cnt'*Q1 + Cnt*Q1'*Q0 + Cnt*Q1*Q0'
Next-state equations:

Q0(next) = D0 = Cnt'*Q0 + Cnt*Q0'
Q1(next) = D1 = Cnt'*Q1 + Cnt*Q1'*Q0 + Cnt*Q1*Q0'
In addition, however, we have computed the output equation.

Output equation: Y = Q1Q0

As this equation shows, the output Y will equal 1 when the counter is in state Q1Q0 =
11, and it will stay 1 as long as the counter stays in that state.

Next-state and output table:

Present State    Next State       Output
Q1 Q0            Cnt=0    Cnt=1   Y
0 0              0 0      0 1     0
0 1              0 1      1 0     0
1 0              1 0      1 1     0
1 1              1 1      0 0     1
State diagram:

Figure 11. State diagram of the sequential circuit in Figure 10.

Timing diagram:

Figure 12. Timing diagram of the sequential circuit in Figure 10.

Note that the counter will reach the state Q1Q0 = 11 only in the third clock cycle, so the
output Y will equal 1 after Q0 changes to 1. Since counting is disabled in the third clock
cycle, the counter will stay in the state Q1Q0 = 11 and Y will stay asserted in all
succeeding clock cycles until counting is enabled again.
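The behavior described in Example 1.2 can be replayed in a short Python sketch (ours): the next-state logic of Example 1.1 plus the Moore output Y = Q1*Q0, driven with the Cnt sequence from the timing diagram.

```python
# Replay Example 1.2: count while Cnt = 1, then disable counting, and
# record the output Y after each clock cycle.

def step(q1, q0, cnt):
    """Next-state equations of the modulo-4 counter (Example 1.1)."""
    q0n = (1 - cnt) & q0 | cnt & (1 - q0)
    q1n = (1 - cnt) & q1 | cnt & (1 - q1) & q0 | cnt & q1 & (1 - q0)
    return q1n, q0n

def output(q1, q0):
    return q1 & q0          # Y = Q1*Q0: asserted only in state 11

# Scenario from the timing diagram: three counting cycles, then Cnt = 0.
state, ys = (0, 0), []
for cnt in (1, 1, 1, 0, 0):
    state = step(*state, cnt)
    ys.append(output(*state))
```

Y first becomes 1 in the cycle where the counter reaches Q1Q0 = 11 and stays asserted while counting is disabled, as the note above describes.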

Design of Sequential Circuits

The design of a synchronous sequential circuit starts from a set of specifications and
culminates in a logic diagram or a list of Boolean functions from which a logic diagram
can be obtained. In contrast to combinational logic, which is fully specified by a truth
table, a sequential circuit requires a state table for its specification. The first step in the
design of sequential circuits is to obtain a state table or an equivalent representation,
such as a state diagram.

A synchronous sequential circuit is made up of flip-flops and combinational gates. The


design of the circuit consists of choosing the flip-flops and then finding the
combinational structure which, together with the flip-flops, produces a circuit that fulfils
the required specifications. The number of flip-flops is determined from the number of
states needed in the circuit.

The recommended steps for the design of sequential circuits are set out below.

Design of Sequential Circuits

This example is taken from M. M. Mano, Digital Design, Prentice Hall, 1984, p. 235.

Example 1.3 We wish to design a synchronous sequential circuit whose state diagram is
shown in Figure 13. The type of flip-flop to be used is JK.

Figure 13. State diagram
From the state diagram, we can generate the state table shown in Table 9. Note that
there is no output section for this circuit. Two flip-flops are needed to represent the four
states and are designated Q0Q1. The input variable is labeled x.
Present State    Next State
Q0 Q1            x=0    x=1

0 0              0 0    0 1
0 1              1 0    0 1
1 0              1 0    1 1
1 1              1 1    0 0
Table 9. State table.

We shall now derive the excitation table and the combinational structure. The table is
now arranged in a different form, shown in Table 11, where the present state and input
variables are arranged in the form of a truth table. Remember, the excitation table for the JK
flip-flop, derived earlier, is given in Table 10.

Table 10. Excitation table for JK flip-flop

Output Transitions    Flip-flop Inputs
Q → Q(next)           J  K
0 → 0                 0  X
0 → 1                 1  X
1 → 0                 X  1
1 → 1                 X  0

Table 11. Excitation table of the circuit

Present State    Next State    Input    Flip-flop Inputs
Q0 Q1            Q0 Q1         x        J0 K0   J1 K1

0 0 0 0 0 0 X 0 X
0 0 0 1 1 0 X 1 X
0 1 1 0 0 1 X X 1
0 1 0 1 1 0 X X 0
1 0 1 0 0 X 0 0 X
1 0 1 1 1 X 0 1 X
1 1 1 1 0 X 0 X 0
1 1 0 0 1 X 1 X 1
In the first row of Table 11, we have a transition for flip-flop Q0 from 0 in the present
state to 0 in the next state. In Table 10 we find that a transition of states from 0 to 0
requires that input J = 0 and input K = X. So 0 and X are copied in the first row under
J0 and K0 respectively. Since the first row also shows a transition for the flip-flop Q1
from 0 in the present state to 0 in the next state, 0 and X are copied in the first row
under J1 and K1. This process is continued for each row of the table and for each
flip-flop, with the input conditions as specified in Table 10.

The simplified Boolean functions for the combinational circuit can now be derived. The
input variables are Q0, Q1, and x; the outputs are the variables J0, K0, J1 and K1. The
information from the truth table is plotted on the Karnaugh maps shown in Figure 14.

Figure 14. Karnaugh Maps

The flip-flop input functions are derived:

J0 = Q1*x'    K0 = Q1*x
J1 = x        K1 = Q0'*x' + Q0*x = Q0 ⊙ x
Note: the symbol ⊙ denotes exclusive-NOR.

The logic diagram is drawn in Figure 15.

Figure 15. Logic diagram of the sequential circuit.

Example 1.4 Design a sequential circuit whose state table is specified in Table 12,
using D flip-flops.

Table 12. State table of a sequential circuit.

Present State    Next State       Output
Q0 Q1            x=0      x=1     x=0   x=1

0 0              0 0      0 1     0     0
0 1              0 0      1 0     0     0
1 0              1 1      1 0     0     0
1 1              0 0      0 1     0     1
Table 13. Excitation table for a D flip-flop.

Output Transitions    Flip-flop Input
Q → Q(next)           D
0 → 0                 0
0 → 1                 1
1 → 0                 0
1 → 1                 1
The next step is to derive the excitation table for the circuit being designed, which is
shown in Table 14. The output of the circuit is labeled Z.

Present State    Next State    Input    Flip-flop Inputs    Output
Q0 Q1            Q0 Q1         x        D0 D1               Z

0 0 0 0 0 0 0 0
0 0 0 1 1 0 1 0
0 1 0 0 0 0 0 0
0 1 1 0 1 1 0 0
1 0 1 1 0 1 1 0
1 0 1 0 1 1 0 0
1 1 0 0 0 0 0 0
1 1 0 1 1 0 1 1
Table 14. Excitation table

Now plot the flip-flop input and output functions on the Karnaugh map to derive the
Boolean expressions, which are shown in Figure 16.

Figure 16. Karnaugh maps

The simplified Boolean expressions are:


D0 = Q0*Q1' + Q0'*Q1*x
D1 = Q0'*Q1'*x + Q0*Q1*x + Q0*Q1'*x'
Z = Q0*Q1*x
Finally, draw the logic diagram.

Figure 17. Logic diagram of the sequential circuit.
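The simplified D equations and the output Z can likewise be checked against the specified state table (Table 12) with a short Python sketch (ours):

```python
# Verify Example 1.4: since Q(next) = D for a D flip-flop, evaluating the
# simplified equations must reproduce the state table (Table 12).

def next_and_output(q0, q1, x):
    # D0 = Q0*Q1' + Q0'*Q1*x
    d0 = q0 & (1 - q1) | (1 - q0) & q1 & x
    # D1 = Q0'*Q1'*x + Q0*Q1*x + Q0*Q1'*x'
    d1 = (1 - q0) & (1 - q1) & x | q0 & q1 & x | q0 & (1 - q1) & (1 - x)
    # Z = Q0*Q1*x
    z = q0 & q1 & x
    return (d0, d1), z

# Table 12: present (Q0, Q1) -> ((next state, Z) for x = 0, same for x = 1).
table12 = {
    (0, 0): (((0, 0), 0), ((0, 1), 0)),
    (0, 1): (((0, 0), 0), ((1, 0), 0)),
    (1, 0): (((1, 1), 0), ((1, 0), 0)),
    (1, 1): (((0, 0), 0), ((0, 1), 1)),
}
ok = all(next_and_output(*s, x) == table12[s][x]
         for s in table12 for x in (0, 1))
```

All eight rows of the excitation table (Table 14) are reproduced, so the Karnaugh-map simplification is consistent with the specification.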

Register Transfer Language (RTL) is a term used in computer science. It is an
intermediate representation used by the GCC compiler.

RTL is used to represent the code being generated, in a form closer to assembly
language than to the high level languages which GCC compiles. RTL is generated from
the GCC Abstract Syntax Tree representation, transformed by various passes in the

GCC 'middle-end', and then converted to assembly language. GCC currently uses the
RTL form to do a part of its optimization work.

RTL is usually written in a form which looks like a Lisp S-expression:


(set:SI (reg:SI 140) (plus:SI (reg:SI 138) (reg:SI 139)))
This "side-effect expression" says "add the contents of register 138 to the contents of
register 139 and store the result in register 140."

The RTL generated for a program is different when GCC generates code for different
processors. However, the meaning of the RTL is more-or-less independent of the
target: it would usually be possible to read and understand a piece of RTL without
knowing what processor it was generated for. Similarly, the meaning of the RTL doesn't
usually depend on the original high-level language of the program.
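To make the reading of such expressions concrete, here is a toy Python evaluator of ours (unrelated to GCC's actual implementation; machine modes such as :SI are omitted) for the side-effect expression shown above, with registers modeled as a dictionary:

```python
# Toy evaluator for a tiny subset of RTL-like expressions, represented as
# nested tuples. Registers are a dict mapping register number to value.

def eval_rtl(regs, expr):
    op = expr[0]
    if op == 'reg':
        return regs[expr[1]]              # (reg N): read register N
    if op == 'plus':
        return eval_rtl(regs, expr[1]) + eval_rtl(regs, expr[2])
    if op == 'set':
        # (set (reg N) <value>): the side effect stores into register N.
        regs[expr[1][1]] = eval_rtl(regs, expr[2])
        return None
    raise ValueError(f"unknown RTL operator: {op}")

# (set (reg 140) (plus (reg 138) (reg 139)))
regs = {138: 2, 139: 3}
eval_rtl(regs, ('set', ('reg', 140), ('plus', ('reg', 138), ('reg', 139))))
```

After evaluation, register 140 holds the sum of registers 138 and 139, which is exactly what the quoted side-effect expression describes.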

LESSON IV

Memory and Storage

Types of memory

Many types of memory devices are available for use in modern computer
systems. As an embedded software engineer, you must be aware of the differences
between them and understand how to use each type effectively. In our discussion, we
will approach these devices from the software developer's perspective. Keep in mind
that the development of these devices took several decades and that their underlying
hardware differs significantly. The names of the memory types frequently reflect the
historical nature of the development process and are often more confusing than
insightful. Figure 1 classifies the memory devices we'll discuss as RAM, ROM, or a hybrid
of the two.

Figure 1. Common memory types in embedded systems

Types of RAM
The RAM family includes two important memory devices: static RAM (SRAM) and
dynamic RAM (DRAM). The primary difference between them is the lifetime of the data
they store. SRAM retains its contents as long as electrical power is applied to the chip. If
the power is turned off or lost temporarily, its contents will be lost forever. DRAM, on
the other hand, has an extremely short data lifetime, typically about four milliseconds.
This is true even when power is applied constantly.

In short, SRAM has all the properties of the memory you think of when you hear the
word RAM. Compared to that, DRAM seems kind of useless. By itself, it is. However, a
simple piece of hardware called a DRAM controller can be used to make DRAM behave
more like SRAM. The job of the DRAM controller is to periodically refresh the data stored
in the DRAM. By refreshing the data before it expires, the contents of memory can be
kept alive for as long as they are needed. So DRAM is as useful as SRAM after all.
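The refresh idea can be illustrated with a toy Python model (ours; the retention time and refresh interval are made-up step counts, not real DRAM parameters): a cell's contents decay after a fixed number of time steps unless a controller refreshes it first.

```python
# Toy model of DRAM refresh: a cell loses its charge after RETENTION
# time steps unless it is refreshed (rewritten) before it expires.

RETENTION = 4   # illustrative number of steps before an unrefreshed cell decays

class DRAMCell:
    def __init__(self, value):
        self.value = value
        self.age = 0

    def tick(self):
        self.age += 1
        if self.age > RETENTION:
            self.value = 0          # charge leaked away: data lost

    def refresh(self):
        if self.age <= RETENTION:   # rewrite the value before it expires
            self.age = 0

def run(steps, refresh_every):
    """Store a 1 and return what the cell holds after the given steps."""
    cell = DRAMCell(1)
    for t in range(1, steps + 1):
        if refresh_every and t % refresh_every == 0:
            cell.refresh()          # the "DRAM controller" at work
        cell.tick()
    return cell.value
```

With periodic refresh the stored bit survives indefinitely; without it, the bit is gone as soon as the retention time elapses, which is why DRAM always needs a controller.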

When deciding which type of RAM to use, a system designer must consider access time
and cost. SRAM devices offer extremely fast access times (approximately four times
faster than DRAM) but are much more expensive to produce. Generally, SRAM is used
only where access speed is extremely important. A lower cost-per-byte makes DRAM
attractive whenever large amounts of RAM are required. Many embedded systems
include both types: a small block of SRAM (a few kilobytes) along a critical data path
and a much larger block of DRAM (perhaps even megabytes) for everything else.

Types of ROM
Memories in the ROM family are distinguished by the methods used to write new data to
them (usually called programming), and the number of times they can be rewritten.
This classification reflects the evolution of ROM devices from hardwired to
programmable to erasable-and-programmable. A common feature of all these devices is
their ability to retain data and programs forever, even during a power failure.

The very first ROMs were hardwired devices that contained a preprogrammed set of
data or instructions. The contents of the ROM had to be specified before chip
production, so the actual data could be used to arrange the transistors inside the chip.
Hardwired memories are still used, though they are now called masked ROMs to
distinguish them from other types of ROM. The primary advantage of a masked ROM is
its low production cost. Unfortunately, the cost is low only when large quantities of the
same ROM are required.

One step up from the masked ROM is the PROM (programmable ROM), which is
purchased in an unprogrammed state. If you were to look at the contents of an
unprogrammed PROM, you would see that the data is made up entirely of 1's. The
process of writing your data to the PROM involves a special piece of equipment called a
device programmer. The device programmer writes data to the device one word at a
time by applying an electrical charge to the input pins of the chip. Once a PROM has
been programmed in this way, its contents can never be changed. If the code or data
stored in the PROM must be changed, the current device must be discarded. As a result,
PROMs are also known as one-time programmable (OTP) devices.

An EPROM (erasable-and-programmable ROM) is programmed in exactly the same
manner as a PROM. However, EPROMs can be erased and reprogrammed repeatedly.
To erase an EPROM, you simply expose the device to a strong source of ultraviolet light.
(A window in the top of the device allows the light to reach the silicon.) By doing this,
you essentially reset the entire chip to its initial, unprogrammed state. Though more
expensive than PROMs, their ability to be reprogrammed makes EPROMs an essential
part of the software development and testing process.

Hybrids
As memory technology has matured in recent years, the line between RAM and ROM
has blurred. Now, several types of memory combine features of both. These devices do
not belong to either group and can be collectively referred to as hybrid memory devices.
Hybrid memories can be read and written as desired, like RAM, but maintain their
contents without electrical power, just like ROM. Two of the hybrid devices, EEPROM
and flash, are descendants of ROM devices. These are typically used to store code. The
third hybrid, NVRAM, is a modified version of SRAM. NVRAM usually holds persistent
data.

EEPROMs are electrically-erasable-and-programmable. Internally, they are similar to


EPROMs, but the erase operation is accomplished electrically, rather than by exposure
to ultraviolet light. Any byte within an EEPROM may be erased and rewritten. Once
written, the new data will remain in the device until it is electrically
erased. The primary tradeoff for this improved functionality is higher cost, though write
cycles are also significantly longer than writes to a RAM. So you wouldn't want to use an
EEPROM for your main system memory.

Flash memory combines the best features of the memory devices described thus far.
Flash memory devices are high density, low cost, nonvolatile, fast (to read, but not to
write), and electrically reprogrammable. These advantages are overwhelming and, as a
direct result, the use of flash memory has increased dramatically in embedded systems.
From a software viewpoint, flash and EEPROM technologies are very similar. The major
difference is that flash devices can only be erased one sector at a time, not
byte-by-byte. Typical sector sizes are in the range 256 bytes to 16KB. Despite this
disadvantage, flash is much more popular than EEPROM and is rapidly displacing many
of the ROM devices as well.
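The software-visible difference between the two can be sketched in Python (our illustration; the 256-byte sector size and 0xFF erased state are typical but not universal): flash programming can only clear bits, so changing a single byte forces a read-erase-rewrite of its whole sector, whereas an EEPROM could rewrite just the one byte.

```python
# Toy model of NOR-flash-style storage: erase works on whole sectors
# (setting them to 0xFF), and programming can only clear bits (1 -> 0).

SECTOR_SIZE = 256   # a typical, illustrative sector size

class Flash:
    def __init__(self, n_sectors):
        self.mem = bytearray([0xFF] * (SECTOR_SIZE * n_sectors))

    def erase_sector(self, sector):
        start = sector * SECTOR_SIZE
        self.mem[start:start + SECTOR_SIZE] = bytes([0xFF]) * SECTOR_SIZE

    def program(self, addr, value):
        self.mem[addr] &= value     # can clear bits, never set them

def rewrite_byte(flash, addr, value):
    """Changing one byte costs a read-modify-erase-rewrite of its sector."""
    sector = addr // SECTOR_SIZE
    start = sector * SECTOR_SIZE
    saved = bytes(flash.mem[start:start + SECTOR_SIZE])  # read the sector
    flash.erase_sector(sector)                           # erase it wholesale
    for i, b in enumerate(saved):                        # write everything back
        flash.program(start + i, value if start + i == addr else b)

flash = Flash(2)
flash.program(3, 0xAB)      # fresh (erased) bytes can be programmed directly
rewrite_byte(flash, 3, 0x12)  # changing them again needs the full sector dance
```

This is the "slow to erase/write" entry in Table 1 made concrete: the sector granularity, not the read path, is what makes in-place updates expensive on flash.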

The third member of the hybrid memory class is NVRAM (non-volatile RAM).
Non-volatility is also a characteristic of the ROM and hybrid memories discussed previously.
However, an NVRAM is physically very different from those devices. An NVRAM is
usually just an SRAM with a battery backup. When the power is turned on, the NVRAM
operates just like any other SRAM. When the power is turned off, the NVRAM draws just
enough power from the battery to retain its data. NVRAM is fairly common in embedded
systems. However, it is expensive (even more expensive than SRAM, because of the
battery), so its applications are typically limited to the storage of a few hundred bytes of
system-critical information that can't be stored in any better way.

Table 1 summarizes the features of each type of memory discussed here, but keep in
mind that different memory types serve different purposes. Each memory type has its
strengths and weaknesses. Side-by-side comparisons are not always effective.
Type        Volatile?  Writeable?                      Erase Size   Max Erase Cycles             Cost (per Byte)             Speed
SRAM        Yes        Yes                             Byte         Unlimited                    Expensive                   Fast
DRAM        Yes        Yes                             Byte         Unlimited                    Moderate                    Moderate
Masked ROM  No         No                              n/a          n/a                          Inexpensive                 Fast
PROM        No         Once, with a device programmer  n/a          n/a                          Moderate                    Fast
EPROM       No         Yes, with a device programmer   Entire chip  Limited (consult datasheet)  Moderate                    Fast
EEPROM      No         Yes                             Byte         Limited (consult datasheet)  Expensive                   Fast to read, slow to erase/write
Flash       No         Yes                             Sector       Limited (consult datasheet)  Moderate                    Fast to read, slow to erase/write
NVRAM       No         Yes                             Byte         Unlimited                    Expensive (SRAM + battery)  Fast
Table 1. Characteristics of the various memory types

Computer storage, computer memory, and often casually memory refer to
computer components, devices and recording media that retain data for some interval
of time. Computer storage provides one of the core functions of the modern computer,
that of information retention. It is one of the fundamental components of all modern
computers, and
coupled with a central processing unit (CPU), implements the basic Von Neumann
computer model used since the 1940s.

In contemporary usage, memory usually refers to a form of solid state storage known
as random access memory (RAM) and sometimes other forms of fast but temporary
storage. Similarly, storage more commonly refers to mass storage - optical discs,
forms of magnetic storage like hard disks, and other types of storage which are slower
than RAM, but of a more permanent nature. These contemporary distinctions are
helpful, because they are also fundamental to the architecture of computers in general.
As well, they reflect an important and significant technical difference between memory
and mass storage devices, which has been blurred by the historical usage of the terms
"main storage" (and sometimes "primary storage") for random access memory, and
"secondary storage" for mass storage devices. This is explained in the following
sections, in which the traditional "storage" terms are used as sub-headings for
convenience.

Purposes of storage
The fundamental components of a general-purpose computer are the arithmetic and logic
unit, control circuitry, storage space, and input/output devices. If storage were removed,
the device we had would be a simple digital signal processing device (e.g. a calculator or
media player) instead of a computer. The ability to store instructions that form a

computer program, and the information that the instructions manipulate is what makes
stored program architecture computers versatile.

A Digital computer represents information using the binary numeral system. Text,
numbers, pictures, audio, and nearly any other form of information can be converted
into a string of bits, or binary digits, each of which has a value of 1 or 0. The most
common unit of storage is the byte, equal to 8 bits. A piece of information can be
manipulated by any computer whose storage space is large enough to accommodate
the corresponding data, or the binary representation of the piece of information. For
example, a computer with a storage space of eight million bits, or one megabyte, could
be used to edit a small novel.

Various forms of storage, based on various natural phenomena, have been invented. So
far, no practical universal storage medium exists, and all forms of storage have some
drawbacks. Therefore a computer system usually contains several kinds of storage,
each with an individual purpose, as shown in the diagram.

Various forms of storage, divided according to their distance from the central processing
unit. Additionally, common technology and capacity found in home computers of 2005 is shown.

Primary storage
Primary storage is directly
connected to the central processing unit of
the computer. It must be present for the CPU
to function correctly, just as in a biological
analogy the lungs must be present ( f o r
oxygen storage) for the heart to function
(to pump and oxygenate the blood). As shown
in the diagram, primary storage typic ally
consists of three kinds of storage:

· Processor registers are internal to


the central processing unit. Re gis ters
contain information that the arithmetic
and logic unit needs to carry out the
current instruction. They are technically
the fastest of all forms of computer
storage, being switching
transistors integrated on the C P U ' s
silicon chip, and functioning as e le ctro nic
"flip-flops".

· Cache memory is a special type of internal memory used by many central
processing units to increase their performance or "throughput". Some of the
information in the main memory is duplicated in the cache memory, which is
slightly slower but of much greater capacity than the processor registers, and faster
but much smaller than main memory. Multi-level cache memory is also commonly
used - "primary cache" being smallest, fastest and closest to the processing device;
"secondary cache" being larger and slower, but still faster and much smaller than
main memory.

· Main memory contains the programs that are currently being run and the data the
programs are operating on. In modern computers, the main memory is the
electronic solid-state random access memory. It is directly connected to the CPU
via a "memory bus" (shown in the diagram) and a "data bus". The arithmetic and
logic unit can very quickly transfer information between a processor register and
locations in main storage, also known as a "memory addresses". The memory bus
is also called an address bus or front side bus and both busses are high-speed
digital "superhighways". Access methods and speed are two of the fundamental
technical differences between memory and mass storage devices. (Note that all
memory sizes and storage capacities shown in the diagram will inevitably be
exceeded with advances in technology over time.)
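The register/cache/main-memory hierarchy described above can be sketched as a toy model. The latency figures below are illustrative assumptions, not measurements of any real CPU; the point is that data duplicated into the cache makes a second access much cheaper:

```python
# Toy model of a cache in front of main memory. Latencies are assumed values.
LATENCY_NS = {"cache": 5.0, "main_memory": 50.0}

cache = {}                                               # small, fast
main_memory = {addr: addr * 2 for addr in range(1024)}   # large, slower

def read(addr):
    """Return (value, cost_ns): try the cache first, else go to main memory."""
    if addr in cache:
        return cache[addr], LATENCY_NS["cache"]
    value = main_memory[addr]
    cache[addr] = value            # duplicate the data into the cache
    return value, LATENCY_NS["main_memory"]

v1, cost1 = read(7)   # first access: miss, pays main-memory latency
v2, cost2 = read(7)   # second access: hit, pays only cache latency
print(cost1, cost2)   # 50.0 5.0
```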

Secondary and off-line storage


Secondary storage requires the computer to use its input/output channels to access
the information, and is used for long-term storage of persistent information. However
most computer operating systems also use secondary storage devices as virtual
memory - to artificially increase the apparent amount of main memory in the computer.
Secondary storage is also known as "mass storage", as shown in the diagram above.
Secondary or mass storage is typically of much greater capacity than primary storage
(main memory), but it is also much slower. In modern computers, hard disks are
usually used for mass storage. The time taken to access a given byte of information
stored on a hard disk is typically a few thousandths of a second, or milliseconds. By
contrast, the time taken to access a given byte of information stored in random access
memory is measured in thousand-millionths of a second, or nanoseconds. This
illustrates the very significant speed difference which distinguishes solid-state memory
from rotating magnetic storage devices: hard disks are typically about a million times
slower than memory. Rotating optical storage devices (such as CD and DVD drives) are
typically even slower than hard disks, although their access speeds are likely to improve
with advances in technology. Therefore, the use of virtual memory, which is millions of
times slower than "real" memory, significantly degrades the performance of any
computer. Virtual memory is implemented by many operating systems using mechanisms
such as a swap file or "cache file". The main historical advantage of virtual memory was that it
was much less expensive than real memory. That advantage is less relevant today, yet
surprisingly most operating systems continue to implement it, despite the significant
performance penalties.
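The "about a million times slower" claim above is simple arithmetic on the round figures quoted (milliseconds versus nanoseconds); the specific values below are assumptions within those ranges:

```python
# Back-of-the-envelope check of the disk-vs-RAM speed gap described above.
disk_access_s = 5e-3   # a few thousandths of a second (assume 5 ms)
ram_access_s = 5e-9    # a few thousand-millionths of a second (assume 5 ns)
ratio = disk_access_s / ram_access_s
print(round(ratio))    # 1000000 -- about a million times slower
```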

Off-line storage is a system where the storage medium can be easily removed from
the storage device. Off-line storage is used for data transfer and archival purposes. In
modern computers, compact discs, DVDs, memory cards, flash memory devices
including "USB drives", floppy disks, Zip disks and magnetic tapes are commonly used
for off-line mass storage purposes. "Hot-pluggable" USB hard disks are also available.
Off-line storage devices used in the past include punched cards, microforms, and
removable Winchester disk drums.

Tertiary and database storage


Tertiary storage is a system where a robotic arm will "mount" (connect) or
"dismount" off-line mass storage media (see above) according to the computer
operating system's demands. Tertiary storage is used in the realms of enterprise
storage and scientific computing on large computer systems and business computer
networks, and is something a typical personal computer user never sees firsthand.

Database storage is a system where information in computers is stored in large


databases, data banks, data warehouses, or data vaults. It involves packing large
numbers of storage devices into a series of shelves in a room, usually an office, all
linked together. The information in database storage systems can be accessed
by a supercomputer, mainframe computer, or personal computer. Databases, data
banks, and data warehouses, etc, can only be accessed by authorized users.

Network storage
Network storage is any type of computer storage that involves accessing information
over a computer network. Network storage arguably allows an organization to
centralize information management and to reduce the duplication of information. Network
storage includes:

· Network-attached storage is secondary or tertiary storage attached to a
computer which another computer can access over a local-area network, a private
wide-area network, or, in the case of online file storage, over the Internet.

· Network computers are computers that do not contain internal secondary
storage devices. Instead, documents and other data are stored on
network-attached storage.

Confusingly, these terms are sometimes used differently. Primary storage can be
used to refer to local random-access disk storage, which should properly be called
secondary storage. If this type of storage is called primary storage, then the term
secondary storage would refer to offline, sequential-access storage like tape media.

Characteristics of storage

The division into primary, secondary, tertiary and off-line storage is based on memory
hierarchy, or distance from the central processing unit. There are also other ways to
characterize various types of storage.

Volatility of information
Volatile memory requires constant power to maintain the stored information. Volatile
memory is typically used only for primary storage. (Primary storage is not necessarily
volatile, even though today's most cost-effective primary storage technologies are.
Non-volatile technologies have been widely used for primary storage in the past and may
again be in the future.)
Non-volatile memory will retain the stored information even if it is not constantly
supplied with electric power. It is suitable for long-term storage of information, and
therefore used for secondary, tertiary, and off-line storage.
Dynamic memory is volatile memory which also requires that stored information is
periodically refreshed, or read and rewritten without modifications.

Ability to access non-contiguous information


Random access means that any location in storage can be accessed at any moment in
the same, usually small, amount of time. This makes random access memory well suited
for primary storage.
Sequential access means that accessing a piece of information will take a varying
amount of time, depending on which piece of information was accessed last. The device
may need to seek (e.g. to position the read/write head correctly), or cycle (e.g. to wait
for the correct location in a revolving medium to appear below the read/write head).
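The dependence on the previous access can be modeled directly. In this toy sketch (the cost constant is an assumption, not a real drive parameter), reading a track costs time proportional to how far the head must seek from wherever it last was:

```python
# Toy model of sequential (seek-based) access: cost varies with head distance.
SEEK_NS_PER_TRACK = 100.0  # assumed cost per track of head movement

class SequentialDevice:
    def __init__(self):
        self.head = 0  # current head position (track number)

    def read(self, track):
        cost = abs(track - self.head) * SEEK_NS_PER_TRACK  # depends on last access
        self.head = track
        return cost

dev = SequentialDevice()
long_seek = dev.read(10)   # far from track 0: expensive
short_seek = dev.read(11)  # adjacent to the previous track: cheap
print(long_seek, short_seek)   # 1000.0 100.0
```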

Ability to change information


· Read/write storage, or mutable storage, allows information to be overwritten
at any time. A computer without some amount of read/write storage for primary
storage purposes would be useless for many tasks. Modern computers also typically
use read/write storage for secondary storage.

· Read only storage retains the information stored at the time of manufacture, and
write once storage (WORM) allows the information to be written only once at
some point after manufacture. These are called immutable storage. Immutable
storage is used for tertiary and off-line storage. Examples include CD-R.

· Slow write, fast read storage is read/write storage which allows information to
be overwritten multiple times, but with the write operation being much slower than
the read operation. Examples include CD-RW.

Addressability of information
· In location-addressable storage, each individually accessible unit of information
in storage is selected with its numerical memory address. In modern computers,
location-addressable storage is usually limited to primary storage, accessed internally
by computer programs, since location-addressability is very efficient, but
burdensome for humans.

· In file system storage, information is divided into files of variable length, and a
particular file is selected with human-readable directory and file names. The
underlying device is still location-addressable, but the operating system of a
computer provides the file system abstraction to make the operation more
understandable. In modern computers, secondary, tertiary and off-line storage use
file systems.

· In content-addressable storage, each individually accessible unit of information
is selected with a hash value, a short identifier pertaining to the memory address
the information is stored at. Content-addressable storage can be implemented using
software (a computer program) or hardware (a computer device), with hardware
being the faster but more expensive option.
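A software implementation of content addressing can be sketched in a few lines: the address of each unit of information is simply a hash of its own content (here SHA-256, a choice made for this sketch):

```python
# Minimal content-addressable store: the content's hash is its address.
import hashlib

store = {}

def put(data: bytes) -> str:
    key = hashlib.sha256(data).hexdigest()  # derive the address from the content
    store[key] = data
    return key

def get(key: str) -> bytes:
    return store[key]

key = put(b"hello storage")
print(get(key) == b"hello storage")   # True
```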

Capacity and performance


· Storage capacity is the total amount of stored information that a storage device
or medium can hold. It is expressed as a quantity of bits or bytes (e.g. 10.4
megabytes).

· Storage density refers to the compactness of stored information. It is the storage


capacity of a medium divided by a unit of length, area or volume (e.g. 1.2
megabytes per square centimeter).

· Latency is the time it takes to access a particular location in storage. The relevant
unit of measurement is typically nanosecond for primary storage, millisecond for
secondary storage, and second for tertiary storage. It may make sense to separate
read latency and write latency, and in case of sequential access storage, minimum,
maximum and average latency.

· Throughput is the rate at which information can be read from or written to
storage. In computer storage, throughput is usually expressed in terms of
megabytes per second or MB/s, though bit rate may also be used. As with latency,
read rate and write rate may need to be differentiated.
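Latency and throughput combine when estimating a transfer: pay the latency once, then divide the size by the throughput. The device figures below are hypothetical:

```python
# Estimated time to read a file: one latency charge plus size over throughput.
def read_time_s(size_mb, latency_s, throughput_mb_per_s):
    return latency_s + size_mb / throughput_mb_per_s

# A hypothetical hard disk: 10 ms latency, 100 MB/s throughput, 500 MB file.
print(read_time_s(500, 0.010, 100.0))   # 5.01 seconds
```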

Technologies, devices and media

Magnetic storage
Magnetic storage uses different patterns of magnetization on a magnetically coated
surface to store information. Magnetic storage is non-volatile. The information is
accessed using one or more read/write heads. Since the read/write head only covers a
part of the surface, magnetic storage is sequential access and must seek, cycle or both.
In modern computers, the magnetic surface will take these forms:

· Magnetic disk

· Floppy disk, used for off-line storage

· Hard disk, used for secondary storage

· Magnetic tape, used for tertiary and off-line storage

In early computers, magnetic storage was also used for primary storage, in the form of
magnetic drum memory, core memory, core rope memory, thin film memory, twistor memory
or bubble memory. Also unlike today, magnetic tape was often used for secondary
storage.

Semiconductor storage
Semiconductor memory uses semiconductor-based integrated circuits to store
information. A semiconductor memory chip may contain millions of tiny transistors or
capacitors. Both volatile and non-volatile forms of semiconductor memory exist. In
modern computers, primary storage almost exclusively consists of dynamic volatile
semiconductor memory or dynamic random access memory. Since the turn of the
century, a type of non-volatile semiconductor memory known as flash memory has
steadily gained share as off-line storage for home computers. Non-volatile
semiconductor memory is also used for secondary storage in various advanced
electronic devices and specialized computers.

Optical disc storage


Optical disc storage uses tiny pits etched on the surface of a circular disc to store
information, and reads this information by illuminating the surface with a laser diode
and observing the reflection. Optical disc storage is non-volatile and sequential access.
The following forms are currently in common use:

· CD, CD-ROM, DVD: Read only storage, used for mass distribution of digital
information (music, video, computer programs)

· CD-R, DVD-R, DVD+R: Write once storage, used for tertiary and off-line storage

· CD-RW, DVD-RW, DVD+RW, DVD-RAM: Slow write, fast read storage, used for
tertiary and off-line storage

· Blu-ray

· HD DVD

The following forms have also been proposed:

· HVD

· Phase-change Dual

Magneto-optical disc storage


Magneto-optical disc storage is optical disc storage where the magnetic state on a
ferromagnetic surface stores information. The information is read optically and written
by combining magnetic and optical methods. Magneto-optical disc storage is
non-volatile, sequential access, slow write, fast read storage used for tertiary and
off-line storage.

Ultra Density Optical disc storage


An Ultra Density Optical disc, or UDO, is a 5.25" ISO cartridge optical disc encased
in a dust-proof caddy which can store up to 30 GB of data. Utilizing a design based on
the magneto-optical disc, but combining phase-change technology with a blue-violet
laser, a UDO disc can store substantially more data than a magneto-optical disc (MO)
because of the shorter wavelength (405 nm) of the blue-violet laser employed. MOs use
a 650-nm-wavelength red laser. Because the blue-violet laser's beam is narrower when
burning to a disc, it allows more information to be stored digitally in the same
amount of space.

Current generations of UDO store up to 30 GB, but 60 GB and 120 GB versions of UDO
are in development and are expected to arrive sometime in 2007 and beyond, though
up to 500 GB has been speculated as a possibility for UDO. [1]

Optical Jukebox storage


Optical jukebox storage is a robotic storage device that utilizes optical disk drives
and can automatically load and unload optical disks, providing terabytes of near-line
information. The devices are often called optical disk libraries, robotic drives, or auto
changers. Jukebox devices may have up to 1,000 slots for disks, and usually have a
picking device that traverses the slots and drives. The arrangement of the slots and
picking devices affects performance, depending on the space between a disk and the
picking device. Seek times and transfer rates vary depending upon the optical
technology. Jukeboxes are used in high-capacity archive storage environments such as
imaging, medical, and video. HSM (hierarchical storage management) is a strategy that moves little-used or unused files
from fast magnetic storage to optical jukebox devices in a process called migration. If
the files are needed, they are migrated back to magnetic disk.

Other early methods


Paper tape and punch cards have been used to store information for automatic
processing since the 1890s, long before general-purpose computers existed.
Information was recorded by punching holes into the paper or cardboard medium, and
was read by electrically (or, later, optically) sensing whether a particular location on the
medium was solid or contained a hole.

The Williams tube used a cathode ray tube, and the Selectron tube used a large
vacuum tube, to store information. These primary storage devices were short-lived in
the market, since the Williams tube was unreliable and the Selectron tube was
expensive.

Delay line memory used sound waves in a substance such as mercury to store
information. Delay line memory was dynamic volatile, cycle sequential read/write
storage, and was used for primary storage.

Other proposed methods


Phase-change memory uses different mechanical phases of phase change material to
store information, and reads the information by observing the varying electric
resistance of the material. Phase-change memory would be non-volatile, random access
read/write storage, and might be used for primary, secondary and off-line storage.

Holographic storage stores information optically inside crystals or photopolymers.


Holographic storage can utilize the whole volume of the storage medium, unlike optical
disc storage which is limited to a small number of surface layers. Holographic storage
would be non-volatile, sequential access, and either write once or read/write storage. It
might be used for secondary and off-line storage.

Molecular memory stores information in polymers that can store electric charge.
Molecular memory might be especially suited for primary storage.

LESSON V

A Simple computer: Hardware Design

A computer program is a collection of instructions that describe a task, or set of
tasks, to be carried out by a computer.

The term computer program may refer to source code, written in a programming
language, or to the executable form of this code. Computer programs are also known as
software, applications programs, system software or simply programs.

The source code of most computer programs consists of a list of instructions that
explicitly implement an algorithm (known as an imperative programming style); in
another form (known as declarative programming) the characteristics of the required
information are specified and the method used to obtain the results, if any, is left to the
platform.

Computer programs are often written by people known as computer programmers, but
may also be generated by other programs.

Terminology
Commercial computer programs aimed at end-users are commonly referred to as
application software by the computer industry, as these programs are focused on the
functionality of what the computer is being used for (its application), as opposed to
being focused on system-level functionality (for example, as the Windows operating
system software is). In practice, colloquially, both application software and system
software may correctly be referred to as programs, as may be the more esoteric
firmware—software firmly built into an embedded system. Programs that execute on the
hardware are sets of instructions in a format understandable by the instruction set of
the computer's main processor; each instruction causes other instructions to execute
or performs a simple computation like addition. A computer processes millions of such
instructions per second, and the program is the sequence of instructions strung
together such that, when executed, they do something useful, usually repeatably and
reliably.

For differences in the usage of the spellings program and programme, see American
and British English spelling differences.

Program execution
A computer program is loaded into memory (usually by the operating system) and then
executed ("run"), instruction by instruction, until termination, either with success or
through software or hardware error.

Before a computer can execute any sort of program (including the operating system,
itself a program) the computer hardware must be initialized. This initialization is done in
modern PCs by a piece of software stored on programmable memory chips installed by

the manufacturer, called the BIOS. The BIOS will attempt to initialize the boot
sequence, making the computer ready for higher-level program execution.

Programs vs. data


The executable form of a program (that is, usually object code) is often treated as being
different from the data the program operates on. In some cases this distinction is
blurred, with programs creating or modifying data which is subsequently executed as
part of the same program (a common occurrence for programs written in Lisp); see
self-modifying code.

Programming
Main article: Computer programming
A program is likely to contain a variety of data structures and a variety of different
algorithms to operate on them.

Creating a computer program is the iterative process of writing new source code or
modifying existing source code, followed by testing, analyzing and refining this code. A
person who practices this skill is referred to as a computer programmer or software
developer. The sometimes lengthy process of computer programming is now referred
to as "software development" or software engineering, the latter becoming more
popular due to the increasing maturity of the discipline. (See Debate over who is a
software engineer.)

Two other modern approaches are team programming, in which each member of the
group has an equal say in the development process except for one person who guides
the group through discrepancies; these groups tend to be around 10 people to keep
them manageable. The second form is referred to as "peer programming" or pair
programming.

See Process and methodology for the different aspects of modern day computer
programming.

Trivia
The world's shortest useful program is usually agreed upon to be the utility cont/rerun
used on the old operating system CP/M. It was 2 bytes long (JMP 100), jumping to the
start position of the program that had previously been run and so restarting the
program, in memory, without loading it from the much slower disks of the 1980s.

According to the International Obfuscated C Code Contest, the world's smallest


"program" consisted of a file containing zero bytes, which when run output zero bytes
to the screen (also making it the world's smallest self-replicating program). This

"program" was qualified as such only due to a flaw in the language of the contest rules,
which were soon after modified to require the program to be greater than zero bytes.

Ada Lovelace wrote a set of notes specifying in complete detail a method for calculating
Bernoulli numbers with the Analytical Engine described by Charles Babbage. This is
recognized as the world's first computer program and she is recognized as the world's
first computer programmer by historians.

In computer science, a data structure is a way of storing data in a computer so that it


can be used efficiently. Often a carefully chosen data structure will allow a more
efficient algorithm to be used. The choice of the data structure often begins from the
choice of an abstract data structure. A well-designed data structure allows a variety of
critical operations to be performed, using as few resources, both execution time and
memory space, as possible. Data structures are implemented using the data types,
references and operations on them provided by a programming language.

Different kinds of data structures are suited to different kinds of applications, and some
are highly specialized to certain tasks. For example, B-trees are particularly well-suited
for implementation of databases, while routing tables rely on networks of machines to
function.

[Figure: A binary tree, a simple type of branching linked data structure.]



In the design of many types of programs, the choice of data structures is a primary
design consideration, as experience in building large systems has shown that the
difficulty of implementation and the quality and performance of the final result depends
heavily on choosing the best data structure. After the data structures are chosen, the
algorithms to be used often become relatively obvious. Sometimes things work in the
opposite direction - data structures are chosen because certain key tasks have
algorithms that work best with particular data structures. In either case, the choice of
appropriate data structures is crucial.

This insight has given rise to many formalized design methods and programming
languages in which data structures, rather than algorithms, are the key organizing
factor. Most languages feature some sort of module system, allowing data structures to
be safely reused in different applications by hiding their verified implementation details
behind controlled interfaces. Object-oriented programming languages such as C++ and
Java in particular use classes for this purpose.

Since data structures are so crucial to professional programs, many of them enjoy
extensive support in standard libraries of modern programming languages and
environments, such as C++'s Standard Template Library, the Java API, and the
Microsoft .NET Framework.

The fundamental building blocks of most data structures are arrays, records,
discriminated unions, and references. For example, the nullable reference, a reference
which can be null, is a combination of references and discriminated unions, and the
simplest linked data structure, the linked list, is built from records and nullable
references.
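The claim above, that a linked list is built from records and nullable references, can be shown directly. In this Python sketch (the class and function names are our own), each record is a small object and None plays the role of the null reference marking the end of the list:

```python
# A linked list built from records plus nullable references.
class Node:
    def __init__(self, value, next_node=None):
        self.value = value     # the record's data field
        self.next = next_node  # nullable reference to the next record

def to_list(head):
    out = []
    while head is not None:    # follow references until the null one
        out.append(head.value)
        head = head.next
    return out

head = Node(1, Node(2, Node(3)))   # 1 -> 2 -> 3 -> None
print(to_list(head))               # [1, 2, 3]
```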

There is some debate about whether data structures represent implementations or


interfaces. How they are seen may be a matter of perspective. A data structure can be
viewed as an interface between two functions or as an implementation of methods to
access storage that is organized according to the associated data type.

Common data structures


Main article: List of data structures
· stacks

· queues

· linked lists

· trees

· graphs
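The first two structures in the list can be demonstrated in a few lines using Python's built-in types, a convenience of this sketch rather than a statement about how they must be implemented:

```python
# Quick sketches of a stack (LIFO) and a queue (FIFO).
from collections import deque

stack = []                 # last in, first out
stack.append("a")
stack.append("b")
print(stack.pop())         # 'b' -- the most recently pushed item

queue = deque()            # first in, first out
queue.append("a")
queue.append("b")
print(queue.popleft())     # 'a' -- the earliest item
```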

The arithmetic logic unit (ALU) is a digital circuit that performs arithmetic
operations (like addition and subtraction) and logic operations (like Exclusive Or)
between two numbers. The ALU is a fundamental building block of the central
processing unit of a computer.

Many types of electronic circuits need to perform some type of arithmetic operation, so
even the circuit inside a digital watch will have a tiny ALU that keeps adding 1 to the
current time, and keeps checking if it should beep the timer, etc...

By far, the most complex electronic circuits are those that are built inside the chip of
modern microprocessors like the Pentium. Therefore, these processors have inside them
a powerful and very complex ALU. In fact, a modern microprocessor (or mainframe)
may have multiple cores, each core with multiple execution units, each with multiple
ALUs.

Many other circuits may contain ALUs inside: GPUs like the ones in NVidia and ATI
graphic cards, FPUs like the old 80387 co-processor, and digital signal processor like the
ones found in Sound Blaster sound cards, CD players and High-Definition TVs. All of
these have several powerful and complex ALUs inside.

[Figure: A typical schematic symbol for an ALU. A and B are operands; R is the output;
F is the input from the control unit; D is an output status.]

History: Von Neumann's proposal


Mathematician John von Neumann proposed the ALU concept in 1945, when he wrote a
report on the foundations for a new computer called the EDVAC (Electronic Discrete
Variable Automatic Computer). Later in 1946, he worked with his colleagues in
designing a computer for the Institute for Advanced Study (IAS) in Princeton. The IAS
computer became the prototype for many later computers. In the proposal, von
Neumann outlined what he believed would be needed in his machine, including an ALU.

Von Neumann stated that an ALU is a necessity for a computer because it is guaranteed
that a computer will have to compute basic mathematical operations, including addition,
subtraction, multiplication, and division.[1] He therefore believed it was "reasonable
that [the computer] should contain specialized organs for these operations."[2]

Numerical Systems
An ALU must process numbers using the same format as the rest of the digital circuit.
For modern processors, that almost always is the two's complement binary number
representation. Early computers used a wide variety of number systems, including one's
complement, sign-magnitude format, and even true decimal systems, with ten tubes
per digit.

ALUs for each one of these numeric systems had different designs, and that influenced
the current preference for two's complement, as this is the representation that makes it
easier for the ALUs to calculate additions and subtractions.
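Why two's complement makes addition and subtraction easy can be seen in a short sketch: negating a number is just inverting its bits and adding one, so subtraction reduces to ordinary binary addition modulo the word size. An 8-bit word is assumed here:

```python
# Two's complement on an assumed 8-bit word.
BITS = 8
MASK = 2 ** BITS - 1  # 0xFF

def twos_complement(value):
    """Return the 8-bit two's complement bit pattern of a small integer."""
    return format(value & MASK, f"0{BITS}b")

print(twos_complement(5))    # '00000101'
print(twos_complement(-5))   # '11111011'

# Subtraction is just addition of the complement, modulo 2**BITS: 5 - 3 == 2.
print((5 + (-3 & MASK)) & MASK)   # 2
```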

Practical overview

[Figure: A simple 2-bit ALU that does AND, OR, XOR, and addition.]

Most of the computer's actions are performed by the ALU. The ALU gets data from
processor registers. This data is processed and the results of this operation are
stored into ALU output registers. Other mechanisms move data between these registers
and memory.[3]

A control unit controls the ALU by setting circuits that tell the ALU what operations
to perform.

Simple Operations
Most ALUs can perform the following operations:

· Integer arithmetic operations (addition, subtraction, and sometimes multiplication


and division, though this is more expensive)

· Bitwise logic operations (AND, NOT, OR, XOR)

· Bit-shifting operations (shifting or rotating a word by a specified number of bits to


the left or right, with or without sign extension). Shifts can be interpreted as
multiplications by 2 and divisions by 2.
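The operations listed above can be modeled in software. This is only a sketch: the opcode names are made up here, and results are masked to an assumed 8-bit word, which also shows the wrap-around behavior of fixed-width arithmetic:

```python
# Minimal software model of the ALU operations listed above, on 8-bit words.
MASK = 0xFF  # keep results within an 8-bit word

def alu(op, a, b=0):
    ops = {
        "ADD": a + b,
        "SUB": a - b,
        "AND": a & b,
        "OR":  a | b,
        "XOR": a ^ b,
        "NOT": ~a,
        "SHL": a << b,   # shift left: multiply by 2**b
        "SHR": a >> b,   # shift right: divide by 2**b
    }
    return ops[op] & MASK

print(alu("ADD", 200, 100))  # 44 -- 300 wraps around in 8 bits
print(alu("SHL", 3, 2))      # 12 -- same as 3 * 4
```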

Complex Operations
An engineer can design an ALU to calculate any operation, however complicated it is;
the problem is that the more complex the operation, the more expensive the ALU is,
the more space it uses in the processor, and the more power it dissipates.

Therefore, engineers always strike a compromise, providing the processor (or other
circuits) an ALU powerful enough to make the processor fast, yet not so complex as to
become prohibitive. Imagine that you need to calculate, say, the square root of a
number; the digital engineer will examine the following options to implement this
operation:

1. Design a very complex ALU that calculates the square root of any number in a
single step. This is called calculation in a single clock.

2. Design a complex ALU that calculates the square root through several steps. This is
called iterative calculation, and usually relies on control from a complex
control unit with built-in microcode.

3. Design a simple ALU in the processor, and sell a separate specialized and costly
processor that the customer can install alongside it, implementing one of the
options above. This is called a co-processor.

4. Emulate the existence of the co-processor, that is, whenever a program attempts
to perform the square root calculation, make the processor check if there is a
co-processor present and use it if there is one; if there isn't one, interrupt the
processing of the program and invoke the operating system to perform the square
root calculation through some software algorithm. This is called software
emulation.

5. Tell the programmers that there is no co-processor and there is no emulation, so
they will have to write their own algorithms to calculate square roots by software.
This is performed by software libraries.

The options above run from the fastest and most expensive to the slowest and least expensive. Therefore, while even the simplest computer can calculate the most complicated formula, the simplest computers will usually take a long time doing so, because several of the steps in the calculation will fall under options 3, 4, and 5 above.

Powerful processors like the Pentium 4 and AMD64 implement option 1 above for most of the complex operations and the slower option 2 for the extremely complex operations. This is made possible by the ability to build very complex ALUs in these processors.

Inputs and outputs
The inputs to the ALU are the data to be operated on (called operands) and a code from
the control unit indicating which operation to perform. Its output is the result of the
computation.

In many designs the ALU also takes as inputs or generates as outputs a set of condition codes, read from or written to a status register. These codes are used to indicate cases such as carry-in or carry-out, overflow, divide-by-zero, etc.[4]

ALUs vs. FPUs


A floating-point unit (FPU) also performs arithmetic operations between two values, but it does so for numbers in floating-point representation, which is much more complicated than the two's complement representation used in a typical ALU. To do these calculations, an FPU has several complex circuits built in, including some internal ALUs.

Usually, engineers reserve the name ALU for the circuit that performs arithmetic operations on integer formats (like two's complement and BCD), while the circuits that operate on more complex formats, such as floating point or complex numbers, usually receive a more illustrious name.

In computing, input/output, or I/O, is the collection of interfaces that different functional units (sub-systems) of an information processing system use to communicate with each other, or the signals (information) sent through those interfaces. Inputs are the signals received by the unit, and outputs are the signals sent from it. The term can also be used as part of an action; to "do I/O" is to perform an input or output operation. I/O devices are used by a person (or other system) to communicate with a computer. For instance, keyboards and mice are considered input devices of a computer, and monitors and printers are considered output devices. Typical devices for communication between computers, such as modems and network cards, perform both input and output.

It is important to notice that the designation of a device as input or output changes when the perspective changes. Mice and keyboards take as input the physical movement that the human user outputs and convert it into signals that a computer can understand. The output from these devices is treated as input by the computer. Similarly, printers and monitors take as input signals that a computer outputs. They then convert these signals into representations that human users can see or read. (For a human user, the process of reading or seeing these representations is receiving input.)

In computer architecture, the combination of the CPU and main memory (i.e. memory
that the CPU can read and write to directly, with individual instructions) is considered

the heart of a computer, and any movement of information from or to that complex,
for example to or from a disk drive, is considered I/O. The CPU and its supporting
circuitry provide I/O methods that are used in low-level computer programming in the
implementation of device drivers.

Higher-level operating system and programming facilities employ separate, more abstract I/O concepts and primitives. For example, an operating system provides application programs with the concept of files. The C programming language defines functions that allow programs to perform I/O through streams, reading data from them and writing data to them.

An alternative to special primitive functions is the I/O monad, which permits programs merely to describe I/O, with the actions carried out outside the program. This is notable because ordinary I/O functions would introduce side effects into any programming language, whereas the monad makes purely functional programming with I/O practical.

The control unit is the part of a CPU or other device that directs its operation. The outputs of the unit control the activity of the rest of the device. A control unit can be thought of as a finite state machine.

At one time control units for CPUs were ad-hoc logic, and they were difficult to design.
Now they are often implemented as a microprogram that is stored in a control store.
Words of the microprogram are selected by a microsequencer and the bits from those
words directly control the different parts of the device, including the registers,
arithmetic and logic units, instruction registers, buses, and off-chip input/output. In
modern computers, each of these subsystems may have its own subsidiary controller,
with the control unit acting as a supervisor. (See also CPU design and computer
architecture.)

Types of control units


All types of control units generate electronic control signals that control other parts of a
CPU. Control units are usually one of these types:

1. Microcoded control units. In a microcoded control unit, a program reads signals and generates control signals. The program itself is executed by a very simple computer, a relatively simple digital circuit called a microsequencer.

2. Hardware control units. In a hardware control unit, a digital circuit generates the
control signals directly.

The system console, root console, or simply console is the text entry and display device for system administration messages, particularly those from the BIOS or boot loader, the kernel, the init system, and the system logger.

On traditional minicomputers, the console was a serial console, an RS-232 serial link
to a terminal such as a DEC VT100. This terminal was usually kept in a secured room
since it could be used for certain privileged functions such as halting the system or
selecting which media to boot from. Large midrange systems, e.g. those from Sun
Microsystems, Hewlett-Packard and IBM, still use serial consoles. In larger installations,
the console ports are attached to multiplexers or network-connected multiport serial
servers that let an operator connect his terminal to any of the attached servers.

On PCs, the computer's attached keyboard and monitor have the equivalent function.
Since the monitor cable carries video signals, it cannot be extended very far. Often,
installations with many servers therefore use keyboard/video multiplexers (KVM
switches) and possibly video amplifiers to centralize console access. In recent years,
KVM/IP devices have become available that allow a remote computer to view the video
output and send keyboard input via any TCP/IP network and therefore the Internet.

Some PC BIOSes, especially in servers, also support serial consoles, giving access to the
BIOS through a serial port so that the simpler and cheaper serial console infrastructure
can be used. Even where BIOS support is lacking, some operating systems, e.g.
FreeBSD and Linux, can be configured for serial console operation either during boot up,
or after startup.

It is usually possible to log in from the console. Depending on configuration, the operating system may treat a login session from the console as more trustworthy than a login session from other sources. Routers and managed switches (as well as other networking and telecoms equipment) may also have console ports; in particular, Cisco Systems routers and switches that use Cisco IOS are normally configured via their console ports.

Knoppix system console showing the boot process

A microprogram implements a CPU instruction set. Just as a single high-level language statement is compiled to a series of machine instructions (load, store, shift, etc.), in a CPU using microcode each machine instruction is in turn implemented by a series of microinstructions, sometimes called a microprogram. Microprograms are often referred to as microcode.

The elements composing a microprogram exist on a lower conceptual level than the
more familiar assembler instructions. Each element is differentiated by the "micro"
prefix to avoid confusion: microprogram, microcode, microinstruction, microassembler,
etc.

Microprograms are carefully designed and optimized for the fastest possible execution,
since a slow microprogram would yield a slow machine instruction which would in turn
cause all programs using that instruction to be slow. The microprogrammer must have
extensive low-level hardware knowledge of the computer circuitry, as the microcode
controls this. The microcode is written by the CPU engineer during the design phase.

On most computers using microcode, the microcode doesn't reside in the main system
memory, but exists in a special high speed memory, called the control store. This
memory might be read-only memory, or it might be read-write memory, in which case
the microcode would be loaded into the control store from some other storage medium
as part of the initialization of the CPU. If the microcode is read-write memory, it can be
altered to correct bugs in the instruction set, or to implement new machine instructions.
Microcode can also allow one computer microarchitecture to emulate another, usually
more-complex architecture.

Microprograms consist of series of microinstructions. These microinstructions control the CPU at a very fundamental level. For example, a single typical microinstruction might specify the following operations:

· Connect Register 1 to the "A" side of the ALU

· Connect Register 7 to the "B" side of the ALU

· Set the ALU to perform two's-complement addition

· Set the ALU's carry input to zero

· Store the result value in Register 8

· Update the "condition codes" with the ALU status flags ("Negative", "Zero",
"Overflow", and "Carry")

· Microjump to MicroPC nnn for the next microinstruction

To simultaneously control all of these features, the microinstruction is often very wide,
for example, 56 bits or more.

The reason for microprogramming


Microcode was originally developed as a simpler method of developing the control logic
for a computer. Initially CPU instruction sets were "hard wired". Each machine
instruction (add, shift, move) was implemented directly with circuitry. This provided fast
performance, but as instruction sets grew more complex, hard-wired instruction sets
became more difficult to design and debug.

Microcode alleviated that problem by allowing CPU design engineers to write a microprogram to implement a machine instruction rather than design circuitry for it. Even late in the design process, microcode could easily be changed, whereas hard-wired instructions could not. This greatly facilitated CPU design and led to more complex instruction sets.

Another advantage of microcode was the implementation of more complex machine instructions. In the 1960s through the late 1970s, much programming was done in
assembly language, a symbolic equivalent of machine instructions. The more abstract and higher-level the machine instruction, the greater the programmer productivity. The ultimate extension of this was the "Directly Executable High Level Language" designs, in which each statement of a high-level language such as PL/I would be entirely and directly executed by microcode, without compilation. The IBM Future Systems project and the Data General Fountainhead Processor were examples of this.

Microprogramming also helped alleviate the memory bandwidth problem. During the 1970s, CPU speeds grew more quickly than memory speeds. Numerous acceleration techniques, such as memory block transfer, memory pre-fetch, and multi-level caches, helped reduce this gap. High-level machine instructions (made possible by microcode) helped further: fewer, more complex machine instructions require less memory bandwidth. For example, complete operations on character strings could be done as a single machine instruction, thus avoiding multiple instruction fetches.

Architectures using this approach included the IBM System/360 and Digital Equipment
Corporation VAX, the instruction sets of which were implemented by complex
microprograms. The approach of using increasingly complex microcode-implemented
instruction sets was later called CISC.

Other benefits
A processor's microprograms operate on a more primitive, totally different and much
more hardware-oriented architecture than the assembly instructions visible to normal
programmers. In coordination with the hardware, the microcode implements the
programmer-visible architecture. The underlying hardware need not have a fixed
relationship to the visible architecture. This makes it possible to implement a given
instruction set architecture on a wide variety of underlying hardware
micro-architectures.

Doing so is important if binary program compatibility is a priority. That way, previously existing programs can run on totally new hardware without requiring revision and recompilation. However, there may be a performance penalty for this approach. The tradeoffs between application backward compatibility and CPU performance are hotly debated by CPU design engineers.

The IBM System/360 has a 32-bit architecture with 16 general-purpose registers, but
most of the System/360 implementations actually used hardware implementing a much
simpler underlying microarchitecture.

In this way, microprogramming enabled IBM to design many System/360 models with
substantially different hardware and spanning a wide range of cost and performance,
while making them all architecturally compatible. This dramatically reduced the amount
of unique system software that had to be written for each model.

A similar approach was used by Digital Equipment Corporation in their VAX family of computers. Initially, a 32-bit TTL processor in conjunction with supporting microcode implemented the programmer-visible architecture. Later VAX versions used different microarchitectures, yet the programmer-visible architecture didn't change.

Microprogramming also reduced the cost of field changes to correct defects (bugs) in
the processor; a bug could often be fixed by replacing a portion of the microprogram
rather than by changes being made to hardware logic and wiring.

History
In 1947, the design of the MIT Whirlwind introduced the concept of a control store as a
way to simplify computer design and move beyond ad hoc methods. The control store
was a two-dimensional lattice: one dimension accepted "control time pulses" from the
CPU's internal clock, and the other connected to control signals on gates and other
circuits. A "pulse distributor" would take the pulses generated by the CPU clock and
break them up into eight separate time pulses, each of which would activate a different
row of the lattice. When the row was activated, it would activate the control signals
connected to it.

Described another way, the signals transmitted by the control store are played back much like a player piano roll: they are controlled by a sequence of very wide words constructed of bits, and they are "played" sequentially. In a control store, however, the "song" is short and repeated continuously.

In 1951 Maurice Wilkes enhanced this concept by adding conditional execution, a concept akin to a conditional in computer software. His initial implementation consisted of a pair of matrices: the first generated signals in the manner of the Whirlwind control store, while the second matrix selected which row of signals (the microprogram instruction word, as it were) to invoke on the next cycle. Conditionals were implemented by providing a way that a single line in the control store could choose from alternatives in the second matrix. This made the control signals conditional on the detected internal signal. Wilkes coined the term microprogramming to describe this feature and distinguish it from a simple control store.

Examples of microprogrammed systems


· The Burroughs B1700 included bit-addressable main memory, to allow
microprogramming to support different programming languages.

· The Digital Equipment Corporation PDP-11 processors, with the exception of the PDP-11/20, were microprogrammed (Siewiorek, Bell, Newell 1982).

· Most models of the IBM System/360 series were microprogrammed:

· The Model 25 was unique among System/360 models in using the top 16k
bytes of core storage to hold the control storage for the microprogram. The
2025 used a 16-bit microarchitecture with seven control words (or
microinstructions).

· The Model 30, the slowest model in the line, used an 8-bit microarchitecture
with only a few hardware registers; everything that the programmer saw
was emulated by the microprogram.

· The Model 40 used 56-bit control words. The 2040 box implements both the
System/360 main processor and the multiplex channel (the I/O processor).

· The Model 50 had two internal data paths which operated in parallel: a 32-bit
data path used for arithmetic operations, and an 8-bit data path used in
some logical operations. The control store used 90-bit microinstructions.

· The Model 85 had separate instruction fetch (I-unit) and execution (E-unit) to
provide high performance. The I-unit is hardware controlled. The E-unit is
microprogrammed with 108-bit control words.

Implementation
Each microinstruction in a microprogram provides the bits which control the functional elements that internally make up a CPU. The advantage over a hard-wired CPU is that internal CPU control becomes a specialized form of a computer program. Microcode thus transforms a complex electronic design challenge (the control of a CPU) into a less-complex programming challenge.

To take advantage of this, computers were divided into several parts:

A microsequencer picked the next word of the control store. A sequencer is mostly a
counter, but usually also has some way to jump to a different part of the control store
depending on some data, usually data from the instruction register and always some
part of the control store. The simplest sequencer is just a register loaded from a few
bits of the control store.

A register set is a fast memory containing the data of the central processing unit. It
may include the program counter, stack pointer, and other numbers that are not easily
accessible to the application programmer. Often the register set is triple-ported, that is,
two registers can be read, and a third written at the same time.

An arithmetic and logic unit performs calculations, usually addition, logical negation, a
right shift, and logical AND. It often performs other functions, as well.

There may also be a memory address register and a memory data register, used to
access the main computer storage.

Together, these elements form an "execution unit." Most modern CPUs have several
execution units. Even simple computers usually have one unit to read and write
memory, and another to execute user code.

These elements could often be bought together as a single chip. This chip came in a fixed width which would form a 'slice' through the execution unit. These were known as 'bit slice' chips. The AMD Am2900 is the best known example of a bit slice processor.

The parts of the execution units, and the execution units themselves are interconnected
by a bundle of wires called a bus.

Programmers develop microprograms. The basic tools are software: a microassembler allows a programmer to define the table of bits symbolically. A simulator program executes the bits in the same way as the electronics (hopefully), and allows much more freedom to debug the microprogram.

After the microprogram is finalized, and extensively tested, it is sometimes used as the
input to a computer program that constructs logic to produce the same data. This
program is similar to those used to optimize a programmable logic array. No known
computer program can produce optimal logic, but even pretty good logic can vastly
reduce the number of transistors from the number required for a ROM control store.
This reduces the cost and power used by a CPU.

Microcode can be characterized as horizontal or vertical. This refers primarily to whether each microinstruction directly controls CPU elements (horizontal microcode) or requires subsequent decoding by combinational logic before doing so (vertical microcode). Consequently, each horizontal microinstruction is wider (contains more bits) and occupies more storage space than a vertical microinstruction.

Horizontal microcode
A typical horizontal microprogram control word has a field, a range of bits, to control
each piece of electronics in the CPU. For example, one simple arrangement might be:

| register source A | register source B | destination register | arithmetic and logic unit
operation | type of jump | jump address |

For this type of micro machine to implement a jump instruction with the address following the jump op-code, the micro assembly would look something like:

# Any line starting with a number-sign is a comment.
# This is just a label, the ordinary way assemblers symbolically represent a
# memory address.
Instruction JUMP:
  # To prepare for the next instruction, the instruction-decode microcode has
  # already moved the program counter to the memory address register. This
  # instruction fetches the target address of the jump instruction from the
  # memory word following the jump opcode, by copying from the memory data
  # register to the memory address register. This gives the memory system two
  # clock ticks to fetch the next instruction to the memory data register for
  # use by the instruction decode. The sequencer instruction "next" means just
  # add 1 to the control word address.
  MDR, NONE, MAR, COPY, NEXT, NONE
  # This places the address of the next instruction into the PC. This gives
  # the memory system a clock tick to finish the fetch started on the
  # previous microinstruction. The sequencer instruction is to jump to the
  # start of the instruction decode.
  MAR, 1, PC, ADD, JMP, Instruction Decode
# The instruction decode is not shown, because it's usually a mess, very
# particular to the exact processor being emulated. Even this example is
# simplified. Many CPUs have several ways to calculate the address, rather
# than just fetching it from the word following the op-code. Therefore,
# rather than just one jump instruction, those CPUs have a family of related
# jump instructions.
Horizontal microcode is microcode that sets all the bits of the CPU's controls on each
tick of the clock that drives the sequencer.

Note how many of the fields in a horizontal microinstruction are set to do nothing on any given tick.

Vertical microcode
In vertical microcode, each microinstruction is encoded; that is, the bit fields may pass through intermediate combinational logic which in turn generates the actual control signals for internal CPU elements (ALU, registers, etc.). By contrast, with horizontal microcode the bit fields themselves directly produce the control signals. Consequently, vertical microcode requires smaller instruction lengths and less storage, but requires more time to decode, resulting in a slower CPU clock.

Some vertical microcodes are just the assembly language of a simple conventional
computer that is emulating a more complex computer. This technique was popular in
the time of the PDP-8. Another form of vertical microcode has two fields:

| field select | field value |

The "field select" selects which part of the CPU will be controlled by this word of the
control store. The "field value" actually controls that part of the CPU. With this type of
microcode, a designer explicitly chooses to make a slower CPU to save money by
reducing the unused bits in the control store; however, the reduced complexity may
increase the CPU's clock frequency, which lessens the effect of an increased number of
cycles per instruction.

As transistors became cheaper, horizontal microcode came to dominate the design of CPUs using microcode, with vertical microcode no longer being used.

Writable control stores


A few computers were built using "writable microcode": rather than storing the microcode in ROM or hard-wired logic, the microcode was stored in a RAM called a writable control store or WCS. Many of these machines were experimental laboratory prototypes, but there were also commercial machines that used writable microcode, such as early Xerox workstations, the DEC VAX 8800 ("Nautilus") family, the Symbolics L- and G-machines, and a number of IBM System/370 implementations. Many more machines offered user-programmable writable control stores as an option (including the HP 2100 and DEC PDP-11/60 minicomputers). WCS offered several advantages, including the ease of patching the microprogram and, for certain hardware generations, faster access than ROMs could provide. A user-programmable WCS allowed the user to optimize the machine for specific purposes.

A CPU that uses microcode generally takes several clock cycles to execute a single
instruction, one clock cycle for each step in the microprogram for that instruction. Some
CISC processors include instructions that can take a very long time to execute. Such
variations in instruction length interfere with pipelining and interrupt latency.

Microcode versus VLIW and RISC


The design trend toward heavily microcoded processors with complex instructions began in the early 1960s and continued until roughly the mid-1980s. At that point the RISC design philosophy started becoming more prominent, based on the following points:

· Analysis shows complex instructions are rarely used, hence the machine resources
devoted to them are largely wasted.

· Programming has largely moved away from assembly level, so it's no longer
worthwhile to provide complex instructions for productivity reasons.

· The machine resources devoted to rarely-used complex instructions are better used for expediting performance of simpler, commonly-used instructions.

· Complex microcoded instructions requiring many, varying clock cycles are difficult
to pipeline for increased performance.

· Simpler instruction sets allow direct execution by hardware, avoiding the performance penalty of microcoded execution.

Many RISC and VLIW processors are designed to execute every instruction (as long as it
is in the cache) in a single cycle. This is very similar to the way CPUs with microcode
execute one microinstruction per cycle. VLIW processors have instructions that behave
like very wide horizontal microcode, although typically VLIW instructions do not have as
fine-grained control over hardware as microcode. RISC processors can have instructions
that look like narrow vertical microcode.

Modern implementations of CISC instruction sets such as the x86 instruction set
implement the simpler instructions in hardware rather than microcode, using microcode
only to implement the more complex instructions.

LESSON VI

Input/Output

The input-output model of economics uses a matrix representation of a nation's (or a region's) economy to predict the effect of changes in one industry on others, and of changes by consumers, government, and foreign suppliers on the economy. Wassily Leontief (1906-1999) is credited with the development of this analysis. François Quesnay developed a cruder version of this technique, called the Tableau économique. Leontief won the Bank of Sweden Prize in Economic Sciences in Memory of Alfred Nobel for his development of this model. The analytical apparatus is strictly empiricist, which reduces bias in the analysis. For this reason, Leontief seems to have been just about the only economist who was equally honored by communist and capitalist economists.

Input-output analysis considers inter-industry relations in an economy, depicting how the output of one industry goes to another industry where it serves as an input, thereby making one industry dependent on another both as a customer for its output and as a supplier of its inputs. An input-output model is a specific formulation of input-output analysis.

Each row of the input-output matrix reports the monetary value of an industry's outputs and each column the value of an industry's inputs. Suppose there are three industries. Row 1 reports the value of outputs from Industry 1 to Industries 1, 2, and 3. Rows 2 and 3 do the same for those industries. Column 1 reports the value of inputs to Industry 1 from Industries 1, 2, and 3. Columns 2 and 3 do the same for the other industries.

While the input-output matrix reports only the intermediate goods and services that are exchanged among industries, column vectors on the right record the disposition of finished goods and services to consumers, government, and foreign buyers, and row vectors along the bottom record non-industrial inputs like labor and purchases from foreign suppliers.

In addition to studying the structure of national economies, input-output economics has been used to study regional economies within a nation, and as a tool for national economic planning.

The mathematics of input-output economics is straightforward, but the data requirements are enormous, because the expenditures and revenues of each branch of economic activity have to be represented. The tool has languished because not all countries collect the required data, data quality varies, and the data collection and preparation process has lags that make timely analysis difficult. Typically, input-output tables are compiled retrospectively, as a "snapshot" cross-section of the economy, once every few years.

Usefulness
An input-output model is widely used in economic forecasting to predict flows
between sectors. They are also used in local urban economics.

Irving Hoch at the Chicago Area Transportation Study did detailed forecasting by industry sectors using input-output techniques. At the time, Hoch's work was quite an undertaking; the only other work that had been done at the urban level was for Stockholm, and it was not widely known. Input-output was one of the few techniques developed at the CATS not adopted in later studies; later studies used economic base analysis techniques.

Input-output Analysis versus Consistency Analysis


Despite the clear ability of the input-output model to depict and analyze the dependence
of one industry or sector on another, Leontief and others never managed to introduce
the full spectrum of dependency relations in a market economy. In 2003, Mohammad
Gani, a pupil of Leontief, introduced Consistency Analysis in his book 'Foundations of
Economic Science', which formally looks exactly like the input-output table, but explores
the dependency relations in terms of payments and intermediation relations.
Consistency analysis explores the consistency of plans of buyers and sellers by
decomposing the input-output table into four separate matrices, each for a different
kind of means of payment. It integrates micro and macroeconomics in one model and
deals with money in a fully ideology-free manner. It deals with the circulation of money
vis-à-vis the movement of goods.

In a technical sense, input-output analysis can be seen as a special case of consistency
analysis without money and without entrepreneurship and transaction cost.

Key Ideas
The inimitable book by Leontief himself remains the best exposition of input-output
analysis. See bibliography.

Input-output concepts are simple. Consider the production of the ith sector. We may
isolate (1) the quantity of that production that goes to final demand, ci, (2) the total
output of the sector, xi, and (3) the flows xij from that industry to other industries. We
may write a transactions tableau.

Table: Transactions in a Three-Sector Economy

Economic          Inputs to     Inputs to       Inputs to   Final    Total
Activities        Agriculture   Manufacturing   Transport   Demand   Output
Agriculture            5             15              2        68       90
Manufacturing         10             20             10        40       80
Transportation        10             15              5         0       30
Labor                 25             30              5         0       60


Note that in the example given we have no input flows from the industries to 'Labor’.
We know very little about production functions because all we have are numbers
representing transactions in a particular instance (single points on the production
functions):

The neoclassical production function is an explicit function


Q = f (K, L),

where Q = quantity, K = capital, L = labor,

and the partial derivatives ∂Q/∂K and ∂Q/∂L are the demand schedules for input
factors. Leontief, the innovator of input-output analysis, uses a special production
function which depends linearly on the total output variables xj. Using Leontief
coefficients aij = xij / xj (the input from industry i required per unit of output of
industry j), we may manipulate our transactions information into what is known as an
input-output table.

Now

    xi = xi1 + xi2 + ... + xin + ci

gives

    xi = ai1 x1 + ai2 x2 + ... + ain xn + ci

Rewriting finally yields

    (1 − a11) x1 − a12 x2 − ... − a1n xn = c1
    −a21 x1 + (1 − a22) x2 − ... − a2n xn = c2
    ...
    −an1 x1 − an2 x2 − ... + (1 − ann) xn = cn

Introducing matrix notation, we can see how a solution may be obtained. Let x, c, I,
and A denote the total output vector, the final demand vector, the unit matrix, and the
input-output matrix, respectively. Then:

    x = Ax + c, so (I − A) x = c and x = (I − A)^(−1) c,

provided (I − A) is a regular matrix which can thus be inverted.
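Using the three-sector transactions table above, the solution can be sketched numerically (an illustration only; the variable names are our own, and NumPy is assumed to be available):

```python
import numpy as np

# Transactions from the three-sector table (hypothetical data):
# rows = selling sector, columns = purchasing sector (Ag, Mfg, Transport).
Z = np.array([[ 5.0, 15.0,  2.0],
              [10.0, 20.0, 10.0],
              [10.0, 15.0,  5.0]])
c = np.array([68.0, 40.0, 0.0])    # final demand
x = np.array([90.0, 80.0, 30.0])   # observed total outputs

# Leontief coefficients a_ij = x_ij / x_j: divide each column j by x_j.
A = Z / x

# Solving (I - A) x = c recovers the observed total outputs.
x_solved = np.linalg.solve(np.eye(3) - A, c)
print(x_solved)   # approximately [90, 80, 30]
```

That the solve reproduces the observed outputs is a consistency check on the coefficients: the table was used to build A, so (I − A)x must equal the recorded final demand.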

There are many interesting aspects of the Leontief system, and there is an extensive
literature. There is the Hawkins-Simon Condition on producibility. There has been
interest in disaggregation to clustered inter-industry flows, and the study of
constellations of industries. A great deal of empirical work has been done to identify
coefficients, and data have been published for the national economy as well as for
regions. This has been a healthy, exciting area for work by economists because the
Leontief system can be extended to a model of general equilibrium; it offers a method
of decomposing work done at a macro level.

Transportation is implicit in the notion of inter-industry flows. It is explicitly recognized
when transportation is identified as an industry: how much is purchased from
transportation in order to produce output in each sector. But this is not very satisfactory because
transportation requirements differ, depending on industry locations and capacity
constraints on regional production. Also, the receiver of goods generally pays freight
cost, and often transportation data are lost because transportation costs are treated as
part of the cost of the goods.

Walter Isard and his student, Leon Moses, were quick to see the spatial economy and
transportation implications of input-output, and began work in this area in the 1950s
developing a concept of interregional input-output. Take a one-region-versus-the-world
case. We wish to know something about interregional commodity flows, so we introduce a
column into the table headed “Exports” and an input row headed “Imports”.

Table: Adding Export and Import Transactions

Economic Activities   1   2   …   …   Z   Exports   Final Demand   Total Outputs


A more satisfactory way to proceed would be to tie regions together at the industry
level. That is, we identify both within region inter-industry transactions and among
region inter-industry transactions. A not-so-small problem here is that the table gets
very large very quickly.

Input-output, as we have discussed it, is conceptually very simple. Its extension to an
overall model of equilibrium in the national economy is also relatively simple and
attractive. But there is a downside. One who wishes to do work with input-output
systems must deal skillfully with industry classification, data estimation, and inverting
very large, ill-conditioned matrices. Two additional difficulties are of interest in
transportation work. There is the question of substituting one input for another, and
there is the question about the stability of coefficients as production increases or
decreases. These are intertwined questions. They have to do with the nature of regional
production functions.

Forecasting and/or Analysis Using Input-Output


This discussion focuses on the use of input-output techniques in transportation; there is
a vast literature on the technique as such.

Table: Interregional Transactions

                       North          East           West
Economic            Ag  Mfg  ...   Ag  Mfg  ...   Ag  Mfg  ...   Exports   Total Outputs
Activities
North    Ag
         Mfg
         ...
East     Ag
         Mfg
         ...
West     Ag
         Mfg
         ...

As we saw from the use of the economic base study, urban transportation planning
studies are demand-driven. The question we want to answer is, “What transportation
need results from some economic development; what’s the feedback from development
to transportation?” For that question, input-output is helpful. That’s the question Hock
posed. There is an increase in the final demand vector, changed inter-industry relations
result, and there is an impact on transportation requirements.

Rappoport et al. (1979) started with consumption projections. These drove solutions of
a national I-O model for projections of GNP and transportation requirements as per the
transportation vector in the I-O matrix. Submodels were then used to investigate modal
split and energy consumption in the transportation sector.

Another question asked is: What is the impact of the transportation construction activity
on an area? One of the first studies made of the impact of the interstate highway
system used the national I/O model to forecast impacts measured in increased steel
production, cement, employment, etc.

Table: Input-Output Model for a Hypothetical Economy
Total requirements from regional industries per dollar of output delivered to final demand

                     Purchasing Industry
Selling Industry     Agriculture   Transport   Manufacturing   Services
Agriculture             1.14         0.22         0.13           0.12
Transportation          0.19         1.10         0.16           0.07
Manufacturing           0.16         0.16         1.16           0.06
Services                0.08         0.05         0.08           1.09
Total                   1.57         1.53         1.53           1.34
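Read column-wise, the table is the matrix (I − A)^(−1) for this hypothetical economy: entry (i, j) gives the total output required from selling industry i per dollar of final demand delivered by purchasing industry j. An impact calculation can be sketched from these numbers (the $1 million demand change is invented for illustration):

```python
import numpy as np

# Total-requirements matrix from the table: rows = selling industry,
# columns = purchasing industry (Agriculture, Transport, Manufacturing, Services).
R = np.array([[1.14, 0.22, 0.13, 0.12],
              [0.19, 1.10, 0.16, 0.07],
              [0.16, 0.16, 1.16, 0.06],
              [0.08, 0.05, 0.08, 1.09]])

# Hypothetical scenario: final demand for manufacturing rises by $1 million.
delta_demand = np.array([0.0, 0.0, 1_000_000.0, 0.0])

# Required change in every industry's total output, direct plus indirect.
delta_output = R @ delta_demand
print(delta_output)   # the manufacturing column of the table, scaled by $1M
```

Note that transportation output must rise by roughly $160,000 for the $1M of manufacturing demand; this is the kind of feedback from development to transportation that the text discusses.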

The Maritime Administration (MARAD) has produced the Port Impact Kit for a number of
years. This software illustrates the use of I/O models. Simply written, it makes the
technique widely available. It shows how to calculate direct effects from the initial round
of spending that’s worked out by the vessel/cargo combinations. The direct
expenditures are entered into the I/O table, and indirect effects are calculated. These
are the activities derived from inter-industry relations: the purchases of supplies,
labor, and so on. An I/O table is supplied to aid that calculation. Then, using the
I/O table, induced effects are calculated. These are effects from household purchases of
goods and services made possible from the wages generated from direct and indirect
effects. The Corps of Engineers has a similar capability that has been used to examine
the impacts of construction or base closings. The US Department of Commerce Bureau of
Economic Analysis (BEA) (1997) discusses how to use its state-level I/O models
(RIMS II). The ready availability of BEA and MARAD-like tables and calculation tools
means that we will see more and more feedback impact analysis. The information is
meaningful for many purposes.

Feed-forward calculations seem to be much more interesting for planning. The question
is, “If an investment is made in transportation, what will be its development effects?”
An investment in transportation might lower transport costs, increase quality of service,
or a mixture of these. What would be the effect on trade flows, output, earnings, etc.?

The first problem we know of worked on from this point of view was in Japan in the
1950s. The situation was the building of a bridge to connect two islands, and the core
question was the mixing of the two island economies.

A first consideration is the impact of changed transportation attributes, say, lower cost,
on industry location, and/or agricultural or other resource-based extractive activity,
and/or on markets. A spatial price equilibrium model (linear programming) is the tool of
choice for that. Input-output then permits tracing changed inter-industry relations,
impacts on wages, etc.

Britton Harris (1974) uses that analysis strategy. He begins with industry location
forecasting equations; treats equilibrium of locations, markets, and prices; and pays
much attention to transport costs. An interesting thing about this and other models is
that input-output considerations are no more than an accounting add-on; they hardly
enter Harris’ study. The interesting problems are the location and flow problems.
I/O devices
This topic discusses the different types of I/O devices used on your managed system,
and how the I/O devices are added to logical partitions.

I/O devices allow your managed system to gather, store, and transmit data. I/O devices
are found in the server unit itself and in expansion units and towers that are attached to
the server. I/O devices can be embedded into the unit, or they can be installed into
physical slots.

Not all types of I/O devices are supported for all operating systems or on all server
models. For example, I/O processors (IOPs) are supported only on i5/OS® logical
partitions. Also, Switch Network Interface (SNI) adapters are supported only on certain
server models, and are not supported for i5/OS logical partitions.
I/O pools for i5/OS logical partitions

This information discusses how I/O pools must be used to switch I/O adapters (IOAs)
between i5/OS® logical partitions that support switchable independent auxiliary storage
pools (IASPs).

An I/O pool is a group of I/O adapters that form an IASP. Other names for IASPs
include I/O failover pool and switchable independent disk pool. The IASP can be
switched from a failed server to a backup server within the same cluster without the
active intervention of the HMC. The I/O adapters within the IASP can be used by only
one logical partition at a time, but any of the other logical partitions in the group can
take over and use the I/O adapters within the IASP. The current owning partition must
power off the adapters before another partition can take ownership.

IASPs are not suitable for sharing I/O devices between different logical partitions. If you
want to share an I/O device between different logical partitions, use the HMC to move
the I/O device dynamically between the logical partitions.
IOPs for i5/OS logical partitions
This information discusses the purpose of IOPs and how you can switch IOPs and IOAs
dynamically between i5/OS® logical partitions.

i5/OS logical partitions require an I/O processor (IOP) that is attached to the system
I/O bus and one or more I/O adapters (IOAs). The IOP processes instructions from the
server and works with the IOAs to control the I/O devices. The combined-function IOP
(CFIOP) can connect to a variety of different IOAs. For instance, a CFIOP could support
disk units, a console, and communications hardware.
Note: A server with i5/OS logical partitions must have the correct IOP feature codes for
the load source disk unit and alternate restart devices. Without the correct hardware, the
logical partitions will not function correctly.
A logical partition controls all devices connected to an IOP. You cannot switch one I/O
device to another logical partition without moving the ownership of the IOP. Any
resources (IOAs and devices) that are attached to the IOP cannot be in use when you
move an IOP from one logical partition to another.
IOAs for i5/OS logical partitions
This information discusses some of the types of IOAs that are used to control devices in
i5/OS® logical partitions and the placement rules that you must follow when installing
these devices in your servers and expansion units.

Load source for i5/OS logical partitions

This topic discusses the purpose of a load source for i5/OS® logical partitions and the
placement rules that you must follow when installing the load source.

Each i5/OS logical partition must have one disk unit designated as the load source. The
server uses the load source to start the logical partition. The server always identifies
this disk unit as unit number 1.

You must follow placement rules when placing a load source disk unit in your managed
system. Before adding a load source to your managed system or moving a load source
within your managed system, validate the revised system hardware configuration with
the System Planning Tool (SPT), back up the data on the disks attached to the IOA, and
move the hardware according to the SPT output.
Alternate restart device and removable media devices for i5/OS logical
partitions

This topic discusses the purpose of tape and optical devices in i5/OS® logical partitions
and the placement rules that you must follow when installing these devices.

A removable media device reads and writes to media (tape, CD-ROM, or DVD). Every
i5/OS logical partition must have either a tape or an optical device (CD-ROM or DVD)
available to use. The server uses the tape or optical devices as the alternate restart
device and alternate installation device. The media in the device is what the system
uses to start from when you perform a D-mode initial program load (IPL). The alternate
restart device loads the Licensed Internal Code contained on the removable media
instead of the code on the load source disk unit. It can also be used to install the
system.

Depending on your hardware setup, you might decide that your logical partitions will
share these devices. If you decide to share these devices, remember that only one
logical partition can use the device at any time. To switch devices between logical
partitions, you must move the IOP controlling the shared device to the desired logical
partition.
Disk units for i5/OS logical partitions
Disk units store data for i5/OS® logical partitions. You can configure disk units into
auxiliary storage pools (ASPs).

Disk units store data for i5/OS logical partitions. The server can use and reuse this data
at any time. This method of storing data is more permanent than memory (RAM);
however, you can still erase any data on a disk unit.

Disk units can be configured into auxiliary storage pools (ASPs) on any logical partition.
All of the disk units you assign to an ASP must be from the same logical partition. You
cannot create a cross-partition ASP.

In computing, an interrupt is an asynchronous signal from hardware indicating the


need for attention or a synchronous event in software indicating the need for a change
in execution. A hardware interrupt causes the processor to save its state of execution
via a context switch, and begin execution of an interrupt handler. Software interrupts
are usually implemented as instructions in the instruction set, which cause a context
switch to an interrupt handler similarly to a hardware interrupt. Interrupts are a
commonly used technique for computer multitasking, especially in real-time computing.
Such a system is said to be interrupt-driven.

An act of interrupting is referred to as an interrupt request ("IRQ").

Overview
Hardware interrupts were introduced as a way to avoid wasting the processor's valuable
time in polling loops, waiting for external events.

Interrupts may be implemented in hardware as a distinct system with control lines, or


they may be integrated into the memory subsystem. If implemented in hardware, a
Programmable Interrupt Controller (PIC) or Advanced Programmable Interrupt
Controller (APIC) is connected to both the interrupting device and to the processor's
interrupt pin. If implemented as part of the memory controller, interrupts are mapped
into the system's memory address space.

Interrupts can be categorized into the following types: software interrupt, maskable
interrupt, non-maskable interrupt (NMI), interprocessor interrupt (IPI), and spurious
interrupt.

· A software interrupt is an interrupt generated within a processor by executing an
instruction. Examples of software interrupts are system calls.

An interrupt that leaves the machine in a well-defined state is called a precise
interrupt. Such an interrupt has four properties:

- The PC (Program Counter) is saved in a known place.

- All instructions before the one pointed to by the PC have fully executed.

- No instruction beyond the one pointed to by the PC has been executed. (There is no
prohibition on instructions beyond that one having begun execution; it is just that any
changes they make to registers or memory must be undone before the interrupt
happens.)

- The execution state of the instruction pointed to by the PC is known.

An interrupt that does not meet these requirements is called an imprecise interrupt.

· A maskable interrupt is essentially a hardware interrupt which may be ignored by
setting a bit in an interrupt mask register's (IMR) bit-mask.

· Likewise, a non-maskable interrupt is a hardware interrupt which typically has no
bit-mask associated with it, so it cannot be ignored.
· An interprocessor interrupt is a special type of interrupt which is generated by one
processor to interrupt another processor in a multiprocessor system.

· A spurious interrupt is a hardware interrupt which is generated by system errors,
such as electrical noise on one of the PIC's interrupt lines.

Processors typically have an internal interrupt mask which allows software to ignore all
external hardware interrupts while it is set. This mask may offer faster access than
accessing an IMR in a PIC, or disabling interrupts in the device itself. In some cases,
such as the x86 architecture, disabling and enabling interrupts on the processor itself
acts as a memory barrier, in which case it may actually be slower.
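The bit-mask mechanics can be sketched in a few lines (a simplified model for illustration, not the programming interface of any particular PIC):

```python
# Simplified model of an interrupt mask register (IMR): bit i set = IRQ i masked.
def mask_irq(imr: int, line: int) -> int:
    """Mask (ignore) further requests on this line."""
    return imr | (1 << line)

def unmask_irq(imr: int, line: int) -> int:
    """Allow requests on this line to be delivered again."""
    return imr & ~(1 << line)

def deliverable(irr: int, imr: int) -> int:
    """Pending requests (interrupt request register) that are not masked."""
    return irr & ~imr

imr = mask_irq(0, 3)                  # mask IRQ 3
print(bin(deliverable(0b1010, imr)))  # IRQ 3 masked -> only IRQ 1 remains: 0b10
```

The processor's internal "interrupts disabled" flag described above is the degenerate case of this scheme: a single bit that masks every external line at once.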

The phenomenon where the overall system performance is severely hindered by
excessive amounts of processing time spent handling interrupts is called an interrupt
storm or livelock.

Level-triggered
A level-triggered interrupt is a class of interrupts where the presence of an
unserviced interrupt is indicated by a high level (1), or low level (0), of the interrupt
request line. A device wishing to signal an interrupt drives the line to its active level,
and then holds it at that level until serviced. It ceases asserting the line when the CPU
commands it to or otherwise handles the condition that caused it to signal the interrupt.

Typically, the processor samples the interrupt input at predefined times during each bus
cycle such as state T2 for the Z80 microprocessor. If the interrupt isn't active when the
processor samples it, the CPU doesn't see it. One possible use for this type of interrupt
is to minimize spurious signals from a noisy interrupt line: a spurious pulse will often be
so short that it is not noticed.

Multiple devices may share a level-triggered interrupt line if they are designed to. The
interrupt line must have a pull-down or pull-up resistor so that when not actively driven
it settles to its inactive state. Devices actively assert the line to indicate an outstanding
interrupt, but let the line float (do not actively drive it) when not signaling an interrupt.
The line is then in its asserted state when any (one or more than one) of the sharing
devices is signaling an outstanding interrupt.

This class of interrupts is favored by some because of a convenient behavior when the
line is shared. Upon detecting assertion of the interrupt line, the CPU must search
through the devices sharing it until one requiring service is detected. After servicing this
one, the CPU may recheck the interrupt line status to determine whether any other
devices also need service. If the line is now deasserted, then the CPU avoids the need to
check all the remaining devices on the line. Where some devices interrupt much more
than others, or where some devices are particularly expensive to check for interrupt
status, a careful ordering of device checks brings some efficiency gain.
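The service loop described above can be sketched as a toy simulation (the Device class and device names are invented for illustration):

```python
# Toy model of a shared level-triggered interrupt line.
class Device:
    def __init__(self, name):
        self.name = name
        self.pending = False   # True while this device asserts the line
    def service(self):
        self.pending = False   # handling the condition deasserts its output

def line_asserted(devices):
    # Wired-OR behavior: the line stays active while any device asserts it.
    return any(d.pending for d in devices)

def handle_shared_irq(devices):
    """Poll devices in priority order until the line deasserts."""
    serviced = []
    while line_asserted(devices):
        for d in devices:          # 'devices' is ordered by priority
            if d.pending:
                d.service()
                serviced.append(d.name)
                break              # recheck the line; may skip the rest
    return serviced

disk, net = Device("disk"), Device("net")
disk.pending = net.pending = True
print(handle_shared_irq([disk, net]))   # ['disk', 'net']
```

The outer `while` is the recheck step: once the line deasserts, the remaining devices are never polled, which is the efficiency gain the text mentions.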

There are also serious problems with sharing level-triggered interrupts. As long as any
device on the line has an outstanding request for service the line remains asserted, so it
is not possible to detect a change in the status of any other device. Deferring servicing
a low-priority device is not an option, because this would prevent detection of service
requests from higher-priority devices. If there is a device on the line that the CPU does
not know how to service, then any interrupt from that device permanently blocks all
interrupts from the other devices.

The original PCI standard mandated shareable level-triggered interrupts. The rationale
for this was the efficiency gain discussed above. (Newer versions of PCI allow, and PCI
Express requires, the use of message-signaled interrupts.)

Edge-triggered
An edge-triggered interrupt is a class of interrupts that are signaled by a level
transition on the interrupt line, either a falling edge (1 to 0) or (usually) a rising edge (0
to 1). A device wishing to signal an interrupt drives a pulse onto the line and then
returns the line to its quiescent state. If the pulse is too short to be detected by polled
I/O, then special hardware may be required to detect the edge.

Multiple devices may share an edge-triggered interrupt line if they are designed to. The
interrupt line must have a pull-down or pull-up resistor so that when not actively driven
it settles to one particular state. Devices signal an interrupt by briefly driving the line to
its non-default state, and let the line float (do not actively drive it) when not signaling
an interrupt. The line then carries all the pulses generated by all the devices. However,
interrupt pulses from different devices may merge if they occur close in time. To avoid
losing interrupts the CPU must trigger on the trailing edge of the pulse (e.g., the rising
edge if the line is pulled up and driven low). After detecting an interrupt the CPU must
check all the devices for service requirements.

Edge-triggered interrupts do not suffer the problems that level-triggered interrupts
have with sharing. Service of a low-priority device can be postponed arbitrarily, and
interrupts will continue to be received from the high-priority devices that are being
serviced. If there is a device that the CPU does not know how to service, it may cause a
spurious interrupt, or even periodic spurious interrupts, but it does not interfere with
the interrupt signaling of the other devices.

The elderly ISA bus uses edge-triggered interrupts, but does not mandate that devices
be able to share them. The parallel port also uses edge-triggered interrupts. Many older
devices assume that they have exclusive use of their interrupt line, making it electrically
unsafe to share them. However, ISA motherboards include pull-up resistors on the IRQ
lines, so well-behaved devices share ISA interrupts just fine.

Hybrid

Some systems use a hybrid of level-triggered and edge-triggered signaling. The
hardware not only looks for an edge, but it also verifies that the interrupt signal stays
active for a certain period of time. A common hybrid interrupt is the NMI (non-maskable
interrupt) input. Because NMIs generally signal major (or even catastrophic) system
events, a good implementation of this signal tries to ensure that the interrupt is valid by
verifying that it remains active for a period of time. This two-step approach helps to
eliminate false interrupts from affecting the system.

Message-signalled
A message-signalled interrupt does not use a physical interrupt line. Instead, a device
signals its request for service by sending a short message over some communications
medium, typically a computer bus. The message might be of a type reserved for
interrupts, or it might be of some pre-existing type such as a memory write.

Message-signalled interrupts behave very much like edge-triggered interrupts, in that
the interrupt is a momentary signal rather than a continuous condition.
Interrupt-handling software treats the two in much the same manner. Typically,
multiple pending message-signalled interrupts with the same message (the same virtual
interrupt line) are allowed to merge, just as closely-spaced edge-triggered interrupts
can merge.
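This merging behaviour can be modeled by keeping pending interrupts as a set of vector numbers (a simplified sketch, not an actual bus protocol):

```python
# Pending message-signalled interrupts modeled as a set of vector numbers:
# posting a vector that is already pending merges with it, just as
# closely-spaced edge-triggered pulses can merge into one.
pending = set()

def post_msi(vector: int):
    """Device posts an interrupt message (e.g. as a special memory write)."""
    pending.add(vector)

def service_next():
    """CPU takes one pending vector, or None if nothing is pending."""
    return pending.pop() if pending else None

post_msi(42)
post_msi(42)          # duplicate post merges with the first
post_msi(7)
print(len(pending))   # -> 2
```

Because each vector is just a data value, adding another "virtual line" costs one more set element rather than a physical conductor, which is the sharing advantage described below.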

Message-signalled interrupt vectors can be shared, to the extent that the underlying
communication medium can be shared. No additional effort is required.

Because the identity of the interrupt is indicated by a pattern of data bits, not requiring
a separate physical conductor, many more distinct interrupts can be efficiently handled.
This reduces the need for sharing. Interrupt messages can also be passed over a serial
bus, not requiring any additional lines.

PCI Express, a serial computer bus, uses message-signalled interrupts exclusively.

Difficulty with sharing interrupt lines


Multiple devices sharing an interrupt line (of any triggering style) all act as spurious
interrupt sources with respect to each other. With many devices on one line the
workload in servicing interrupts grows as the square of the number of devices. It is
therefore preferred to spread devices evenly across the available interrupt lines.
Shortage of interrupt lines is a problem in older system designs where the interrupt
lines are distinct physical conductors. Message-signalled interrupts, where the interrupt
line is virtual, are favoured in new system architectures (such as PCI Express) and
relieve this problem to a considerable extent.

Some devices with a badly-designed programming interface provide no way to
determine whether they have requested service. They may lock up or otherwise
misbehave if serviced when they do not want it. Such devices cannot tolerate spurious
interrupts, and so also cannot tolerate sharing an interrupt line. ISA cards, due to often
cheap design and construction, are notorious for this problem. Such devices are
becoming much rarer, as hardware logic becomes cheaper and new system
architectures mandate shareable interrupts.

Typical uses
Typical interrupt uses include the following: system timers, disk I/O, power-off signals,
and traps. Other interrupts exist to transfer data bytes using UARTs or Ethernet; sense
key-presses; control motors; or anything else the equipment must do.

A classic system timer interrupt interrupts periodically from a counter or the power-line.
The interrupt handler counts the interrupts to keep time. The timer interrupt may also
be used by the OS's task scheduler to reschedule the priorities of running processes.
Counters are popular, but some older computers used the power line frequency instead,
because power companies in most Western countries control the power-line frequency
with an atomic clock.
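Keeping time by counting timer interrupts can be sketched as follows (the tick rate HZ is an illustrative value):

```python
HZ = 100   # illustrative tick rate: 100 timer interrupts per second
ticks = 0

def timer_isr():
    """Handler invoked on every timer interrupt: it only counts ticks."""
    global ticks
    ticks += 1

def uptime_seconds() -> float:
    """Time is derived from the tick count, not measured directly."""
    return ticks / HZ

for _ in range(250):      # simulate 2.5 seconds' worth of interrupts
    timer_isr()
print(uptime_seconds())   # -> 2.5
```

The handler does nothing but increment a counter; the clock's accuracy therefore depends entirely on the stability of the interrupt source, which is why a power-line or crystal reference matters.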

A disk interrupt signals the completion of a data transfer from or to the disk peripheral.
A process waiting to read or write a file starts up again.

A power-off interrupt predicts or requests a loss of power. It allows the computer
equipment to perform an orderly shutdown.

Interrupts are also used in type-ahead features for buffering events like keystrokes.

Direct memory access (DMA) is a feature of modern computers that allows
certain hardware subsystems within the computer to access system memory for reading
and/or writing independently of the central processing unit. Many hardware systems
use DMA including disk drive controllers, graphics cards, network cards, and sound
cards. Computers that have DMA channels can transfer data to and from devices with
much less CPU overhead than computers without a DMA channel.

Without DMA, using programmed input/output (PIO) mode, the CPU is typically
occupied for the entire time it is performing a transfer. With DMA, the CPU initiates the
transfer, does other operations while the transfer is in progress, and receives an
interrupt from the DMA controller once the operation is done. This is especially
useful in real-time computing applications where not stalling behind concurrent
operations is critical.

Principle
DMA is an essential feature of all modern computers, as it allows devices to transfer
data without subjecting the CPU to a heavy overhead. Otherwise, the CPU would have
to copy each piece of data from the source to the destination. This is typically slower
than copying normal blocks of memory since access to I/O devices over a peripheral
bus is generally slower than normal system RAM. During this time the CPU would be
unavailable for any other tasks involving CPU bus access, although it could continue
doing any work which did not require bus access.

A DMA transfer essentially copies a block of memory from one device to another. While
the CPU initiates the transfer, it does not execute it. For so-called "third party" DMA, as
is normally used with the ISA bus, the transfer is performed by a DMA controller which
is typically part of the motherboard chipset. More advanced bus designs such as PCI
typically use bus mastering DMA, where the device takes control of the bus and
performs the transfer itself.

A typical usage of DMA is copying a block of memory from system RAM to or from a
buffer on the device. Such an operation does not stall the processor, which as a result
can be scheduled to perform other tasks. DMA transfers are essential to high
performance embedded systems. It is also essential in providing so-called zero-copy
implementations of peripheral device drivers as well as functionalities such as network
packet routing, audio playback and streaming video.

DMA engines
In addition to hardware interaction, DMA can also be used to offload expensive memory
operations, such as large copies or scatter-gather operations, from the CPU to a
dedicated DMA engine. While normal memory copies are typically too small to be
worthwhile to offload on today's desktop computers, they are frequently offloaded on
embedded devices due to more limited resources.[1]

Newer Intel Xeon processors also include a DMA engine technology called I/OAT, meant
to improve network performance on high-throughput network interfaces, such as
gigabit Ethernet, in particular.[2] However, benchmarks with this approach on Linux
indicate no more than 10% improvement in CPU utilization.[3]

Examples

ISA
For example, a PC's ISA DMA controller has 8 DMA channels, of which 7 are available
for use by peripheral devices. Each DMA channel has associated with it a 16-bit address
register and a 16-bit count register. To initiate a data transfer the device driver sets up
the DMA channel's address and count registers together with the direction of the data
transfer, read or write. It then instructs the DMA hardware to begin the transfer. When
the transfer is complete, the device interrupts the CPU.
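The driver-side sequence above can be pictured with a toy model (plain Python, not real driver code; the register names and transfer behavior here are simplified assumptions for illustration):

```python
class DmaChannel:
    """Toy model of one ISA-style DMA channel: a 16-bit address
    register, a 16-bit count register, and a direction flag."""

    def __init__(self):
        self.address = 0
        self.count = 0
        self.direction = "read"

    def program(self, address, count, direction):
        # The device driver sets up address, count and transfer direction.
        self.address = address & 0xFFFF
        self.count = count & 0xFFFF
        self.direction = direction

    def start(self, memory, device_data=None):
        # The "hardware" performs the transfer without CPU involvement,
        # then signals completion (the return value stands in for the
        # interrupt raised to the CPU).
        if self.direction == "write":            # device -> memory
            for i in range(self.count):
                memory[self.address + i] = device_data[i]
        return "interrupt: transfer complete"

memory = [0] * 65536
chan = DmaChannel()
chan.program(address=0x1000, count=4, direction="write")
print(chan.start(memory, device_data=[1, 2, 3, 4]))  # -> interrupt: transfer complete
```

The CPU's only work is the `program` call; the copy loop models what the controller does on its own.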

"Scatter-gather" DMA allows the transfer of data to and from multiple memory areas in
a single DMA transaction. It is equivalent to the chaining together of multiple simple
DMA requests. Again, the motivation is to off-load multiple input/output interrupt and
data copy tasks from the CPU.
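One way to picture this (a hypothetical Python sketch, not controller logic): a scatter-gather transfer walks a chain of simple (address, length) descriptors within one transaction:

```python
def scatter_gather_copy(memory, descriptors, data):
    """Walk a chain of (address, length) descriptors, scattering one
    contiguous stream of device data across multiple memory areas,
    as a single scatter-gather DMA transaction would."""
    offset = 0
    for address, length in descriptors:
        memory[address:address + length] = data[offset:offset + length]
        offset += length
    return offset  # total elements transferred

memory = [0] * 32
done = scatter_gather_copy(memory, [(0, 3), (10, 2)], [7, 8, 9, 4, 5])
print(done)  # -> 5
```

Each descriptor is equivalent to one simple DMA request; chaining them means the CPU is interrupted once, at the end, rather than once per region.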

DRQ stands for DMA request; DACK for DMA acknowledge. These symbols are generally
seen on hardware schematics of computer systems with DMA functionality. They
represent electronic signaling lines between the CPU and DMA controller.

The CoreConnect™ Bus Architecture

Recent advances in silicon densities now allow for the integration of numerous functions
onto a single silicon chip. With this increased density, peripherals formerly attached to the
processor at the card level are integrated onto the same die as the processor. As a result,
chip designers must now address issues traditionally handled by the system designer. In
particular, the on-chip buses used in such system-on-a-chip designs must be sufficiently
flexible and robust to support a wide variety of embedded system needs. The IBM
Blue Logic™ cores program provides the framework to efficiently realize complex
system-on-a-chip (SOC) designs. Typically, an SOC contains numerous functional blocks
representing a very large number of logic gates. Designs such as these are best realized
through a macro-based approach. Macro-based design provides numerous benefits during
logic entry and verification, but the ability to reuse intellectual property is often the most
significant. From generic serial ports to complex memory controllers and processor cores,
each SOC generally requires the use of common macros. Many single chip solutions used
in applications today are designed as custom chips, each with its own internal
architecture. Logical units within such a chip are often difficult to extract and re-use in
different applications. As a result, many times the same function is redesigned from one
application to another. Promoting reuse by ensuring macro interconnectivity is
accomplished by using common buses for intermacro communications. To that end, the
IBM CoreConnect architecture provides three buses for interconnecting cores, library
macros, and custom logic:
· Processor Local Bus (PLB)
· On-Chip Peripheral Bus (OPB)
· Device Control Register (DCR) Bus
Figure 1 illustrates how the CoreConnect architecture can be used to interconnect macros
in a PowerPC 440 based SOC. High performance, high bandwidth blocks such as the
PowerPC 440 CPU core, PCI-X Bridge and PC133/DDR133 SDRAM Controller reside on the
PLB, while the OPB hosts lower data rate peripherals. The daisy-chained DCR bus provides
a relatively low-speed data path for passing configuration and status information between
the PowerPC 440 CPU core and other on-chip macros.

The CoreConnect architecture shares many similarities with the Advanced Microcontroller
Bus Architecture (AMBA™) from ARM Ltd. As shown in Table 1, the recently announced
AMBA 2.0 includes the specification of many high performance features that have been
available in the CoreConnect architecture for over three years. Both architectures support
data bus widths of 32 bits and higher, utilize separate read and write data paths and
allow multiple masters. CoreConnect and AMBA 2.0 now both provide high performance
features including pipelining, split transactions and burst transfers. Many custom designs
utilizing the high performance features of the CoreConnect architecture are available in
the marketplace today. Open specifications for the CoreConnect architecture are available
on the IBM Microelectronics web site. In addition, IBM offers a no-fee, royalty-free
CoreConnect architectural license. Licensees receive the PLB arbiter, OPB arbiter and
PLB/OPB Bridge designs along with bus model toolkits and bus functional compilers for the
PLB, OPB and DCR buses. In the future, IBM intends to include compliance test suites for
each of the three buses.
Processor Local Bus
The PLB and OPB buses provide the primary means of data flow among macro elements.
Because these two buses have different structures and control signals, individual macros
are designed to interface to either the PLB or the OPB. Usually the PLB interconnects
high-bandwidth devices such as processor cores, external memory interfaces and DMA
controllers. The PLB addresses the high performance, low latency and design flexibility
issues needed in a highly integrated SOC through:

· Decoupled address, read data, and write data buses with split transaction capability
· Concurrent read and write transfers, yielding a maximum bus utilization of two data
transfers per clock
· Address pipelining that reduces bus latency by overlapping a new write request with an
ongoing write transfer and up to three read requests with an ongoing read transfer.
· Ability to overlap the bus request/grant protocol with an ongoing transfer

In addition to providing a high bandwidth data path, the PLB offers designers flexibility
through the following features:
· Support for both multiple masters and slaves
· Four priority levels for master requests, allowing PLB implementations with various
arbitration schemes
· Deadlock avoidance through slave-forced PLB rearbitration
· Master-driven atomic operations through a bus arbitration locking mechanism
· Byte-enable capability, supporting unaligned transfers
· A sequential burst protocol allowing byte, half-word, word and double-word burst
transfers
· Support for 16-, 32- and 64-byte line data transfers
· Read word address capability, allowing slaves to return line data either sequentially
or target word first
· DMA support for buffered, fly-by, peripheral-to-memory, memory-to-peripheral, and
memory-to-memory transfers
· Guarded or unguarded memory transfers, allowing slaves to individually enable or
disable prefetching of instructions or data
· Slave error reporting
· Architecture extendable to 256-bit data buses
· Fully synchronous
The PLB specification describes system architecture along with a detailed description of
the signals and transactions. PLB-based custom logic systems require the use of a PLB
macro to interconnect the various master and slave macros.
Figure 2 illustrates the connection of multiple masters and slaves through the PLB macro.
Each PLB master is attached to the PLB macro via separate address, read data and write
data buses and a plurality of transfer qualifier signals. PLB slaves are attached to the PLB
macro via shared, but decoupled, address, read data and write data buses along with
transfer control and status signals for each data bus.

The PLB architecture supports up to 16 master devices. Specific PLB macro
implementations, however, may support fewer masters. The PLB architecture also
supports any number of slave devices. The number of masters and slaves attached to a
PLB macro directly affects the maximum attainable PLB bus clock rate. This is because
larger systems tend to have increased bus wire load and a longer delay in arbitrating
among multiple masters and slaves.

The PLB macro consists of a bus arbitration control unit and the control logic required
to manage the address and data flow through the PLB. The separate address and data
buses from the masters allow simultaneous transfer requests. The PLB macro arbitrates
among these requests and directs the address, data and control signals from the granted
master to the slave bus. The slave response is then routed from the slave bus back to the
appropriate master.

PLB Bus Transactions


PLB transactions consist of multiphase address and data tenures. Depending on the level
of bus activity and capabilities of the PLB slaves, these tenures may be one or more PLB
bus cycles in duration. In addition, address pipelining and separate read and write data
buses yield increased bus throughput by way of concurrent tenures. Address tenures have
three phases: request, transfer and address acknowledge. A PLB transaction begins when
a master drives its address and transfer qualifier signals and requests ownership of the
bus during the request phase of the address tenure. Once the PLB arbiter grants bus
ownership the master's address and transfer qualifiers are presented to the slave devices
during the transfer phase. The address cycle terminates when a slave latches the master's
address and transfer qualifiers during the address acknowledge phase. Figure 3 illustrates
two-deep read and write address pipelining along with concurrent read and write data
tenures. Master A and Master B represent the state of each master's address and transfer
qualifiers. The PLB arbitrates between these requests and passes the selected master's
request to the PLB slave address bus. The trace labeled Address Phase shows the state of
the PLB slave address bus during each PLB clock.

As shown in Figure 3, the PLB specification supports implementations where these three
phases can require only a single PLB clock cycle. This occurs when the requesting master
is immediately granted access to the slave bus and the slave acknowledges the address in
the same cycle. If a master issues a request that cannot be immediately forwarded to the
slave bus, the request phase lasts one or more cycles.

Each data beat in the data tenure has two phases: transfer and acknowledge. During the
transfer phase the master drives the write data bus for a write transfer or samples the
read data bus for a read transfer.
As shown in Figure 3, the first (or only) data beat of a write transfer coincides with the
address transfer phase. Data acknowledge cycles are required during the data
acknowledge phase for each data beat in a data cycle. In the case of a single-beat
transfer, the data acknowledge signals also indicate the end of the data transfer. For line
or burst transfers, the data acknowledge signals apply to each individual beat and indicate
the end of the data cycle only after the final beat. The highest data throughput occurs
when data is transferred between master and slave in a single PLB clock cycle. In this
case the data transfer and data acknowledge phases are coincident. During multi-cycle
accesses there is a wait-state either before or between the data transfer and data
acknowledge phases.

The PLB address, read data, and write data buses are decoupled from one another,
allowing for address cycles to be overlapped with read or write data cycles, and for read
data cycles to be overlapped with write data cycles. The PLB split bus transaction
capability allows the address and data buses to have different masters at the same time.
Additionally, a second master may request ownership of the PLB, via address pipelining, in
parallel with the data cycle of another master's bus transfer. This is shown in Figure 3.
Overlapped read and write data transfers and split-bus transactions allow the PLB to
operate at a very high bandwidth by fully utilizing the read and write data buses. Allowing
PLB devices to move data using long burst transfers can further enhance bus throughput.
However, to control the maximum latency in a particular application, master latency
timers are required. All masters able to issue burst operations must contain a latency
timer that increments at the PLB clock rate and a latency count register. The latency
count register is an example of a configuration register that is accessed via the DCR bus.
During a burst operation, the latency timer begins counting after an address acknowledge
is received from a slave. When the latency timer exceeds the value programmed into the
latency count register, the master can either immediately terminate its burst, continue
until another master requests the bus or continue until another master requests the bus
with a higher priority.
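One of the permitted policies above (continue until another master requests the bus with a higher priority) can be sketched as follows. This is an illustrative Python model, not part of the PLB specification; the function and parameter names are invented for the sketch:

```python
def burst_may_continue(latency_timer, latency_count, other_request,
                       other_priority, my_priority):
    """Sketch of the PLB master latency-timer rule: once the timer
    exceeds the programmed latency count, the bursting master yields,
    here only to a higher-priority request (one of the three permitted
    policies)."""
    if latency_timer <= latency_count:
        return True                       # still within the latency budget
    if not other_request:
        return True                       # nobody else wants the bus
    return other_priority <= my_priority  # yield only to higher priority

print(burst_may_continue(10, 64, True, 3, 1))   # -> True: timer not expired
print(burst_may_continue(80, 64, True, 3, 1))   # -> False: expired, higher-priority request
```

The latency count itself would be programmed through the DCR bus, as the text notes.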

PLB Cross-Bar Switch


In some PLB-based systems multiple masters may cause the aggregate data bandwidth to
exceed that which can be satisfied with a single PLB. With such a system it may be
possible to place the high data rate masters and their target slaves on separate PLB
buses. An example is a multiprocessor system using separate memory controllers. A
macro known as the PLB Cross-Bar Switch (CBS) can be utilized to allow communication
between masters on one PLB and slaves on the other. As shown in Figure 4, the CBS is
placed between the PLB arbiters and their slave buses. When a master begins a
transaction, the CBS uses the associated address to select the appropriate slave bus. The
CBS supports simultaneous data transfers on both PLB buses along with a prioritization
scheme to handle multiple requests to a common slave port. In addition, a high priority
request can interrupt a lower priority transaction.

On-Chip Peripheral Bus


The On-Chip Peripheral Bus (OPB) is a secondary bus architected to alleviate system
performance bottlenecks by reducing capacitive loading on the PLB. Peripherals suitable
for attachment to the OPB include serial ports, parallel ports, UARTs, GPIO, timers and
other low-bandwidth devices. As part of the IBM Blue Logic cores program, all OPB core
peripherals directly attach to OPB. This common design point accelerates the design cycle
time by allowing system designers to easily integrate complex peripherals into an ASIC.

The OPB provides the following features:


· A fully synchronous protocol with separate 32-bit address and data buses
· Dynamic bus sizing to support byte, half-word and word transfers
· Byte and half-word duplication for byte and half-word transfers
· A sequential address (burst) protocol
· Support for multiple OPB bus masters
· Bus parking for reduced-latency transfers

OPB Bridge
PLB masters gain access to the peripherals on the OPB bus through the OPB bridge
macro. The OPB bridge acts as a slave device on the PLB and a master on the OPB. It
supports word (32-bit), half-word (16-bit) and byte read and write transfers on the 32-bit
OPB data bus, supports bursts, and can perform target-word-first line read accesses.
The OPB bridge performs dynamic bus sizing, allowing devices with different data widths
to communicate efficiently. When the OPB bridge master performs an operation wider
than the selected OPB slave supports, the bridge splits the operation into two or
more smaller transfers.
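The splitting step might be sketched as follows (an illustrative Python model with assumed byte-level granularity, not the bridge's actual logic):

```python
def split_transfer(address, data_bytes, slave_width):
    """Split one wide write into slave-width-sized transfers, as a
    bridge doing dynamic bus sizing would (widths in bytes: 1, 2 or 4)."""
    transfers = []
    for offset in range(0, len(data_bytes), slave_width):
        transfers.append((address + offset,
                          data_bytes[offset:offset + slave_width]))
    return transfers

# A 32-bit (4-byte) write to a 16-bit (2-byte) slave becomes two
# half-word transfers at consecutive addresses:
print(split_transfer(0x100, [0xDE, 0xAD, 0xBE, 0xEF], 2))
```

The master on the PLB side sees a single operation; only the bridge knows it was carried out as several narrower OPB transfers.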

OPB Implementation
The OPB supports multiple masters and slaves by implementing the address and data
buses as a distributed multiplexer. This type of structure is suitable for the less
data-intensive OPB bus and allows adding peripherals to a custom core logic design without
changing the I/O on either the OPB arbiter or existing peripherals. Figure 5 shows one
method of structuring the OPB address and data buses. Observe that both masters and
slaves provide enable control signals for their outbound buses. By requiring that each
macro provide this signal, the associated bus combining logic can be strategically

Channels

(1) A high-speed metal or optical fiber subsystem that provides a path between the
computer and the control units of the peripheral devices. Used in mainframes and
high-end servers, each channel is an independent unit that transfers data concurrently
with other channels and the CPU. For example, in a 32-channel computer, 32 streams of
data are transferred simultaneously. In contrast, the PCI bus in a desktop computer is a
shared channel between all devices plugged into it.

(2) The physical connecting medium in a network, which could be twisted wire pairs,
coaxial cable or optical fiber between clients, servers and other devices.

(3) A subchannel within a communications channel. Multiple channels are transmitted via
different carrier frequencies or by interleaving bits and bytes. This usage of the term can
refer to both wired and wireless transmission. See FDM and TDM.

(4) The Internet counterpart to a TV or radio channel. Information on a particular subject
is transmitted to the user's computer from a Webcast site via the browser or push client.

See Webcast, push client and push technology.

(5) The distributor/dealer sales channel. Vendors that sell in the channel rely on the sales
ability of their dealers and the customer relationships they have built up over the years.
Such vendors may also compete with the channel by selling direct to the customer via
catalogs and the Web.

Channel controller

A channel controller is a simple CPU used to handle the task of moving data to
and from the memory of a computer. Depending on the sophistication of the design, they
can also be referred to as peripheral processors, I/O processors, I/O controllers or
DMA controllers.
Most input/output tasks can be fairly complex and require logic to be applied to the
data to convert formats and perform other similar duties. In these situations the
computer's CPU would normally be asked to handle the logic, but because I/O devices
are very slow, the CPU would end up spending a huge amount of time (in computer
terms) sitting idle waiting for data from the device.
A channel controller avoids this problem by using a low-cost CPU with enough logic
and memory onboard to handle these sorts of tasks. Channel controllers are typically not
powerful or flexible enough to be used on their own, and are actually a form of
co-processor. The CPU sends small programs to the controller to handle an I/O job,
which the channel controller can then complete without any help from the CPU. When
the job is complete, or an error occurs, the channel controller communicates with the
CPU using a selection of interrupts.
Since channel controllers have direct access to the main memory of the computer,
they are also often referred to as DMA controllers (where DMA means direct memory
access), but that term is somewhat looser in definition and is often applied to
non-programmable devices as well.
The first use of channel controllers was in the famed CDC 6600 supercomputer,
which used 10 dedicated computers, referred to as peripheral processors or PPs, for
this role. The PPs were quite powerful, basically a cut-down version of CDC's first
computer, the CDC 1604. Since the 1960s channel controllers have been a standard part
of almost all mainframe designs, and a primary reason why anyone buys one. CDC's
PPs are at one end of the spectrum of power; most mainframe systems tasked the CPU
with more, and the channel controllers with less, of the overall I/O task.

Channel controllers have also been made as small as single-chip designs with
multiple channels on them, used in the NeXT computers for instance. However, with the
rapid speed increases in computers today, combined with operating systems that don't
"block" when waiting for data, the channel controller has become somewhat redundant
and is not commonly found on smaller machines.
Channel controllers can be said to be making a comeback in the form of "bus
mastering" peripheral devices, such as SCSI adaptors and network cards. The rationale
for these devices is the same as for the original channel controllers, namely off-loading
interrupts and context switching from the main CPU.

A serial number is a unique number, one of a series assigned for identification,
which varies from its successor or predecessor by a fixed discrete integer value.
Common usage has expanded the term to refer to any unique alphanumeric identifier
for one of a large set of objects, particularly in data processing and allied fields of
computer science. Not every numerical identifier is a serial number, however; identifying
numbers which are not serial numbers are sometimes called nominal numbers.
Sequence numbers are almost always non-negative, and typically start at zero or one.

Applications of serial numbering


Serial numbers are valuable in quality control, as once a defect is found in the
production of a particular batch of product, the serial number will quickly identify which
units are affected. Serial numbers are also used as a deterrent against theft and
counterfeit products in that serial numbers can be recorded, and stolen or otherwise
irregular goods can be identified.

Many computer programs come with serial numbers, often called "CD keys," and the
installers often require the user to enter a valid serial number to continue. These
numbers are verified using a certain algorithm to avoid usage of counterfeit keys.

Serial numbers also help track down counterfeit currency, because in some countries
each banknote has a unique serial number.

The ISSN or International Standard Serial Number seen on magazines and other
periodicals, an equivalent to the ISBN applied to books, is serially assigned but takes its
name from the library science use of serial to mean a periodical.

Certificates and Certificate Authorities (CAs) are necessary for widespread use of
cryptography. These depend on applying mathematically rigorous serial numbers and
serial number arithmetic.

The term "serial number" is also used in military formations as an alternative to the
expression "service number".

Estimating population size from serial numbers

If there are items whose serial numbers are part of a sequence of consecutive numbers,
and you take a random sample of n items' serial numbers, you can then estimate the
total population of items "in the wild" using a maximum likelihood method derived
using Bayesian reasoning.
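As a sketch of the idea (plain Python, using the standard minimum-variance unbiased estimator for this "German tank problem" and assuming serial numbers run 1..N):

```python
def estimate_population(sample):
    """Estimate the largest serial number N from a random sample.

    Uses the classic estimator N ~= m + m/n - 1, where m is the
    largest serial number observed and n is the sample size.
    Assumes serial numbers are consecutive, starting at 1.
    """
    n = len(sample)
    m = max(sample)
    return m + m / n - 1

# Five items sampled from a fleet of unknown size:
print(estimate_population([19, 40, 42, 60, 70]))  # -> 83.0
```

Intuitively, the largest observed serial number m is a lower bound on N, and the average gap between samples (roughly m/n) estimates how far above m the true maximum lies.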

Serial number arithmetic

Serial numbers are often used in network protocols. However, most sequence numbers
in computer protocols are limited to a fixed number of bits, and will wrap around after
sufficiently many numbers have been allocated. Thus, recently allocated serial numbers
may duplicate very old serial numbers, but not other recently allocated serial numbers.
To avoid ambiguity with these non-unique numbers, RFC 1982, "Serial Number
Arithmetic", defines special rules for calculations involving these kinds of serial numbers.
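The RFC 1982 comparison rule can be sketched in Python (shown here for 8-bit serial numbers, which wrap at 256):

```python
def serial_lt(a, b, bits=8):
    """RFC 1982 'less than' for serial numbers that wrap at 2**bits.

    a < b exactly when b is ahead of a by less than half the number
    space, so comparisons stay meaningful across a wraparound.
    """
    half = 1 << (bits - 1)          # 128 for 8-bit serial numbers
    return (a < b and b - a < half) or (a > b and a - b > half)

print(serial_lt(250, 3))   # -> True: 3 is 9 steps ahead of 250, wrapped
print(serial_lt(3, 250))   # -> False
```

Note the rule is deliberately undefined when the two numbers are exactly half the space apart; the sketch simply returns False in that case.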

Lollipop sequence number spaces are a more recent and sophisticated scheme for
dealing with finite-sized sequence numbers in protocols.

An information processor or information processing system, as its name suggests, is a
system (be it electrical, mechanical or biological) which takes information (a sequence
of enumerated states) in one form and processes (transforms) it into another form, e.g.
into statistics, by an algorithmic process.

An information processing system is made up of four basic parts, or sub-systems:

· input

· processor

· storage

· output

An object may be considered an information processor if it receives information from
another object and in some manner changes the information before transmitting it. This
broadly defined term can be used to describe every change which occurs in the
universe. As an example, a falling rock could be considered an information processor
due to the following observable facts:

First, information in the form of gravitational force from the earth serves as input to the
system we call a rock. At a particular instant the rock is a specific distance from the
surface of the earth traveling at a specific speed. Both the current distance and speed
properties are also forms of information which for that instant only may be considered
"stored" in the rock.

In the next instant, the distance of the rock from the earth has changed due to its
motion under the influence of the earth's gravity. Any time the properties of an object
change, a process has occurred, meaning that a processor of some kind is at work. In
addition, the rock's new position and increased speed are observed by us as it falls. These
changing properties of the rock are its "output."

It could be argued that in this example both the rock and the earth are the information
processing system being observed since both objects are changing the properties of
each other over time. If information were not being processed, no change would occur
at all.

Lesson VII

Arithmetic/Logic Unit Enhancement


Arithmetic logic units (ALU) perform arithmetic and logic operations on binary data
inputs. In some processors, the ALU is divided into two units: an arithmetic unit (AU) and
a logic unit (LU). In processors with multiple arithmetic units, one AU may be used for
fixed-point operations while another is used for floating-point operations. In some
personal computers (PCs), floating-point operations are performed by a special
floating-point AU that is located on a separate chip called a numeric coprocessor.
Typically, arithmetic logic units have direct input and output access to the processor
controller, main memory and input/output (I/O) devices. Inputs and outputs flow along an
electronic path called a bus. Each input consists of a machine instruction word that
contains an operation code, one or more operands, and sometimes a format code. The
operation code determines the operations to perform and the operands to use. When
combined with a format code, it also indicates whether the operation is fixed-point or
floating-point. ALU outputs are placed in a storage register. Generally, arithmetic logic
units include storage points for input operands, operands that are being added, the
accumulated result, and shifted results.
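The decode-and-execute flow described above can be illustrated with a toy model (plain Python; the instruction format here is a simplified assumption, not any real machine word layout):

```python
def alu_execute(instruction):
    """Toy ALU: decode an instruction-word-like dict containing an
    operation code and two operands, perform the operation, and return
    the result (which a real CPU would place in a storage register)."""
    ops = {
        "ADD": lambda a, b: a + b,
        "SUB": lambda a, b: a - b,
        "AND": lambda a, b: a & b,
        "OR":  lambda a, b: a | b,
    }
    opcode = instruction["opcode"]          # determines the operation
    a, b = instruction["operands"]          # the inputs to operate on
    return ops[opcode](a, b)

print(alu_execute({"opcode": "ADD", "operands": (6, 7)}))            # -> 13
print(alu_execute({"opcode": "AND", "operands": (0b1100, 0b1010)}))  # -> 8
```

In hardware the "dict lookup" is the control logic selecting an arithmetic or logic datapath based on the operation code bits.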

Arithmetic logic units vary in terms of number of bits, supply voltage, operating current,
propagation delay, power dissipation, and operating temperature. The number of bits
equals the width of the two input words on which the ALU performs arithmetic and
logical operations. Common configurations include 2-bit, 4-bit, 8-bit, 16-bit, 32-bit and
64-bit ALUs. Supply voltages range from -5 V to 5 V and include intermediate voltages
such as -4.5 V, -3.3 V, -3 V, 1.2 V, 1.5 V, 1.8 V, 2.5 V, 3 V, 3.3 V, and 3.6 V. The
operating current is the minimum current needed for active operation. The propagation
delay is the time interval between the application of an input signal and the occurrence of
the corresponding output. Power dissipation, the total power consumption of the device, is
generally expressed in watts (W) or milliwatts (mW). Operating temperature is specified
as a full required operating range.

Selecting arithmetic logic units requires an analysis of logic families. Transistor-transistor
logic (TTL) and related technologies such as Fairchild advanced Schottky TTL (FAST) use
transistors as digital switches. By contrast, emitter coupled logic (ECL) uses transistors to
steer current through gates that compute logical functions. Another logic family,
complementary metal-oxide semiconductor (CMOS), uses a combination of P-type and
N-type metal-oxide-semiconductor field effect transistors (MOSFETs) to implement logic
gates and other digital circuits. Bipolar CMOS (BiCMOS) is a silicon-germanium technology
that combines the high speed of bipolar TTL with the low power consumption of CMOS.
Other logic families for arithmetic logic units include cross-bar switch technology (CBT),
gallium arsenide (GaAs), integrated injection logic (I2L) and silicon on sapphire (SOS).
Gunning transceiver logic (GTL) and Gunning transceiver logic plus (GTLP) are
also available.

Arithmetic logic units are available in a variety of integrated circuit (IC) package types and
with different numbers of pins. Basic IC package types for ALUs include ball grid array
(BGA), quad flat package (QFP), single in-line package (SIP), and dual in-line package
(DIP). Many packaging variants are available. For example, BGA variants include
plastic-ball grid array (PBGA) and tape-ball grid array (TBGA). QFP variants include
low-profile quad flat package (LQFP) and thin quad flat package (TQFP). DIPs are
available in either ceramic (CDIP) or plastic (PDIP). Other IC package types include small
outline package (SOP), thin small outline package (TSOP), and shrink small outline
package (SSOP).

Decimal Arithmetic
The 80x86 CPUs use the binary numbering system for their native internal representation.
The binary numbering system is, by far, the most common numbering system in use in
computer systems today. In days long since past, however, there were computer systems
that were based on the decimal (base 10) numbering system rather than the binary
numbering system. Consequently, their arithmetic system was decimal based rather than
binary. Such computer systems were very popular in systems targeted for
business/commercial systems1. Although systems designers have discovered that binary
arithmetic is almost always better than decimal arithmetic for general calculations, the
myth still persists that decimal arithmetic is better for money calculations than binary
arithmetic. Therefore, many software systems still specify the use of decimal arithmetic in
their calculations (not to mention that there is lots of legacy code out there whose
algorithms are only stable if they use decimal arithmetic). Therefore, despite the fact that
decimal arithmetic is generally inferior to binary arithmetic, the need for decimal
arithmetic still persists.

Of course, the 80x86 is not a decimal computer; therefore we have to play tricks in order
to represent decimal numbers using the native binary format. The most common
technique, even employed by most so-called decimal computers, is to use the binary
coded decimal, or BCD representation. The BCD representation (see "Nibbles" on page 56)
uses four bits to represent the 10 possible decimal digits. The binary value of those four
bits is equal to the corresponding decimal value in the range 0..9. Of course, with four bits
we can actually represent 16 different values. The BCD format ignores the remaining six
bit combinations.

Table 1: Binary Coded Decimal (BCD) Representation

BCD Representation    Decimal Equivalent
0000                  0
0001                  1
0010                  2
0011                  3
0100                  4
0101                  5
0110                  6
0111                  7
1000                  8
1001                  9
1010                  Illegal
1011                  Illegal
1100                  Illegal
1101                  Illegal
1110                  Illegal
1111                  Illegal

Since each BCD digit requires four bits, we can represent a two-digit BCD value with a single byte. This means
that we can represent the decimal values in the range 0..99 using a single byte (versus 0..255 if we treat the value as
an unsigned binary number). Clearly it takes a bit more memory to represent the same value in BCD as it does to
represent the same value in binary. For example, with a 32-bit value you can represent BCD values in the range
0..99,999,999 (eight significant digits) but you can represent values in the range 0..4,294,967,295 (better than nine
significant digits) using the binary representation.
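To make the packing concrete, here is a short Python sketch of the byte layout described above: two decimal digits per byte, one per nibble. The function names are my own, purely for illustration.

```python
# Packed BCD: two decimal digits per byte, one digit per 4-bit nibble.
def to_packed_bcd(n):
    """Encode 0..99 as one byte: tens digit in the high nibble, ones in the low."""
    assert 0 <= n <= 99
    return ((n // 10) << 4) | (n % 10)

def from_packed_bcd(b):
    """Decode one packed-BCD byte back to its decimal value."""
    high, low = b >> 4, b & 0x0F
    assert high <= 9 and low <= 9, "illegal BCD nibble"
    return 10 * high + low

print(hex(to_packed_bcd(99)))   # 0x99 -- the hex digits mirror the decimal digits
print(from_packed_bcd(0x42))    # 42
```

Note how the hexadecimal rendering of a packed BCD byte reads exactly like the decimal value it encodes; this is the property the "Literal BCD Constants" section below relies on.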
Not only does the BCD format waste memory on a binary computer (since it uses more
bits to represent a given integer value), but decimal arithmetic is slower. For these
reasons, you should avoid the use of decimal arithmetic unless it is absolutely mandated
for a given application.

Binary coded decimal representation does offer one big advantage over binary
representation: it is fairly trivial to convert between the string representation of a decimal
number and the BCD representation. This feature is particularly beneficial when working
with fractional values since fixed and floating point binary representations cannot exactly
represent many commonly used values between zero and one (e.g., 1/10). Therefore,
BCD operations can be efficient when reading from a BCD device, doing a simple
arithmetic operation (e.g., a single addition) and then writing the BCD value to some
other device.

Literal BCD Constants


HLA does not provide, nor do you need, a special literal BCD constant. Since BCD is just a
special form of hexadecimal notation that does not allow the values $A..$F, you can easily
create BCD constants using HLA's hexadecimal notation. Of course, you must take care
not to include the symbols 'A'..'F' in a BCD constant since they are illegal BCD values. As
an example, consider the following MOV instruction that copies the BCD value '99' into the
AL register:
mov( $99, al );

The important thing to keep in mind is that you must not use HLA literal decimal constants
for BCD values. That is, "mov( 95, al );" does not load the BCD representation for
ninety-five into the AL register. Instead, it loads $5F into AL and that's an illegal BCD
value. Any computations you attempt with illegal BCD values will produce garbage results.
Always remember that, even though it seems counter-intuitive, you use hexadecimal
literal constants to represent literal BCD values.
How Pipelining Works

Pipelining, a standard feature in RISC processors, is much like an assembly line. Because
the processor works on different steps of the instruction at the same time, more
instructions can be executed in a shorter period of time.
A useful method of demonstrating this is the laundry analogy. Let's say that there are
four loads of dirty laundry that need to be washed, dried, and folded. We could put the
first load in the washer for 30 minutes, dry it for 40 minutes, and then take 20 minutes
to fold the clothes. Then pick up the second load and wash, dry, and fold, and repeat
for the third and fourth loads. Supposing we started at 6 PM and worked as efficiently
as possible, we would still be doing laundry until midnight.

However, a smarter approach to the problem would be to put the second load of dirty
laundry into the washer after the first was already clean and whirling happily in the
dryer. Then, while the first load was being folded, the second load would dry, and a
third load could be added to the pipeline of laundry. Using this method, the laundry
would be finished by 9:30.
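The laundry arithmetic can be checked with a short script. The stage lengths and the four loads come from the analogy above; the scheduling code itself is my own illustrative sketch.

```python
# Timing the laundry analogy: wash 30 min, dry 40 min, fold 20 min, 4 loads.
WASH, DRY, FOLD = 30, 40, 20
LOADS = 4

# One load at a time, start to finish:
sequential = LOADS * (WASH + DRY + FOLD)   # 360 minutes: 6:00 PM to midnight

# Pipelined: a load enters a stage as soon as it has finished the previous
# stage AND that machine (washer/dryer/folder) is free.
ready = [0] * LOADS            # minutes after 6:00 PM each load becomes ready
for duration in (WASH, DRY, FOLD):
    stage_free = 0
    for i in range(LOADS):
        start = max(ready[i], stage_free)
        ready[i] = start + duration
        stage_free = ready[i]
pipelined = ready[-1]          # 210 minutes: 6:00 PM to 9:30 PM

print(sequential, pipelined)
```

Running it confirms the text: 360 minutes done sequentially (midnight) versus 210 minutes pipelined (9:30 PM).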

RISC Pipelines

A RISC processor pipeline operates in much the same way, although the stages in the
pipeline are different. While different processors have different numbers of steps, they
are basically variations of these five, used in the MIPS R3000 processor:

1. fetch instructions from memory

2. read registers and decode the instruction

3. execute the instruction or calculate an address

4. access an operand in data memory

5. write the result into a register

If you look back at the laundry example, you'll notice that although the
washer finishes in half an hour, the dryer takes an extra ten minutes, and thus the wet
clothes must wait ten minutes for the dryer to free up. Thus, the length of the pipeline
is dependent on the length of the longest step. Because RISC instructions are simpler
than those used in pre-RISC processors (now called CISC, or Complex Instruction Set
Computer), they are more conducive to pipelining. While CISC instructions varied in
length, RISC instructions are all the same length and can be fetched in a single
operation. Ideally, each of the stages in a RISC processor pipeline should take 1 clock
cycle so that the processor finishes an instruction each clock cycle and averages one
cycle per instruction (CPI).
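The ideal-case arithmetic implied by that last sentence is easy to state: once the pipeline is full, one instruction completes per clock cycle, so the first instruction pays the full pipeline depth and every later one adds a single cycle. A back-of-envelope sketch:

```python
# Ideal k-stage pipeline with no stalls: the first instruction takes
# `stages` cycles to drain through; each later one finishes 1 cycle after
# its predecessor.
def pipelined_cycles(n_instr, stages=5):
    return stages + (n_instr - 1)

n = 1000
print(pipelined_cycles(n))       # 1004 cycles pipelined
print(n * 5)                     # 5000 cycles if each instruction ran alone
print(pipelined_cycles(n) / n)   # CPI = 1.004, approaching the ideal of 1
```

For long instruction streams the CPI approaches 1, which is why the stall-inducing hazards discussed next matter so much in practice.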

Pipeline Problems

In practice, however, RISC processors operate at more than one cycle per instruction.
The processor might occasionally stall as a result of data dependencies and branch
instructions.

A data dependency occurs when an instruction depends on the results of a previous


instruction. A particular instruction might need data in a register which has not yet been
stored since that is the job of a preceding instruction which has not yet reached that
step in the pipeline.

For example:
add $r3, $r2, $r1
add $r5, $r4, $r3
more instructions that are independent of the first two

In this example, the first instruction tells the processor to add the contents of registers
r1 and r2 and store the result in register r3. The second instructs it to add r3 and r4
and store the sum in r5. We place this set of instructions in a pipeline. When the second
instruction is in the second stage, the processor will be attempting to read r3 and r4
from the registers. Remember, though, that the first instruction is just one step ahead
of the second, so the contents of r1 and r2 are being added, but the result has not yet
been written into register r3. The second instruction therefore cannot read from the
register r3 because it hasn't been written yet and must wait until the data it needs is
stored. Consequently, the pipeline is stalled and a number of empty instructions (known
as bubbles) go into the pipeline. Data dependency affects long pipelines more than
shorter ones since it takes a longer period of time for an instruction to reach the final
register-writing stage of a long pipeline.

MIPS' solution to this problem is code reordering. If, as in the example above, the
following instructions have nothing to do with the first two, the code could be
rearranged so that those instructions are executed in between the two dependent
instructions and the pipeline could flow efficiently. The task of code reordering is
generally left to the compiler, which recognizes data dependencies and attempts to
minimize performance stalls.
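How many bubbles does the dependent pair above actually cost? The following is a simplified model of my own (in-order issue, no forwarding, the five-stage numbering from the previous section): a consumer cannot read a register in stage 2 until its producer has written it back in stage 5.

```python
def total_cycles(program, stages=5, read_stage=2, write_stage=5):
    """Cycles to run `program` on an in-order pipeline WITHOUT forwarding.
    Each instruction is (dest_register, [source_registers])."""
    avail = {}          # register -> first cycle in which it can be read
    t = 0               # issue cycle of the previous instruction
    for dest, srcs in program:
        t_min = t + 1   # in-order: issue at least one cycle after predecessor
        for s in srcs:
            if s in avail:
                # reading in stage `read_stage` must wait for the write-back
                t_min = max(t_min, avail[s] - read_stage + 1)
        t = t_min
        avail[dest] = t + write_stage   # readable once write-back finishes
    return t + stages - 1               # last instruction drains the pipe

dependent   = [("r3", ["r1", "r2"]), ("r5", ["r3", "r4"])]
independent = [("r3", ["r1", "r2"]), ("r6", ["r4", "r5"])]
print(total_cycles(dependent))    # 9 cycles: three bubbles separate the adds
print(total_cycles(independent))  # 6 cycles: back-to-back, no stall
```

The three-cycle gap is exactly what code reordering recovers by slotting independent instructions between the two dependent ones.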

Branch instructions are those that tell the processor to make a decision about what the
next instruction to be executed should be based on the results of another instruction.
Branch instructions can be troublesome in a pipeline if a branch is conditional on the
results of an instruction which has not yet finished its path through the pipeline.

For example:
Loop: add $r3, $r2, $r1
      sub $r6, $r5, $r4
      beq $r3, $r6, Loop
The example above instructs the processor to add r1 and r2 and put the result in r3,
then subtract r4 from r5, storing the difference in r6. In the third instruction, beq
stands for branch if equal. If the contents of r3 and r6 are equal, the processor should
execute the instruction labeled "Loop." Otherwise, it should continue to the next
instruction. In this example, the processor cannot make a decision about which branch
to take because neither the value of r3 nor that of r6 has been written into the registers yet.

The processor could stall, but a more sophisticated method of dealing with branch
instructions is branch prediction. The processor makes a guess about which path to take
- if the guess is wrong, anything written into the registers must be cleared, and the
pipeline must be started again with the correct instruction. Some methods of branch
prediction depend on stereotypical behavior. Branches pointing backward are taken
about 90% of the time since backward-pointing branches are often found at the bottom
of loops. On the other hand, branches pointing forward are only taken approximately
50% of the time. Thus, it would be logical for processors to always follow the branch
when it points backward, but not when it points forward. Other methods of branch
prediction are less static: processors that use dynamic prediction keep a history for
each branch and use it to predict future branches. These processors are correct in their
predictions 90% of the time.
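A common concrete form of such history-based prediction is a 2-bit saturating counter per branch; the sketch below is my own minimal version of that scheme (the text does not name a specific mechanism). Fed a typical loop branch that is taken nine times out of ten, it reaches the ~90% accuracy quoted above.

```python
def predict_accuracy(outcomes):
    """Fraction of correct predictions from ONE 2-bit saturating counter.
    States 0..3; states 2 and 3 predict 'taken'."""
    state, correct = 2, 0               # start weakly predicting 'taken'
    for taken in outcomes:
        if (state >= 2) == taken:
            correct += 1
        # move toward 3 on taken, toward 0 on not-taken, saturating
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return correct / len(outcomes)

# A backward loop branch: taken nine times, then falls through once.
history = ([True] * 9 + [False]) * 10
print(predict_accuracy(history))        # 0.9 -- matches the ~90% in the text
```

The two-bit hysteresis is the point: a single loop exit nudges the counter but does not flip the prediction, so the next loop entry is still predicted correctly.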

Still other processors forgo the entire branch prediction ordeal. The RISC System/6000
fetches and starts decoding instructions from both sides of the branch. When it
determines which branch should be followed, it then sends the correct instructions down
the pipeline to be executed.

Pipelining Developments

In order to make processors even faster, various methods of optimizing pipelines have
been devised.

Super pipelining refers to dividing the pipeline into more steps. The more pipe stages
there are, the faster the pipeline is because each stage is then shorter. Ideally, a
pipeline with five stages should be five times faster than a non-pipelined processor (or
rather, a pipeline with one stage). The instructions are executed at the speed at which
each stage is completed, and each stage takes one fifth of the amount of time that the
non-pipelined instruction takes. Thus, a processor with an 8-step pipeline (the MIPS
R4000) will be even faster than its 5-step counterpart. The MIPS R4000 chops its
pipeline into more pieces by dividing some steps into two. Instruction fetching, for
example, is now done in two stages rather than one. The stages are as shown:

1. Instruction Fetch (First Half)

2. Instruction Fetch (Second Half)

3. Register Fetch

4. Instruction Execute

5. Data Cache Access (First Half)

6. Data Cache Access (Second Half)

7. Tag Check

8. Write Back

Superscalar pipelining involves multiple pipelines in parallel. Internal components of the
processor are replicated so that it can launch multiple instructions in some or all of its
pipeline stages. The RISC System/6000 has a forked pipeline with different paths for
floating-point and integer instructions. If there is a mixture of both types in a program,
the processor can keep both forks running simultaneously. Both types of instructions
share two initial stages (Instruction Fetch and Instruction Dispatch) before they fork.
Often, however, superscalar pipelining refers to multiple copies of all pipeline stages (In
terms of laundry, this would mean four washers, four dryers, and four people who fold
clothes). Many of today's machines attempt to find two to six instructions that they can
execute in every pipeline stage. If some of the instructions are dependent, however,
only the first instruction or instructions are issued.

Dynamic pipelines have the capability to schedule around stalls. A dynamic pipeline is
divided into three units: the instruction fetch and decode unit, five to ten execute or
functional units, and a commit unit. Each execute unit has reservation stations, which
act as buffers and hold the operands and operations.

While the functional units have the freedom to execute out of order, the instruction
fetch/decode and commit units must operate in-order to maintain simple pipeline
behavior. When the instruction is executed and the result is calculated, the commit unit
decides when it is safe to store the result. If a stall occurs, the processor can schedule
other instructions to be executed until the stall is resolved. This, coupled with the
efficiency of multiple units executing instructions simultaneously, makes a dynamic
pipeline an attractive alternative.

Lesson VIII

Processor and System Structures


Types of Computer System

The line between classes of computer systems can be extremely vague. A powerful entry-level
system can double as a low-end business system, and a gaming system can be identical
to a low-end workstation. In fact, some equipment manufacturers refer to their
business systems as workstations, and some components can be installed on systems of
any class. For example, a manufacturer may use the same RAM in the entry-level
system and the gaming system. Pay particular attention to a system's CPU and video.
Sometimes it is the amount of hard disk space or the addition of a better video adapter
that moves a system from one class to another. Keep in mind that the tables below show
minimum configurations.

Entry Level Systems


Entry level systems are the most common systems for home and general use. These
systems are often powerful enough to run standard office software, watch DVD movies,
surf the internet, send and receive email, as well as run home and small office
accounting software. As of the date of this article, the minimum configuration for an
entry level computer system is as follows:
Computer Processor (CPU)   Intel Pentium 4 or Celeron running at 2 Gigahertz
                           (GHz) or better, or AMD Athlon, Duron, or Sempron
                           running at 1.5 GHz or better
System memory (RAM)        256 megabytes (MB) of DDR RAM
Hard Disk Storage          40 Gigabytes (GB)
Optical Storage            CDRW/DVD
Monitor                    17-inch CRT display
USB Ports                  2.0 standard, at least 4 ports
Video                      At least 32 MB - often uses system memory
Audio                      Should be included, along with speakers
Network Adapter            Should be included (for hi-speed Internet)
Additionally the system may include some additional ports such as keyboard and mouse
ports, a printer port, serial port(s), a game port and, optionally, a dial-up modem.

Business Class Systems


Computer Processor (CPU)   Intel Pentium 4 or Celeron running at 2 Gigahertz
                           (GHz) or better, or AMD Athlon, Duron, or Sempron
                           running at 1.5 GHz or better
System memory (RAM)        256 megabytes (MB) of Error Correcting Code (ECC)
                           DDR RAM
Hard Disk Storage          EIDE 120 Gigabytes (GB) - SCSI or SATA with RAID 1
                           preferred
Optical Storage            CDRW/DVD
Monitor                    17-inch CRT display
USB Ports                  2.0 standard, at least 4 ports
Video                      At least 32 MB RAM - often uses system memory
Audio                      Should be included, along with speakers
Network Adapter            Should be included (for hi-speed Internet)
Gaming Systems

Computer Processor (CPU)   Intel Pentium 4 running at 3 Gigahertz (GHz) or
                           better, or AMD Athlon or AMD64 running at 2.2 GHz
                           or better
System memory (RAM)        1 Gigabyte (GB) of DDR RAM
Hard Disk Storage          80 Gigabytes (GB)
Optical Storage            CDRW/DVD
Monitor                    17-inch LCD display
USB Ports                  2.0 standard, at least 6 ports
Video                      At least 128 MB DDR RAM video adapter with
                           Graphics Processing Unit (GPU) and heat sink
Audio                      5.1 Dolby
Network Adapter            Should be included (for hi-speed Internet)

Workstations and Servers


Workstations and servers are usually built and configured to specifications.

Computer Processor(s)      Intel Pentium 4, Intel Xeon, AMD64, AMD64FX, or
(CPUs)                     AMD Opteron. System may support multiple
                           processors.
System memory (RAM)        512 megabytes (MB) to 4 Gigabytes of DDR RAM
Hard Disk Storage          80 Gigabytes (GB) to 2 terabytes (TB)
Optical Storage            Task specific
Monitor                    17-inch CRT display
USB Ports                  2.0 standard, at least 6 ports
Video                      Task specific
Audio                      Optional - task specific
Network Adapter            High-end network adapter

In mathematics, an operand is one of the inputs (arguments) of an operator. For
instance, in

3 + 6 = 9

'+' is the operator and '3' and '6' are the operands.

The number of operands of an operator is called its arity. Based on arity, operators are
classified as unary, binary, ternary etc.

In computer programming languages, the definitions of operator and operand are
almost the same as in mathematics.

Additionally, in assembly language, an operand is a value (an argument) on which the
instruction, named by its mnemonic, operates. The operand may be a processor register, a
memory address, a literal constant, or a label. A simple example (in the PC
architecture) is
MOV DS, AX
where the value in register operand 'AX' is to be moved into register 'DS'. Depending on
the instruction, there may be zero, one, two, or more operands.
Retrieved from "http://en.wikipedia.org/wiki/Operand"

The ISA Level:
Data Types, Instruction Formats and Addressing

Introduction
• At the ISA level, a variety of different data types are used to represent data.
• A key issue is whether or not there is hardware support for a particular data type.
• Hardware support means that one or more instructions expect data in a particular
format, and the user is not free to pick a different format.
• Another issue is precision – what if we wanted to total the transactions on Bill Gates'
deposit account?
• Using 32-bit arithmetic would not work here because the numbers involved are larger
than 2^32 (about 4 billion).
• We could use two 32-bit integers to represent each number, giving 64 bits in all.
• However, if the machine does not support this kind of double-precision number, all
arithmetic on such numbers will have to be done in software, without hardware support
for the format.
• Today, we will look at data types that are supported by the hardware, and thus for
which specific formats are required.

Numeric Data Types

• Data types can be divided into two categories: numeric and nonnumeric.
• Chief among the numeric data types are the integers, which come in many lengths,
typically
8, 16, 32, and 64 bits.

• Most modern computers store integers in two’s complement binary notation.
• Some computers support unsigned integers as well as signed integers.
• For an unsigned integer, there is no sign bit and all the bits contain data – thus the
range of a 32-bit word is 0 to 2^32 − 1, inclusive.
• In contrast, a two's complement signed 32-bit integer can only handle numbers up to
2^31 − 1, but it can also handle negative numbers.
• For numbers that cannot be expressed as an integer, floating-point numbers are used.
• They have lengths of 32, 64, or sometimes 128 bits.
• Most computers have instructions for doing floating-point arithmetic.
• Many computers have separate registers for holding integer operands and for holding
floating-point operands.
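The ranges just quoted can be checked directly, and the small helper below (my own, for illustration) shows how a negative value is stored as a two's complement bit pattern:

```python
# The 32-bit ranges stated above, computed directly.
unsigned_max = 2**32 - 1                 # 0 .. 4,294,967,295
signed_min, signed_max = -(2**31), 2**31 - 1

def to_twos(n, bits=32):
    """Bit pattern of n in two's complement: negatives map to 2**bits + n."""
    return n & ((1 << bits) - 1)

print(unsigned_max)                      # 4294967295
print(hex(to_twos(-1)))                  # 0xffffffff: all bits set
print(to_twos(-(2**31)) == 0x80000000)   # True: the most negative value
```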

Nonnumeric Data Types

• Modern computers are often used for nonnumerical applications, such as word
processing or
database management.
• Thus, characters are clearly important here although not every computer provides
hardware
support for them.
• The most common character codes are ASCII and UNICODE.
• These support 7-bit characters and 16-bit characters, respectively.
• It is not uncommon for the ISA level to have special instructions that are intended for
handling character strings.
• The instructions can perform copy, search, edit and other functions on the strings.
• Boolean values are also important.
• Two values: TRUE or FALSE.
• In theory, a single bit can represent a Boolean, with 0 as false and 1 as true (or vice
versa).
• In practice, a byte or word is used per Boolean value because individual bits in a byte do
not
have their own addresses and thus are hard to access.
• A common system uses the convention that 0 means false and everything else means
true.
• Our last data type is the pointer, which is just a machine address.
• We have already seen pointers.
• When we discussed stacks we came across pointers SP and LV.
• Accessing a variable at a fixed distance from a pointer, which is the way ILOAD works,
is
extremely common on all machines.

Instruction Formats

• An instruction consists of an opcode, usually along with some additional information
such as
where operands come from, and where results go to.
• The general subject of specifying where the operands are (i.e., their addresses) is called
addressing.
• Instructions always have an opcode to tell what the instruction does.
• There can be zero, one, two, or three addresses present.

Instruction Formats

• On some machines, all instructions have the same length; on others there may be many
different lengths.
• Instructions may be shorter than, the same length as, or longer than the word length.
• Having all the instructions be the same length is simpler and makes decoding easier but
often wastes space, since all instructions then have to be as long as the longest one.

Addressing
• Instructions generally have one, two or three operands.
• The operands are addressed using one of the following modes:
– Immediate
– Direct
– Register
– Indexed
– Other mode
• Some machines have a large number of
complex addressing modes.
• We will consider a few addressing modes here.
• The simplest way for an instruction to specify an operand is for the address part of the
instruction actually to contain the operand itself rather than an address or other
information
describing where the operand is.
• Such an operand is called an immediate operand because it is automatically fetched
from memory at the same time the instruction itself is fetched.
• Example:
MOV R1,4
• Advantage – no extra memory reference to fetch the operand.
• Disadvantage – only a constant can be supplied this way.

Direct Addressing

• A method for specifying an operand in memory is just to give its full address.
• This mode is called direct addressing.
• Like immediate addressing, direct addressing is restricted in its use: the instruction will
always access exactly the same memory location.
• So while the value can change, the location cannot.
• Thus direct addressing can only be used to access global variables whose address is
known at compile time.

Register Addressing

• Register addressing is conceptually the same as direct addressing but specifies a
register instead of a memory location.
• Because registers are so important (due to fast access and short addresses) this
addressing mode is the most common one on most computers.
• Many compilers go to great lengths to determine which variables will be accessed most
often (for example, a loop index) and put these variables in registers.
• This addressing mode is known simply as register mode.

Register Indirect Addressing

• In this mode, the operand being specified comes from memory or goes to memory, but
its address is not hardwired into the instruction, as in direct addressing.
• Instead, the address is contained in a register.
• When an address is used in this manner, it is called a pointer.
• A big advantage of register indirect addressing is that it can reference memory without
paying the price of having a full memory address in the instruction.
• Consider a program that steps through the elements of a 1024-element one-dimensional
integer array to compute the sum of the elements in register R1.
• We will use register indirect addressing through R2 to access the elements of the array.
• Here is the assembly program:

Example of Register Indirect Addressing

MOV R1,#0 ; accumulate the sum in R1, initially 0
MOV R2,#A ; R2 = address of the array A
MOV R3,#A+4096 ; R3 = address of the first word beyond A
LOOP: ADD R1,(R2) ; register indirect through R2 to get operand
ADD R2,#4 ; increment R2 by one word (4 bytes)
CMP R2,R3 ; are we done yet?
BLT LOOP ; if R2 < R3, we are not done, so continue
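The loop above can be mirrored in ordinary code by modeling memory as a word-addressable table and registers as plain variables. The base address 0x1000 and the array contents below are my own choices for illustration.

```python
# A word-addressable toy "memory" holding the 1024-element array A.
memory = {}
A = 0x1000                       # base address of A (arbitrary choice)
for i in range(1024):
    memory[A + 4 * i] = i + 1    # array A holds 1, 2, ..., 1024

r1 = 0          # MOV R1,#0      -- the accumulator
r2 = A          # MOV R2,#A      -- pointer to the current element
r3 = A + 4096   # MOV R3,#A+4096 -- first address beyond the array
while r2 != r3:
    r1 += memory[r2]   # ADD R1,(R2) -- register indirect through R2
    r2 += 4            # ADD R2,#4   -- advance one 4-byte word
print(r1)              # 524800 == 1 + 2 + ... + 1024
```

The key observation is that the loop body never contains a full memory address; only the small register number for R2 is encoded in each instruction.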

Indexed Addressing

• It is frequently useful to be able to reference memory words at a known offset from a
register.
• Addressing memory by giving a register (explicit or implicit) plus a constant offset is
called indexed addressing.
• Example: consider the following calculation:
• We have two one-dimensional arrays of 1024 words each, A and B, and we wish to
compute A[i] AND B[i] for all the pairs and then OR these 1024 Boolean products together
to see if there is at least one nonzero pair in the set.
• Here is the assembly program.
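The assembly program referred to above does not appear in the text. As a rough stand-in (my own sketch, not the original program), the same computation looks like this in ordinary code; the array index corresponds to the constant offset the indexed addressing mode would supply.

```python
# AND each pair A[i], B[i]; OR all 1024 products together; test whether any
# pair was nonzero. (Function name and sample data are illustrative only.)
def any_nonzero_pair(A, B):
    result = 0
    for a, b in zip(A, B):    # in assembly: indexed addressing, base + 4*i
        result |= a & b
    return result != 0

print(any_nonzero_pair([0, 2, 4], [1, 1, 1]))   # False: no pair ANDs to nonzero
print(any_nonzero_pair([0, 3, 4], [1, 1, 1]))   # True: 3 AND 1 == 1
```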

Based-Indexed Addressing

• Some machines have an addressing mode in which the memory address is computed
by adding up two registers plus an (optional) offset.
• Sometimes this mode is called based-indexed addressing.
• One of the registers is the base and the other is the index.
• Such a mode would have been useful in our example here.
• Outside the loop we could have put the address of A in R5 and the address of B in R6.
• Then we could have replaced the instruction at LOOP and its successor with:
LOOP: MOV R4,(R2+R5)
      AND R4,(R2+R6)

An instruction set is (a list of) all instructions, and all their variations, that a processor
can execute.

Instructions include:

· arithmetic such as add and subtract

· logic instructions such as and, or, and not

· data instructions such as move, input, output, load, and store

An instruction set, or instruction set architecture (ISA), is the part of the
computer architecture related to programming, including the native data types,
instructions, registers, addressing modes, memory architecture, interrupt and exception
handling, and external I/O. An ISA includes a specification of the set of opcodes
(machine language), the native commands implemented by a particular CPU design.

Instruction set architecture is distinguished from the microarchitecture, which is the set
of processor design techniques used to implement the instruction set. Computers with
different microarchitectures can share a common instruction set. For example, the Intel
Pentium and the AMD Athlon implement nearly identical versions of the x86 instruction
set, but have radically different internal designs.

This concept can be extended to unique ISAs like TIMI (Technology-Independent
Machine Interface) present in the IBM System/38 and IBM AS/400. TIMI is an ISA that
is implemented as low-level software and functionally resembles what is now referred to
as a virtual machine. It was designed to increase the longevity of the platform and
applications written for it, allowing the entire platform to be moved to very different
hardware without having to modify any software except that which comprises TIMI
itself. This allowed IBM to move the AS/400 platform from an older CISC architecture to
the newer POWER architecture without having to rewrite any parts of the OS or
software associated with it.

Instruction set design


When designing microarchitectures, engineers use Register Transfer Language (RTL) to
define the operation of each instruction of an ISA. Historically there have been several
ways to store that description inside the CPU:

· All early computer designers, and some of the simpler later RISC computer
designers, hard-wired the instruction set.

· Many CPU designers compiled the instruction set to a microcode ROM inside the
CPU (such as the Western Digital MCP-1600).

· Some CPU designers compiled the instruction set to a writable RAM or flash inside
the CPU (such as the Rekursiv processor and the Imsys Cjip), or to an FPGA
(reconfigurable computing).

An ISA can also be emulated in software by an interpreter. Due to the additional
translation needed for the emulation, this is usually slower than directly running
programs on the hardware implementing that ISA. Today, it is common practice for
vendors of new ISAs or microarchitectures to make software emulators available to
software developers before the hardware implementation is ready.

Some instruction set designers reserve one or more opcodes for some kind of software
interrupt. For example, MOS Technology 6502 uses 0x00 (all zeroes), Zilog Z80 uses
0xFF (all ones), and Motorola 68000 has instructions 0xA000 through 0xAFFF.

Fast virtual machines are much easier to implement if an instruction set meets the
Popek and Goldberg virtualization requirements.

On systems with multiple processors, non-blocking synchronization algorithms are much
easier to implement if the instruction set includes support for something like
"fetch-and-increment" or "load linked/store conditional (LL/SC)" or "atomic compare
and swap".

Code density
In early computers, program memory was expensive and limited, and minimizing the
size of a program in memory was important. Thus the code density -- the combined size
of the instructions needed for a particular task -- was an important characteristic of an
instruction set. Instruction sets with high code density employ powerful instructions that
can implicitly perform several functions at once. Typical complex instruction-set
computers (CISC) have instructions that combine one or two basic operations (such as
"add", "multiply", or "call subroutine") with implicit instructions for accessing memory,
incrementing registers upon use, or dereferencing locations stored in memory or
registers. Some software-implemented instruction sets have even more complex and
powerful instructions.

Reduced instruction-set computers (RISC), first widely implemented during a period of
rapidly growing memory subsystems, traded off simpler and faster instruction-set
implementations for lower code density (that is, more program memory space to
implement a given task). RISC instructions typically implemented only a single implicit

operation, such as an "add" of two registers or the "load" of a memory location into a
register.

Minimal instruction set computers (MISC) are a form of stack machine, where there are
few separate instructions (16-64), so that multiple instructions can fit into a single
machine word. These types of cores often take little silicon to implement, so they can be
easily realized in an FPGA or in a multi-core form. Code density is similar to RISC; the
increased instruction density is offset by requiring more of the primitive instructions to
do a task.

Instruction sets may be categorized by the number of operands in their most complex
instructions. (In the examples that follow, a, b, and c refer to memory addresses, and
reg1 and so on refer to machine registers.)

· 0-operand ("zero address machines") -- these are also called stack machines, and
all operations take place using the top one or two positions on the stack. Adding
two numbers here can be done with four instructions: push a, push b, add, pop
c;

· 1-operand -- this model was common in early computers, and each instruction
performs its operation using a single operand and places its result in a single
accumulator register: load a, add b, store c;

· 2-operand -- most RISC machines fall into this category, though many CISC
machines fall here as well. For a RISC machine (requiring explicit memory
loads), the instructions would be: load a,reg1, load b,reg2, add reg1,reg2, store
reg2;

· 3-operand -- some CISC machines, and a few RISC machines fall into this category.
The above example here might be performed in a single instruction in a machine
with memory operands: add a,b,c, or more typically (most machines permit a
maximum of two memory operations even in three-operand instructions): move
a,reg1, add reg1,b,c. In three-operand RISC machines, all three operands are
typically registers, so explicit load/store instructions are needed. An instruction set
with 32 registers requires 15 bits to encode three register operands, so this scheme
is typically limited to instruction sets with 32-bit instructions or longer;

· more operands -- some CISC machines permit a variety of addressing modes that
allow more than 3 register-based operands for memory accesses.
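The four instruction sequences above can be tried out in a toy simulator. The sketch below models the 0-operand (stack machine) case for computing c = a + b; the instruction names and memory model are illustrative, not taken from any real ISA:

```python
# Toy 0-operand (stack) machine: all arithmetic uses the top of the stack.
# Instruction names (push/add/pop) are illustrative, not from a real ISA.
def run_stack_machine(program, memory):
    stack = []
    for op, *args in program:
        if op == "push":            # push memory[addr] onto the stack
            stack.append(memory[args[0]])
        elif op == "add":           # replace the top two entries with their sum
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "pop":           # store the top of the stack to memory[addr]
            memory[args[0]] = stack.pop()
    return memory

# c = a + b in four instructions: push a, push b, add, pop c
mem = {"a": 2, "b": 3, "c": 0}
run_stack_machine([("push", "a"), ("push", "b"), ("add",), ("pop", "c")], mem)
print(mem["c"])  # 5
```

The same computation in the 1-operand model would route every result through a single accumulator; the 2- and 3-operand models trade longer instructions for fewer of them.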

There has been research into executable compression as a mechanism for improving
code density. The mathematics of Kolmogorov complexity describes the challenges and
limits of this.

Machine language

Machine language is built up from discrete statements or instructions. Depending on the
processing architecture, a given instruction may specify:

· Particular registers for arithmetic, addressing, or control functions

· Particular memory locations or offsets

· Particular addressing modes used to interpret the operands

More complex operations are built up by combining these simple instructions, which (in
a von Neumann machine) are executed sequentially, or as otherwise directed by control
flow instructions.

Some operations available in most instruction sets include:

· moving

· set a register (a temporary "scratchpad" location in the CPU itself) to a fixed
constant value

· move data from a memory location to a register, or vice versa. This is done
to obtain the data to perform a computation on it later, or to store the result
of a computation.

· read and write data from hardware devices

· computing

· add, subtract, multiply, or divide the values of two registers, placing the
result in a register

· perform bitwise operations, taking the conjunction/disjunction (and/or) of
corresponding bits in a pair of registers, or the negation (not) of each bit in a
register

· compare two values in registers (for example, to see if one is less, or if they
are equal)

· affecting program flow

· jump to another location in the program and execute instructions there

· jump to another location if a certain condition holds

· jump to another location, but save the location of the next instruction as a
point to return to (a call)

Some computers include "complex" instructions in their instruction set. A single
"complex" instruction does something that may take many instructions on other
computers. Such instructions are typified by instructions that take multiple steps,
control multiple functional units, or otherwise appear on a larger scale than the bulk of
simple instructions implemented by the given processor. Some examples of "complex"
instructions include:

· saving many registers on the stack at once

· moving large blocks of memory

· complex and/or floating-point arithmetic (sine, cosine, square root, etc.)

· performing an atomic test-and-set instruction

· instructions that combine ALU with an operand from memory rather than a register

A complex instruction type that has become particularly popular recently is the SIMD or
Single-Instruction Stream Multiple-Data Stream operation, also called a vector
instruction: an operation that performs the same arithmetic operation on multiple
pieces of data at the same time. SIMD instructions can manipulate large vectors and
matrices in minimal time, and they allow easy parallelization of algorithms commonly
involved in sound, image, and video processing. Various SIMD implementations have
been brought to market under trade names such as MMX, 3DNow! and AltiVec.
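Conceptually, one SIMD add operates on every lane of a vector register at once. A scalar Python model of a hypothetical 4-lane integer add:

```python
# Scalar model of a hypothetical 4-lane SIMD add: a single "instruction"
# applies the same operation to every lane of two vector registers.
def simd_add(v1, v2):
    assert len(v1) == len(v2) == 4   # fixed 4-lane vector registers (illustrative)
    return [a + b for a, b in zip(v1, v2)]

print(simd_add([1, 2, 3, 4], [10, 20, 30, 40]))  # [11, 22, 33, 44]
```

Real SIMD hardware performs all four lane additions in parallel; this loop only models the semantics, not the speedup.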

The design of instruction sets is a complex issue. Historically, microprocessor design
went through two major stages. The first was CISC (complex instruction set computer),
in which many instructions were implemented. In the 1970s, researchers at companies
such as IBM found that many of these instructions could be eliminated, which led to the
RISC (reduced instruction set computer) architecture, built around a smaller set of
instructions. In general, a simpler instruction set may offer the potential for higher
speeds, reduced processor size, and reduced power consumption; a more complex one
may optimize common operations, improve memory/cache efficiency, or simplify
programming.

List of ISAs
This list is far from comprehensive as old architectures are abandoned and new ones
invented on a continual basis. There are many commercially available microprocessors
and microcontrollers implementing ISAs in all shapes and sizes. Customised ISAs are
also quite common in some applications, e.g. ARC International, application-specific
integrated circuit, FPGA, and reconfigurable computing. Also see history of computing
hardware.

ISAs commonly implemented in hardware
· Alpha AXP (DEC Alpha)

· AMD64 (also known as EM64T)

· ARM (Acorn RISC Machine) (Advanced RISC Machine now ARM Ltd)

· IA-64 (Itanium)

· MIPS

· Motorola 68k

· PA-RISC (HP Precision Architecture)

· IBM 700/7000 series

· IBM POWER

· PowerPC

· SPARC

· SuperH

· System/360

· Tricore (Infineon)

· Transputer (STMicroelectronics)

· UNIVAC 1100/2200 series

· VAX (Digital Equipment Corporation)

· x86 (also known as IA-32) (Pentium, Athlon)

· EISC (AE32K)

ISAs commonly implemented in software with hardware incarnations


· p-Code (UCSD p-System Version III on Western Digital Pascal MicroEngine)

· Java virtual machine (ARM Jazelle, PicoJava, JOP)

· FORTH

ISAs never implemented in hardware
· ALGOL object code

· SECD machine, a virtual machine used for some functional programming
languages.

· MMIX, a teaching machine used in Donald Knuth's The Art of Computer
Programming

· Z-machine, a virtual machine used for Infocom's text adventure games

Categories of ISA
· application-specific integrated circuit (ASIC) fully custom ISA

· CISC

· digital signal processor

· graphics processing unit

· MISC

· reconfigurable computing

· RISC

· vector processor

· VLIW

· orthogonal instruction set

Examples of commercially available ISA


· central processing unit

· microcontroller

· microprocessor

Processor Register

In computer architecture, a processor register is a small amount of very fast
computer memory used to speed the execution of computer programs by providing
quick access to commonly used values—typically, the values being calculated at a given
point in time. Most, but not all, modern computer architectures operate on the principle
of moving data from main memory into registers, operating on them, then moving the
result back into main memory—a so-called load-store architecture.

Processor registers are the top of the memory hierarchy, and provide the fastest way
for the system to access data. The term is often used to refer only to the group of
registers that can be directly indexed for input or output of an instruction, as defined by
the instruction set. More properly, these are called the "architectural registers". For
instance, the x86 instruction set defines a set of eight 32-bit registers, but a CPU that
implements the x86 instruction set will contain many more registers than just these
eight.

Putting frequently used variables into registers is critical to a program's performance.
This action, namely register allocation, is usually done by a compiler in the code
generation phase.

Categories of registers
Registers are normally measured by the number of bits they can hold, for example, an
"8-bit register" or a "32-bit register". Registers are now usually implemented as a
register file, but they have also been implemented using individual flip-flops, high speed
core memory, thin film memory, and other ways in various machines.

There are several classes of registers according to the content:

· Data registers are used to store integer numbers (see also Floating Point
Registers, below). In some older CPUs, and in some simple current CPUs, a special
data register is the accumulator, used implicitly for many operations.

· Address registers hold memory addresses and are used to access memory. In
some CPUs, a special address register is an index register, although often these
hold numbers used to modify addresses rather than holding addresses.

· Conditional registers hold truth values often used to determine whether some
instruction should or should not be executed.

· General purpose registers (GPRs) can store both data and addresses, i.e., they
are combined Data/Address registers.

· Floating point registers (FPRs) are used to store floating point numbers in many
architectures.

· Constant registers hold read-only values (e.g., zero, one, pi, ...).

· Vector registers hold data for vector processing done by SIMD instructions
(Single Instruction, Multiple Data).

· Special purpose registers hold program state; they usually include the program
counter (aka instruction pointer), stack pointer, and status register (aka processor
status word).

· Instruction registers store the instruction currently being executed.

· Index registers are used for modifying operand addresses during the run of
a program.

· In some architectures, model-specific registers (also called machine-specific
registers) store data and settings related to the processor itself. Because their
meanings are attached to the design of a specific processor, they cannot be
expected to remain standard between processor generations.

· Registers related to fetching information from random access memory, a collection
of storage registers located on separate chips from the CPU (unlike most of the
above, these are generally not architectural registers):

· Memory buffer register

· Memory data register

· Memory address register

· Memory Type Range Registers

Hardware registers are similar, but occur outside CPUs.

Some examples
The table below shows the number of registers of several mainstream processors:

Processor          Integer registers   Double FP registers

Pentium 4          8                   8

Athlon MP          8                   8

Opteron 240        16                  16

Itanium 2          128                 128

UltraSPARC IIIi    32                  32

Power 3            32                  32

Addressing modes

Addressing modes, a concept from computer science, are an aspect of the instruction
set architecture in most central processing unit (CPU) designs. The various addressing
modes that are defined in a given instruction set architecture define how machine
language instructions in that architecture identify the operand (or operands) of each
instruction. An addressing mode specifies how to calculate the effective memory
address of an operand by using information held in registers and/or constants contained
within a machine instruction or elsewhere.

In computer programming, addressing modes are primarily of interest to compiler
writers and to those who write code directly in assembly language.

Caveats
Note that there is no generally accepted way of naming the various addressing modes.
In particular, different authors and/or computer manufacturers may give different
names to the same addressing mode, or the same names to different addressing
modes. Furthermore, an addressing mode which, in one given architecture, is treated
as a single addressing mode may represent functionality that, in another architecture, is
covered by two or more addressing modes. For example, some complex instruction set
computer (CISC) computer architectures, such as the Digital Equipment Corporation
(DEC) VAX, treat registers and literal/immediate constants as just another addressing
mode. Others, such as the IBM System/390 and most reduced instruction set computer
(RISC) designs, encode this information within the instruction code. Thus, the latter
machines have three distinct instruction codes for copying one register to another,
copying a literal constant into a register, and copying the contents of a memory location
into a register, while the VAX has only a single "MOV" instruction.

The addressing modes listed below are divided into code addressing and data
addressing. Most computer architectures maintain this distinction, but there are, or
have been, some architectures which allow (almost) all addressing modes to be used in
any context.

The instructions shown below are purely representative in order to illustrate the
addressing modes, and do not necessarily apply to any particular computer.
Useful side effect
Some computers have a Load effective address instruction. This performs a
calculation of the effective operand address, but instead of acting on that memory
location, it loads the address that would have been accessed into a register. This can be
useful when passing the address of an array element to a subroutine. It may also be a
slightly sneaky way of doing more calculation than normal in one instruction; for
example, use with the addressing mode 'base+index+offset' allows one to add two
registers and a constant together in one instruction.
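The "sneaky arithmetic" use of load effective address can be sketched as follows; the register names and the base+index+offset form are hypothetical:

```python
# "Load effective address": compute base + index + offset exactly as the
# addressing hardware would, but place the address itself in a register
# instead of accessing memory at it. Register names are illustrative.
def lea(regs, dest, base, index, offset):
    regs[dest] = regs[base] + regs[index] + offset

regs = {"r1": 100, "r2": 23, "r3": 0}
lea(regs, "r3", "r1", "r2", 7)   # two registers plus a constant, one instruction
print(regs["r3"])  # 130
```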

How many address modes?


Different computer architectures vary greatly as to the number of addressing modes
they provide. At the cost of a few extra instructions, and perhaps an extra register, it is
normally possible to use the simpler addressing modes instead of the more complicated
modes. It has proven much easier to design pipelined CPUs if the only addressing
modes available are simple ones.

Most RISC machines have only about five simple addressing modes, while CISC
machines such as the DEC VAX supermini have over a dozen addressing modes, some
of which are quite complicated. The IBM System/360 mainframe had only three
addressing modes; a few more have been added for the System/390.

When there are only a few addressing modes, the particular addressing mode required
is usually encoded within the instruction code (e.g. IBM System/390, most RISC). But
when there are lots of addressing modes, a specific field is often set aside in the
instruction to specify the addressing mode. The DEC VAX allowed multiple memory
operands for almost all instructions and so reserved the first few bits of each operand
specifier to indicate the addressing mode for that particular operand.

Even on a computer with many addressing modes, measurements of actual programs
indicate that the simple addressing modes listed below account for some 90% or more
of all addressing modes used. Since most such measurements are based on code
generated from high-level languages by compilers, this may reflect to some extent the
limitations of the compilers being used.

Simple addressing modes for code

Absolute
+----+------------------------------+
|jump| address |
+----+------------------------------+
Effective address = address as given in instruction

Program relative
+------+-----+-----+----------------+
|jumpEQ| reg1| reg2| offset | jump relative if reg1=reg2
+------+-----+-----+----------------+
Effective address = offset plus address of next instruction.

The offset is usually signed, in the range -32768 to +32767.

This is particularly useful in connection with conditional jumps, because you usually only
want to jump to some nearby instruction (in a high-level language most if or while
statements are reasonably short). Measurements of actual programs suggest that an 8
or 10 bit offset is large enough for some 90% of conditional jumps.
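Assuming a 16-bit signed offset field and byte-addressed instructions, the branch target calculation looks like this (a sketch, not any particular ISA):

```python
# Program-relative branch: effective address = address of the next
# instruction plus a sign-extended 16-bit offset (-32768 .. +32767).
def sign_extend_16(value):
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def branch_target(pc, instr_size, offset_field):
    # pc is the address of the branch itself; the offset is relative
    # to the instruction that follows it.
    return pc + instr_size + sign_extend_16(offset_field)

print(hex(branch_target(0x1000, 4, 0x0010)))   # forward jump: 0x1014
print(hex(branch_target(0x1000, 4, 0xFFF0)))   # backward jump: 0xff4
```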

Another advantage of program-relative addressing is that the code may be
position-independent, i.e. it can be loaded anywhere in memory without the need to
adjust any addresses.

Register indirect
+-------+-----+
|jumpVia| reg |
+-------+-----+
Effective address = contents of specified register.

The effect is to transfer control to the instruction whose address is in the specified
register. Such an instruction is often used for returning from a subroutine call, since the
actual call would usually have placed the return address in a register.

Register
+------+-----+-----+-----+
| mul | reg1| reg2| reg3| reg1 := reg2 * reg3;
+------+-----+-----+-----+
This 'addressing mode' does not have an effective address, and is not considered to be
an addressing mode on some computers.

In this example, all the operands are in registers, and the result is placed in a register.

Base plus offset, and variations


+------+-----+-----+----------------+
| load | reg | base| offset |
+------+-----+-----+----------------+
Effective address = offset plus contents of specified base register.

The offset is usually a signed 16-bit value (the 80386 famously expanded it to 32 bits,
though x86-64 did not expand it further).

If the offset is zero, this becomes an example of register indirect addressing; the
effective address is just that in the base register.

On many RISC machines, register 0 is fixed with value 0. If register 0 is used as the
base register, this becomes an example of absolute addressing. However, only a small
portion of memory can be accessed (the first 32 Kbytes and possibly the last 32
Kbytes).

The 16-bit offset may seem very small in relation to the size of current computer
memories (it could be worse: IBM System/360 mainframes only have a positive 12-bit
offset of 0 to 4095). However, the principle of locality of reference applies: over a short
time span, most of the data items you wish to access are fairly close to each other.

Example 1: Within a subroutine you will mainly be interested in the parameters and the
local variables, which will rarely exceed 64 Kbytes, for which one base register suffices.
If this routine is a class method in an object-oriented language, you will need a second
base register pointing at the attributes for the current object (this or self in some high
level languages).

Example 2: If the base register contains the address of a record or structure, the offset
can be used to select a field from that record (most records/structures are less than 32
Kbytes in size).
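A minimal sketch of the base-plus-offset calculation, assuming a signed 16-bit offset and illustrative register names:

```python
# Base-plus-offset: effective address = contents of base register + signed
# offset. The 16-bit range and register name are illustrative assumptions.
def effective_address(regs, base, offset):
    assert -32768 <= offset <= 32767    # typical signed 16-bit offset field
    return regs[base] + offset

regs = {"r1": 0x8000}                   # r1 holds the address of a record
print(hex(effective_address(regs, "r1", 12)))  # field at offset 12: 0x800c
```

With offset 0 this degenerates into register indirect addressing, as the text notes.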

Simple addressing modes for data

Immediate/literal
+------+-----+-----+----------------+
| add | reg1| reg2| constant | reg1 := reg2 + constant;
+------+-----+-----+----------------+
This 'addressing mode' does not have an effective address, and is not considered to be
an addressing mode on some computers.

The constant might be signed or unsigned.

Instead of using an operand from memory, the value of the operand is held within the
instruction itself. On the DEC VAX machine, the literal operand sizes could be 6, 8, 16,
or 32 bits long.

Other addressing modes for code and/or data

Absolute/Direct
+------+-----+--------------------------------------+
| load | reg | address |
+------+-----+--------------------------------------+
Effective address = address as given in instruction.

This requires space in an instruction for quite a large address. It is often available on
CISC machines which have variable length instructions.

Some RISC machines have a special Load Upper Literal instruction which places a 16-bit
constant in the top half of a register. An OR literal instruction can be used to insert a
16-bit constant in the lower half of that register, so that a full 32-bit address can then
be used via the register-indirect addressing mode, which itself is provided as
'base-plus-offset' with an offset of 0.
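The upper/lower constant-building trick can be modeled with plain integer arithmetic; the function names below are illustrative, not real instruction mnemonics:

```python
# Building a full 32-bit constant from two 16-bit halves, RISC-style:
# a "load upper" places the high half, then an OR inserts the low half.
def load_upper(value16):
    return (value16 & 0xFFFF) << 16

def or_literal(reg, value16):
    return reg | (value16 & 0xFFFF)

addr = or_literal(load_upper(0x1234), 0x5678)
print(hex(addr))  # 0x12345678
```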

Indexed absolute
+------+-----+-----+--------------------------------+
| load | reg |index| 32-bit address |
+------+-----+-----+--------------------------------+
Effective address = address plus contents of specified index register.

This also requires space in an instruction for quite a large address. The address could be
the start of an array or vector, and the index could select the particular array element
required. The index register may need to have been scaled to allow for the size of each
array element.

Note that this is more or less the same as base-plus-offset addressing mode, except
that the offset in this case is large enough to address any memory location.

Base plus index


+------+-----+-----+-----+
| load | reg | base|index|
+------+-----+-----+-----+
Effective address = contents of specified base register plus contents of specified index
register.

The base register could contain the start address of an array or vector, and the index
could select the particular array element required. The index register may need to have
been scaled to allow for the size of each array element. This could be used for accessing
elements of an array passed as a parameter.

Base plus index plus offset
+------+-----+-----+-----+----------------+
| load | reg | base|index| 16-bit offset |
+------+-----+-----+-----+----------------+
Effective address = offset plus contents of specified base register plus contents of
specified index register.

The base register could contain the start address of an array or vector of records, the
index could select the particular record required, and the offset could select a field
within that record. The index register may need to have been scaled to allow for the
size of each record.

Scaled
+------+-----+-----+-----+
| load | reg | base|index|
+------+-----+-----+-----+
Effective address = contents of specified base register plus scaled contents of specified
index register.

The base register could contain the start address of an array or vector, and the index
could contain the number of the particular array element required.

This addressing mode dynamically scales the value in the index register to allow for the
size of each array element, e.g. if the array elements are double precision floating-point
numbers occupying 8 bytes each then the value in the index register is multiplied by 8
before being used in the effective address calculation. The scale factor is normally
restricted to being a power of two so that shifting rather than multiplication can be used
(shifting is usually faster than multiplication).
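A sketch of the scaled calculation, using a shift for the power-of-two scale factor as described above:

```python
# Scaled index: effective address = base + (index << log2(element_size)).
# A power-of-two scale lets hardware use a shift instead of a multiply.
def scaled_ea(base, index, element_size):
    shift = element_size.bit_length() - 1
    assert element_size == 1 << shift   # scale must be a power of two
    return base + (index << shift)

# Element 5 of an array of 8-byte doubles starting at address 0x2000:
print(hex(scaled_ea(0x2000, 5, 8)))  # 0x2028
```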

Register indirect
+------+-----+-----+
| load | reg | base|
+------+-----+-----+
Effective address = contents of base register.

A few computers have this as a distinct addressing mode. Many computers just use
base plus offset with an offset value of 0.

Register autoincrement indirect
+------+-----+-----+
| load | reg | base|
+------+-----+-----+
Effective address = contents of base register.

After determining the effective address, the value in the base register is incremented by
the size of the data item that is to be accessed.

Within a loop, this addressing mode can be used to step through all the elements of an
array or vector. A stack can be implemented by using this in conjunction with the next
addressing mode (autodecrement).

In high-level languages it is often thought to be a good idea that functions which return a
result should not have side effects (lack of side effects makes program understanding and
validation much easier). This instruction has a side effect in that the base register is
altered. If the subsequent memory access causes a page fault then restarting the
instruction becomes much more problematical.

This side-effects business proved to be something of a nightmare for VAX implementors,
since instructions could have up to 6 operands, each of which could cause side-effects on
registers and each of which could cause 2 page faults (if operands happened to
straddle a page boundary). Of course the instruction itself could be over 50 bytes long
and might straddle a page boundary as well!

Autodecrement register indirect


+------+-----+-----+
| load | reg | base|
+------+-----+-----+
Before determining the effective address, the value in the base register is decremented
by the size of the data item which is to be accessed.

Effective address = new contents of base register.

Within a loop, this addressing mode can be used to step backwards through all the
elements of an array or vector. A stack can be implemented by using this in conjunction
with the previous addressing mode (autoincrement).

See also the discussion on side-effects under the autoincrement addressing mode.
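Taken together, the two auto-modes give a stack. The sketch below follows one common convention (push decrements before storing, pop loads then increments); real machines differ in stack direction and word size:

```python
# A downward-growing stack built from the two auto-modes:
# push = autodecrement register indirect (decrement, then store),
# pop  = register autoincrement indirect (load, then increment).
class Machine:
    def __init__(self, sp, word=4):
        self.mem, self.sp, self.word = {}, sp, word

    def push(self, value):
        self.sp -= self.word          # autodecrement before the access
        self.mem[self.sp] = value

    def pop(self):
        value = self.mem[self.sp]     # access, then autoincrement
        self.sp += self.word
        return value

m = Machine(sp=0x1000)
m.push(7)
m.push(9)
print(m.pop(), m.pop())  # 9 7
```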

Memory indirect
Any of the addressing modes mentioned in this article could have an extra bit to
indicate indirect addressing, i.e. the address calculated by using some addressing mode
is the address of a location (often 32 bits or a complete word) which contains the actual
effective address.

Indirect addressing may be used for code and/or data. It can make implementation of
pointers or references very much easier, and can also make it easier to call subroutines
which are not otherwise addressable. There is a performance penalty due to the extra
memory access involved.

Some early minicomputers (e.g. DEC PDP8, Data General Nova) had only a few
registers and only a limited addressing range (8 bits). Hence the use of memory indirect
addressing was almost the only way of referring to any significant amount of memory.

PC-based addressing
The x86-64 architecture supports RIP-based addressing, which uses the 64-bit program
counter (instruction pointer) RIP as a base register. This allows for position-independent
code.

Obsolete addressing modes


The addressing modes listed here were used in the 1950-1980 time frame, but most are
no longer available on current computers. This list is by no means complete; there have
been lots of other interesting/peculiar addressing modes used from time to time, e.g.
absolute plus logical OR of 2 or 3 index registers.

Multi-level memory indirect


If the word size is larger than the address size, then the word referenced for
memory-indirect addressing could itself have an indirect flag set to indicate another
memory indirect cycle. Care is needed to ensure that a chain of indirect addresses does
not refer to itself; if it did, you could get an infinite loop while trying to resolve an
address.

The DEC PDP-10 computer with 18-bit addresses and 36-bit words allowed multi-level
indirect addressing with the possibility of using an index register at each stage as well.
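A sketch of resolving a multi-level indirect chain, with a depth limit standing in for the cycle check the text warns about (the word format and limit are illustrative):

```python
# Multi-level memory indirect: follow indirect flags until a direct word is
# reached. A depth limit guards against a chain that refers to itself,
# which would otherwise loop forever.
def resolve(memory, addr, max_depth=64):
    for _ in range(max_depth):
        indirect, target = memory[addr]   # each word: (indirect-flag, address)
        if not indirect:
            return target
        addr = target
    raise RuntimeError("indirect chain too long (possible cycle)")

mem = {10: (True, 20), 20: (True, 30), 30: (False, 42)}
print(resolve(mem, 10))  # 42
```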

Memory-mapped registers
On some computers the registers were regarded as occupying the first 8 or 16 words of
memory (e.g. ICL 1900, DEC PDP-10). This meant that there was no need for a
separate 'Add register to register' instruction - you could just use the 'Add memory to
register' instruction.

In the case of early models of the PDP-10, which did not have any cache memory, you
could actually load a tight inner loop into the first few words of memory (the fast
registers in fact), and have it run much faster than if it was in magnetic core memory.

Later models of the DEC PDP-11 series mapped the registers onto addresses in the
input/output area, but this was primarily intended to allow remote diagnostics.
Confusingly, the 16-bit registers were mapped onto consecutive 8-bit byte addresses.

Memory indirect, auto inc/dec


On some early minicomputers (e.g. DEC PDP8, Data General Nova), there were typically
16 special memory locations. When accessed via memory indirect addressing, 8 would
automatically increment after use and 8 would automatically decrement after use. This
made it very easy to step through memory in loops without using any registers.

Zero page
In the MOS Technology 6502 the first 256 bytes of memory could be accessed very
rapidly. The reason was that the 6502 had very few registers other than
special-function registers. To use zero page access, an 8-bit address would be used,
saving one clock cycle as compared with using a 16-bit address. An operating system
would use much of zero page, so it was not as useful as it might have seemed.

Scaled index with bounds checking


This is similar to scaled index addressing, except that the instruction has two extra
operands (typically constants), and the hardware would check that the index value was
between these bounds.

Another variation uses vector descriptors to hold the bounds; this makes it easy to
implement dynamically allocated arrays and still have full bounds checking.

Register indirect to byte within word


The DEC PDP-10 computer used 36-bit words. It had a special addressing mode which
allowed memory to be treated as a sequence of bytes (bytes could be any size from 1
bit to 36 bits). A 1-word sequence descriptor held the current word address within the
sequence, a bit position within a word, and the size of each byte.

Instructions existed to load and store bytes via this descriptor, and to increment the
descriptor to point at the next byte (bytes were not split across word boundaries). Much
DEC software used five 7-bit bytes per word (plain ASCII characters), with 1 bit unused
per word. Implementations of C had to use four 9-bit bytes per word, since C assumes
that you can access every bit of memory by accessing consecutive bytes.
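The five-7-bit-bytes-per-word packing can be modeled with integer shifts; the exact bit placement below (high bits first, one unused low bit) follows the common PDP-10 text convention but is a simplification:

```python
# Packing five 7-bit ASCII characters into one 36-bit word, PDP-10 style:
# the characters occupy the high 35 bits and the lowest bit is unused.
def pack5(chars):
    assert len(chars) == 5
    word = 0
    for ch in chars:
        word = (word << 7) | (ord(ch) & 0x7F)
    return word << 1                 # one unused low bit

def unpack5(word):
    word >>= 1
    chars = []
    for _ in range(5):
        chars.append(chr(word & 0x7F))
        word >>= 7
    return "".join(reversed(chars))

w = pack5("HELLO")
print(unpack5(w))  # HELLO
```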

Lesson IX

Memory System Enhancement

Characteristics of Main Memory

Main memory is as vital as the processor chip to a computer system. Fast systems have
both a fast processor and a large memory. Here is a list of some characteristics of
computer memory. Some characteristics are true for both kinds of memory; others are
true for just one.

Here is a table that summarizes the characteristics of the two types of computer
memory.

Characteristic                                                  Main Memory   Secondary Memory

Closely connected to the processor.                                  X

Holds the programs and data the processor is actively using.         X

Used for long term storage.                                                        X

Interacts with the processor millions of times per second.           X

Contents are easily changed.                                         X             X

Relatively low capacity.                                             X

Relatively huge capacity.                                                          X

Fast access.                                                         X

Slow access.                                                                       X

Connected to main memory.                                                          X

Holds programs and data.                                             X             X

Organized into files.                                                              X
Bit

In both main and secondary memory, information is stored as patterns of bits. Recall
from chapter two what a bit is:
A bit is a single "on"/"off" value. Only these two values are possible.
The two values may go by different names, such as "true"/"false", or "1"/"0". There are
many ways in which a bit can be implemented. For example a bit could be implemented
as:

· A mechanical electrical switch (like a light switch.)

· Voltage on a wire.

· A single transistor (used in main memory).

· A tiny part of the surface of a magnetic disk.

· A tiny part of the surface of a magnetic tape.

· A hole punched in a card.

· A tiny part of the light-reflecting surface of a CD.

· Part of a radio signal.

· Many, many more ways

So the particular implementation of bits is different in main memory and secondary
memory, but logically, both types of memory store bits.

Copied Information

Information stored in binary form does not change when it is copied from one medium
(storage method) to another. And an unlimited number of such copies can be made
(remember the advantages of binary.) This is a very powerful combination. You may be
so accustomed to this that it seems commonplace. But when you (say) download an
image from the Internet, the data has been copied many dozens of times, using a
variety of storage and transmission methods.

It is likely, for example, that the data starts out on magnetic disk and is then copied to
main storage of the web site's computer (involving a voltage signal in between.) From
main storage it is copied (again with a voltage signal in between) to a network interface
card, which temporarily holds it in many transistors. From there it is sent as an
electrical signal down a cable. Along the route to your computer, there may be dozens
of computers that transform data from an electrical signal, into main memory transistor
form, and then back to an electrical signal on another cable. Your data may even be
transformed into a radio signal, sent to a satellite (with its own computers), and sent
back to earth as another radio signal. Eventually the data ends up in your video
card (transistors), which transforms it into the image on your screen.

Byte

Name       Number of Bytes     Power of 2
-----------------------------------------
byte       1                   2^0
kilobyte   1,024               2^10
megabyte   1,048,576           2^20
gigabyte   1,073,741,824       2^30
terabyte   1,099,511,627,776   2^40

One bit of information is so little that usually computer memory is organized into groups
of eight bits. Each eight bit group is called a byte. When more than eight bits are
required for some data, a whole number of bytes are used. One byte is about enough
memory to hold a single character.

Often very much more than eight bits are required for data, and thousands, millions, or
even billions of bytes are needed. These amounts have names, as seen in the table. If
you expect computers to be your career, it would be a good idea to become very
familiar with this table. (Except that the only number you should remember from the
middle column is that a kilobyte is 1024 bytes.) Often a kilobyte is called a "K" and a
megabyte is called a "Meg."

Bytes, not Bits

The previous table listed the number of bytes, not bits. So one K of memory is 1024
bytes, or 1024*8 == 8,192 bits. Usually one is not particularly interested in the exact
number of bits. It will be very useful in your future career to be sure you know how to
multiply powers of two.
2^M * 2^N = 2^(M+N)

In the above, "*" means "multiplication." For example:


2^6 * 2^10 = 2^16
Locations in a digital image are specified by a row number and a column number (both
of them integers). A particular digital image is 1024 rows by 1024 columns, and each
location holds one byte. How many megabytes are in that image?

197
f.c. ledesma avenue, san carlos city, negros occidental
Tel. #: (034) 312-6189 / (034) 729-4327
Locations in a digital image are specified by a row
number and a column number (both of t h e m
integers). A particular digital image is 1024 rows
by 1024 columns, and each location holds one byte.
How many megabytes are in that image?
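As a quick sketch, both the exponent rule and the image exercise above can be checked directly:

```python
# Multiplying powers of two adds the exponents: 2^M * 2^N = 2^(M+N).
assert (2**6) * (2**10) == 2**16

# The image exercise: 1024 rows by 1024 columns, one byte per location.
rows, cols = 1024, 1024
total_bytes = rows * cols          # 2^10 * 2^10 = 2^20 bytes
megabyte = 2**20                   # 1,048,576 bytes
assert total_bytes == megabyte     # the image is exactly one megabyte
```

So the answer is one megabyte: 2^10 rows times 2^10 columns is 2^20 bytes.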

Picture of Main Memory

Main memory consists of a very long list of bytes. In most modern computers,
each byte has an address that is used to locate it. The picture shows a small
part of main memory.

Each box in this picture represents a single byte. Each byte has an address. In
this picture the addresses are the integers to the left of the boxes: 0, 1, 2,
3, 4, ... and so on. The addresses for most computer memory start at 0 and go up
in sequence until each byte has an address.

Each byte contains a pattern of eight bits. When the computer's power is on, every byte
contains some pattern or other, even those bytes not being used for anything.
(Remember the nature of binary: when a binary device is working it is either "on" or
"off", never in between.)

The address of a byte is not part of its contents. When the processor needs to access
the byte at a particular address, the electronics of the computer "knows how" to find
that byte in memory.
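This byte-addressed picture can be sketched as a toy model in code, with an array standing in for memory and the index standing in for the address (an illustration only, not how hardware is built):

```python
# A toy model of byte-addressed main memory: a long list of bytes,
# where the address is simply the index into the list.
memory = bytearray(16)        # 16 bytes, addresses 0..15, all initially 0

memory[7] = 0b01010101        # store an 8-bit pattern at address 7
assert memory[7] == 85        # the same 8 bits, read back as a number

# The address is not part of the contents: address 7 names a location,
# and what is stored there is just an 8-bit pattern (a value 0..255).
assert len(memory) == 16
assert 0 <= memory[7] <= 255
```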

Contents of Main Memory

Main memory (as all computer memory) stores bit patterns. That is, each memory
location consists of eight bits, and each bit is either "0" or "1". For example,
the picture shows the first few bytes of memory.

The only thing that can be stored at one memory location is eight bits, each
with a value of "0" or "1". The bits at a memory location are called the
contents of that location.

Sometimes people will say that each memory location holds an eight-bit binary
number. This is OK, as long as you remember that the "number" might be used to
represent a character, or anything else.

Remember that what a particular pattern represents depends on its context
(i.e., how a program is using it). You cannot look at an arbitrary bit pattern
(such as those in the picture) and say what it represents.
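A small demonstration of this context dependence: the very same eight bits can be read as a number or as a character, and nothing in the byte itself says which is "right":

```python
# The same 8-bit pattern means different things in different contexts.
pattern = 0b01000001            # one byte of memory

assert pattern == 65            # interpreted as an unsigned integer
assert chr(pattern) == "A"      # interpreted as an ASCII character

# The byte does not record which interpretation was intended;
# that depends entirely on how the running program uses it.
```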

Programs and Memory

The processor has written a byte of data at location 7. The old contents of that location
are lost. Main memory now looks like the picture.

When a program is running, it has a section of memory for the data it is using.
Locations in that section can be changed as many times as the program needs. For
example, if a program is adding up a list of numbers, the sum will be kept in main
memory (probably using several bytes.) As new numbers are added to the sum, it will
change and main memory will have to be changed, too.

Other sections of main memory might not change at all while a program is running. For
example, the instructions that make up a program do not (usually) change as a
program is running. The instructions of a running program are located in main memory,
so those locations will not change.

When you write a program in Java (or most other languages) you do not need to keep
track of memory locations and their contents. Part of the purpose of a programming
language is to do these things automatically.

For the long term, programs and data are kept in secondary storage (usually the computer system's hard disk.)

Hard Disks

The hard disk of a computer system records bytes on a magnetic surface much like
the surface of audio tape. The recording (writing) and reading of the data is
done with a read/write head similar to that used with audio tape.

The picture shows one disk and one read/write head at the end of a movable arm.
The arm moves in and out along a radius of the disk. Since the disk is rotating,
data is recorded in a circular track on the disk. Later on, to read the data,
the head must be moved to the right position, and then it must wait until the
rotating disk brings the data into position. Just as with audio tape, data can
be read without changing it. When new data is recorded, it replaces any data
that was previously recorded at that location. Unlike audio tape, the read/write
head does not actually touch the disk but skims just a little bit above it.

Usually the component called the "hard disk" of a computer system contains many
individual disks and read/write heads like the above. The disks are coated with
magnetic material on both sides (so each disk gets two read/write heads) and the
disks are all attached to one spindle. All the disks and heads are sealed into a
dust-free metal can. Since the operation of a hard disk involves mechanical
motion (which is much slower than electronic processes), reading and writing
data is much slower than with main memory.
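Because of that mechanical motion, access time is dominated by moving the arm (seek) and waiting for the platter to rotate into place (rotational latency). A back-of-the-envelope sketch, using made-up but plausible numbers rather than any real drive's specification:

```python
# Estimate average disk access time: seek time plus rotational latency.
# The numbers below are illustrative assumptions, not a real drive's specs.
seek_ms = 9.0                        # average time to move the arm
rpm = 7200                           # disk rotation speed
ms_per_rev = 60_000 / rpm            # one revolution in milliseconds
latency_ms = ms_per_rev / 2          # on average, wait half a revolution

access_ms = seek_ms + latency_ms
assert round(ms_per_rev, 3) == 8.333
assert round(access_ms, 3) == 13.167   # milliseconds per access
```

Milliseconds per access, versus nanoseconds for main memory: this is why reading from disk is millions of times slower than reading from RAM.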

Files

Hard disks (and other secondary memory devices) are used for long-term storage of
large blocks of information, such as programs and data sets. Usually disk memory is
organized into files.
A file is a collection of information that has been given a name and is stored in secondary
memory. The information can be a program or can be data.
The form of the information in a file is the same as with any digital information---it
consists of bits, usually grouped into eight bit bytes. Files are frequently quite large;
their size is measured in kilobytes or megabytes.

If you have never worked with files on a computer before you should study the
documentation that came with your operating system, or look at a book such as
Windows NT for Dummies (or whatever is appropriate for your computer.)

One of the jobs of a computer's operating system is to keep track of file names and
where they are on its hard disk. For example, in DOS the user can ask to run the
program DOOM like this:
C:\> DOOM.EXE
The "C:\>" is a prompt; the user typed in "DOOM.EXE". The operating system now has
to find the file called DOOM.EXE somewhere on its hard disk. The program will be
copied into main storage and will start running. As the program runs it asks for
information stored as additional files on the hard disk, which the operating
system has to find and copy into main memory. When the program needs to save
data, it usually puts it in a file in secondary storage. If the file does not
already exist, the program will ask the operating system to create it.

Files and the Operating System

Usually all collections of data outside of main storage are organized into files.
Keeping all this information straight is the job of the operating system. If the
computer system is part of a network, keeping straight all the files on all the
computers can be quite a task, and is the collective job of all the operating
systems involved.

Application programs (including programs that you might write) do not directly read,
write, create, or delete files. Since the operating system has to keep track of
everything, all other programs ask it to do file manipulation tasks. For example, say
that a program has just calculated a set of numbers and needs to save them. The
following might be how it does this:

1. Program: asks the operating system to create a file with a name RESULTS.DAT

2. Operating System: gets the request; finds an unused section of the disk and
creates an empty file. The program is told when this has been completed.

3. Program: asks the operating system to save the numbers in the file.

4. Operating System: gets the numbers from the program's main memory, writes
them to the file. The program is told when this has been completed.

5. Program: continues on with whatever it is doing.
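The steps above can be sketched in code. The file name RESULTS.DAT comes from the example; note that the program only issues requests (open, write, close), and the operating system does the actual work of finding disk space and writing the bytes:

```python
# A program asking the operating system to create a file and save
# numbers in it. Each call below is a request to the OS, which carries
# it out and reports back when it is done.
results = [3.25, 7.5, 12.0]

with open("RESULTS.DAT", "w") as f:      # ask the OS to create the file
    for number in results:
        f.write(f"{number}\n")           # ask the OS to save each value

# Later, the data can be read back from secondary storage.
with open("RESULTS.DAT") as f:
    read_back = [float(line) for line in f]
assert read_back == results
```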

So when an application program is running, it is constantly asking the operating
system to perform file manipulation tasks (and other tasks) and waiting for them
to be completed. If a program asks the operating system to do something that
will damage the file system, the operating system will refuse to do it. Modern
programs are written so that they have alternatives when a request is refused.
Older programs were not written this way, and do not run well on modern
computers.

In modern computer systems, only the operating system can directly do anything with
disk files. How does this:

1. affect the security of the system?

· The security is increased because programs that try to do dangerous or
stupid things to files can't. They have to ask the operating system, which
will only do safe and sensible things.

2. affect computer games?

· Older computer games did their file manipulation themselves without
asking the operating system (old DOS was stupid and slow and many
application programs ignored it.) Those games won't run on modern
computers.

3. affect the ease in creating programs?

· Program creation is easier because the work of dealing with files is done by
the operating system.

Types of Files

As far as the hard disk is concerned, all files are the same. At the electronic level, there
is no difference between a file containing a program and a file containing data. All files
are named collections of bytes. Of course, what the files are used for is different. The
operating system can take a program file, copy it into main memory, and start it
running. The operating system can take a data file, and supply its information to a
running program when it asks.

Often the last part of a file's name (the extension) shows what the file is
expected to be used for. For example, in "mydata.txt" the ".txt" means that the
file is expected to be used as a collection of text, that is, characters. With
"Netscape.exe" the ".exe" means that the file is an "executable," that is, a
program that is ready to run. With "program1.java" the ".java" means that the
file is a source program in the language Java (there will be more about source
programs later on in these notes.) To the hard disk, each of these files is the
same sort of thing: a collection of bytes.
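Splitting a name into its base and extension is a simple string operation, which is one reason the convention is so widely used. A sketch using the file names from the paragraph above:

```python
# The extension is just part of the file's name; to the disk every file
# is the same kind of thing, a named collection of bytes.
import os

for name in ["mydata.txt", "Netscape.exe", "program1.java"]:
    base, ext = os.path.splitext(name)
    print(base, ext)

assert os.path.splitext("mydata.txt") == ("mydata", ".txt")
assert os.path.splitext("program1.java") == ("program1", ".java")
```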

Physical Address Extension

In computing, Physical Address Extension (PAE) refers to a feature of x86
processors that allows up to 64 gigabytes of physical memory to be used in
32-bit systems, given appropriate operating system support. PAE is provided by
the Intel Pentium Pro and above CPUs (including all later Pentium-series
processors except the 400 MHz bus versions of the Pentium M), as well as by some
compatible processors such as those from AMD. The CPUID flag PAE is assigned for
the purpose of identifying CPUs with this capability.

The processor hardware is augmented with additional address lines used to select the
additional memory, and 36-bit page tables, but regular application software continues
to use instructions with 32-bit addresses and a flat memory model limited to 4
gigabytes. The operating system uses PAE to map this 32-bit address space onto the 64
gigabytes of total memory, and the map can be and usually is different for each
process. In this way the extra memory is useful even though regular applications cannot
access it all simultaneously.

For application software which needs access to more than 4 gigabytes of memory some
special mechanism may be provided by the operating system in addition to the regular
PAE support. On Microsoft Windows this mechanism is called Address Windowing
Extensions (AWE), while on Unix systems a variety of tricks are used, such as using
mmap() to map regions of a file into and out of the address space as needed, none
having been blessed as a standard.
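The arithmetic behind these limits is a straightforward powers-of-two calculation:

```python
# PAE widens physical addresses from 32 to 36 bits, while each process
# still sees a 32-bit (4 GB) virtual address space.
GB = 2**30

physical = 2**36        # bytes addressable with 36 address bits
virtual = 2**32         # bytes addressable by one 32-bit process

assert physical // GB == 64    # 64 GB of total physical memory
assert virtual // GB == 4      # 4 GB address space per process
```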

Page table structures


In traditional 32-bit protected mode, x86 processors use a two-level page translation
scheme, where the register CR3 points to a single 4K-long page directory, which is
divided into 1024 4-byte entries that point to 4K-long page tables, similarly consisting
of 1024 4-byte entries pointing to 4K-long pages.

Enabling PAE (by setting bit 5, PAE, of the system control register CR4) causes major
changes to this scheme. By default, the size of each page remains as 4K. Each entry in
the page table and page directory is extended to 64 bits (8 bytes) rather than 32 to
allow for additional address bits; the table size does not change, however, so each table
now has only 512 entries. Because this allows only a quarter as many entries as the
original scheme, an extra level of hierarchy must be added, so CR3 now points to the
Page Directory Pointer Table, a short table which contains pointers to 4 page
directories.

Additionally, the entries in the page directory have an additional flag, named 'PS' (for
Page Size). If this bit (bit 7) is set to 1, the page directory entry does not point to a
page table, but a single large page (2MB in length).
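The table sizes described above follow directly from dividing the 4K table size by the entry size. A sketch of the arithmetic:

```python
# Page-table arithmetic for the two schemes described above.
PAGE = 4096                            # page and table size: 4K bytes

# Classic 32-bit mode: 4-byte entries, two levels of translation.
classic_entries = PAGE // 4            # 1024 entries per table
classic_pages = classic_entries ** 2   # pages reachable via two levels
assert classic_pages * PAGE == 2**32   # covers the full 4 GB space

# PAE mode: entries widen to 8 bytes, so each table holds fewer.
pae_entries = PAGE // 8
assert pae_entries == 512
# Two levels now reach only a quarter as many pages as before...
assert pae_entries ** 2 == classic_pages // 4
# ...so the 4-entry Page Directory Pointer Table is added on top.
assert 4 * pae_entries ** 2 * PAGE == 2**32   # 4 GB covered again
```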

Lesson X

Control Unit Enhancement


A speed enhancement technique for CMOS circuits is disclosed. In the series of logic
stages, nodes in the signal path of a pulse are set by preceding logic stages, then reset by
feedback from subsequent logic stages. This eliminates the capacitive burden of resetting
any given node from the input signal to allow substantially all of the input signal to be
employed in setting the nodes to an active state rather than wasting part of the signal in
turning off the reset path. The technique is illustrated as applied to RAM circuits.

Claims

I claim:

1. A circuit comprising: a plurality of cascaded stages, each stage being capable of being
placed in one of a set state or a reset state, wherein the set state of a particular stage is
established by using a majority of charge supplied from an immediately preceding stage,
and the reset state of the particular stage is established by using a minority of charge
supplied directly from a subsequent stage.

2. A circuit as in claim 1 wherein each stage comprises at least one PMOS transistor and
at least one NMOS transistor.

3. A circuit as in claim 2 wherein each stage further comprises an additional transistor.

4. A circuit as in claim 3 wherein the additional transistor is coupled to the subsequent


stage to enable reset of the particular stage.

5. A circuit as in claim 2 wherein the PMOS and NMOS transistors are interconnected to
form an inverter.
6. A circuit as in claim 5 wherein the PMOS and NMOS transistors include gates commonly
connected.

7. A circuit as in claim 1 wherein the circuit is coupled between a first and a second
potential source, and a selected number of the stages each comprises: an input node; an
output node; a first transistor connected to the input node, the output node, and to one of
the first and second potential sources, for connecting the output node to one of the first and
second potential sources in response to a first type signal on the input node; and a second
transistor connected to the output node, to the other of the first and second potential
sources, and to the following stage, for connecting the output node to the other of the
first and second potential sources in delayed response to the earlier first type signal on
the input node.

8. A circuit as in claim 7 wherein for each of the selected number of stages: the first
transistor has a gate connected to the input node, a source connected to the second
potential source, and a drain connected to the output node; and the second transistor has
a gate connected to an output node of the following stage, a drain connected to the
output node, and a source connected to the first potential source.

9. A circuit as in claim 8 wherein for each of the selected number of stages: the first
transistor comprises an NMOS transistor; and the second transistor comprises a PMOS
transistor.

10. A circuit as in claim 9 wherein for each of the selected number of stages, each stage
further comprises: a third transistor connected to the output node, to the other of the first
and second potential sources and to the input node, for connecting the output node to the
other of the first and second potential sources in response to a second type signal on the input
node.

11. A circuit as in claim 10 wherein for each of the selected number of stages, each stage
further comprises: the third transistor has a source connected to the other of the first and
second potential sources, a drain connected to the output node, and a gate connected to
the input node.
12. A circuit as in claim 1 wherein the input node of the particular stage is connected to
the output node of the stage immediately preceding the particular stage.

13. A circuit as in claim 12 wherein the subsequent stage is an even number of stages
after the particular stage.

14. A CMOS circuit comprising: a plurality of serially-connected stages, each stage


including an NMOS and a PMOS transistor; odd-numbered stages including an NMOS
transistor which is more than one-half the size of the PMOS transistor in that stage; and
even-numbered stages including a PMOS transistor which is more than twice the size of
the NMOS transistor in that stage.

15. A CMOS circuit as in claim 14 wherein: each stage includes an input node and an
output node, the input node of a stage being connected to the output node of a preceding
stage; wherein the input node of an odd-numbered stage is connected to control the
PMOS transistor, the input node of an even-numbered stage is connected to control the
NMOS transistor; and wherein the NMOS transistor in an odd-numbered stage is
connected to be controlled by the output node of a subsequent stage, and the PMOS
transistor in an even-numbered stage is controlled by the output node of a subsequent
stage.

16. A circuit as in claim 15 wherein: the stages are capable of being placed in a first logic
state or a second logic state; the NMOS transistor in the odd-numbered stages is coupled
to a subsequent stage and controls the first logic state of the odd-numbered stages; and
the PMOS transistor in the even-numbered stages is coupled to a subsequent stage and
controls the first logic state of the even-numbered stages.

17. A circuit as in claim 16 wherein: the NMOS transistor in the even-numbered stages is
coupled to an immediately preceding stage and controls the second logic state of the
even-numbered stages; the PMOS transistor in the odd-numbered stages is coupled to an
immediately preceding stage and controls the second logic state of the odd-numbered
stages.

18. A circuit as in claim 14 coupled between a lower potential and an upper potential
wherein each stage comprises: an input node; an output node; a PMOS transistor having
a gate connected to the input node, a source connected to the upper potential, and a
drain connected to the output node; and an NMOS transistor having a gate connected to
the input node, a source connected to the lower potential, and a drain connected to the
output node.

19. A circuit comprising: a first stage; a plurality of cascaded stages; a last stage; each
cascaded stage including set means and reset means; the set means for each particular
one of the cascaded stages being coupled to and driven by a previous stage, and the
reset means for each cascaded stage being coupled to and driven by a subsequent stage,
wherein virtually all of the power available during the switching of the previous stage is
available for driving the set means for the particular cascaded stage, thereby increasing
the switching speed of the set means of the particular cascaded stage; and wherein a
minor portion of the power available during the switching of the subsequent stage is used
for driving the reset means of the particular cascaded stage, thereby accomplishing the
reset of the particular cascaded stage without significantly altering the switching speed of
the subsequent stage.

20. A logic circuit comprising: a first node for receiving an input signal having energy; a
second node for supplying an output signal; a plurality of cascaded stages each having a
control input node, a reset input node, and an output node, the control input node of a
first stage of the plurality being connected to the first node, the output node of a last
stage of the plurality being connected to the second node; each said stage being capable
of assuming a set state and a reset state, the set state for each stage being controlled by
a signal on its control input node which for all stages except the first stage is coupled to
the output node of an earlier stage, whereby most of the energy available from the input
signal for the first cascaded stage is used for setting that cascaded stage and most of the
energy available from each subsequent stage is used for setting the next stage thereafter;
and the reset state for each stage being controlled by a signal on the reset input node
supplied from an output node of a subsequent stage, whereby energy to reset each
particular cascaded stage comes from a subsequent stage.

21. A logic circuit as in claim 20 wherein the set state and the reset state for each
cascaded stage are controlled by logic switches whose conduction depends upon the state
of the control
input node for that stage.

22. A circuit as in claim 21 wherein the logic switches for each cascaded stage connected
between the output node of that stage and a most positive potential source comprise
PMOS transistors.

23. A circuit as in claim 21 wherein the logic switches for each cascaded stage connected
between the output node for that stage and a most negative potential source comprise
NMOS transistors.

24. A logic circuit as in claim 20 wherein the subsequent stage is an even number of
stages following the particular stage.

25. The circuit as in claim 24 wherein the even number is four.

26. A circuit for providing control signals to other circuits comprising: a plurality of
serially-connected stages, each capable of being placed in a set state or a reset state;
wherein a majority of charge to switch a stage to a set state comes from a prior stage
and a majority of charge to switch a stage to a reset state comes directly from a later
stage.

27. A method of increasing the speed of operation of a CMOS circuit having multiple
serially-connected stages comprising: providing a pulse having charge at an input node to
a selected stage; using a majority of the charge of the pulse to place the selected stage in
an active state; propagating the active state of the selected stage to later stages to
thereby also place them in an active state; and using an output signal from one of the
later stages connected directly to the selected stage to place the selected stage in a reset
state to await arrival of another pulse.

The Hard-Wired Control Unit

Figure 2 is a block diagram showing the internal organization of a hard-wired
control unit for our simple computer. Input to the controller consists of the
4-bit op-code of the instruction currently contained in the Instruction Register
and the negative flag from the accumulator. The controller's output is a set of
16 control signals that go out to the various registers and to the memory of the
computer, in addition to a HLT signal that is activated whenever the leading bit
of the op-code is one. The controller is composed of the following functional
units: a ring counter, an instruction decoder, and a control matrix.

Figure 2. A Block diagram of the Basic Computer's Hard-wired Control unit

The ring counter provides a sequence of six consecutive active signals that cycle
continuously. Synchronized by the system clock, the ring counter first activates its T0
line, then its T1 line, and so forth. After T5 is active, the sequence begins again with T0.
Figure 3 shows how the ring counter might be organized internally.

Figure 3. The Internal Organization of the Ring Counter
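In software, the ring counter's behavior can be sketched as a generator (a model of the timing only, not of the circuit in Figure 3):

```python
# A software sketch of the six-state ring counter: exactly one of the
# lines T0..T5 is active at a time, advancing with each clock tick.
def ring_counter():
    t = 0
    while True:
        yield [1 if i == t else 0 for i in range(6)]  # lines T0..T5
        t = (t + 1) % 6

clock = ring_counter()
assert next(clock) == [1, 0, 0, 0, 0, 0]   # T0 active first
assert next(clock) == [0, 1, 0, 0, 0, 0]   # then T1
for _ in range(4):
    next(clock)                            # T2 through T5
assert next(clock) == [1, 0, 0, 0, 0, 0]   # wraps back to T0
```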

The instruction decoder takes its four-bit input from the op-code field of the
instruction register and activates one and only one of its 8 output lines. Each
line corresponds to one of the instructions in the computer's instruction set.
Figure 4 shows the internal organization of this decoder.

Figure 4. The Internal Organization of the Hard-wired Instruction Decoder

The most important part of the hard-wired controller is the control matrix. It
receives input from the ring counter and the instruction decoder and provides
the proper sequence of control signals. Figure 5 is a diagram of how the control
matrix for our simple machine might be wired.

Figure 5. The Internal Organization of the Hard-wired Control Matrix

IP = T2
W = T5*STA
LP = T3*JMP + T3*NF*JN
LD = T4*STA
LA = T5*LDA + T4*ADD + T4*SUB
EA = T4*STA + T3*MBA
EP = T0
S = T3*SUB
A = T3*ADD
LI = T2
LM = T0 + T3*LDA + T3*STA
ED = T2 + T5*LDA
R = T1 + T4*LDA
EU = T3*ADD+T3*SUB
EI = T3*LDA + T3*STA + T3*JMP + T3*NF*JN
LB = T3*MBA

To understand how this diagram was obtained, we must look carefully at the machine's
instruction set (Table 1).

Table 1. An Instruction Set For The Basic Computer


Instruction         Op-Code  Execution            Register                Ring   Active Control
Mnemonic                     Action               Transfers               Pulse  Signals
-----------------------------------------------------------------------------------------------
LDA                 1        ACC <-- (RAM)        1. MAR <-- IR           3      EI, LM
(Load ACC)                                        2. MDR <-- RAM(MAR)     4      R
                                                  3. ACC <-- MDR          5      ED, LA

STA                 2        (RAM) <-- ACC        1. MAR <-- IR           3      EI, LM
(Store ACC)                                       2. MDR <-- ACC          4      EA, LD
                                                  3. RAM(MAR) <-- MDR     5      W

ADD                 3        ACC <-- ACC + B      1. ALU <-- ACC + B      3      A
(Add B to ACC)                                    2. ACC <-- ALU          4      EU, LA

SUB                 4        ACC <-- ACC - B      1. ALU <-- ACC - B      3      S
(Sub. B from ACC)                                 2. ACC <-- ALU          4      EU, LA

MBA                 5        B <-- ACC            1. B <-- A              3      EA, LB
(Move ACC to B)

JMP                 6        PC <-- RAM           1. PC <-- IR            3      EI, LP
(Jump to Address)

JN                  7        PC <-- RAM           1. PC <-- IR            3      NF: EI, LP
(Jump if Negative)           if negative             if NF set
                             flag is set

HLT                 8-15     Stop clock

"Fetch"                      IR <-- Next          1. MAR <-- PC           0      EP, LM
                             Instruction          2. MDR <-- RAM(MAR)     1      R
                                                  3. IR <-- MDR           2      ED, LI, IP

Table 2 shows which control signals must be active at each ring counter pulse for each of
the instructions in the computer's instruction set (and for the instruction fetch operation).
The table was prepared by simply writing down the instructions in the left-hand column.
(In the circuit these will be the output lines from the decoder). The various control
signals are placed horizontally along the top of the table. Entries into the table consist of
the moments (ring counter pulses T0, T1, T2, T3, T4, or T5) at which each control signal
must be active in order to have the instruction executed. This table is prepared very
easily by reading off the information for each instruction given in Table 1. For
example, the Fetch operation has the EP and LM control signals active at ring
count T0, R active at T1, and ED, LI, and IP active at T2. Therefore the first
row (Fetch) of Table 2 has T0 entered below EP and LM, T1 below R, and T2 below
IP, ED, and LI.

Table 2. A Matrix of Times at which Each Control Signal Must Be Active in Order
to Execute the Hard-wired Basic Computer's Instructions

Control Signal:  IP     LP     EP  LM  R   W   LD  ED  LI  EI     LA  EA  A   S   EU  LB
Instruction:
----------------------------------------------------------------------------------------
"Fetch"          T2            T0  T0  T1          T2  T2
LDA                                T3  T4          T5      T3     T5
STA                                T3      T5  T4          T3         T4
MBA                                                                   T3              T3
ADD                                                               T4      T3      T4
SUB                                                               T4          T3  T4
JMP                     T3                                 T3
JN                      T3*NF                              T3*NF

Once Table 2 has been prepared, the logic required for each control signal is
easily obtained. For each signal, an AND operation is performed between any ring
counter (Ti) pulses entered in the signal's column and the corresponding
instruction contained in the far left-hand column. If a column has more than one
entry, the outputs of the ANDs are ORed together to produce the final control
signal. For example, the LM column has the following entries: T0 (Fetch), T3
associated with the LDA instruction, and T3 associated with the STA instruction.
Therefore, the logic for this signal is:

LM = T0 + T3*LDA + T3*STA

This means that control signal LM will be activated whenever any of the following
conditions is satisfied: (1) ring pulse T0 (first step of an instruction fetch) is active, or
(2) an LDA instruction is in the IR and the ring counter is issuing pulse 3, or (3) and
STA instruction is in the IR and the ring counter is issuing pulse 3.
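As a cross-check, the equation for LM can be modeled as a boolean function in software. This is a simulation sketch only; the function name and argument encoding are ours, not part of the hardware design:

```python
def lm_active(t, instruction):
    """Control signal LM per the equation LM = T0 + T3*LDA + T3*STA.

    t           -- index of the ring counter pulse currently active (0-5)
    instruction -- mnemonic currently decoded in the IR, e.g. "LDA", "ADD"
    """
    # T0 term: LM loads the MAR with the PC at the first step of every fetch.
    # T3 terms: LDA and STA both place an operand address in the MAR at pulse 3.
    return t == 0 or (t == 3 and instruction in ("LDA", "STA"))
```

Evaluating the function over all pulses and instructions reproduces the LM column of Table 2.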

The entries in the JN (Jump Negative) row of this table require some further
explanation. The LP and EI signals are active during T3 for this instruction if and only if
the accumulator's negative flag has been set. Therefore the entries for these signals in
the JN row are T3*NF, meaning that the state of the negative flag must be ANDed in for
the LP and EI control signals.

Figure 6 gives the logical equations required for each of the control signals used on our
machine. These equations have been read from Table 2, as explained above. The circuit
diagram of the control matrix (Figure 5) is constructed directly from these equations.

It should be noticed that the HLT line from the instruction decoder does not enter the
control matrix. Instead, this signal goes directly to circuitry (not shown) that stops
the clock and thus terminates execution.

Figure 6. The logical equations required for each of the hardwired control
signals on the basic computer. The machine's control matrix is designed from
these equations.

R I S C ?
RISC, or Reduced Instruction Set Computer, is a type of microprocessor architecture that
utilizes a small, highly optimized set of instructions, rather than the more specialized
set of instructions often found in other types of architectures.
H i s t o r y
The first RISC projects came from IBM, Stanford, and UC-Berkeley in the late 70s and
early 80s. The IBM 801, Stanford MIPS, and Berkeley RISC 1 and 2 were all designed
with a similar philosophy which has become known as RISC. Certain design features
have been characteristic of most RISC processors:
· one cycle execution time: RISC processors have a CPI (clock cycles per instruction) of
one cycle. This is due to the optimization of each instruction on the CPU and a
technique called pipelining;

· pipelining: a technique that allows for simultaneous execution of parts, or stages,
of instructions to more efficiently process instructions;

· large number of registers: the RISC design philosophy generally incorporates a
larger number of registers to reduce the number of interactions with memory.

This Site

We will first examine MIPS in detail as an example of an early RISC architecture to
better understand the features and design of RISC architectures. We will then study
pipelining to see the performance benefits of such a technique. Then we will look at the
advantages and disadvantages of a RISC-based architecture as compared to CISC
architectures. Finally, we will discuss some of the recent developments and future
directions of RISC processor technology in particular, and of processor technology as a
whole.
H i s t o r y
The MIPS processor was developed as part of a VLSI research program at
Stanford University in the early 80s. Professor John Hennessy, now the
University's President, started the development of MIPS with a
brainstorming class for graduate students. The readings and idea sessions
helped launch the development of the processor which became one of the
first RISC processors, with IBM and Berkeley developing processors at
around the same time.

MIPS Architecture

The Stanford research group had a strong background in compilers, which led them to
develop a processor whose architecture would represent the lowering of the compiler to
the hardware level, as opposed to the raising of hardware to the software level, which
had been a long running design philosophy in the hardware industry.

Thus, the MIPS processor implemented a smaller, simpler instruction set. Each of the
instructions included in the chip design ran in a single clock cycle. The processor used a
technique called pipelining to more efficiently process instructions.

MIPS used 32 registers, each 32 bits wide (a bit pattern of this size is referred to as a
word).

Instruction Set

The MIPS instruction set consists of about 111 total instructions, each represented in 32
bits. An example of a MIPS instruction is below:

add $r12, $r7, $r8

This is the assembly representation of a MIPS addition instruction; the corresponding
32-bit binary encoding is divided into six fields. The instruction tells the processor
to compute the sum of the values in registers 7 and 8 and store the result in register
12. The dollar signs are used to indicate an operation on a register. The processor
identifies the type of instruction by the binary digits in the first and last fields. In
this case, the processor recognizes that this instruction is an addition from the zero
in its first field (the opcode) and the 20 (hexadecimal) in its last field (the function
code).

The source operands are represented in the second and third fields, and the desired
result location is given in the fourth field. The fifth field represents the shift
amount, something that is not used in an addition operation.
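To make the six-field layout concrete, the sketch below assembles the instruction by hand using the standard MIPS R-type field widths (6/5/5/5/5/6 bits). The function code 0x20 for add is part of the MIPS specification; the helper name is our own:

```python
def encode_rtype(opcode, rs, rt, rd, shamt, funct):
    """Pack the six fields of a MIPS R-type instruction into one 32-bit word."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct

# add $r12, $r7, $r8  ->  opcode=0, rs=7, rt=8, rd=12, shamt=0, funct=0x20
word = encode_rtype(0, 7, 8, 12, 0, 0x20)
print(f"{word:032b}")  # fields: 000000 00111 01000 01100 00000 100000
```

Reading the printed bit string six fields at a time recovers the opcode, the two source registers, the destination register, the unused shift amount, and the function code described above.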

The instruction set consists of a variety of basic instructions, including:

· 21 arithmetic instructions (+, -, *, /, %)

· 8 logic instructions (&, |, ~)

· 8 bit manipulation instructions

· 12 comparison instructions (>, <, =, >=, <=, ≠)

· 25 branch/jump instructions

· 15 load instructions

· 10 store instructions

· 8 move instructions

· 4 miscellaneous instructions

MIPS Today

MIPS Computer Systems, Inc. was founded in 1984 upon the Stanford research from
which the first MIPS chip resulted. The company was purchased by Silicon Graphics,
Inc. in 1992, and was spun off as MIPS Technologies, Inc. in 1998. Today, MIPS powers
many consumer electronics and other devices.
How Pipelining Works

Pipelining, a standard feature in RISC processors, is much like an assembly line. Because
the processor works on different steps of the instruction at the same time, more
instructions can be executed in a shorter period of time.
A useful method of demonstrating this is the laundry analogy. Let's say that there are
four loads of dirty laundry that need to be washed, dried, and folded. We could put
the first load in the washer for 30 minutes, dry it for 40 minutes, and then take 20
minutes to fold the clothes. Then pick up the second load and wash, dry, and fold, and
repeat for the third and fourth loads. Supposing we started at 6 PM and worked as
efficiently as possible, we would still be doing laundry until midnight.

However, a smarter approach to the problem would be to put the second load of dirty
laundry into the washer after the first was already clean and whirling happily in the dryer.
Then, while the first load was being folded, the second load would dry, and a third load
could be added to the pipeline of laundry. Using this method, the laundry would be
finished by 9:30.

RISC Pipelines

A RISC processor pipeline operates in much the same way, although the stages in the
pipeline are different. While different processors have different numbers of steps, they
are basically variations of these five, used in the MIPS R3000 processor:

1. fetch instructions from memory

2. read registers and decode the instruction

3. execute the instruction or calculate an address

4. access an operand in data memory

5. write the result into a register

Looking back at the laundry analogy, notice that although the washer finishes in half an
hour, the dryer takes an extra ten minutes, and thus the wet clothes must wait ten
minutes for the dryer to free up. Thus, the length of the pipeline
is dependent on the length of the longest step. Because RISC instructions are simpler
than those used in pre-RISC processors (now called CISC, or Complex Instruction Set
Computer), they are more conducive to pipelining. While CISC instructions varied in
length, RISC instructions are all the same length and can be fetched in a single
operation. Ideally, each of the stages in a RISC processor pipeline should take 1 clock
cycle so that the processor finishes an instruction each clock cycle and averages one
cycle per instruction (CPI).
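The times quoted in the laundry analogy can be verified with a short calculation. This is a sketch; the stage durations come from the text above:

```python
def sequential_time(loads, stages):
    """Finish each load completely before starting the next."""
    return loads * sum(stages)

def pipelined_time(loads, stages):
    """Overlap stages: after the first load drains through, one load
    completes per bottleneck-stage interval (here, the 40-minute dryer)."""
    return sum(stages) + (loads - 1) * max(stages)

stages = [30, 40, 20]              # wash, dry, fold, in minutes
print(sequential_time(4, stages))  # 360 minutes: 6 PM until midnight
print(pipelined_time(4, stages))   # 210 minutes: 6 PM until 9:30 PM
```

The same model explains why the slowest stage, not the sum of the stages, sets the rate at which a full pipeline completes work.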

Pipeline Problems

In practice, however, RISC processors operate at more than one cycle per instruction.
The processor might occasionally stall as a result of data dependencies and branch
instructions.

A data dependency occurs when an instruction depends on the results of a previous
instruction. A particular instruction might need data in a register which has not yet
been stored, since that is the job of a preceding instruction which has not yet reached
that step in the pipeline.

For example:

add $r3, $r2, $r1
add $r5, $r4, $r3
(more instructions that are independent of the first two)

In this example, the first instruction tells the processor to add the contents of registers
r1 and r2 and store the result in register r3. The second instructs it to add r3 and r4
and store the sum in r5. We place this set of instructions in a pipeline. When the second
instruction is in the second stage, the processor will be attempting to read r3 and r4
from the registers. Remember, though, that the first instruction is just one step ahead
of the second, so the contents of r1 and r2 are being added, but the result has not yet
been written into register r3. The second instruction therefore cannot read from the
register r3 because it hasn't been written yet and must wait until the data it needs is
stored. Consequently, the pipeline is stalled and a number of empty instructions (known
as bubbles) go into the pipeline. Data dependency affects long pipelines more than
shorter ones since it takes a longer period of time for an instruction to reach the final
register-writing stage of a long pipeline.
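A compiler or pipeline simulator can detect this read-after-write dependency mechanically. In the sketch below each instruction is reduced to a (destination, source, source) tuple, an encoding of our own chosen purely for illustration:

```python
def raw_hazard(producer, consumer):
    """True if `consumer` reads the register that `producer` writes."""
    dest = producer[0]       # register the first instruction will write
    sources = consumer[1:]   # registers the second instruction must read
    return dest in sources

i1 = ("r3", "r2", "r1")    # add $r3, $r2, $r1
i2 = ("r5", "r4", "r3")    # add $r5, $r4, $r3
print(raw_hazard(i1, i2))  # True: i2 needs r3 before i1 has written it
```

A check like this is the core of what the compiler's code-reordering pass, described next, uses to decide which independent instructions can safely be moved between the two dependent ones.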

MIPS' solution to this problem is code reordering. If, as in the example above, the
following instructions have nothing to do with the first two, the code could be
rearranged so that those instructions are executed in between the two dependent
instructions and the pipeline could flow efficiently. The task of code reordering is
generally left to the compiler, which recognizes data dependencies and attempts to
minimize performance stalls.

Branch instructions are those that tell the processor to make a decision about what the
next instruction to be executed should be based on the results of another instruction.

Branch instructions can be troublesome in a pipeline if a branch is conditional on the
results of an instruction which has not yet finished its path through the pipeline.

For example:
Loop : add $r3, $r2, $r1
sub $r6, $r5, $r4
beq $r3, $r6, Loop
The example above instructs the processor to add r1 and r2 and put the result in r3,
then subtract r4 from r5, storing the difference in r6. In the third instruction, beq
stands for branch if equal. If the contents of r3 and r6 are equal, the processor should
execute the instruction labeled "Loop." Otherwise, it should continue to the next
instruction. In this example, the processor cannot make a decision about which branch
to take because neither the value of r3 nor the value of r6 has been written into the
registers yet.

The processor could stall, but a more sophisticated method of dealing with branch
instructions is branch prediction. The processor makes a guess about which path to take
- if the guess is wrong, anything written into the registers must be cleared, and the
pipeline must be started again with the correct instruction. Some methods of branch
prediction depend on stereotypical behavior. Branches pointing backward are taken
about 90% of the time since backward-pointing branches are often found at the bottom
of loops. On the other hand, branches pointing forward are only taken approximately
50% of the time. Thus, it would be logical for processors to always follow the branch
when it points backward, but not when it points forward. Other methods of branch
prediction are less static: processors that use dynamic prediction keep a history for
each branch and use it to predict future branches. These processors are correct in their
predictions 90% of the time.
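A widely used form of dynamic prediction is the two-bit saturating counter, which needs two consecutive mispredictions to flip a strongly held guess. The sketch below is a generic illustration, not the scheme of any particular processor:

```python
class TwoBitPredictor:
    """Per-branch two-bit saturating counter: states 0-1 predict
    not taken, states 2-3 predict taken."""

    def __init__(self):
        self.counters = {}  # branch address -> saturating counter (0..3)

    def predict(self, branch):
        return self.counters.get(branch, 0) >= 2

    def update(self, branch, taken):
        c = self.counters.get(branch, 0)
        self.counters[branch] = min(3, c + 1) if taken else max(0, c - 1)

p = TwoBitPredictor()
for _ in range(2):
    p.update(0x40, taken=True)
print(p.predict(0x40))  # True: two taken outcomes train it to predict taken
```

Because a backward loop branch is taken many times in a row, the counter saturates at "strongly taken" and a single loop exit does not destroy the prediction for the next run of the loop.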

Still other processors forgo the entire branch prediction ordeal. The RISC System/6000
fetches and starts decoding instructions from both sides of the branch. When it
determines which branch should be followed, it then sends the correct instructions down
the pipeline to be executed.

Pipelining Developments

In order to make processors even faster, various methods of optimizing pipelines have
been devised.

Super pipelining refers to dividing the pipeline into more steps. The more pipe stages
there are, the faster the pipeline is because each stage is then shorter. Ideally, a
pipeline with five stages should be five times faster than a non-pipelined processor (or
rather, a pipeline with one stage). The instructions are executed at the speed at which
each stage is completed, and each stage takes one fifth of the amount of time that the
non-pipelined instruction takes. Thus, a processor with an 8-step pipeline (the MIPS
R4000) will be even faster than its 5-step counterpart. The MIPS R4000 chops its
pipeline into more pieces by dividing some steps into two. Instruction fetching, for
example, is now done in two stages rather than one. The stages are as shown:

1. Instruction Fetch (First Half)

2. Instruction Fetch (Second Half)

3. Register Fetch

4. Instruction Execute

5. Data Cache Access (First Half)

6. Data Cache Access (Second Half)

7. Tag Check

8. Write Back
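The speedup claim behind superpipelining can be expressed with a rough model. It assumes, idealistically, that an instruction's total work divides evenly among the stages and that one instruction completes per cycle:

```python
def throughput(total_work_ns, stages):
    """Instructions completed per nanosecond once the pipeline is full."""
    cycle_ns = total_work_ns / stages   # each stage does an equal share of the work
    return 1.0 / cycle_ns               # one instruction retires per cycle

# Doubling the stage count halves the cycle time, doubling ideal throughput:
print(throughput(40, 8) / throughput(40, 4))  # 2.0
```

Real designs fall short of this ideal because stage latencies are uneven and deeper pipelines suffer more from the stalls and mispredictions described earlier.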

Superscalar pipelining involves multiple pipelines in parallel. Internal components of
the processor are replicated so it can launch multiple instructions in some or all of
its pipeline stages. The RISC System/6000 has a forked pipeline with different paths for
floating-point and integer instructions. If there is a mixture of both types in a
program, the processor can keep both forks running simultaneously. Both types of
instructions share two initial stages (Instruction Fetch and Instruction Dispatch)
before they fork. Often, however, superscalar pipelining refers to multiple copies of
all pipeline stages (in terms of laundry, this would mean four washers, four dryers, and
four people who fold clothes). Many of today's machines attempt to find two to six
instructions that they can execute in every pipeline stage. If some of the instructions
are dependent, however, only the first instruction or instructions are issued.

Dynamic pipelines have the capability to schedule around stalls. A dynamic pipeline is
divided into three units: the instruction fetch and decode unit, five to ten execute or
functional units, and a commit unit. Each execute unit has reservation stations, which
act as buffers and hold the operands and operations.

While the functional units have the freedom to execute out of order, the instruction
fetch/decode and commit units must operate in-order to maintain simple pipeline
behavior. When the instruction is executed and the result is calculated, the commit unit
decides when it is safe to store the result. If a stall occurs, the processor can schedule
other instructions to be executed until the stall is resolved. This, coupled with the
efficiency of multiple units executing instructions simultaneously, makes a dynamic
pipeline an attractive alternative.
The simplest way to examine the advantages and disadvantages of RISC architecture is
by contrasting it with its predecessor: CISC (Complex Instruction Set Computer)
architecture.
Multiplying Two Numbers in Memory

Consider the storage scheme for a generic computer. The main memory is divided into
locations numbered from (row) 1: (column) 1 to (row) 6: (column) 4. The execution unit
is responsible for carrying out all computations. However, the execution unit can only
operate on data that has been loaded into one of the six registers (A, B, C, D, E, or
F). Let's say we want to find the product of two numbers - one stored in location 2:3
and another stored in location 5:2 - and then store the product back in location 2:3.

The CISC Approach

A complex instruction set computer (CISC) is a
microprocessor instruction set architecture (ISA) in
which each instruction can execute several low-level
operations, such as a load from memory, an
arithmetic operation, and a memory store, all in a
single instruction. The term was coined in contrast
to reduced instruction set computer (RISC).

The primary goal of CISC architecture is to


complete a task in as few lines of assembly as
possible. This is achieved by building processor
hardware that is capable of understanding and
executing a series of operations. For this particular
task, a CISC processor would come prepared with a
specific instruction (we'll call it "MULT"). When
executed, this instruction loads the two values into
separate registers, multiplies the operands in the
execution unit, and then stores the product in the appropriate register. Thus, the entire
task of multiplying two numbers can be completed with one instruction:
MULT 2:3, 5:2
MULT is what is known as a "complex instruction." It operates directly on the
computer's memory banks and does not require the programmer to explicitly call any
loading or storing functions. It closely resembles a command in a higher level language.
For instance, if we let "a" represent the value of 2:3 and "b" represent the value of 5:2,
then this command is identical to the C statement "a = a * b."

One of the primary advantages of this system is that the compiler has to do very little
work to translate a high-level language statement into assembly. Because the length of
the code is relatively short, very little RAM is required to store instructions. The
emphasis is put on building complex instructions directly into the hardware.

The RISC Approach

RISC processors only use simple instructions that can be executed within one clock
cycle. Thus, the "MULT" command described above could be divided into three separate
commands: "LOAD," which moves data from the memory bank to a register, "PROD,"
which finds the product of two operands located within the registers, and "STORE,"
which moves data from a register to the memory banks. In order to perform the exact
series of steps described in the CISC approach, a programmer would need to code four
lines of assembly:

LOAD A, 2:3

LOAD B, 5:2

PROD A, B

STORE 2:3, A
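The effect of these four instructions can be traced with a toy interpreter. LOAD, PROD, and STORE are the hypothetical mnemonics used in this section, not real machine instructions, and the operand values 6 and 7 are invented for illustration:

```python
# Toy machine state: memory addressed by "row:column" strings, registers A-F.
memory = {"2:3": 6, "5:2": 7}
registers = {}

program = [
    ("LOAD", "A", "2:3"),
    ("LOAD", "B", "5:2"),
    ("PROD", "A", "B"),
    ("STORE", "2:3", "A"),
]

for op, x, y in program:
    if op == "LOAD":      # memory -> register
        registers[x] = memory[y]
    elif op == "PROD":    # multiply two registers, result in the first
        registers[x] = registers[x] * registers[y]
    elif op == "STORE":   # register -> memory
        memory[x] = registers[y]

print(memory["2:3"])  # 42: the product has replaced the original operand
```

Note that after the sequence runs, register A still holds the product, illustrating the point below about operands remaining in registers for reuse.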
At first, this may seem like a much less efficient way of completing the operation.
Because there are more lines of code, more RAM is needed to store the assembly level
instructions. The compiler must also perform more work to convert a high-level
language statement into code of this form.
CISC                                          RISC
Emphasis on hardware                          Emphasis on software
Includes multi-clock                          Single-clock,
complex instructions                          reduced instruction only
Memory-to-memory:                             Register to register:
"LOAD" and "STORE"                            "LOAD" and "STORE"
incorporated in instructions                  are independent instructions
Small code sizes,                             Low cycles per second,
high cycles per second                        large code sizes
Transistors used for storing                  Spends more transistors
complex instructions                          on memory registers
However, the RISC strategy also brings some very important advantages. Because each
instruction requires only one clock cycle to execute, the entire program will execute in
approximately the same amount of time as the multi-cycle "MULT" command. These
RISC "reduced instructions" require fewer transistors of hardware space than the complex
instructions, leaving more room for general purpose registers. Because all of the
instructions execute in a uniform amount of time (i.e. one clock), pipelining is possible.

Separating the "LOAD" and "STORE" instructions actually reduces the amount of work
that the computer must perform. After a CISC-style "MULT" command is executed, the
processor automatically erases the registers. If one of the operands needs to be used
for another computation, the processor must re-load the data from the memory bank
into a register. In RISC, the operand will remain in the register until another value is
loaded in its place.

The Performance Equation

The following equation is commonly used for expressing a computer's performance
ability:

    time/program = (instructions/program) x (cycles/instruction) x (time/cycle)
The CISC approach attempts to minimize the number of instructions per program,
sacrificing the number of cycles per instruction. RISC does the opposite, reducing the
cycles per instruction at the cost of the number of instructions per program.
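Plugging illustrative numbers into the performance equation makes the trade-off visible. The instruction counts and CPIs below are invented for illustration, not measurements of real processors:

```python
def execution_time(instructions, cycles_per_instruction, time_per_cycle):
    """time/program = instructions/program x cycles/instruction x time/cycle"""
    return instructions * cycles_per_instruction * time_per_cycle

# Hypothetical program, same clock for both machines (1 time unit per cycle):
cisc = execution_time(10, 4, 1)   # few instructions, many cycles each
risc = execution_time(30, 1, 1)   # more instructions, one cycle each
print(cisc, risc)  # 40 30
```

With these made-up figures the RISC version wins despite running three times as many instructions; a different instruction mix could tip the balance the other way, which is exactly the trade-off the equation captures.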
RISC Roadblocks

Despite the advantages of RISC-based processing, RISC chips took over a decade to
gain a foothold in the commercial world. This was largely due to a lack of software
support.

Although Apple's Power Macintosh line featured RISC-based chips and Windows NT was RISC
compatible, Windows 3.1 and Windows 95 were designed with CISC processors in mind. Many
companies were unwilling to take a chance with the emerging RISC technology. Without
commercial interest, processor developers were unable to manufacture RISC chips in large
enough volumes to make their price competitive.

Another major setback was the presence of Intel. Although their CISC chips were
becoming increasingly unwieldy and difficult to develop, Intel had the resources to plow
through development and produce powerful processors. Although RISC chips might
surpass Intel's efforts in specific areas, the differences were not great enough to
persuade buyers to change technologies.

The Overall RISC Advantage

Today, the Intel x86 is arguably the only chip which retains CISC architecture. This is
primarily due to advancements in other areas of computer technology. The price of RAM
has decreased dramatically. In 1977, 1MB of DRAM cost about $5,000. By 1994, the
same amount of memory cost only $6 (when adjusted for inflation). Compiler
technology has also become more sophisticated, so that the RISC use of RAM and
emphasis on software has become ideal.

CISC and RISC Convergence

State of the art processor technology has changed significantly since RISC chips were first
introduced in the early '80s. Because a number of advancements (including the ones
described on this page) are used by both RISC and CISC processors, the lines between
the two architectures have begun to blur. In fact, the two architectures almost seem to
have adopted the strategies of the other. Because processor speeds have increased, CISC
chips are now able to execute more than one instruction within a single clock. This also
allows CISC chips to make use of pipelining. With other technological improvements, it is
now possible to fit many more transistors on a single chip. This gives RISC processors
enough space to incorporate more complicated, CISC-like commands. RISC chips also
make use of more complicated hardware, making use of extra function units for
superscalar execution. All of these factors have led some groups to argue that we are now
in a "post-RISC" era, in which the two styles have become so similar that distinguishing
between them is no longer relevant. However, it should be noted that RISC chips still
retain some important traits. RISC chips strictly utilize uniform, single-cycle instructions.
They also retain the register-to-register, load/store architecture. And despite their
extended instruction sets, RISC chips still have a large number of general purpose
registers.
Simultaneous Multi-Threading

Simultaneous Multi-Threading (SMT) allows multiple threads to be executed at the exact
same time. Threads are series of tasks which are executed alternately by the processor.

Normal thread execution requires threads to be switched on and off the processor, as a
single thread dominates the processor at any moment in time. This allows some tasks
that involve waiting (for disk accesses or network usage) to execute more efficiently.

SMT allows threads to execute at the same time by pulling instructions into the pipeline
from different threads. This way, multiple threads advance in their processes and no
one thread dominates the processor at any given time.

Value Prediction

Value prediction is the prediction of the value that a particular load instruction will
produce. Load values are generally not random, and approximately half of the load
instructions in a program will fetch the same value as they did in a previous execution.
Thus, predicting that the load value will be the same as it was last time speeds up the
processor since it allows the computer to continue without having to wait for the load
memory access. As loads tend to be one of the slowest and most frequently executed
instructions, this improvement makes a significant difference in processor speed.
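The simplest version of this idea is a last-value predictor: remember the value each load instruction fetched last time and guess it again. The sketch below is a generic illustration; the class and the sample value stream are ours:

```python
class LastValuePredictor:
    def __init__(self):
        self.history = {}   # load instruction address -> last value seen
        self.hits = 0
        self.misses = 0

    def predict(self, pc):
        """Return the guessed value, or None if this load hasn't run yet."""
        return self.history.get(pc)

    def resolve(self, pc, actual):
        """Score the guess against the value memory actually returned."""
        if self.history.get(pc) == actual:
            self.hits += 1
        else:
            self.misses += 1
        self.history[pc] = actual

p = LastValuePredictor()
for value in [7, 7, 7, 9]:   # a load that usually fetches the same value
    p.resolve(0x100, value)
print(p.hits, p.misses)  # 2 2
```

On a correct guess the processor has effectively hidden the memory latency; on a wrong guess it must squash the speculative work and use the real value, much like a branch misprediction.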

Example system configurations
Connection Point Services (CPS) lends itself to various configurations, according to your
needs. A few examples follow.

Dedicated single server


Example uses:
• Testing
• Small company
• Small Internet service provider

You can maintain both Phone Book Service (PBS) and Phone Book Administrator (PBA)
on a single computer running an operating system in the Windows Server 2003 family.

Even though PBA posts to the same server on which it resides, you must use the same
procedures for setting permissions and posting phone books as you would with any
other configuration.

Dedicated Phone Book Service server with a Phone Book Administrator client
Example uses:
• Medium to large corporations
• When ownership and responsibilities for phone book administration and server
maintenance are split between groups

In this configuration, PBS and PBA are installed on separate computers. PBA could be
installed on a server or on a workstation running Windows XP Professional. The
following illustration shows this configuration.

Dedicated Phone Book Service server with remote administration of Phone Book
Administrator client

Example use: When the primary computer running Phone Book Administrator is not
physically accessible to the administrator, you can use this dual-mode system.

You can configure PBA to run on a primary (dedicated) computer and on a remote
workstation at the same time. The following illustration shows this configuration.

All data files reside on the primary computer, never on the remote workstation. The
remote workstation accesses the data files on the primary computer.

Multiple servers with firewall


Example uses:
• Highly secured environment
• Very large Internet service providers
• Phone book replication among multiple Internet service providers

You can install PBA on a primary computer and on multiple remote workstations. PBS is
installed on a staging server and on multiple host servers residing in a less secure
environment outside a firewall. The following illustration shows this configuration.

The remote workstations access phone book data on the primary PBA computer. Phone
book updates are posted to the staging server. Using a content replication method,
phone book updates are then copied from the staging server through the firewall to the
host servers.

Lesson XI

Advanced Architectures

Classes of Architecture:

Figure 1. Layered class type architecture.

I originally used the term "class type" because I first started with this approach using
object-oriented (OO) technology, although I have since used it for component-based
architectures, service-oriented architectures (SOAs), and combinations thereof.
Throughout this article I still refer to classes within the layers, although there is
absolutely nothing stopping you from using non-OO technology to implement the layers.
The five layers are summarized in Table 1, as are the skills required to successfully
work on them (coding is applicable to all layers, so it's not listed).

Table 1. The 5 Layers.

Interface
Description: This layer wraps access to the logic of your system. There are two
categories of interface class – user interface (UI) classes that provide people access
to your system and system interface (SI) classes that provide access to external
systems to your system. Java Server Pages (JSPs) and graphical user interface (GUI)
screens implemented via the Swing class library are commonly used to implement UI
classes within Java. Web services and CORBA wrapper classes are good options for
implementing SI classes.
Skillset: For user interfaces: user interface design skills, usability skills, and the
ability to work closely with stakeholders. For system interfaces: API design skills and
legacy analysis skills.

Domain
Description: This layer implements the concepts pertinent to your business domain such
as Student or Seminar, focusing on the data aspects of the business objects, plus
behaviors specific to individual objects. Enterprise Java Bean (EJB) entity classes are
a common approach to implementing domain classes within Java.
Skillset: Analysis skills to identify domain classes; design skills to determine how to
implement the domain classes; domain modeling skills, in particular UML class modeling.

Process
Description: The process layer implements business logic that involves collaborating
with several domain classes or even other process classes.
Skillset: Analysis skills to identify process classes and process logic; design skills
to determine how to implement the process classes; modeling skills, in particular
activity modeling, flow charting, and sequence diagramming.

Persistence
Description: Persistence layers encapsulate the capability to store, retrieve, and
delete objects/data permanently without revealing details of the underlying storage
technology. You often implement a mapping between your object schema and your database
schema, and there are various encapsulation strategies available to you.
Skillset: Object/relational (O/R) mapping; architectural skills so you can choose the
right database encapsulation strategy; modeling skills, in particular class modeling
and physical data modeling.

System
Description: System classes provide operating-system-specific functionality for your
applications, isolating your software from the operating system (OS) by wrapping
OS-specific features, increasing the portability of your application.
Skillset: Analysis skills to identify what needs to be built; architectural and design
skills to determine how to implement the classes; modeling skills, in particular class
modeling, sequence diagramming, and state modeling.

Collaboration within a layer is allowed. For example, UI objects can send messages to
other UI objects and business/domain objects can send messages to other
business/domain objects. Collaboration can also occur between layers connected by
arrows. As you see in Figure 1, interface classes may send messages to domain classes
but not to persistence classes. Domain classes may send messages to persistence classes,
but not to interface classes. By restricting the flow of messages to only one direction, you
dramatically increase the portability of your system by reducing the coupling between
classes. For example, the domain classes don’t rely on the user interface of the system,
implying that you can change the interface without affecting the underlying business logic.

All types of classes may interact with system classes. This is because your system layer implements fundamental software features such as inter-process communication (IPC), a service that classes use to collaborate with classes on other computers, and audit logging, which classes use to record critical actions taken by the software. For example, if your user interface classes are running on a personal computer (PC) and your domain classes are running on an EJB application server on another machine, then your interface classes will send messages to the domain classes via the IPC service in the system layer. This service is often implemented via the use of middleware.

It’s critical to understand that this isn’t the only way to layer an application, but instead
that it is a very common one. The important thing is that you identify the layers that are
pertinent to your environment and then act accordingly.
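The one-directional message flow described above can be sketched in a few lines of Python. All class and method names here are hypothetical, purely for illustration: interface objects may call domain objects, domain objects may call persistence objects, every layer may call the system layer, and never the reverse.

```python
# Minimal sketch of one-directional layering; names are invented,
# not drawn from any particular framework.

class SystemLayer:
    """Any layer may call into the system layer (e.g., audit logging)."""
    def log(self, message):
        print(f"AUDIT: {message}")

class PersistenceLayer:
    """Knows how to store objects; knows nothing about domain or UI."""
    def __init__(self, system):
        self.system = system
        self.store = {}
    def save(self, key, value):
        self.store[key] = value
        self.system.log(f"saved {key}")

class DomainLayer:
    """Implements business concepts; may call persistence, never the UI."""
    def __init__(self, persistence, system):
        self.persistence = persistence
        self.system = system
    def enroll_student(self, name, seminar):
        self.system.log(f"enrolling {name}")
        self.persistence.save(name, seminar)
        return f"{name} enrolled in {seminar}"

class InterfaceLayer:
    """Talks to the user; may call domain classes, never persistence."""
    def __init__(self, domain):
        self.domain = domain
    def handle_enroll_request(self, name, seminar):
        return self.domain.enroll_student(name, seminar)

system = SystemLayer()
ui = InterfaceLayer(DomainLayer(PersistenceLayer(system), system))
print(ui.handle_enroll_request("Ada", "COMP 23"))
```

Because the domain and persistence classes never reference the interface layer, the UI can be replaced without touching the business logic, which is exactly the portability benefit described above.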

Dataflow architecture is a computer architecture that directly contrasts with the traditional von Neumann architecture, or control flow architecture. Dataflow architectures do not have a program counter; (at least conceptually) the executability and execution of instructions is determined solely by the availability of input arguments to the instructions. Although no commercially successful computer hardware has used a dataflow architecture, it is very relevant in many software architectures today, including database engine designs and parallel computing frameworks.

Software architecture
Dataflow is a software architecture based on the idea that changing the value of a
variable should automatically force recalculation of the values of other variables.

A data flow diagram (DFD) is a graphical representation of the "flow" of data through
an information system. A data flow diagram can also be used for the visualization of
data processing (structured design). It is common practice for a designer to draw a
context-level DFD first which shows the interaction between the system and outside
entities. This context-level DFD is then "exploded" to show more detail of the system
being modeled.

Larry Constantine, the original developer of structured design, based data flow diagrams on Martin and Estrin's "data flow graph" model of computation. Data flow diagrams (DFDs) are one of the three essential perspectives of SSADM. The sponsor of a project and the end users will need to be briefed and consulted throughout all stages of a system's evolution. With a dataflow diagram, users are able to visualize how the system will operate, what the system will accomplish, and how the system will be implemented. Old system dataflow diagrams can be drawn up and compared with the new system's dataflow diagrams in order to implement a more efficient system. Dataflow diagrams can also give the end user a physical idea of where the data they input ultimately has an effect upon the structure of the whole system, from order to dispatch to restock. How any system is developed can be determined through a dataflow diagram.

Components
A data flow diagram illustrates the processes, data stores, and external entities in a
business or other system and the connecting data flows.

Data flow diagram example


The four components of a data flow diagram (DFD) are:

External Entities/Terminators
are outside of the system being modeled. Terminators represent where information
comes from and where it goes. In designing a system, we have no idea about what
these terminators do or how they do it.
Processes
modify the inputs in the process of generating the outputs
Data Stores
represent a place in the process where data comes to rest. A DFD does not say
anything about the relative timing of the processes, so a data store might be a
place to accumulate data over a year for the annual accounting process.
Data Flows
are how data moves between terminators, processes, and data stores (those that
cross the system boundary are known as IO or Input Output Descriptions).
Every page in a DFD should contain fewer than 10 components. If a process has more
than 10 components, then one or more components (typically a process) should be
combined into one, and another DFD generated that describes that component in
more detail. Each component should be numbered, as should each subcomponent, and
so on. So, for example, a top-level DFD would have components 1, 2, 3, 4, 5; the
subcomponent DFD of component 3 would have components 3.1, 3.2, 3.3, and 3.4; and
the sub-subcomponent DFD of component 3.2 would have components 3.2.1, 3.2.2, and
3.2.3.
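The decomposition and numbering rule can be sketched as a small Python helper; the function name and data shapes here are hypothetical, purely illustrative.

```python
# Toy sketch of the hierarchical numbering rule for DFD decomposition;
# not a standard DFD file format, just an illustration of the guideline.

def number_components(parent, count, limit=10):
    """Label `count` subcomponents of process `parent` (e.g. '3' -> '3.1', ...),
    enforcing the fewer-than-`limit`-components-per-page guideline."""
    if count >= limit:
        raise ValueError(f"process {parent or 'top'}: {count} components on one "
                         "page; combine some and decompose them on a child DFD")
    prefix = f"{parent}." if parent else ""
    return [f"{prefix}{i}" for i in range(1, count + 1)]

print(number_components("", 5))     # top level: ['1', '2', '3', '4', '5']
print(number_components("3", 4))    # children of process 3: ['3.1', ..., '3.4']
print(number_components("3.2", 3))  # grandchildren: ['3.2.1', '3.2.2', '3.2.3']
```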

Data store
A data store is a repository for data. Data stores can be manual, digital, or
temporary.

Duplication
External entities and data stores can be duplicated in the system for more clarity, while
processes cannot. External entities that have been replicated are marked by an asterisk
(*) in the lower left part of the oval that represents that entity. Data stores have a
double line on the left side of their box.

Developing a DFD

Top-Down Approach
1. The system designer makes a context level DFD, which shows the interaction (data
flows) between the system (represented by one process) and the system
environment (represented by terminators).

2. The system is decomposed in a lower-level DFD (level 0) into a set of processes, data
stores, and the data flows between these processes and data stores.

3. Each process is then decomposed into an even lower level diagram containing its
sub processes.

4. This approach then continues on the subsequent sub processes, until a necessary
and sufficient level of detail is reached which is called the primitive process (aka
chewable in one bite).

Event Partitioning Approach
This approach constructs a detailed DFD directly from the system's events:

1. The list of all events is made.

2. For each event a process is constructed.

3. Each process is linked (with incoming data flows) directly with other
processes or via data stores, so that it has enough information to respond to
a given event.

4. The reaction of each process to a given event is modeled by an outgoing data
flow.

DFD tools
· ConceptDraw - Windows and Mac OS X data flow diagramming tool

· Dia - open source diagramming tool with DFD support

· Microsoft Visio - Windows diagramming tool which includes very basic DFD support
(images only; it does not record data flows)

Dataflow programming languages embody these principles, with spreadsheets perhaps
the most widespread embodiment of dataflow. For example, in a spreadsheet you can
specify a cell formula which depends on other cells; then when any of those cells is
updated, the first cell's value is automatically recalculated. It's possible for one change
to initiate a whole sequence of changes, if one cell depends on another cell which
depends on yet another cell, and so on.
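The spreadsheet behaviour just described can be sketched as a toy dataflow cell in Python. This is an illustrative sketch, not any particular library's API: setting a cell's value automatically recalculates every cell whose formula depends on it, cascading through the dependency chain.

```python
class Cell:
    """A spreadsheet-like cell: setting its value propagates to dependents."""
    def __init__(self, value=0):
        self._value = value
        self._dependents = []   # cells whose formulas read this cell

    @property
    def value(self):
        return self._value

    def set(self, value):
        self._value = value
        for dep in self._dependents:
            dep.recalculate()

class FormulaCell(Cell):
    """A cell whose value is computed from other cells by a formula."""
    def __init__(self, formula, *inputs):
        super().__init__()
        self.formula = formula
        self.inputs = inputs
        for cell in inputs:
            cell._dependents.append(self)
        self.recalculate()

    def recalculate(self):
        old = self._value
        self._value = self.formula(*(c.value for c in self.inputs))
        if self._value != old:          # a change may cascade further
            for dep in self._dependents:
                dep.recalculate()

a = Cell(2)
b = Cell(3)
total = FormulaCell(lambda x, y: x + y, a, b)    # total = a + b
double = FormulaCell(lambda t: t * 2, total)     # double = total * 2
a.set(10)   # updating a cascades: total becomes 13, double becomes 26
print(total.value, double.value)
```

Note that the code changing `a` never mentions `total` or `double`; the recalculation is implicit, which is exactly the coupling reduction discussed below.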

The dataflow technique is not restricted to recalculating numeric values, as done in
spreadsheets. For example, dataflow can be used to redraw a picture in response to
mouse movements, or to make a robot turn in response to a change in light level.

One benefit of dataflow is that it can reduce the amount of coupling-related code in a
program. For example, without dataflow, if a variable X depends on a variable Y, then
whenever Y is changed X must be explicitly recalculated. This means that Y is coupled to
X. Since X is also coupled to Y (because X's value depends on the Y's value), the
program ends up with a cyclic dependency between the two variables. Most good
programmers will get rid of this cycle by using an observer pattern, but only at the cost
of introducing a non-trivial amount of code. Dataflow improves this situation by making
the recalculation of X automatic, thereby eliminating the coupling from Y to X.
Dataflow makes implicit a significant amount of code that otherwise would have had to
be tediously explicit.

Dataflow is also sometimes referred to as reactive programming.

There have been a few programming languages created specifically to support dataflow.
In particular, many (if not most) visual programming languages have been based on
the idea of dataflow. A good example of a Java-based framework is Pervasive
DataRush.

Diagrams
The term dataflow may also be used to refer to the flow of data within a system, and
is the name normally given to the arrows in a data flow diagram that represent the flow
of data between external entities, processes, and data stores.

Concurrency
A dataflow network is a network of concurrently executing processes or automata
that can communicate by sending data over channels (see message passing).

Kahn process networks, named after one of the pioneers of dataflow networks, are a
particularly important class of such networks. In a Kahn process network the processes
are determinate. This implies that they satisfy the so-called Kahn's principle, which
roughly speaking states that each determinate process computes a continuous function
from input streams to output streams, and that a network of determinate processes is
itself determinate, thus computing a continuous function. This implies that the
behaviour of such networks can be described by a set of recursive equations, which can
be solved using fixed-point theory.
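Assuming that blocking channel reads are what make the processes determinate, a Kahn-style network can be sketched with Python threads and queues. This is a toy illustration, not a full Kahn process network runtime.

```python
# A tiny Kahn-style process network: concurrently executing processes
# that communicate only by sending tokens over channels (queues).
import queue
import threading

def producer(out_ch, items):
    """A determinate process: writes a fixed stream, then a sentinel."""
    for x in items:
        out_ch.put(x)
    out_ch.put(None)

def doubler(in_ch, out_ch):
    """Reads its input channel with blocking reads and doubles each token."""
    while True:
        x = in_ch.get()     # blocking read: the source of determinacy
        if x is None:
            out_ch.put(None)
            return
        out_ch.put(2 * x)

a, b = queue.Queue(), queue.Queue()
threads = [
    threading.Thread(target=producer, args=(a, [1, 2, 3])),
    threading.Thread(target=doubler, args=(a, b)),
]
for t in threads:
    t.start()
results = []
while (x := b.get()) is not None:
    results.append(x)
for t in threads:
    t.join()
print(results)   # the network computes the same stream on every run
```

Because each process only ever blocks on a single input channel, the output stream is the same regardless of how the operating system schedules the threads, which is the intuition behind Kahn's principle.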

The concept of dataflow networks is closely related to another model of concurrency
known as the Actor model.

Hardware architecture
Hardware architectures for dataflow were a major topic in computer architecture
research in the 1970s and early 1980s. Jack Dennis of MIT pioneered the field of static
dataflow architectures, while the Manchester Dataflow Machine and the MIT Tagged
Token architecture were major projects in dynamic dataflow.

A compiler analyzes a computer program for the data dependencies between
operations. It does this in order to better optimize the instruction sequences. Normally,
the compiled output has the results of these optimizations, but the dependency
information itself is not recorded within the compiled binary code.

A compiled program for a dataflow machine would keep this dependency information. A
dataflow compiler would record these dependencies by creating unique tags for each
dependency instead of using variable names. By giving each dependency a unique tag,
it exposes any possibility of parallel execution of non-dependent instructions. Each
instruction, along with its tagged operands, would be stored in the compiled binary code.

The compiled program would be loaded into the content-addressable memory (CAM) of the
dataflow computer. When all of the tagged operands of an instruction became available
(that is, previously calculated), the instruction was marked as available for execution by
an execution unit. This was known as activating or firing the instruction.

Once the instruction was completed by the execution unit, its output data would be
broadcast (with its tag) to the CAM memory. Any other instructions that were
dependent on this particular datum (identified by its tag value) would be updated. In
this way, subsequent instructions would be activated.
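As a rough sketch of this firing rule, here is a toy Python simulation in which an instruction fires only once all of its tagged operands have arrived; the tag and instruction formats are invented for illustration, not drawn from any real dataflow machine.

```python
# Toy simulation of dataflow firing: an instruction becomes ready when
# all of its tagged operands are present in the token store (the "CAM").

def run_dataflow(instructions, initial_tokens):
    """instructions: {dest_tag: (op, src_tag_a, src_tag_b)};
    initial_tokens: {tag: value}. Fires instructions in data order."""
    ops = {"+": lambda a, b: a + b, "*": lambda a, b: a * b}
    tokens = dict(initial_tokens)
    pending = dict(instructions)
    fired = []
    while pending:
        ready = [d for d, (_, a, b) in pending.items()
                 if a in tokens and b in tokens]   # all operands available?
        if not ready:
            raise RuntimeError("deadlock: no instruction can fire")
        for dest in ready:                         # fire every ready one
            op, a, b = pending.pop(dest)
            tokens[dest] = ops[op](tokens[a], tokens[b])  # broadcast result
            fired.append(dest)
    return tokens, fired

# t3 = t1 + t2; t4 = t3 * t1 -- t4 cannot fire until t3's token arrives
tokens, order = run_dataflow(
    {"t3": ("+", "t1", "t2"), "t4": ("*", "t3", "t1")},
    {"t1": 2, "t2": 5},
)
print(tokens["t4"], order)
```

Here the firing order is determined purely by token availability, not by the order in which the instructions were written down, mirroring the "data order versus programmed order" distinction below.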

Instructions would be activated in data order, that is when all of the required data
operands were available. This order can be different from the sequential order
envisioned by the human programmer, the programmed order.

The instructions along with their required data would be transported as packets to the
execution units. These packets are often known as instruction tokens. Similarly, data
results are transported back to the CAM as data tokens. The packetization of
instructions and results allowed for parallel execution of activated instructions on a large
scale. Connection networks would deliver the activated instruction tokens to the
execution units and return data tokens to the instruction CAM memory. In contrast to
the conventional von Neumann architecture, data tokens are not permanently stored in
memory, rather they are transient messages that only exist when in transit to the
instruction storage.

Earlier designs that only used instruction addresses as data dependency tags were
called static dataflow machines. These machines could not allow instructions from
multiple loop iterations (or multiple calls to the same routine) to be issued
simultaneously as the simple tags could not differentiate between the different loop
iterations (or each invocation of the routine). Later designs called dynamic dataflow
machines used more complex tags to allow greater parallelism from these cases.

The research, however, never overcame the problems related to:

· efficiently broadcasting data tokens in a massively parallel system

· efficiently dispatching instruction tokens in a massively parallel system

· building CAMs large enough to hold all of the dependencies of a real program

Instructions and their data dependencies proved to be too fine-grained to be effectively
distributed in a large network. That is, the time for the instructions and tagged results
to travel through a large connection network was longer than the time to actually do the
computations.

Out-of-order execution is the conceptual descendant of dataflow computation and has
become the dominant computing paradigm since the 1990s. It is a form of restricted
dataflow. This paradigm introduced the idea of an execution window. The execution
window follows the programmed sequential order of the program; however, within the
window, instructions are allowed to be completed in data-dependency order. This is
accomplished by the computer hardware dynamically tagging the data dependencies
within the window. The logical complexity of dynamically keeping track of the data
dependencies restricts out-of-order CPUs to a small number of execution units (2-6) and
execution window sizes in the range of 32 to 200 instructions, much smaller than
envisioned for full dataflow machines.

A computer network is two or more computers connected together using a
telecommunication system for the purpose of communicating and sharing resources.

Blue RJ-45 patchcord of the type commonly used to connect network devices.

Experts in the field of networking debate whether two computers that are connected
together using some form of communications medium constitute a network. Therefore,
some sources will state that a network requires three connected computers. A computer
connected to a non-computing device (e.g., networked to a printer via an Ethernet link)
may also represent a computer network, although this article does not currently address
this configuration. For example, [1] states that "the term network describes two or more
connected computers" while [2] states that a computer network is "a network of data
processing nodes that are interconnected for the purpose of data communication", the
term "network" being defined in the same document as "an interconnection of three or
more communicating entities" (author's emphasis).

This article uses the definition which requires two or more computers to be connected
together to form a network. The same basic functions are generally present in this case
as with larger numbers of connected computers.

Basic Computer Network Building Blocks

Computers
Many of the components of an average network are individual computers, which are
generally either workstations (including personal computers) or servers.
Types of Workstations
There are many types of workstations that may be incorporated into a particular
network, some of which have high-end displays, multiple CPUs, large amounts of
RAM, large amounts of hard drive storage space, or other enhancements required
for special data processing tasks, graphics, or other resource intensive applications.
(See also network computer).

Types of Servers
The following lists some common types of servers and their purpose.
File Server
Stores various types of files and distributes them to other clients on the network.
Print Server
Controls and manages one or more printers and accepts print jobs from other
network clients, spooling the print jobs, and performing most or all of the other
functions that a workstation would perform to accomplish a printing task if the
printer were connected directly to the workstation's printer port.
Mail Server
Stores, sends, receives, routes, and performs other email related operations for
other clients on the network.
Fax Server
Stores, sends, receives, routes, and performs other functions necessary for the
proper transmission, reception, and distribution of faxes.
Telephony Server
Performs telephony related functions such as answering calls automatically,
performing the functions of an interactive voice response system, storing and
serving voice mail, routing calls between the Public Switched Telephone Network
(PSTN) and the network or the Internet (e.g., voice over IP (VoIP) gateway), etc.

Proxy Server
Performs some type of function on behalf of other clients on the network to
increase the performance of certain operations (e.g., prefetching and caching
documents or other data that is requested very frequently) or as a security
precaution to isolate network clients from external threats.
Remote Access Server (RAS)
Monitors modem lines or other network communications channels for requests to
connect to the network from a remote location, answers the incoming telephone
call or acknowledges the network request, and performs the necessary security
checks and other procedures necessary to log a user onto the network.
Application Server
Performs the data processing or business logic portion of a client application,
accepting instructions for operations to perform from a workstation and serving the
results back to the workstation, while the workstation performs the user interface
or GUI portion of the processing (i.e., the presentation logic) that is required for the
application to work properly.
Web Server
Stores HTML documents, images, text files, scripts, and other Web related data
(collectively known as content), and distributes this content to other clients on the
network on request.
Backup Server
Has network backup software installed and has large amounts of hard drive storage
or other forms of storage (tape, etc.) available to it to be used for the purpose of
ensuring that data loss does not occur in the network.

Printers

Many printers are capable of acting as part of a computer network without any
other device, such as a print server, to act as an intermediary between the printer
and the device that is requesting a print job to be completed.

Dumb Terminals
Many networks use dumb terminals instead of workstations either for data entry
and display purposes or in some cases where the application runs entirely on the
server.

Other Devices
There are many other types of devices that may be used to build a network, many
of which require an understanding of more advanced computer networking
concepts before they are able to be easily understood (e.g., hubs, routers, bridges,
switches, hardware firewalls, etc.). On home and mobile networks, connecting
consumer electronics devices such as video game consoles is becoming increasingly
common.

Building a Computer Network

A Simple Network
A simple computer network may be constructed from two computers by adding a
network adapter (Network Interface Controller (NIC)) to each computer and then
connecting them together with a special cable called a crossover cable. This type of
network is useful for transferring information between two computers that are not
normally connected to each other by a permanent network connection or for basic
home networking applications. Alternatively, a network between two computers can
be established without dedicated extra hardware by using a standard connection
such as the RS-232 serial port on both computers, connecting them to each other
via a special cross linked null modem cable.
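Once the two machines are connected, programs exchange data through sockets. Here is a minimal sketch in Python, using localhost so it runs self-contained on a single machine; on a real two-computer network the client would use the server's address instead.

```python
# Minimal client/server exchange over TCP sockets. The server accepts
# one connection and echoes back whatever it receives.
import socket
import threading

def serve_once(host="127.0.0.1"):
    """Start a one-shot echo server; return the port the OS assigned."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind((host, 0))          # port 0: let the OS pick a free port
    srv.listen(1)
    def handler():
        conn, _ = srv.accept()
        with conn:
            conn.sendall(conn.recv(1024))   # echo the message back
        srv.close()
    threading.Thread(target=handler).start()
    return srv.getsockname()[1]

port = serve_once()
with socket.create_connection(("127.0.0.1", port)) as client:
    client.sendall(b"hello over the network")
    reply = client.recv(1024)
print(reply.decode())
```

The same pattern underlies the file, print, and web servers listed above: a server process listens on a known address and port, and clients open connections to it to request a service.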

Practical Networks
Practical networks generally consist of more than two interconnected computers
and generally require special devices in addition to the Network Interface Controller
that each computer needs to be equipped with. Examples of some of these special
devices are listed above under Basic Computer Network Building Blocks / Other
devices.

Types of Networks:
Below is a list of the most common types of computer networks.

Local Area Network (LAN):
A network that is limited to a relatively small spatial area such as a room, a single
building, a ship, or an aircraft. Local area networks are sometimes called
single-location networks.
Note: For administrative purposes, large LANs are generally divided into smaller
logical segments called workgroups. A workgroup is a group of computers that
share a common set of resources within a LAN.

Campus Area Network (CAN):
A network that connects two or more LANs but that is limited to a specific (possibly
private) geographical area such as a college campus, industrial complex, or a
military base.
Note: A CAN is generally limited to an area that is smaller than a Metropolitan Area
Network.

Metropolitan Area Network (MAN):
A network that connects two or more LANs or CANs together but does not extend
beyond the boundaries of the immediate town, city, or metropolitan area. Multiple
routers, switches, and hubs are connected to create a MAN.

Wide Area Network (WAN):
A network that covers a broad geographical area (i.e., any network whose
communications links cross metropolitan, regional, or national boundaries) or, less
formally, a network that uses routers and public communications links.
Types of WANs:
Centralized:
A centralized WAN consists of a central computer that is connected to dumb
terminals and / or other types of terminal devices.
Distributed:
A distributed WAN consists of two or more computers in different locations and may
also include connections to dumb terminals and other types of terminal devices.

Internetwork:
Two or more networks or network segments connected using devices that operate
at layer 3 (the 'network' layer) of the OSI Basic Reference Model, such as a router.
Note: Any interconnection among or between public, private, commercial,
industrial, or governmental networks may also be defined as an internetwork.
Internet, The:
A specific internetwork, consisting of a worldwide interconnection of governmental,
academic, public, and private networks based upon the Advanced Research
Projects Agency Network (ARPANET) developed by ARPA of the U.S. Department of
Defense – also home to the World Wide Web (WWW) and referred to as the
'Internet' with a capital 'I' to distinguish it from other generic internetworks.
Synonyms for the 'Internet' also include the 'Web' or, in a more comical sense, the
'Interweb'.

Intranet:
A network or internetwork that is limited in scope to a single organization or entity
or, also, a network or internetwork that is limited in scope to a single organization
or entity and which uses the TCP/IP protocol suite, HTTP, FTP, and other network
protocols and software commonly used on the Internet.
Note: Intranets may also be categorized as a LAN, CAN, MAN, WAN, or other type
of network.

Extranet:
A network or internetwork that is limited in scope to a single organization or entity
but which also has limited connections to the networks of one or more other
usually, but not necessarily, trusted organizations or entities (e.g., a company's
customers may be provided access to some part of its intranet, thus creating an
extranet, while at the same time the customers may not be considered 'trusted'
from a security standpoint).
Note: Technically, an extranet may also be categorized as a CAN, MAN, WAN, or
other type of network, although, by definition, an extranet cannot consist of a
single LAN, because an extranet must have at least one connection with an outside
network.
Intranets and extranets may or may not have connections to the Internet. If
connected to the Internet, the intranet or extranet is normally protected from being
accessed from the Internet without proper authorization. The Internet itself is not
considered to be a part of the intranet or extranet, although the Internet may serve
as a portal for access to portions of an extranet.

Classification of computer networks

By network layer
Computer networks may be classified according to the network layer at which they
operate according to some basic reference models that are considered to be
standards in the industry such as the seven layer OSI reference model and the five
layer TCP/IP model.

By scale
Computer networks may be classified according to the scale or extent of reach of
the network, for example as a Personal area network (PAN), Local area network
(LAN), Wireless local area network (WLAN), Campus area network (CAN),
Metropolitan area network (MAN), or Wide area network (WAN).

By connection method
Computer networks may be classified according to the technology that is used to
connect the individual devices in the network such as HomePNA, Power line
communication, Ethernet, or WiFi.

By functional relationship
Computer networks may be classified according to the functional relationships which
exist between the elements of the network, for example Active Networking,
Client-server and Peer-to-peer (workgroup) architectures.

By network topology
Computer networks may be classified according to the network topology upon which the
network is based, such as Bus network, Star network, Ring network, Mesh network,
Star-bus network, Tree or Hierarchical topology network, etc.

These topologies can be combined in various geometric arrangements.

By services provided
Computer networks may be classified according to the services which they provide,
such as storage area networks, server farms, process control networks, value-added
networks, SOHO networks, wireless community networks, XML appliances, etc.

By protocol

Computer networks may be classified according to the communications protocol that is
being used on the network. See the articles on List of network protocol stacks and List
of network protocols for more information.

Sample Networks
