FACULTY OF
AUTOMATION
EMBEDDED SYSTEMS
Cosmin Ionete
Dragos Surlea
Nicolae Neagu
Contents
1. Embedded Systems Architecture
2.2 Scaling
3.4 I/O
5. Address Decoding
7. Timers/Counters
Microcontroller Architectures
A. Princeton (Von Neumann) vs. Harvard
There are also differences in the basic CPU architectures used, and these tend to reflect the
application. Microprocessor based machines usually have a von Neumann architecture with a
single memory for both programs and data to allow maximum flexibility in allocation of
memory. Microcontroller chips, on the other hand, frequently embody the Harvard
architecture, which has separate memories for programs and data. Figure 1.1 illustrates this
difference.
Figure 1.1 - At left is the von Neumann architecture; at right is the Harvard architecture
One advantage of the Harvard architecture for embedded applications stems from the two types of
memory used in embedded systems. A fixed program and constants can be stored in non-volatile
ROM memory while working variable data storage can reside in volatile RAM. Volatile memory
loses its contents when power is removed, but non-volatile ROM memory always maintains its
contents even after power is removed.
A typical embedded computer consists of the CPU, memory, and I/O. They are most often
connected by means of a shared bus for communication. The peripherals on a microcontroller chip
are typically timers, counters, serial or parallel data ports, and analog-to-digital and digital-to-analog
converters that are integrated directly on the chip. The performance of these peripherals is generally
less than that of dedicated peripheral chips, which are frequently used with microprocessor chips.
However, having the bus connections, CPU, memory, and I/O functions on one chip has several
advantages:
- Fewer chips are required since most functions are already present on the processor chip.
- Lower cost and smaller size result from a simpler design.
- Lower power requirements because on-chip power requirements are much smaller than
external loads.
- Fewer external connections are required because most are made on-chip, and most of the
chip connections can be used for I/O.
- More pins on the chip are available for user I/O since they aren’t needed for the bus.
- Overall reliability is higher since there are fewer components and interconnections.
Of course there are disadvantages too, including:
- Reduced flexibility since you can’t easily change the functions designed into the chip.
- Expansion of memory or I/O is limited or impossible.
- Limited data transfer rates due to practical size and speed limits for a single-chip.
- Lower performance I/O because of design compromises to fit everything on one chip.
The von Neumann machine, with only one memory, requires all instruction and data
transfers to occur on the same interface. This is sometimes referred to as the “von Neumann
bottleneck.” In common computer architectures, this is the primary upper limit to processor
throughput. The Harvard architecture has the potential advantage of a separate interface allowing
twice the memory transfer rate by allowing instruction fetches to occur in parallel with data
transfers. Unfortunately, in most Harvard architecture machines, the memory is connected to the
CPU using a bus that limits the parallelism to a single bus. The memory separation is still used to
advantage in microcontrollers, as the program is usually stored in non-volatile memory (program is
not lost when power is removed), and the temporary data storage is in volatile memory.
Non-volatile memories, such as read-only memory (ROM) are used in both types of systems
to store permanent programs. In a desktop PC, ROMs are used to store just the start-up or bootstrap
programs and hardware specific programs. Volatile random access memory (RAM) can be read and
written easily, but it loses its contents when power is removed. RAM is used to store both
application programs and data in PCs that need to be able to run many different programs. In a
dedicated embedded computer, however, the programs are stored permanently in ROM where they
will always be available. Microcontroller chips that are used
in dedicated applications generally use ROM for program storage and RAM for data storage.
Microcoded
- A processor within a processor.
- The signals required to execute instructions are "fetched" from an internal "control ROM" memory.
- Allows great flexibility in the instruction set.
- Easier to design, but slower than hardwired control.
Hardwired
- The signals required to execute an instruction are generated by logic gates (combinational circuitry) known as the "control matrix".
- Faster, but less flexible.
There are three groups of signals, or buses, that connect the CPU to the other major components.
The buses are:
- Data bus
- Address bus
- Control bus
• The concepts of address and data are fundamental to the operation of the microprocessor
• Memory consists of locations uniquely identified by the CPU through their addresses
• The CPU communicates with those addresses to read and write the data
• The communications go via buses
• The CPU is responsible for control of the address, data and control buses
• With all devices attached to the data bus, there is a potential for clashes, so devices connected to the data bus can be driven to high-impedance states
• The ability of a device to set its output at either logic 1, logic 0 or in a high-impedance state is an essential feature of common bus systems; such a device is termed a tristate device.
A. Data bus - to transfer the data associated with the processing function of the microprocessor. (8
lines, typically)
The data bus width is defined as the number of bits that can be transferred on the bus at one
time. This defines the processor’s “word size.” Many chip vendors define the word size based on the
width of an internal data bus. A processor with eight data bus pins is an 8-bit CPU. Both instructions
and data are transferred on the data bus one “word” at a time. This allows the re-use of the same
connections for many different types of information. Due to packaging limitations, the number of
connections or pins on a chip is limited. By sharing the pins in this way, the number of pins required
is reduced at the expense of increased complexity in the external circuits. Many processors also take
this a step further and share some or all of the data bus pins to carry address information as well.
This is referred to as a multiplexed address/data bus. Processors that have multiplexed address/data
buses require an external address latch to separate and hold the address information stable for the
duration of a data transfer. The processor controls the direction of data transfer on the data
bus (read/write).
B. Address bus - contains the address of a specific memory location for accessing (reading/writing)
stored data. (16 lines, typically)
The address bus is a set of wires that are used to point to the memory or I/O location that is
to be read from or written to. The address signals must generally be held at a constant value for
some period of time before, during, and after the data is transferred. In most cases, the processor
actively drives the address bus with either instruction or data addresses.
Memory Read and Write Cycles
• Hardware control lines are used by the CPU to control reads and writes to memory
• The active-low signal RD is asserted for a read cycle
• The active-low signal WR indicates a write
• The RD and WR signals supply timing information to the memory device
Read cycle
• It lasts 2 cycles of the clock signal:
1. The address of the required memory location is put on the address bus (by the CPU) at the rising edge.
2. While the device is held at the tristate level, the control bus issues the read signal (active low) to the device (the 2nd cycle begins).
3. After a delay, valid data is placed on the data bus.
4. The levels on the data bus are sampled by the CPU at the falling edge of the 2nd cycle.
Write cycle
1. The CPU places the address at the rising edge.
2. Decoding logic selects the correct device.
3. 2nd cycle, rising edge: the CPU outputs data onto the data bus and sets the WRITE control bus signal active (LOW).
• Note: memory devices and other I/O components have static logic and do not depend on the clock signal; they read data from the data bus when the write signal goes high (inactive), so the data must be valid for that transition.
C. Control bus - carries the control signals to the memory and the I/O devices. The number of
control lines varies; often around 15.
The control bus is an assortment of signals that determine what kind of information is on the
data bus and determines where the data will go, in conjunction with the address bus. Most of the
design process is concerned with the logic and timing of the control signals. The timing analysis is
primarily involved with the relative timing between these control signals and the appearance and
disappearance of data and addresses on their respective buses.
1.3.1.2 Microprocessor Fundamentals
The CPU
The ALU
• The arithmetic and logic unit (ALU) is responsible for data manipulation:
• arithmetic operations (addition, incrementing, decrementing, negation etc.), logic operations (AND, OR, XOR etc.)
• bit shifting, rotating, complementing
Registers
• Registers hold the data/addresses that the CPU currently uses; they are stored in special (small and fast) memory locations on the CPU
• accumulator register - temporarily stores an input to the ALU and is sometimes used in I/O operations. It may be 8, 16 or 32 bits wide
• flags register or status register - individual bits in the register are called flags. They reflect the conditions of the latest ALU operation and are used by subsequent jump and branch instructions
• general purpose register - temporary storage for data or addresses; not assigned any specific task
• program counter - tracks the CPU's position in the program. The width of the program counter is the same as that of the address bus
• instruction register - stores the instruction where it can be decoded; not accessible by the programmer
• index registers - hold the address of an operand when the indexed addressing mode is used
• stack pointer register - holds the address of the next memory location in the stack in RAM. The stack is a special area of RAM with last-in first-out (LIFO or FILO) organisation; it is used during subroutine calls and interrupts
Types of registers:
Stack
• Part of memory where program data can be stored by a simple PUSH operation
• Restore data by a POP
• Stack is in main memory and is defined by the program
• Stack Pointer (SP) keeps track of the next location available on the Stack
• Organised as a FILO Buffer
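As a rough illustration of the LIFO behaviour just described, here is a minimal C sketch of a descending stack; the RAM array and the names used are ours, not any particular CPU's:

#include <stdint.h>
#include <stdio.h>

static uint8_t ram[256];   /* stands in for the RAM area reserved for the stack */
static uint8_t sp = 255;   /* stack pointer: the stack grows downward from the top */

static void push(uint8_t value) {
    ram[sp--] = value;     /* store, then move SP to the next free location */
}

static uint8_t pop(void) {
    return ram[++sp];      /* move SP back, then read the stored value */
}

int main(void) {
    push(0x11);
    push(0x22);
    uint8_t first = pop();
    uint8_t second = pop();
    printf("0x%02x 0x%02x\n", first, second);  /* prints 0x22 0x11: last in, first out */
    return 0;
}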
General Registers
• Small set of internal registers -temporary data storage
• CU ensures that data from the correct register is presented to the CPU
• CU ensures that data is written back to correct register
• Accumulator usually holds ALU result
Status or Flags Register
• CF -Carry Flag
•1 -there is a carry out from the most significant bit
•0 - no carry out from the msb
• PF -Parity Flag
•1 - low byte has an even number of 1 bits
•0 - low byte has odd parity
• AF -Auxiliary carry Flag
•1 -carry out from bit 3 on addition
•0 -borrow into bit 3 on addition
• ZF -Zero Flag
•1 -zero result
•0 -non-zero result
• SF -Sign Flag
•1 - msb is 1 (negative)
•0 - msb is 0 (positive)
• TF -Trap Flag
•Used by debuggers for single step operation
•1 -Trap on
•0 -Trap off
• IF -Interrupt Flag
•1 -Enabled
•0 -Disabled
• OF -Overflow Flag
•1 -signed overflow occurred
•0 -no overflow
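To make the flag semantics concrete, here is a small C sketch (our own illustration, not any specific CPU's logic) that computes CF, ZF, SF and OF for an 8-bit addition:

#include <stdint.h>
#include <stdio.h>

struct flags { int cf, zf, sf, of; };

/* Compute the main status flags for an 8-bit addition, the way an ALU
   would set them. */
static struct flags add8(uint8_t a, uint8_t b, uint8_t *sum) {
    uint16_t wide = (uint16_t)a + b;   /* keep a 9th bit for the carry */
    uint8_t  r    = (uint8_t)wide;
    struct flags f;
    f.cf = (wide >> 8) & 1;            /* CF: carry out of the msb */
    f.zf = (r == 0);                   /* ZF: zero result          */
    f.sf = (r >> 7) & 1;               /* SF: msb is the sign bit  */
    f.of = (~(a ^ b) & (a ^ r) & 0x80) != 0;  /* OF: operands agree in sign, result differs */
    *sum = r;
    return f;
}

int main(void) {
    uint8_t s;
    struct flags f = add8(0x7F, 0x01, &s);   /* 127 + 1 overflows signed 8-bit */
    printf("sum=0x%02x CF=%d ZF=%d SF=%d OF=%d\n", s, f.cf, f.zf, f.sf, f.of);
    return 0;
}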
The program status word (PSW) is an area of memory or a hardware register which contains
information about program state used by the operating system and the underlying hardware. It will
normally include a pointer (address) to the next instruction to be executed. The program status word
typically contains an error status field and condition codes such as the interrupt enable/disable bit
and a supervisor / user mode bit.
PSW
PSW contains information such as:
condition code bits (set by various comparison instructions)
CPU priority
mode:
user-mode: only a subset of instructions and features are accessible.
kernel-mode: all instructions and features are accessible.
The program status word (PSW) is 32 bits in length and contains the information required for proper
program execution. The PSW includes the instruction address, condition code, and other fields. In
general, the PSW is used to control instruction sequencing and to hold and indicate the status of the
system in relation to the program currently being executed. The active or controlling PSW is called
the current PSW. By storing the current PSW during an interruption, the status of the CPU can be
preserved for subsequent inspection. By loading a new PSW or part of a PSW, the state of the CPU
can be initialized or changed.
Primary role: provide inexpensive, programmable logic control and interfacing to external devices,
e.g., turning devices on/off and monitoring external conditions.
• A Timer module to allow the microcontroller to perform tasks for certain time
periods.
• Serial I/O (UART) for data flow between microcontroller and devices such as a PC or
other microcontroller.
• Analog input and output (e.g., to receive data from sensors or control motors)
• Interrupt capability (from a variety of sources)
• Bus/external memory interfaces (for RAM or ROM)
• Built-in monitor/debugger program
• Support for external peripherals (e.g., I/O and bus extenders)
Figure: a typical microcontroller, showing the different sub-units integrated onto the microcontroller chip.
Instruction sets:
MP: processing intensive
powerful addressing modes
instructions to perform complex operations & manipulate large volumes of data
(the processing capability of MCs never approaches that of MPs)
large instructions -- e.g., 80X86 instructions up to 7 bytes long
MC: cater to control of inputs and outputs
instructions to set/clear bits
boolean operations (AND, OR, XOR, NOT, jump if a bit is set/cleared), etc.
extremely compact instructions, many implemented in one byte
(the control program must often fit in the small, on-chip ROM)
Instruction sets:
• The set of instructions given to the μP to execute a task is called an instruction set
• Generally, instructions can be classified into the following categories:
– Data transfer
– Arithmetic
– Logical
– Program control
• Differ depending on the manufacturer, but some are reasonably common to most μP's.
A. Data transfer
1. Load
• reads the content of a specified memory location and copies it to the specified register
location in the CPU
2. Store
• copies the current contents of a specified register into a specified memory location.
B. Arithmetic
3. Add
• Adds the contents of a specified memory location to the data in some register
4. Decrement
• subtracts 1 from the content of a specified location.
5. Compare
• indicates whether the contents of a register are greater than, less than or same as the
contents of a specified memory location. The result appears as a flag in the status register.
C. Logical
6. AND
• carries out the logical AND operation with the contents of a specified memory location and
the data in some register
7. OR
• carries out the logical OR operation with the contents of a specified memory location and
the data in some register
8. EXCLUSIVE OR - (similar to 6, but for exclusive OR)
9. Logical shift
• moves the pattern of bits in the register one place to the left or right, moving a zero (0) into
the end of the number
10. Arithmetic shift
• moves the pattern of bits one place left/right, but copies the end bit (the sign bit) into the
vacancy created by the shift (see the sketch after this list)
D. Program control
11. Jump
• changes the sequence in which the program is executed: the program counter jumps to
some specified (non-sequential) location
12. Branch
• a conditional instruction, e.g., 'branch if zero' or 'branch if plus'; the branch is taken
only if the right conditions are met.
13. Halt
• stops all further microprocessor activities
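The difference between the logical and arithmetic shifts in categories 9 and 10 is easy to see in C (note that right-shifting a negative signed value is implementation-defined in C; most compilers shift arithmetically):

#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint8_t u = 0xF0;           /* 11110000 */
    int8_t  s = (int8_t)0xF0;   /* -16 in two's complement */

    /* Logical shift right: a zero moves into the vacated msb */
    printf("0x%02x\n", u >> 1);   /* 0x78 */

    /* Arithmetic shift right: the sign bit is copied into the vacancy,
       so the value stays negative */
    printf("%d\n", s >> 1);       /* typically -8 */
    return 0;
}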
Hardware & instruction set support:
MC: built-in I/O operations, event timing, enabling & setting up priority levels
for interrupts caused by external stimuli
MP: usually require external circuitry to do similar things (e.g, 8255 PPI, 8254 PIT,
8259 PIC)
Bus widths:
MP: very wide
large memory address spaces (>4 Gbytes)
lots of data (Data bus: 32, 64, 128 bits wide)
MC: narrow
relatively small memory address spaces (typically kBytes)
less data (Data bus typically 4, 8, 16 bits wide)
Clock rates:
MP: very fast (> 1 GHz)
MC: Relatively slow (typically 10-20 MHz)
since most I/O devices being controlled are relatively slow
Cost:
MP's expensive (often > $100)
MCs cheap (often $1 - $10)
4-bit: < $1.00
8-bit: $1.00 - $8.00
16-32-bit: $6.00 - $20.00
1.4 Compiling, Linking, and Locating
The process of converting the source code representation of your embedded software into an
executable binary image involves three distinct steps. First, each of the source files must be
compiled or assembled into an object file. Second, all of the object files that result from the first step
must be linked together to produce a single object file, called the relocatable program. Finally,
physical memory addresses must be assigned to the relative offsets within the relocatable program in
a process called relocation. The result of this third step is a file that contains an executable binary
image that is ready to be run on the embedded system.
The embedded software development process just described is illustrated in the figure below. In this
figure, the three steps are shown from top to bottom, with the tools that perform them shown in
boxes that have rounded corners. Each of these development tools takes one or more files as input
and produces a single output file. More specific information about these tools and the files they
produce is provided in the sections that follow.
Each of the steps of the embedded software build process is a transformation performed by
software running on a general-purpose computer. To distinguish this development computer
(usually a PC or Unix workstation) from the target embedded system, it is referred to as the
host computer. In other words, the compiler, assembler, linker, and locator are all pieces of
software that run on a host computer, rather than on the embedded system itself. Yet, despite
the fact that they run on some other computer platform, these tools combine their efforts to
produce an executable binary image that will execute properly only on the target embedded
system. This split of responsibilities is shown in the figure below.
1.4.2 Compiling
The job of a compiler is mainly to translate programs written in some human-readable language into
an equivalent set of opcodes for a particular processor. In that sense, an assembler is also a compiler
(you might call it an "assembly language compiler") but one that performs a much simpler one-to-
one translation from one line of human-readable mnemonics to the equivalent opcode. Everything in
this section applies equally to compilers and assemblers. Together these tools make up the first step
of the embedded software build process.
Of course, each processor has its own unique machine language, so you need to choose a compiler
that is capable of producing programs for your specific target processor. In the embedded systems
case, this compiler almost always runs on the host computer. It simply doesn't make sense to execute
the compiler on the embedded system itself. A compiler such as this, one that runs on one computer
platform and produces code for another, is called a cross-compiler. The use of a cross-compiler is
one of the defining features of embedded software development.
Regardless of the input language (C/C++, assembly, or any other), the output of the cross-compiler
will be an object file. This is a specially formatted binary file that contains the set of instructions and
data resulting from the language translation process. Although parts of this file contain executable
code, the object file is not intended to be executed directly. In fact, the internal structure of an object
file emphasizes the incompleteness of the larger program.
The contents of an object file can be thought of as a very large, flexible data structure. The structure
of the file is usually defined by a standard format like the Common Object File Format (COFF) or
Extended Linker Format (ELF). If you'll be using more than one compiler (i.e., you'll be writing
parts of your program in different source languages), you need to make sure that each is capable of
producing object files in the same format. Although many compilers (particularly those that run on
Unix platforms) support standard object file formats like COFF and ELF (gcc supports both), there
are also some others that produce object files only in proprietary formats. If you're using one of the
compilers in the latter group, you might find that you need to buy all of your other development
tools from the same vendor.
Most object files begin with a header that describes the sections that follow. Each of these sections
contains one or more blocks of code or data that originated within the original source file. However,
these blocks have been regrouped by the compiler into related sections. For example, all of the code
blocks are collected into a section called text, initialized global variables (and their initial values)
into a section called data, and uninitialized global variables into a section called bss.
There is also usually a symbol table somewhere in the object file that contains the names and
locations of all the variables and functions referenced within the source file. Parts of this table may
be incomplete, however, because not all of the variables and functions are always defined in the
same file. These are the symbols that refer to variables and functions defined in other source files.
And it is up to the linker to resolve such unresolved references.
1.4.3 Linking
All of the object files resulting from step one (compiling) must be combined in a special way before
the program can be executed. The object files themselves are individually incomplete, most notably
in that some of the internal variable and function references have not yet been resolved. The job of
the linker is to combine these object files and, in the process, to resolve all of the unresolved
symbols.
The output of the linker is a new object file that contains all of the code and data from the input
object files and is in the same object file format. It does this by merging the text, data, and bss
sections of the input files. So, when the linker is finished executing, all of the machine language
code from all of the input object files will be in the text section of the new file, and all of the
initialized and uninitialized variables will reside in the new data and bss sections, respectively.
While the linker is in the process of merging the section contents, it is also on the lookout for
unresolved symbols. For example, if one object file contains an unresolved reference to a variable
named foo and a variable with that same name is declared in one of the other object files, the linker
will match them up. The unresolved reference will be replaced with a reference to the actual
variable. In other words, if foo is located at offset 14 of the output data section, its entry in the
symbol table will now contain that address.
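A minimal two-file sketch of this situation: file1.c compiles cleanly but leaves foo unresolved in its object file, and the linker later matches that reference to the definition in file2.c (the file and symbol names are ours, for illustration only):

/* file1.c - uses a symbol it does not define; the compiler records
   "foo" as an unresolved reference in the object file's symbol table */
extern int foo;

int get_foo(void) {
    return foo;
}

/* file2.c - defines the symbol; "foo" lands in this object file's data
   section, and the linker patches file1.o's reference to point at it */
int foo = 42;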
The GNU linker (ld) runs on all of the same host platforms as the GNU compiler. It is essentially a
command-line tool that takes the names of all the object files to be linked together as arguments. For
embedded development, a special object file that contains the compiled startup code must also be
included within this list.
Startup Code
One of the things that traditional software development tools do automatically is to insert startup
code. Startup code is a small block of assembly language code that prepares the way for the
execution of software written in a high-level language.
Each high-level language has its own set of expectations about the runtime environment. For
example, C and C++ both utilize an implicit stack. Space for the stack has to be allocated and
initialized before software written in either language can be properly executed. That is just one of
the responsibilities assigned to startup code for C/C++ programs.
Most cross-compilers for embedded systems include an assembly language file called startup.asm,
crt0.s (short for C runtime), or something similar. The location and contents of this file are usually
described in the documentation supplied with the compiler.
Startup code for C/C++ programs usually consists of the following actions, performed in the order
described:
1. Disable all interrupts.
2. Copy any initialized data from ROM to RAM.
3. Zero the uninitialized data area.
4. Allocate space for and initialize the stack.
5. Initialize the processor's stack pointer.
6. Create and initialize the heap.
7. Execute the constructors and initializers for all global variables (C++ only).
8. Enable interrupts.
9. Call main.
Typically, the startup code will also include a few instructions after the call to main.
These instructions will be executed only in the event that the high-level language program exits (i.e.,
the call to main returns). Depending on the nature of the embedded system, you might want to use
these instructions to halt the processor, reset the entire system, or transfer control to a debugging
tool.
Because the startup code is not inserted automatically, the programmer must usually assemble it
himself and include the resulting object file among the list of input files to the linker. He might even
need to give the linker a special command-line option to prevent it from inserting the usual startup
code.
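Steps 2, 3 and 9 of the list above can be sketched in C. The section-boundary symbols below (_etext, _sdata, _edata, _ebss) follow a common linker-script naming convention but are assumptions here, not any specific toolchain's definitions; steps 4 through 6 normally happen in assembly before this point, because C code already needs a working stack:

#include <stdint.h>

/* Section boundaries exported by the linker script (assumed names) */
extern uint32_t _etext;   /* end of code in ROM = start of the .data image */
extern uint32_t _sdata;   /* start of .data in RAM                         */
extern uint32_t _edata;   /* end of .data = start of .bss in RAM           */
extern uint32_t _ebss;    /* end of .bss in RAM                            */

extern int main(void);

void c_startup(void) {
    uint32_t *src = &_etext;
    uint32_t *dst = &_sdata;
    while (dst < &_edata)   /* step 2: copy initialized data from ROM to RAM */
        *dst++ = *src++;
    while (dst < &_ebss)    /* step 3: zero the uninitialized data area      */
        *dst++ = 0;
    main();                 /* step 9: call main                             */
    for (;;) ;              /* hang here if main ever returns                */
}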
If the same symbol is defined in more than one object file, the linker is unable to proceed. It will
likely appeal to the programmer, by displaying an error message, and exit. However, if a symbol
reference instead remains unresolved after all of the object files have been merged, the linker will try
to resolve the reference on its own. The reference might be to a function that is part of the standard
library, so the linker will open each of the libraries described to it on the command line (in the order
provided) and examine their symbol tables. If it finds a function with that name, the reference will
be resolved by including the associated code and data sections within the output object file.
After merging all of the code and data sections and resolving all of the symbol references, the linker
produces a special "relocatable" copy of the program. In other words, the program is complete
except for one thing: no memory addresses have yet been assigned to the code and data sections
within. If you weren't working on an embedded system, you'd be finished building your software
now.
But embedded programmers aren't generally finished with the build process at this point.
Even if your embedded system includes an operating system, you'll probably still need an absolutely
located binary image. In fact, if there is an operating system, the code and data of which it consists
are most likely within the relocatable program too. The entire embedded application, including the
operating system, is almost always statically linked together and executed as a single binary image.
1.4.4 Locating
The tool that performs the conversion from relocatable program to executable binary image is called
a locator. It takes responsibility for the easiest step of the three. In fact, you will have to do most of
the work in this step yourself, by providing information about the memory on the target board as
input to the locator. The locator will use this information to assign physical memory addresses to
each of the code and data sections within the relocatable program. It will then produce an output file
that contains a binary memory image that can be loaded into the target ROM.
In many cases, the locator is a separate development tool. However, in the case of the GNU tools,
this functionality is built right into the linker. Try not to be confused by this one particular
implementation. Whether you are writing software for a general-purpose computer or an embedded
system, at some point the sections of your relocatable program must have actual addresses assigned
to them. In the first case, the operating system does it for you at load time. In the second, you must
perform the step with a special tool. This is true even if the locator is a part of the linker.
The memory information required by the GNU linker can be passed to it in the form of a linker
script. Such scripts are sometimes used to control the exact order of the code and data sections
within the relocatable program.
1.4.5 Remote Debuggers and Debug Monitors
The debug monitor resides in ROM, having been placed there in the manner described earlier (either
by you or at the factory), and is automatically started whenever the target processor is reset. It
monitors the communications link to the host computer and responds to requests from the remote
debugger running there. Of course, these requests and the monitor's responses must conform to some
predefined communications protocol and are typically of a very low-level nature. Examples of
requests the remote debugger can make are "read register x," "modify register y," "read n bytes of
memory starting at address," and "modify the data at address." The remote debugger combines
sequences of these low-level commands to accomplish high-level debugging tasks like downloading
a program, single-stepping through it, and setting breakpoints.
Communication between the frontend and the debug monitor is byte-oriented and designed for
transmission over a serial connection, RS232 or USB.
Remote debuggers are one of the most commonly used downloading and testing tools during
development of embedded software. This is mainly because of their low cost. Embedded software
developers already have the requisite host computer. In addition, the price of a remote debugger
frontend does not add significantly to the cost of a suite of cross-development tools (compiler,
linker, locator, etc.). Finally, the suppliers of remote debuggers often desire to give away the source
code for their debug monitors, in order to increase the size of their installed user base.
As shipped, the Keil board includes a free debug monitor in Flash memory. Together with host
software provided by Arcom, this debug monitor can be used to download programs directly into
target RAM and execute them.
1.4.6 Emulators
Remote debuggers are helpful for monitoring and controlling the state of embedded software, but
only an in-circuit emulator (ICE) allows you to examine the state of the processor on which that
program is running. In fact, an ICE actually takes the place of - or emulates - the processor on your
target board. It is itself an embedded system, with its own copy of the target processor, RAM, ROM,
and its own embedded software. As a result, in-circuit emulators are usually pretty expensive, often
more expensive than the target hardware. But they are a powerful tool, and in a tight debugging spot
nothing else will help you get the job done better.
Like a debug monitor, an emulator uses a remote debugger for its human interface. In some cases, it
is even possible to use the same debugger frontend for both. But because the emulator has its own
copy of the target processor it is possible to monitor and control the state of the processor in real
time. This allows the emulator to support such powerful debugging features as hardware breakpoints
and real-time tracing, in addition to the features provided by any debug monitor.
With a debug monitor, you can set breakpoints in your program. However, these software
breakpoints are restricted to instruction fetches, the equivalent of the command "stop execution if
this instruction is about to be fetched." Emulators, by contrast, also support hardware breakpoints.
Hardware breakpoints allow you to stop execution in response to a wide variety of events. These
events include not only instruction fetches, but also memory and I/O reads and writes, and
interrupts. For example, you might set a hardware breakpoint on the event "variable foo contains 15
and register AX becomes 0."
Another useful feature of an in-circuit emulator is real-time tracing. Typically, an emulator
incorporates a large block of special-purpose RAM that is dedicated to storing information about
each of the processor cycles that are executed. This feature allows you to see in exactly what order
things happened, so it can help you answer questions, such as, did the timer interrupt occur before or
after the variable bar became 94? In addition, it is usually possible to either restrict the information
that is stored or post-process the data prior to viewing it in order to cut down on the amount of trace
data to be examined.
ROM Emulators
One other type of emulator is worth mentioning at this point. A ROM emulator is a device that
emulates a read-only memory device. Like an ICE, it is an embedded system that connects to the
target and communicates with the host. However, this time the target connection is via a ROM
socket. To the embedded processor, it looks like any other read-only memory device. But to the
remote debugger, it looks like a debug monitor.
ROM emulators have several advantages over debug monitors. First, no one has to port the debug
monitor code to your particular target hardware. Second, the ROM emulator supplies its own serial
or network connection to the host, so it is not necessary to use the target's own, usually limited,
resources. And finally, the ROM emulator is a true replacement for the original ROM, so none of the
target's memory is used up by the debug monitor code.
By far, the biggest disadvantage of a simulator is that it only simulates the processor. And embedded
systems frequently contain one or more other important peripherals. Interaction with these devices
can sometimes be imitated with simulator scripts or other workarounds, but such workarounds are
often more trouble to create than the simulation is valuable. So you probably won't do too much with
the simulator once you have the actual embedded hardware available to you.
Once you have access to your target hardware, and especially during the hardware debugging, logic
analyzers and oscilloscopes can be indispensable debugging tools. They are most useful for
debugging the interactions between the processor and other chips on the board.
Because they can only view signals that lie outside the processor, however, they cannot control the
flow of execution of your software like a debugger or an emulator can. This makes these tools
significantly less useful by themselves. But coupled with a software debugging tool like a remote
debugger or an emulator, they can be extremely valuable.
An oscilloscope is another piece of laboratory equipment for hardware debugging. But this one is
used to examine any electrical signal, analog or digital, on any piece of hardware.
Oscilloscopes are sometimes useful for quickly observing the voltage on a particular pin or, in the
absence of a logic analyzer, for something slightly more complex. However, the number of inputs is
much smaller (there are usually about four) and advanced triggering logic is not often available. As
a result, it'll be useful to you only rarely as a software debugging tool.
Most of the debugging tools described in this chapter will be used at some point or another in every
embedded project. Oscilloscopes and logic analyzers are most often used to debug hardware
problems; simulators, during the early stages of software development; and debug monitors and
emulators, during the actual software debugging. To be most effective, you should understand what
each tool is for and when and where to apply it for the greatest impact.
Programming
Generally done in either the core's native assembly language or C
Sometimes HLL support (often BASIC) is available
Assemblers/Linkers often supplied free by the micro's manufacturer
C compilers vary from free and very buggy to very expensive and only moderately buggy
Environments generally not friendly or reliable
Downloading
Program development usually done on a PC
Software tools must produce a file to download to the MC's EPROM
Several standard formats (e.g., binary, hex)
EPROM burner often necessary
Can download program to an EPROM emulator
But to reprogram, must use a UV eraser first
Flash memory programmers make this easier
Very easy to reprogram with inexpensive "in-circuit debugger"
Interacts with MC via 3 pins + power + ground
Or can be programmed/debugged with a resident monitor program
on-chip UART for communications with PC
No burner or UV eraser needed
No expensive quartz window required
Expedites program-test-erase-reprogram code development cycle
Monitor
A program module that communicates with PC software
Typically uses a serial port to talk to a PC's terminal program
Capabilities vary widely
Usually can send/receive text and ASCII-converted numbers
Often has commands to examine/change registers, memory locations, I/O ports
Fixed-point numbers are stored in data types that are characterized by their word size in bits, binary
point, and whether they are signed or unsigned. The Simulink® Fixed Point™ software supports
integers, fractionals, and generalized fixed-point numbers. The main difference among these data
types is their default binary point.
A common representation of a binary fixed-point number (either signed or unsigned) is shown in the
following figure.
where
* The most significant bit (MSB) is the leftmost bit, at position ws - 1.
* The least significant bit (LSB) is the rightmost bit, at position 0 (bit b0).
* The binary point is shown four places to the left of the LSB.
Computer hardware typically represents the negation of a binary fixed-point number in three
different ways: sign/magnitude, one's complement, and two's complement. Two's complement is the
preferred representation of signed fixed-point numbers and is supported by the Simulink Fixed Point
software.
Negation using two's complement consists of a bit inversion (translation into one's complement)
followed by the addition of a one. For example, the two's complement of 000101 is 111011.
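The same negation, written out in C for an 8-bit word (the text's 6-bit example 000101 -> 111011 becomes 0x05 -> 0xFB in 8 bits):

#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint8_t x    = 0x05;        /* 00000101 */
    uint8_t ones = ~x;          /* bit inversion: one's complement, 0xFA */
    uint8_t twos = ones + 1;    /* add one: two's complement, 0xFB */
    printf("0x%02x 0x%02x %d\n", ones, twos, (int8_t)twos);  /* 0xfa 0xfb -5 */
    return 0;
}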
Whether a fixed-point value is signed or unsigned is usually not encoded explicitly within the binary
word; that is, there is no sign bit. Instead, the sign information is implicitly defined within the
computer architecture.
The binary point is the means by which fixed-point numbers are scaled. It is usually the software
that determines the binary point. When performing basic math functions such as addition or
subtraction, the hardware uses the same logic circuits regardless of the value of the scale factor. In
essence, the logic circuits have no knowledge of a scale factor. They are performing signed or
unsigned fixed-point binary algebra as if the binary point is to the right of b0.
Within the Simulink Fixed Point software, the main difference between fixed-point data types is the
default binary point. For integers and fractionals, the binary point is fixed at the default value. For
generalized fixed-point data types, you must either explicitly specify the scaling by configuring
dialog box parameters, or inherit the scaling from another block. The sections that follow describe
the supported fixed-point data types.
Integers
The default binary point for signed and unsigned integer data types is assumed to be just to the right
of the LSB. You specify unsigned and signed integers with the uint and sint functions, respectively.
Fractionals
The default binary point for unsigned fractional data types is just to the left of the MSB, while for
signed fractionals the binary point is just to the right of the MSB. If you specify guard bits, then they
lie to the left of the binary point. You specify unsigned and signed fractional numbers with the ufrac
and sfrac functions, respectively.
Generalized Fixed-Point Numbers
For signed and unsigned generalized fixed-point numbers, there is no default binary point. You
specify unsigned and signed generalized fixed-point numbers with the ufix and sfix functions,
respectively.
Note: You can also use the fixdt function to create integer, fractional, and generalized fixed-point
objects.
2.2 Scaling
The dynamic range of fixed-point numbers is much less than that of floating-point numbers with
equivalent word sizes. To avoid overflow conditions and minimize quantization errors, fixed-point
numbers must be scaled.
With the Simulink Fixed Point software, you can select a fixed-point data type whose scaling is
defined by its default binary point, or you can select a generalized fixed-point data type and choose
an arbitrary linear scaling that suits your needs. This section presents the scaling choices available
for generalized fixed-point data types.
The real-world value V of a fixed-point quantity is given by the general [Slope Bias] encoding scheme

V = S * Q + B, with S = F * 2^E

where
* Q is the stored integer (the quantization value).
* B is the bias.
* S is the slope: the value assigned to the LSB of the representation.
* F is the fractional slope, normalized such that 1 <= F < 2, so that the value of the LSB satisfies 2^E <= S < 2^(E+1).
Note: S and B are constants and do not show up in the computer hardware directly; only the
quantization value Q is stored in computer memory (as a variable).
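A small C sketch of this decoding, with made-up constants F, E and B chosen purely for illustration:

#include <math.h>
#include <stdio.h>

#define F 1.25      /* fractional slope, 1 <= F < 2 (example value) */
#define E (-3)      /* fixed power-of-two exponent  (example value) */
#define B 1.0       /* bias                         (example value) */

/* Only the integer Q is stored; S = F * 2^E and B are properties of
   the data type, fixed at design time. */
static double real_world(int q) {
    return F * ldexp((double)q, E) + B;   /* V = F * 2^E * Q + B */
}

int main(void) {
    printf("%f\n", real_world(8));   /* 1.25 * 8 * 2^-3 + 1 = 2.25 */
    return 0;
}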
Binary-Point-Only Scaling
As the name implies, binary-point-only (or power-of-two) scaling involves moving only the binary
point within the generalized fixed-point word. The advantage of this scaling mode is that the number
of processor arithmetic operations is minimized.
With binary-point-only scaling, the components of the general [Slope Bias] formula have these
values:
* F = 1
* S = 2^E
* B = 0
That is, the scaling of the quantized real-world number is defined only by the slope S, which is
restricted to a power of two.
In the Simulink Fixed Point software, you specify binary-point-only scaling with the syntax 2^-E
where E is unrestricted. This creates a MATLAB® structure with a bias B = 0 and a fractional slope
F = 1.0. For example, the syntax 2^-10 defines a scaling such that the binary point is at a location 10
places to the left of the least significant bit.
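A C sketch of binary-point-only quantization with E = -7 on a signed 10-bit word; the value 0.33333 is chosen to match the fi examples shown later:

#include <math.h>
#include <stdio.h>

int main(void) {
    double v = 0.33333;
    /* Q = stored integer; real-world value = Q * 2^-7 = Q / 128 */
    int q_floor = (int)floor(v * 128);    /* floor rounding:   Q = 42 */
    int q_round = (int)lround(v * 128);   /* round-to-nearest: Q = 43 */
    printf("%.7f\n", q_floor / 128.0);    /* 0.3281250 */
    printf("%.7f\n", q_round / 128.0);    /* 0.3359375 */
    return 0;
}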
[Slope Bias] Scaling
When you scale by slope and bias, the slope S and bias B of the quantized real-world number can
take on any value. You specify scaling by slope and bias with the syntax [slope bias], which creates
a MATLAB structure with the given slope and bias. For example, a [Slope Bias] scaling specified
by [5/9 10] defines a slope of 5/9 and a bias of 10. The slope must be a positive number.
5/9 = (10/9) * 2^(-1) = F * 2^E, so F = 10/9 and E = -1.
Examples:
The number x = 0.033333 is converted to a signed, 10-bit generalized fixed-point data type with
binary-point-only scaling of 2^-7 (that is, the binary point is located seven places to the left of the
rightmost bit). Repeated doubling produces one bit per step; whenever the doubled value reaches 1,
a 1 bit is produced and subtracted:
0.033333                0.
0.066666                0.0
0.133332                0.00
0.266664                0.000
0.533328                0.0000
1.066656 -> 0.066656    0.00001
0.133312                0.000010
0.266624                0.0000100
0.533248                0.00001000
0.132992                0.0000100010
etc.
We use a 10-bit generalized fixed-point data type with binary-point-only scaling of 2^-7, so 7 bits
hold the fractional part and 3 bits the signed integer part. For x = 0.33333:
0.33333    0.
0.66666    0.0
0.66664    0.010
0.66656    0.01010
0.66624    0.0101010
0.66496    0.010101010
1.32992    0.0101010101
etc.
For x = 0.0033333:
0.0033333    0.
0.0066666    0.0
0.0133332    0.00
0.0266664    0.000
0.0533328    0.0000
0.1066656    0.00000
0.2133312    0.000000
0.4266624    0.0000000
0.8533248    0.00000000
etc.
x = 0.0000000(011...)_2, which truncates to approximately 0 in the 7 available fraction bits.
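The doubling procedure used in the three tables above can be written as a short C loop (here reproducing the second table's bit pattern for 0.33333):

#include <stdio.h>

/* Each doubling shifts the fraction left one binary place; an integer
   part of 1 becomes the next bit and is removed. */
int main(void) {
    double x = 0.33333;
    printf("0.");
    for (int i = 0; i < 10; i++) {
        x *= 2.0;
        if (x >= 1.0) { putchar('1'); x -= 1.0; }
        else          { putchar('0'); }
    }
    putchar('\n');    /* prints 0.0101010101, as in the second table */
    return 0;
}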
fi(v,s,w,f) returns a fixed-point object with value v, signedness s, word length w, and fraction length
f.
fi(0.33333,1,10,7, 'RoundMode','floor')
ans =
0.328125000000000
Signed: true
WordLength: 10
FractionLength: 7
RoundMode: floor
OverflowMode: saturate
ProductMode: FullPrecision
MaxProductWordLength: 128
SumMode: FullPrecision
MaxSumWordLength: 128
CastBeforeSum: true
>> 1/4+1/16+1/64
ans =
0.328125000000000
fi(0.33333,1,10,7)
ans =
0.335937500000000
>> (1/4+1/16+1/64)+1/128
ans =
0.335937500000000
Another example:
>>x=2^-7
x=
0.007812500000000
>>round(m/x)*x
ans =
0 0 0 0
0 0 0 0
0 0 0 0
0.031250000000000 0 0 0
0.335937500000000 0.031250000000000 0 0
>> M = fi(m,0,10,7)
M =
>> fi(M,0,10,7)
ans =
0 0 0 0
0 0 0 0
0.0078 0 0 0
0.1016 0.0078 0 0
>>fi(M,0,10,7)*round(m(5,1)/x)*x
ans =
0 0 0 0
0 0 0 0
0.0026 0 0 0
0.0341 0.0026 0 0
From the analysis of fixed-point variables scaled within the general [Slope Bias] encoding
scheme, you can conclude:
* Addition, subtraction, multiplication, and division can be very involved unless certain choices
are made for the biases and slopes.
* Binary-point-only scaling guarantees simpler math, but generally sacrifices some precision.
* Rounding and overflow handling schemes must be decided before an actual fixed-point
realization is achieved.
A. Quantization
Within the context of the general [Slope Bias] encoding scheme, the value of an unsigned fixed-
point quantity is given by

V = S * Q + B

where S is given by F * 2^E; the scaling is unrestricted because the binary point does not have
to be contiguous with the word.
The formats for 8-bit signed and unsigned fixed-point values are shown in the following figure.
Note that you cannot discern whether these numbers are signed or unsigned data types merely by
inspection since this information is not explicitly encoded within the word.
The binary number 0011.0101 yields the same value for the unsigned and two's complement
representations because the MSB = 0. Setting B = 0 and using the appropriate weights, bit
multipliers, and scaling, the value is
2 + 1 + 0.25 + 0.0625 = 3.3125
Conversely, the binary number 1011.0101 yields different values for the unsigned and two's
complement representations since the MSB = 1.
Setting B = 0 and using the appropriate weights, bit multipliers, and scaling, the unsigned value is
8 + 2 + 1 + 0.25 + 0.0625 = 11.3125
while the two's complement value is
-8 + 2 + 1 + 0.25 + 0.0625 = -4.6875
Range
The range of representable numbers for an unsigned and two's complement fixed-point number of
size ws, scaling S, and bias B is illustrated in the following figure.
For both the signed and unsigned fixed-point numbers of any data type, the number of different bit
patterns is 2^ws.
For example, if the fixed-point data type is an integer with scaling defined as S = 1 and B = 0, then
the maximum unsigned value is 2^ws - 1, because zero must be represented. In two's complement,
negative numbers must be represented as well as zero, so the maximum value is 2^(ws-1) - 1.
Additionally, since there is only one representation for zero, there must be an unequal number of
positive and negative numbers: there is a representation for -2^(ws-1) but none for +2^(ws-1).
Precision
The precision (scaling) of integer and fractional data types is specified by the default binary point.
For generalized fixed-point data types, the scaling must be explicitly defined as either [Slope Bias]
or binary-point-only. In either case, the precision is given by the slope.
The low limit, high limit, and default binary-point-only scaling for the supported fixed-point data
types discussed in Binary Point Interpretation are given in the following table. See Limitations on
Precision and Limitations on Range for more information.
The precision, range of signed values, and range of unsigned values for an 8-bit generalized fixed-
point data type with binary-point-only scaling follow. Note that the first scaling value (2^1) represents
a binary point that is not contiguous with the word.
The precision and range of signed and unsigned values for an 8-bit fixed-point data type using
[Slope Bias] scaling follow. The slope starts at a value of 1.25 and the bias is 1.0 for all slopes. Note
that the slope is the same as the precision.
The following table provides a key for various symbols that may appear in Simulink products to
indicate the data type and scaling of a fixed-point value.
2.2.2 Recommendations for Arithmetic and Scaling
Introduction
The sections that follow describe the relationship between arithmetic operations and fixed-point
scaling, and offer some basic recommendations that may be appropriate for your fixed-point design.
For each arithmetic operation,
* The scaling of the result is automatically selected based on the scaling of the two inputs. In
other words, the scaling is inherited.
In embedded systems, the scaling of variables at the hardware interface (the ADC or DAC) is fixed.
However for most other variables, the scaling is something you can choose to give the best design.
When scaling fixed-point variables, it is important to remember that
* Your scaling choices depend on the particular design you are simulating.
* There is no best scaling approach. All choices have associated advantages and disadvantages.
It is the goal of this section to expose these advantages and disadvantages to you.
Addition
The operands and the result are represented by the general [Slope Bias] encoding scheme described
in Scaling. In a fixed-point system, computing the real-world sum a = b + c amounts to finding the
stored integer Qa from Qb and Qc:
* In general, this requires two multiplications of a constant by a variable, two additions, and
some additional bit shifting.
In the process of finding the scaling of the sum, one reasonable goal is to simplify the calculations.
Simplifying the calculations should reduce the number of operations, thereby increasing execution
speed. The following choices can help to minimize the number of arithmetic operations:
* Set Fa = Fb or Fa = Fc. Either choice eliminates one of the two constant times variable
multiplications.
These equations appear to be equivalent. However, your choice of rounding and precision may make
one choice stand out over the other. To further simplify matters, you could choose Ea = Ec or Ea =
Eb. This will eliminate some bit shifting.
In the process of finding the scaling of the sum, another reasonable goal is maximum precision. You
can determine the maximum-precision scaling if the range of the variable is known: Example:
Maximizing Precision shows that you can determine the range of a fixed-point operation from the
minimum and maximum values of its inputs, and for a summation you can determine the range from
the sum of those minimum and maximum values.
In most cases the input and output word sizes are much greater than one, and the slope reduces to a
value that depends only on the size of the input and output words. The value of the corresponding
bias depends on whether the inputs and output are signed or unsigned numbers. If the inputs and
output are all unsigned, then the minimum values for these variables are all zero and the bias reduces
to a particularly simple form; if the inputs and the output are all signed, the bias takes a slightly more
involved form.
Binary-Point-Only Scaling
This scaling choice results in only one addition and some bit shifting. The avoidance of any
multiplications is a big advantage of binary-point-only scaling.
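A C sketch of such an addition: two stored integers with different binary points are combined with one shift and one add, and no multiplications (the values are our own examples):

#include <stdint.h>
#include <stdio.h>

int main(void) {
    /* b = Qb * 2^-7 and c = Qc * 2^-4; we want a = b + c as Qa * 2^-7 */
    int16_t qb = 43;               /* 0.3359375 */
    int16_t qc = 20;               /* 1.25      */
    int16_t qa = qb + (qc << 3);   /* align c to 2^-7: shift left by 7 - 4 = 3 */
    printf("%.7f\n", qa / 128.0);  /* 1.5859375 = 0.3359375 + 1.25 */
    return 0;
}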
3. Microcontroller CPU, Interrupts, Memory, and I/O
The interconnection between the CPU, memory, and I/O of the address and data buses is
generally a one-to-one connection. The hard part is designing the appropriate circuitry to adapt the
control signals present on each device to be compatible with that of the other devices. The most
basic control signals are generated by the CPU to control the data transfers between the CPU and
memory, and between the CPU and I/O devices. The four most common types of CPU controlled
data transfers are:
- CPU reads data/instructions from memory (memory read)
- CPU writes data to memory (memory write)
- CPU reads data from an input device (I/O read)
- CPU writes data to an output device (I/O write)
Registers
Registers are simply a combination of various flip-flops that can be used to temporarily store
data or to delay signals. A storage register is a form of fast programmable internal processor
memory usually used to temporarily store, copy, and modify operands that are immediately or
frequently used by the system. Shift registers delay signals by passing the signals between the
various internal flip-flops with every clock pulse.
Registers are made up of a set of flip-flops that can be activated either individually or as a
set. In fact, it is the number of flip-flops in each register that is actually used to describe a processor
(for example, a 32-bit processor has working registers that are 32 bits wide containing 32 flip-flops,
a 16-bit processor has working registers that are 16 bits wide containing 16 flip-flops, and so on).
The number of flip-flops within these registers also determines the width of the data buses used in
the system.
While ISA designs do not all use registers in the same way to process data, register storage
typically falls under one of two categories: general purpose or special purpose. General purpose
registers can be used to store and manipulate any type of data determined by the programmer.
Special purpose registers can only be used in a manner specified by the ISA; this includes holding
results for specific types of computations, providing predetermined flags (single bits within a
register that can act and be controlled independently), acting as counters (registers that can be
programmed to change state, that is, increment, asynchronously or synchronously after a specified
length of time), and controlling I/O ports (registers managing the external I/O pins connected to the
body of the processor and to board I/O). Shift registers are inherently special purpose because of
their limited functionality.
The number of registers, the types of registers, and the size of the data that these registers can
store (8-bit, 16-bit, 32-bit, and so forth) vary depending on the CPU, according to the ISA
definitions. In the cycle of fetching and executing instructions, the CPU's registers have to be fast,
so as to quickly feed data to the ALU, for example, and to receive data from the CPU's internal data
bus. Registers are also multi-ported so as to be able to both receive and transmit data to these CPU
components.
3.2 Interrupts
Now that you know the names and addresses of the memory and peripherals attached to the
processor, it is time to learn how to communicate with the latter. There are two basic communication
techniques: polling and interrupts. In either case, the processor usually issues some sort of
commands to the device (by way of the memory or I/O space) and waits for the device to complete
the assigned task. For example, the processor might ask a timer to count down from 1000 to 0. Once
the countdown begins, the processor is interested in just one thing: is the timer finished counting
yet?
If polling is used, then the processor repeatedly checks to see if the task has been completed.
This is analogous to the small child who repeatedly asks "are we there yet?" throughout a long trip.
Like the child, the processor spends a large amount of otherwise useful time asking the question and
getting a negative response. To implement polling in software, you need only create a loop that
reads the status register of the device in question.
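For example, a polling loop might look like this in C; the status-register address and the bit mask are hypothetical stand-ins for values a real device's datasheet would provide:

#include <stdint.h>

#define TIMER_STATUS  (*(volatile uint8_t *)0x4000)  /* hypothetical address */
#define DONE_BIT      0x01                           /* hypothetical flag    */

void wait_for_timer(void) {
    while ((TIMER_STATUS & DONE_BIT) == 0) {
        /* busy-wait: keep asking "are we there yet?" */
    }
}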
The second communication technique uses interrupts.
An interrupt is an asynchronous electrical signal from a peripheral to the processor. When interrupts
are used, the processor issues commands to the peripheral exactly as before, but then waits for an
interrupt to signal completion of the assigned work. While the processor is waiting for the interrupt
to arrive, it is free to continue working on other things. When the interrupt signal is finally asserted,
the processor temporarily sets aside its current work and executes a small piece of software called
the interrupt service routine (ISR). When the ISR completes, the processor returns to the work that
was interrupted.
Of course, this isn't all automatic. The programmer must write the ISR himself and "install" and
enable it so that it will be executed when the relevant interrupt occurs. The first few times you do
this, it will be a significant challenge. But, even so, the use of interrupts generally decreases the
complexity of one's overall code by giving it a better structure. Rather than device polling being
embedded within an unrelated part of the program, the two pieces of code remain appropriately
separate.
On the whole, interrupts are a much more efficient use of the processor than polling. The processor
is able to use a larger percentage of its waiting time to perform useful work.
However, there is some overhead associated with each interrupt. It takes a good bit of time, relative
to the length of time it takes to execute an opcode, to put aside the processor's current work and
transfer control to the interrupt service routine. Many of the processor's registers must be saved in
memory, and lower-priority interrupts must be disabled. So in practice both methods are used
frequently. Interrupts are used when efficiency is paramount or multiple devices must be monitored
simultaneously. Polling is used when the processor must respond to some event more quickly than is
possible using interrupts.
DEFINITIONS
• Interrupt - Hardware-supported asynchronous transfer of control to an interrupt vector
• Interrupt Vector - Dedicated location in memory that specifies the address to which execution jumps
• Interrupt Handler - Code that is reachable from an interrupt vector
• Interrupt Controller - Peripheral device that manages interrupts for the processor
• Pending - Firing condition met and noticed, but the interrupt handler has not yet begun to execute
• Interrupt Latency - Time from the interrupt’s firing condition being met to the start of execution of
the interrupt handler
• Nested Interrupt - Occurs when one interrupt handler preempts another
• Reentrant Interrupt - Multiple invocations of a single interrupt handler are concurrently
active
An interrupt is an asynchronous signal from hardware indicating the need for attention or a
synchronous event in software indicating the need for a change in execution.
Hardware interrupts are triggered by a physical event, such as the closure of a switch, that
causes a specific subroutine to be called. They can be thought of as a sort of hardware initiated
subroutine call. They can and do occur at any time in the program, depending on when the event
occurs. These are referred to as asynchronous events because they may occur during the execution
of any part of the program. Interrupts allow the programs to respond to an event when it occurs.
A software interrupt is a special subroutine call. It is synchronous, meaning that it always occurs at
the same time and place in the program that is interrupted. It is frequently used as a quick and simple
way to do a subroutine call for accessing programs such as the operating system and I/O programs.
Software interrupts are usually implemented as instructions in the instruction set, which cause a
context switch to an interrupt handler similar to a hardware interrupt.
Interrupts can be categorized into: maskable interrupt (IRQ), non-maskable interrupt (NMI),
interprocessor interrupt (IPI), software interrupt, and spurious interrupt.
- A maskable interrupt (IRQ) is a hardware interrupt that may be ignored by setting a bit in an
interrupt mask register's (IMR) bit-mask.
- Likewise, a non-maskable interrupt (NMI) is a hardware interrupt that does not have a bit-
mask associated with it, meaning that it can never be ignored. NMIs are often used for timers,
especially watchdog timers.
- An interprocessor interrupt is a special case of interrupt that is generated by one processor to
interrupt another processor in a multiprocessor system.
- A software interrupt is an interrupt generated within a processor by executing an instruction.
Software interrupts are often used to implement system calls, because they implement a subroutine
call with a CPU ring-level change.
- A spurious interrupt is a hardware interrupt that is unwanted. They are typically generated
by system conditions such as electrical interference on an interrupt line or through incorrectly
designed hardware.
An interrupt can notify the processor when an analog-to-digital converter (ADC) has new
data, when a timer rolls over, when a direct memory access (DMA) transfer is complete, when
another processor wants to communicate, or when almost any asynchronous event happens. The
interrupt hardware is initialized and programmed by the system software. When an interrupt is
acknowledged, that process is performed by hardware internal to the processor and the interrupt
controller integrated circuit (IC) (if any).
When an interrupt occurs, the on-chip hardware performs the following functions:
• It saves the program counter (the address the processor was executing when the
interrupt occurred) on the stack. Some processors save other information as well, such as register
contents.
• It executes an interrupt acknowledge cycle to get a vector from the interrupting peripheral,
depending on the processor and the specific type of interrupt.
• It branches to a predetermined address specific to that particular interrupt.
The destination address is the interrupt service routine (ISR, or sometimes ISP for interrupt
service process). The ISR performs whatever functions are required and then returns. When the
return code is executed, the processor performs the following tasks:
• It retrieves the return address and any other saved information from the stack.
• It resumes execution at the return address.
The return address, in nearly all cases, is the address that would have been executed next if
the interrupt had not occurred. If the implementation is correct the code that was interrupted will not
even know that an interrupt occurred. The hardware part of this process occurs at hardware speed:
microseconds, or even tens of nanoseconds for a fast CPU with a high clock rate.
Re-entrant code or a re-entrant routine is code that can be interrupted at any point when
partially complete, then called by another process, and later return to the point where it was
interrupted to complete the original function without any errors. Non-re-entrant code, however,
cannot be interrupted and then called again without problems. An example of a program that is not
re-entrant is one that uses a fixed memory address to store a temporary result. If the program is
interrupted while the temporary variable is in use and then the routine is called again, the value in
the temporary variable would be changed. When execution returns to the point where it was
interrupted, the temporary variable will have the wrong value. In order to be re-entrant, a program
must keep a separate copy of all internal variables for each invocation. Re-entrant code is required
for any subroutines that must be available to more than one interrupt driven task.
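The problem is easy to demonstrate in C. Both routines below (our own illustrative helpers, not taken from any particular system) compute the same result, but the first parks an intermediate value at a fixed memory address and is therefore not re-entrant, while the second keeps its copy on the stack:

    #include <stdint.h>

    static uint16_t temp;                   /* one fixed location shared by every caller */

    uint16_t swap_bytes_bad(uint16_t x)     /* NOT re-entrant */
    {
        temp = x >> 8;                      /* an interrupt arriving here, followed by   */
        return (uint16_t)(x << 8) | temp;   /* another call, corrupts temp for this call */
    }

    uint16_t swap_bytes_good(uint16_t x)    /* re-entrant */
    {
        uint16_t t = x >> 8;                /* automatic variable: private per invocation */
        return (uint16_t)(x << 8) | t;
    }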
Interrupts can be processed between execution of instructions by the CPU any time they are
enabled. Most CPUs check for the presence of an interrupt request at the end of every instruction. If
interrupts are enabled, the processor saves the contents of the program counter (PC) on the stack,
and loads the PC with the address of the ISR. Some CPUs allow certain instructions to be
interrupted when they take a long time to process, such as a block move instruction.
3.2.1 Vectored Interrupts & Non-Vectored Interrupts
Interrupt Map
Most embedded systems have only a handful of interrupts. Associated with each of these are an
interrupt pin (on the outside of the processor chip) and an ISR. In order for the processor to execute
the correct ISR, a mapping must exist between interrupt pins and ISRs. This mapping usually takes
the form of an interrupt vector table. The vector table is usually just an array of pointers to functions,
located at some known memory address. The processor uses the interrupt type (a unique number
associated with each interrupt pin) as its index into this array.
The value stored at that location in the vector table is usually just the address of the ISR to be
executed.
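In C, such a vector table can be modeled as an array of pointers to functions. The base address, table size, and interrupt type number below are hypothetical, purely for illustration:

    typedef void (*isr_t)(void);

    /* Hypothetical: a 16-entry vector table located at address 0x0000. */
    #define VECTOR_TABLE  ((volatile isr_t *) 0x0000)

    void timer_isr(void);              /* the ISR to be installed */

    void install_timer_isr(void)
    {
        VECTOR_TABLE[5] = timer_isr;   /* assume interrupt type 5 belongs to the timer */
    }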
It is important to initialize the interrupt vector table correctly. (If it is done incorrectly, the ISR
might be executed in response to the wrong interrupt or never executed at all.) The first part of this
process is to create an interrupt map that organizes the relevant information. An interrupt map is a
table that contains a list of interrupt types and the devices to which they refer. This information
should be included in the documentation provided with the board.
In a vectored interrupt system, the interrupt request is accompanied by an identifier, referred to as a
vector or interrupt vector number that defines the source of the interrupt. The vector is a pointer that
is used as an index into a table known as the interrupt vector table. This table contains the addresses
of the ISRs that are to be executed when the corresponding interrupts are processed.
When a vectored interrupt is processed, the CPU goes through the following sequence of
events to begin execution of the ISR:
- After acknowledging the interrupt, the CPU receives the vector number.
- The CPU converts the vector into a memory address in the vector table.
- The ISR address is fetched from the vector table and placed in the program counter.
For example, when an external event occurs, the interrupting device activates the IRQ input
to the interrupt controller that then requests an interrupt cycle from the CPU. When the CPU
acknowledges the interrupt, the interrupt controller passes the vector number to the CPU. The CPU
converts the vector number to a memory address. This address points to the location in the vector
table that contains the address of the ISR.
For systems with non-vectored interrupts, there is only one interrupt service routine entry
point, and the ISR code must determine what caused the interrupt if there are multiple interrupt
sources in the system. When an interrupt occurs a call to a fixed location is executed, and that begins
execution of the ISR. It is possible to have multiple interrupts pointing to the same ISR. The first act
of such an ISR is to determine which interrupt occurred and branch to the appropriate handler. Serial
I/O ports frequently have one vector for transmit and receive interrupts.
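A sketch of such a shared ISR in C (the status register address and the flag masks are hypothetical) shows the dispatch idea:

    #include <stdint.h>

    #define SERIAL_STATUS  ((volatile uint8_t *) 0xFF10)   /* hypothetical */
    #define RX_READY       0x01
    #define TX_DONE        0x02

    void serial_rx_handler(void);
    void serial_tx_handler(void);

    void serial_isr(void)              /* single entry point for both sources */
    {
        uint8_t status = *SERIAL_STATUS;

        if (status & RX_READY)         /* first determine which event occurred, */
            serial_rx_handler();       /* then branch to the matching handler   */
        if (status & TX_DONE)
            serial_tx_handler();
    }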
3.2.2 Interrupt Priority
There are a number of variations in the way interrupts can be handled by the processor.
These variations include how multiple interrupts are handled, if they can be turned off, and how they
are triggered. Some processors allow multiple (nested) interrupts, meaning the CPU can handle
multiple interrupts simultaneously. In other words, interrupts can interrupt interrupts. When multiple
interrupts are sent to the CPU, some method must be used to determine which is handled first. Here
are the most common prioritization schemes currently in use.
- Fixed (static) multi-level priority. This uses a priority encoder to assign priorities, with the
highest priority interrupt processed first. Nested interrupts allow an ISR itself to be interrupted by a
higher-priority device. Interrupts from lower-priority devices are ignored until the higher-priority
ISR is completed. This is the most common method of assigning priorities to interrupts.
- Variable (dynamic) multi-level priority. One problem with fixed priority is that one type of
event can “dominate” the CPU to the exclusion of other events. The solution is to rotate priority
each time an event occurs. This ensures that no interrupt gets “locked out” and all interrupts will
eventually be processed. This scheme is good for multi-user systems because eventually everyone
gets priority.
- Equal single-level priority. If an interrupt occurs while another interrupt is being serviced, the
new interrupt gains control of the processor.
Depending on the interrupt strategy, the parts from the endless loop are structured differently.
(Figure: an initialization part, consisting of serial communication initialization and other initializations, followed by an endless loop that executes Part 1, Part 2, and Part 3.)
If we want to further detail the serial receive/transmit parts, we can implement:
(Figure: the same loop extended with polling. After initialization, a wait loop polls "Byte received?" and, on Yes, takes the byte from the receive buffer before executing Part 1 and Part 2; a second wait loop polls "Byte transmitted?" in the serial transmit part before executing Part 3.)
The main drawback of this implementation is that, while the program is blocked in a wait loop,
other program parts that need to run cannot execute. For example, if we never receive a byte, the
program will stay forever in the receive loop. We can eliminate the wait loops from the receiving
part by using the interrupt-driven implementation shown below.
Definition: An interrupt is an event external to the currently executing process that causes a change
in the normal flow of instruction execution;
An interrupt is usually generated by hardware devices external to the CPU (UART for example)
For example, from the table below we can use serial communication in interrupt mode by setting EA = 1
and ES = 1. Then we have to write an interrupt service routine which is called when the interrupt occurs.
To use serial communication in polling mode, we have to clear at least ES = 0, disabling the
specific serial interrupt, or disable all interrupts by clearing EA = 0.
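In Keil C51 syntax this amounts to only a few lines; the SFR names come from the standard reg51.h header, and vector number 4 is the classic 8051 serial interrupt (details vary by derivative and toolchain):

    #include <reg51.h>                    /* declares EA, ES, RI, TI, SBUF, ... */

    volatile unsigned char rx_byte;

    void serial_isr(void) interrupt 4     /* 8051 serial interrupt vector */
    {
        if (RI) {                         /* byte received?                  */
            rx_byte = SBUF;               /* take it from the receive buffer */
            RI = 0;                       /* clear the receive flag          */
        }
        if (TI) {
            TI = 0;                       /* clear the transmit-complete flag */
        }
    }

    void serial_interrupts_on(void)
    {
        ES = 1;                           /* enable the serial interrupt */
        EA = 1;                           /* enable interrupts globally  */
    }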
(Figure: on the left, the polling implementation using the hardware flags: wait until RI == 1, clear RI, and read char X from SBUF in the receive part; later wait until TI == 1, clear TI, and write char X to SBUF in the transmit part. On the right, the interrupt-driven implementation: the main loop simply runs Part 1, Part 2, and Part 3, while the serial ISR handles the RI and TI events, including generating the transmit interrupt.)
3.3 Types of Memory
Many types of memory devices are available for use in modern computer systems. As an embedded
software engineer, you must be aware of the differences between them and understand how to use
each type effectively. In our discussion, we will approach these devices from a software viewpoint.
As you are reading, try to keep in mind that the development of these devices took several decades
and that there are significant physical differences in the underlying hardware. The names of the
memory types frequently reflect the historical nature of the development process and are often more
confusing than insightful.
Most software developers think of memory as being either random-access (RAM) or read-only
(ROM). But, in fact, there are subtypes of each and even a third class of hybrid memories. In a RAM
device, the data stored at each memory location can be read or written, as desired. In a ROM device,
the data stored at each memory location can be read at will, but never written. In some cases, it is
possible to overwrite the data in a ROM-like device. Such devices are called hybrid memories
because they exhibit some of the characteristics of both RAM and ROM. The figures below provide a
classification system for the memory devices that are commonly found in embedded systems.
3.3.1 Read-Only Memory (ROM)
Types of ROM
Memories in the ROM family are distinguished by the methods used to write new data to them
(usually called programming) and the number of times they can be rewritten. This classification
reflects the evolution of ROM devices from hardwired to one-time programmable to erasable-and-
programmable. A common feature across all these devices is their ability to retain data and programs
forever, even during a power failure.
The very first ROMs were hardwired devices that contained a preprogrammed set of data or
instructions. The contents of the ROM had to be specified before chip production, so the actual data
could be used to arrange the transistors inside the chip! Hardwired memories are still used, though
they are now called "masked ROMs" to distinguish them from other types of ROM. The main
advantage of a masked ROM is a low production cost. Unfortunately, the cost is low only when
hundreds of thousands of copies of the same ROM are required.
One step up from the masked ROM is the PROM (programmable ROM), which is purchased in an
unprogrammed state. If you were to look at the contents of an unprogrammed PROM, you would see
that the data is made up entirely of 1's. The process of writing your data to the PROM involves a
special piece of equipment called a device programmer. The device programmer writes data to the
device one word at a time, by applying an electrical charge to the input pins of the chip. Once a
PROM has been programmed in this way, its contents can never be changed. If the code or data
stored in the PROM must be changed, the current device must be discarded. As a result, PROMs are
also known as one-time programmable (OTP) devices.
An EPROM (erasable-and-programmable ROM) is programmed in exactly the same manner as a
PROM. However, EPROMs can be erased and reprogrammed repeatedly. To erase an EPROM, you
simply expose the device to a strong source of ultraviolet light. (There is a "window" in the top of
the device to let the ultraviolet light reach the silicon.) By doing this, you essentially reset the entire
chip to its initial-unprogrammed-state. Though more expensive than PROMs, their ability to be
reprogrammed makes EPROMs an essential part of the software development and testing process.
On-chip ROM is memory integrated into a processor that contains data or instructions that remain
even when there is no power in the system, and therefore is considered to be nonvolatile memory
(NVM). The content of on-chip ROM usually can only be read by the system it is used in.
The most common types of on-chip ROM include:
- MROM (mask ROM), which is ROM (with data content) that is permanently etched into the
microchip during the manufacturing of the processor, and cannot be modified later.
- PROMs (programmable ROMs), or OTPs (one-time programmables), which are a type of
ROM that can be integrated on-chip and that is one-time programmable by a PROM programmer (in
other words, it can be programmed outside the manufacturing factory).
- EPROM (erasable programmable ROM), which is ROM that can be integrated on a
processor, in which content can be erased and reprogrammed more than once (the number of times
erasure and re-use can occur depends on the processor). The content of EPROM is written to the
device using a special separate device and erased, in its entirety, using a device that outputs intense
ultraviolet light through the processor’s built-in window.
- EEPROM (electrically erasable programmable ROM), which, like EPROM, can be erased
and reprogrammed more than once. The number of times erasure and re-use can occur depends on
the processor. Unlike EPROMs, the content of EEPROM can be written and erased without using
any special devices while the embedded system is functioning. With EEPROMs, erasing can be
done at the byte level, unlike EPROMs, which are erased in their entirety.
A cheaper and faster variation of the EEPROM is Flash memory. Where EEPROMs are written and
erased at the byte level, Flash can be written and erased in blocks or sectors (a group of bytes). Like
EEPROM, Flash can be erased while still in the embedded device.
As shown in Figure 1.5, DRAM memory cells are circuits with capacitors that hold a charge
in place (the charges or lack thereof reflecting data). DRAM capacitors need to be refreshed
frequently with power in order to maintain their respective charges, and to recharge capacitors after
DRAM is read (reading DRAM discharges the capacitor). The cycle of discharging and recharging
of memory cells is why this type of RAM is called dynamic.
Reading speed
Although the relative speed of RAM vs. ROM has varied over time, as of 2007 large RAM
chips can be read faster than most ROMs. For this reason (and to make for uniform access), ROM
content is sometimes copied to RAM or shadowed before its first use, and subsequently read from
RAM.
Writing speed
For those types of ROM that can be electrically modified, writing speed is always much
slower than reading speed, and it may require unusually high voltage, the movement of jumper plugs
to apply write-enable signals, and special lock/unlock command codes. Modern NAND Flash
achieves the highest write speeds of any rewritable ROM technology, with speeds as high as
15 MiB/s (or 70 ns/bit), by allowing (indeed requiring) large blocks of memory cells to be written
simultaneously.
As memory technology has matured in recent years, the line between RAM and ROM devices has
blurred. There are now several types of memory that combine the best features of both.
These devices do not belong to either group and can be collectively referred to as hybrid memory
devices. Hybrid memories can be read and written as desired, like RAM, but maintain their contents
without electrical power, just like ROM. Two of the hybrid devices, EEPROM and Flash, are
descendants of ROM devices; the third, NVRAM, is a modified version of SRAM.
EEPROMs are electrically-erasable-and-programmable. Internally, they are similar to EPROMs, but
the erase operation is accomplished electrically, rather than by exposure to ultraviolet light. Any
byte within an EEPROM can be erased and rewritten. Once written, the new data will remain in the
device forever, or at least until it is electrically erased. The tradeoff for this improved functionality is
mainly higher cost. Write cycles are also significantly longer than writes to a RAM, so you wouldn't
want to use an EEPROM for your main system memory.
Flash memory is the most recent advancement in memory technology. It combines all the best
features of the memory devices described thus far. Flash memory devices are high density, low cost,
nonvolatile, fast (to read, but not to write), and electrically reprogrammable. These advantages are
overwhelming and the use of Flash memory has increased dramatically in embedded systems as a
direct result. From a software viewpoint, Flash and EEPROM technologies are very similar. The
major difference is that Flash devices can be erased only one sector at a time, not byte by byte.
Typical sector sizes are in the range of 256 bytes to 16 kilobytes. Despite this disadvantage, Flash is
much more popular than EEPROM and is rapidly displacing many of the ROM devices as well.
The third member of the hybrid memory class is NVRAM (nonvolatile RAM). Nonvolatility is also
a characteristic of the ROM and hybrid memories discussed earlier. However, an NVRAM is
physically very different from those devices. An NVRAM is usually just an SRAM with a battery
backup. When the power is turned on, the NVRAM operates just like any other SRAM. But when
the power is turned off, the NVRAM draws just enough electrical power from the battery to retain
its current contents. NVRAM is fairly common in embedded systems. However, it is very
expensive, even more expensive than SRAM, so its applications are typically limited to the storage
of only a few hundred bytes of system-critical information that cannot be stored in any better way.
Memory Management
Goals:
- Protect the programs from each other, and the kernel from the programs.
- Perform relocation.
Relocation:
- The user program thinks it has the whole address space, from address 0x0 to 0xffffffff.
- In reality it only has a part of the physical memory.
- Virtual addresses therefore need to be mapped onto physical addresses.
- This mapping is performed by the MMU (Memory Management Unit), but the OS must configure it.
Context Switching
Whenever execution switches between a user program and the OS, a context switch occurs. The
operating system must then:
- Save the PC, stack pointer, and PSW.
- Save the contents of the registers.
- Reprogram the MMU registers.
- Wait while the instructions in the CPU pipeline are flushed.
- Wait for cache lines to load from the new program's memory.
Context switching is therefore fairly expensive.
3.4 I/O
The entire point of an embedded microprocessor is to monitor or control some real-world
event. To do this, the microprocessor must have I/O capability. Like a desktop computer without a
monitor, printer, or keyboard, an embedded microprocessor without I/O is just a paperweight. The
I/O from an embedded control system falls into two broad categories: digital and analog. However,
at the microprocessor level, all I/O is digital. (Some microprocessor ICs have built-in ADCs, but the
processor itself still works with digital values.) The simplest form of I/O is a register that the
microprocessor can write to or a buffer that it can read.
Most of the peripherals require the use of a certain set of pins on the processor. In many
cases, the majority of those pins can be used for their specific function (serial port receiver, timer
output, DMA control signal, etc.), or they can be programmed to just act as a simple input or output
pin (PIO). This flexibility allows the silicon to be configured based on the needs of the design. For
example, if you don’t need two serial ports (and the processor comes with two), then the pins that
are allocated to the second port (RX2, TX2, and maybe DTR2, CTS2, etc…) can be programmed to
function as simple PIO pins and used to drive an LED or read a switch. Programmable pins are
sometimes referred to as dual function. Note that this dual functionality should not be assumed. How
each pin is configured and the ability to configure it to run in different modes is dependent on the
processor implementation. Often a pin name is chosen to reflect the pin’s dual personality. For
example, if RX2 can be configured as a serial port 2 receiver or as a PIO pin, then it will probably be
labeled as RX2/PION (or something similar), where N is some number between 1 and M, and M
is the number of PIO pins on the processor. Some microprocessors may be advertised as having a set
of features but actually provide these features on dual-function pins. Hence, the full set of advertised
features (two serial ports and 32 PIO lines) may not be simultaneously available (because the pins
used for the second serial port are dual-functioned as PIO lines).
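As a purely hypothetical sketch (the register name, address, and bit position are invented for illustration), reclaiming such a pin usually comes down to one register write:

    #include <stdint.h>

    /* Hypothetical pin-mode register: setting bit N selects PIO operation
       for pin N instead of its dedicated function. */
    #define PIN_MODE  ((volatile uint8_t *) 0xFF20)
    #define RX2_PIN   0x04              /* assumed bit for the RX2/PIO pin */

    void rx2_as_gpio(void)
    {
        *PIN_MODE |= RX2_PIN;           /* reuse RX2 as a simple PIO pin */
    }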
3.4.1 Study of External Peripherals
At this point, you've studied every aspect of the new hardware except the external peripherals.
These are the hardware devices that reside outside the processor chip and communicate with it by
way of interrupts and I/O or memory-mapped registers.
Begin by making a list of the external peripherals. Depending on your application, this list might
include LCD or keyboard controllers, A/D converters, network interface chips, or custom ASICs
(Application-Specific Integrated Circuits). In the case of the Arcom board, the list contains just three
items: the Zilog 85230 Serial Controller, parallel port, and debugger port.
You should obtain a copy of the user's manual or databook for each device on your list. At this early
stage of the project, your goal in reading these documents is to understand the basic functions of the
device. What does the device do? What registers are used to issue commands and receive the
results? What do the various bits and larger fields within these registers mean?
When, if ever, does the device generate interrupts? How are interrupts acknowledged or cleared at
the device?
When you are designing the embedded software, you should try to break the program down along
device lines. It is usually a good idea to associate a software module called a device driver with each
of the external peripherals. This is nothing more than a collection of software routines that control
the operation of the peripheral and isolate the application software from the details of that particular
hardware device.
The final step in getting to know your new hardware is to write some initialization software.
This is your best opportunity to develop a close working relationship with the hardware, especially if
you will be developing the remainder of the software in a high-level language. During hardware
initialization it will be impossible to avoid using assembly language. However, after completing this
step, you will be ready to begin writing small programs in C or C++.
The hardware initialization should be executed before the startup code.
The code described there assumes that the hardware has already been initialized and concerns itself
only with creating a proper runtime environment for high-level language programs.
Figure below provides an overview of the entire initialization process, from processor reset through
hardware initialization and C/C++ startup code to main.
The first stage of the initialization process is the reset code. This is a small piece of assembly
(usually only two or three instructions) that the processor executes immediately after it is powered
on or reset. The sole purpose of this code is to transfer control to the hardware initialization routine.
The first instruction of the reset code must be placed at a specific location in memory, usually called
the reset address, that is specified in the processor databook. Most of the actual hardware
initialization takes place in the second stage. At this point, we need to inform the processor about its
environment. This is also a good place to initialize the interrupt controller and other critical
peripherals. Less critical hardware devices can be initialized when the associated device driver is
started, usually from within main.
Intel's 8051/80251 has several internal registers that must be programmed before any useful work
can be done with the processor. These registers are responsible for setting up the memory and I/O
maps and are part of the processor's internal chip-select unit. By programming the chip-select
registers, you are essentially waking up each of the memory and I/O devices that are connected to
the processor. Each chip-select register is associated with a single "chip enable" wire that runs from
the processor to some other chip. The association between particular chip-selects and hardware
devices must be established by the hardware designer. All you need to do is get a list of chip-select
settings from him and load those settings into the chip-select registers.
The third initialization stage contains the startup code, whose job is to prepare the way for code
written in a high-level language. Of importance here is only that the startup code calls main. From
that point forward, all of your other software can be written in C or C++.
3.4.2 Peripheral devices
In addition to the processor and memory, most embedded systems contain a handful of other
hardware devices. Some of these devices are specific to the application domain, while others, like
timers and serial ports, are useful in a wide variety of systems. The most generically useful of these
are often included within the same chip as the processor and are called internal, or on-chip,
peripherals. Hardware devices that reside outside the processor chip are, therefore, said to be
external peripherals. In this chapter we'll discuss the most common software issues that arise when
interfacing to a peripheral of either type.
3.4.2.1 Control and Status Registers
The basic interface between an embedded processor and a peripheral device is a set of control and
status registers. These registers are part of the peripheral hardware, and their locations, size, and
individual meanings are features of the peripheral. For example, the registers within a serial
controller are very different from those in a timer/counter. In this section, I'll describe how to
manipulate the contents of these control and status registers directly from your C/C++ programs.
Depending upon the design of the processor and board, peripheral devices are located either in the
processor's memory space or within the I/O space. In fact, it is common for embedded systems to
include some peripherals of each type. These are called memory-mapped and I/O-mapped
peripherals, respectively. Of the two types, memory-mapped peripherals are generally easier to work
with and are increasingly popular.
Memory-mapped control and status registers can be made to look just like ordinary variables. To
do this, you need simply declare a pointer to the register, or block of registers, and set the value of
the pointer explicitly.
Note, however, that there is one very important difference between device registers and ordinary
variables. The contents of a device register can change without the knowledge or intervention of
your program. That's because the register contents can also be modified by the peripheral hardware.
By contrast, the contents of a variable will not change unless your program modifies them explicitly.
For that reason, we say that the contents of a device register are volatile, or subject to change
without notice.
The C/C++ keyword volatile should be used when declaring pointers to device registers.
This warns the compiler not to make any assumptions about the data stored at that address.
For example, if the compiler sees a write to the volatile location followed by another write to that
same location, it will not assume that the first write is an unnecessary use of processor time. In other
words, the keyword volatile instructs the optimization phase of the compiler to treat that variable
as though its behavior cannot be predicted at compile time.
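For example (the register address is hypothetical), a memory-mapped timer count register might be declared and read like this; without the volatile qualifier, an optimizing compiler could legally assume the two reads return the same value:

    #include <stdint.h>

    /* Hypothetical 16-bit timer count register, memory-mapped at 0xFF02. */
    volatile uint16_t * const pTimerCount = (volatile uint16_t *) 0xFF02;

    uint16_t elapsed_ticks(void)
    {
        uint16_t first  = *pTimerCount;   /* each access really reads the hardware */
        uint16_t second = *pTimerCount;   /* volatile forbids reusing 'first' here */
        return second - first;
    }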
The primary disadvantage of the other type of device registers, I/O-mapped registers, is that there
is no standard way to access them from C or C++. Such registers are accessible only with the help of
special machine-language instructions. And these processor-specific instructions are not supported
by the C or C++ language standards. So it is necessary to use special library routines or inline
assembly (as we did in Chapter 2) to read and write the registers of an I/O-mapped device.
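On an 80x86 target, for instance, one common workaround is a pair of inline-assembly wrappers like the following (GCC syntax; this is only a sketch, and other compilers provide equivalent intrinsics or library calls):

    #include <stdint.h>

    /* Wrappers around the x86 'in' and 'out' instructions. */
    static inline uint8_t inport(uint16_t port)
    {
        uint8_t value;
        __asm__ volatile ("inb %1, %0" : "=a"(value) : "Nd"(port));
        return value;
    }

    static inline void outport(uint16_t port, uint8_t value)
    {
        __asm__ volatile ("outb %0, %1" : : "a"(value), "Nd"(port));
    }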
3.4.2.2 The Device Driver Philosophy
When it comes to designing device drivers, you should always focus on one easily stated goal: hide
the hardware completely. When you're finished, you want the device driver module to be the only
piece of software in the entire system that reads or writes that particular device's control and status
registers directly. In addition, if the device generates any interrupts, the interrupt service routine that
responds to them should be an integral part of the device driver.
In this section, I'll explain why I recommend this philosophy and how it can be achieved.
Of course, attempts to hide the hardware completely are difficult. Any programming interface you
select will reflect the broad features of the device. That's to be expected. The goal should be to
create a programming interface that would not need to be changed if the underlying peripheral
were replaced with another in its general class. For example, all Flash memory devices share the
concepts of sectors (though the sector size can differ between chips). An erase operation can be
performed only on an entire sector, and once erased, individual bytes or words can be rewritten. So
the programming interface provided by the Flash driver example in the last chapter should work
with any Flash memory device. The specific features of the AMD 29F010 are hidden from that
level, as desired.
Device drivers for embedded systems are quite different from their workstation counterparts.
In a modern computer workstation, device drivers are most often concerned with satisfying the
requirements of the operating system. For example, workstation operating systems generally impose
strict requirements on the software interface between themselves and a network card. The device
driver for a particular network card must conform to this software interface, regardless of the
features and capabilities of the underlying hardware. Application programs that want to use the
network card are forced to use the networking API provided by the operating system and don't have
direct access to the card itself. In this case, the goal of hiding the hardware completely is easily met.
By contrast, the application software in an embedded system can easily access your hardware.
In fact, because all of the software is linked together into a single binary image, there is rarely even
a distinction made between application software, operating system, and device drivers.
The drawing of these lines and the enforcement of hardware access restrictions are purely the
responsibilities of the software developers. Both are design decisions that the developers must
consciously make. In other words, the implementers of embedded software can more easily cheat on
the software design than their non-embedded peers.
The benefits of good device driver design are threefold.
• First, because of the modularization, the structure of the overall software is easier to
understand.
• Second, because there is only one module that ever interacts directly with the peripheral's
registers, the state of the hardware can be more accurately tracked.
• And, last but not least, software changes that result from hardware changes are localized to
the device driver.
Each of these benefits can and will help to reduce the total number of bugs in your embedded
software. But you have to be willing to put in a bit of extra effort at design time in order to realize
such savings.
If you agree with the philosophy of hiding all hardware specifics and interactions within the device
driver, it will usually consist of the five components in the following list. To make driver
implementation as simple and incremental as possible, these elements should be developed in the
order in which they are presented.
1. A data structure that overlays the memory-mapped control and status registers of the device
The first step in the driver development process is to create a C-style struct that looks just like the
memory-mapped registers of your device. This usually involves studying the data book for the
peripheral and creating a table of the control and status registers and their offsets.
Then, beginning with the register at the lowest offset, start filling out the struct. (If one or more
locations are unused or reserved, be sure to place dummy variables there to fill in the additional
space.)
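For a hypothetical timer/counter with a 16-bit control register at offset 0, an unused word at offset 2, and a 16-bit count register at offset 4, the overlay might look like this (in practice you must also confirm that the compiler inserts no padding between the fields):

    #include <stdint.h>

    typedef struct
    {
        uint16_t control;    /* offset 0: control register         */
        uint16_t reserved;   /* offset 2: dummy filler, never used */
        uint16_t count;      /* offset 4: current countdown value  */
    } TimerRegs;

    /* Hypothetical base address of the timer/counter unit. */
    #define TIMER  ((volatile TimerRegs *) 0xFF00)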
2. A set of variables to track the current state of the hardware and device driver
The second step in the driver development process is to figure out what variables you will need to
track the state of the hardware and device driver. For example, in the case of the timer/counter unit
described earlier we'll probably need to know if the hardware has been initialized. And if it has been,
we might also want to know the length of the running countdown.
Some device drivers create more than one software device. This is a purely logical device that is
implemented over the top of the basic peripheral hardware. For example, it is easy to imagine that
more than one software timer could be created from a single timer/counter unit. The timer/counter
unit would be configured to generate a periodic clock tick, and the device driver would then manage
a set of software timers of various lengths by maintaining state information for each.
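Continuing the hypothetical timer driver, the state might be little more than an initialization flag, the current countdown length, and one record per software timer:

    #include <stdint.h>

    #define MAX_SOFT_TIMERS  4

    static int      bInitialized = 0;    /* has the hardware been set up yet? */
    static uint16_t countdownLen;        /* length of the running countdown   */

    /* One logical software timer per slot, all driven by the single
       hardware timer/counter unit. */
    static struct {
        int      active;
        uint32_t ticksRemaining;
    } softTimer[MAX_SOFT_TIMERS];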
3. A routine to initialize the hardware to a known state
Once the register overlay and state variables are in place, the next step is a routine that puts the
hardware into a known initial state, setting the state variables accordingly.
4. A set of routines that, taken together, provide an API for users of the device driver
After you've successfully initialized the device, you can start adding other functionality to the driver.
Hopefully, you've already settled on the names and purposes of the various routines, as well as their
respective parameters and return values. All that's left to do now is implement and test each one.
We'll see examples of such routines in the next section.
5. One or more interrupt service routines
As noted earlier, if the device generates interrupts, the interrupt service routines that respond to them
should be designed and implemented as an integral part of the device driver.
5. Address Decoding
When both inputs, R and S, are equal to 0 the latch maintains its existing state. This state may be
either Qa = 0 and Qb = 1, or Qa = 1 and Qb = 0, which is indicated in the truth table by stating that
the Qa and Qb outputs have values 0/1 and 1/0, respectively. Observe that Qa and Qb are
complements of each other in this case. When R = 0 and S = 1, the latch is set into a state where Qa
= 1 and Qb = 0.
When R = 1 and S = 0, the latch is reset into a state where Qa = 0 and Qb = 1. The fourth possibility
is to have R = S = 1. In this case both Qa and Qb will be 0.
The basic SR latch can serve as a useful memory element. It remembers its state when both the S
and R inputs are 0. It changes its state in response to changes in the signals on these inputs. The state
changes occur at the time when the changes in the signals occur. If we cannot control the time of
such changes, then we don’t know when the latch may change its state.
As we will see, there is also a need for storage elements that can change their states no more than
once during one clock cycle. We will discuss two types of circuits that exhibit such behavior.
Consider the circuit given above, which consists of two gated D latches. The first, called master,
changes its state while Clock = 1. The second, called slave, changes its state while Clock = 0. The
operation of the circuit is such that when the clock is high, the master tracks the value of the D input
signal and the slave does not change. Thus the value of Qm follows any changes in D, and the value
of Qs remains constant. When the clock signal changes to 0, the master stage stops following the
changes in the D input. At the same time, the slave stage responds to the value of the signal Qm and
changes state accordingly. Since Qm does not change while Clock = 0, the slave stage can undergo
at most one change of state during a clock cycle. From the external observer’s point of view,
namely, the circuit connected to the output of the slave stage, the master-slave circuit changes its
state at the negative-going edge of the clock. The negative edge is the edge where the clock signal
changes from 1 to 0. Regardless of the number of changes in the D input to the master stage during
one clock cycle, the observer of the Qs signal will see only the change that corresponds to the D
input at the negative edge of the clock. The above circuit is called a master-slave D flip-flop. The
term flip-flop denotes a storage element that changes its output state at the edge of a controlling
clock signal. The timing diagram for this flip-flop and a graphical symbol are also given. In the
symbol we use the > mark to denote that the flip-flop responds to the “active edge” of the clock. We
place a bubble on the clock input to indicate that the active edge for this particular circuit is the
negative edge.
A positive-edge-triggered D flip-flop.
It requires only six NAND gates and, hence, fewer transistors. The operation of the circuit is as
follows. When Clock = 0, the outputs of gates 2 and 3 are high. Thus P1 = P2 = 1, which maintains
the output latch, comprising gates 5 and 6, in its present state. At the same time, the signal P3 is
equal to D, and P4 is equal to its complement, D̄. When Clock changes to 1, the following changes
take place. The values of P3 and P4 are transmitted through gates 2 and 3 to cause P1 = D̄ and P2 =
D, which sets Q = D and Q̄ = D̄. To operate reliably, P3 and P4 must be stable when Clock changes
from 0 to 1. Hence the setup time of the flip-flop is equal to the delay from the D input through gates
4 and 1 to P3. The hold time is given by the delay through gate 3 because once P2 is stable, the
changes in D no longer matter.
For proper operation it is necessary to show that, after Clock changes to 1, any further changes in D
will not affect the output latch as long as Clock = 1. We have to consider two cases. Suppose first
that D = 0 at the positive edge of the clock. Then P2 = 0, which will keep the output of gate 4 equal
to 1 as long as Clock = 1, regardless of the value of the D input. The second case is if D = 1 at the
positive edge of the clock. Then P1 = 0, which forces the outputs of gates 1 and 3 to be equal to 1,
regardless of the D input. Therefore, the flip-flop ignores changes in the D input while Clock = 1.
Obviously, it must be possible to clear the count to zero, which means that all flip-flops must have Q
= 0. It is equally useful to be able to preset each flip-flop to Q = 1, to insert some specific count as
the initial value in the counter.
6.1.4 T Flip-Flop
The D flip-flop is a versatile storage element that can be used for many purposes. By including some
simple logic circuitry to drive its input, the D flip-flop may appear to be a different type of storage
element. An interesting modification is presented below.
This circuit uses a positive-edge-triggered D flip-flop. The feedback connections make the input
signal D equal to either the value of Q or Q̄, under the control of the signal that is labeled T. On each
positive edge of the clock, the flip-flop may change its state Q(t). If T = 0, then D = Q and the state
will remain the same, that is, Q(t + 1) = Q(t). But if T = 1, then D = Q̄ and the new state will be
Q(t + 1) = Q̄(t). Therefore, the overall operation of the circuit is that it retains its present state if T = 0,
and it reverses its present state if T = 1.
Any circuit that implements this truth table is called a T flip-flop. The name T flip-flop derives from
the behavior of the circuit, which “toggles” its state when T = 1. The toggle feature makes the T flip-
flop a useful element for building counter circuits.
6.1.5 JK Flip-Flop
Another interesting circuit can be derived from above figure. Instead of using a single control input,
T, we can use two inputs, J and K, as indicated in Figure below. For this circuit the input D is
defined as
D = JQ̄ + K̄Q
A corresponding truth table is also given. The circuit is called a JK flip-flop. It combines the
behaviors of SR and T flip-flops in a useful way. It behaves as the SR flip-flop, where J = S and K =
R, for all input values except J = K = 1. For the latter case, which has to be avoided in the SR flip-
flop, the JK flip-flop toggles its state like the T flip-flop.
The JK flip-flop is a versatile circuit. It can be used for straight storage purposes, just like the D and
SR flip-flops. But it can also serve as a T flip-flop by connecting the J and K inputs together.
Summary of Terminology
We have used the terminology that is quite common. But the reader should be aware that different
interpretations of the terms latch and flip-flop can be found in the literature. Our terminology can be
summarized as follows:
Basic latch is a feedback connection of two NOR gates or two NAND gates, which can store one bit
of information. It can be set to 1 using the S input and reset to 0 using the R input.
Gated latch is a basic latch that includes input gating and a control input signal. The latch retains its
existing state when the control input is equal to 0. Its state may be changed when the control signal
is equal to 1. In our discussion we referred to the control input as the clock. We considered two
types of gated latches:
• Gated SR latch uses the S and R inputs to set the latch to 1 or reset it to 0, respectively.
• Gated D latch uses the D input to force the latch into a state that has the same logic value as
the D input.
A flip-flop is a storage element based on the gated latch principle, which can have its output
state changed only on the edge of the controlling clock signal. We considered two types:
• Edge-triggered flip-flop is affected only by the input values present when the active edge of
the clock occurs.
• Master-slave flip-flop is built with two gated latches. The master stage is active during half
of the clock cycle, and the slave stage is active during the other half.
The output value of the flip-flop changes on the edge of the clock that activates the transfer into the
slave stage. Master-slave flip-flops can be edge-triggered or level sensitive. If the master stage is a
gated D latch, then it behaves as an edge-triggered flip-flop. If the master stage is a gated SR latch,
then the flip-flop is level sensitive.
6.2 Registers
A flip-flop stores one bit of information. When a set of n flip-flops is used to store n bits of
information, such as an n-bit number, we refer to these flip-flops as a register. A common clock is
used for each flip-flop in a register, and each flip-flop operates as described in the previous sections.
The term register is merely a convenience for referring to n-bit structures consisting of flip-flops.
Figure below shows a four-bit shift register that is used to shift its contents one bit position to the
right. The data bits are loaded into the shift register in a serial fashion using the In input. The
contents of each flip-flop are transferred to the next flip-flop at each positive edge of the clock. An
illustration of the transfer is given below, which shows what happens when the signal values at In
during eight consecutive clock cycles are 1, 0, 1, 1, 1, 0, 0, and 0, assuming that the initial state of
all flip-flops is 0.
To implement a shift register, it is necessary to use either edge-triggered or master-slave flip-flops.
The level-sensitive gated latches are not suitable, because a change in the value of In would
propagate through more than one latch during the time when the clock is equal to 1.
Figure below shows a four-bit shift register that allows the parallel access. Instead of using the
normal shift register connection, the D input of each flip-flop is connected to two different sources.
One source is the preceding flip-flop, which is needed for the shift register operation. The other
source is the external input that corresponds to the bit that is
to be loaded into the flip-flop as a part of the parallel-load operation. The control signal Shift/Load is
used to select the mode of operation. If Shift/Load = 0, then the circuit operates as a shift register. If
Shift/Load = 1, then the parallel input data are loaded into the register. In both cases the action takes
place on the positive edge of the clock.
We have chosen to label the flip-flop outputs as Q3, . . . , Q0 because shift registers are often used to
hold binary numbers. The contents of the register can be accessed in parallel by observing the
outputs of all flip-flops. The flip-flops can also be accessed serially, by observing the values of Q0
during consecutive clock cycles while the contents are being shifted. A circuit in which data can be
loaded in series and then accessed in parallel is called a series-to-parallel converter. Similarly, the
opposite type of circuit is a parallel-to-series converter. The presented circuit can perform both of
these functions.
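The register's behavior is easy to mimic in software. The C function below (our own model, with the four-bit width and names chosen for this example) computes the register's next contents for one positive clock edge:

    #include <stdint.h>

    /* Model of the four-bit shift register with parallel load. 'q' holds
       Q3..Q0 in its low four bits; each call represents one positive edge. */
    uint8_t clock_edge(uint8_t q, int shift_load, uint8_t parallel_in, int serial_in)
    {
        if (shift_load == 0)    /* Shift/Load = 0: shift right, In enters at Q3 */
            return (uint8_t)((((unsigned)(serial_in & 1) << 3) | (q >> 1)) & 0x0Fu);
        else                    /* Shift/Load = 1: load the parallel inputs */
            return (uint8_t)(parallel_in & 0x0Fu);
    }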
6.3 Counters
Counter circuits are used in digital systems for many purposes. They may count the number of
occurrences of certain events, generate timing intervals for control of various tasks in a system, keep
track of time elapsed between specific events, and so on.
Counters can be implemented using the adder/subtractor circuits. However, since we only need to
change the contents of a counter by 1, it is not necessary to use such elaborate circuits. Instead, we
can use much simpler circuits that have a significantly lower cost. We will show how the counter
circuits can be designed using T and D flip-flops.
A simple counter can be built by chaining T flip-flops whose T inputs are tied to 1, each stage being
clocked by the output of the preceding one. Figure 7.20b shows a timing diagram for this counter.
The value of Q0 toggles once each clock
cycle. The change takes place shortly after the positive edge of the Clock signal. The delay is caused
by the propagation delay through the flip-flop. Since the second flip-flop is clocked by Q0, the value
of Q1 changes shortly after the negative edge of the Q0 signal.
Similarly, the value of Q2 changes shortly after the negative edge of the Q1 signal. If we look at the
values Q2Q1Q0 as the count, then the timing diagram indicates that the counting sequence is 0, 1, 2,
3, 4, 5, 6, 7, 0, 1, and so on. This circuit is a modulo-8 counter. Because it counts in the upward
direction, we call it an up-counter.
The counter in the figure above has three stages, each comprising a single flip-flop. Only the first stage
responds directly to the Clock signal; we say that this stage is synchronized to the clock. The other
two stages respond after an additional delay. For example, when Count = 3, the next clock pulse will
cause the Count to go to 4. As indicated by the arrows in the timing diagram, this change requires
the toggling of the states of all three flip-flops. The change in Q0 is observed only after a
propagation delay from the positive edge of Clock. The Q1 and Q2 flip-flops have not yet changed;
hence for a brief time the count is Q2Q1Q0 = 010. The change in Q1 appears after a second
propagation delay, at which point the count is 000. Finally, the change in Q2 occurs after a third
delay, at which point the stable state of the circuit is reached and the count is 100. This circuit is an
asynchronous counter, or a ripple counter.
If each stage is instead clocked from the other output of the preceding flip-flop, the timing diagram
shows that the circuit counts in the sequence 0, 7, 6, 5, 4, 3, 2, 1, 0, 7, and so on. Because it counts
in the downward direction, we say that it is a down-counter.
It is possible to combine the functionality of the two circuits above to form a counter that can
count either up or down. Such a counter is called an up/down-counter.
In a synchronous counter, all stages share a common clock, and stage i is allowed to toggle only
when all preceding stages are equal to 1. The T inputs are therefore defined as
T1 = Q0
T2 = Q0Q1
T3 = Q0Q1Q2
Tn = Q0 · Q1 · · · Qn−1
Instead of using AND gates of increased size for each stage, which may lead to fan-in problems, we
use a factored arrangement, as shown in the figure. This arrangement does not slow down the
response of the counter, because all flip-flops change their states after a propagation delay from the
positive edge of the clock. Note that a change in the value of
Q0 may have to propagate through several AND gates to reach the flip-flops in the higher stages of
the counter, which requires a certain amount of time. This time must not exceed the clock period.
Actually, it must be less than the clock period minus the setup time for the flip-flops.
Enable and Clear Capability
The above counters change their contents in response to each clock pulse. Often it is desirable to be
able to inhibit counting, so that the count remains in its present state. This may be accomplished by
including an Enable control signal, as indicated below.
The circuit is the counter where the Enable signal controls directly the T input of the first flip-flop.
Connecting the Enable also to the AND gate chain means that if Enable = 0, then all T inputs will be
equal to 0. If Enable = 1, then the counter operates as explained previously.
In many applications it is necessary to start with the count equal to zero. This is easily achieved if
the flip-flops can be cleared. The clear inputs on all flip-flops can be tied together and driven by a
Clear control input.
It is not obvious how D flip-flops can be used to implement a counter. Here we will present a circuit
structure that meets the requirements. We gives a four-bit up-counter that counts in the sequence 0,
1, 2, . . . , 14, 15, 0, 1, and so on. The count is indicated by the flip-flop outputs Q3Q2Q1Q0. If we
assume that Enable = 1, then the D inputs of the flip-flops are defined by the expressions
D0 = Q̄0 = 1 ⊕ Q0
D1 = Q1 ⊕ Q0
D2 = Q2 ⊕ Q1Q0
D3 = Q3 ⊕ Q2Q1Q0
Di = Qi ⊕ Qi−1Qi−2 · · · Q1Q0
We will show how to derive these equations in Chapter 8.
We have included the Enable control signal so that the counter counts the clock pulses only if
Enable = 1. In effect, the above equations are modified to implement the circuit in the figure as
follows
D0 = Q0 ⊕ Enable
D1 = Q1 ⊕ Q0 · Enable
D2 = Q2 ⊕ Q1 · Q0 · Enable
The operation of the counter is based on the observation that the state of the flip-flop in stage i
changes only if all preceding flip-flops are in the state Q = 1. This makes the output of the AND gate
that feeds stage i equal to 1, which causes the output of the XOR gate connected to Di to be equal to
Q̄i. Otherwise, the output of the XOR gate provides Di = Qi, and the flip-flop remains in the same
state.
This resembles the carry propagation in a carry-look-ahead adder circuit; hence the AND-gate chain
can be thought of as the carry chain. Even though the circuit is only a four-bit counter, we have
included an extra AND gate that produces the “output carry.” This signal makes it easy to concatenate
two such four-bit counters to create an eight-bit counter.
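The same behavior can be checked with a small C model (our own sketch, not part of the original text); the loop variable carries the AND-gate chain, so bit i toggles exactly when Enable and all lower-order bits are 1:

    #include <stdint.h>

    /* One positive clock edge of the four-bit synchronous up-counter.
       Bit i of 'q' is Qi; 'enable' is the Enable input. */
    uint8_t counter_edge(uint8_t q, int enable)
    {
        int carry = enable;                        /* the carry (AND-gate) chain   */
        uint8_t next = 0;
        for (int i = 0; i < 4; i++) {
            int qi = (q >> i) & 1;
            next |= (uint8_t)((qi ^ carry) << i);  /* Di = Qi XOR carry            */
            carry &= qi;                           /* propagates only while Qi = 1 */
        }
        return next;
    }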
Using the Clear and Preset inputs of the flip-flops to load an initial count is a possibility, but a better
approach is discussed below.
A two-input multiplexer is inserted before each D input. One input to the multiplexer is used to
provide the normal counting operation. The other input is a data bit that can be loaded directly into
the flip-flop. A control input, Load, is used to choose the mode of operation. The circuit counts
when Load = 0. A new initial value, D3D2D1D0, is loaded into the counter when Load = 1.
Reset Synchronization
We have already mentioned that it is important to be able to clear, or reset, the contents of a counter
prior to commencing a counting operation. This can be done using the clear capability of the
individual flip-flops. But we may also be interested in resetting the count to 0 during the normal
counting process. An n-bit up-counter functions naturally as a modulo-2ⁿ counter. Suppose that we
wish to have a counter that counts modulo some base that is not a power of 2. For example, we may
want to design a modulo-6 counter, for which the counting sequence is 0, 1, 2, 3, 4, 5, 0, 1, and so
on.
The most straightforward approach is to recognize when the count reaches 5 and then reset the
counter. An AND gate can be used to detect the occurrence of the count of 5.
Actually, it is sufficient to ascertain that Q2 = Q0 = 1, which is true only for 5 in our desired
counting sequence. A circuit based on this approach is given below. It uses a three-bit synchronous
counter of the type depicted in Figure 7.25. The parallel-load feature of the counter is used to reset
its contents when the count reaches 5. The resetting action takes place at the positive clock edge
after the count has reached 5. It involves loading D2D1D0 = 000 into the flip-flops. As seen in the
timing diagram in Figure 7.26b, the desired counting sequence is achieved, with each value of the
count being established for one full clock cycle. Because the counter is reset on the active edge of
the clock, we say that this type of counter has a synchronous reset.
An alternative is to clear the flip-flops asynchronously, using their clear inputs directly. In that case
the flip-flops are cleared to 0 a short time after the NAND gate has detected the count of 5. This
time depends on the gate delays in the circuit, but not on the clock. Therefore, signal values Q2Q1Q0
= 101 are maintained for a time that is much less than a clock cycle. Depending on a particular
application of such a counter, this may be adequate, but it may also be completely unacceptable. For
example, if the counter is used in a digital system where all operations in the system are
synchronized by the same clock, then this narrow pulse denoting Count = 5 would not be seen by the
rest of the system. To solve this problem, we could try to use a modulo-7 counter instead, assuming
that the system would ignore the short pulse that denotes the count of 6. This is not a good way of
designing circuits, because undesirable pulses often cause unforeseen difficulties in practice.
7. Timers/Counters
Timers and counters, which are present in most microcontroller chips, allow generation of pulses
and interrupts at regular intervals. They can also be used to count pulses and measure event timing.
Some of the more sophisticated versions can measure frequency, pulse width, and relative pulse
timing on inputs. Outputs can be defined to have a given repetition rate, pulse width, and even
complex sequences of pulses in some cases.
A simple timer consists of a loadable 8-bit counter. You could build one from a couple of
74HC161 counters or equivalent PLD logic.
The microprocessor can write a value to the timer that is transferred to the counter outputs. If
the counter is an UP counter, it counts up. A DOWN counter counts down. A typical timer
embedded in a microcontroller or in a timer IC will have some means to start the timer once it is
loaded, typically by setting a bit in a register. The clock input to the counter may be a derivative of
the microprocessor clock or it may be a signal applied to one of the external pins. A real timer will
also provide the outputs of the counter to the microprocessor so it can read the count. If the
microprocessor loads this timer with a value of 0xFE and then starts the timer, it will count from FE
to FF on the next clock. On the second clock, it will count from FF to 00 and generate an output.
The output of the timer may set a flip-flop that the microprocessor can read, or it may generate an
interrupt to the microprocessor, or both. The timer may stop once it has generated an output, or it
may continue counting from 00 back to FF. The problem with a continuously running timer is that it
will count from the loaded value the first time it counts up, but the second time it will start from 00.
7.1 Reloading a timer
This timer has an 8-bit latch to hold the value written by the microprocessor. When the
microprocessor writes to the latch, it also loads the counter. An OR gate also loads the timer when it
rolls over from FF to 00. For this example, we will assume that the logic in the IC gets all the
polarities and timings of the load signal correct so that there are no glitches or race conditions.
The way this timer works is that the microprocessor writes a value to the latch (also loading
it into the timer) and then starts the timer. When the timer rolls over from FF to 00, it generates an
output (again, either a latched bit for the microprocessor to read or an interrupt). At the same time
that the output is generated, the timer is loaded from the latch contents. Since the latch still holds the
value written by the microprocessor, the counter will start counting again from the same point it did
before. Now the timer will produce a regular output with the same accuracy as the input clock. This
output could be used to generate a regular interrupt, to provide a baud rate clock to a UART, or to
provide a signal to any device that needs a regular pulse. A variation of this feature used in some
microcontrollers does not load the counter with the desired count value but instead loads it into a
digital comparator. The comparator compares the counter value to the value written by the
microprocessor. The counter starts at zero and counts up. When the count equals the value written
by the microprocessor, the counter is reset to zero and the process repeats. The effect is the same as
the timer just described.
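To make the reload behavior concrete, here is a minimal C model (a sketch; the structure and function names are ours, not taken from any particular timer IC):

#include <stdint.h>

/* Minimal model of an auto-reload (periodic) timer, following the
 * behavior described above. */
typedef struct {
    uint8_t latch;    /* value written by the processor */
    uint8_t count;    /* current counter value */
    uint8_t running;
} reload_timer_t;

static void timer_write(reload_timer_t *t, uint8_t value)
{
    t->latch = value; /* writing the latch also loads the counter */
    t->count = value;
}

/* Advance one input clock; returns 1 on the FF -> 00 rollover, at
 * which point the counter is reloaded from the latch. */
static int timer_tick(reload_timer_t *t)
{
    if (!t->running)
        return 0;
    if (t->count == 0xFF) {
        t->count = t->latch;  /* reload keeps the period constant */
        return 1;             /* output pulse / interrupt request */
    }
    t->count++;
    return 0;
}

With a latch value of 0xFE the model produces an output pulse every 2 input clocks; in general the period is 256 minus the latch value.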
7.2 Input Capture Timer
In this case, the timer counts from zero to FF. When a pulse occurs on the capture input pin,
the contents of the counter are transferred to an 8-bit latch and the counter is reset. The input pulse
also generates an interrupt to the microprocessor. The timer is connected directly to the input pin; in
an actual circuit, of course, there will be some gating and synchronizing logic to make sure all the
timing is right. Similarly, the capture pin will not connect directly to a microprocessor interrupt but
will be passed through some flip-flops, timing logic, interrupt controller logic, and so on.
This configuration is typically used to measure the time between the leading edge of two
pulses. The timer is run at a constant clock, usually a derivative of the microprocessor clock. Each
time an edge occurs on the input capture pin, the processor is interrupted and the software reads the
capture latch. The value in the latch is the number of clocks that occurred since the last pulse. Some
microcontrollers do not reset the counter on an input capture but let
the counter free run. In those configurations, the software must remember the previous reading and
subtract it from the new reading. When the counter rolls over from FF to 00, the software must
recognize that fact and correct the numbers; if it doesn’t, negative values will result. Many
microcontrollers that provide a capture-type timer also provide a means for the counter to generate an
interrupt when it rolls over, which can simplify this software task.
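With a free-running 8-bit counter, the rollover correction comes for free from unsigned modular arithmetic. A sketch, where read_capture_latch stands in for whatever register access a particular part actually provides:

#include <stdint.h>

extern uint8_t read_capture_latch(void);  /* hypothetical register read */

static uint8_t last_capture;

/* Called from the input-capture interrupt. Because the subtraction is
 * done in uint8_t, a rollover (e.g., new = 0x05, last = 0xF0) still
 * yields the correct elapsed count (0x15 = 21), provided no more than
 * 256 counts elapse between pulses. */
uint8_t elapsed_counts(void)
{
    uint8_t now = read_capture_latch();
    uint8_t elapsed = (uint8_t)(now - last_capture);
    last_capture = now;
    return elapsed;
}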
7.3 Watchdog Timer
The watchdog timer (WDT) acts as a safety net for the system. If the software stops
responding or attending to the task at hand, the watchdog timer detects that something is amiss and
resets the software automatically. The system might stop responding as a result of any number of
difficult-to-detect hardware or firmware defects. For example, if an unusual condition causes a
buffer overrun that corrupts the stack frame, some function’s return address could be overwritten.
When that function completes, it then returns to the wrong spot, leaving the system utterly confused.
Runaway pointers (firmware) or a glitch on the data bus (hardware) can cause similar crashes.
Different external factors can cause “glitches.” For example, even a small electrostatic discharge
near the device might cause enough interference to momentarily change the state of one bit on the
address or data bus. Unfortunately, these kinds of defects can be very intermittent, making them
easy to miss during the project’s system test stage.
The watchdog timer is a great protector. Its sole purpose is to monitor the CPU with a “you
scratch my back and I’ll scratch yours” kind of relationship. The typical watchdog has an input pin
that must be toggled periodically (for example, once every second). If the watchdog is not toggled
within that period, it pulses one of its output pins. Typically, this output pin is tied either to the
CPU’s reset line or to some nonmaskable interrupt (NMI), and the input pin is tied to an I/O line of
the CPU. Consequently, if the firmware does not keep the watchdog input line toggling at the
specified rate, the watchdog assumes that the firmware has stopped working, complains, and causes
the CPU to be restarted.
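The firmware side of this arrangement can be as simple as the sketch below, assuming the watchdog input is wired to a GPIO line (gpio_toggle, WDT_PIN, and the subsystem functions are illustrative placeholders, not a real device's API):

extern void gpio_toggle(int pin);   /* hypothetical I/O-line driver */
extern void read_sensors(void);
extern void update_outputs(void);
extern void service_communications(void);

#define WDT_PIN 3   /* assumed pin wired to the watchdog input */

/* Main loop: the watchdog is kicked only after every subsystem has
 * completed one pass. If any step hangs, the toggling stops and the
 * watchdog pulses the reset line or NMI. */
void main_loop(void)
{
    for (;;) {
        read_sensors();
        update_outputs();
        service_communications();
        gpio_toggle(WDT_PIN);   /* must happen at least once per timeout */
    }
}

Kicking the watchdog once per full loop pass, rather than from a timer interrupt, is deliberate: an interrupt can keep running even after the main code has crashed.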
7.4 Using Timers
Time-Based Temperature Measurement - An example that illustrates some of the
important issues you must consider when using timers involves measurement of temperature. The
Maxim MAX6576 is an IC that measures temperature. The MAX6576 has a single-wire output and
produces a square wave with a period that is proportional to the absolute temperature in kelvins. The
MAX6576 can operate from -40°C to +125°C. By connecting the TS0 and TS1 inputs to ground or
Vcc in various combinations, the MAX6576 can be configured so that the period varies by 10, 40, 160,
or 640µs per degree. In the configuration shown, the period will vary by 40µs per degree. At 25°C,
the period will be:
(25 + 273.15) x 40 = 11,926 microseconds, or 11.926ms
Say you connect this to a microprocessor using input capture mode. Let’s suppose the
microprocessor is operating with a 4.096MHz crystal and using a prescaler of 256, so the timer gets
a clock of 4.096MHz/256, or 16,000Hz. The counter increments every 62.5µs. For this application,
it doesn’t matter whether the input capture occurs on the rising or falling edge of the MAX6576
output.
How accurately can you measure temperature with this arrangement? Since the MAX6576
changes 40µs per degree and the clock to the counter is 16,000Hz, each increment of the counter
corresponds to 62.5/40, or 1.56 degrees. This is the best resolution you can get. If the temperature of
the sensor is 25°C, the captured count value will be 11,926/62.5 = 190.8. Since the counter can only
capture integral values, the actual count will be 190 (the .8 is dropped). For the count to be less than
190, the temperature must go to 23.7°C. Any changes between these two values cannot be read by
the microprocessor.
If we decide that this is insufficient accuracy for our application, we might change the
prescaler to 1, making the counter clock the same as the CPU clock, 4.096MHz. Now the counter
increments every 244.1ns, and the resolution is 244.1ns/40µs, or .0061 degrees per counter
increment. This is much better accuracy than the sensor itself has. What happens in this
configuration if the temperature goes from 25°C to 125°C? The output period will go from 11,926µs
to 15,926µs. This will result in a captured count of 65,232. The timer is 16 bits wide, so this is not a
problem, but it is very close to the 65,535 upper limit of the counter.
What happens at 125°C if we take the accuracy of the sensor itself into account? The
MAX6576 has a typical accuracy of ±3°C at 125°C, but the maximum error is +5°C. This means
that, at 125°C, the output may actually indicate up to 130°C. At 130°C, the output period is
16,126µs. This corresponds to a count value of 66,052, which means the timer we are using would
roll over from 65,535 to zero while sampling. The actual count that would be captured would be
516, indicating a much lower temperature than the MAX6576 is actually sensing.
There are several solutions to this specific problem: The timer prescaler could be changed,
the configuration of the MAX6576 could be changed, or even the microprocessor crystal could be
changed. You could leave the hardware as-is and handle the error in software by detecting the
rollover. The important point is to perform this type of analysis when you use timers in
microprocessor designs.
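As a sketch of the software solution, assuming the 4.096MHz counter clock and the 40µs-per-kelvin configuration used above (the function name is ours, not from any vendor library): with this clock, one kelvin corresponds to 4.096 × 40 = 163.84 counts, i.e., exactly 16,384 counts per 100 kelvins.

#include <stdint.h>

/* Convert the number of timer counts in one MAX6576 output period to
 * degrees Celsius, for a 4.096MHz counter and 40us/K configuration.
 * Doing the subtraction of successive free-running 16-bit captures in
 * uint16_t absorbs one timer rollover automatically. */
int32_t period_counts_to_celsius(uint16_t start, uint16_t end)
{
    uint16_t counts = (uint16_t)(end - start);   /* modulo-65536 */
    int32_t kelvin = ((int32_t)counts * 100) / 16384;  /* 163.84 counts/K */
    return kelvin - 273;   /* kelvin -> Celsius (integer approximation) */
}

For example, a period of 65,233 counts gives 398 K, or 125°C, matching the figures above.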
Another issue that arises from this example is that of sampling time. The system can only
sample the temperature at a rate equal to the period of the output. As the temperature goes up, the
time between samples also goes up. If several samples need to be averaged, the sampling rate goes
down proportionally. While a worst-case sample time of 16ms is probably not unreasonable for a
temperature measurement system, an analysis of the effects of sample time should be performed in
cases where the rate of the input signal affects the sampling rate.
Motor Control - Say you have a DC motor that is part of a microprocessor control system. The
motor has an encoder that produces 100 pulses per revolution, and the microprocessor must control
the speed of the motor from 10RPM to 2000RPM. Some undefined external source provides a
command to the microprocessor to set motor speed.
At 10RPM, the microprocessor will get pulses from the motor encoder at the following
frequency:
10 Rev/Min × 100 Pulses/Rev × (1 Min / 60 Sec) = 16.6 Pulses/Sec
A similar calculation results in a frequency of 3333.33 pulses/sec at 2000RPM. If the input
capture hardware is configured to generate an interrupt when the input pulse occurs, then the
processor will get an interrupt every 60ms at 10RPM, and every 300µs at 2000RPM.
Say we want to calculate motor speed by using a microcontroller with input capture capability to
measure the time between encoder pulses. If the input capture is measured with a 1MHz reference
clock, then the input capture registers will contain 1 MHz/16.6Hz or 60,024 at 10RPM. Similarly,
the registers will contain a value of 300 at 2000RPM.
The 100-count encoder produces one pulse every 3.6 degrees of rotation (360/100). This is
true at any motor speed. However, the input capture reference clock is fixed, so its resolution (in
degrees of rotation) varies with the motor speed. At 10RPM, each reference clock corresponds to:
16.66 EncoderPulses/Sec × 3.6 Deg/EncoderPulse × (1 Sec / 1,000,000 ReferenceClocks) = 60×10⁻⁶ Degrees Per ReferenceClock
At 2000RPM, this becomes .012 degrees. While either of these is probably adequate for a
motor control application, the principle is important: at faster RPM, the resolution of the reference
clock with respect to the input signal is coarser.
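Under these assumptions (1MHz reference clock, 100-pulse encoder), the speed calculation itself is a single division; the sketch below (names ours) reproduces the figures in the text:

#include <stdint.h>

#define REF_CLOCK_HZ   1000000UL
#define PULSES_PER_REV 100UL

/* Convert the number of 1MHz reference clocks between two encoder
 * pulses into motor speed in RPM, with rounded integer division. */
uint32_t counts_to_rpm(uint32_t counts_between_pulses)
{
    if (counts_between_pulses == 0)
        return 0;   /* guard against divide-by-zero */
    /* RPM = (REF_CLOCK_HZ / counts) pulses/sec * 60 sec/min / pulses/rev */
    uint32_t denom = counts_between_pulses * PULSES_PER_REV;
    return (REF_CLOCK_HZ * 60UL + denom / 2) / denom;
}

A capture of 300 counts gives 2000RPM, and a capture of 60,024 counts gives 10RPM, matching the values above.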
8. PWM Control
8.1 Examples and description
Pulse-width modulation control works by switching the power supplied to the motor on and off
very rapidly. The DC voltage is converted to a square-wave signal, alternating between fully on
(nearly 12V) and zero, giving the motor a series of power "kicks".
If the switching frequency is high enough, the motor runs at a steady speed due to its fly-wheel
momentum.
By adjusting the duty cycle of the signal (modulating the width of the pulse, hence 'PWM'), i.e.,
the fraction of time it is "on", the average power can be varied, and hence the motor speed.
Advantages are,
1. The output transistor is either on or off, not partly on as with normal regulation, so less
power is wasted as heat and smaller heat-sinks can be used.
2. With a suitable circuit there is little voltage loss across the output transistor, so the top end of
the control range gets nearer to the supply voltage than linear regulator circuits.
3. The full-power pulsing action will run fans at a much lower speed than an equivalent steady
voltage.
Disadvantages:
1. Without adding extra circuitry, any fan speed signal is lost, as the fan electronics' power
supply is no longer continuous.
2. The 12V "kicks" may be audible if the fan is not well-mounted, especially at low revs. A
clicking or growling vibration at PWM frequency can be amplified by case panels.
3. Some authorities claim the pulsed power puts more stress on the fan bearings and windings,
shortening its life.
An oscillator is used to generate a triangle or sawtooth waveform (green line). At low frequencies
the motor speed tends to be jerky, at high frequencies the motor's inductance becomes significant
and power is lost. Frequencies of 30-200Hz are commonly used.
A comparator compares the sawtooth voltage with the reference voltage. When the sawtooth voltage
rises above the reference voltage, a power transistor is switched on. As it falls below the reference, it
is switched off. This gives a square wave output to the fan motor.
If the potentiometer is adjusted to give a high reference voltage (raising the blue line), the sawtooth
never reaches it, so output is zero. With a low reference, the comparator is always on, giving full
power.
A simple PWM consists of an 8-bit up/down counter that counts from 00 to FF, then back
down to 00 and an 8-bit comparator that compares the value in the 8-bit latch to the counter value.
When the two values are equal, the comparator clocks the “D” flip-flop (again, timing logic makes
sure everything works correctly). If the counter is counting up, a “1” is clocked into the “D” flip-
flop. If the counter is counting down, a “0” is loaded. The flip-flop output is connected to one of the
microcontroller output pins.
Say the microprocessor writes a value of 0xFE into the latch. The counter counts from 00 to
FE, where the PWM output goes to “1” because the counter bits match the latched value. The
counter continues to FF, then back down through FE to zero. When the counter passes through FE,
the PWM output goes to zero. So in this case, the PWM output is high for two counts (FE and FF)
out of 256, or about .78 percent duty cycle. If the microprocessor writes 0xF0 to the latch, the PWM
output will be high from F0 to FF and back to F0, for a total of 30 counts or 11.7 percent duty cycle.
A more sophisticated PWM timer would include a second latch and comparator so the
counter can reverse direction at values other than FF. In such a timer, this comparator would set the
frequency of the PWM signal while the other comparator would set the duty cycle. Some
microprocessors provide other means to generate PWM. Some microcontrollers don’t use an
up/down counter but instead they provide two comparators. After the first count value is reached,
the counter is reset and the second comparator is used to indicate end-of-count. The output pin
indicates which comparator is being used so a PWM output can be generated by controlling the
ratios of the comparator values.
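The up/down-counter PWM just described can be modeled in a few lines of C, one clock per call (all names are ours; in real hardware this is done in logic, not software):

#include <stdint.h>

typedef struct {
    uint8_t counter;
    int     counting_up;   /* 1 = up, 0 = down */
    uint8_t compare;       /* value written by the processor */
    uint8_t output;        /* state of the PWM output pin */
} updown_pwm_t;

/* Advance one PWM clock. When the counter matches the compare value,
 * the output "flip-flop" is clocked: it takes 1 while counting up and
 * 0 while counting down, as described in the text. */
void pwm_clock(updown_pwm_t *p)
{
    if (p->counter == p->compare)
        p->output = p->counting_up ? 1 : 0;

    if (p->counting_up) {
        if (p->counter == 0xFF) { p->counting_up = 0; p->counter--; }
        else p->counter++;
    } else {
        if (p->counter == 0x00) { p->counting_up = 1; p->counter++; }
        else p->counter--;
    }
}

With compare = 0xFE, the modeled output is high only around the top of the triangle, reproducing the narrow duty cycle described above.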
PWM Output
Similar considerations apply to timer outputs. If you are using an 8-bit timer to generate a
PWM signal, the output duty cycle can only be changed by one timer count, or 1 in 256. This results
in a duty cycle resolution of about 0.4 percent. Note, though, that this applies only if the timer is
allowed to run a full 256 counts. If you are using an 8-bit timer but only 100 counts for the PWM
period, then one step is 1 percent of the total period. In this case, the best resolution you can get is 1
percent. This is sufficient for many applications but is inadequate for others. In an application in
which you vary the PWM period and duty cycle, you need to be sure that the resolution at the fastest
period (least number of timer counts per cycle) is adequate for the application.
Principle
An example of PWM: the supply voltage (blue) modulated as a series of pulses results in a sine-like
flux density waveform (red) in a magnetic circuit of electromagnetic actuator. The smoothness of
the resultant waveform can be controlled by the width and number of modulated impulses (per given
cycle)
Pulse-width modulation uses a square wave whose pulse width is modulated, resulting in the
variation of the average value of the waveform. If we consider a square waveform f(t) with a low
value ymin, a high value ymax and a duty cycle D (see figure 1), the average value of the waveform is
given by:
ȳ = (1/T) ∫₀ᵀ f(t) dt
As f(t) is a square wave, its value is ymax for 0 < t < D·T and ymin for D·T < t < T. The above
expression then becomes:
ȳ = (1/T) (ymax·D·T + ymin·(T − D·T)) = D·ymax + (1 − D)·ymin
This latter expression can be fairly simplified in many cases where ymin = 0, as ȳ = D·ymax. From
this, it is obvious that the average value of the signal (ȳ) is directly dependent on the duty cycle D.
Fig. 2: A simple method to generate the PWM pulse train corresponding to a given signal is the
intersective PWM: the signal (here the green sinewave) is compared with a sawtooth waveform
(blue). When the latter is less than the former, the PWM signal (magenta) is in high state (1).
Otherwise it is in the low state (0).
The simplest way to generate a PWM signal is the intersective method, which requires only a
sawtooth or a triangle waveform (easily generated using a simple oscillator) and a comparator.
When the value of the reference signal (the green sine wave in figure 2) is more than the modulation
waveform (blue), the PWM signal (magenta) is in the high state, otherwise it is in the low state.
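A software rendering of the intersective method, offered only as an illustration (the sampling rate and the carrier and reference frequencies are arbitrary choices of ours):

#include <math.h>
#include <stdio.h>

#define PI 3.14159265358979323846

/* Intersective PWM: a 50Hz sine reference, scaled into 0..1, is
 * compared against a 1kHz sawtooth carrier; the output is high
 * whenever the reference exceeds the carrier. */
int main(void)
{
    const double fs = 100e3;      /* sample rate, Hz */
    const double f_carrier = 1e3; /* sawtooth frequency, Hz */
    const double f_ref = 50.0;    /* reference sine frequency, Hz */

    for (long n = 0; n < 4000; n++) {          /* two reference periods */
        double t = n / fs;
        double saw = fmod(t * f_carrier, 1.0); /* rises 0..1 each carrier period */
        double ref = 0.5 + 0.5 * sin(2.0 * PI * f_ref * t);
        putchar((ref > saw) ? '1' : '0');
    }
    putchar('\n');
    return 0;
}

The printed bit stream shows the pulse width tracking the instantaneous value of the sine reference, exactly as in figure 2.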
Fig. 3 : Principle of the delta PWM. The output signal (blue) is compared with the limits (green).
These limits correspond to the reference signal (red), offset by a given value. Every time the output
signal reaches one of the limits, the PWM signal changes state.
Fig. 4 : Principle of the sigma-delta PWM. The top green waveform is the reference signal, on which
the output signal (PWM, in the middle plot) is subtracted to form the error signal (blue, in top plot).
This error is integrated (bottom plot), and when the integral of the error exceeds the limits (red
lines), the output changes state.
Fig. 5 : Three types of PWM signals (blue): leading edge modulation (top), trailing edge modulation
(middle) and centered pulses (both edges are modulated, bottom). The green lines are the sawtooth
signals used to generate the PWM waveforms using the intersective method.
The square wave is a uniquely useful function for many applications such as Pulse Width Modulation
(PWM). PWM is widely used in a variety of applications in measurement and digital control. It offers
a simple method for digital control logic to create an analog equivalent.
The majority of microcontrollers today have built-in PWM capability, which facilitates the
implementation of such control. Using PWM in communication systems is also popular due to the
fact that the digital signal is more robust and less vulnerable to noise.
8.2 Concepts of Pulse Width Modulation (PWM)
PWM is a method of digitally encoding analog signal levels. The duty cycle of a square wave is
modulated to encode a specific analog signal level using high-resolution counters. The PWM signal
is still a digital signal because at the given instant of time, the full DC supply is either fully on or
fully off.
The voltage or current source is supplied to the analog load by a repetitive series of ON and OFF
pulses. The ON time is the period when the DC supply is applied to the load, and the OFF time is
the period when the DC supply is switched off. If the available bandwidth is sufficient, any analog
value can be encoded using PWM.
An analog signal has a continuously varying value, with infinite resolution in both time and
magnitude, and it can be used to control many electronic devices directly. For example, in a simple
analog radio, a knob is connected to a variable resistor. When turning the knob, the resistance goes
down or up, and the current flowing through the resistor increases or decreases. Consequently, the
current that drives the speaker is changed proportionally, thus increasing or decreasing the volume.
Although analog control may be considered intuitive and simple, it is not always economically
attractive or practical. Analog circuits tend to drift over time and are very difficult to tune.
Precision analog circuits that solve these problems can be large, heavy, and expensive.
Analog circuits tend to generate heat through the power dissipation. The power dissipated is
proportional to the voltage across the active elements, multiplied by the current that flows through it.
Analog circuitry can also be sensitive to noise because of its infinite resolution; even minor
perturbations of an analog signal can change its value.
By controlling analog circuits digitally, system costs and power consumption can be drastically
reduced. Many microcontrollers and digital signal processors (DSPs) already include PWM
controllers in the chip, thus making implementation easier.
Figure 1 illustrates a circuit established using a battery, a switch and a LED. This circuit turns on the
LED for one second and then turns off the LED for one second using the switch control.
The LED is ON for 50% of the period and OFF the other 50%. The period is defined as the total
time it takes to complete one cycle (from OFF to ON state and back to OFF state).
The signal can be further characterized by the duty cycle, which is the ratio of the “ON” time
divided by the period. A high duty cycle generates a bright LED while a small duty cycle generates
a dimmer LED. The example shown in Figure 1 provides a 50% duty cycle.
In Figure 2, two waveforms with different frequencies produce the same amount of light. Note that
the amount of light is independent of the frequency, but proportional to the duty cycle.
The frequency range you can use to control a circuit is limited by the response time of the circuit.
In the example shown in Figure 1, a low frequency can cause the LED to flash noticeably. A high
frequency, in turn, can cause an inductive load to saturate.
For example, a transformer has a limited frequency range to transfer the energy efficiently. For some
designs, harmonics (or beat frequencies) of the PWM frequency can get coupled into the analog
circuitry, causing unwanted noise. If the right frequency is selected, the load being controlled will
act as a stabilizer, a light will glow continuously and the momentum will allow a rotor to turn
smoothly.
The PWM signals are easy to generate using a comparator with a sine wave as one of the input
signals. Figure 3 shows a sample block diagram of an analog PWM generator.
Figures 4 and 5 show the PWM output waveform (red line) generated by a comparator with two
input signals: a sine wave (black line) and an input signal (gray line). The input signal of 0.5 VDC is
the voltage reference to be compared with the sine wave to produce a PWM waveform.
With the steady-state reference voltage of 0.5 VDC, a PWM waveform with 50% duty cycle is
generated.
If the reference voltage decreases to 0.25 VDC, the generated PWM waveform will have a higher
duty cycle, as shown in Figure 5.
PWM offers several advantages over an analog control. For example, using PWM to control the
brightness of a lamp, the heat dissipated from the lamp is less than the heat generated from an
analog control that converts the current to heat. Hence, less power is delivered to the load (light),
which will prolong the life cycle of the load.
With a higher frequency rate, the light (load) brightness can be controlled as smoothly as an analog
control.
Rotors can operate at a lower speed if they are controlled by PWM. Some of the rotors might not
function with low analog current. When an analog current controls a rotor, it will not produce
significant torque at low speed. The magnetic field that is created by the small current is insufficient
to turn the rotor. On the other hand, a PWM current can create short pulses of magnetic flux at full
strength that enables the rotor to turn at a slow speed.
Combining ON/OFF (1/0) states with the varying voltage and the duty cycle, PWM can produce
output at a desired voltage level. Thus, it can be used as a voltage regulator for many applications.
When the desired voltage level is higher than the output voltage level, the state will be ON (1). On
the other hand, the state will be OFF (0) when the desired voltage level is lower than the output
voltage level. For example, PWM can be applied when a CPLD is used for simple voltage regulation,
or with an FPGA for complex control algorithms using its internal DSP blocks.
In addition, the entire control circuit can be digitized using the PWM technique. This eliminates the
need to use digital-to-analog converters in control circuitries. The digital control lines generated by
PWM reduce the susceptibility of your circuit to the interference.
The technology has become more pervasive as PWM controls are incorporated into low cost
microcontrollers. Microcontrollers offer simple commands to vary the duty cycle and frequency of
the PWM control signal. PWM is also widely used in the communications field because the digital
signals are extremely immune to noise.
The popularity of PWM will continue to grow as the functionality becomes more common in
microcontrollers and development tools. Hence, a solid knowledge of PWM will make it easier to
incorporate into your designs.
Glossary
Duty cycle – the percentage of time of a pulse train at its higher voltage
Pulse width – total time during which the pulse is in the “true state”
1. Why PWM
To control continuous-time systems, we must supply control signals that are continuous in time.
In digital control practice this is done using digital-to-analog converters (DACs). This
option is relatively expensive, and in embedded systems practice it is avoided. PWM has established
itself as a method of generating control signals for continuous-time plants using digital outputs,
which are available in large numbers on any microcontroller.
Although digital outputs offer only logic information, it is the time variable that the PWM
implementation uses to emulate an analog signal.
[Simulink model: a Pulse Generator drives a transfer function 1/(s+1) (the digital command); in parallel, a Constant block of 0.5 drives an identical transfer function 1/(s+1) (the continuous command); both outputs feed a Scope.]
The model above implements the two types of control. On the one hand we have a (configurable)
pulse generator implementing a PWM command. On the other hand, the same system is driven by a
continuous command, a constant value representing the duty cycle (Duty Cycle, Pulse Width). In
the example above, this value is 0.5 (because Pulse Width = 50%).
The parameters of the PWM generator are:
a. the pulse amplitude
b. the pulse period
c. the pulse width (Pulse Width, the duty cycle)
The result of a simulation is:
a. The pulse amplitude is given by the High value of the digital output, being fixed by
construction. Theoretically, it represents the maximum value of the command applied to the plant.
In other words, if we choose a duty cycle of 100%, we apply a continuous quantity with amplitude 1.
b. The pulse period is a PWM parameter that must be adapted to the dynamics of the
controlled system. In our example, we can see that the time distance between the extreme values of
the oscillations is 2 seconds. If we keep the other parameters but reduce the period to 0.4 sec, we
obtain:
In other words, by reducing the PWM period, the controlled system will show oscillations of smaller
amplitude, but at higher frequencies, with consequences for the actuators. To eliminate this
drawback, a filter is preferably introduced at the plant input:
[Simulink model: Pulse Generator (Pulse Width = 50%, PWM period = 0.4) → PWM filter 1/(.2s+1) → transfer function 1/(s+1) (the digital command); in parallel, Constant 0.5 → 1/(s+1) (the continuous command); both outputs feed a Scope.]
This PWM filter must not influence the dynamics of the controlled system, so its time constant must
be smaller (5 times smaller, in our case) than that of the controlled system.
The best recipe is to choose a filter with a time constant 10 times smaller than that of the controlled
system (so that the dynamics of the controlled system are not influenced too much by the PWM
filter), and a PWM period 10 times smaller than that of the filter (good filtering of the PWM), hence
100 times smaller than that of the controlled system:
[Simulink model: Pulse Generator (Pulse Width = 50%, PWM period = 0.01) → PWM filter 1/(.1s+1) → transfer function 1/(s+1); in parallel, Constant 0.5 → 1/(s+1); both outputs feed a Scope.]
Note how the oscillations caused by the PWM have disappeared.
The PWM signal can be decomposed into a Fourier series with fundamental frequency ω_PWM,
where ω_PWM = 2π·f_PWM = 2π/T_PWM.
The formulas for computing the coefficients in this case are given by:
C0 = (1/T_PWM) ∫_{T_PWM} PWM(t) dt
Sn = (2/T_PWM) ∫_{T_PWM} PWM(t)·sin(n·ω_PWM·t) dt
[Bode diagram: magnitude (dB, 0 to −40) and phase (deg, 0 to −90) of the plant 1/(s+1), for frequencies from 10⁻² to 10² rad/sec.]
In the Bode diagram above we have plotted the frequency characteristic of the aperiodic plant,
where we considered ω_PWM = 10⁻¹ rad/sec. Note that the frequency characteristic of the plant lets
the first 16 frequencies of the Fourier expansion pass to the output.
If we increase the PWM frequency, e.g. to ω_PWM = 4·10⁻¹ rad/sec, fewer frequencies will be
found at the plant output.
[Bode diagram: the same plant characteristic over 10⁻² to 10² rad/sec; fewer Fourier components now fall inside the passband.]
If we increase the PWM frequency to ω_PWM = 10 rad/sec, we will see that the plant filters out
practically all the frequencies of the Fourier expansion; only the DC component, which is used for
control, is found at the output.
[Bode diagram: the same plant characteristic; at ω_PWM = 10 rad/sec all PWM harmonics lie well into the stopband.]
c. The Pulse Width (duty cycle) of the PWM acts on the mean value of the command. A
Pulse Width of 70% will drive the system to a steady-state regime varying around the value 0.7.
Indeed, since C0 = (1/T_PWM) ∫_{T_PWM} PWM(t) dt is the DC component of the Fourier
expansion, any system is acted upon by this component together with the other frequencies,
multiples of the fundamental frequency. If those frequencies are filtered out by the plant (as is the
case above), then only the DC component acts on the controlled system, exactly as in the case of a
DAC.
[Simulink model: Pulse Generator (Pulse width = 70%, PWM period = 0.1) → transfer function 1/(s+1) (the digital command); in parallel, Constant 0.7 → 1/(s+1) (the continuous command); both outputs feed a Scope.]
[Block diagram: digital controller (PID) → PWM generator (DAC emulation) → system driven by the PWM signal.]
The digital control algorithm, e.g. a PID, will modulate one of the parameters of the PWM generator.
Since the PWM amplitude is a technologically fixed parameter (TTL amplitude for a digital signal:
0/5V), the parameters used for modulation are the Pulse Width (the width of the pulse, hence the
name PWM) and the period (frequency) of the PWM pulse. The controller output is scaled to
provide a duty cycle of 0%...100% to the PWM generator.
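This scaling step might look like the following sketch, assuming a controller output u normalized to [0, 1] and an 8-bit compare register (PWM_COMPARE and PWM_TOP are illustrative names, not a real device's registers):

#include <stdint.h>

#define PWM_TOP 255u   /* assumed full-scale value of the PWM compare register */

extern volatile uint8_t PWM_COMPARE;   /* hypothetical duty-cycle register */

/* Scale a controller command u (0.0 .. 1.0) to the PWM duty register.
 * Saturation keeps an out-of-range PID output from wrapping around. */
void set_duty_from_command(float u)
{
    if (u < 0.0f) u = 0.0f;
    if (u > 1.0f) u = 1.0f;
    PWM_COMPARE = (uint8_t)(u * PWM_TOP + 0.5f);   /* rounded, 0..255 */
}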
The relation between the controller sampling period Te and the PWM period TPWM must also be
considered.
1. TPWM > Te
In this case, the control system reads the error values and computes the command at a higher
frequency than that of the PWM. Note that some of the command values are computed needlessly
by the controller, since the PWM value changes only at moments that are multiples of the PWM
period.
2. TPWM < Te
In this case, the PWM is sometimes informed needlessly about the value of the duty cycle, which
changes at a lower frequency.
It follows that the most favorable case is the one in which Te = TPWM. In this case, every command
computed by the controller will influence, with a constant delay, the PWM value.
In the other cases, the moments at which the output command can actually change show a variation
(jitter) with respect to the moments k·Te at which the commands are computed, which can
negatively influence the quality of the control.
9.2.1 Resolution
The resolution of an ADC or DAC is determined by the reference input and by the word
width. The resolution defines the smallest voltage change that can be converted. As mentioned
earlier, the resolution is the same as the smallest step size and can be calculated by dividing the
reference voltage by the number of possible conversion values.
For the example we’ve been using so far, an 8-bit ADC with a 5V reference, the resolution is
.0195V (19.5mV). This means that any input voltage below 19.5mV will result in an output of zero.
Input voltages between 19.5mV and 39mV will result in an output of 1. Between 39mV and 58.6mV,
the output will be 2.
Resolution can be improved by reducing the reference input. Changing from 5V to 2.5V
gives a resolution of 2.5/256, or 9.7mV. However, the maximum voltage that can be measured is
now 2.5V instead of 5V.
The only way to increase resolution without changing the reference is to use an ADC with
more bits. A 10-bit ADC using a 5V reference has 2¹⁰, or 1024, possible output codes. Thus, the
resolution is 5/1024, or 4.88mV.
The resolution also has implications for system design, especially in the area of noise. A 0-
to-5V, 10-bit ADC with 4.88mV resolution will respond to 4.88mV of noise just like it will to a DC
input of 4.88mV. If your input signal has 10mV of noise, you will not get anything like 10 bits of
precision unless you take a number of samples and average them. This means you either have to
ensure a very quiet input or allow time for multiple samples.
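The resolution arithmetic above reduces to one line of integer code; a sketch (the function name is ours):

#include <stdint.h>

/* Convert a raw ADC code to millivolts for an n-bit converter:
 * voltage = code * vref / 2^bits. For vref = 5000mV and bits = 10 the
 * step size is 5000/1024 = ~4.88mV, as in the text. Assumes that
 * code * vref_mv fits in 32 bits. */
uint32_t adc_code_to_mv(uint32_t code, uint32_t vref_mv, unsigned bits)
{
    return (code * vref_mv) >> bits;
}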
10. Communication
10.1 UART
Serial transmission of digital information (bits) through a single wire or other medium is
much more cost effective than parallel transmission through multiple wires. A UART is used to
convert the transmitted information between its sequential and parallel form at each end of the link.
Each UART contains a shift register which is the fundamental method of conversion between serial
and parallel forms.
The UART usually does not directly generate or receive the external signals used between
different items of equipment. Typically, separate interface devices are used to convert the logic level
signals of the UART to and from the external signaling levels.
10.2 RS232
Due to its relative simplicity and low hardware overhead (as compared to parallel
interfacing), serial communications is used extensively within the electronics industry. Today, the
most popular serial communications standard in use is certainly the EIA/TIA–232–E specification.
This standard, which has been developed by the Electronic Industry Association and the
Telecommunications Industry Association (EIA/TIA), is more popularly referred to simply as “RS–
232” where “RS” stands for “recommended standard”. In recent years, this prefix has been replaced
with “EIA/TIA” to help identify the source of the standard.
The official name of the EIA/TIA–232–E standard is “Interface Between Data Terminal
Equipment and Data Circuit–Terminating Equipment Employing Serial Binary Data Interchange”.
Although the name may sound intimidating, the standard is simply concerned with serial data
communication between a host system (Data Terminal Equipment, or “DTE”) and a peripheral
system (Data Circuit–Terminating Equipment, or “DCE”).
The EIA/TIA–232–E standard which was introduced in 1962 has been updated four times
since its introduction in order to better meet the needs of serial communication applications. The
letter “E” in the standard’s name indicates that this is the fifth revision of the standard.
RS–232 SPECIFICATIONS
RS–232 is a “complete” standard. This means that the standard sets out to ensure
compatibility between the host and peripheral systems by specifying 1) common voltage and signal
levels, 2) common pin wiring configurations, and 3) a minimal amount of control information
between the host and peripheral systems. Unlike many standards which simply specify the electrical
characteristics of a given interface, RS–232 specifies electrical, functional, and mechanical
characteristics in order to meet the above three criteria.
Because the functional characteristics of the interface are covered by the standard this
essentially means that RS–232 has defined the function of the different signals that are used in the
interface. These signals are divided into four different categories: common, data, control, and
timing. Table 1 illustrates the signals that are defined by the RS–232 standard.
As can be seen from the table there is an overwhelming number of signals defined by the
standard. The standard provides an abundance of control signals and supports a primary and
secondary communications channel. Fortunately few applications, if any, require all of these defined
signals. For example, only eight signals are used for a typical modem. Some simple applications
may require only four signals (two for data and two for handshaking) while others may require only
data signals with no handshaking.
The third area covered by RS–232 concerns the mechanical interface. In particular, RS–232
specifies a 25–pin connector. This is the minimum connector size that can accommodate all of the
signals defined in the functional portion of the standard. The pin assignment for this connector is
shown in Figure 1.6. The connector for DCE equipment is male for the connector housing and
female for the connection pins. Likewise, the DTE connector is a female housing with male
connection pins. Although RS–232 specifies a 25–position connector, it should be noted that often
this connector is not used. This is due to the fact that most applications do not require all of the
defined signals and therefore a 25–pin connector is larger than necessary. This being the case, it is
very common for other connector types to be used. Perhaps the most popular is the 9–position DB9S
connector which is also illustrated in Figure 1.6. This connector provides the means to transmit and
receive the necessary signals for modem applications, for example. This will be discussed in more
detail later.
Most systems designed today do not operate using RS–232 voltage levels. Since this is the
case, level conversion is necessary to implement RS–232 communication. Level conversion is
performed by special RS–232 IC’s. These IC’s typically have line drivers that generate the voltage
levels required by RS–232 and line receivers that can receive RS–232 voltage levels without being
damaged. These line drivers and receivers typically invert the signal as well since a logic 1 is
represented by a low voltage level for RS–232 communication and likewise a logic 0 is represented
by a high voltage level. Figure 1.7 illustrates the function of an RS–232 line driver/receiver in a typical
modem application. In this particular example, the signals necessary for serial communication are
generated and received by the Universal Asynchronous Receiver/Transmitter (UART).
The RS–232 line driver/receiver IC performs the level translation necessary between the
CMOS/TTL and RS–232 interface. The UART just mentioned performs the “overhead” tasks
necessary for asynchronous serial communication. For example, the asynchronous nature of this
type of communication usually requires that start and stop bits be initiated by the host system to
indicate to the peripheral system when communication will start and stop. Parity bits are also often
employed to ensure that the data sent has not been corrupted. The UART usually generates the start,
stop, and parity bits when transmitting data and can detect communication errors upon receiving
data. The UART also functions as the intermediary between byte–wide (parallel) and bit–wide
(serial) communication; it converts a byte of data into a serial bit stream for transmitting and
converts a serial bit stream into a byte of data when receiving.
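As an illustration of these overhead tasks, here is a sketch of a bit-banged transmitter framing one byte with a start bit, eight data bits (LSB first), an even parity bit, and one stop bit; uart_tx_bit, delay_us, and BIT_TIME_US are placeholders for the actual pin driver and baud-rate timing:

#include <stdint.h>

extern void uart_tx_bit(int level);   /* hypothetical TX pin driver */
extern void delay_us(unsigned us);    /* hypothetical busy-wait */
#define BIT_TIME_US 104               /* ~9600 baud: 1/9600 s = ~104us */

/* Transmit one byte: start bit (0), 8 data bits LSB first, even
 * parity bit, one stop bit (1). */
void uart_send_byte(uint8_t b)
{
    int parity = 0;
    uart_tx_bit(0);                   /* start bit */
    delay_us(BIT_TIME_US);
    for (int i = 0; i < 8; i++) {
        int bit = (b >> i) & 1;       /* LSB first */
        parity ^= bit;
        uart_tx_bit(bit);
        delay_us(BIT_TIME_US);
    }
    uart_tx_bit(parity);              /* even parity: total number of 1s is even */
    delay_us(BIT_TIME_US);
    uart_tx_bit(1);                   /* stop bit */
    delay_us(BIT_TIME_US);
}

A hardware UART performs exactly this sequence in logic, which is why the processor only ever sees whole bytes.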
Now that an elementary explanation of the TTL/CMOS to RS–232 interface has been
provided we can consider some “real world” RS–232 applications. It has already been noted that
RS–232 applications rarely follow the RS–232 standard precisely. Perhaps the most significant
reason this is true is due to the fact that many of the defined signals are not necessary for most
applications. As such, the unnecessary signals are omitted. Many applications, such as a modem,
require only nine signals (two data signals, six control signals, and ground). Other applications may
require only five signals (two for data, two for handshaking, and ground), while others may require
only data signals with no handshake control. We will begin our investigation of “real world”
implementations by first considering the typical modem application.
10.3 SPI
There is a MASTER and a SLAVE mode. The MASTER device provides the clock signal
and determines the state of the chip select lines, i.e. it activates the SLAVE it wants to communicate
with. CS and SCLK are therefore outputs.
The SLAVE device receives the clock and chip select from the MASTER; CS and SCLK are
therefore inputs.
This means there is one master, while the number of slaves is only limited by the number of
chip selects.
A SPI device can be anything from a simple shift register up to an independent subsystem. The basic
principle of a shift register is always present. Command codes as well as data values are serially
transferred, pumped into a shift register, and are then internally available for parallel processing.
Here we already see an important point, which must be considered in the philosophy of SPI bus
systems: the length of the shift registers is not fixed, but can differ from device to device. Normally
the shift registers are 8 bits long, or an integral multiple of that. Of course there also exist shift
registers with an odd number of bits. For example, two cascaded 9-bit EEPROMs can store 18 bits of data.
If a SPI device is not selected, its data output goes into a high-impedance state (hi-Z), so that
it does not interfere with the currently activated devices. When cascading several SPI devices, they
are treated as one slave and therefore connected to the same chip select.
Thus there are two meaningful types of connection of master and slave devices. Figure 1.10 shows
the type of connection for cascading several devices.
In Figure 1.10 the cascaded devices are evidently looked at as one larger device and receive
therefore the same chip select. The data output of the preceding device is tied to the data input of the
next, thus forming a wider shift register.
If independent slaves are to be connected to a master another bus structure has to be chosen,
as shown in Figure 1.11. Here, the clock and the SDI data lines are brought to each slave. Also the
SDO data lines are tied together and led back to the master. Only the chip selects are separately
brought to each SPI device.
Figure 1.11 – Master with independent slaves
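The shift-register principle can be captured in a bit-banged master transfer. This is only a sketch under common assumptions (SPI mode 0: clock idle low, data sampled on the rising edge, MSB first; the pin helpers are placeholders for real port accesses):

#include <stdint.h>

extern void sclk_set(int level);   /* hypothetical pin drivers */
extern void sdo_set(int level);
extern int  sdi_get(void);

/* Exchange one byte with the selected slave, SPI mode 0, MSB first.
 * Master and slave shift registers trade their contents bit by bit. */
uint8_t spi_transfer(uint8_t out)
{
    uint8_t in = 0;
    for (int i = 7; i >= 0; i--) {
        sdo_set((out >> i) & 1);   /* present the next output bit */
        sclk_set(1);               /* rising edge: both sides sample */
        in = (uint8_t)((in << 1) | (sdi_get() & 1));
        sclk_set(0);               /* falling edge: shift */
    }
    return in;
}

Because every transfer clocks data in both directions, a "read" is performed by shifting out dummy bits while capturing what the slave returns.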
10.4 LIN
Concept of operation
A cluster consists of one master task and several slave tasks. A master node contains the
master task as well as a slave task. All other slave nodes contain a slave task only. A node may
participate in more than one cluster. The term node relates to a single bus interface of a node if the
node has multiple bus interfaces. A sample cluster with one master node and two slave nodes is
depicted below:
Frames
A frame consists of a header (provided by the master task) and a response (provided by a
slave task).
The header consists of a break field and sync field followed by a frame identifier. The frame
identifier uniquely defines the purpose of the frame. The slave task appointed for providing the
response associated with the frame identifier transmits it, as depicted below. The response consists
of a data field and a checksum field.
The slave tasks interested in the data associated with the frame identifier receive the
response, verify the checksum, and use the data transported.
Schedule table
The master task (in the master node) transmits headers based on a schedule table. The
schedule table specifies the frames and the interval between the start of a frame and the start of the
following frame. The master application may use different schedule tables and select among them.
Signal Management
A signal is transported in the data field of a frame.
Signal Types
A signal is either a scalar value or a byte array.
A scalar signal is between 1 and 16 bits long. A one bit scalar signal is called a Boolean
signal. Scalar signals in the size of 2 to 16 bits are treated as unsigned integers.
A byte array is an array of between one and eight bytes.
Each signal has exactly one publisher, i.e. it is always written by the same node in the
cluster. Zero, one or multiple nodes may subscribe to the signal.
All signals have initial values. The initial value for a published signal is valid until the node
writes a new value to this signal. The initial value for a subscribed signal is valid until a new
updated value is received from another node.
Signal Consistency
Scalar signal writing or reading must be atomic operations, i.e. it should never be possible for
an application to receive a signal value that is partly updated. This also applies to byte arrays.
However, no consistency is guaranteed between any signals.
Signal Packing
A signal is transmitted with the LSB first and the MSB last. There is no restriction on
packing scalar signals over byte boundaries. Each byte in a byte array shall map to a single frame
byte starting with the lowest numbered data byte.
Several signals can be packed into one frame as long as they do not overlap each other.
Note that signal packing/unpacking is implemented more efficiently in software-based nodes if
signals are byte aligned and/or if they do not cross byte boundaries.
The same signal can be packed into multiple frames as long as the publisher of the signal is
the same. If a node is receiving one signal packed into multiple frames the latest received signal
value is valid. Handling the same signal packed into frames on different LIN clusters is out of the
scope.
Frame Structure
The structure of a frame is shown in Figure 1.13. The frame is constructed of a number of
fields, one break field followed by four to eleven byte fields, labeled as in the figure. The time it
takes to send a frame is the sum of the time to send each byte plus the response space and the inter-
byte spaces.
The header starts at the falling edge of the break field and ends after the end of the stop bit of
the protected identifier (PID) field. The response starts at the end of the stop bit of the PID field and
ends after the stop bit of the checksum field.
The inter-byte space is the time between the end of the stop bit of the preceding field and the
start bit of the following byte. The response space is the inter-byte space between the PID field and
the first data field in the data. Both of them must be non-negative.
A slave task shall always be able to detect the break/sync field sequence, even if it expects a
byte field (assuming the byte fields are separated from each other). A desired, but not required,
feature is to detect the break/sync field sequence even if the break is partially superimposed with a
data byte. When a break/sync field sequence happens, the transfer in progress shall be aborted and
processing of the new frame shall commence.
Protected identifier field
A protected identifier field consists of two sub-fields; the frame identifier and the parity. Bits
0 to 5 are the frame identifier and bits 6 and 7 are the parity.
Frame identifier
Six bits are reserved for the frame identifier; values in the range 0 to 63 can be used. The
frame identifiers are split in three categories:
• Values 0 to 59 (0x3B) are used for signal carrying frames,
• 60 (0x3C) and 61 (0x3D) are used to carry diagnostic and configuration data,
• 62 (0x3E) and 63 (0x3F) are reserved for future protocol enhancements.
Parity
The parity is calculated on the frame identifier bits as shown in equations (1) and (2):
P0 = ID0 ⊕ ID1 ⊕ ID2 ⊕ ID4 (1)
P1 = ¬(ID1 ⊕ ID3 ⊕ ID4 ⊕ ID5) (2)
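In C, the protected identifier byte follows directly from these equations (a sketch; the function name is ours):

#include <stdint.h>

/* Build the protected identifier byte from a 6-bit frame identifier.
 * Bits 0..5 carry the identifier, bit 6 is P0, bit 7 is P1. */
uint8_t lin_pid(uint8_t id)
{
    id &= 0x3F;   /* 6-bit identifier */
    uint8_t p0 = ((id >> 0) ^ (id >> 1) ^ (id >> 2) ^ (id >> 4)) & 1u;
    uint8_t p1 = (uint8_t)(~((id >> 1) ^ (id >> 3) ^ (id >> 4) ^ (id >> 5))) & 1u;
    return (uint8_t)(id | (p0 << 6) | (p1 << 7));
}

For example, identifiers 60 (0x3C) and 61 (0x3D) yield the protected identifiers 0x3C and 0x7D.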
Mapping
The mapping of the bits (ID0 to ID5 and P0 and P1) is shown in Figure 1.17.
Data
A frame carries between one and eight bytes of data. The number of data contained in a
frame with a specific frame identifier shall be agreed by the publisher and all subscribers. A data
byte is transmitted as part of a byte field, see Figure 1.14.
For data entities longer than one byte, the entity LSB is contained in the byte sent first and
the entity MSB in the byte sent last (little-endian). The data fields are labeled data 1, data 2... up to
maximum data 8, see Figure 1.18.
Checksum
The last field of a frame is the checksum. The checksum contains the inverted eight bit sum
with carry over all data bytes or all data bytes and the protected identifier. Checksum calculation
over the data bytes only is called classic checksum and it is used for the master request frame, slave
response frame and communication with LIN 1.x slaves.
Eight bit sum with carry is equivalent to summing all values and subtracting 255 every time the sum
is greater than or equal to 256. See section 2.8.3 for examples of how to calculate the checksum.
Checksum calculation over the data bytes and the protected identifier byte is called enhanced
checksum and it is used for communication with LIN 2.x slaves.
The checksum is transmitted in a byte field. Use of classic or enhanced checksum is managed by the
master node and it is determined per frame identifier; classic in communication with LIN 1.x slave
nodes and enhanced in communication with LIN 2.x slave nodes.
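Both checksum variants follow directly from that description; a sketch (names ours):

#include <stdint.h>
#include <stddef.h>

/* LIN checksum: inverted eight-bit sum with carry. For the classic
 * checksum pass pid = 0 so that only the data bytes enter the sum;
 * for the enhanced checksum pass the protected identifier byte. */
uint8_t lin_checksum(uint8_t pid, const uint8_t *data, size_t len)
{
    uint16_t sum = pid;
    for (size_t i = 0; i < len; i++) {
        sum += data[i];
        if (sum >= 256)
            sum -= 255;   /* equivalent to adding the carry back in */
    }
    return (uint8_t)(~sum & 0xFF);   /* transmitted inverted */
}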
10.5 CAN
The Controller Area Network (CAN) is a serial communications protocol which efficiently
supports distributed real-time control with a very high level of security. Its domain of application
ranges from high speed networks to low cost multiplex wiring. In automotive electronics, engine
control units, sensors, anti-skid-systems, etc. are connected using CAN with bitrates up to 1 Mbit/s.
At the same time it is cost effective to build it into vehicle body electronics, e.g. lamp clusters,
electric windows, etc., to replace the wiring harness otherwise required.
The intention of this specification is to achieve compatibility between any two CAN
implementations. Compatibility, however, has different aspects regarding e.g. electrical features and
the interpretation of data to be transferred. To achieve design transparency and implementation
flexibility CAN has been subdivided into different layers according to the ISO/OSI Reference
Model:
• The Data Link Layer
- The Logical Link Control (LLC) sub-layer
- The Medium Access Control (MAC) sub-layer
• The Physical Layer
Note that in previous versions of the CAN specification the services and functions of the
LLC and MAC sub-layers of the Data Link Layer had been described in layers denoted as ’object
layer’ and ’transfer layer’. The scope of the LLC sub-layer is
• To provide services for data transfer and for remote data request,
• To decide which messages received by the LLC sub-layer are actually to be accepted,
• To provide means for recovery management and overload notifications.
There is much freedom in defining object handling. The scope of the MAC sub-layer is mainly
the transfer protocol, i.e. controlling the Framing, performing Arbitration, Error Checking, Error
Signaling and Fault Confinement. Within the MAC sub-layer it is decided whether the bus is free for
starting a new transmission or whether a reception is just starting. Also some general features of the
bit timing are regarded as part of the MAC sub-layer. It is in the nature of the MAC sub-layer that
there is no freedom for modifications.
The scope of the physical layer is the actual transfer of the bits between the different nodes
with respect to all electrical properties. Within one network the physical layer, of course, has to be
the same for all nodes. There may be, however, much freedom in selecting a physical layer.
The scope of this specification is to define the MAC sub-layer and a small part of the LLC
sub-layer of the Data Link Layer and to describe the consequences of the CAN protocol on the
surrounding layers.
Basic Concepts
CAN has the following properties:
• Prioritization of messages
• Guarantee of latency times
• Configuration flexibility
• Multicast reception with time synchronization
• System wide data consistency
• Multi-master
• Error detection and signaling
• Automatic retransmission of corrupted messages as soon as the bus is idle again
• Distinction between temporary errors and permanent failures of nodes, and autonomous
switching off of defective nodes.
Layered Architecture of CAN according to the OSI Reference Model
• The Physical Layer defines how signals are actually transmitted and therefore deals with
the description of Bit Timing, Bit Encoding, and Synchronization. Within this specification the
Driver/Receiver Characteristics of the Physical Layer are not defined so as to allow transmission
medium and signal level implementations to be optimized for their application.
• The MAC sub-layer represents the kernel of the CAN protocol. It presents messages
received from the LLC sub-layer and accepts messages to be transmitted to the LLC sub-layer. The
MAC sub-layer is responsible for Message Framing, Arbitration, Acknowledgment, Error Detection
and Signaling. The MAC sub-layer is supervised by a management entity called Fault Confinement,
which is a self-checking mechanism for distinguishing short disturbances from permanent failures.
• The LLC sub-layer is concerned with Message Filtering, Overload Notification and
Recovery Management.
The scope of this specification is to define the Data Link Layer and the consequences of the
CAN protocol on the surrounding layers.
Messages
Information on the bus is sent in fixed format messages of different but limited length. When
the bus is free any connected unit may start to transmit a new message.
Information Routing
In CAN systems a CAN node does not make use of any information about the system
configuration (e.g. station addresses). This has several important consequences.
System Flexibility: Nodes can be added to the CAN network without requiring any change in
the software or hardware of any node and application layer.
Message Routing: The content of a message is named by an IDENTIFIER. The
IDENTIFIER does not indicate the destination of the message, but describes the meaning of the
data, so that all nodes in the network are able to decide by Message Filtering whether the data is to
be acted upon by them or not.
Multicast: As a consequence of the concept of Message Filtering any number of nodes can
receive and simultaneously act upon the same message.
Data Consistency: Within a CAN network it is guaranteed that a message is simultaneously
accepted either by all nodes or by no node. Thus data consistency of a system is achieved by the
concepts of multicast and by error handling.
Bit rate
The speed of CAN may be different in different systems. However, in a given system the bit-rate is
uniform and fixed.
Priorities
The IDENTIFIER defines a static message priority during bus access.
Remote Data Request
By sending a REMOTE FRAME a node requiring data may request another node to send the
corresponding DATA FRAME. The DATA FRAME and the corresponding REMOTE FRAME are
named by the same IDENTIFIER.
Multi-master
When the bus is free any unit may start to transmit a message. The unit with the message of
higher priority to be transmitted gains bus access.
Arbitration
Whenever the bus is free, any unit may start to transmit a message. If 2 or more units start
transmitting messages at the same time, the bus access conflict is resolved by bitwise arbitration
using the IDENTIFIER. The mechanism of arbitration guarantees that neither information nor time
is lost. If a DATA FRAME and a REMOTE FRAME with the same IDENTIFIER are initiated at the
same time, the DATA FRAME prevails over the REMOTE FRAME. During arbitration every
transmitter compares the level of the bit transmitted with the level that is monitored on the bus. If
these levels are equal the unit may continue to send. When a ’recessive’ level is sent and a
’dominant’ level is monitored (see Bus Values), the unit has lost arbitration and must withdraw
without sending one more bit.
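The following small C sketch (purely illustrative, not part of the specification text) mimics this mechanism on a wired-AND bus, where ’dominant’ is 0 and ’recessive’ is 1; the transmitter with the lowest identifier therefore survives arbitration:

    #include <stdio.h>

    /* Simulate bitwise arbitration among n transmitters (n <= 8), MSB first.
       Returns the index of the winning transmitter. */
    int arbitrate(const unsigned ids[], int n, int id_bits)
    {
        int alive[8];
        for (int i = 0; i < n; i++)
            alive[i] = 1;

        for (int bit = id_bits - 1; bit >= 0; bit--) {
            unsigned bus = 1;                        /* recessive by default   */
            for (int i = 0; i < n; i++)
                if (alive[i] && ((ids[i] >> bit) & 1u) == 0)
                    bus = 0;                         /* dominant wins the bus  */
            for (int i = 0; i < n; i++)              /* sent recessive but     */
                if (alive[i] && ((ids[i] >> bit) & 1u) == 1 && bus == 0)
                    alive[i] = 0;                    /* monitored dominant:    */
        }                                            /* arbitration lost       */
        for (int i = 0; i < n; i++)
            if (alive[i])
                return i;
        return -1;
    }

    int main(void)
    {
        unsigned ids[] = { 0x120, 0x240, 0x7FF };    /* 11-bit identifiers */
        printf("winner: node %d\n", arbitrate(ids, 3, 11)); /* node 0 wins */
        return 0;
    }

Note how neither information nor time is lost: the winning frame continues undisturbed, exactly as if it had been alone on the bus.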
Safety
In order to achieve the utmost safety of data transfer, powerful measures for error detection,
signaling and self-checking are implemented in every CAN node.
Error Detection
For detecting errors the following measures have been taken:
- Monitoring (transmitters compare the bit levels to be transmitted with the bit levels detected
on the bus)
- Cyclic Redundancy Check
- Bit Stuffing
- Message Frame Check
Performance of Error Detection
The error detection mechanisms have the following properties:
- All global errors are detected.
- All local errors at transmitters are detected.
- Up to 5 randomly distributed errors in a message are detected.
- Burst errors of length less than 15 in a message are detected.
- Errors of any odd number in a message are detected.
Total residual error probability for undetected corrupted messages: less than (message error
rate) × 4.7 × 10⁻¹¹.
Error Signaling and Recovery Time
Corrupted messages are flagged by any node detecting an error. Such messages are aborted
and will be retransmitted automatically. The recovery time from detecting an error until the start of
the next message is at most 31 bit times, if there is no further error.
Fault Confinement
CAN nodes are able to distinguish short disturbances from permanent failures.
Defective nodes are switched off.
Connections
The CAN serial communication link is a bus to which a number of units may be connected.
This number has no theoretical limit. Practically the total number of units will be limited by delay
times and/or electrical loads on the bus line.
Single Channel
The bus consists of a single channel that carries bits. From this data resynchronization
information can be derived. The way in which this channel is implemented is not fixed in this
specification. E.g. single wire (plus ground), two differential wires, optical fibers, etc.
Bus values
The bus can have one of two complementary logical values: ’dominant’ or ’recessive’.
During simultaneous transmission of ’dominant’ and ’recessive’ bits, the resulting bus value will be
’dominant’. For example, in case of a wired-AND implementation of the bus, the ’dominant’ level
would be represented by a logical ’0’ and the ’recessive’ level by a logical ’1’. Physical states (e.g.
electrical voltage, light) that represent the logical levels are not given in this specification.
Acknowledgment
All receivers check the consistency of the message being received and will acknowledge a
consistent message and flag an inconsistent message.
Sleep Mode / Wake-up
To reduce the system’s power consumption, a CAN-device may be set into sleep mode
without any internal activity and with disconnected bus drivers. The sleep mode is finished with a
wake-up by any bus activity or by internal conditions of the system. On wake-up, the internal
activity is restarted, although the MAC sub-layer will be waiting for the system’s oscillator to
stabilize and it will then wait until it has synchronized itself to the bus activity (by checking for
eleven consecutive ’recessive’ bits), before the bus drivers are set to "on-bus" again.
Message Transfer
Frame Formats
There are two different formats which differ in the length of the IDENTIFIER field: frames
with an 11-bit IDENTIFIER are denoted Standard Frames, while frames containing a 29-bit
IDENTIFIER are denoted Extended Frames.
Frame Types
Message transfer is manifested and controlled by four different frame types:
- A DATA FRAME carries data from a transmitter to the receivers.
- A REMOTE FRAME is transmitted by a bus unit to request the transmission of the DATA
FRAME with the same IDENTIFIER.
- An ERROR FRAME is transmitted by any unit on detecting a bus error.
- An OVERLOAD FRAME is used to provide for an extra delay between the preceding and
the succeeding DATA or REMOTE FRAMEs.
DATA FRAMEs and REMOTE FRAMEs can be used both in Standard Frame Format and
Extended Frame Format; they are separated from preceding frames by an INTERFRAME SPACE.
DATA FRAME
A DATA FRAME is composed of seven different bit fields: START OF FRAME,
ARBITRATION FIELD, CONTROL FIELD, DATA FIELD, CRC FIELD, ACK FIELD, and END
OF FRAME. The DATA FIELD can be of length zero.
Base ID
The Base ID consists of 11 bits. It is transmitted in the order from ID-28 to ID-18. It is
equivalent to the format of the Standard Identifier. The Base ID defines the Extended Frame’s base
priority.
Extended ID
The Extended ID consists of 18 bits. It is transmitted in the order of ID-17 to ID-0.
The CRC FIELD (Standard Format as well as Extended Format) contains the CRC SEQUENCE
followed by a CRC DELIMITER.
After the transmission / reception of the last bit of the DATA FIELD, CRC_RG contains the
CRC sequence.
CRC DELIMITER (Standard Format as well as Extended Format): the CRC SEQUENCE is
followed by the CRC DELIMITER, which consists of a single ’recessive’ bit.
ACK DELIMITER
The ACK DELIMITER is the second bit of the ACK FIELD and has to be a ’recessive’ bit. As a
consequence, the ACK SLOT is surrounded by two ’recessive’ bits (CRC
DELIMITER, ACK DELIMITER).
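As an informal illustration of the fields discussed above, the logical content of a frame can be captured in a plain C structure (this is a data structure of ours, not a controller register layout):

    #include <stdint.h>

    /* Logical content of a DATA FRAME / REMOTE FRAME. */
    struct can_frame {
        uint32_t identifier;  /* 11-bit Base ID, or 29 bits in Extended Format */
        uint8_t  extended;    /* 0 = Standard Frame Format, 1 = Extended       */
        uint8_t  remote;      /* 1 = REMOTE FRAME (carries no data)            */
        uint8_t  dlc;         /* data length code: 0 to 8 data bytes           */
        uint8_t  data[8];     /* DATA FIELD; length zero is allowed            */
        uint16_t crc;         /* 15-bit CRC SEQUENCE                           */
    };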
11.2 Compiler
A compiler is a computer program (or set of programs) that translates text written in a
computer language (the source language) into another language (the target language). The original
sequence is usually called the source code and the output called object code. Commonly the output
has a form suitable for processing by other programs (e.g., a linker), but it may be a human-readable
text file.
The most common reason for wanting to translate source code is to create an executable
program. The name "compiler" is primarily used for programs that translate source code from a
high-level programming language to a lower level language (e.g., assembly language or machine
language). A program that translates from a low level language to a higher level one is a decompiler.
A program that translates between high-level languages is usually called a language translator,
source to source translator, or language converter. A language rewriter is usually a program that
translates the form of expressions without a change of language.
A compiler is likely to perform many or all of the following operations: lexical analysis,
preprocessing, parsing, semantic analysis, code generation, and code optimization.
11.3 Linker
A linker or link editor is a program that takes one or more objects generated by compilers
and assembles them into a single executable program.
Linkers can take objects from a collection called a library. Some linkers do not include the
whole library in the output; they only include its symbols that are referenced from other object files
or libraries. Libraries exist for diverse purposes, and one or more system libraries are usually linked
in by default.
The linker also takes care of arranging the objects in a program's address space. This may
involve relocating code that assumes a specific base address to another base. Since a compiler
seldom knows where an object will reside, it often assumes a fixed base location (for example,
zero). Relocating machine code may involve re-targeting of absolute jumps, loads and stores.
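A minimal sketch of this idea, assuming the compiler emitted the code as if it were loaded at base address zero and recorded the offsets of all absolute address fields in a fix-up table (all names here are illustrative):

    #include <stdint.h>
    #include <stddef.h>
    #include <string.h>

    /* Add the real load base to every absolute address recorded in the
       fix-up table. memcpy avoids unaligned-access problems. */
    void relocate(uint8_t *image, const size_t *fixups, size_t nfix, uint32_t base)
    {
        for (size_t i = 0; i < nfix; i++) {
            uint32_t addr;
            memcpy(&addr, image + fixups[i], sizeof addr);
            addr += base;                  /* re-target the absolute reference */
            memcpy(image + fixups[i], &addr, sizeof addr);
        }
    }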
11.4 Debugger
A debugger is a program that is used to test and debug other programs. The code to be
examined might alternatively be running on an instruction set simulator (ISS), a technique that
allows great power in its ability to halt when specific conditions are encountered but which will
typically be much slower than executing the code directly on the appropriate processor.
When the program crashes, the debugger shows the position in the original code if it is a
source-level debugger or symbolic debugger, commonly seen in integrated development
environments. If it is a low-level debugger or a machine-language debugger it shows the line in the
disassembly. (A "crash" happens when the program cannot continue because of a programming bug.
For example, perhaps the program tried to use an instruction not available on the current version of
the CPU or attempted access to unavailable or protected memory.)
Typically, debuggers also offer more sophisticated functions such as running a program step
by step (single-stepping), stopping (breaking) (pausing the program to examine the current state) at
some kind of event by means of breakpoint, and tracking the values of some variables. Some
debuggers have the ability to modify the state of the program while it is running, rather than merely
to observe it.
The importance of a good debugger cannot be overstated. Indeed, the existence and quality
of such a tool for a given language and platform can often be the deciding factor in its use, even if
another language/platform is better-suited to the task. However, it is also important to note that
software can (and often does) behave differently running under a debugger than normally, due to the
inevitable changes the presence of a debugger will make to a software program's internal timing. As
a result, even with a good debugging tool, it is often very difficult to track down runtime problems
in complex multi-threaded or distributed systems.
Examples of debuggers: CodeView, DBG - A PHP Debugger and Profiler, DDD - Data
Display Debugger, Eclipse, TotalView, GNU Debugger (GDB), Insight, Interactive Disassembler.
A real-time operating system (RTOS) is the key to many embedded systems today and
provides a software platform upon which to build applications. Not all embedded systems are
designed with an RTOS. Some embedded systems with relatively simple hardware or a small
amount of software application code might not require an RTOS. Many embedded systems with
moderate-to-large software applications require some form of scheduling, and these systems require
an RTOS.
Figure 2.1: High-level view of an RTOS, its kernel, and other components found in embedded
systems.
This diagram is highly simplified; remember that not all RTOS kernels conform to this exact
set of objects, scheduling algorithms, and services.
The scheduler is at the heart of every kernel. A scheduler provides the algorithms needed to
determine which task executes when. To understand how scheduling works, this section describes
the following topics:
• schedulable entities,
• multitasking,
• context switching,
• dispatcher, and
• scheduling algorithms.
12.3.1 Schedulable Entities
A schedulable entity is a kernel object that can compete for execution time on a system,
based on a predefined scheduling algorithm. Tasks and processes are all examples of schedulable
entities found in most kernels.
A task is an independent thread of execution that contains a sequence of independently
schedulable instructions. Some kernels provide another type of a schedulable object called a process.
Processes are similar to tasks in that they can independently compete for CPU execution time.
Processes differ from tasks in that they provide better memory protection features, at the expense of
performance and memory overhead. Note that message queues and semaphores are not schedulable
entities. These items are inter-task communication objects used for synchronization and
communication.
So, how exactly does a scheduler handle multiple schedulable entities that need to run
simultaneously? The answer is by multitasking. The multitasking discussions are carried out in the
context of uniprocessor environments.
12.3.2 Multitasking
Multitasking is the ability of the operating system to handle multiple activities within set
deadlines. A real-time kernel might have multiple tasks that it has to schedule to run. One such
multitasking scenario is illustrated in Figure 2.3.
In this scenario, the kernel multitasks in such a way that many threads of execution appear to
be running concurrently; however, the kernel is actually interleaving executions sequentially, based
on a preset scheduling algorithm. The scheduler must ensure that the appropriate task runs at the
right time. An important point to note here is that the tasks follow the kernel’s scheduling algorithm,
while interrupt service routines (ISR) are triggered to run because of hardware interrupts and their
established priorities.
As the number of tasks to schedule increases, so do CPU performance requirements. This
fact is due to increased switching between the contexts of the different threads of execution.
Each task has its own context, which is the state of the CPU registers required each time it is
scheduled to run. A context switch occurs when the scheduler switches from one task to another. To
better understand what happens during a context switch, let’s examine further what a typical kernel
does in this scenario. Every time a new task is created, the kernel also creates and maintains an
associated task control block (TCB). TCBs are system data structures that the kernel uses to
maintain task-specific information. TCBs contain everything a kernel needs to know about a
particular task.
When a task is running, its context is highly dynamic. This dynamic context is maintained in
the TCB. When the task is not running, its context is frozen within the TCB, to be restored the next
time the task runs. A typical context switch scenario is illustrated in Figure 3. As shown in Figure 3,
when the kernel’s scheduler determines that it needs to stop running task 1 and start running task 2,
it takes the following steps:
1. The kernel saves task 1’s context information in its TCB.
2. It loads task 2’s context information from its TCB, which becomes the current thread of
execution.
3. The context of task 1 is frozen while task 2 executes, but if the scheduler needs to run task
1 again, task 1 continues from where it left off just before the context switch.
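A highly simplified C sketch of these steps, assuming hypothetical architecture-specific routines cpu_save() and cpu_restore() that save and reload the CPU registers (a real kernel implements this part in assembly):

    #include <stdint.h>

    /* Assumed architecture-specific register save/restore routines. */
    extern void cpu_save(uint32_t *regs);
    extern void cpu_restore(const uint32_t *regs);

    /* A minimal task control block: the frozen context plus scheduling data. */
    typedef struct {
        uint32_t registers[16];   /* CPU context, frozen while not running */
        int      priority;
        int      state;           /* ready, running, or blocked            */
    } tcb_t;

    void context_switch(tcb_t *from, tcb_t *to)
    {
        cpu_save(from->registers);   /* step 1: save task 1's context in its TCB   */
        cpu_restore(to->registers);  /* step 2: load task 2's context; task 2 runs */
        /* step 3: when 'from' is scheduled again, it resumes right here */
    }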
The time it takes for the scheduler to switch from one task to another is the context switch
time. It is relatively insignificant compared to most operations that a task performs. If an
application’s design includes frequent context switching, however, the application can incur
unnecessary performance overhead. Therefore, design applications in a way that does not involve
excess context switching. Every time an application makes a system call, the scheduler has an
opportunity to determine if it needs to switch contexts. When the scheduler determines a context
switch is necessary, it relies on an associated module, called the dispatcher, to make that switch
happen.
The dispatcher is the part of the scheduler that performs context switching and changes the
flow of execution. At any time an RTOS is running, the flow of execution, also known as flow of
control, is passing through one of three areas: through an application task, through an ISR, or
through the kernel. When a task or ISR makes a system call, the flow of control passes to the kernel
to execute one of the system routines provided by the kernel. When it is time to leave the kernel, the
dispatcher is responsible for passing control to one of the tasks in the user’s application. It will not
necessarily be the same task that made the system call. It is the scheduling algorithms of the
scheduler that determines which task executes next. It is the dispatcher that does the actual work of
context switching and passing execution control.
Depending on how the kernel is first entered, dispatching can happen differently. When a
task makes system calls, the dispatcher is used to exit the kernel after every system call completes.
In this case, the dispatcher is used on a call-by-call basis so that it can coordinate task-state
transitions that any of the system calls might have caused. (One or more tasks may have become
ready to run, for example.)
On the other hand, if an ISR makes system calls, the dispatcher is bypassed until the ISR
fully completes its execution. This process is true even if some resources have been freed that would
normally trigger a context switch between tasks. These context switches do not take place because
the ISR must complete without being interrupted by tasks. After the ISR completes execution, the
kernel exits through the dispatcher so that it can then dispatch the correct task.
As mentioned earlier, the scheduler determines which task runs by following a scheduling
algorithm (also known as scheduling policy). Most kernels today support two common scheduling
algorithms:
• preemptive priority-based scheduling, and
• round-robin scheduling.
The RTOS manufacturer typically predefines these algorithms; however, in some cases,
developers can create and define their own scheduling algorithms. Each algorithm is described next.
Real-time kernels generally support 256 priority levels, in which 0 is the highest and 255 the
lowest. Some kernels appoint the priorities in reverse order, where 255 is the highest and 0 the
lowest. Regardless, the concepts are basically the same. With a preemptive priority-based scheduler,
each task has a priority, and the highest-priority task runs first. If a task with a priority higher than
the current task becomes ready to run, the kernel immediately saves the current task’s context in its
TCB and switches to the higher-priority task. As shown in Figure 4, task 1 is preempted by higher-
priority task 2, which is then preempted by task 3. When task
3 completes, task 2 resumes; likewise, when task 2 completes, task 1 resumes.
Although tasks are assigned a priority when they are created, a task’s priority can be changed
dynamically using kernel-provided calls. The ability to change task priorities dynamically allows an
embedded application the flexibility to adjust to external events as they occur, creating a true real-
time, responsive system. Note, however, that misuse of this capability can lead to priority
inversions, deadlock, and eventual system failure.
Round-Robin Scheduling
Round-robin scheduling provides each task an equal share of the CPU execution time. Pure
round-robin scheduling cannot satisfy real-time system requirements because in real-time systems,
tasks perform work of varying degrees of importance. Instead, preemptive, priority-based scheduling
can be augmented with round-robin scheduling which uses time slicing to achieve equal allocation
of the CPU for tasks of the same priority as shown in Figure 2.5.
With time slicing, each task executes for a defined interval, or time slice, in an ongoing
cycle, which is the round robin. A run-time counter tracks the time slice for each task, incrementing
on every clock tick. When one task’s time slice completes, the counter is cleared, and the task is
placed at the end of the cycle. Newly added tasks of the same priority are placed at the end of the
cycle, with their run-time counters initialized to 0.
If a task in a round-robin cycle is preempted by a higher-priority task, its run-time count is
saved and then restored when the interrupted task is again eligible for execution. This idea is
illustrated in Figure 5, in which task 1 is preempted by a higher-priority task 4 but resumes where it
left off when task 4 completes.
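A sketch of the run-time counter mechanics for tasks sharing one priority level (the task set and tick handler here are illustrative):

    #define TIME_SLICE_TICKS 10   /* length of one time slice, in clock ticks */
    #define NTASKS 3              /* tasks sharing this priority level        */

    static int run_ticks[NTASKS]; /* per-task run-time counters               */
    static int current = 0;       /* index of the task now running            */

    /* Called on every clock tick; returns the task that should run next. */
    int round_robin_tick(void)
    {
        if (++run_ticks[current] >= TIME_SLICE_TICKS) {
            run_ticks[current] = 0;               /* counter is cleared         */
            current = (current + 1) % NTASKS;     /* task moves to end of cycle */
        }
        return current;
    }

Preemption by a higher-priority task would simply leave run_ticks[current] untouched, so the interrupted task resumes its slice later, as described above.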
12.4 Objects
Kernel objects are special constructs that are the building blocks for application development
for real-time embedded systems. The most common RTOS kernel objects are
• Tasks are concurrent and independent threads of execution that can compete for CPU execution
time.
• Semaphores are token-like objects that can be incremented or decremented by tasks for
synchronization or mutual exclusion.
• Message Queues are buffer-like data structures that can be used for synchronization, mutual
exclusion, and data exchange by passing messages between tasks. Developers creating real-time
embedded applications can combine these basic kernel objects (as well as others not mentioned
here) to solve common real-time design problems, such as concurrency, activity synchronization,
and data communication. These design problems and the kernel objects used to solve them are
discussed in more detail in later chapters.
12.4.1 Tasks
12.4.1.1 Introduction
Simple software applications are typically designed to run sequentially, one instruction at a
time, in a pre-determined chain of instructions. However, this scheme is inappropriate for real-time
embedded applications, which generally handle multiple inputs and outputs within tight time
constraints. Real-time embedded software applications must be designed for concurrency.
Concurrent design requires developers to decompose an application into small, schedulable,
and sequential program units. When done correctly, concurrent design allows system multitasking to
meet performance and timing requirements for a real-time system. Most RTOS kernels provide task
objects and task management services to facilitate designing concurrency within an application.
A task is an independent thread of execution that can compete with other concurrent tasks for
processor execution time. As mentioned earlier, developers decompose applications into multiple concurrent
tasks to optimize the handling of inputs and outputs within set time constraints.
A task is schedulable. The task is able to compete for execution time on a system, based on a
predefined scheduling algorithm. A task is defined by its distinct set of parameters and supporting
data structures. Specifically, upon creation, each task has an associated name, a unique ID, a priority
(if part of a preemptive scheduling plan), a task control block (TCB), a stack, and a task routine, as
shown in Figure 2.6.
Together, these components make up what is known as the task object.
Figure 2.6: A task, its associated parameters, and supporting data structures.
When the kernel first starts, it creates its own set of system tasks and allocates the
appropriate priority for each from a set of reserved priority levels. The reserved priority levels refer
to the priorities used internally by the RTOS for its system tasks. An application should avoid using
these priority levels for its tasks because running application tasks at such levels may affect the
overall system performance or behavior. For most RTOSes, these reserved priorities are not
enforced. The kernel needs its system tasks and their reserved priority levels to operate. These
priorities should not be modified. Examples of system tasks include:
• the initialization or startup task initializes the system and creates and starts system tasks,
• the idle task uses up processor idle cycles when no other activity is present,
• the logging task logs system messages,
• the exception-handling task handles exceptions, and
• the debug agent task allows debugging with a host debugger. Note that other system tasks
might be created during initialization, depending on what other components are included with the
kernel.
The idle task, which is created at kernel startup, is one system task that bears mention and
should not be ignored. The idle task is set to the lowest priority, typically executes in an endless
loop, and runs when either no other task can run or when no other tasks exist, for the sole purpose of
using idle processor cycles. The idle task is necessary because the processor executes the instruction
to which the program counter register points while it is running. Unless the processor can be
suspended, the program counter must still point to valid instructions even when no tasks exist in the
system or when no tasks can run. Therefore, the idle task ensures the processor program counter is
always valid when no other tasks are running.
In some cases, however, the kernel might allow a user-configured routine to run instead of
the idle task in order to implement special requirements for a particular application. One example of
a special requirement is power conservation. When no other tasks can run, the kernel can switch
control to the user-supplied routine instead of to the idle task. In this case, the user-supplied routine
acts like the idle task but instead initiates power conservation code, such as system suspension, after
a period of idle time.
After the kernel has initialized and created all of the required tasks, the kernel jumps to a
predefined entry point (such as a predefined function) that serves, in effect, as the beginning of the
application. From the entry point, the developer can initialize and create other application tasks , as
well as other kernel objects, which the application design might require. As the developer creates
new tasks, the developer must assign each a task name, priority, stack size, and a task routine. The
kernel does the rest by assigning each task a unique ID and creating an associated TCB and stack
space in memory for it.
Whether it's a system task or an application task, at any time each task exists in one of a small
number of states, including ready, running, or blocked. As the real-time embedded system runs, each
task moves from one state to another, according to the logic of a simple finite state machine (FSM).
Figure 2.7 illustrates a typical FSM for task execution states, with brief descriptions of state
transitions.
Figure 2.7: A typical finite state machine for task execution states.
Although kernels can define task-state groupings differently, generally three main states are
used in most typical preemptive-scheduling kernels, including:
• ready state-the task is ready to run but cannot because a higher priority task is executing.
• blocked state-the task has requested a resource that is not available, has requested to wait
until some event occurs, or has delayed itself for some duration.
• running state-the task is the highest priority task and is running.
Note some commercial kernels, such as the VxWorks kernel, define other, more granular
states, such as suspended, pended, and delayed. In this case, pended and delayed are actually sub-
states of the blocked state. A pended task is waiting for a resource that it needs to be freed; a delayed
task is waiting for a timing delay to end. The suspended state exists for debugging purposes. For
more detailed information on the way a particular RTOS kernel implements its FSM for each task,
refer to the kernel's user manual.
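In code, the three main states can be captured in a simple enumeration; a kernel with more granular states (pended, delayed, suspended) would extend it accordingly:

    /* The three main task execution states of a preemptive-scheduling kernel. */
    typedef enum {
        TASK_READY,    /* ready to run, but a higher priority task is executing */
        TASK_RUNNING,  /* the highest priority task, currently executing        */
        TASK_BLOCKED   /* waiting for a resource, an event, or a delay          */
    } task_state_t;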
Regardless of how a kernel implements a task's FSM, it must maintain the current state of all
tasks in a running system. As calls are made into the kernel by executing tasks, the kernel's
scheduler first determines which tasks need to change states and then makes those changes.
In some cases, the kernel changes the states of some tasks, but no context switching occurs
because the state of the highest priority task is unaffected. In other cases, however, these state
changes result in a context switch because the former highest priority task either gets blocked or is
no longer the highest priority task. When this process happens, the former running task is put into
the blocked or ready state, and the new highest priority task starts to execute.
The following describe the ready, running, and blocked states in more detail. These
descriptions are based on a single-processor system and a kernel using a priority-based preemptive
scheduling algorithm.
Ready State
When a task is first created and made ready to run, the kernel puts it into the ready state. In
this state, the task actively competes with all other ready tasks for the processor's execution time. As
Figure 2.7 shows, tasks in the ready state cannot move directly to the blocked state. A task first
needs to run so it can make a blocking call, which is a call to a function that cannot immediately run
to completion, thus putting the task in the blocked state. Ready tasks, therefore, can only move to
the running state. Because many tasks might be in the ready state, the kernel's scheduler uses the
priority of each task to determine which task to move to the running state.
For a kernel that supports only one task per priority level, the scheduling algorithm is
straightforward-the highest priority task that is ready runs next. In this implementation, the kernel
limits the number of tasks in an application to the number of priority levels.
However, most kernels support more than one task per priority level, allowing many more
tasks in an application. In this case, the scheduling algorithm is more complicated and involves
maintaining a task-ready list. Some kernels maintain a separate task-ready list for each priority
level; others have one combined list.
Figure 2.8 illustrates, in a five-step scenario, how a kernel scheduler might use a task-ready
list to move tasks from the ready state to the running state. This example assumes a single-processor
system and a priority-based preemptive scheduling algorithm in which 255 is the lowest priority and
0 is the highest. Note that for simplicity this example does not show system tasks, such as the idle
task.
Figure 2.8: Five steps showing the way a task-ready list works.
In this example, tasks 1, 2, 3, 4, and 5 are ready to run, and the kernel queues them by
priority in a task-ready list. Task 1 is the highest priority task (70); tasks 2, 3, and 4 are at the next-
highest priority level (80); and task 5 is the lowest priority (90). The following steps explain how a
kernel might use the task-ready list to move tasks to and from the ready state:
1. Tasks 1, 2, 3, 4, and 5 are ready to run and are waiting in the task-ready list.
2. Because task 1 has the highest priority (70), it is the first task ready to run. If nothing
higher is running, the kernel removes task 1 from the ready list and moves it to the running state.
3. During execution, task 1 makes a blocking call. As a result, the kernel moves task 1 to the
blocked state; takes task 2, which is first in the list of the next-highest priority tasks (80), off the
ready list; and moves task 2 to the running state.
4. Next, task 2 makes a blocking call. The kernel moves task 2 to the blocked state; takes
task 3, which is next in line of the priority 80 tasks, off the ready list; and moves task 3 to the
running state.
5. As task 3 runs, it frees the resource that task 2 requested. The kernel returns task 2 to the
ready state and inserts it at the end of the list of tasks ready to run at priority level 80. Task 3
continues as the currently running task.
Although not illustrated here, if task 1 became unblocked at this point in the scenario, the
kernel would move task 1 to the running state because its priority is higher than the currently
running task (task 3). As with task 2 earlier, task 3 at this point would be moved to the ready state
and inserted after task 2 (same priority of 80) and before task 5 (next priority of 90).
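A sketch of one common implementation of this mechanism, a separate FIFO task-ready list per priority level with 0 as the highest priority (structure and function names are ours):

    #include <stddef.h>

    #define NPRIO 256                 /* priority levels 0 (highest) .. 255 */

    struct task {
        struct task *next;
        int priority;
    };

    static struct task *head[NPRIO], *tail[NPRIO];

    /* Insert a task at the end of the list for its priority level. */
    void make_ready(struct task *t)
    {
        t->next = NULL;
        if (tail[t->priority])
            tail[t->priority]->next = t;
        else
            head[t->priority] = t;
        tail[t->priority] = t;
    }

    /* Remove and return the highest priority ready task, or NULL if none. */
    struct task *pick_next(void)
    {
        for (int p = 0; p < NPRIO; p++) {
            if (head[p]) {
                struct task *t = head[p];
                head[p] = t->next;
                if (head[p] == NULL)
                    tail[p] = NULL;
                return t;
            }
        }
        return NULL;   /* in a real kernel the idle task would run here */
    }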
Running State
On a single-processor system, only one task can run at a time. In this case, when a task is
moved to the running state, the processor loads its registers with this task's context. The processor
can then execute the task's instructions and manipulate the associated stack. A task can move back to
the ready state while it is running. When a task moves from the running state to the ready state, it is
preempted by a higher priority task. In this case, the preempted task is put in the appropriate,
priority-based location in the task-ready list, and the higher priority task is moved from the ready
state to the running state.
Unlike a ready task, a running task can move to the blocked state in any of the following
ways:
• by making a call that requests an unavailable resource,
• by making a call that requests to wait for an event to occur, and
• by making a call to delay the task for some duration.
In each of these cases, the task is moved from the running state to the blocked state, as
described next.
Blocked State
The possibility of blocked states is extremely important in real-time systems because without
blocked states, lower priority tasks could not run. If higher priority tasks are not designed to block,
CPU starvation can result.
CPU starvation occurs when higher priority tasks use all of the CPU execution time and
lower priority tasks do not get to run.
A task can only move to the blocked state by making a blocking call, requesting that some
blocking condition be met. A blocked task remains blocked until the blocking condition is met. (It
probably ought to be called the unblocking condition, but blocking is the terminology in common
use among real-time programmers.) Examples of how blocking conditions are met include the
following:
• a semaphore token (described later) for which a task is waiting is released,
• a message, on which the task is waiting, arrives in a message queue, or
• a time delay imposed on the task expires.
When a task becomes unblocked, the task might move from the blocked state to the ready
state if it is not the highest priority task. The task is then put into the task-ready list at the
appropriate priority-based location, as described earlier.
However, if the unblocked task is the highest priority task, the task moves directly to the
running state (without going through the ready state) and preempts the currently running task. The
preempted task is then moved to the ready state and put into the appropriate priority-based location
in the task-ready list.
In addition to providing a task object, kernels also provide task-management services. Task-
management services include the actions that a kernel performs behind the scenes to support tasks,
for example, creating and maintaining the TCB and task stacks.
A kernel, however, also provides an API that allows developers to manipulate tasks. Some of
the more common operations that developers can perform with a task object from within the
application include:
• creating and deleting tasks,
• controlling task scheduling, and
• obtaining task information.
Developers should learn how to perform each of these operations for the kernel selected for
the project. Each operation is briefly discussed next.
Task Scheduling
From the time a task is created to the time it is deleted, the task can move through various
states resulting from program execution and kernel scheduling. Although much of this state
changing is automatic, many kernels provide a set of API calls that allow developers to control when
a task moves to a different state (Suspend, Resume, Delay, Restart, Get Priority, Set Priority,
Preemption lock, Preemption unlock).
Using manual scheduling, developers can suspend and resume tasks from within an
application. Doing so might be important for debugging purposes or, as discussed earlier, for
suspending a high-priority task so that lower priority tasks can execute.
A developer might want to delay (block) a task, for example, to allow manual scheduling or
to wait for an external condition that does not have an associated interrupt. Delaying a task causes it
to relinquish the CPU and allow another task to execute. After the delay expires, the task is returned
to the task-ready list after all other ready tasks at its priority level. A delayed task waiting for an
external condition can wake up after a set time to check whether a specified condition or event has
occurred, which is called polling.
A developer might also want to restart a task, which is not the same as resuming a suspended
task. Restarting a task begins the task as if it had not been previously executing. The internal state
the task possessed at the time it was suspended (for example, the CPU registers used and the
resources acquired) is lost when a task is restarted. By contrast, resuming a task begins the task in
the same internal state it possessed when it was suspended.
Restarting a task is useful during debugging or when reinitializing a task after a catastrophic
error. During debugging, a developer can restart a task to step through its code again from start to
finish. In the case of catastrophic error, the developer can restart a task and ensure that the system
continues to operate without having to be completely reinitialized.
Getting and setting a task’s priority during execution lets developers control task scheduling
manually. This process is helpful during a priority inversion, in which a lower priority task has a
shared resource that a higher priority task requires and is preempted by an unrelated medium-
priority task. A simple fix for this problem is to free the shared resource by dynamically increasing
the priority of the lower priority task to that of the higher priority task, allowing it to run and
release the resource that the higher priority task requires, and then lowering the former lower
priority task to its original priority.
Finally, the kernel might support preemption locks, a pair of calls used to disable and enable
preemption in applications. This feature can be useful if a task is executing in a critical section of
code: one in which the task must not be preempted by other tasks.
12.4.2 Semaphores
12.4.2.1 Introduction
Multiple concurrent threads of execution within an application must be able to synchronize
their execution and coordinate mutually exclusive access to shared resources. To address these
requirements, RTOS kernels provide a semaphore object and associated semaphore management
services.
A semaphore is like a key that allows a task to carry out some operation or to access a
resource. If the task can acquire the semaphore, it can carry out the intended operation or access the
resource. A single semaphore can be acquired a finite number of times. In this sense, acquiring a
semaphore is like acquiring the duplicate of a key from an apartment manager: when the apartment
manager runs out of duplicates, the manager can give out no more keys. Likewise, when a
semaphore’s limit is reached, it can no longer be acquired until someone gives a key back or releases
the semaphore.
The kernel tracks the number of times a semaphore has been acquired or released by
maintaining a token count, which is initialized to a value when the semaphore is created. As a task
acquires the semaphore, the token count is decremented; as a task releases the semaphore, the count
is incremented.
If the token count reaches 0, the semaphore has no tokens left. A requesting task, therefore,
cannot acquire the semaphore, and the task blocks if it chooses to wait for the semaphore to become
available.
The task-waiting list tracks all tasks blocked while waiting on an unavailable semaphore.
These blocked tasks are kept in the task-waiting list in either first in/first out (FIFO) order or highest
priority first order.
When an unavailable semaphore becomes available, the kernel allows the first task in the
task-waiting list to acquire it. The kernel moves this unblocked task either to the running state, if it is
the highest priority task, or to the ready state, until it becomes the highest priority task and is able to
run. Note that the exact implementation of a task-waiting list can vary from one kernel to another.
A kernel can support many different types of semaphores, including binary, counting, and
mutual-exclusion (mutex) semaphores.
Binary Semaphores
A binary semaphore can have a value of either 0 or 1. When a binary semaphore’s value is 0,
the semaphore is considered unavailable (or empty); when the value is 1, the binary semaphore is
considered available (or full). Note that when a binary semaphore is first created, it can be initialized
to either available or unavailable (1 or 0, respectively). The state diagram of a binary semaphore is
shown in Figure 2.10.
Counting Semaphores
A counting semaphore uses a count to allow it to be acquired or released multiple times.
When creating a counting semaphore, assign the semaphore a count that denotes the number of
semaphore tokens it has initially.
If the initial count is 0, the counting semaphore is created in the unavailable state. If the
count is greater than 0, the semaphore is created in the available state, and the number of tokens it
has equals its count, as shown in Figure 2.11.
One or more tasks can continue to acquire a token from the counting semaphore until no
tokens are left. When all the tokens are gone, the count equals 0, and the counting semaphore moves
from the available state to the unavailable state. To move from the unavailable state back to the
available state, a semaphore token must be released by any task. Note that, as with binary
semaphores, counting semaphores are global resources that can be shared by all tasks that need
them. This feature allows any task to release a counting semaphore token. Each release operation
increments the count by one, even if the task making this call did not acquire a token in the first
place.
Some implementations of counting semaphores might allow the count to be bounded. A
bounded count is a count in which the initial count set for the counting semaphore, determined when
the semaphore was first created, acts as the maximum count for the semaphore. An unbounded count
allows the counting semaphore to count beyond the initial count to the maximum value that can be
held by the count’s data type (e.g., an unsigned integer or an unsigned long value).
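A conceptual sketch of the token count in C, ignoring the task-waiting list and the atomicity that a real kernel must guarantee (names are ours):

    /* A counting semaphore reduced to its token count. */
    typedef struct {
        int count;   /* tokens currently available; 0 means unavailable          */
        int bound;   /* maximum count for a bounded semaphore, or -1 if unbounded */
    } csem_t;

    /* "Do not wait" acquire: fails instead of blocking when no token is left. */
    int csem_acquire(csem_t *s)
    {
        if (s->count == 0)
            return -1;   /* a blocking kernel call would enqueue the task here */
        s->count--;
        return 0;
    }

    /* Any task may release a token, even one that never acquired one. */
    void csem_release(csem_t *s)
    {
        if (s->bound < 0 || s->count < s->bound)
            s->count++;
    }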
Mutual-Exclusion (Mutex) Semaphores
As opposed to the available and unavailable states in binary and counting semaphores, the
states of a mutex are unlocked or locked (0 or 1, respectively). A mutex is initially created in the
unlocked state, in which it can be acquired by a task. After being acquired, the mutex moves to the
locked state. Conversely, when the task releases the mutex, the mutex returns to the unlocked state.
Note that some kernels might use the terms lock and unlock for a mutex instead of acquire and
release.
Depending on the implementation, a mutex can support additional features not found in
binary or counting semaphores. These key differentiating features include ownership, recursive
locking, task deletion safety, and priority inversion avoidance protocols.
Mutex Ownership
Ownership of a mutex is gained when a task first locks the mutex by acquiring it.
Conversely, a task loses ownership of the mutex when it unlocks it by releasing it. When a task
owns the mutex, it is not possible for any other task to lock or unlock that mutex. Contrast this
concept with the binary semaphore, which can be released by any task, even a task that did not
originally acquire the semaphore.
Recursive Locking
Many mutex implementations also support recursive locking , which allows the task that
owns the mutex to acquire it multiple times in the locked state. Depending on the implementation,
recursion within a mutex can be automatically built into the mutex, or it might need to be enabled
explicitly when the mutex is first created.
The mutex with recursive locking is called a recursive mutex . This type of mutex is most
useful when a task requiring exclusive access to a shared resource calls one or more routines that
also require access to the same resource. A recursive mutex allows nested attempts to lock the mutex
to succeed, rather than cause deadlock , which is a condition in which two or more tasks are blocked
and are waiting on mutually locked resources. The problem of recursion and deadlocks is discussed
later in this chapter, as well as later in this book.
As shown in Figure 12, when a recursive mutex is first locked, the kernel registers the task
that locked it as the owner of the mutex. On successive attempts, the kernel uses an internal lock
count associated with the mutex to track the number of times that the task currently owning the
mutex has recursively acquired it. To properly unlock the mutex, it must be released the same
number of times.
In this example, a lock count tracks the two states of a mutex (0 for unlocked and 1 for
locked), as well as the number of times it has been recursively locked (lock count > 1). In other
implementations, a mutex might maintain two counts: a binary value to track its state, and a separate
lock count to track the number of times it has been acquired in the lock state by the task that owns it.
Do not confuse the counting facility for a locked mutex with the counting facility for a
counting semaphore. The count used for the mutex tracks the number of times that the task owning
the mutex has locked or unlocked the mutex. The count used for the counting semaphore tracks the
number of tokens that have been acquired or released by any task. Additionally, the count for the
mutex is always unbounded, which allows multiple recursive accesses.
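A sketch of ownership plus a lock count, assuming a hypothetical task_self() kernel call that returns the current task’s ID:

    extern int task_self(void);   /* assumed: ID of the currently running task */

    typedef struct {
        int owner;        /* owning task ID, or -1 when unlocked              */
        int lock_count;   /* 0 = unlocked; values > 1 mean recursive locking  */
    } rmutex_t;

    int rmutex_lock(rmutex_t *m)
    {
        int me = task_self();
        if (m->lock_count == 0) {     /* first lock registers the owner        */
            m->owner = me;
            m->lock_count = 1;
            return 0;
        }
        if (m->owner == me) {         /* recursive lock by the owner succeeds  */
            m->lock_count++;
            return 0;
        }
        return -1;                    /* owned by another task: caller would block */
    }

    int rmutex_unlock(rmutex_t *m)
    {
        if (m->lock_count == 0 || m->owner != task_self())
            return -1;                /* only the owner may unlock             */
        if (--m->lock_count == 0)
            m->owner = -1;            /* released as many times as locked      */
        return 0;
    }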
Task Deletion Safety
Some mutex implementations also have built-in task deletion safety. Premature task deletion
is avoided by using task deletion locks when a task locks and unlocks a mutex. Enabling this
capability within a mutex ensures that while a task owns the mutex, the task cannot be deleted.
Typically protection from premature deletion is enabled by setting the appropriate initialization
options when creating the mutex.
Priority Inversion Avoidance
Priority inversion commonly happens in poorly designed real-time embedded applications.
Priority inversion occurs when a higher priority task is blocked and is waiting for a resource being
used by a lower priority task, which has itself been preempted by an unrelated medium-priority task.
In this situation, the higher priority task’s priority level has effectively been inverted to the lower
priority task’s level.
Enabling certain protocols that are typically built into mutexes can help avoid priority
inversion. Two common protocols used for avoiding priority inversion include:
• priority inheritance protocol ensures that the priority level of the lower priority task that
has acquired the mutex is raised to that of the higher priority task that has requested the mutex when
inversion happens.
The priority of the raised task is lowered to its original value after the task releases the mutex
that the higher priority task requires.
• ceiling priority protocol ensures that the priority level of the task that acquires the mutex
is automatically set to the highest priority of all possible tasks that might request that mutex when it
is first acquired until it is released.
When the mutex is released, the priority of the task is lowered to its original value.
12.4.2.3 Typical Semaphore Operations
Typical operations that developers might want to perform with the semaphores in an
application include:
• creating and deleting semaphores,
• acquiring and releasing semaphores,
• clearing a semaphore’s task-waiting list, and
• getting semaphore information.
Creating and Deleting Semaphores
Several things must be considered when creating and deleting semaphores. If a kernel
supports different types of semaphores, different calls might be used for creating binary, counting,
and mutex semaphores, as follows:
• binary: specify the initial semaphore state and the task-waiting order.
• counting: specify the initial semaphore count and the task-waiting order.
• mutex: specify the task-waiting order and enable task deletion safety, recursion, and
priority-inversion avoidance protocols, if supported.
Semaphores can be deleted from within any task by specifying their IDs and making
semaphore-deletion calls.
Deleting a semaphore is not the same as releasing it. When a semaphore is deleted, blocked
tasks in its task-waiting list are unblocked and moved either to the ready state or to the running state
(if the unblocked task has the highest priority). Any tasks, however, that try to acquire the deleted
semaphore return with an error because the semaphore no longer exists. Additionally, do not delete a
semaphore while it is in use (e.g., acquired). This action might result in data corruption or other
serious problems if the semaphore is protecting a shared resource or a critical section of code.
Acquiring and Releasing Semaphores
The operations for acquiring and releasing a semaphore might have different names,
depending on the kernel: for example, take and give, sm_p and sm_v, pend and post, and lock and
unlock. Regardless of the name, they all effectively acquire and release semaphores.
Tasks typically make a request to acquire a semaphore in one of the following ways:
• Wait forever: the task remains blocked until it is able to acquire the semaphore.
• Wait with a timeout: the task remains blocked until it is able to acquire the semaphore or until a
set interval of time, called the timeout interval, passes. At this point, the task is removed from the
semaphore’s task-waiting list and put in either the ready state or the running state.
• Do not wait: the task makes a request to acquire a semaphore token, but, if one is not available,
the task does not block.
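These three request styles map directly onto, for example, the POSIX semaphore API, shown below purely as a familiar illustration (an RTOS kernel will have its own equivalent calls):

    #include <semaphore.h>
    #include <time.h>

    void acquire_styles(sem_t *s)
    {
        struct timespec deadline;

        (void)sem_wait(s);                   /* wait forever                */

        clock_gettime(CLOCK_REALTIME, &deadline);
        deadline.tv_sec += 1;                /* one-second timeout interval */
        (void)sem_timedwait(s, &deadline);   /* wait with a timeout         */

        (void)sem_trywait(s);                /* do not wait: fails if empty */
    }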
Note that ISRs can also release binary and counting semaphores. Note that most kernels do
not support ISRs locking and unlocking mutexes, as it is not meaningful to do so from an ISR. It is
also not meaningful to acquire either binary or counting semaphores inside an ISR.
Any task can release a binary or counting semaphore; however, a mutex can only be released
(unlocked) by the task that first acquired (locked) it. Note that incorrectly releasing a binary or
counting semaphore can result in losing mutually exclusive access to a shared resource or in an I/O
device malfunction.
For example, a task can gain access to a shared data structure by acquiring an associated
semaphore. If a second task accidentally releases that semaphore, this step can potentially free a
third task waiting for that same semaphore, allowing that third task to also gain access to the same
data structure. Having multiple tasks trying to modify the same data structure at the same time
results in corrupted data.
12.5 Services
Along with objects, most kernels provide services that help developers create applications for
real-time embedded systems. These services comprise sets of API calls that can be used to perform
operations on kernel objects or can be used in general to facilitate timer management, interrupt
handling, device I/O, and memory management. Other services might also be provided; the ones
discussed here are those most commonly found in RTOS kernels.