FACULTY OF
AUTOMATION
EMBEDDED SYSTEMS
Cosmin Ionete
Dragos Surlea
Nicolae Neagu
Contents
1. Embedded Systems Architecture
2.2 Scaling
3.4 I/O
5. Address Decoding
7. Timers/Counters
Microcontroller Architectures
A. Princeton (Von Neumann) vs. Harvard
There are also differences in the basic CPU architectures used, and these tend to reflect the
application. Microprocessor based machines usually have a von Neumann architecture with a
single memory for both programs and data to allow maximum flexibility in allocation of
memory. Microcontroller chips, on the other hand, frequently embody the Harvard
architecture, which has separate memories for programs and data. Figure 1.1 illustrates this
difference.
Figure 1.1 - At left is the von Neumann architecture; at right is the Harvard architecture
One advantage of the Harvard architecture for embedded applications stems from the two types of
memory used in embedded systems. A fixed program and constants can be stored in non-volatile
ROM memory while working variable data storage can reside in volatile RAM. Volatile memory
loses its contents when power is removed, but non-volatile ROM memory always maintains its
contents even after power is removed.
A typical embedded computer consists of the CPU, memory, and I/O. They are most often
connected by means of a shared bus for communication. The peripherals on a microcontroller chip
are typically timers, counters, serial or parallel data ports, and analog-to-digital and digital-to-analog
converters that are integrated directly on the chip. The performance of these peripherals is generally
less than that of dedicated peripheral chips, which are frequently used with microprocessor chips.
However, having the bus connections, CPU, memory, and I/O functions on one chip has several
advantages:
- Fewer chips are required since most functions are already present on the processor chip.
- Lower cost and smaller size result from a simpler design.
- Lower power requirements because on-chip power requirements are much smaller than
external loads.
- Fewer external connections are required because most are made on-chip, and most of the
chip connections can be used for I/O.
- More pins on the chip are available for user I/O since they aren’t needed for the bus.
- Overall reliability is higher since there are fewer components and interconnections.
Of course there are disadvantages too, including:
- Reduced flexibility since you can’t easily change the functions designed into the chip.
- Expansion of memory or I/O is limited or impossible.
- Limited data transfer rates due to practical size and speed limits for a single-chip.
- Lower performance I/O because of design compromises to fit everything on one chip.
The von Neumann machine, with only one memory, requires all instruction and data
transfers to occur on the same interface. This is sometimes referred to as the “von Neumann
bottleneck.” In common computer architectures, this is the primary upper limit to processor
throughput. The Harvard architecture has the potential advantage of a separate interface allowing
twice the memory transfer rate by allowing instruction fetches to occur in parallel with data
transfers. Unfortunately, in most Harvard architecture machines, the memory is connected to the
CPU using a bus that limits the parallelism to a single bus. The memory separation is still used to
advantage in microcontrollers, as the program is usually stored in non-volatile memory (program is
not lost when power is removed), and the temporary data storage is in volatile memory.
Non-volatile memories, such as read-only memory (ROM) are used in both types of systems
to store permanent programs. In a desktop PC, ROMs are used to store just the start-up or bootstrap
programs and hardware specific programs. Volatile random access memory (RAM) can be read and
written easily, but it loses its contents when power is removed. RAM is used to store both
application programs and data in PCs that need to be able to run many different programs. In a
dedicated embedded computer, however, the programs are stored permanently in ROM where they
will always be available. Microcontroller chips that are used
in dedicated applications generally use ROM for program storage and RAM for data storage.
Microcoded
- A processor within a processor.
- The signals required to execute instructions are "fetched" from an internal "control ROM" memory.
- Allows great flexibility in the instruction set.
- Easier to design, but slower than hardwired control.
Hardwired
- The signals required to execute an instruction are generated by logic gates (combinational circuitry) known as the "control matrix".
- Faster, but less flexible.
There are three groups of signals, or buses, that connect the CPU to the other major components.
The buses are:
- Data bus
- Address bus
- Control bus
• The concepts of address and data are fundamental to the operation of the microprocessor
• Memory consists of locations uniquely identified by the CPU through their addresses
• The CPU communicates with those addresses to read and write the data
• The communications go via buses
• The CPU is responsible for control of the address, data and control buses
• With all devices attached to the data bus, there is a potential for clashes, so devices connected to the data bus can be driven to high-impedance states
• The ability of a device to set its output at either logic 1, logic 0 or in a high-impedance state is an essential feature of common bus systems; such a device is termed a tristate device.
A. Data bus - to transfer the data associated with the processing function of the microprocessor. (8
lines, typically)
The data bus width is defined as the number of bits that can be transferred on the bus at one
time. This defines the processor’s “word size.” Many chip vendors define the word size based on the
width of an internal data bus. A processor with eight data bus pins is an 8-bit CPU. Both instructions
and data are transferred on the data bus one “word” at a time. This allows the re-use of the same
connections for many different types of information. Due to packaging limitations, the number of
connections or pins on a chip is limited. By sharing the pins in this way, the number of pins required
is reduced at the expense of increased complexity in the external circuits. Many processors also take
this a step further and share some or all of the data bus pins to carry address information as well.
This is referred to as a multiplexed address/data bus. Processors that have multiplexed address/data
buses require an external address latch to separate and hold the address information stable for the
duration of a data transfer. The processor controls the direction of data transfer on the data
bus (read/write).
B. Address bus - contains the address of a specific memory location for accessing (reading/writing)
stored data. (16 lines, typically)
The address bus is a set of wires that are used to point to the memory or I/O location that is
to be read from or written to. The address signals must generally be held at a constant value for
some period of time before, during, and after the data is transferred. In most cases, the processor
actively drives the address bus with either instruction or data addresses.
Memory Read and Write Cycles
• Hardware control lines are used by the CPU to control reads and writes to memory
• The active-low signal RD is asserted for a read cycle
• The active-low signal WR indicates a write
• The RD and WR signals supply timing information to the memory device
Read cycle
• It lasts 2 cycles of the clock signal:
1. The address of the required memory location is put on the address bus (by the CPU) at the rising edge.
2. While the device is held at the tristate level, the control bus issues the read signal (active low) to the device (the 2nd cycle begins).
3. After a delay, valid data is placed on the data bus.
4. The levels on the data bus are sampled by the CPU at the falling edge of the 2nd cycle.
Write cycle
1. The CPU places the address at the rising edge.
2. Decoding logic selects the correct device.
3. 2nd cycle, rising edge: the CPU outputs data onto the data bus and sets the WRITE control bus signal active (LOW).
• Note: memory devices and other I/O components have static logic and do not depend on the clock signal; they read data from the data bus when the write signal goes high (inactive), so the data must be valid for that transition.
C. Control bus - carries the control signals to the memory and the I/O devices. The number of
control lines varies; often around 15.
The control bus is an assortment of signals that determine what kind of information is on the
data bus and determines where the data will go, in conjunction with the address bus. Most of the
design process is concerned with the logic and timing of the control signals. The timing analysis is
primarily involved with the relative timing between these control signals and the appearance and
disappearance of data and addresses on their respective buses.
1.3.1.2 Microprocessor Fundamentals
The CPU
The ALU
• The arithmetic and logic unit (ALU) is responsible for data manipulation:
• arithmetic operations (addition, incrementing, decrementing, negation etc.), logic operations (AND, OR, XOR etc.)
• bit shifting, rotating, complementing
Registers
• Registers hold the data/addresses that the CPU currently uses; they are stored in special (small and fast) memory locations on the CPU
• accumulator register - temporarily stores an input to the ALU and is sometimes used in I/O operations. It may be 8, 16 or 32 bits wide
• flags register or status register - individual bits in the register are called flags. They reflect the conditions of the latest ALU operation and are used by subsequent jump and branch instructions
• general purpose register - temporary storage for data or addresses; not assigned any specific task
• program counter - tracks the CPU's position in the program. The width of the program counter is the same as that of the address bus
• instruction register - stores the instruction where it can be decoded; not accessible by the programmer
• index registers - hold the address of an operand when the indexed addressing mode is used
• stack pointer register - holds the address of the next memory location in the stack in RAM. The stack is a special area of RAM with last-in first-out (LIFO or FILO) organisation; it is used during subroutine calls and interrupts
Types of registers:
Stack
• Part of memory where program data can be stored by a simple PUSH operation
• Restore data by a POP
• Stack is in main memory and is defined by the program
• Stack Pointer (SP) keeps track of the next location available on the Stack
• Organised as a FILO Buffer
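As a rough illustration of the LIFO behaviour just described, here is a minimal C sketch of a descending stack; the RAM array and the names used are ours, not any particular CPU's:

#include <stdint.h>
#include <stdio.h>

static uint8_t ram[256];   /* stands in for the RAM area reserved for the stack */
static uint8_t sp = 255;   /* stack pointer: the stack grows downward from the top */

static void push(uint8_t value) {
    ram[sp--] = value;     /* store, then move SP to the next free location */
}

static uint8_t pop(void) {
    return ram[++sp];      /* move SP back, then read the stored value */
}

int main(void) {
    push(0x11);
    push(0x22);
    uint8_t first = pop();
    uint8_t second = pop();
    printf("0x%02x 0x%02x\n", first, second);  /* prints 0x22 0x11: last in, first out */
    return 0;
}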
General Registers
• Small set of internal registers -temporary data storage
• CU ensures that data from the correct register is presented to the CPU
• CU ensures that data is written back to correct register
• Accumulator usually holds ALU result
Status or Flags Register
• CF -Carry Flag
•1 -there is a carry out from the most significant bit
•0 - no carry out from the msb
• PF -Parity Flag
•1 - low byte has an even number of 1 bits
•0 - low byte has odd parity
• AF -Auxiliary carry Flag
•1 -carry out from bit 3 on addition
•0 -borrow into bit 3 on addition
• ZF -Zero Flag
•1 -zero result
•0 -non-zero result
• SF -Sign Flag
•1 - msb is 1 (negative)
•0 - msb is 0 (positive)
• TF -Trap Flag
•Used by debuggers for single step operation
•1 -Trap on
•0 -Trap off
• IF -Interrupt Flag
•1 -Enabled
•0 -Disabled
• OF -Overflow Flag
•1 -signed overflow occurred
•0 -no overflow
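To make the flag semantics concrete, here is a small C sketch (our own illustration, not any specific CPU's logic) that computes CF, ZF, SF and OF for an 8-bit addition:

#include <stdint.h>
#include <stdio.h>

struct flags { int cf, zf, sf, of; };

/* Compute the main status flags for an 8-bit addition, the way an ALU
   would set them. */
static struct flags add8(uint8_t a, uint8_t b, uint8_t *sum) {
    uint16_t wide = (uint16_t)a + b;   /* keep a 9th bit for the carry */
    uint8_t  r    = (uint8_t)wide;
    struct flags f;
    f.cf = (wide >> 8) & 1;            /* CF: carry out of the msb */
    f.zf = (r == 0);                   /* ZF: zero result          */
    f.sf = (r >> 7) & 1;               /* SF: msb is the sign bit  */
    f.of = (~(a ^ b) & (a ^ r) & 0x80) != 0;  /* OF: operands agree in sign, result differs */
    *sum = r;
    return f;
}

int main(void) {
    uint8_t s;
    struct flags f = add8(0x7F, 0x01, &s);   /* 127 + 1 overflows signed 8-bit */
    printf("sum=0x%02x CF=%d ZF=%d SF=%d OF=%d\n", s, f.cf, f.zf, f.sf, f.of);
    return 0;
}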
The program status word (PSW) is an area of memory or a hardware register which contains
information about program state used by the operating system and the underlying hardware. It will
normally include a pointer (address) to the next instruction to be executed. The program status word
typically contains an error status field and condition codes such as the interrupt enable/disable bit
and a supervisor / user mode bit.
PSW
PSW contains information such as:
condition code bits (set by various comparison instructions)
CPU priority
mode:
user-mode: only a subset of instructions and features are accessible.
kernel-mode: all instructions and features are accessible.
The program status word (PSW) is 32 bits in length and contains the information required for proper
program execution. The PSW includes the instruction address, condition code, and other fields. In
general, the PSW is used to control instruction sequencing and to hold and indicate the status of the
system in relation to the program currently being executed. The active or controlling PSW is called
the current PSW. By storing the current PSW during an interruption, the status of the CPU can be
preserved for subsequent inspection. By loading a new PSW or part of a PSW, the state of the CPU
can be initialized or changed.
Primary role: provide inexpensive, programmable logic control and interfacing to external devices,
e.g., turning devices on/off and monitoring external conditions.
• A Timer module to allow the microcontroller to perform tasks for certain time
periods.
• Serial I/O (UART) for data flow between microcontroller and devices such as a PC or
other microcontroller.
• Analog input and output (e.g., to receive data from sensors or control motors)
• Interrupt capability (from a variety of sources)
• Bus/external memory interfaces (for RAM or ROM)
• Built-in monitor/debugger program
• Support for external peripherals (e.g., I/O and bus extenders)
Figure: a typical microcontroller, showing the different sub-units integrated onto the microcontroller chip.
Instruction sets:
MP: processing intensive
powerful addressing modes
instructions to perform complex operations & manipulate large volumes of data
(the processing capability of MCs never approaches that of MPs)
large instructions -- e.g., 80X86 instructions up to 7 bytes long
MC: cater to control of inputs and outputs
instructions to set/clear bits
boolean operations (AND, OR, XOR, NOT, jump if a bit is set/cleared), etc.
extremely compact instructions, many implemented in one byte
(the control program must often fit in the small, on-chip ROM)
Instruction sets:
• The set of instructions given to the μP to execute a task is called an instruction set
• Generally, instructions can be classified into the following categories:
– Data transfer
– Arithmetic
– Logical
– Program control
• Differ depending on the manufacturer, but some are reasonably common to most μP's.
A. Data transfer
1. Load
• reads the content of a specified memory location and copies it to the specified register
location in the CPU
2. Store
• copies the current contents of a specified register into a specified memory location.
B. Arithmetic
3. Add
• Adds the contents of a specified memory location to the data in some register
4. Decrement
• subtracts 1 from the content of a specified location.
5. Compare
• indicates whether the contents of a register are greater than, less than or same as the
contents of a specified memory location. The result appears as a flag in the status register.
C. Logical
6. AND
• carries out the logical AND operation with the contents of a specified memory location and
the data in some register
7. OR
• carries out the logical OR operation with the contents of a specified memory location and
the data in some register
8. EXCLUSIVE OR - (similar to 6, but for exclusive OR)
9. Logical shift
• moves the pattern of bits in the register one place to the left or right, moving a zero (0) into
the end of the number
10. Arithmetic shift
• moves the pattern of bits one place left/right, but copies the end bit (the sign bit) into the
vacancy created by the shift (see the sketch after this list)
D. Program control
11. Jump
• changes the sequence in which the program is executed: the program counter jumps to
some specified (non-sequential) location
12. Branch
• a conditional instruction, e.g., 'branch if zero' or 'branch if plus'; the branch is taken
only if the right conditions are met.
13. Halt
• stops all further microprocessor activities
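The difference between the logical and arithmetic shifts in categories 9 and 10 is easy to see in C (note that right-shifting a negative signed value is implementation-defined in C; most compilers shift arithmetically):

#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint8_t u = 0xF0;           /* 11110000 */
    int8_t  s = (int8_t)0xF0;   /* -16 in two's complement */

    /* Logical shift right: a zero moves into the vacated msb */
    printf("0x%02x\n", u >> 1);   /* 0x78 */

    /* Arithmetic shift right: the sign bit is copied into the vacancy,
       so the value stays negative */
    printf("%d\n", s >> 1);       /* typically -8 */
    return 0;
}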
Hardware & instruction set support:
MC: built-in I/O operations, event timing, enabling & setting up priority levels
for interrupts caused by external stimuli
MP: usually require external circuitry to do similar things (e.g, 8255 PPI, 8254 PIT,
8259 PIC)
Bus widths:
MP: very wide
large memory address spaces (>4 Gbytes)
lots of data (Data bus: 32, 64, 128 bits wide)
MC: narrow
relatively small memory address spaces (typically kBytes)
less data (Data bus typically 4, 8, 16 bits wide)
Clock rates:
MP: very fast (> 1 GHz)
MC: Relatively slow (typically 10-20 MHz)
since most I/O devices being controlled are relatively slow
Cost:
MP's expensive (often > $100)
MCs cheap (often $1 - $10)
4-bit: < $1.00
8-bit: $1.00 - $8.00
16-32-bit: $6.00 - $20.00
1.4 Compiling, Linking, and Locating
The process of converting the source code representation of your embedded software into an
executable binary image involves three distinct steps. First, each of the source files must be
compiled or assembled into an object file. Second, all of the object files that result from the first step
must be linked together to produce a single object file, called the relocatable program. Finally,
physical memory addresses must be assigned to the relative offsets within the relocatable program in
a process called relocation. The result of this third step is a file that contains an executable binary
image that is ready to be run on the embedded system.
The embedded software development process just described is illustrated in the figure below. In this
figure, the three steps are shown from top to bottom, with the tools that perform them shown in
boxes that have rounded corners. Each of these development tools takes one or more files as input
and produces a single output file. More specific information about these tools and the files they
produce is provided in the sections that follow.
Each of the steps of the embedded software build process is a transformation performed by
software running on a general-purpose computer. To distinguish this development computer
(usually a PC or Unix workstation) from the target embedded system, it is referred to as the
host computer. In other words, the compiler, assembler, linker, and locator are all pieces of
software that run on a host computer, rather than on the embedded system itself. Yet, despite
the fact that they run on some other computer platform, these tools combine their efforts to
produce an executable binary image that will execute properly only on the target embedded
system. This split of responsibilities is shown in the figure below.
1.4.2 Compiling
The job of a compiler is mainly to translate programs written in some human-readable language into
an equivalent set of opcodes for a particular processor. In that sense, an assembler is also a compiler
(you might call it an "assembly language compiler") but one that performs a much simpler one-to-
one translation from one line of human-readable mnemonics to the equivalent opcode. Everything in
this section applies equally to compilers and assemblers. Together these tools make up the first step
of the embedded software build process.
Of course, each processor has its own unique machine language, so you need to choose a compiler
that is capable of producing programs for your specific target processor. In the embedded systems
case, this compiler almost always runs on the host computer. It simply doesn't make sense to execute
the compiler on the embedded system itself. A compiler such as this, one that runs on one computer
platform and produces code for another, is called a cross-compiler. The use of a cross-compiler is
one of the defining features of embedded software development.
Regardless of the input language (C/C++, assembly, or any other), the output of the cross-compiler
will be an object file. This is a specially formatted binary file that contains the set of instructions and
data resulting from the language translation process. Although parts of this file contain executable
code, the object file is not intended to be executed directly. In fact, the internal structure of an object
file emphasizes the incompleteness of the larger program.
The contents of an object file can be thought of as a very large, flexible data structure. The structure
of the file is usually defined by a standard format like the Common Object File Format (COFF) or
Extended Linker Format (ELF). If you'll be using more than one compiler (i.e., you'll be writing
parts of your program in different source languages), you need to make sure that each is capable of
producing object files in the same format. Although many compilers (particularly those that run on
Unix platforms) support standard object file formats like COFF and ELF (gcc supports both), there
are also some others that produce object files only in proprietary formats. If you're using one of the
compilers in the latter group, you might find that you need to buy all of your other development
tools from the same vendor.
Most object files begin with a header that describes the sections that follow. Each of these sections
contains one or more blocks of code or data that originated within the original source file. However,
these blocks have been regrouped by the compiler into related sections. For example, all of the code
blocks are collected into a section called text, initialized global variables (and their initial values)
into a section called data, and uninitialized global variables into a section called bss.
There is also usually a symbol table somewhere in the object file that contains the names and
locations of all the variables and functions referenced within the source file. Parts of this table may
be incomplete, however, because not all of the variables and functions are always defined in the
same file. These are the symbols that refer to variables and functions defined in other source files.
And it is up to the linker to resolve such unresolved references.
1.4.3 Linking
All of the object files resulting from step one (compiling) must be combined in a special way before
the program can be executed. The object files themselves are individually incomplete, most notably
in that some of the internal variable and function references have not yet been resolved. The job of
the linker is to combine these object files and, in the process, to resolve all of the unresolved
symbols.
The output of the linker is a new object file that contains all of the code and data from the input
object files and is in the same object file format. It does this by merging the text, data, and bss
sections of the input files. So, when the linker is finished executing, all of the machine language
code from all of the input object files will be in the text section of the new file, and all of the
initialized and uninitialized variables will reside in the new data and bss sections, respectively.
While the linker is in the process of merging the section contents, it is also on the lookout for
unresolved symbols. For example, if one object file contains an unresolved reference to a variable
named foo and a variable with that same name is declared in one of the other object files, the linker
will match them up. The unresolved reference will be replaced with a reference to the actual
variable. In other words, if foo is located at offset 14 of the output data section, its entry in the
symbol table will now contain that address.
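A minimal two-file sketch of this situation: file1.c compiles cleanly but leaves foo unresolved in its object file, and the linker later matches that reference to the definition in file2.c (the file and symbol names are ours, for illustration only):

/* file1.c - uses a symbol it does not define; the compiler records
   "foo" as an unresolved reference in the object file's symbol table */
extern int foo;

int get_foo(void) {
    return foo;
}

/* file2.c - defines the symbol; "foo" lands in this object file's data
   section, and the linker patches file1.o's reference to point at it */
int foo = 42;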
The GNU linker (ld) runs on all of the same host platforms as the GNU compiler. It is essentially a
command-line tool that takes the names of all the object files to be linked together as arguments. For
embedded development, a special object file that contains the compiled startup code must also be
included within this list.
Startup Code
One of the things that traditional software development tools do automatically is to insert startup
code. Startup code is a small block of assembly language code that prepares the way for the
execution of software written in a high-level language.
Each high-level language has its own set of expectations about the runtime environment. For
example, C and C++ both utilize an implicit stack. Space for the stack has to be allocated and
initialized before software written in either language can be properly executed. That is just one of
the responsibilities assigned to startup code for C/C++ programs.
Most cross-compilers for embedded systems include an assembly language file called startup.asm,
crt0.s (short for C runtime), or something similar. The location and contents of this file are usually
described in the documentation supplied with the compiler.
Startup code for C/C++ programs usually consists of the following actions, performed in the order
described:
1. Disable all interrupts.
2. Copy any initialized data from ROM to RAM.
3. Zero the uninitialized data area.
4. Allocate space for and initialize the stack.
5. Initialize the processor's stack pointer.
6. Create and initialize the heap.
7. Execute the constructors and initializers for all global variables (C++ only).
8. Enable interrupts.
9. Call main.
Typically, the startup code will also include a few instructions after the call to main.
These instructions will be executed only in the event that the high-level language program exits (i.e.,
the call to main returns). Depending on the nature of the embedded system, you might want to use
these instructions to halt the processor, reset the entire system, or transfer control to a debugging
tool.
Because the startup code is not inserted automatically, the programmer must usually assemble it
himself and include the resulting object file among the list of input files to the linker. He might even
need to give the linker a special command-line option to prevent it from inserting the usual startup
code.
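Steps 2, 3 and 9 of the list above can be sketched in C. The section-boundary symbols below (_etext, _sdata, _edata, _ebss) follow a common linker-script naming convention but are assumptions here, not any specific toolchain's definitions; steps 4 through 6 normally happen in assembly before this point, because C code already needs a working stack:

#include <stdint.h>

/* Section boundaries exported by the linker script (assumed names) */
extern uint32_t _etext;   /* end of code in ROM = start of the .data image */
extern uint32_t _sdata;   /* start of .data in RAM                         */
extern uint32_t _edata;   /* end of .data = start of .bss in RAM           */
extern uint32_t _ebss;    /* end of .bss in RAM                            */

extern int main(void);

void c_startup(void) {
    uint32_t *src = &_etext;
    uint32_t *dst = &_sdata;
    while (dst < &_edata)   /* step 2: copy initialized data from ROM to RAM */
        *dst++ = *src++;
    while (dst < &_ebss)    /* step 3: zero the uninitialized data area      */
        *dst++ = 0;
    main();                 /* step 9: call main                             */
    for (;;) ;              /* hang here if main ever returns                */
}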
If the same symbol is defined in more than one object file, the linker is unable to proceed. It will
likely appeal to the programmer, by displaying an error message, and exit. However, if a symbol
reference instead remains unresolved after all of the object files have been merged, the linker will try
to resolve the reference on its own. The reference might be to a function that is part of the standard
library, so the linker will open each of the libraries described to it on the command line (in the order
provided) and examine their symbol tables. If it finds a function with that name, the reference will
be resolved by including the associated code and data sections within the output object file.
After merging all of the code and data sections and resolving all of the symbol references, the linker
produces a special "relocatable" copy of the program. In other words, the program is complete
except for one thing: no memory addresses have yet been assigned to the code and data sections
within. If you weren't working on an embedded system, you'd be finished building your software
now.
But embedded programmers aren't generally finished with the build process at this point.
Even if your embedded system includes an operating system, you'll probably still need an absolutely
located binary image. In fact, if there is an operating system, the code and data of which it consists
are most likely within the relocatable program too. The entire embedded application, including the
operating system, is almost always statically linked together and executed as a single binary image.
1.4.4 Locating
The tool that performs the conversion from relocatable program to executable binary image is called
a locator. It takes responsibility for the easiest step of the three. In fact, you will have to do most of
the work in this step yourself, by providing information about the memory on the target board as
input to the locator. The locator will use this information to assign physical memory addresses to
each of the code and data sections within the relocatable program. It will then produce an output file
that contains a binary memory image that can be loaded into the target ROM.
In many cases, the locator is a separate development tool. However, in the case of the GNU tools,
this functionality is built right into the linker. Try not to be confused by this one particular
implementation. Whether you are writing software for a general-purpose computer or an embedded
system, at some point the sections of your relocatable program must have actual addresses assigned
to them. In the first case, the operating system does it for you at load time. In the second, you must
perform the step with a special tool. This is true even if the locator is a part of the linker.
The memory information required by the GNU linker can be passed to it in the form of a linker
script. Such scripts are sometimes used to control the exact order of the code and data sections
within the relocatable program.
1.4.5 Remote Debuggers and Debug Monitors
The debug monitor resides in ROM, having been placed there in the manner described earlier (either
by you or at the factory), and is automatically started whenever the target processor is reset. It
monitors the communications link to the host computer and responds to requests from the remote
debugger running there. Of course, these requests and the monitor's responses must conform to some
predefined communications protocol and are typically of a very low-level nature. Examples of
requests the remote debugger can make are "read register x," "modify register y," "read n bytes of
memory starting at address," and "modify the data at address." The remote debugger combines
sequences of these low-level commands to accomplish high-level debugging tasks like downloading
a program, single-stepping through it, and setting breakpoints.
Communication between the frontend and the debug monitor is byte-oriented and designed for
transmission over a serial connection, RS232 or USB.
Remote debuggers are one of the most commonly used downloading and testing tools during
development of embedded software. This is mainly because of their low cost. Embedded software
developers already have the requisite host computer. In addition, the price of a remote debugger
frontend does not add significantly to the cost of a suite of cross-development tools (compiler,
linker, locator, etc.). Finally, the suppliers of remote debuggers often desire to give away the source
code for their debug monitors, in order to increase the size of their installed user base.
As shipped, the Keil board includes a free debug monitor in Flash memory. Together with host
software provided by Arcom, this debug monitor can be used to download programs directly into
target RAM and execute them.
1.4.6 Emulators
Remote debuggers are helpful for monitoring and controlling the state of embedded software, but
only an in-circuit emulator (ICE) allows you to examine the state of the processor on which that
program is running. In fact, an ICE actually takes the place of - or emulates - the processor on your
target board. It is itself an embedded system, with its own copy of the target processor, RAM, ROM,
and its own embedded software. As a result, in-circuit emulators are usually pretty expensive, often
more expensive than the target hardware. But they are a powerful tool, and in a tight debugging spot
nothing else will help you get the job done better.
Like a debug monitor, an emulator uses a remote debugger for its human interface. In some cases, it
is even possible to use the same debugger frontend for both. But because the emulator has its own
copy of the target processor it is possible to monitor and control the state of the processor in real
time. This allows the emulator to support such powerful debugging features as hardware breakpoints
and real-time tracing, in addition to the features provided by any debug monitor.
With a debug monitor, you can set breakpoints in your program. However, these software
breakpoints are restricted to instruction fetches, the equivalent of the command "stop execution if
this instruction is about to be fetched." Emulators, by contrast, also support hardware breakpoints.
Hardware breakpoints allow you to stop execution in response to a wide variety of events. These
events include not only instruction fetches, but also memory and I/O reads and writes, and
interrupts. For example, you might set a hardware breakpoint on the event "variable foo contains 15
and register AX becomes 0."
Another useful feature of an in-circuit emulator is real-time tracing. Typically, an emulator
incorporates a large block of special-purpose RAM that is dedicated to storing information about
each of the processor cycles that are executed. This feature allows you to see in exactly what order
things happened, so it can help you answer questions, such as, did the timer interrupt occur before or
after the variable bar became 94? In addition, it is usually possible to either restrict the information
that is stored or post-process the data prior to viewing it in order to cut down on the amount of trace
data to be examined.
ROM Emulators
One other type of emulator is worth mentioning at this point. A ROM emulator is a device that
emulates a read-only memory device. Like an ICE, it is an embedded system that connects to the
target and communicates with the host. However, this time the target connection is via a ROM
socket. To the embedded processor, it looks like any other read-only memory device. But to the
remote debugger, it looks like a debug monitor.
ROM emulators have several advantages over debug monitors. First, no one has to port the debug
monitor code to your particular target hardware. Second, the ROM emulator supplies its own serial
or network connection to the host, so it is not necessary to use the target's own, usually limited,
resources. And finally, the ROM emulator is a true replacement for the original ROM, so none of the
target's memory is used up by the debug monitor code.
By far, the biggest disadvantage of a simulator is that it only simulates the processor. And embedded
systems frequently contain one or more other important peripherals. Interaction with these devices
can sometimes be imitated with simulator scripts or other workarounds, but such workarounds are
often more trouble to create than the simulation is valuable. So you probably won't do too much with
the simulator once you have the actual embedded hardware available to you.
Once you have access to your target hardware, and especially during the hardware debugging, logic
analyzers and oscilloscopes can be indispensable debugging tools. They are most useful for
debugging the interactions between the processor and other chips on the board.
Because they can only view signals that lie outside the processor, however, they cannot control the
flow of execution of your software like a debugger or an emulator can. This makes these tools
significantly less useful by themselves. But coupled with a software debugging tool like a remote
debugger or an emulator, they can be extremely valuable.
An oscilloscope is another piece of laboratory equipment for hardware debugging. But this one is
used to examine any electrical signal, analog or digital, on any piece of hardware.
Oscilloscopes are sometimes useful for quickly observing the voltage on a particular pin or, in the
absence of a logic analyzer, for something slightly more complex. However, the number of inputs is
much smaller (there are usually about four) and advanced triggering logic is not often available. As
a result, it'll be useful to you only rarely as a software debugging tool.
Most of the debugging tools described in this chapter will be used at some point or another in every
embedded project. Oscilloscopes and logic analyzers are most often used to debug hardware
problems; simulators, during the early stages of software development; and debug monitors and
emulators, during the actual software debugging. To be most effective, you should understand what
each tool is for and when and where to apply it for the greatest impact.
Programming
Generally done in either the core's native assembly language or C
Sometimes HLL support (often BASIC) is available
Assemblers/Linkers often supplied free by the micro's manufacturer
C compilers vary from free and very buggy to very expensive and only moderately buggy
Environments generally not friendly or reliable
Downloading
Program development usually done on a PC
Software tools must produce a file to download to the MC's EPROM
Several standard formats (e.g., binary, hex)
EPROM burner often necessary
Can download program to an EPROM emulator
But to reprogram, must use a UV eraser first
Flash memory programmers make this easier
Very easy to reprogram with inexpensive "in-circuit debugger"
Interacts with MC via 3 pins + power + ground
Or can be programmed/debugged with a resident monitor program
on-chip UART for communications with PC
No burner or UV eraser needed
No expensive quartz window required
Expedites program-test-erase-reprogram code development cycle
Monitor
A program module that communicates with PC software
Typically uses a serial port to talk to a PC's terminal program
Capabilities vary widely
Usually can send/receive text and ASCII-converted numbers
Often has commands to examine/change registers, memory locations, I/O ports
Fixed-point numbers are stored in data types that are characterized by their word size in bits, binary
point, and whether they are signed or unsigned. The Simulink® Fixed Point™ software supports
integers, fractionals, and generalized fixed-point numbers. The main difference among these data
types is their default binary point.
A common representation of a binary fixed-point number (either signed or unsigned) is shown in the
following figure.
where
* The most significant bit (MSB) is the leftmost bit, at position ws - 1.
* The least significant bit (LSB) is the rightmost bit, at position 0 (bit b0).
* The binary point is shown four places to the left of the LSB.
Computer hardware typically represents the negation of a binary fixed-point number in three
different ways: sign/magnitude, one's complement, and two's complement. Two's complement is the
preferred representation of signed fixed-point numbers and is supported by the Simulink Fixed Point
software.
Negation using two's complement consists of a bit inversion (translation into one's complement)
followed by the addition of a one. For example, the two's complement of 000101 is 111011.
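The same negation, written out in C for an 8-bit word (the text's 6-bit example 000101 -> 111011 becomes 0x05 -> 0xFB in 8 bits):

#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint8_t x    = 0x05;        /* 00000101 */
    uint8_t ones = ~x;          /* bit inversion: one's complement, 0xFA */
    uint8_t twos = ones + 1;    /* add one: two's complement, 0xFB */
    printf("0x%02x 0x%02x %d\n", ones, twos, (int8_t)twos);  /* 0xfa 0xfb -5 */
    return 0;
}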
Whether a fixed-point value is signed or unsigned is usually not encoded explicitly within the binary
word; that is, there is no sign bit. Instead, the sign information is implicitly defined within the
computer architecture.
The binary point is the means by which fixed-point numbers are scaled. It is usually the software
that determines the binary point. When performing basic math functions such as addition or
subtraction, the hardware uses the same logic circuits regardless of the value of the scale factor. In
essence, the logic circuits have no knowledge of a scale factor. They are performing signed or
unsigned fixed-point binary algebra as if the binary point is to the right of b0.
Within the Simulink Fixed Point software, the main difference between fixed-point data types is the
default binary point. For integers and fractionals, the binary point is fixed at the default value. For
generalized fixed-point data types, you must either explicitly specify the scaling by configuring
dialog box parameters, or inherit the scaling from another block. The sections that follow describe
the supported fixed-point data types.
Integers
The default binary point for signed and unsigned integer data types is assumed to be just to the right
of the LSB. You specify unsigned and signed integers with the uint and sint functions, respectively.
Fractionals
The default binary point for unsigned fractional data types is just to the left of the MSB, while for
signed fractionals the binary point is just to the right of the MSB. If you specify guard bits, then they
lie to the left of the binary point. You specify unsigned and signed fractional numbers with the ufrac
and sfrac functions, respectively.
Generalized Fixed-Point Numbers
For signed and unsigned generalized fixed-point numbers, there is no default binary point. You
specify unsigned and signed generalized fixed-point numbers with the ufix and sfix functions,
respectively.
Note: You can also use the fixdt function to create integer, fractional, and generalized fixed-point
objects.
2.2 Scaling
The dynamic range of fixed-point numbers is much less than that of floating-point numbers with
equivalent word sizes. To avoid overflow conditions and minimize quantization errors, fixed-point
numbers must be scaled.
With the Simulink Fixed Point software, you can select a fixed-point data type whose scaling is
defined by its default binary point, or you can select a generalized fixed-point data type and choose
an arbitrary linear scaling that suits your needs. This section presents the scaling choices available
for generalized fixed-point data types.
The real-world value V of a fixed-point quantity is given by the general [Slope Bias] encoding scheme

V = S * Q + B, with S = F * 2^E

where
* Q is the stored integer (the quantization value).
* B is the bias.
* S is the slope: the value assigned to the LSB of the representation.
* F is the fractional slope, normalized such that 1 <= F < 2, so that the value of the LSB satisfies 2^E <= S < 2^(E+1).
Note: S and B are constants and do not show up in the computer hardware directly; only the
quantization value Q is stored in computer memory (as a variable).
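A small C sketch of this decoding, with made-up constants F, E and B chosen purely for illustration:

#include <math.h>
#include <stdio.h>

#define F 1.25      /* fractional slope, 1 <= F < 2 (example value) */
#define E (-3)      /* fixed power-of-two exponent  (example value) */
#define B 1.0       /* bias                         (example value) */

/* Only the integer Q is stored; S = F * 2^E and B are properties of
   the data type, fixed at design time. */
static double real_world(int q) {
    return F * ldexp((double)q, E) + B;   /* V = F * 2^E * Q + B */
}

int main(void) {
    printf("%f\n", real_world(8));   /* 1.25 * 8 * 2^-3 + 1 = 2.25 */
    return 0;
}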
Binary-Point-Only Scaling
As the name implies, binary-point-only (or power-of-two) scaling involves moving only the binary
point within the generalized fixed-point word. The advantage of this scaling mode is that the number
of processor arithmetic operations is minimized.
With binary-point-only scaling, the components of the general [Slope Bias] formula have these
values:
* F = 1
* S = 2^E
* B = 0
That is, the scaling of the quantized real-world number is defined only by the slope S, which is
restricted to a power of two.
In the Simulink Fixed Point software, you specify binary-point-only scaling with the syntax 2^-E
where E is unrestricted. This creates a MATLAB® structure with a bias B = 0 and a fractional slope
F = 1.0. For example, the syntax 2^-10 defines a scaling such that the binary point is at a location 10
places to the left of the least significant bit.
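A C sketch of binary-point-only quantization with E = -7 on a signed 10-bit word; the value 0.33333 is chosen to match the fi examples shown later:

#include <math.h>
#include <stdio.h>

int main(void) {
    double v = 0.33333;
    /* Q = stored integer; real-world value = Q * 2^-7 = Q / 128 */
    int q_floor = (int)floor(v * 128);    /* floor rounding:   Q = 42 */
    int q_round = (int)lround(v * 128);   /* round-to-nearest: Q = 43 */
    printf("%.7f\n", q_floor / 128.0);    /* 0.3281250 */
    printf("%.7f\n", q_round / 128.0);    /* 0.3359375 */
    return 0;
}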
[Slope Bias] Scaling
When you scale by slope and bias, the slope S and bias B of the quantized real-world number can
take on any value. You specify scaling by slope and bias with the syntax [slope bias], which creates
a MATLAB structure with the given slope and bias. For example, a [Slope Bias] scaling specified
by [5/9 10] defines a slope of 5/9 and a bias of 10. The slope must be a positive number.
5/9 = (10/9) * 2^(-1) = F * 2^E, so F = 10/9 and E = -1.
Examples:
The number x = 0.033333 is converted to a signed, 10-bit generalized fixed-point data type with
binary-point-only scaling of 2^-7 (that is, the binary point is located seven places to the left of the
rightmost bit). Repeated doubling produces one bit per step; whenever the doubled value reaches 1,
a 1 bit is produced and subtracted:
0.033333                0.
0.066666                0.0
0.133332                0.00
0.266664                0.000
0.533328                0.0000
1.066656 -> 0.066656    0.00001
0.133312                0.000010
0.266624                0.0000100
0.533248                0.00001000
0.132992                0.0000100010
etc.
We use a 10-bit generalized fixed-point data type with binary-point-only scaling of 2^-7, so 7 bits
hold the fractional part and 3 bits the signed integer part. For x = 0.33333:
0.33333    0.
0.66666    0.0
0.66664    0.010
0.66656    0.01010
0.66624    0.0101010
0.66496    0.010101010
1.32992    0.0101010101
etc.
For x = 0.0033333:
0.0033333    0.
0.0066666    0.0
0.0133332    0.00
0.0266664    0.000
0.0533328    0.0000
0.1066656    0.00000
0.2133312    0.000000
0.4266624    0.0000000
0.8533248    0.00000000
etc.
x = 0.0000000(011...)_2, which truncates to approximately 0 in the 7 available fraction bits.
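The doubling procedure used in the three tables above can be written as a short C loop (here reproducing the second table's bit pattern for 0.33333):

#include <stdio.h>

/* Each doubling shifts the fraction left one binary place; an integer
   part of 1 becomes the next bit and is removed. */
int main(void) {
    double x = 0.33333;
    printf("0.");
    for (int i = 0; i < 10; i++) {
        x *= 2.0;
        if (x >= 1.0) { putchar('1'); x -= 1.0; }
        else          { putchar('0'); }
    }
    putchar('\n');    /* prints 0.0101010101, as in the second table */
    return 0;
}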
fi(v,s,w,f) returns a fixed-point object with value v, signedness s, word length w, and fraction length
f.
fi(0.33333,1,10,7, 'RoundMode','floor')
ans =
0.328125000000000
Signed: true
WordLength: 10
FractionLength: 7
RoundMode: floor
OverflowMode: saturate
ProductMode: FullPrecision
MaxProductWordLength: 128
SumMode: FullPrecision
MaxSumWordLength: 128
CastBeforeSum: true
>> 1/4+1/16+1/64
ans =
0.328125000000000
fi(0.33333,1,10,7)
ans =
0.335937500000000
>> (1/4+1/16+1/64)+1/128
ans =
0.335937500000000
Another example:
>>x=2^-7
x=
0.007812500000000
>>round(m/x)*x
ans =
0 0 0 0
0 0 0 0
0 0 0 0
0.031250000000000 0 0 0
0.335937500000000 0.031250000000000 0 0
>> M = fi(m,0,10,7)
M =
>> fi(M,0,10,7)
ans =
0 0 0 0
0 0 0 0
0.0078 0 0 0
0.1016 0.0078 0 0
>>fi(M,0,10,7)*round(m(5,1)/x)*x
ans =
0 0 0 0
0 0 0 0
0.0026 0 0 0
0.0341 0.0026 0 0
From the analysis of fixed-point variables scaled within the general [Slope Bias] encoding
scheme, you can conclude:
* Addition, subtraction, multiplication, and division can be very involved unless certain choices
are made for the biases and slopes.
* Binary-point-only scaling guarantees simpler math, but generally sacrifices some precision.
* Rounding and overflow handling schemes must be decided before an actual fixed-point
realization is achieved.
A. Quantization
Within the context of the general [Slope Bias] encoding scheme, the value of an unsigned fixed-
point quantity is given by

V = S * Q + B

where S is given by F * 2^E; the scaling is unrestricted because the binary point does not have
to be contiguous with the word.
The formats for 8-bit signed and unsigned fixed-point values are shown in the following figure.
Note that you cannot discern whether these numbers are signed or unsigned data types merely by
inspection since this information is not explicitly encoded within the word.
The binary number 0011.0101 yields the same value for the unsigned and two's complement
representations because the MSB = 0. Setting B = 0 and using the appropriate weights, bit
multipliers, and scaling, the value is
2 + 1 + 0.25 + 0.0625 = 3.3125
Conversely, the binary number 1011.0101 yields different values for the unsigned and two's
complement representations since the MSB = 1.
Setting B = 0 and using the appropriate weights, bit multipliers, and scaling, the unsigned value is
8 + 2 + 1 + 0.25 + 0.0625 = 11.3125
while the two's complement value is
-8 + 2 + 1 + 0.25 + 0.0625 = -4.6875
Range
The range of representable numbers for an unsigned and two's complement fixed-point number of
size ws, scaling S, and bias B is illustrated in the following figure.
For both the signed and unsigned fixed-point numbers of any data type, the number of different bit
patterns is 2^ws.
For example, if the fixed-point data type is an integer with scaling defined as S = 1 and B = 0, then
the maximum unsigned value is 2^ws - 1, because zero must be represented. In two's complement,
negative numbers must be represented as well as zero, so the maximum value is 2^(ws-1) - 1.
Additionally, since there is only one representation for zero, there must be an unequal number of
positive and negative numbers: there is a representation for -2^(ws-1) but none for +2^(ws-1).
Precision
The precision (scaling) of integer and fractional data types is specified by the default binary point.
For generalized fixed-point data types, the scaling must be explicitly defined as either [Slope Bias]
or binary-point-only. In either case, the precision is given by the slope.
The low limit, high limit, and default binary-point-only scaling for the supported fixed-point data
types discussed in Binary Point Interpretation are given in the following table. See Limitations on
Precision and Limitations on Range for more information.
The precision, range of signed values, and range of unsigned values for an 8-bit generalized fixed-
point data type with binary-point-only scaling follow. Note that the first scaling value (2^1) represents
a binary point that is not contiguous with the word.
The precision and range of signed and unsigned values for an 8-bit fixed-point data type using
[Slope Bias] scaling follow. The slope starts at a value of 1.25 and the bias is 1.0 for all slopes. Note
that the slope is the same as the precision.
The following table provides a key for various symbols that may appear in Simulink products to
indicate the data type and scaling of a fixed-point value.
2.2.2 Recommendations for Arithmetic and Scaling
Introduction
The sections that follow describe the relationship between arithmetic operations and fixed-point
scaling, and offer some basic recommendations that may be appropriate for your fixed-point design.
For each arithmetic operation,
* The scaling of the result is automatically selected based on the scaling of the two inputs. In
other words, the scaling is inherited.
In embedded systems, the scaling of variables at the hardware interface (the ADC or DAC) is fixed.
However for most other variables, the scaling is something you can choose to give the best design.
When scaling fixed-point variables, it is important to remember that
* Your scaling choices depend on the particular design you are simulating.
* There is no best scaling approach. All choices have associated advantages and disadvantages.
It is the goal of this section to expose these advantages and disadvantages to you.
Addition
The operands and the result are represented by the general [Slope Bias] encoding scheme described
in Scaling. In a fixed-point system, computing the real-world sum a = b + c amounts to finding the
stored integer Qa from Qb and Qc:
* In general, this requires two multiplications of a constant by a variable, two additions, and
some additional bit shifting.
In the process of finding the scaling of the sum, one reasonable goal is to simplify the calculations.
Simplifying the calculations should reduce the number of operations, thereby increasing execution
speed. The following choices can help to minimize the number of arithmetic operations:
* Set Fa = Fb or Fa = Fc. Either choice eliminates one of the two constant times variable
multiplications.
These equations appear to be equivalent. However, your choice of rounding and precision may make
one choice stand out over the other. To further simplify matters, you could choose Ea = Ec or Ea =
Eb. This will eliminate some bit shifting.
In the process of finding the scaling of the sum, another reasonable goal is maximum precision. You
can determine the maximum-precision scaling if the range of the variable is known: Example:
Maximizing Precision shows that you can determine the range of a fixed-point operation from the
minimum and maximum values of its inputs, and for a summation you can determine the range from
the sum of those minimum and maximum values.
In most cases the input and output word sizes are much greater than one, and the slope reduces to a
value that depends only on the size of the input and output words. The value of the corresponding
bias depends on whether the inputs and output are signed or unsigned numbers. If the inputs and
output are all unsigned, then the minimum values for these variables are all zero and the bias reduces
to a particularly simple form; if the inputs and the output are all signed, the bias takes a slightly more
involved form.
Binary-Point-Only Scaling
This scaling choice results in only one addition and some bit shifting. The avoidance of any
multiplications is a big advantage of binary-point-only scaling.
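A C sketch of such an addition: two stored integers with different binary points are combined with one shift and one add, and no multiplications (the values are our own examples):

#include <stdint.h>
#include <stdio.h>

int main(void) {
    /* b = Qb * 2^-7 and c = Qc * 2^-4; we want a = b + c as Qa * 2^-7 */
    int16_t qb = 43;               /* 0.3359375 */
    int16_t qc = 20;               /* 1.25      */
    int16_t qa = qb + (qc << 3);   /* align c to 2^-7: shift left by 7 - 4 = 3 */
    printf("%.7f\n", qa / 128.0);  /* 1.5859375 = 0.3359375 + 1.25 */
    return 0;
}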
3. Microcontroller CPU, Interrupts, Memory, and I/O
The interconnection between the CPU, memory, and I/O of the address and data buses is
generally a one-to-one connection. The hard part is designing the appropriate circuitry to adapt the
control signals present on each device to be compatible with that of the other devices. The most
basic control signals are generated by the CPU to control the data transfers between the CPU and
memory, and between the CPU and I/O devices. The four most common types of CPU controlled
data transfers are:
- CPU reads data/instructions from memory (memory read)
- CPU writes data to memory (memory write)
- CPU reads data from an input device (I/O read)
- CPU writes data to an output device (I/O write)
Registers
Registers are simply a combination of various flip-flops that can be used to temporarily store
data or to delay signals. A storage register is a form of fast programmable internal processor
memory usually used to temporarily store, copy, and modify operands that are immediately or
frequently used by the system. Shift registers delay signals by passing the signals between the
various internal flip-flops with every clock pulse.
Registers are made up of a set of flip-flops that can be activated either individually or as a
set. In fact, it is the number of flip-flops in each register that is actually used to describe a processor
(for example, a 32-bit processor has working registers that are 32 bits wide containing 32 flip-flops,
a 16-bit processor has working registers that are 16 bits wide containing 16 flip-flops, and so on).
The number of flip-flops within these registers also determines the width of the data buses used in
the system.
While ISA designs do not all use registers in the same way to process data, register storage
typically falls under one of two categories: general purpose or special purpose. General purpose
registers can be used to store and manipulate any type of data determined by the programmer.
Special purpose registers can only be used in a manner specified by the ISA; this includes holding
results for specific types of computations, providing predetermined flags (single bits within a
register that can act and be controlled independently), acting as counters (registers that can be
programmed to change state, that is, increment, asynchronously or synchronously after a specified
length of time), and controlling I/O ports (registers managing the external I/O pins connected to the
body of the processor and to board I/O). Shift registers are inherently special purpose because of
their limited functionality.
The number of registers, the types of registers, and the size of the data that these registers can
store (8-bit, 16-bit, 32-bit, and so forth) vary depending on the CPU, according to the ISA
definitions. In the cycle of fetching and executing instructions, the CPU's registers have to be fast,
so as to quickly feed data to the ALU, for example, and to receive data from the CPU's internal data
bus. Registers are also multi-ported so as to be able to both receive and transmit data to these CPU
components.
3.2 Interrupts
Now that you know the names and addresses of the memory and peripherals attached to the
processor, it is time to learn how to communicate with the latter. There are two basic communication
techniques: polling and interrupts. In either case, the processor usually issues some sort of
commands to the device (by way of the memory or I/O space) and waits for the device to complete
the assigned task. For example, the processor might ask a timer to count down from 1000 to 0. Once
the countdown begins, the processor is interested in just one thing: is the timer finished counting
yet?
If polling is used, then the processor repeatedly checks to see if the task has been completed.
This is analogous to the small child who repeatedly asks "are we there yet?" throughout a long trip.
Like the child, the processor spends a large amount of otherwise useful time asking the question and
getting a negative response. To implement polling in software, you need only create a loop that
reads the status register of the device in question.
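For example, a polling loop might look like this in C; the status-register address and the bit mask are hypothetical stand-ins for values a real device's datasheet would provide:

#include <stdint.h>

#define TIMER_STATUS  (*(volatile uint8_t *)0x4000)  /* hypothetical address */
#define DONE_BIT      0x01                           /* hypothetical flag    */

void wait_for_timer(void) {
    while ((TIMER_STATUS & DONE_BIT) == 0) {
        /* busy-wait: keep asking "are we there yet?" */
    }
}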
The second communication technique uses interrupts.
An interrupt is an asynchronous electrical signal from a peripheral to the processor. When interrupts
are used, the processor issues commands to the peripheral exactly as before, but then waits for an
interrupt to signal completion of the assigned work. While the processor is waiting for the interrupt
to arrive, it is free to continue working on other things. When the interrupt signal is finally asserted,
the processor temporarily sets aside its current work and executes a small piece of software called
the interrupt service routine (ISR). When the ISR completes, the processor returns to the work that
was interrupted.
Of course, this isn't all automatic. The programmer must write the ISR himself and "install" and
enable it so that it will be executed when the relevant interrupt occurs. The first few times you do
this, it will be a significant challenge. But, even so, the use of interrupts generally decreases the
complexity of one's overall code by giving it a better structure. Rather than device polling being
embedded within an unrelated part of the program, the two pieces of code remain appropriately
separate.
On the whole, interrupts are a much more efficient use of the processor than polling. The processor
is able to use a larger percentage of its waiting time to perform useful work.
However, there is some overhead associated with each interrupt. It takes a good bit of time, relative
to the length of time it takes to execute an opcode, to put aside the processor's current work and
transfer control to the interrupt service routine. Many of the processor's registers must be saved in
memory, and lower-priority interrupts must be disabled. So in practice both methods are used
frequently. Interrupts are used when efficiency is paramount or multiple devices must be monitored
simultaneously. Polling is used when the processor must respond to some event more quickly than is
possible using interrupts.
DEFINITIONS
• Interrupt - Hardware-supported asynchronous transfer of control to an interrupt vector
• Interrupt Vector - Dedicated location in memory that specifies the address to which execution jumps
• Interrupt Handler - Code that is reachable from an interrupt vector
• Interrupt Controller - Peripheral device that manages interrupts for the processor
• Pending - Firing condition met and noticed, but the interrupt handler has not yet begun to execute
• Interrupt Latency - Time from the interrupt’s firing condition being met to the start of execution of
the interrupt handler
• Nested Interrupt - Occurs when one interrupt handler preempts another
• Reentrant Interrupt - Multiple invocations of a single interrupt handler are concurrently
active
An interrupt is an asynchronous signal from hardware indicating the need for attention or a
synchronous event in software indicating the need for a change in execution.
Hardware interrupts are triggered by a physical event, such as the closure of a switch, that
causes a specific subroutine to be called. They can be thought of as a sort of hardware initiated
subroutine call. They can and do occur at any time in the program, depending on when the event
occurs. These are referred to as asynchronous events because they may occur during the execution
of any part of the program. Interrupts allow the programs to respond to an event when it occurs.
A software interrupt is a special subroutine call. It is synchronous, meaning that it always occurs at
the same time and place in the program that is interrupted. It is frequently used as a quick and simple
way to do a subroutine call for accessing programs such as the operating system and I/O programs.
Software interrupts are usually implemented as instructions in the instruction set, which cause a
context switch to an interrupt handler similar to a hardware interrupt.
Interrupts can be categorized into: maskable interrupt (IRQ), non-maskable interrupt (NMI),
interprocessor interrupt (IPI), software interrupt, and spurious interrupt.
- A maskable interrupt (IRQ) is a hardware interrupt that may be ignored by setting a bit in an
interrupt mask register's (IMR) bit-mask.
- Likewise, a non-maskable interrupt (NMI) is a hardware interrupt that does not have a bit-
mask associated with it, meaning that it can never be ignored. NMIs are often used for timers,
especially watchdog timers.
- An interprocessor interrupt is a special case of interrupt that is generated by one processor to
interrupt another processor in a multiprocessor system.
- A software interrupt is an interrupt generated within a processor by executing an instruction.
Software interrupts are often used to implement system calls, because they implement a subroutine
call with a CPU ring-level change.
- A spurious interrupt is a hardware interrupt that is unwanted. They are typically generated
by system conditions such as electrical interference on an interrupt line or through incorrectly
designed hardware.
An interrupt can notify the processor when an analog-to-digital converter (ADC) has new
data, when a timer rolls over, when a direct memory access (DMA) transfer is complete, when
another processor wants to communicate, or when almost any asynchronous event happens. The
interrupt hardware is initialized and programmed by the system software. When an interrupt is
acknowledged, that process is performed by hardware internal to the processor and the interrupt
controller integrated circuit (IC) (if any).
When an interrupt occurs, the on-chip hardware performs the following functions:
• It saves the program counter (the address the processor was executing when the
interrupt occurred) on the stack. Some processors save other information as well, such as register
contents.
• It executes an interrupt acknowledge cycle to get a vector from the interrupting peripheral,
depending on the processor and the specific type of interrupt.
• It branches to a predetermined address specific to that particular interrupt.
The destination address is the interrupt service routine (ISR, or sometimes ISP for interrupt
service process). The ISR performs whatever functions are required and then returns. When the
return code is executed, the processor performs the following tasks:
• It retrieves the return address and any other saved information from the stack.
• It resumes execution at the return address.
The return address, in nearly all cases, is the address that would have been executed next if
the interrupt had not occurred. If the implementation is correct the code that was interrupted will not
even know that an interrupt occurred. The hardware part of this process occurs at hardware speed:
microseconds, or even tens of nanoseconds for a fast CPU with a high clock rate.
Re-entrant code or a re-entrant routine is code that can be interrupted at any point when
partially complete, then called by another process, and later return to the point where it was
interrupted to complete the original function without any errors. Non-re-entrant code, however,
cannot be interrupted and then called again without problems. An example of a program that is not
re-entrant is one that uses a fixed memory address to store a temporary result. If the program is
interrupted while the temporary variable is in use and then the routine is called again, the value in
the temporary variable would be changed. When execution returns to the point where it was
interrupted, the temporary variable will have the wrong value. In order to be re-entrant, a program
must keep a separate copy of all internal variables for each invocation. Re-entrant code is required
for any subroutines that must be available to more than one interrupt driven task.
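The problem is easy to demonstrate in C. Both routines below (our own illustrative helpers, not taken from any particular system) compute the same result, but the first parks an intermediate value at a fixed memory address and is therefore not re-entrant, while the second keeps its copy on the stack:

    #include <stdint.h>

    static uint16_t temp;                   /* one fixed location shared by every caller */

    uint16_t swap_bytes_bad(uint16_t x)     /* NOT re-entrant */
    {
        temp = x >> 8;                      /* an interrupt arriving here, followed by   */
        return (uint16_t)(x << 8) | temp;   /* another call, corrupts temp for this call */
    }

    uint16_t swap_bytes_good(uint16_t x)    /* re-entrant */
    {
        uint16_t t = x >> 8;                /* automatic variable: private per invocation */
        return (uint16_t)(x << 8) | t;
    }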
Interrupts can be processed between execution of instructions by the CPU any time they are
enabled. Most CPUs check for the presence of an interrupt request at the end of every instruction. If
interrupts are enabled, the processor saves the contents of the program counter (PC) on the stack,
and loads the PC with the address of the ISR. Some CPUs allow certain instructions to be
interrupted when they take a long time to process, such as a block move instruction.
3.2.1 Vectored Interrupts & Non-Vectored Interrupts
Interrupt Map
Most embedded systems have only a handful of interrupts. Associated with each of these are an
interrupt pin (on the outside of the processor chip) and an ISR. In order for the processor to execute
the correct ISR, a mapping must exist between interrupt pins and ISRs. This mapping usually takes
the form of an interrupt vector table. The vector table is usually just an array of pointers to functions,
located at some known memory address. The processor uses the interrupt type (a unique number
associated with each interrupt pin) as its index into this array.
The value stored at that location in the vector table is usually just the address of the ISR to be
executed.
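In C, such a vector table can be modeled as an array of pointers to functions. The base address, table size, and interrupt type number below are hypothetical, purely for illustration:

    typedef void (*isr_t)(void);

    /* Hypothetical: a 16-entry vector table located at address 0x0000. */
    #define VECTOR_TABLE  ((volatile isr_t *) 0x0000)

    void timer_isr(void);              /* the ISR to be installed */

    void install_timer_isr(void)
    {
        VECTOR_TABLE[5] = timer_isr;   /* assume interrupt type 5 belongs to the timer */
    }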
It is important to initialize the interrupt vector table correctly. (If it is done incorrectly, the ISR
might be executed in response to the wrong interrupt or never executed at all.) The first part of this
process is to create an interrupt map that organizes the relevant information. An interrupt map is a
table that contains a list of interrupt types and the devices to which they refer. This information
should be included in the documentation provided with the board.
In a vectored interrupt system, the interrupt request is accompanied by an identifier, referred to as a
vector or interrupt vector number that defines the source of the interrupt. The vector is a pointer that
is used as an index into a table known as the interrupt vector table. This table contains the addresses
of the ISRs that are to be executed when the corresponding interrupts are processed.
When a vectored interrupt is processed, the CPU goes through the following sequence of
events to begin execution of the ISR:
- After acknowledging the interrupt, the CPU receives the vector number.
- The CPU converts the vector into a memory address in the vector table.
- The ISR address is fetched from the vector table and placed in the program counter.
For example, when an external event occurs, the interrupting device activates the IRQ input
to the interrupt controller that then requests an interrupt cycle from the CPU. When the CPU
acknowledges the interrupt, the interrupt controller passes the vector number to the CPU. The CPU
converts the vector number to a memory address. This address points to the location in the vector
table that contains the address of the ISR.
For systems with non-vectored interrupts, there is only one interrupt service routine entry
point, and the ISR code must determine what caused the interrupt if there are multiple interrupt
sources in the system. When an interrupt occurs a call to a fixed location is executed, and that begins
execution of the ISR. It is possible to have multiple interrupts pointing to the same ISR. The first act
of such an ISR is to determine which interrupt occurred and branch to the appropriate handler. Serial
I/O ports frequently have one vector for transmit and receive interrupts.
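A sketch of such a shared ISR in C (the status register address and the flag masks are hypothetical) shows the dispatch idea:

    #include <stdint.h>

    #define SERIAL_STATUS  ((volatile uint8_t *) 0xFF10)   /* hypothetical */
    #define RX_READY       0x01
    #define TX_DONE        0x02

    void serial_rx_handler(void);
    void serial_tx_handler(void);

    void serial_isr(void)              /* single entry point for both sources */
    {
        uint8_t status = *SERIAL_STATUS;

        if (status & RX_READY)         /* first determine which event occurred, */
            serial_rx_handler();       /* then branch to the matching handler   */
        if (status & TX_DONE)
            serial_tx_handler();
    }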
3.2.2 Interrupt Priority
There are a number of variations in the way interrupts can be handled by the processor.
These variations include how multiple interrupts are handled, if they can be turned off, and how they
are triggered. Some processors allow multiple (nested) interrupts, meaning the CPU can handle
multiple interrupts simultaneously. In other words, interrupts can interrupt interrupts. When multiple
interrupts are sent to the CPU, some method must be used to determine which is handled first. Here
are the most common prioritization schemes currently in use.
- Fixed (static) multi-level priority. This uses a priority encoder to assign priorities, with the
highest priority interrupt processed first. Nested interrupts allow an ISR itself to be interrupted by a
higher-priority device. Interrupts from lower-priority devices are ignored until the higher-priority
ISR is completed. This is the most common method of assigning priorities to interrupts.
- Variable (dynamic) multi-level priority. One problem with fixed priority is that one type of
event can “dominate” the CPU to the exclusion of other events. The solution is to rotate priority
each time an event occurs. This ensures that no interrupt gets “locked out” and all interrupts will
eventually be processed. This scheme is good for multi-user systems because eventually everyone
gets priority.
- Equal single-level priority. If an interrupt occurs while another interrupt is being serviced, the
new interrupt gains control of the processor.
Depending on the interrupt strategy, the parts from the endless loop are structured differently.
(Figure: an initialization part, consisting of serial communication initialization and other initializations, followed by an endless loop that executes Part 1, Part 2, and Part 3.)
If we want to further detail the serial receive/transmit parts, we can implement:
(Figure: the same loop extended with polling. After initialization, a wait loop polls "Byte received?" and, on Yes, takes the byte from the receive buffer before executing Part 1 and Part 2; a second wait loop polls "Byte transmitted?" in the serial transmit part before executing Part 3.)
The main drawback of this implementation is that, while the program is blocked in a wait loop,
other program parts that need to run cannot execute. For example, if we never receive a byte, the
program will stay forever in the receive loop. We can eliminate the wait loops from the receiving
part by using the interrupt-driven implementation shown below.
Definition: An interrupt is an event external to the currently executing process that causes a change
in the normal flow of instruction execution;
An interrupt is usually generated by hardware devices external to the CPU (UART for example)
For example, from the table below we can use serial communication in interrupt mode by setting EA = 1
and ES = 1. Then we have to write an interrupt service routine which is called when the interrupt occurs.
To use serial communication in polling mode, we have to clear at least ES = 0, disabling the
specific serial interrupt, or disable all interrupts by clearing EA = 0.
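In Keil C51 syntax this amounts to only a few lines; the SFR names come from the standard reg51.h header, and vector number 4 is the classic 8051 serial interrupt (details vary by derivative and toolchain):

    #include <reg51.h>                    /* declares EA, ES, RI, TI, SBUF, ... */

    volatile unsigned char rx_byte;

    void serial_isr(void) interrupt 4     /* 8051 serial interrupt vector */
    {
        if (RI) {                         /* byte received?                  */
            rx_byte = SBUF;               /* take it from the receive buffer */
            RI = 0;                       /* clear the receive flag          */
        }
        if (TI) {
            TI = 0;                       /* clear the transmit-complete flag */
        }
    }

    void serial_interrupts_on(void)
    {
        ES = 1;                           /* enable the serial interrupt */
        EA = 1;                           /* enable interrupts globally  */
    }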
(Figure: on the left, the polling implementation using the hardware flags: wait until RI == 1, clear RI, and read char X from SBUF in the receive part; later wait until TI == 1, clear TI, and write char X to SBUF in the transmit part. On the right, the interrupt-driven implementation: the main loop simply runs Part 1, Part 2, and Part 3, while the serial ISR handles the RI and TI events, including generating the transmit interrupt.)
3.3 Types of Memory
Many types of memory devices are available for use in modern computer systems. As an embedded
software engineer, you must be aware of the differences between them and understand how to use
each type effectively. In our discussion, we will approach these devices from a software viewpoint.
As you are reading, try to keep in mind that the development of these devices took several decades
and that there are significant physical differences in the underlying hardware. The names of the
memory types frequently reflect the historical nature of the development process and are often more
confusing than insightful.
Most software developers think of memory as being either random-access (RAM) or read-only
(ROM). But, in fact, there are subtypes of each and even a third class of hybrid memories. In a RAM
device, the data stored at each memory location can be read or written, as desired. In a ROM device,
the data stored at each memory location can be read at will, but never written. In some cases, it is
possible to overwrite the data in a ROM-like device. Such devices are called hybrid memories
because they exhibit some of the characteristics of both RAM and ROM. The figures below provide a
classification system for the memory devices that are commonly found in embedded systems.
3.3.1 Read-Only Memory (ROM)
Types of ROM
Memories in the ROM family are distinguished by the methods used to write new data to them
(usually called programming) and the number of times they can be rewritten. This classification
reflects the evolution of ROM devices from hardwired to one-time programmable to erasable-and-
programmable. A common feature across all these devices is their ability to retain data and programs
forever, even during a power failure.
The very first ROMs were hardwired devices that contained a preprogrammed set of data or
instructions. The contents of the ROM had to be specified before chip production, so the actual data
could be used to arrange the transistors inside the chip! Hardwired memories are still used, though
they are now called "masked ROMs" to distinguish them from other types of ROM. The main
advantage of a masked ROM is a low production cost. Unfortunately, the cost is low only when
hundreds of thousands of copies of the same ROM are required.
One step up from the masked ROM is the PROM (programmable ROM), which is purchased in an
unprogrammed state. If you were to look at the contents of an unprogrammed PROM, you would see
that the data is made up entirely of 1's. The process of writing your data to the PROM involves a
special piece of equipment called a device programmer. The device programmer writes data to the
device one word at a time, by applying an electrical charge to the input pins of the chip. Once a
PROM has been programmed in this way, its contents can never be changed. If the code or data
stored in the PROM must be changed, the current device must be discarded. As a result, PROMs are
also known as one-time programmable (OTP) devices.
An EPROM (erasable-and-programmable ROM) is programmed in exactly the same manner as a
PROM. However, EPROMs can be erased and reprogrammed repeatedly. To erase an EPROM, you
simply expose the device to a strong source of ultraviolet light. (There is a "window" in the top of
the device to let the ultraviolet light reach the silicon.) By doing this, you essentially reset the entire
chip to its initial-unprogrammed-state. Though more expensive than PROMs, their ability to be
reprogrammed makes EPROMs an essential part of the software development and testing process.
On-chip ROM is memory integrated into a processor that contains data or instructions that remain
even when there is no power in the system, and therefore is considered to be nonvolatile memory
(NVM). The content of on-chip ROM usually can only be read by the system it is used in.
The most common types of on-chip ROM include:
- MROM (mask ROM), which is ROM (with data content) that is permanently etched into the
microchip during the manufacturing of the processor, and cannot be modified later.
- PROMs (programmable ROMs), or OTPs (one-time programmables), which are a type of
ROM that can be integrated on-chip and that is one-time programmable by a PROM programmer (in
other words, it can be programmed outside the manufacturing factory).
- EPROM (erasable programmable ROM), which is ROM that can be integrated on a
processor, in which content can be erased and reprogrammed more than once (the number of times
erasure and re-use can occur depends on the processor). The content of EPROM is written to the
device using a special separate device and erased, in its entirety, using a device that outputs intense
ultraviolet light through the processor’s built-in window.
- EEPROM (electrically erasable programmable ROM), which, like EPROM, can be erased
and reprogrammed more than once. The number of times erasure and re-use can occur depends on
the processor. Unlike EPROMs, the content of EEPROM can be written and erased without using
any special devices while the embedded system is functioning. With EEPROMs, erasing can be
done at the byte level, unlike EPROMs, which are erased in their entirety.
A cheaper and faster variation of the EEPROM is Flash memory. Where EEPROMs are written and
erased at the byte level, Flash can be written and erased in blocks or sectors (a group of bytes). Like
EEPROM, Flash can be erased while still in the embedded device.
As shown in Figure 1.5, DRAM memory cells are circuits with capacitors that hold a charge
in place (the charges or lack thereof reflecting data). DRAM capacitors need to be refreshed
frequently with power in order to maintain their respective charges, and to recharge capacitors after
DRAM is read (reading DRAM discharges the capacitor). The cycle of discharging and recharging
of memory cells is why this type of RAM is called dynamic.
Reading speed
Although the relative speed of RAM vs. ROM has varied over time, as of 2007 large RAM
chips can be read faster than most ROMs. For this reason (and to make for uniform access), ROM
content is sometimes copied to RAM or shadowed before its first use, and subsequently read from
RAM.
Writing speed
For those types of ROM that can be electrically modified, writing speed is always much
slower than reading speed, and it may require unusually high voltage, the movement of jumper plugs
to apply write-enable signals, and special lock/unlock command codes. Modern NAND Flash
achieves the highest write speeds of any rewritable ROM technology, with speeds as high as
15 MiB/s (or 70 ns/bit), by allowing (indeed requiring) large blocks of memory cells to be written
simultaneously.
As memory technology has matured in recent years, the line between RAM and ROM devices has
blurred. There are now several types of memory that combine the best features of both.
These devices do not belong to either group and can be collectively referred to as hybrid memory
devices. Hybrid memories can be read and written as desired, like RAM, but maintain their contents
without electrical power, just like ROM. Two of the hybrid devices, EEPROM and Flash, are
descendants of ROM devices; the third, NVRAM, is a modified version of SRAM.
EEPROMs are electrically-erasable-and-programmable. Internally, they are similar to EPROMs, but
the erase operation is accomplished electrically, rather than by exposure to ultraviolet light. Any
byte within an EEPROM can be erased and rewritten. Once written, the new data will remain in the
device forever, or at least until it is electrically erased. The tradeoff for this improved functionality is
mainly higher cost. Write cycles are also significantly longer than writes to a RAM, so you wouldn't
want to use an EEPROM for your main system memory.
Flash memory is the most recent advancement in memory technology. It combines all the best
features of the memory devices described thus far. Flash memory devices are high density, low cost,
nonvolatile, fast (to read, but not to write), and electrically reprogrammable. These advantages are
overwhelming and the use of Flash memory has increased dramatically in embedded systems as a
direct result. From a software viewpoint, Flash and EEPROM technologies are very similar. The
major difference is that Flash devices can be erased only one sector at a time, not byte by byte.
Typical sector sizes are in the range of 256 bytes to 16 kilobytes. Despite this disadvantage, Flash is
much more popular than EEPROM and is rapidly displacing many of the ROM devices as well.
The third member of the hybrid memory class is NVRAM (nonvolatile RAM). Nonvolatility is also
a characteristic of the ROM and hybrid memories discussed earlier. However, an NVRAM is
physically very different from those devices. An NVRAM is usually just an SRAM with a battery
backup. When the power is turned on, the NVRAM operates just like any other SRAM. But when
the power is turned off, the NVRAM draws just enough electrical power from the battery to retain
its current contents. NVRAM is fairly common in embedded systems. However, it is very
expensive, even more expensive than SRAM, so its applications are typically limited to the storage
of only a few hundred bytes of system-critical information that cannot be stored in any better way.
Memory Management
Goals:
- Protect the programs from each other, and the kernel from the programs.
- Perform relocation.
Relocation:
- The user program thinks it has the whole address space, from address 0x0 to 0xffffffff.
- In reality it only has a part of the physical memory.
- Virtual addresses therefore need to be mapped onto physical addresses.
- This mapping is performed by the MMU (Memory Management Unit), but the OS must configure it.
Context Switching
Whenever execution switches between a user program and the OS, a context switch occurs. The
operating system must then:
- Save the PC, stack pointer, and PSW.
- Save the contents of the registers.
- Reprogram the MMU registers.
- Wait while the instructions in the CPU pipeline are flushed.
- Wait for cache lines to load from the new program's memory.
Context switching is therefore fairly expensive.
3.4 I/O
The entire point of an embedded microprocessor is to monitor or control some real-world
event. To do this, the microprocessor must have I/O capability. Like a desktop computer without a
monitor, printer, or keyboard, an embedded microprocessor without I/O is just a paperweight. The
I/O from an embedded control system falls into two broad categories: digital and analog. However,
at the microprocessor level, all I/O is digital. (Some microprocessor ICs have built-in ADCs, but the
processor itself still works with digital values.) The simplest form of I/O is a register that the
microprocessor can write to or a buffer that it can read.
Most of the peripherals require the use of a certain set of pins on the processor. In many
cases, the majority of those pins can be used for their specific function (serial port receiver, timer
output, DMA control signal, etc.), or they can be programmed to just act as a simple input or output
pin (PIO). This flexibility allows the silicon to be configured based on the needs of the design. For
example, if you don’t need two serial ports (and the processor comes with two), then the pins that
are allocated to the second port (RX2, TX2, and maybe DTR2, CTS2, etc…) can be programmed to
function as simple PIO pins and used to drive an LED or read a switch. Programmable pins are
sometimes referred to as dual function. Note that this dual functionality should not be assumed. How
each pin is configured and the ability to configure it to run in different modes is dependent on the
processor implementation. Often a pin name is chosen to reflect the pin’s dual personality. For
example, if RX2 can be configured as a serial port 2 receiver or as a PIO pin, then it will probably be
labeled as RX2/PION (or something similar), where N is some number between 1 and M, and M
is the number of PIO pins on the processor. Some microprocessors may be advertised as having a set
of features but actually provide these features on dual-function pins. Hence, the full set of advertised
features (two serial ports and 32 PIO lines) may not be simultaneously available (because the pins
used for the second serial port are dual-functioned as PIO lines).
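As a purely hypothetical sketch (the register name, address, and bit position are invented for illustration), reclaiming such a pin usually comes down to one register write:

    #include <stdint.h>

    /* Hypothetical pin-mode register: setting bit N selects PIO operation
       for pin N instead of its dedicated function. */
    #define PIN_MODE  ((volatile uint8_t *) 0xFF20)
    #define RX2_PIN   0x04              /* assumed bit for the RX2/PIO pin */

    void rx2_as_gpio(void)
    {
        *PIN_MODE |= RX2_PIN;           /* reuse RX2 as a simple PIO pin */
    }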
3.4.1 Study of External Peripherals
At this point, you've studied every aspect of the new hardware except the external peripherals.
These are the hardware devices that reside outside the processor chip and communicate with it by
way of interrupts and I/O or memory-mapped registers.
Begin by making a list of the external peripherals. Depending on your application, this list might
include LCD or keyboard controllers, A/D converters, network interface chips, or custom ASICs
(Application-Specific Integrated Circuits). In the case of the Arcom board, the list contains just three
items: the Zilog 85230 Serial Controller, parallel port, and debugger port.
You should obtain a copy of the user's manual or databook for each device on your list. At this early
stage of the project, your goal in reading these documents is to understand the basic functions of the
device. What does the device do? What registers are used to issue commands and receive the
results? What do the various bits and larger fields within these registers mean?
When, if ever, does the device generate interrupts? How are interrupts acknowledged or cleared at
the device?
When you are designing the embedded software, you should try to break the program down along
device lines. It is usually a good idea to associate a software module called a device driver with each
of the external peripherals. This is nothing more than a collection of software routines that control
the operation of the peripheral and isolate the application software from the details of that particular
hardware device.
The final step in getting to know your new hardware is to write some initialization software.
This is your best opportunity to develop a close working relationship with the hardware, especially if
you will be developing the remainder of the software in a high-level language. During hardware
initialization it will be impossible to avoid using assembly language. However, after completing this
step, you will be ready to begin writing small programs in C or C++.
The hardware initialization should be executed before the startup code.
The code described there assumes that the hardware has already been initialized and concerns itself
only with creating a proper runtime environment for high-level language programs.
Figure below provides an overview of the entire initialization process, from processor reset through
hardware initialization and C/C++ startup code to main.
The first stage of the initialization process is the reset code. This is a small piece of assembly
(usually only two or three instructions) that the processor executes immediately after it is powered
on or reset. The sole purpose of this code is to transfer control to the hardware initialization routine.
The first instruction of the reset code must be placed at a specific location in memory, usually called
the reset address, that is specified in the processor databook. Most of the actual hardware
initialization takes place in the second stage. At this point, we need to inform the processor about its
environment. This is also a good place to initialize the interrupt controller and other critical
peripherals. Less critical hardware devices can be initialized when the associated device driver is
started, usually from within main.
Intel's 8051/80251 has several internal registers that must be programmed before any useful work
can be done with the processor. These registers are responsible for setting up the memory and I/O
maps and are part of the processor's internal chip-select unit. By programming the chip-select
registers, you are essentially waking up each of the memory and I/O devices that are connected to
the processor. Each chip-select register is associated with a single "chip enable" wire that runs from
the processor to some other chip. The association between particular chip-selects and hardware
devices must be established by the hardware designer. All you need to do is get a list of chip-select
settings from him and load those settings into the chip-select registers.
The third initialization stage contains the startup code, whose job is to prepare the way for code
written in a high-level language. Of importance here is only that the startup code calls main. From
that point forward, all of your other software can be written in C or C++.
3.4.2 Peripheral devices
In addition to the processor and memory, most embedded systems contain a handful of other
hardware devices. Some of these devices are specific to the application domain, while others, like
timers and serial ports, are useful in a wide variety of systems. The most generically useful of these
are often included within the same chip as the processor and are called internal, or on-chip,
peripherals. Hardware devices that reside outside the processor chip are, therefore, said to be
external peripherals. In this chapter we'll discuss the most common software issues that arise when
interfacing to a peripheral of either type.
3.4.2.1 Control and Status Registers
The basic interface between an embedded processor and a peripheral device is a set of control and
status registers. These registers are part of the peripheral hardware, and their locations, size, and
individual meanings are features of the peripheral. For example, the registers within a serial
controller are very different from those in a timer/counter. In this section, I'll describe how to
manipulate the contents of these control and status registers directly from your C/C++ programs.
Depending upon the design of the processor and board, peripheral devices are located either in the
processor's memory space or within the I/O space. In fact, it is common for embedded systems to
include some peripherals of each type. These are called memory-mapped and I/O-mapped
peripherals, respectively. Of the two types, memory-mapped peripherals are generally easier to work
with and are increasingly popular.
Memory-mapped control and status registers can be made to look just like ordinary variables. To
do this, you need simply declare a pointer to the register, or block of registers, and set the value of
the pointer explicitly.
Note, however, that there is one very important difference between device registers and ordinary
variables. The contents of a device register can change without the knowledge or intervention of
your program. That's because the register contents can also be modified by the peripheral hardware.
By contrast, the contents of a variable will not change unless your program modifies them explicitly.
For that reason, we say that the contents of a device register are volatile, or subject to change
without notice.
The C/C++ keyword volatile should be used when declaring pointers to device registers.
This warns the compiler not to make any assumptions about the data stored at that address.
For example, if the compiler sees a write to the volatile location followed by another write to that
same location, it will not assume that the first write is an unnecessary use of processor time. In other
words, the keyword volatile instructs the optimization phase of the compiler to treat that variable
as though its behavior cannot be predicted at compile time.
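For example (the register address is hypothetical), a memory-mapped timer count register might be declared and read like this; without the volatile qualifier, an optimizing compiler could legally assume the two reads return the same value:

    #include <stdint.h>

    /* Hypothetical 16-bit timer count register, memory-mapped at 0xFF02. */
    volatile uint16_t * const pTimerCount = (volatile uint16_t *) 0xFF02;

    uint16_t elapsed_ticks(void)
    {
        uint16_t first  = *pTimerCount;   /* each access really reads the hardware */
        uint16_t second = *pTimerCount;   /* volatile forbids reusing 'first' here */
        return second - first;
    }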
The primary disadvantage of the other type of device registers, I/O-mapped registers, is that there
is no standard way to access them from C or C++. Such registers are accessible only with the help of
special machine-language instructions. And these processor-specific instructions are not supported
by the C or C++ language standards. So it is necessary to use special library routines or inline
assembly (as we did in Chapter 2) to read and write the registers of an I/O-mapped device.
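On an 80x86 target, for instance, one common workaround is a pair of inline-assembly wrappers like the following (GCC syntax; this is only a sketch, and other compilers provide equivalent intrinsics or library calls):

    #include <stdint.h>

    /* Wrappers around the x86 'in' and 'out' instructions. */
    static inline uint8_t inport(uint16_t port)
    {
        uint8_t value;
        __asm__ volatile ("inb %1, %0" : "=a"(value) : "Nd"(port));
        return value;
    }

    static inline void outport(uint16_t port, uint8_t value)
    {
        __asm__ volatile ("outb %0, %1" : : "a"(value), "Nd"(port));
    }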
3.4.2.2 The Device Driver Philosophy
When it comes to designing device drivers, you should always focus on one easily stated goal: hide
the hardware completely. When you're finished, you want the device driver module to be the only
piece of software in the entire system that reads or writes that particular device's control and status
registers directly. In addition, if the device generates any interrupts, the interrupt service routine that
responds to them should be an integral part of the device driver.
In this section, I'll explain why I recommend this philosophy and how it can be achieved.
Of course, attempts to hide the hardware completely are difficult. Any programming interface you
select will reflect the broad features of the device. That's to be expected. The goal should be to
create a programming interface that would not need to be changed if the underlying peripheral
were replaced with another in its general class. For example, all Flash memory devices share the
concepts of sectors (though the sector size can differ between chips). An erase operation can be
performed only on an entire sector, and once erased, individual bytes or words can be rewritten. So
the programming interface provided by the Flash driver example in the last chapter should work
with any Flash memory device. The specific features of the AMD 29F010 are hidden from that
level, as desired.
Device drivers for embedded systems are quite different from their workstation counterparts.
In a modern computer workstation, device drivers are most often concerned with satisfying the
requirements of the operating system. For example, workstation operating systems generally impose
strict requirements on the software interface between themselves and a network card. The device
driver for a particular network card must conform to this software interface, regardless of the
features and capabilities of the underlying hardware. Application programs that want to use the
network card are forced to use the networking API provided by the operating system and don't have
direct access to the card itself. In this case, the goal of hiding the hardware completely is easily met.
By contrast, the application software in an embedded system can easily access your hardware.
In fact, because all of the software is linked together into a single binary image, there is rarely even
a distinction made between application software, operating system, and device drivers.
The drawing of these lines and the enforcement of hardware access restrictions are purely the
responsibilities of the software developers. Both are design decisions that the developers must
consciously make. In other words, the implementers of embedded software can more easily cheat on
the software design than their non-embedded peers.
The benefits of good device driver design are threefold.
• First, because of the modularization, the structure of the overall software is easier to
understand.
• Second, because there is only one module that ever interacts directly with the peripheral's
registers, the state of the hardware can be more accurately tracked.
• And, last but not least, software changes that result from hardware changes are localized to
the device driver.
Each of these benefits can and will help to reduce the total number of bugs in your embedded
software. But you have to be willing to put in a bit of extra effort at design time in order to realize
such savings.
If you agree with the philosophy of hiding all hardware specifics and interactions within the device
driver, it will usually consist of the five components in the following list. To make driver
implementation as simple and incremental as possible, these elements should be developed in the
order in which they are presented.
1. A data structure that overlays the memory-mapped control and status registers of the device
The first step in the driver development process is to create a C-style struct that looks just like the
memory-mapped registers of your device. This usually involves studying the data book for the
peripheral and creating a table of the control and status registers and their offsets.
Then, beginning with the register at the lowest offset, start filling out the struct. (If one or more
locations are unused or reserved, be sure to place dummy variables there to fill in the additional
space.)
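For a hypothetical timer/counter with a 16-bit control register at offset 0, an unused word at offset 2, and a 16-bit count register at offset 4, the overlay might look like this (in practice you must also confirm that the compiler inserts no padding between the fields):

    #include <stdint.h>

    typedef struct
    {
        uint16_t control;    /* offset 0: control register         */
        uint16_t reserved;   /* offset 2: dummy filler, never used */
        uint16_t count;      /* offset 4: current countdown value  */
    } TimerRegs;

    /* Hypothetical base address of the timer/counter unit. */
    #define TIMER  ((volatile TimerRegs *) 0xFF00)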
2. A set of variables to track the current state of the hardware and device driver
The second step in the driver development process is to figure out what variables you will need to
track the state of the hardware and device driver. For example, in the case of the timer/counter unit
described earlier we'll probably need to know if the hardware has been initialized. And if it has been,
we might also want to know the length of the running countdown.
Some device drivers create more than one software device. This is a purely logical device that is
implemented over the top of the basic peripheral hardware. For example, it is easy to imagine that
more than one software timer could be created from a single timer/counter unit. The timer/counter
unit would be configured to generate a periodic clock tick, and the device driver would then manage
a set of software timers of various lengths by maintaining state information for each.
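Continuing the hypothetical timer driver, the state might be little more than an initialization flag, the current countdown length, and one record per software timer:

    #include <stdint.h>

    #define MAX_SOFT_TIMERS  4

    static int      bInitialized = 0;    /* has the hardware been set up yet? */
    static uint16_t countdownLen;        /* length of the running countdown   */

    /* One logical software timer per slot, all driven by the single
       hardware timer/counter unit. */
    static struct {
        int      active;
        uint32_t ticksRemaining;
    } softTimer[MAX_SOFT_TIMERS];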
3. A routine to initialize the hardware to a known state
Once the register overlay and state variables are in place, the next step is a routine that puts the
hardware into a known initial state, setting the state variables accordingly.
4. A set of routines that, taken together, provide an API for users of the device driver
After you've successfully initialized the device, you can start adding other functionality to the driver.
Hopefully, you've already settled on the names and purposes of the various routines, as well as their
respective parameters and return values. All that's left to do now is implement and test each one.
We'll see examples of such routines in the next section.
5. One or more interrupt service routines
As noted earlier, if the device generates interrupts, the interrupt service routines that respond to them
should be designed and implemented as an integral part of the device driver.
5. Address Decoding
When both inputs, R and S, are equal to 0 the latch maintains its existing state. This state may be
either Qa = 0 and Qb = 1, or Qa = 1 and Qb = 0, which is indicated in the truth table by stating that
the Qa and Qb outputs have values 0/1 and 1/0, respectively. Observe that Qa and Qb are
complements of each other in this case. When R = 0 and S = 1, the latch is set into a state where Qa
= 1 and Qb = 0.
When R = 1 and S = 0, the latch is reset into a state where Qa = 0 and Qb = 1. The fourth possibility
is to have R = S = 1. In this case both Qa and Qb will be 0.
The basic SR latch can serve as a useful memory element. It remembers its state when both the S
and R inputs are 0. It changes its state in response to changes in the signals on these inputs. The state
changes occur at the time when the changes in the signals occur. If we cannot control the time of
such changes, then we don’t know when the latch may change its state.
As we will see, there is also a need for storage elements that can change their states no more than
once during one clock cycle. We will discuss two types of circuits that exhibit such behavior.
Consider the circuit given above, which consists of two gated D latches. The first, called master,
changes its state while Clock = 1. The second, called slave, changes its state while Clock = 0. The
operation of the circuit is such that when the clock is high, the master tracks the value of the D input
signal and the slave does not change. Thus the value of Qm follows any changes in D, and the value
of Qs remains constant. When the clock signal changes to 0, the master stage stops following the
changes in the D input. At the same time, the slave stage responds to the value of the signal Qm and
changes state accordingly. Since Qm does not change while Clock = 0, the slave stage can undergo
at most one change of state during a clock cycle. From the external observer’s point of view,
namely, the circuit connected to the output of the slave stage, the master-slave circuit changes its
state at the negative-going edge of the clock. The negative edge is the edge where the clock signal
changes from 1 to 0. Regardless of the number of changes in the D input to the master stage during
one clock cycle, the observer of the Qs signal will see only the change that corresponds to the D
input at the negative edge of the clock. The above circuit is called a master-slave D flip-flop. The
term flip-flop denotes a storage element that changes its output state at the edge of a controlling
clock signal. The timing diagram for this flip-flop and a graphical symbol are also given. In the
symbol we use the > mark to denote that the flip-flop responds to the “active edge” of the clock. We
place a bubble on the clock input to indicate that the active edge for this particular circuit is the
negative edge.
A positive-edge-triggered D flip-flop.
It requires only six NAND gates and, hence, fewer transistors. The operation of the circuit is as
follows. When Clock = 0, the outputs of gates 2 and 3 are high. Thus P1 = P2 = 1, which maintains
the output latch, comprising gates 5 and 6, in its present state. At the same time, the signal P3 is
equal to D, and P4 is equal to its complement, D̄. When Clock changes to 1, the following changes
take place. The values of P3 and P4 are transmitted through gates 2 and 3 to cause P1 = D̄ and P2 =
D, which sets Q = D and Q̄ = D̄. To operate reliably, P3 and P4 must be stable when Clock changes
from 0 to 1. Hence the setup time of the flip-flop is equal to the delay from the D input through gates
4 and 1 to P3. The hold time is given by the delay through gate 3 because once P2 is stable, the
changes in D no longer matter.
For proper operation it is necessary to show that, after Clock changes to 1, any further changes in D
will not affect the output latch as long as Clock = 1. We have to consider two cases. Suppose first
that D = 0 at the positive edge of the clock. Then P2 = 0, which will keep the output of gate 4 equal
to 1 as long as Clock = 1, regardless of the value of the D input. The second case is if D = 1 at the
positive edge of the clock. Then P1 = 0, which forces the outputs of gates 1 and 3 to be equal to 1,
regardless of the D input. Therefore, the flip-flop ignores changes in the D input while Clock = 1.
Obviously, it must be possible to clear the count to zero, which means that all flip-flops must have Q
= 0. It is equally useful to be able to preset each flip-flop to Q = 1, to insert some specific count as
the initial value in the counter.
6.1.4 T Flip-Flop
The D flip-flop is a versatile storage element that can be used for many purposes. By including some
simple logic circuitry to drive its input, the D flip-flop may appear to be a different type of storage
element. An interesting modification is presented below.
This circuit uses a positive-edge-triggered D flip-flop. The feedback connections make the input
signal D equal to either the value of Q or Q̄, under the control of the signal that is labeled T. On each
positive edge of the clock, the flip-flop may change its state Q(t). If T = 0, then D = Q and the state
will remain the same, that is, Q(t + 1) = Q(t). But if T = 1, then D = Q̄ and the new state will be
Q(t + 1) = Q̄(t). Therefore, the overall operation of the circuit is that it retains its present state if T = 0,
and it reverses its present state if T = 1.
Any circuit that implements this truth table is called a T flip-flop. The name T flip-flop derives from
the behavior of the circuit, which “toggles” its state when T = 1. The toggle feature makes the T flip-
flop a useful element for building counter circuits.
6.1.5 JK Flip-Flop
Another interesting circuit can be derived from above figure. Instead of using a single control input,
T, we can use two inputs, J and K, as indicated in Figure below. For this circuit the input D is
defined as
D = JQ̄ + K̄Q
A corresponding truth table is also given. The circuit is called a JK flip-flop. It combines the
behaviors of SR and T flip-flops in a useful way. It behaves as the SR flip-flop, where J = S and K =
R, for all input values except J = K = 1. For the latter case, which has to be avoided in the SR flip-
flop, the JK flip-flop toggles its state like the T flip-flop.
The JK flip-flop is a versatile circuit. It can be used for straight storage purposes, just like the D and
SR flip-flops. But it can also serve as a T flip-flop by connecting the J and K inputs together.
Summary of Terminology
We have used the terminology that is quite common. But the reader should be aware that different
interpretations of the terms latch and flip-flop can be found in the literature. Our terminology can be
summarized as follows:
Basic latch is a feedback connection of two NOR gates or two NAND gates, which can store one bit
of information. It can be set to 1 using the S input and reset to 0 using the R input.
Gated latch is a basic latch that includes input gating and a control input signal. The latch retains its
existing state when the control input is equal to 0. Its state may be changed when the control signal
is equal to 1. In our discussion we referred to the control input as the clock. We considered two
types of gated latches:
• Gated SR latch uses the S and R inputs to set the latch to 1 or reset it to 0, respectively.
• Gated D latch uses the D input to force the latch into a state that has the same logic value as
the D input.
A flip-flop is a storage element based on the gated latch principle, which can have its output
state changed only on the edge of the controlling clock signal. We considered two types:
• Edge-triggered flip-flop is affected only by the input values present when the active edge of
the clock occurs.
• Master-slave flip-flop is built with two gated latches. The master stage is active during half
of the clock cycle, and the slave stage is active during the other half.
The output value of the flip-flop changes on the edge of the clock that activates the transfer into the
slave stage. Master-slave flip-flops can be edge-triggered or level sensitive. If the master stage is a
gated D latch, then it behaves as an edge-triggered flip-flop. If the master stage is a gated SR latch,
then the flip-flop is level sensitive.
6.2 Registers
A flip-flop stores one bit of information. When a set of n flip-flops is used to store n bits of
information, such as an n-bit number, we refer to these flip-flops as a register. A common clock is
used for each flip-flop in a register, and each flip-flop operates as described in the previous sections.
The term register is merely a convenience for referring to n-bit structures consisting of flip-flops.
Figure below shows a four-bit shift register that is used to shift its contents one bit position to the
right. The data bits are loaded into the shift register in a serial fashion using the In input. The
contents of each flip-flop are transferred to the next flip-flop at each positive edge of the clock. An
illustration of the transfer is given below, which shows what happens when the signal values at In
during eight consecutive clock cycles are 1, 0, 1, 1, 1, 0, 0, and 0, assuming that the initial state of
all flip-flops is 0.
To implement a shift register, it is necessary to use either edge-triggered or master-slave flip-flops.
The level-sensitive gated latches are not suitable, because a change in the value of In would
propagate through more than one latch during the time when the clock is equal to 1.
Figure below shows a four-bit shift register that allows the parallel access. Instead of using the
normal shift register connection, the D input of each flip-flop is connected to two different sources.
One source is the preceding flip-flop, which is needed for the shift register operation. The other
source is the external input that corresponds to the bit that is
to be loaded into the flip-flop as a part of the parallel-load operation. The control signal Shift/Load is
used to select the mode of operation. If Shift/Load = 0, then the circuit operates as a shift register. If
Shift/Load = 1, then the parallel input data are loaded into the register. In both cases the action takes
place on the positive edge of the clock.
We have chosen to label the flip-flop outputs as Q3, . . . , Q0 because shift registers are often used to
hold binary numbers. The contents of the register can be accessed in parallel by observing the
outputs of all flip-flops. The flip-flops can also be accessed serially, by observing the values of Q0
during consecutive clock cycles while the contents are being shifted. A circuit in which data can be
loaded in series and then accessed in parallel is called a series-to-parallel converter. Similarly, the
opposite type of circuit is a parallel-to-series converter. The presented circuit can perform both of
these functions.
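The register's behavior is easy to mimic in software. The C function below (our own model, with the four-bit width and names chosen for this example) computes the register's next contents for one positive clock edge:

    #include <stdint.h>

    /* Model of the four-bit shift register with parallel load. 'q' holds
       Q3..Q0 in its low four bits; each call represents one positive edge. */
    uint8_t clock_edge(uint8_t q, int shift_load, uint8_t parallel_in, int serial_in)
    {
        if (shift_load == 0)    /* Shift/Load = 0: shift right, In enters at Q3 */
            return (uint8_t)((((unsigned)(serial_in & 1) << 3) | (q >> 1)) & 0x0Fu);
        else                    /* Shift/Load = 1: load the parallel inputs */
            return (uint8_t)(parallel_in & 0x0Fu);
    }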
6.3 Counters
Counter circuits are used in digital systems for many purposes. They may count the number of
occurrences of certain events, generate timing intervals for control of various tasks in a system, keep
track of time elapsed between specific events, and so on.
Counters can be implemented using the adder/subtractor circuits. However, since we only need to
change the contents of a counter by 1, it is not necessary to use such elaborate circuits. Instead, we
can use much simpler circuits that have a significantly lower cost. We will show how the counter
circuits can be designed using T and D flip-flops.
A simple counter can be built by chaining T flip-flops whose T inputs are tied to 1, each stage being
clocked by the output of the preceding one. Figure 7.20b shows a timing diagram for this counter.
The value of Q0 toggles once each clock
cycle. The change takes place shortly after the positive edge of the Clock signal. The delay is caused
by the propagation delay through the flip-flop. Since the second flip-flop is clocked by Q0, the value
of Q1 changes shortly after the negative edge of the Q0 signal.
Similarly, the value of Q2 changes shortly after the negative edge of the Q1 signal. If we look at the
values Q2Q1Q0 as the count, then the timing diagram indicates that the counting sequence is 0, 1, 2,
3, 4, 5, 6, 7, 0, 1, and so on. This circuit is a modulo-8 counter. Because it counts in the upward
direction, we call it an up-counter.
The counter in the figure above has three stages, each comprising a single flip-flop. Only the first stage
responds directly to the Clock signal; we say that this stage is synchronized to the clock. The other
two stages respond after an additional delay. For example, when Count = 3, the next clock pulse will
cause the Count to go to 4. As indicated by the arrows in the timing diagram, this change requires
the toggling of the states of all three flip-flops. The change in Q0 is observed only after a
propagation delay from the positive edge of Clock. The Q1 and Q2 flip-flops have not yet changed;
hence for a brief time the count is Q2Q1Q0 = 010. The change in Q1 appears after a second
propagation delay, at which point the count is 000. Finally, the change in Q2 occurs after a third
delay, at which point the stable state of the circuit is reached and the count is 100. This circuit is an
asynchronous counter, or a ripple counter.
If each stage is instead clocked from the other output of the preceding flip-flop, the timing diagram
shows that the circuit counts in the sequence 0, 7, 6, 5, 4, 3, 2, 1, 0, 7, and so on. Because it counts
in the downward direction, we say that it is a down-counter.
It is possible to combine the functionality of the two circuits above to form a counter that can
count either up or down. Such a counter is called an up/down-counter.
In a synchronous counter, all stages share a common clock, and stage i is allowed to toggle only
when all preceding stages are equal to 1. The T inputs are therefore defined as
T1 = Q0
T2 = Q0Q1
T3 = Q0Q1Q2
Tn = Q0 · Q1 · · · Qn−1
Instead of using AND gates of increased size for each stage, which may lead to fan-in problems, we
use a factored arrangement, as shown in the figure. This arrangement does not slow down the
response of the counter, because all flip-flops change their states after a propagation delay from the
positive edge of the clock. Note that a change in the value of
Q0 may have to propagate through several AND gates to reach the flip-flops in the higher stages of
the counter, which requires a certain amount of time. This time must not exceed the clock period.
Actually, it must be less than the clock period minus the setup time for the flip-flops.
Enable and Clear Capability
The above counters change their contents in response to each clock pulse. Often it is desirable to be
able to inhibit counting, so that the count remains in its present state. This may be accomplished by
including an Enable control signal, as indicated below.
The circuit is the counter where the Enable signal controls directly the T input of the first flip-flop.
Connecting the Enable also to the AND gate chain means that if Enable = 0, then all T inputs will be
equal to 0. If Enable = 1, then the counter operates as explained previously.
In many applications it is necessary to start with the count equal to zero. This is easily achieved if
the flip-flops can be cleared. The clear inputs on all flip-flops can be tied together and driven by a
Clear control input.
It is not obvious how D flip-flops can be used to implement a counter. Here we will present a circuit
structure that meets the requirements. We gives a four-bit up-counter that counts in the sequence 0,
1, 2, . . . , 14, 15, 0, 1, and so on. The count is indicated by the flip-flop outputs Q3Q2Q1Q0. If we
assume that Enable = 1, then the D inputs of the flip-flops are defined by the expressions
D0 = Q̄0 = 1 ⊕ Q0
D1 = Q1 ⊕ Q0
D2 = Q2 ⊕ Q1Q0
D3 = Q3 ⊕ Q2Q1Q0
Di = Qi ⊕ Qi−1Qi−2 · · · Q1Q0
We will show how to derive these equations in Chapter 8.
We have included the Enable control signal so that the counter counts the clock pulses only if
Enable = 1. In effect, the above equations are modified to implement the circuit in the figure as
follows
D0 = Q0 ⊕ Enable
D1 = Q1 ⊕ Q0 · Enable
D2 = Q2 ⊕ Q1 · Q0 · Enable
The operation of the counter is based on the observation that the state of the flip-flop in stage i
changes only if all preceding flip-flops are in the state Q = 1. This makes the output of the AND gate
that feeds stage i equal to 1, which causes the output of the XOR gate connected to Di to be equal to
Q̄i. Otherwise, the output of the XOR gate provides Di = Qi, and the flip-flop remains in the same
state.
This resembles the carry propagation in a carry-look-ahead adder circuit; hence the AND-gate chain
can be thought of as the carry chain. Even though the circuit is only a four-bit counter, we have
included an extra AND gate that produces the “output carry.” This signal makes it easy to concatenate
two such four-bit counters to create an eight-bit counter.
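The same behavior can be checked with a small C model (our own sketch, not part of the original text); the loop variable carries the AND-gate chain, so bit i toggles exactly when Enable and all lower-order bits are 1:

    #include <stdint.h>

    /* One positive clock edge of the four-bit synchronous up-counter.
       Bit i of 'q' is Qi; 'enable' is the Enable input. */
    uint8_t counter_edge(uint8_t q, int enable)
    {
        int carry = enable;                        /* the carry (AND-gate) chain   */
        uint8_t next = 0;
        for (int i = 0; i < 4; i++) {
            int qi = (q >> i) & 1;
            next |= (uint8_t)((qi ^ carry) << i);  /* Di = Qi XOR carry            */
            carry &= qi;                           /* propagates only while Qi = 1 */
        }
        return next;
    }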
Using the Clear and Preset inputs of the flip-flops to load an initial count is a possibility, but a better
approach is discussed below.
A two-input multiplexer is inserted before each D input. One input to the multiplexer is used to
provide the normal counting operation. The other input is a data bit that can be loaded directly into
the flip-flop. A control input, Load, is used to choose the mode of operation. The circuit counts
when Load = 0. A new initial value, D3D2D1D0, is loaded into the counter when Load = 1.
Reset Synchronization
We have already mentioned that it is important to be able to clear, or reset, the contents of a counter
prior to commencing a counting operation. This can be done using the clear capability of the
individual flip-flops. But we may also be interested in resetting the count to 0 during the normal
counting process. An n-bit up-counter functions naturally as a modulo-2ⁿ counter. Suppose that we
wish to have a counter that counts modulo some base that is not a power of 2. For example, we may
want to design a modulo-6 counter, for which the counting sequence is 0, 1, 2, 3, 4, 5, 0, 1, and so
on.
The most straightforward approach is to recognize when the count reaches 5 and then reset the
counter. An AND gate can be used to detect the occurrence of the count of 5.
Actually, it is sufficient to ascertain that Q2 = Q0 = 1, which is true only for 5 in our desired
counting sequence. A circuit based on this approach is given below. It uses a three-bit synchronous
counter of the type depicted in Figure 7.25. The parallel-load feature of the counter is used to reset
its contents when the count reaches 5. The resetting action takes place at the positive clock edge
after the count has reached 5. It involves loading D2D1D0 = 000 into the flip-flops. As seen in the
timing diagram in Figure 7.26b, the desired counting sequence is achieved, with each value of the
count being established for one full clock cycle. Because the counter is reset on the active edge of
the clock, we say that this type of counter has a synchronous reset.
An alternative is to clear the flip-flops asynchronously, using their clear inputs directly. In that case
the flip-flops are cleared to 0 a short time after the NAND gate has detected the count of 5. This
time depends on the gate delays in the circuit, but not on the clock. Therefore, signal values Q2Q1Q0
= 101 are maintained for a time that is much less than a clock cycle. Depending on a particular
application of such a counter, this may be adequate, but it may also be completely unacceptable. For
example, if the counter is used in a digital system where all operations in the system are
synchronized by the same clock, then this narrow pulse denoting Count = 5 would not be seen by the
rest of the system. To solve this problem, we could try to use a modulo-7 counter instead, assuming
that the system would ignore the short pulse that denotes the count of 6. This is not a good way of
designing circuits, because undesirable pulses often cause unforeseen difficulties in practice.
7. Timers/Counters
Timers and counters, which are present in most microcontroller chips, allow generation of pulses
and interrupts at regular intervals. They can also be used to count pulses and measure event timing.
Some of the more sophisticated versions can measure frequency, pulse width, and relative pulse
timing on inputs. Outputs can be defined to have a given repetition rate, pulse width, and even
complex sequences of pulses in some cases.
A simple timer consists of a loadable 8-bit counter. You could build one from a couple of
74HC161 counters or equivalent PLD logic.
The microprocessor can write a value to the timer that is transferred to the counter outputs. If
the counter is an UP counter, it counts up. A DOWN counter counts down. A typical timer
embedded in a microcontroller or in a timer IC will have some means to start the timer once it is
loaded, typically by setting a bit in a register. The clock input to the counter may be a derivative of
the microprocessor clock or it may be a signal applied to one of the external pins. A real timer will
also provide the outputs of the counter to the microprocessor so it can read the count. If the
microprocessor loads this timer with a value of 0xFE and then starts the timer, it will count from FE
to FF on the next clock. On the second clock, it will count from FF to 00 and generate an output.
The output of the timer may set a flip-flop that the microprocessor can read, or it may generate an
interrupt to the microprocessor, or both. The timer may stop once it has generated an output, or it
may continue counting from 00 back to FF. The problem with a continuously running timer is that it
will count from the loaded value the first time it counts up, but the second time it will start from 00.
7.1 Reloading a timer
This timer has an 8-bit latch to hold the value written by the microprocessor. When the
microprocessor writes to the latch, it also loads the counter. An OR gate also loads the timer when it
rolls over from FF to 00. For this example, we will assume that the logic in the IC gets all the
polarities and timings of the load signal correct so that there are no glitches or race conditions.
The way this timer works is that the microprocessor writes a value to the latch (also loading
it into the timer) and then starts the timer. When the timer rolls over from FF to 00, it generates an
output (again, either a latched bit for the microprocessor to read or an interrupt). At the same time
that the output is generated, the timer is loaded from the latch contents. Since the latch still holds the
value written by the microprocessor, the counter will start counting again from the same point it did
before. Now the timer will produce a regular output with the same accuracy as the input clock. This
output could be used to generate a regular interrupt, to provide a baud rate clock to a UART, or to
provide a signal to any device that needs a regular pulse. A variation of this feature used in some
microcontrollers does not load the counter with the desired count value but instead loads it into a
digital comparator. The comparator compares the counter value to the value written by the
microprocessor. The counter starts at zero and counts up. When the count equals the value written
by the microprocessor, the counter is reset to zero and the process repeats. The effect is the same as
the timer just described.
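To make the reload behavior concrete, here is a minimal C model (a sketch; the structure and function names are ours, not taken from any particular timer IC):

#include <stdint.h>

/* Minimal model of an auto-reload (periodic) timer, following the
 * behavior described above. */
typedef struct {
    uint8_t latch;    /* value written by the processor */
    uint8_t count;    /* current counter value */
    uint8_t running;
} reload_timer_t;

static void timer_write(reload_timer_t *t, uint8_t value)
{
    t->latch = value; /* writing the latch also loads the counter */
    t->count = value;
}

/* Advance one input clock; returns 1 on the FF -> 00 rollover, at
 * which point the counter is reloaded from the latch. */
static int timer_tick(reload_timer_t *t)
{
    if (!t->running)
        return 0;
    if (t->count == 0xFF) {
        t->count = t->latch;  /* reload keeps the period constant */
        return 1;             /* output pulse / interrupt request */
    }
    t->count++;
    return 0;
}

With a latch value of 0xFE the model produces an output pulse every 2 input clocks; in general the period is 256 minus the latch value.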
7.2 Input Capture Timer
In this case, the timer counts from zero to FF. When a pulse occurs on the capture input pin,
the contents of the counter are transferred to an 8-bit latch and the counter is reset. The input pulse
also generates an interrupt to the microprocessor. The timer is connected directly to the input pin; in
an actual circuit, of course, there will be some gating and synchronizing logic to make sure all the
timing is right. Similarly, the capture pin will not connect directly to a microprocessor interrupt but
will be passed through some flip-flops, timing logic, interrupt controller logic, and so on.
This configuration is typically used to measure the time between the leading edge of two
pulses. The timer is run at a constant clock, usually a derivative of the microprocessor clock. Each
time an edge occurs on the input capture pin, the processor is interrupted and the software reads the
capture latch. The value in the latch is the number of clocks that occurred since the last pulse. Some
microcontrollers do not reset the counter on an input capture but let
the counter free run. In those configurations, the software must remember the previous reading and
subtract it from the new reading. When the counter rolls over from FF to 00, the software must
recognize that fact and correct the numbers; if it doesn’t, negative values will result. Many
microcontrollers that provide a capture-type timer also provide a means for the counter to generate an
interrupt when it rolls over, which can simplify this software task.
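With a free-running 8-bit counter, the rollover correction comes for free from unsigned modular arithmetic. A sketch, where read_capture_latch stands in for whatever register access a particular part actually provides:

#include <stdint.h>

extern uint8_t read_capture_latch(void);  /* hypothetical register read */

static uint8_t last_capture;

/* Called from the input-capture interrupt. Because the subtraction is
 * done in uint8_t, a rollover (e.g., new = 0x05, last = 0xF0) still
 * yields the correct elapsed count (0x15 = 21), provided no more than
 * 256 counts elapse between pulses. */
uint8_t elapsed_counts(void)
{
    uint8_t now = read_capture_latch();
    uint8_t elapsed = (uint8_t)(now - last_capture);
    last_capture = now;
    return elapsed;
}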
7.3 Watchdog Timer
The watchdog timer (WDT) acts as a safety net for the system. If the software stops
responding or attending to the task at hand, the watchdog timer detects that something is amiss and
resets the software automatically. The system might stop responding as a result of any number of
difficult-to-detect hardware or firmware defects. For example, if an unusual condition causes a
buffer overrun that corrupts the stack frame, some function’s return address could be overwritten.
When that function completes, it then returns to the wrong spot, leaving the system utterly confused.
Runaway pointers (firmware) or a glitch on the data bus (hardware) can cause similar crashes.
Different external factors can cause “glitches.” For example, even a small electrostatic discharge
near the device might cause enough interference to momentarily change the state of one bit on the
address or data bus. Unfortunately, these kinds of defects can be very intermittent, making them
easy to miss during the project’s system test stage.
The watchdog timer is a great protector. Its sole purpose is to monitor the CPU with a “you
scratch my back and I’ll scratch yours” kind of relationship. The typical watchdog has an input pin
that must be toggled periodically (for example, once every second). If the watchdog is not toggled
within that period, it pulses one of its output pins. Typically, this output pin is tied either to the
CPU’s reset line or to some nonmaskable interrupt (NMI), and the input pin is tied to an I/O line of
the CPU. Consequently, if the firmware does not keep the watchdog input line toggling at the
specified rate, the watchdog assumes that the firmware has stopped working, complains, and causes
the CPU to be restarted.
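The firmware side of this arrangement can be as simple as the sketch below, assuming the watchdog input is wired to a GPIO line (gpio_toggle, WDT_PIN, and the subsystem functions are illustrative placeholders, not a real device's API):

extern void gpio_toggle(int pin);   /* hypothetical I/O-line driver */
extern void read_sensors(void);
extern void update_outputs(void);
extern void service_communications(void);

#define WDT_PIN 3   /* assumed pin wired to the watchdog input */

/* Main loop: the watchdog is kicked only after every subsystem has
 * completed one pass. If any step hangs, the toggling stops and the
 * watchdog pulses the reset line or NMI. */
void main_loop(void)
{
    for (;;) {
        read_sensors();
        update_outputs();
        service_communications();
        gpio_toggle(WDT_PIN);   /* must happen at least once per timeout */
    }
}

Kicking the watchdog once per full loop pass, rather than from a timer interrupt, is deliberate: an interrupt can keep running even after the main code has crashed.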
7.4 Using Timers
Time-Based Temperature Measurement - An example that illustrates some of the
important issues you must consider when using timers involves measurement of temperature. The
Maxim MAX6576 is an IC that measures temperature. The MAX6576 has a single-wire output and
produces a square wave with a period that is proportional to the absolute temperature in kelvins. The
MAX6576 can operate from -40°C to +125°C. By connecting the TS0 and TS1 inputs to ground or
Vcc in various combinations, the MAX6576 can be configured so that the period varies by 10, 40, 160,
or 640µs per degree. In the configuration shown, the period will vary by 40µs per degree. At 25°C,
the period will be:
(25 + 273.15) x 40 = 11,926 microseconds, or 11.926ms
Say you connect this to a microprocessor using input capture mode. Let’s suppose the
microprocessor is operating with a 4.096MHz crystal and using a prescaler of 256, so the timer gets
a clock of 4.096MHz/256, or 16,000Hz. The counter increments every 62.5µs. For this application,
it doesn’t matter whether the input capture occurs on the rising or falling edge of the MAX6576
output.
How accurately can you measure temperature with this arrangement? Since the MAX6576
changes 40µs per degree and the clock to the counter is 16,000Hz, each increment of the counter
corresponds to 62.5/40, or 1.56 degrees. This is the best resolution you can get. If the temperature of
the sensor is 25°C, the captured count value will be 11,926/62.5 = 190.8. Since the counter can only
capture integral values, the actual count will be 190 (the .8 is dropped). For the count to be less than
190, the temperature must go to 23.7°C. Any changes between these two values cannot be read by
the microprocessor.
If we decide that this is insufficient accuracy for our application, we might change the
prescaler to 1, making the counter clock the same as the CPU clock, 4.096MHz. Now the counter
increments every 244.1ns, and the resolution is 244.1ns/40µs, or .0061 degrees per counter
increment. This is much better accuracy than the sensor itself has. What happens in this
configuration if the temperature goes from 25°C to 125°C? The output period will go from 11,926µs
to 15,926µs. This will result in a captured count of 65,232. The timer is 16 bits wide, so this is not a
problem, but it is very close to the 65,535 upper limit of the counter.
What happens at 125°C if we take the accuracy of the sensor itself into account? The
MAX6576 has a typical accuracy of ±3°C at 125°C, but the maximum error is +5°C. This means
that, at 125°C, the output may actually indicate up to 130°C. At 130°C, the output period is
16,126µs. This corresponds to a count value of 66,052, which means the timer we are using would
roll over from 65,535 to zero while sampling. The actual count that would be captured would be
516, indicating a much lower temperature than the MAX6576 is actually sensing.
There are several solutions to this specific problem: The timer prescaler could be changed,
the configuration of the MAX6576 could be changed, or even the microprocessor crystal could be
changed. You could leave the hardware as-is and handle the error in software by detecting the
rollover. The important point is to perform this type of analysis when you use timers in
microprocessor designs.
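As a sketch of the software solution, assuming the 4.096MHz counter clock and the 40µs-per-kelvin configuration used above (the function name is ours, not from any vendor library): with this clock, one kelvin corresponds to 4.096 × 40 = 163.84 counts, i.e., exactly 16,384 counts per 100 kelvins.

#include <stdint.h>

/* Convert the number of timer counts in one MAX6576 output period to
 * degrees Celsius, for a 4.096MHz counter and 40us/K configuration.
 * Doing the subtraction of successive free-running 16-bit captures in
 * uint16_t absorbs one timer rollover automatically. */
int32_t period_counts_to_celsius(uint16_t start, uint16_t end)
{
    uint16_t counts = (uint16_t)(end - start);   /* modulo-65536 */
    int32_t kelvin = ((int32_t)counts * 100) / 16384;  /* 163.84 counts/K */
    return kelvin - 273;   /* kelvin -> Celsius (integer approximation) */
}

For example, a period of 65,233 counts gives 398 K, or 125°C, matching the figures above.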
Another issue that arises from this example is that of sampling time. The system can only
sample the temperature at a rate equal to the period of the output. As the temperature goes up, the
time between samples also goes up. If several samples need to be averaged, the sampling rate goes
down proportionally. While a worst-case sample time of 16ms is probably not unreasonable for a
temperature measurement system, an analysis of the effects of sample time should be performed in
cases where the rate of the input signal affects the sampling rate.
Motor Control - Say you have a DC motor that is part of a microprocessor control system. The
motor has an encoder that produces 100 pulses per revolution, and the microprocessor must control
the speed of the motor from 10RPM to 2000RPM. Some undefined external source provides a
command to the microprocessor to set motor speed.
At 10RPM, the microprocessor will get pulses from the motor encoder at the following
frequency:
10 Rev/Min × 100 Pulses/Rev × (1 Min / 60 Sec) = 16.6 Pulses/Sec
A similar calculation results in a frequency of 3333.33 pulses/sec at 2000RPM. If the input
capture hardware is configured to generate an interrupt when the input pulse occurs, then the
processor will get an interrupt every 60ms at 10RPM, and every 300µs at 2000RPM.
Say we want to calculate motor speed by using a microcontroller with input capture capability to
measure the time between encoder pulses. If the input capture is measured with a 1MHz reference
clock, then the input capture registers will contain 1 MHz/16.6Hz or 60,024 at 10RPM. Similarly,
the registers will contain a value of 300 at 2000RPM.
The 100-count encoder produces one pulse every 3.6 degrees of rotation (360/100). This is
true at any motor speed. However, the input capture reference clock is fixed, so its resolution (in
degrees of rotation) varies with the motor speed. At 10RPM, each reference clock corresponds to:
16.66 EncoderPulses/Sec × 3.6 Deg/EncoderPulse × (1 Sec / 1,000,000 ReferenceClocks) = 60×10⁻⁶ Degrees Per ReferenceClock
At 2000RPM, this becomes .012 degrees. While either of these is probably adequate for a
motor control application, the principle is important: at faster RPM, the resolution of the reference
clock with respect to the input signal is coarser.
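Under these assumptions (1MHz reference clock, 100-pulse encoder), the speed calculation itself is a single division; the sketch below (names ours) reproduces the figures in the text:

#include <stdint.h>

#define REF_CLOCK_HZ   1000000UL
#define PULSES_PER_REV 100UL

/* Convert the number of 1MHz reference clocks between two encoder
 * pulses into motor speed in RPM, with rounded integer division. */
uint32_t counts_to_rpm(uint32_t counts_between_pulses)
{
    if (counts_between_pulses == 0)
        return 0;   /* guard against divide-by-zero */
    /* RPM = (REF_CLOCK_HZ / counts) pulses/sec * 60 sec/min / pulses/rev */
    uint32_t denom = counts_between_pulses * PULSES_PER_REV;
    return (REF_CLOCK_HZ * 60UL + denom / 2) / denom;
}

A capture of 300 counts gives 2000RPM, and a capture of 60,024 counts gives 10RPM, matching the values above.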
8. PWM Control
8.1 Examples and description
Pulse-width modulation control works by switching the power supplied to the motor on and off
very rapidly. The DC voltage is converted to a square-wave signal, alternating between fully on
(nearly 12V) and zero, giving the motor a series of power "kicks".
If the switching frequency is high enough, the motor runs at a steady speed due to its fly-wheel
momentum.
By adjusting the duty cycle of the signal (modulating the width of the pulse, hence 'PWM'), i.e.,
the fraction of time it is "on", the average power can be varied, and hence the motor speed.
Advantages are,
1. The output transistor is either on or off, not partly on as with normal regulation, so less
power is wasted as heat and smaller heat-sinks can be used.
2. With a suitable circuit there is little voltage loss across the output transistor, so the top end of
the control range gets nearer to the supply voltage than linear regulator circuits.
3. The full-power pulsing action will run fans at a much lower speed than an equivalent steady
voltage.
Disadvantages:
1. Without adding extra circuitry, any fan speed signal is lost, as the fan electronics' power
supply is no longer continuous.
2. The 12V "kicks" may be audible if the fan is not well-mounted, especially at low revs. A
clicking or growling vibration at PWM frequency can be amplified by case panels.
3. Some authorities claim the pulsed power puts more stress on the fan bearings and windings,
shortening its life.
An oscillator is used to generate a triangle or sawtooth waveform (green line). At low frequencies
the motor speed tends to be jerky, at high frequencies the motor's inductance becomes significant
and power is lost. Frequencies of 30-200Hz are commonly used.
A comparator compares the sawtooth voltage with the reference voltage. When the sawtooth voltage
rises above the reference voltage, a power transistor is switched on. As it falls below the reference, it
is switched off. This gives a square wave output to the fan motor.
If the potentiometer is adjusted to give a high reference voltage (raising the blue line), the sawtooth
never reaches it, so output is zero. With a low reference, the comparator is always on, giving full
power.
A simple PWM consists of an 8-bit up/down counter that counts from 00 to FF, then back
down to 00 and an 8-bit comparator that compares the value in the 8-bit latch to the counter value.
When the two values are equal, the comparator clocks the “D” flip-flop (again, timing logic makes
sure everything works correctly). If the counter is counting up, a “1” is clocked into the “D” flip-
flop. If the counter is counting down, a “0” is loaded. The flip-flop output is connected to one of the
microcontroller output pins.
Say the microprocessor writes a value of 0xFE into the latch. The counter counts from 00 to
FE, where the PWM output goes to “1” because the counter bits match the latched value. The
counter continues to FF, then back down through FE to zero. When the counter passes through FE,
the PWM output goes to zero. So in this case, the PWM output is high for two counts (FE and FF)
out of 256, or about .78 percent duty cycle. If the microprocessor writes 0xF0 to the latch, the PWM
output will be high from F0 to FF and back to F0, for a total of 30 counts or 11.7 percent duty cycle.
A more sophisticated PWM timer would include a second latch and comparator so the
counter can reverse direction at values other than FF. In such a timer, this comparator would set the
frequency of the PWM signal while the other comparator would set the duty cycle. Some
microprocessors provide other means to generate PWM. Some microcontrollers don’t use an
up/down counter but instead they provide two comparators. After the first count value is reached,
the counter is reset and the second comparator is used to indicate end-of-count. The output pin
indicates which comparator is being used so a PWM output can be generated by controlling the
ratios of the comparator values.
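The up/down-counter PWM just described can be modeled in a few lines of C, one clock per call (all names are ours; in real hardware this is done in logic, not software):

#include <stdint.h>

typedef struct {
    uint8_t counter;
    int     counting_up;   /* 1 = up, 0 = down */
    uint8_t compare;       /* value written by the processor */
    uint8_t output;        /* state of the PWM output pin */
} updown_pwm_t;

/* Advance one PWM clock. When the counter matches the compare value,
 * the output "flip-flop" is clocked: it takes 1 while counting up and
 * 0 while counting down, as described in the text. */
void pwm_clock(updown_pwm_t *p)
{
    if (p->counter == p->compare)
        p->output = p->counting_up ? 1 : 0;

    if (p->counting_up) {
        if (p->counter == 0xFF) { p->counting_up = 0; p->counter--; }
        else p->counter++;
    } else {
        if (p->counter == 0x00) { p->counting_up = 1; p->counter++; }
        else p->counter--;
    }
}

With compare = 0xFE, the modeled output is high only around the top of the triangle, reproducing the narrow duty cycle described above.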
PWM Output
Similar considerations apply to timer outputs. If you are using an 8-bit timer to generate a
PWM signal, the output duty cycle can only be changed by one timer count, or 1 in 256. This results
in a duty cycle resolution of about 0.4 percent. Note, though, that this applies only if the timer is
allowed to run a full 256 counts. If you are using an 8-bit timer but only 100 counts for the PWM
period, then one step is 1 percent of the total period. In this case, the best resolution you can get is 1
percent. This is sufficient for many applications but is inadequate for others. In an application in
which you vary the PWM period and duty cycle, you need to be sure that the resolution at the fastest
period (least number of timer counts per cycle) is adequate for the application.
Principle
An example of PWM: the supply voltage (blue) modulated as a series of pulses results in a sine-like
flux density waveform (red) in a magnetic circuit of electromagnetic actuator. The smoothness of
the resultant waveform can be controlled by the width and number of modulated impulses (per given
cycle)
Pulse-width modulation uses a square wave whose pulse width is modulated, resulting in the
variation of the average value of the waveform. If we consider a square waveform f(t) with a low
value ymin, a high value ymax and a duty cycle D (see figure 1), the average value of the waveform is
given by:
ȳ = (1/T) ∫₀ᵀ f(t) dt
As f(t) is a square wave, its value is ymax for 0 < t < D·T and ymin for D·T < t < T. The above
expression then becomes:
ȳ = (1/T) (ymax·D·T + ymin·(T − D·T)) = D·ymax + (1 − D)·ymin
This latter expression can be fairly simplified in many cases where ymin = 0, as ȳ = D·ymax. From
this, it is obvious that the average value of the signal (ȳ) is directly dependent on the duty cycle D.
Fig. 2: A simple method to generate the PWM pulse train corresponding to a given signal is the
intersective PWM: the signal (here the green sinewave) is compared with a sawtooth waveform
(blue). When the latter is less than the former, the PWM signal (magenta) is in high state (1).
Otherwise it is in the low state (0).
The simplest way to generate a PWM signal is the intersective method, which requires only a
sawtooth or a triangle waveform (easily generated using a simple oscillator) and a comparator.
When the value of the reference signal (the green sine wave in figure 2) is more than the modulation
waveform (blue), the PWM signal (magenta) is in the high state, otherwise it is in the low state.
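A software rendering of the intersective method, offered only as an illustration (the sampling rate and the carrier and reference frequencies are arbitrary choices of ours):

#include <math.h>
#include <stdio.h>

#define PI 3.14159265358979323846

/* Intersective PWM: a 50Hz sine reference, scaled into 0..1, is
 * compared against a 1kHz sawtooth carrier; the output is high
 * whenever the reference exceeds the carrier. */
int main(void)
{
    const double fs = 100e3;      /* sample rate, Hz */
    const double f_carrier = 1e3; /* sawtooth frequency, Hz */
    const double f_ref = 50.0;    /* reference sine frequency, Hz */

    for (long n = 0; n < 4000; n++) {          /* two reference periods */
        double t = n / fs;
        double saw = fmod(t * f_carrier, 1.0); /* rises 0..1 each carrier period */
        double ref = 0.5 + 0.5 * sin(2.0 * PI * f_ref * t);
        putchar((ref > saw) ? '1' : '0');
    }
    putchar('\n');
    return 0;
}

The printed bit stream shows the pulse width tracking the instantaneous value of the sine reference, exactly as in figure 2.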
Fig. 3 : Principle of the delta PWM. The output signal (blue) is compared with the limits (green).
These limits correspond to the reference signal (red), offset by a given value. Every time the output
signal reaches one of the limits, the PWM signal changes state.
Fig. 4 : Principle of the sigma-delta PWM. The top green waveform is the reference signal, on which
the output signal (PWM, in the middle plot) is subtracted to form the error signal (blue, in top plot).
This error is integrated (bottom plot), and when the integral of the error exceeds the limits (red
lines), the output changes state.
Fig. 5 : Three types of PWM signals (blue): leading edge modulation (top), trailing edge modulation
(middle) and centered pulses (both edges are modulated, bottom). The green lines are the sawtooth
signals used to generate the PWM waveforms using the intersective method.
The square wave is a uniquely useful function for many applications such as Pulse Width Modulation
(PWM). PWM is widely used in a variety of applications in measurement and digital control. It offers
a simple method for digital control logic to create an analog equivalent.
The majority of microcontrollers today have built-in PWM capability, which facilitates the
implementation of such control. Using PWM in communication systems is also popular due to the
fact that the digital signal is more robust and less vulnerable to noise.
8.2 Concepts of Pulse Width Modulation (PWM)
PWM is a method of digitally encoding analog signal levels. The duty cycle of a square wave is
modulated to encode a specific analog signal level using high-resolution counters. The PWM signal
is still a digital signal because at the given instant of time, the full DC supply is either fully on or
fully off.
The voltage or current source is supplied to the analog load by a repetitive series of ON and OFF
pulses. The ON time is the period when the DC supply is applied to the load, and the OFF time is
the period when the DC supply is switched off. If the available bandwidth is sufficient, any analog
value can be encoded using PWM.
An analog signal has a continuously varying value, with infinite resolution in both time and
magnitude, and it can be used to control many electronic devices directly. For example, in a simple
analog radio, a knob is connected to a variable resistor. When turning the knob, the resistance goes
down or up, and the current flowing through the resistor increases or decreases. Consequently, the
current that drives the speaker is changed proportionally, thus increasing or decreasing the volume.
Although analog control may be considered intuitive and simple, it is not always economically
attractive or practical. Analog circuits tend to drift over time and are very difficult to tune.
Precision analog circuits that solve these problems can be large, heavy, and expensive.
Analog circuits tend to generate heat through the power dissipation. The power dissipated is
proportional to the voltage across the active elements, multiplied by the current that flows through it.
Analog circuitry can also be sensitive to noise because of its infinite resolution; even minor
perturbations of an analog signal can change its value.
By controlling analog circuits digitally, system costs and power consumption can be drastically
reduced. Many microcontrollers and digital signal processors (DSPs) already include PWM
controllers in the chip, thus making implementation easier.
Figure 1 illustrates a circuit established using a battery, a switch and a LED. This circuit turns on the
LED for one second and then turns off the LED for one second using the switch control.
The LED is ON for 50% of the period and OFF the other 50%. The period is defined as the total
time it takes to complete one cycle (from OFF to ON state and back to OFF state).
The signal can be further characterized by the duty cycle, which is the ratio of the “ON” time
divided by the period. A high duty cycle generates a bright LED while a small duty cycle generates
a dimmer LED. The example shown in Figure 1 provides a 50% duty cycle.
In Figure 2, two waveforms with different frequencies produce the same amount of light. Note that
the amount of light is independent of the frequency, but proportional to the duty cycle.
The frequency range you can use to control a circuit is limited by the response time of the circuit.
In the example shown in Figure 1, a low frequency can cause the LED to flash noticeably. A high
frequency, in turn, can cause an inductive load to saturate.
For example, a transformer has a limited frequency range to transfer the energy efficiently. For some
designs, harmonics (or beat frequencies) of the PWM frequency can get coupled into the analog
circuitry, causing unwanted noise. If the right frequency is selected, the load being controlled will
act as a stabilizer, a light will glow continuously and the momentum will allow a rotor to turn
smoothly.
The PWM signals are easy to generate using a comparator with a sine wave as one of the input
signals. Figure 3 shows a sample block diagram of an analog PWM generator.
Figures 4 and 5 show the PWM output waveform (red line) generated by a comparator with two
input signals: a sine wave (black line) and an input signal (gray line). The input signal of 0.5 VDC is
the voltage reference to be compared with the sine wave to produce a PWM waveform.
With the steady-state reference voltage of 0.5 VDC, a PWM waveform with 50% duty cycle is
generated.
If the reference voltage decreases to 0.25 VDC, the generated PWM waveform will have a higher
duty cycle, as shown in Figure 5.
PWM offers several advantages over an analog control. For example, using PWM to control the
brightness of a lamp, the heat dissipated from the lamp is less than the heat generated from an
analog control that converts the current to heat. Hence, less power is delivered to the load (light),
which will prolong the life cycle of the load.
With a higher frequency rate, the light (load) brightness can be controlled as smoothly as an analog
control.
Rotors can operate at a lower speed if they are controlled by PWM. Some of the rotors might not
function with low analog current. When an analog current controls a rotor, it will not produce
significant torque at low speed. The magnetic field that is created by the small current is insufficient
to turn the rotor. On the other hand, a PWM current can create short pulses of magnetic flux at full
strength that enables the rotor to turn at a slow speed.
Combining ON/OFF (1/0) states with the varying voltage and the duty cycle, PWM can produce
output at a desired voltage level. Thus, it can be used as a voltage regulator for many applications.
When the desired voltage level is higher than the output voltage level, the state will be ON (1). On
the other hand, the state will be OFF (0) when the desired voltage level is lower than the output
voltage level. For example, PWM can be applied when a CPLD is used for simple voltage regulation,
or with an FPGA for complex control algorithms using its internal DSP blocks.
In addition, the entire control circuit can be digitized using the PWM technique. This eliminates the
need to use digital-to-analog converters in control circuitries. The digital control lines generated by
PWM reduce the susceptibility of your circuit to the interference.
The technology has become more pervasive as PWM controls are incorporated into low cost
microcontrollers. Microcontrollers offer simple commands to vary the duty cycle and frequency of
the PWM control signal. PWM is also widely used in the communications field because the digital
signals are extremely immune to noise.
The popularity of PWM will continue to grow as the functionality becomes more common in
microcontrollers and development tools. Hence, a solid knowledge of PWM will make it easier to
incorporate into your designs.
Glossary
Duty cycle – the percentage of time of a pulse train at its higher voltage
Pulse width – total time during which the pulse is in the “true state”
1. Why PWM
To control continuous-time systems, we must supply control signals that are continuous in time.
In digital control practice this is done using digital-to-analog converters (DACs). This
option is relatively expensive, and in embedded systems practice it is avoided. PWM has established
itself as a method of generating control signals for continuous-time plants using digital outputs,
which are available in large numbers on any microcontroller.
Although digital outputs offer only logic information, it is the time variable that the PWM
implementation uses to emulate an analog signal.
[Simulink model: a Pulse Generator drives a transfer function 1/(s+1) (the digital command); in parallel, a Constant block of 0.5 drives an identical transfer function 1/(s+1) (the continuous command); both outputs feed a Scope.]
The model above implements the two types of control. On the one hand we have a (configurable)
pulse generator implementing a PWM command. On the other hand, the same system is driven by a
continuous command, a constant value representing the duty cycle (Duty Cycle, Pulse Width). In
the example above, this value is 0.5 (because Pulse Width = 50%).
The parameters of the PWM generator are:
a. the pulse amplitude
b. the pulse period
c. the pulse width (Pulse Width, the duty cycle)
The result of a simulation is:
a. The pulse amplitude is given by the High value of the digital output, being fixed by
construction. Theoretically, it represents the maximum value of the command applied to the plant.
In other words, if we choose a duty cycle of 100%, we apply a continuous quantity with amplitude 1.
b. The pulse period is a PWM parameter that must be adapted to the dynamics of the
controlled system. In our example, we can see that the time distance between the extreme values of
the oscillations is 2 seconds. If we keep the other parameters but reduce the period to 0.4 sec, we
obtain:
In other words, by reducing the PWM period, the controlled system will show oscillations of smaller
amplitude, but at higher frequencies, with consequences for the actuators. To eliminate this
drawback, a filter is preferably introduced at the plant input:
[Simulink model: Pulse Generator (Pulse Width = 50%, PWM period = 0.4) → PWM filter 1/(.2s+1) → transfer function 1/(s+1) (the digital command); in parallel, Constant 0.5 → 1/(s+1) (the continuous command); both outputs feed a Scope.]
This PWM filter must not influence the dynamics of the controlled system, so its time constant must
be smaller (5 times smaller, in our case) than that of the controlled system.
The best recipe is to choose a filter with a time constant 10 times smaller than that of the controlled
system (so that the dynamics of the controlled system are not influenced too much by the PWM
filter), and a PWM period 10 times smaller than that of the filter (good filtering of the PWM), hence
100 times smaller than that of the controlled system:
[Simulink model: Pulse Generator (Pulse Width = 50%, PWM period = 0.01) → PWM filter 1/(.1s+1) → transfer function 1/(s+1); in parallel, Constant 0.5 → 1/(s+1); both outputs feed a Scope.]
Note how the oscillations caused by the PWM have disappeared.
The PWM signal can be decomposed into a Fourier series with fundamental frequency ω_PWM,
where ω_PWM = 2π·f_PWM = 2π/T_PWM.
The formulas for computing the coefficients in this case are given by:
C0 = (1/T_PWM) ∫_{T_PWM} PWM(t) dt
Sn = (2/T_PWM) ∫_{T_PWM} PWM(t)·sin(n·ω_PWM·t) dt
[Bode diagram: magnitude (dB, 0 to −40) and phase (deg, 0 to −90) of the plant 1/(s+1), for frequencies from 10⁻² to 10² rad/sec.]
In the Bode diagram above we have plotted the frequency characteristic of the aperiodic plant,
where we considered ω_PWM = 10⁻¹ rad/sec. Note that the frequency characteristic of the plant lets
the first 16 frequencies of the Fourier expansion pass to the output.
If we increase the PWM frequency, e.g. to ω_PWM = 4·10⁻¹ rad/sec, fewer frequencies will be
found at the plant output.
[Bode diagram: the same plant characteristic over 10⁻² to 10² rad/sec; fewer Fourier components now fall inside the passband.]
If we increase the PWM frequency to ω_PWM = 10 rad/sec, we will see that the plant filters out
practically all the frequencies of the Fourier expansion; only the DC component, which is used for
control, is found at the output.
[Bode diagram: the same plant characteristic; at ω_PWM = 10 rad/sec all PWM harmonics lie well into the stopband.]
c. The Pulse Width (duty cycle) of the PWM acts on the mean value of the command. A
Pulse Width of 70% will drive the system to a steady-state regime varying around the value 0.7.
Indeed, since C0 = (1/T_PWM) ∫_{T_PWM} PWM(t) dt is the DC component of the Fourier
expansion, any system is acted upon by this component together with the other frequencies,
multiples of the fundamental frequency. If those frequencies are filtered out by the plant (as is the
case above), then only the DC component acts on the controlled system, exactly as in the case of a
DAC.
[Simulink model: Pulse Generator (Pulse width = 70%, PWM period = 0.1) → transfer function 1/(s+1) (the digital command); in parallel, Constant 0.7 → 1/(s+1) (the continuous command); both outputs feed a Scope.]
[Block diagram: digital controller (PID) → PWM generator (DAC emulation) → system driven by the PWM signal.]
The digital control algorithm, e.g. a PID, will modulate one of the parameters of the PWM generator.
Since the PWM amplitude is a technologically fixed parameter (TTL amplitude for a digital signal:
0/5V), the parameters used for modulation are the Pulse Width (the width of the pulse, hence the
name PWM) and the period (frequency) of the PWM pulse. The controller output is scaled to
provide a duty cycle of 0%...100% to the PWM generator.
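This scaling step might look like the following sketch, assuming a controller output u normalized to [0, 1] and an 8-bit compare register (PWM_COMPARE and PWM_TOP are illustrative names, not a real device's registers):

#include <stdint.h>

#define PWM_TOP 255u   /* assumed full-scale value of the PWM compare register */

extern volatile uint8_t PWM_COMPARE;   /* hypothetical duty-cycle register */

/* Scale a controller command u (0.0 .. 1.0) to the PWM duty register.
 * Saturation keeps an out-of-range PID output from wrapping around. */
void set_duty_from_command(float u)
{
    if (u < 0.0f) u = 0.0f;
    if (u > 1.0f) u = 1.0f;
    PWM_COMPARE = (uint8_t)(u * PWM_TOP + 0.5f);   /* rounded, 0..255 */
}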
The relation between the controller sampling period Te and the PWM period TPWM must also be
considered.
1. TPWM > Te
In this case, the control system reads the error values and computes the command at a higher
frequency than that of the PWM. Note that some of the command values are computed needlessly
by the controller, since the PWM value changes only at moments that are multiples of the PWM
period.
2. TPWM < Te
In this case, the PWM is sometimes informed needlessly about the value of the duty cycle, which
changes at a lower frequency.
It follows that the most favorable case is the one in which Te = TPWM. In this case, every command
computed by the controller will influence, with a constant delay, the PWM value.
In the other cases, the moments at which the output command can actually change show a variation
(jitter) with respect to the moments k·Te at which the commands are computed, which can
negatively influence the quality of the control.
9.2.1 Resolution
The resolution of an ADC or DAC is determined by the reference input and by the word
width. The resolution defines the smallest voltage change that can be converted. As mentioned
earlier, the resolution is the same as the smallest step size and can be calculated by dividing the
reference voltage by the number of possible conversion values.
For the example we’ve been using so far, an 8-bit ADC with a 5V reference, the resolution is
.0195V (19.5mV). This means that any input voltage below 19.5mV will result in an output of zero.
Input voltages between 19.5mV and 39mV will result in an output of 1. Between 39mV and 58.6mV,
the output will be 2.
Resolution can be improved by reducing the reference input. Changing from 5V to 2.5V
gives a resolution of 2.5/256, or 9.7mV. However, the maximum voltage that can be measured is
now 2.5V instead of 5V.
The only way to increase resolution without changing the reference is to use an ADC with
more bits. A 10-bit ADC using a 5V reference has 2¹⁰, or 1024, possible output codes. Thus, the
resolution is 5/1024, or 4.88mV.
The resolution also has implications for system design, especially in the area of noise. A 0-
to-5V, 10-bit ADC with 4.88mV resolution will respond to 4.88mV of noise just like it will to a DC
input of 4.88mV. If your input signal has 10mV of noise, you will not get anything like 10 bits of
precision unless you take a number of samples and average them. This means you either have to
ensure a very quiet input or allow time for multiple samples.
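The resolution arithmetic above reduces to one line of integer code; a sketch (the function name is ours):

#include <stdint.h>

/* Convert a raw ADC code to millivolts for an n-bit converter:
 * voltage = code * vref / 2^bits. For vref = 5000mV and bits = 10 the
 * step size is 5000/1024 = ~4.88mV, as in the text. Assumes that
 * code * vref_mv fits in 32 bits. */
uint32_t adc_code_to_mv(uint32_t code, uint32_t vref_mv, unsigned bits)
{
    return (code * vref_mv) >> bits;
}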
10. Communication
10.1 UART
Serial transmission of digital information (bits) through a single wire or other medium is
much more cost effective than parallel transmission through multiple wires. A UART is used to
convert the transmitted information between its sequential and parallel form at each end of the link.
Each UART contains a shift register which is the fundamental method of conversion between serial
and parallel forms.
The UART usually does not directly generate or receive the external signals used between
different items of equipment. Typically, separate interface devices are used to convert the logic level
signals of the UART to and from the external signaling levels.
10.2 RS232
Due to its relative simplicity and low hardware overhead (as compared to parallel
interfacing), serial communications is used extensively within the electronics industry. Today, the
most popular serial communications standard in use is certainly the EIA/TIA–232–E specification.
This standard, which has been developed by the Electronic Industry Association and the
Telecommunications Industry Association (EIA/TIA), is more popularly referred to simply as “RS–
232” where “RS” stands for “recommended standard”. In recent years, this prefix has been replaced
with “EIA/TIA” to help identify the source of the standard.
The official name of the EIA/TIA–232–E standard is “Interface Between Data Terminal
Equipment and Data Circuit–Terminating Equipment Employing Serial Binary Data Interchange”.
Although the name may sound intimidating, the standard is simply concerned with serial data
communication between a host system (Data Terminal Equipment, or “DTE”) and a peripheral
system (Data Circuit–Terminating Equipment, or “DCE”).
The EIA/TIA–232–E standard which was introduced in 1962 has been updated four times
since its introduction in order to better meet the needs of serial communication applications. The
letter “E” in the standard’s name indicates that this is the fifth revision of the standard.
RS–232 SPECIFICATIONS
RS–232 is a “complete” standard. This means that the standard sets out to ensure
compatibility between the host and peripheral systems by specifying 1) common voltage and signal
levels, 2) common pin wiring configurations, and 3) a minimal amount of control information
between the host and peripheral systems. Unlike many standards which simply specify the electrical
characteristics of a given interface, RS–232 specifies electrical, functional, and mechanical
characteristics in order to meet the above three criteria.
Because the functional characteristics of the interface are covered by the standard this
essentially means that RS–232 has defined the function of the different signals that are used in the
interface. These signals are divided into four different categories: common, data, control, and
timing. Table 1 illustrates the signals that are defined by the RS–232 standard.
As can be seen from the table there is an overwhelming number of signals defined by the
standard. The standard provides an abundance of control signals and supports a primary and
secondary communications channel. Fortunately few applications, if any, require all of these defined
signals. For example, only eight signals are used for a typical modem. Some simple applications
may require only four signals (two for data and two for handshaking) while others may require only
data signals with no handshaking.
The third area covered by RS–232 concerns the mechanical interface. In particular, RS–232
specifies a 25–pin connector. This is the minimum connector size that can accommodate all of the
signals defined in the functional portion of the standard. The pin assignment for this connector is
shown in Figure 1.6. The connector for DCE equipment is male for the connector housing and
female for the connection pins. Likewise, the DTE connector is a female housing with male
connection pins. Although RS–232 specifies a 25–position connector, it should be noted that often
this connector is not used. This is due to the fact that most applications do not require all of the
defined signals and therefore a 25–pin connector is larger than necessary. This being the case, it is
very common for other connector types to be used. Perhaps the most popular is the 9–position DB9S
connector which is also illustrated in Figure 1.6. This connector provides the means to transmit and
receive the necessary signals for modem applications, for example. This will be discussed in more
detail later.
Most systems designed today do not operate using RS–232 voltage levels. Since this is the
case, level conversion is necessary to implement RS–232 communication. Level conversion is
performed by special RS–232 IC’s. These IC’s typically have line drivers that generate the voltage
levels required by RS–232 and line receivers that can receive RS–232 voltage levels without being
damaged. These line drivers and receivers typically invert the signal as well since a logic 1 is
represented by a low voltage level for RS–232 communication and likewise a logic 0 is represented
by a high voltage level. Figure 1.7 illustrates the function of an RS–232 line driver/receiver in a typical
modem application. In this particular example, the signals necessary for serial communication are
generated and received by the Universal Asynchronous Receiver/Transmitter (UART).
The RS–232 line driver/receiver IC performs the level translation necessary between the
CMOS/TTL and RS–232 interface. The UART just mentioned performs the “overhead” tasks
necessary for asynchronous serial communication. For example, the asynchronous nature of this
type of communication usually requires that start and stop bits be initiated by the host system to
indicate to the peripheral system when communication will start and stop. Parity bits are also often
employed to ensure that the data sent has not been corrupted. The UART usually generates the start,
stop, and parity bits when transmitting data and can detect communication errors upon receiving
data. The UART also functions as the intermediary between byte–wide (parallel) and bit–wide
(serial) communication; it converts a byte of data into a serial bit stream for transmitting and
converts a serial bit stream into a byte of data when receiving.
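As an illustration of these overhead tasks, here is a sketch of a bit-banged transmitter framing one byte with a start bit, eight data bits (LSB first), an even parity bit, and one stop bit; uart_tx_bit, delay_us, and BIT_TIME_US are placeholders for the actual pin driver and baud-rate timing:

#include <stdint.h>

extern void uart_tx_bit(int level);   /* hypothetical TX pin driver */
extern void delay_us(unsigned us);    /* hypothetical busy-wait */
#define BIT_TIME_US 104               /* ~9600 baud: 1/9600 s = ~104us */

/* Transmit one byte: start bit (0), 8 data bits LSB first, even
 * parity bit, one stop bit (1). */
void uart_send_byte(uint8_t b)
{
    int parity = 0;
    uart_tx_bit(0);                   /* start bit */
    delay_us(BIT_TIME_US);
    for (int i = 0; i < 8; i++) {
        int bit = (b >> i) & 1;       /* LSB first */
        parity ^= bit;
        uart_tx_bit(bit);
        delay_us(BIT_TIME_US);
    }
    uart_tx_bit(parity);              /* even parity: total number of 1s is even */
    delay_us(BIT_TIME_US);
    uart_tx_bit(1);                   /* stop bit */
    delay_us(BIT_TIME_US);
}

A hardware UART performs exactly this sequence in logic, which is why the processor only ever sees whole bytes.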
Now that an elementary explanation of the TTL/CMOS to RS–232 interface has been
provided we can consider some “real world” RS–232 applications. It has already been noted that
RS–232 applications rarely follow the RS–232 standard precisely. Perhaps the most significant
reason this is true is due to the fact that many of the defined signals are not necessary for most
applications. As such, the unnecessary signals are omitted. Many applications, such as a modem,
require only nine signals (two data signals, six control signals, and ground). Other applications may
require only five signals (two for data, two for handshaking, and ground), while others may require
only data signals with no handshake control. We will begin our investigation of “real world”
implementations by first considering the typical modem application.
10.3 SPI
There is a MASTER and a SLAVE mode. The MASTER device provides the clock signal
and determines the state of the chip select lines, i.e. it activates the SLAVE it wants to communicate
with. CS and SCLK are therefore outputs.
The SLAVE device receives the clock and chip select from the MASTER; CS and SCLK are
therefore inputs.
This means there is one master, while the number of slaves is only limited by the number of
chip selects.
A SPI device can be anything from a simple shift register up to an independent subsystem. The basic
principle of a shift register is always present. Command codes as well as data values are serially
transferred, pumped into a shift register, and are then internally available for parallel processing.
Here we already see an important point, which must be considered in the philosophy of SPI bus
systems: the length of the shift registers is not fixed, but can differ from device to device. Normally
the shift registers are 8 bits long, or an integral multiple of that. Of course there also exist shift
registers with an odd number of bits. For example, two cascaded 9-bit EEPROMs can store 18 bits of data.
If a SPI device is not selected, its data output goes into a high-impedance state (hi-Z), so that
it does not interfere with the currently activated devices. When cascading several SPI devices, they
are treated as one slave and therefore connected to the same chip select.
Thus there are two meaningful types of connection of master and slave devices. Figure 1.10 shows
the type of connection for cascading several devices.
In Figure 1.10 the cascaded devices are evidently looked at as one larger device and receive
therefore the same chip select. The data output of the preceding device is tied to the data input of the
next, thus forming a wider shift register.
If independent slaves are to be connected to a master another bus structure has to be chosen,
as shown in Figure 1.11. Here, the clock and the SDI data lines are brought to each slave. Also the
SDO data lines are tied together and led back to the master. Only the chip selects are separately
brought to each SPI device.
Figure 1.11 – Master with independent slaves
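The shift-register principle can be captured in a bit-banged master transfer. This is only a sketch under common assumptions (SPI mode 0: clock idle low, data sampled on the rising edge, MSB first; the pin helpers are placeholders for real port accesses):

#include <stdint.h>

extern void sclk_set(int level);   /* hypothetical pin drivers */
extern void sdo_set(int level);
extern int  sdi_get(void);

/* Exchange one byte with the selected slave, SPI mode 0, MSB first.
 * Master and slave shift registers trade their contents bit by bit. */
uint8_t spi_transfer(uint8_t out)
{
    uint8_t in = 0;
    for (int i = 7; i >= 0; i--) {
        sdo_set((out >> i) & 1);   /* present the next output bit */
        sclk_set(1);               /* rising edge: both sides sample */
        in = (uint8_t)((in << 1) | (sdi_get() & 1));
        sclk_set(0);               /* falling edge: shift */
    }
    return in;
}

Because every transfer clocks data in both directions, a "read" is performed by shifting out dummy bits while capturing what the slave returns.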
10.4 LIN
Concept of operation
A cluster consists of one master task and several slave tasks. A master node contains the
master task as well as a slave task. All other slave nodes contain a slave task only. A node may
participate in more than one cluster. The term node relates to a single bus interface of a node if the
node has multiple bus interfaces. A sample cluster with one master node and two slave nodes is
depicted below:
Frames
A frame consists of a header (provided by the master task) and a response (provided by a
slave task).
The header consists of a break field and sync field followed by a frame identifier. The frame
identifier uniquely defines the purpose of the frame. The slave task appointed for providing the
response associated with the frame identifier transmits it, as depicted below. The response consists
of a data field and a checksum field.
The slave tasks interested in the data associated with the frame identifier receive the
response, verify the checksum, and use the data transported.
Schedule table
The master task (in the master node) transmits headers based on a schedule table. The
schedule table specifies the frames and the interval between the start of a frame and the start of the
following frame. The master application may use different schedule tables and select among them.
Signal Management
A signal is transported in the data field of a frame.
Signal Types
A signal is either a scalar value or a byte array.
A scalar signal is between 1 and 16 bits long. A one bit scalar signal is called a Boolean
signal. Scalar signals in the size of 2 to 16 bits are treated as unsigned integers.
A byte array is an array of between one and eight bytes.
Each signal has exactly one publisher, i.e. it is always written by the same node in the
cluster. Zero, one or multiple nodes may subscribe to the signal.
All signals have initial values. The initial value for a published signal is valid until the node
writes a new value to this signal. The initial value for a subscribed signal is valid until a new
updated value is received from another node.
Signal Consistency
Scalar signal writing or reading must be atomic operations, i.e. it should never be possible for
an application to receive a signal value that is partly updated. This also applies to byte arrays.
However, no consistency is guaranteed between any signals.
Signal Packing
A signal is transmitted with the LSB first and the MSB last. There is no restriction on
packing scalar signals over byte boundaries. Each byte in a byte array shall map to a single frame
byte starting with the lowest numbered data byte.
Several signals can be packed into one frame as long as they do not overlap each other.
Note that signal packing/unpacking is implemented more efficiently in software-based nodes if
signals are byte aligned and/or if they do not cross byte boundaries.
The same signal can be packed into multiple frames as long as the publisher of the signal is
the same. If a node is receiving one signal packed into multiple frames the latest received signal
value is valid. Handling the same signal packed into frames on different LIN clusters is out of the
scope.
Frame Structure
The structure of a frame is shown in Figure 1.13. The frame is constructed of a number of
fields, one break field followed by four to eleven byte fields, labeled as in the figure. The time it
takes to send a frame is the sum of the time to send each byte plus the response space and the inter-
byte spaces.
The header starts at the falling edge of the break field and ends after the end of the stop bit of
the protected identifier (PID) field. The response starts at the end of the stop bit of the PID field and
ends after the stop bit of the checksum field.
The inter-byte space is the time between the end of the stop bit of the preceding field and the
start bit of the following byte. The response space is the inter-byte space between the PID field and
the first data field in the data. Both of them must be non-negative.
A slave task shall always be able to detect the break/sync field sequence, even if it expects a
byte field (assuming the byte fields are separated from each other). A desired, but not required,
feature is to detect the break/sync field sequence even if the break is partially superimposed with a
data byte. When a break/sync field sequence happens, the transfer in progress shall be aborted and
processing of the new frame shall commence.
Protected identifier field
A protected identifier field consists of two sub-fields; the frame identifier and the parity. Bits
0 to 5 are the frame identifier and bits 6 and 7 are the parity.
Frame identifier
Six bits are reserved for the frame identifier; values in the range 0 to 63 can be used. The
frame identifiers are split in three categories:
• Values 0 to 59 (0x3B) are used for signal carrying frames,
• 60 (0x3C) and 61 (0x3D) are used to carry diagnostic and configuration data,
• 62 (0x3E) and 63 (0x3F) are reserved for future protocol enhancements.
Parity
The parity is calculated on the frame identifier bits as shown in equations (1) and (2):
P0 = ID0 ⊕ ID1 ⊕ ID2 ⊕ ID4 (1)
P1 = ¬(ID1 ⊕ ID3 ⊕ ID4 ⊕ ID5) (2)
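In C, the protected identifier byte follows directly from these equations (a sketch; the function name is ours):

#include <stdint.h>

/* Build the protected identifier byte from a 6-bit frame identifier.
 * Bits 0..5 carry the identifier, bit 6 is P0, bit 7 is P1. */
uint8_t lin_pid(uint8_t id)
{
    id &= 0x3F;   /* 6-bit identifier */
    uint8_t p0 = ((id >> 0) ^ (id >> 1) ^ (id >> 2) ^ (id >> 4)) & 1u;
    uint8_t p1 = (uint8_t)(~((id >> 1) ^ (id >> 3) ^ (id >> 4) ^ (id >> 5))) & 1u;
    return (uint8_t)(id | (p0 << 6) | (p1 << 7));
}

For example, identifiers 60 (0x3C) and 61 (0x3D) yield the protected identifiers 0x3C and 0x7D.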
Mapping
The mapping of the bits (ID0 to ID5 and P0 and P1) is shown in Figure 1.17.
Data
A frame carries between one and eight bytes of data. The number of data contained in a
frame with a specific frame identifier shall be agreed by the publisher and all subscribers. A data
byte is transmitted as part of a byte field, see Figure 1.14.
For data entities longer than one byte, the entity LSB is contained in the byte sent first and
the entity MSB in the byte sent last (little-endian). The data fields are labeled data 1, data 2... up to
maximum data 8, see Figure 1.18.
Checksum
The last field of a frame is the checksum. The checksum contains the inverted eight bit sum
with carry over all data bytes or all data bytes and the protected identifier. Checksum calculation
over the data bytes only is called classic checksum and it is used for the master request frame, slave
response frame and communication with LIN 1.x slaves.
Eight bit sum with carry is equivalent to summing all values and subtracting 255 every time the sum
is greater than or equal to 256. See section 2.8.3 for examples of how to calculate the checksum.
Checksum calculation over the data bytes and the protected identifier byte is called enhanced
checksum and it is used for communication with LIN 2.x slaves.
The checksum is transmitted in a byte field. Use of classic or enhanced checksum is managed by the
master node and it is determined per frame identifier; classic in communication with LIN 1.x slave
nodes and enhanced in communication with LIN 2.x slave nodes.
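Both checksum variants follow directly from that description; a sketch (names ours):

#include <stdint.h>
#include <stddef.h>

/* LIN checksum: inverted eight-bit sum with carry. For the classic
 * checksum pass pid = 0 so that only the data bytes enter the sum;
 * for the enhanced checksum pass the protected identifier byte. */
uint8_t lin_checksum(uint8_t pid, const uint8_t *data, size_t len)
{
    uint16_t sum = pid;
    for (size_t i = 0; i < len; i++) {
        sum += data[i];
        if (sum >= 256)
            sum -= 255;   /* equivalent to adding the carry back in */
    }
    return (uint8_t)(~sum & 0xFF);   /* transmitted inverted */
}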
10.5 CAN
The Controller Area Network (CAN) is a serial communications protocol which efficiently
supports distributed real-time control with a very high level of security. Its domain of application
ranges from high speed networks to low cost multiplex wiring. In automotive electronics, engine
control units, sensors, anti-skid-systems, etc. are connected using CAN with bitrates up to 1 Mbit/s.
At the same time it is cost effective to build it into vehicle body electronics, e.g. lamp clusters,
electric windows, etc., to replace the wiring harness otherwise required.
The intention of this specification is to achieve compatibility between any two CAN
implementations. Compatibility, however, has different aspects regarding e.g. electrical features and
the interpretation of data to be transferred. To achieve design transparency and implementation
flexibility CAN has been subdivided into different layers according to the ISO/OSI Reference
Model:
• The Data Link Layer
- The Logical Link Control (LLC) sub-layer
- The Medium Access Control (MAC) sub-layer
• The Physical Layer
Note that in previous versions of the CAN specification the services and functions of the
LLC and MAC sub-layers of the Data Link Layer had been described in layers denoted as ’object
layer’ and ’transfer layer’. The scope of the LLC sub-layer is
• To provide services for data transfer and for remote data request,
• To decide which messages received by the LLC sub-layer are actually to be accepted,
• To provide means for recovery management and overload notifications.
There is much freedom in defining object handling. The scope of the MAC sub-layer is mainly
the transfer protocol, i.e. controlling the Framing, performing Arbitration, Error Checking, Error
Signaling and Fault Confinement. Within the MAC sub-layer it is decided whether the bus is free for
starting a new transmission or whether a reception is just starting. Also some general features of the
bit timing are regarded as part of the MAC sub-layer. It is in the nature of the MAC sub-layer that
there is no freedom for modifications.
The scope of the physical layer is the actual transfer of the bits between the different nodes
with respect to all electrical properties. Within one network the physical layer, of course, has to be
the same for all nodes. There may be, however, much freedom in selecting a physical layer.
The scope of this specification is to define the MAC sub-layer and a small part of the LLC
sub-layer of the Data Link Layer and to describe the consequences of the CAN protocol on the
surrounding layers.
Basic Concepts
CAN has the following properties:
• Prioritization of messages
• Guarantee of latency times
• Configuration flexibility
• Multicast reception with time synchronization
• System wide data consistency
• Multi-master
• Error detection and signaling
• Automatic retransmission of corrupted messages as soon as the bus is idle again
• Distinction between temporary errors and permanent failures of nodes, and autonomous
switching off of defective nodes.
Layered Architecture of CAN according to the OSI Reference Model
• The Physical Layer defines how signals are actually transmitted and therefore deals with
the description of Bit Timing, Bit Encoding, and Synchronization. Within this specification the
Driver/Receiver Characteristics of the Physical Layer are not defined so as to allow transmission
medium and signal level implementations to be optimized for their application.
• The MAC sub-layer represents the kernel of the CAN protocol. It presents messages
received from the LLC sub-layer and accepts messages to be transmitted to the LLC sub-layer. The
MAC sub-layer is responsible for Message Framing, Arbitration, Acknowledgment, Error Detection
and Signaling. The MAC sub-layer is supervised by a management entity called Fault Confinement,
which is a self-checking mechanism for distinguishing short disturbances from permanent failures.
• The LLC sub-layer is concerned with Message Filtering, Overload Notification and
Recovery Management.
The scope of this specification is to define the Data Link Layer and the consequences of the
CAN protocol on the surrounding layers.
Messages
Information on the bus is sent in fixed format messages of different but limited length. When
the bus is free any connected unit may start to transmit a new message.
Information Routing
In CAN systems a CAN node does not make use of any information about the system
configuration (e.g. station addresses). This has several important consequences.
System Flexibility: Nodes can be added to the CAN network without requiring any change in
the software or hardware of any node and application layer.
Message Routing: The content of a message is named by an IDENTIFIER. The
IDENTIFIER does not indicate the destination of the message, but describes the meaning of the
data, so that all nodes in the network are able to decide by Message Filtering whether the data is to
be acted upon by them or not.
Multicast: As a consequence of the concept of Message Filtering any number of nodes can
receive and simultaneously act upon the same message.
Data Consistency: Within a CAN network it is guaranteed that a message is simultaneously
accepted either by all nodes or by no node. Thus data consistency of a system is achieved by the
concepts of multicast and by error handling.
Bit rate
The speed of CAN may be different in different systems. However, in a given system the bit-rate is
uniform and fixed.
Priorities
The IDENTIFIER defines a static message priority during bus access.
Remote Data Request
By sending a REMOTE FRAME a node requiring data may request another node to send the
corresponding DATA FRAME. The DATA FRAME and the corresponding REMOTE FRAME are
named by the same IDENTIFIER.
Multi-master
When the bus is free any unit may start to transmit a message. The unit with the message of
higher priority to be transmitted gains bus access.
Arbitration
Whenever the bus is free, any unit may start to transmit a message. If 2 or more units start
transmitting messages at the same time, the bus access conflict is resolved by bitwise arbitration
using the IDENTIFIER. The mechanism of arbitration guarantees that neither information nor time
is lost. If a DATA FRAME and a REMOTE FRAME with the same IDENTIFIER are initiated at the
same time, the DATA FRAME prevails over the REMOTE FRAME. During arbitration every
transmitter compares the level of the bit transmitted with the level that is monitored on the bus. If
these levels are equal the unit may continue to send. When a ’recessive’ level is sent and a
’dominant’ level is monitored (see Bus Values), the unit has lost arbitration and must withdraw
without sending one more bit.
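The following small C sketch (purely illustrative, not part of the specification text) mimics this mechanism on a wired-AND bus, where ’dominant’ is 0 and ’recessive’ is 1; the transmitter with the lowest identifier therefore survives arbitration:

    #include <stdio.h>

    /* Simulate bitwise arbitration among n transmitters (n <= 8), MSB first.
       Returns the index of the winning transmitter. */
    int arbitrate(const unsigned ids[], int n, int id_bits)
    {
        int alive[8];
        for (int i = 0; i < n; i++)
            alive[i] = 1;

        for (int bit = id_bits - 1; bit >= 0; bit--) {
            unsigned bus = 1;                        /* recessive by default   */
            for (int i = 0; i < n; i++)
                if (alive[i] && ((ids[i] >> bit) & 1u) == 0)
                    bus = 0;                         /* dominant wins the bus  */
            for (int i = 0; i < n; i++)              /* sent recessive but     */
                if (alive[i] && ((ids[i] >> bit) & 1u) == 1 && bus == 0)
                    alive[i] = 0;                    /* monitored dominant:    */
        }                                            /* arbitration lost       */
        for (int i = 0; i < n; i++)
            if (alive[i])
                return i;
        return -1;
    }

    int main(void)
    {
        unsigned ids[] = { 0x120, 0x240, 0x7FF };    /* 11-bit identifiers */
        printf("winner: node %d\n", arbitrate(ids, 3, 11)); /* node 0 wins */
        return 0;
    }

Note how neither information nor time is lost: the winning frame continues undisturbed, exactly as if it had been alone on the bus.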
Safety
In order to achieve the utmost safety of data transfer, powerful measures for error detection,
signaling and self-checking are implemented in every CAN node.
Error Detection
For detecting errors the following measures have been taken:
- Monitoring (transmitters compare the bit levels to be transmitted with the bit levels detected
on the bus)
- Cyclic Redundancy Check
- Bit Stuffing
- Message Frame Check
Performance of Error Detection
The error detection mechanisms have the following properties:
- All global errors are detected.
- All local errors at transmitters are detected.
- Up to 5 randomly distributed errors in a message are detected.
- Burst errors of length less than 15 in a message are detected.
- Errors of any odd number in a message are detected.
Total residual error probability for undetected corrupted messages: less than (message error
rate) × 4.7 × 10⁻¹¹.
Error Signaling and Recovery Time
Corrupted messages are flagged by any node detecting an error. Such messages are aborted
and will be retransmitted automatically. The recovery time from detecting an error until the start of
the next message is at most 31 bit times, if there is no further error.
Fault Confinement
CAN nodes are able to distinguish short disturbances from permanent failures.
Defective nodes are switched off.
Connections
The CAN serial communication link is a bus to which a number of units may be connected.
This number has no theoretical limit. Practically the total number of units will be limited by delay
times and/or electrical loads on the bus line.
Single Channel
The bus consists of a single channel that carries bits. From this data resynchronization
information can be derived. The way in which this channel is implemented is not fixed in this
specification. E.g. single wire (plus ground), two differential wires, optical fibers, etc.
Bus values
The bus can have one of two complementary logical values: ’dominant’ or ’recessive’.
During simultaneous transmission of ’dominant’ and ’recessive’ bits, the resulting bus value will be
’dominant’. For example, in case of a wired-AND implementation of the bus, the ’dominant’ level
would be represented by a logical ’0’ and the ’recessive’ level by a logical ’1’. Physical states (e.g.
electrical voltage, light) that represent the logical levels are not given in this specification.
Acknowledgment
All receivers check the consistency of the message being received and will acknowledge a
consistent message and flag an inconsistent message.
Sleep Mode / Wake-up
To reduce the system’s power consumption, a CAN-device may be set into sleep mode
without any internal activity and with disconnected bus drivers. The sleep mode is finished with a
wake-up by any bus activity or by internal conditions of the system. On wake-up, the internal
activity is restarted, although the MAC sub-layer will be waiting for the system’s oscillator to
stabilize and it will then wait until it has synchronized itself to the bus activity (by checking for
eleven consecutive ’recessive’ bits), before the bus drivers are set to "on-bus" again.
Message Transfer
Frame Formats
There are two different formats which differ in the length of the IDENTIFIER field: frames
with an 11-bit IDENTIFIER are denoted Standard Frames, while frames containing a 29-bit
IDENTIFIER are denoted Extended Frames.
Frame Types
Message transfer is manifested and controlled by four different frame types:
- A DATA FRAME carries data from a transmitter to the receivers.
- A REMOTE FRAME is transmitted by a bus unit to request the transmission of the DATA
FRAME with the same IDENTIFIER.
- An ERROR FRAME is transmitted by any unit on detecting a bus error.
- An OVERLOAD FRAME is used to provide for an extra delay between the preceding and
the succeeding DATA or REMOTE FRAMEs.
DATA FRAMEs and REMOTE FRAMEs can be used both in Standard Frame Format and
Extended Frame Format; they are separated from preceding frames by an INTERFRAME SPACE.
DATA FRAME
A DATA FRAME is composed of seven different bit fields: START OF FRAME,
ARBITRATION FIELD, CONTROL FIELD, DATA FIELD, CRC FIELD, ACK FIELD, and END
OF FRAME. The DATA FIELD can be of length zero.
Base ID
The Base ID consists of 11 bits. It is transmitted in the order from ID-28 to ID-18. It is
equivalent to the format of the Standard Identifier. The Base ID defines the Extended Frame’s base
priority.
Extended ID
The Extended ID consists of 18 bits. It is transmitted in the order of ID-17 to ID-0.
The CRC FIELD (Standard Format as well as Extended Format) contains the CRC SEQUENCE
followed by a CRC DELIMITER.
After the transmission / reception of the last bit of the DATA FIELD, CRC_RG contains the
CRC sequence.
CRC DELIMITER (Standard Format as well as Extended Format): the CRC SEQUENCE is
followed by the CRC DELIMITER, which consists of a single ’recessive’ bit.
ACK DELIMITER
The ACK DELIMITER is the second bit of the ACK FIELD and has to be a ’recessive’ bit. As a
consequence, the ACK SLOT is surrounded by two ’recessive’ bits (CRC
DELIMITER, ACK DELIMITER).
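As an informal illustration of the fields discussed above, the logical content of a frame can be captured in a plain C structure (this is a data structure of ours, not a controller register layout):

    #include <stdint.h>

    /* Logical content of a DATA FRAME / REMOTE FRAME. */
    struct can_frame {
        uint32_t identifier;  /* 11-bit Base ID, or 29 bits in Extended Format */
        uint8_t  extended;    /* 0 = Standard Frame Format, 1 = Extended       */
        uint8_t  remote;      /* 1 = REMOTE FRAME (carries no data)            */
        uint8_t  dlc;         /* data length code: 0 to 8 data bytes           */
        uint8_t  data[8];     /* DATA FIELD; length zero is allowed            */
        uint16_t crc;         /* 15-bit CRC SEQUENCE                           */
    };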
11.2 Compiler
A compiler is a computer program (or set of programs) that translates text written in a
computer language (the source language) into another language (the target language). The original
sequence is usually called the source code and the output called object code. Commonly the output
has a form suitable for processing by other programs (e.g., a linker), but it may be a human-readable
text file.
The most common reason for wanting to translate source code is to create an executable
program. The name "compiler" is primarily used for programs that translate source code from a
high-level programming language to a lower level language (e.g., assembly language or machine
language). A program that translates from a low level language to a higher level one is a decompiler.
A program that translates between high-level languages is usually called a language translator,
source to source translator, or language converter. A language rewriter is usually a program that
translates the form of expressions without a change of language.
A compiler is likely to perform many or all of the following operations: lexical analysis,
preprocessing, parsing, semantic analysis, code generation, and code optimization.
11.3 Linker
A linker or link editor is a program that takes one or more objects generated by compilers
and assembles them into a single executable program.
Linkers can take objects from a collection called a library. Some linkers do not include the
whole library in the output; they only include its symbols that are referenced from other object files
or libraries. Libraries exist for diverse purposes, and one or more system libraries are usually linked
in by default.
The linker also takes care of arranging the objects in a program's address space. This may
involve relocating code that assumes a specific base address to another base. Since a compiler
seldom knows where an object will reside, it often assumes a fixed base location (for example,
zero). Relocating machine code may involve re-targeting of absolute jumps, loads and stores.
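A minimal sketch of this idea, assuming the compiler emitted the code as if it were loaded at base address zero and recorded the offsets of all absolute address fields in a fix-up table (all names here are illustrative):

    #include <stdint.h>
    #include <stddef.h>
    #include <string.h>

    /* Add the real load base to every absolute address recorded in the
       fix-up table. memcpy avoids unaligned-access problems. */
    void relocate(uint8_t *image, const size_t *fixups, size_t nfix, uint32_t base)
    {
        for (size_t i = 0; i < nfix; i++) {
            uint32_t addr;
            memcpy(&addr, image + fixups[i], sizeof addr);
            addr += base;                  /* re-target the absolute reference */
            memcpy(image + fixups[i], &addr, sizeof addr);
        }
    }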
11.4 Debugger
A debugger is a program that is used to test and debug other programs. The code to be
examined might alternatively be running on an instruction set simulator (ISS), a technique that
allows great power in its ability to halt when specific conditions are encountered but which will
typically be much slower than executing the code directly on the appropriate processor.
When the program crashes, the debugger shows the position in the original code if it is a
source-level debugger or symbolic debugger, commonly seen in integrated development
environments. If it is a low-level debugger or a machine-language debugger it shows the line in the
disassembly. (A "crash" happens when the program cannot continue because of a programming bug.
For example, perhaps the program tried to use an instruction not available on the current version of
the CPU or attempted access to unavailable or protected memory.)
Typically, debuggers also offer more sophisticated functions such as running a program step
by step (single-stepping), stopping (breaking) (pausing the program to examine the current state) at
some kind of event by means of breakpoint, and tracking the values of some variables. Some
debuggers have the ability to modify the state of the program while it is running, rather than merely
to observe it.
The importance of a good debugger cannot be overstated. Indeed, the existence and quality
of such a tool for a given language and platform can often be the deciding factor in its use, even if
another language/platform is better-suited to the task. However, it is also important to note that
software can (and often does) behave differently running under a debugger than normally, due to the
inevitable changes the presence of a debugger will make to a software program's internal timing. As
a result, even with a good debugging tool, it is often very difficult to track down runtime problems
in complex multi-threaded or distributed systems.
Examples of debuggers: CodeView, DBG - A PHP Debugger and Profiler, DDD - Data
Display Debugger, Eclipse, TotalView, GNU Debugger (GDB), Insight, Interactive Disassembler.
A real-time operating system (RTOS) is the key to many embedded systems today and
provides a software platform upon which to build applications. Not all embedded systems are
designed with an RTOS. Some embedded systems with relatively simple hardware or a small
amount of software application code might not require an RTOS. Many embedded systems with
moderate-to-large software applications require some form of scheduling, and these systems require
an RTOS.
Figure 2.1: High-level view of an RTOS, its kernel, and other components found in embedded
systems.
This diagram is highly simplified; remember that not all RTOS kernels conform to this exact
set of objects, scheduling algorithms, and services.
The scheduler is at the heart of every kernel. A scheduler provides the algorithms needed to
determine which task executes when. To understand how scheduling works, this section describes
the following topics:
• schedulable entities,
• multitasking,
• context switching,
• dispatcher, and
• scheduling algorithms.
12.3.1 Schedulable Entities
A schedulable entity is a kernel object that can compete for execution time on a system,
based on a predefined scheduling algorithm. Tasks and processes are all examples of schedulable
entities found in most kernels.
A task is an independent thread of execution that contains a sequence of independently
schedulable instructions. Some kernels provide another type of a schedulable object called a process.
Processes are similar to tasks in that they can independently compete for CPU execution time.
Processes differ from tasks in that they provide better memory protection features, at the expense of
performance and memory overhead. Note that message queues and semaphores are not schedulable
entities. These items are inter-task communication objects used for synchronization and
communication.
So, how exactly does a scheduler handle multiple schedulable entities that need to run
simultaneously? The answer is by multitasking. The multitasking discussions are carried out in the
context of uniprocessor environments.
12.3.2 Multitasking
Multitasking is the ability of the operating system to handle multiple activities within set
deadlines. A real-time kernel might have multiple tasks that it has to schedule to run. One such
multitasking scenario is illustrated in Figure 2.3.
In this scenario, the kernel multitasks in such a way that many threads of execution appear to
be running concurrently; however, the kernel is actually interleaving executions sequentially, based
on a preset scheduling algorithm. The scheduler must ensure that the appropriate task runs at the
right time. An important point to note here is that the tasks follow the kernel’s scheduling algorithm,
while interrupt service routines (ISR) are triggered to run because of hardware interrupts and their
established priorities.
As the number of tasks to schedule increases, so do CPU performance requirements. This
fact is due to increased switching between the contexts of the different threads of execution.
Each task has its own context, which is the state of the CPU registers required each time it is
scheduled to run. A context switch occurs when the scheduler switches from one task to another. To
better understand what happens during a context switch, let’s examine further what a typical kernel
does in this scenario. Every time a new task is created, the kernel also creates and maintains an
associated task control block (TCB). TCBs are system data structures that the kernel uses to
maintain task-specific information. TCBs contain everything a kernel needs to know about a
particular task.
When a task is running, its context is highly dynamic. This dynamic context is maintained in
the TCB. When the task is not running, its context is frozen within the TCB, to be restored the next
time the task runs. A typical context switch scenario is illustrated in Figure 3. As shown in Figure 3,
when the kernel’s scheduler determines that it needs to stop running task 1 and start running task 2,
it takes the following steps:
1. The kernel saves task 1’s context information in its TCB.
2. It loads task 2’s context information from its TCB, which becomes the current thread of
execution.
3. The context of task 1 is frozen while task 2 executes, but if the scheduler needs to run task
1 again, task 1 continues from where it left off just before the context switch.
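A highly simplified C sketch of these steps, assuming hypothetical architecture-specific routines cpu_save() and cpu_restore() that save and reload the CPU registers (a real kernel implements this part in assembly):

    #include <stdint.h>

    /* Assumed architecture-specific register save/restore routines. */
    extern void cpu_save(uint32_t *regs);
    extern void cpu_restore(const uint32_t *regs);

    /* A minimal task control block: the frozen context plus scheduling data. */
    typedef struct {
        uint32_t registers[16];   /* CPU context, frozen while not running */
        int      priority;
        int      state;           /* ready, running, or blocked            */
    } tcb_t;

    void context_switch(tcb_t *from, tcb_t *to)
    {
        cpu_save(from->registers);   /* step 1: save task 1's context in its TCB   */
        cpu_restore(to->registers);  /* step 2: load task 2's context; task 2 runs */
        /* step 3: when 'from' is scheduled again, it resumes right here */
    }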
The time it takes for the scheduler to switch from one task to another is the context switch
time. It is relatively insignificant compared to most operations that a task performs. If an
application’s design includes frequent context switching, however, the application can incur
unnecessary performance overhead. Therefore, design applications in a way that does not involve
excess context switching. Every time an application makes a system call, the scheduler has an
opportunity to determine if it needs to switch contexts. When the scheduler determines a context
switch is necessary, it relies on an associated module, called the dispatcher, to make that switch
happen.
The dispatcher is the part of the scheduler that performs context switching and changes the
flow of execution. At any time an RTOS is running, the flow of execution, also known as flow of
control, is passing through one of three areas: through an application task, through an ISR, or
through the kernel. When a task or ISR makes a system call, the flow of control passes to the kernel
to execute one of the system routines provided by the kernel. When it is time to leave the kernel, the
dispatcher is responsible for passing control to one of the tasks in the user’s application. It will not
necessarily be the same task that made the system call. It is the scheduling algorithms of the
scheduler that determines which task executes next. It is the dispatcher that does the actual work of
context switching and passing execution control.
Depending on how the kernel is first entered, dispatching can happen differently. When a
task makes system calls, the dispatcher is used to exit the kernel after every system call completes.
In this case, the dispatcher is used on a call-by-call basis so that it can coordinate task-state
transitions that any of the system calls might have caused. (One or more tasks may have become
ready to run, for example.)
On the other hand, if an ISR makes system calls, the dispatcher is bypassed until the ISR
fully completes its execution. This process is true even if some resources have been freed that would
normally trigger a context switch between tasks. These context switches do not take place because
the ISR must complete without being interrupted by tasks. After the ISR completes execution, the
kernel exits through the dispatcher so that it can then dispatch the correct task.
As mentioned earlier, the scheduler determines which task runs by following a scheduling
algorithm (also known as scheduling policy). Most kernels today support two common scheduling
algorithms:
• preemptive priority-based scheduling, and
• round-robin scheduling.
The RTOS manufacturer typically predefines these algorithms; however, in some cases,
developers can create and define their own scheduling algorithms. Each algorithm is described next.
Real-time kernels generally support 256 priority levels, in which 0 is the highest and 255 the
lowest. Some kernels appoint the priorities in reverse order, where 255 is the highest and 0 the
lowest. Regardless, the concepts are basically the same. With a preemptive priority-based scheduler,
each task has a priority, and the highest-priority task runs first. If a task with a priority higher than
the current task becomes ready to run, the kernel immediately saves the current task’s context in its
TCB and switches to the higher-priority task. As shown in Figure 4, task 1 is preempted by higher-
priority task 2, which is then preempted by task 3. When task
3 completes, task 2 resumes; likewise, when task 2 completes, task 1 resumes.
Although tasks are assigned a priority when they are created, a task’s priority can be changed
dynamically using kernel-provided calls. The ability to change task priorities dynamically allows an
embedded application the flexibility to adjust to external events as they occur, creating a true real-
time, responsive system. Note, however, that misuse of this capability can lead to priority
inversions, deadlock, and eventual system failure.
Round-Robin Scheduling
Round-robin scheduling provides each task an equal share of the CPU execution time. Pure
round-robin scheduling cannot satisfy real-time system requirements because in real-time systems,
tasks perform work of varying degrees of importance. Instead, preemptive, priority-based scheduling
can be augmented with round-robin scheduling which uses time slicing to achieve equal allocation
of the CPU for tasks of the same priority as shown in Figure 2.5.
With time slicing, each task executes for a defined interval, or time slice, in an ongoing
cycle, which is the round robin. A run-time counter tracks the time slice for each task, incrementing
on every clock tick. When one task’s time slice completes, the counter is cleared, and the task is
placed at the end of the cycle. Newly added tasks of the same priority are placed at the end of the
cycle, with their run-time counters initialized to 0.
If a task in a round-robin cycle is preempted by a higher-priority task, its run-time count is
saved and then restored when the interrupted task is again eligible for execution. This idea is
illustrated in Figure 5, in which task 1 is preempted by a higher-priority task 4 but resumes where it
left off when task 4 completes.
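A sketch of the run-time counter mechanics for tasks sharing one priority level (the task set and tick handler here are illustrative):

    #define TIME_SLICE_TICKS 10   /* length of one time slice, in clock ticks */
    #define NTASKS 3              /* tasks sharing this priority level        */

    static int run_ticks[NTASKS]; /* per-task run-time counters               */
    static int current = 0;       /* index of the task now running            */

    /* Called on every clock tick; returns the task that should run next. */
    int round_robin_tick(void)
    {
        if (++run_ticks[current] >= TIME_SLICE_TICKS) {
            run_ticks[current] = 0;               /* counter is cleared         */
            current = (current + 1) % NTASKS;     /* task moves to end of cycle */
        }
        return current;
    }

Preemption by a higher-priority task would simply leave run_ticks[current] untouched, so the interrupted task resumes its slice later, as described above.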
12.4 Objects
Kernel objects are special constructs that are the building blocks for application development
for real-time embedded systems. The most common RTOS kernel objects are
• Tasks are concurrent and independent threads of execution that can compete for CPU execution
time.
• Semaphores are token-like objects that can be incremented or decremented by tasks for
synchronization or mutual exclusion.
• Message Queues are buffer-like data structures that can be used for synchronization, mutual
exclusion, and data exchange by passing messages between tasks. Developers creating real-time
embedded applications can combine these basic kernel objects (as well as others not mentioned
here) to solve common real-time design problems, such as concurrency, activity synchronization,
and data communication. These design problems and the kernel objects used to solve them are
discussed in more detail in later chapters.
12.4.1 Tasks
12.4.1.1 Introduction
Simple software applications are typically designed to run sequentially, one instruction at a
time, in a pre-determined chain of instructions. However, this scheme is inappropriate for real-time
embedded applications, which generally handle multiple inputs and outputs within tight time
constraints. Real-time embedded software applications must be designed for concurrency.
Concurrent design requires developers to decompose an application into small, schedulable,
and sequential program units. When done correctly, concurrent design allows system multitasking to
meet performance and timing requirements for a real-time system. Most RTOS kernels provide task
objects and task management services to facilitate designing concurrency within an application.
A task is an independent thread of execution that can compete with other concurrent tasks for
processor execution time. As mentioned earlier, developers decompose applications into multiple concurrent
tasks to optimize the handling of inputs and outputs within set time constraints.
A task is schedulable. The task is able to compete for execution time on a system, based on a
predefined scheduling algorithm. A task is defined by its distinct set of parameters and supporting
data structures. Specifically, upon creation, each task has an associated name, a unique ID, a priority
(if part of a preemptive scheduling plan), a task control block (TCB), a stack, and a task routine, as
shown in Figure 2.6.
Together, these components make up what is known as the task object.
Figure 2.6: A task, its associated parameters, and supporting data structures.
When the kernel first starts, it creates its own set of system tasks and allocates the
appropriate priority for each from a set of reserved priority levels. The reserved priority levels refer
to the priorities used internally by the RTOS for its system tasks. An application should avoid using
these priority levels for its tasks because running application tasks at such levels may affect the
overall system performance or behavior. For most RTOSes, these reserved priorities are not
enforced. The kernel needs its system tasks and their reserved priority levels to operate. These
priorities should not be modified. Examples of system tasks include:
• the initialization or startup task initializes the system and creates and starts system tasks,
• the idle task uses up processor idle cycles when no other activity is present,
• the logging task logs system messages,
• the exception-handling task handles exceptions, and
• the debug agent task allows debugging with a host debugger. Note that other system tasks
might be created during initialization, depending on what other components are included with the
kernel.
The idle task, which is created at kernel startup, is one system task that bears mention and
should not be ignored. The idle task is set to the lowest priority, typically executes in an endless
loop, and runs when either no other task can run or when no other tasks exist, for the sole purpose of
using idle processor cycles. The idle task is necessary because the processor executes the instruction
to which the program counter register points while it is running. Unless the processor can be
suspended, the program counter must still point to valid instructions even when no tasks exist in the
system or when no tasks can run. Therefore, the idle task ensures the processor program counter is
always valid when no other tasks are running.
In some cases, however, the kernel might allow a user-configured routine to run instead of
the idle task in order to implement special requirements for a particular application. One example of
a special requirement is power conservation. When no other tasks can run, the kernel can switch
control to the user-supplied routine instead of to the idle task. In this case, the user-supplied routine
acts like the idle task but instead initiates power conservation code, such as system suspension, after
a period of idle time.
After the kernel has initialized and created all of the required tasks, the kernel jumps to a
predefined entry point (such as a predefined function) that serves, in effect, as the beginning of the
application. From the entry point, the developer can initialize and create other application tasks , as
well as other kernel objects, which the application design might require. As the developer creates
new tasks, the developer must assign each a task name, priority, stack size, and a task routine. The
kernel does the rest by assigning each task a unique ID and creating an associated TCB and stack
space in memory for it.
Whether it's a system task or an application task, at any time each task exists in one of a small
number of states, including ready, running, or blocked. As the real-time embedded system runs, each
task moves from one state to another, according to the logic of a simple finite state machine (FSM).
Figure 2.7 illustrates a typical FSM for task execution states, with brief descriptions of state
transitions.
Figure 2.7: A typical finite state machine for task execution states.
Although kernels can define task-state groupings differently, generally three main states are
used in most typical preemptive-scheduling kernels, including:
• ready state-the task is ready to run but cannot because a higher priority task is executing.
• blocked state-the task has requested a resource that is not available, has requested to wait
until some event occurs, or has delayed itself for some duration.
• running state-the task is the highest priority task and is running.
Note some commercial kernels, such as the VxWorks kernel, define other, more granular
states, such as suspended, pended, and delayed. In this case, pended and delayed are actually sub-
states of the blocked state. A pended task is waiting for a resource that it needs to be freed; a delayed
task is waiting for a timing delay to end. The suspended state exists for debugging purposes. For
more detailed information on the way a particular RTOS kernel implements its FSM for each task,
refer to the kernel's user manual.
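In code, the three main states can be captured in a simple enumeration; a kernel with more granular states (pended, delayed, suspended) would extend it accordingly:

    /* The three main task execution states of a preemptive-scheduling kernel. */
    typedef enum {
        TASK_READY,    /* ready to run, but a higher priority task is executing */
        TASK_RUNNING,  /* the highest priority task, currently executing        */
        TASK_BLOCKED   /* waiting for a resource, an event, or a delay          */
    } task_state_t;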
Regardless of how a kernel implements a task's FSM, it must maintain the current state of all
tasks in a running system. As calls are made into the kernel by executing tasks, the kernel's
scheduler first determines which tasks need to change states and then makes those changes.
In some cases, the kernel changes the states of some tasks, but no context switching occurs
because the state of the highest priority task is unaffected. In other cases, however, these state
changes result in a context switch because the former highest priority task either gets blocked or is
no longer the highest priority task. When this process happens, the former running task is put into
the blocked or ready state, and the new highest priority task starts to execute.
The following describe the ready, running, and blocked states in more detail. These
descriptions are based on a single-processor system and a kernel using a priority-based preemptive
scheduling algorithm.
Ready State
When a task is first created and made ready to run, the kernel puts it into the ready state. In
this state, the task actively competes with all other ready tasks for the processor's execution time. As
Figure 2.7 shows, tasks in the ready state cannot move directly to the blocked state. A task first
needs to run so it can make a blocking call, which is a call to a function that cannot immediately run
to completion, thus putting the task in the blocked state. Ready tasks, therefore, can only move to
the running state. Because many tasks might be in the ready state, the kernel's scheduler uses the
priority of each task to determine which task to move to the running state.
For a kernel that supports only one task per priority level, the scheduling algorithm is
straightforward-the highest priority task that is ready runs next. In this implementation, the kernel
limits the number of tasks in an application to the number of priority levels.
However, most kernels support more than one task per priority level, allowing many more
tasks in an application. In this case, the scheduling algorithm is more complicated and involves
maintaining a task-ready list. Some kernels maintain a separate task-ready list for each priority
level; others have one combined list.
Figure 2.8 illustrates, in a five-step scenario, how a kernel scheduler might use a task-ready
list to move tasks from the ready state to the running state. This example assumes a single-processor
system and a priority-based preemptive scheduling algorithm in which 255 is the lowest priority and
0 is the highest. Note that for simplicity this example does not show system tasks, such as the idle
task.
Figure 2.8: Five steps showing the way a task-ready list works.
In this example, tasks 1, 2, 3, 4, and 5 are ready to run, and the kernel queues them by
priority in a task-ready list. Task 1 is the highest priority task (70); tasks 2, 3, and 4 are at the next-
highest priority level (80); and task 5 is the lowest priority (90). The following steps explain how a
kernel might use the task-ready list to move tasks to and from the ready state:
1. Tasks 1, 2, 3, 4, and 5 are ready to run and are waiting in the task-ready list.
2. Because task 1 has the highest priority (70), it is the first task ready to run. If nothing
higher is running, the kernel removes task 1 from the ready list and moves it to the running state.
3. During execution, task 1 makes a blocking call. As a result, the kernel moves task 1 to the
blocked state; takes task 2, which is first in the list of the next-highest priority tasks (80), off the
ready list; and moves task 2 to the running state.
4. Next, task 2 makes a blocking call. The kernel moves task 2 to the blocked state; takes
task 3, which is next in line of the priority 80 tasks, off the ready list; and moves task 3 to the
running state.
5. As task 3 runs, it frees the resource that task 2 requested. The kernel returns task 2 to the
ready state and inserts it at the end of the list of tasks ready to run at priority level 80. Task 3
continues as the currently running task.
Although not illustrated here, if task 1 became unblocked at this point in the scenario, the
kernel would move task 1 to the running state because its priority is higher than the currently
running task (task 3). As with task 2 earlier, task 3 at this point would be moved to the ready state
and inserted after task 2 (same priority of 80) and before task 5 (next priority of 90).
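A sketch of one common implementation of this mechanism, a separate FIFO task-ready list per priority level with 0 as the highest priority (structure and function names are ours):

    #include <stddef.h>

    #define NPRIO 256                 /* priority levels 0 (highest) .. 255 */

    struct task {
        struct task *next;
        int priority;
    };

    static struct task *head[NPRIO], *tail[NPRIO];

    /* Insert a task at the end of the list for its priority level. */
    void make_ready(struct task *t)
    {
        t->next = NULL;
        if (tail[t->priority])
            tail[t->priority]->next = t;
        else
            head[t->priority] = t;
        tail[t->priority] = t;
    }

    /* Remove and return the highest priority ready task, or NULL if none. */
    struct task *pick_next(void)
    {
        for (int p = 0; p < NPRIO; p++) {
            if (head[p]) {
                struct task *t = head[p];
                head[p] = t->next;
                if (head[p] == NULL)
                    tail[p] = NULL;
                return t;
            }
        }
        return NULL;   /* in a real kernel the idle task would run here */
    }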
Running State
On a single-processor system, only one task can run at a time. In this case, when a task is
moved to the running state, the processor loads its registers with this task's context. The processor
can then execute the task's instructions and manipulate the associated stack. A task can move back to
the ready state while it is running. When a task moves from the running state to the ready state, it is
preempted by a higher priority task. In this case, the preempted task is put in the appropriate,
priority-based location in the task-ready list, and the higher priority task is moved from the ready
state to the running state.
Unlike a ready task, a running task can move to the blocked state in any of the following
ways:
• by making a call that requests an unavailable resource,
• by making a call that requests to wait for an event to occur, and
• by making a call to delay the task for some duration.
In each of these cases, the task is moved from the running state to the blocked state, as
described next.
Blocked State
The possibility of blocked states is extremely important in real-time systems because without
blocked states, lower priority tasks could not run. If higher priority tasks are not designed to block,
CPU starvation can result.
CPU starvation occurs when higher priority tasks use all of the CPU execution time and
lower priority tasks do not get to run.
A task can only move to the blocked state by making a blocking call, requesting that some
blocking condition be met. A blocked task remains blocked until the blocking condition is met. (It
probably ought to be called the unblocking condition, but blocking is the terminology in common
use among real-time programmers.) Examples of how blocking conditions are met include the
following:
• a semaphore token (described later) for which a task is waiting is released,
• a message, on which the task is waiting, arrives in a message queue, or
• a time delay imposed on the task expires.
When a task becomes unblocked, the task might move from the blocked state to the ready
state if it is not the highest priority task. The task is then put into the task-ready list at the
appropriate priority-based location, as described earlier.
However, if the unblocked task is the highest priority task, the task moves directly to the
running state (without going through the ready state) and preempts the currently running task. The
preempted task is then moved to the ready state and put into the appropriate priority-based location
in the task-ready list.
In addition to providing a task object, kernels also provide task-management services. Task-
management services include the actions that a kernel performs behind the scenes to support tasks,
for example, creating and maintaining the TCB and task stacks.
A kernel, however, also provides an API that allows developers to manipulate tasks. Some of
the more common operations that developers can perform with a task object from within the
application include:
• creating and deleting tasks,
• controlling task scheduling, and
• obtaining task information.
Developers should learn how to perform each of these operations for the kernel selected for
the project. Each operation is briefly discussed next.
Task Scheduling
From the time a task is created to the time it is deleted, the task can move through various
states resulting from program execution and kernel scheduling. Although much of this state
changing is automatic, many kernels provide a set of API calls that allow developers to control when
a task moves to a different state (Suspend, Resume, Delay, Restart, Get Priority, Set Priority,
Preemption lock, Preemption unlock).
Using manual scheduling, developers can suspend and resume tasks from within an
application. Doing so might be important for debugging purposes or, as discussed earlier, for
suspending a high-priority task so that lower priority tasks can execute.
A developer might want to delay (block) a task, for example, to allow manual scheduling or
to wait for an external condition that does not have an associated interrupt. Delaying a task causes it
to relinquish the CPU and allow another task to execute. After the delay expires, the task is returned
to the task-ready list after all other ready tasks at its priority level. A delayed task waiting for an
external condition can wake up after a set time to check whether a specified condition or event has
occurred, which is called polling.
A developer might also want to restart a task, which is not the same as resuming a suspended
task. Restarting a task begins the task as if it had not been previously executing. The internal state
the task possessed at the time it was suspended (for example, the CPU registers used and the
resources acquired) is lost when a task is restarted. By contrast, resuming a task begins the task in
the same internal state it possessed when it was suspended.
Restarting a task is useful during debugging or when reinitializing a task after a catastrophic
error. During debugging, a developer can restart a task to step through its code again from start to
finish. In the case of catastrophic error, the developer can restart a task and ensure that the system
continues to operate without having to be completely reinitialized.
Getting and setting a task’s priority during execution lets developers control task scheduling
manually. This process is helpful during a priority inversion, in which a lower priority task has a
shared resource that a higher priority task requires and is preempted by an unrelated medium-
priority task. A simple fix for this problem is to free the shared resource by dynamically increasing
the priority of the lower priority task to that of the higher priority task, allowing it to run and
release the resource that the higher priority task requires, and then lowering the former lower
priority task to its original priority.
Finally, the kernel might support preemption locks, a pair of calls used to disable and enable
preemption in applications. This feature can be useful if a task is executing in a critical section of
code: one in which the task must not be preempted by other tasks.
12.4.2 Semaphores
12.4.2.1 Introduction
Multiple concurrent threads of execution within an application must be able to synchronize
their execution and coordinate mutually exclusive access to shared resources. To address these
requirements, RTOS kernels provide a semaphore object and associated semaphore management
services.
A semaphore is like a key that allows a task to carry out some operation or to access a
resource. If the task can acquire the semaphore, it can carry out the intended operation or access the
resource. A single semaphore can be acquired a finite number of times. In this sense, acquiring a
semaphore is like acquiring the duplicate of a key from an apartment manager: when the apartment
manager runs out of duplicates, the manager can give out no more keys. Likewise, when a
semaphore’s limit is reached, it can no longer be acquired until someone gives a key back or releases
the semaphore.
The kernel tracks the number of times a semaphore has been acquired or released by
maintaining a token count, which is initialized to a value when the semaphore is created. As a task
acquires the semaphore, the token count is decremented; as a task releases the semaphore, the count
is incremented.
If the token count reaches 0, the semaphore has no tokens left. A requesting task, therefore,
cannot acquire the semaphore, and the task blocks if it chooses to wait for the semaphore to become
available.
The task-waiting list tracks all tasks blocked while waiting on an unavailable semaphore.
These blocked tasks are kept in the task-waiting list in either first in/first out (FIFO) order or highest
priority first order.
When an unavailable semaphore becomes available, the kernel allows the first task in the
task-waiting list to acquire it. The kernel moves this unblocked task either to the running state, if it is
the highest priority task, or to the ready state, until it becomes the highest priority task and is able to
run. Note that the exact implementation of a task-waiting list can vary from one kernel to another.
A kernel can support many different types of semaphores, including binary, counting, and
mutual-exclusion (mutex) semaphores.
Binary Semaphores
A binary semaphore can have a value of either 0 or 1. When a binary semaphore’s value is 0,
the semaphore is considered unavailable (or empty); when the value is 1, the binary semaphore is
considered available (or full). Note that when a binary semaphore is first created, it can be initialized
to either available or unavailable (1 or 0, respectively). The state diagram of a binary semaphore is
shown in Figure 2.10.
Counting Semaphores
A counting semaphore uses a count to allow it to be acquired or released multiple times.
When creating a counting semaphore, assign the semaphore a count that denotes the number of
semaphore tokens it has initially.
If the initial count is 0, the counting semaphore is created in the unavailable state. If the
count is greater than 0, the semaphore is created in the available state, and the number of tokens it
has equals its count, as shown in Figure 2.11.
One or more tasks can continue to acquire a token from the counting semaphore until no
tokens are left. When all the tokens are gone, the count equals 0, and the counting semaphore moves
from the available state to the unavailable state. To move from the unavailable state back to the
available state, a semaphore token must be released by any task. Note that, as with binary
semaphores, counting semaphores are global resources that can be shared by all tasks that need
them. This feature allows any task to release a counting semaphore token. Each release operation
increments the count by one, even if the task making this call did not acquire a token in the first
place.
Some implementations of counting semaphores might allow the count to be bounded. A
bounded count is a count in which the initial count set for the counting semaphore, determined when
the semaphore was first created, acts as the maximum count for the semaphore. An unbounded count
allows the counting semaphore to count beyond the initial count to the maximum value that can be
held by the count’s data type (e.g., an unsigned integer or an unsigned long value).
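A conceptual sketch of the token count in C, ignoring the task-waiting list and the atomicity that a real kernel must guarantee (names are ours):

    /* A counting semaphore reduced to its token count. */
    typedef struct {
        int count;   /* tokens currently available; 0 means unavailable          */
        int bound;   /* maximum count for a bounded semaphore, or -1 if unbounded */
    } csem_t;

    /* "Do not wait" acquire: fails instead of blocking when no token is left. */
    int csem_acquire(csem_t *s)
    {
        if (s->count == 0)
            return -1;   /* a blocking kernel call would enqueue the task here */
        s->count--;
        return 0;
    }

    /* Any task may release a token, even one that never acquired one. */
    void csem_release(csem_t *s)
    {
        if (s->bound < 0 || s->count < s->bound)
            s->count++;
    }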
Mutual-Exclusion (Mutex) Semaphores
As opposed to the available and unavailable states in binary and counting semaphores, the
states of a mutex are unlocked or locked (0 or 1, respectively). A mutex is initially created in the
unlocked state, in which it can be acquired by a task. After being acquired, the mutex moves to the
locked state. Conversely, when the task releases the mutex, the mutex returns to the unlocked state.
Note that some kernels might use the terms lock and unlock for a mutex instead of acquire and
release.
Depending on the implementation, a mutex can support additional features not found in
binary or counting semaphores. These key differentiating features include ownership, recursive
locking, task deletion safety, and priority inversion avoidance protocols.
Mutex Ownership
Ownership of a mutex is gained when a task first locks the mutex by acquiring it.
Conversely, a task loses ownership of the mutex when it unlocks it by releasing it. When a task
owns the mutex, it is not possible for any other task to lock or unlock that mutex. Contrast this
concept with the binary semaphore, which can be released by any task, even a task that did not
originally acquire the semaphore.
Recursive Locking
Many mutex implementations also support recursive locking , which allows the task that
owns the mutex to acquire it multiple times in the locked state. Depending on the implementation,
recursion within a mutex can be automatically built into the mutex, or it might need to be enabled
explicitly when the mutex is first created.
The mutex with recursive locking is called a recursive mutex . This type of mutex is most
useful when a task requiring exclusive access to a shared resource calls one or more routines that
also require access to the same resource. A recursive mutex allows nested attempts to lock the mutex
to succeed, rather than cause deadlock , which is a condition in which two or more tasks are blocked
and are waiting on mutually locked resources. The problem of recursion and deadlocks is discussed
later in this chapter, as well as later in this book.
As shown in Figure 12, when a recursive mutex is first locked, the kernel registers the task
that locked it as the owner of the mutex. On successive attempts, the kernel uses an internal lock
count associated with the mutex to track the number of times that the task currently owning the
mutex has recursively acquired it. To properly unlock the mutex, it must be released the same
number of times.
In this example, a lock count tracks the two states of a mutex (0 for unlocked and 1 for
locked), as well as the number of times it has been recursively locked (lock count > 1). In other
implementations, a mutex might maintain two counts: a binary value to track its state, and a separate
lock count to track the number of times it has been acquired in the lock state by the task that owns it.
Do not confuse the counting facility for a locked mutex with the counting facility for a
counting semaphore. The count used for the mutex tracks the number of times that the task owning
the mutex has locked or unlocked the mutex. The count used for the counting semaphore tracks the
number of tokens that have been acquired or released by any task. Additionally, the count for the
mutex is always unbounded, which allows multiple recursive accesses.
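A sketch of ownership plus a lock count, assuming a hypothetical task_self() kernel call that returns the current task’s ID:

    extern int task_self(void);   /* assumed: ID of the currently running task */

    typedef struct {
        int owner;        /* owning task ID, or -1 when unlocked              */
        int lock_count;   /* 0 = unlocked; values > 1 mean recursive locking  */
    } rmutex_t;

    int rmutex_lock(rmutex_t *m)
    {
        int me = task_self();
        if (m->lock_count == 0) {     /* first lock registers the owner        */
            m->owner = me;
            m->lock_count = 1;
            return 0;
        }
        if (m->owner == me) {         /* recursive lock by the owner succeeds  */
            m->lock_count++;
            return 0;
        }
        return -1;                    /* owned by another task: caller would block */
    }

    int rmutex_unlock(rmutex_t *m)
    {
        if (m->lock_count == 0 || m->owner != task_self())
            return -1;                /* only the owner may unlock             */
        if (--m->lock_count == 0)
            m->owner = -1;            /* released as many times as locked      */
        return 0;
    }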
Task Deletion Safety
Some mutex implementations also have built-in task deletion safety. Premature task deletion
is avoided by using task deletion locks when a task locks and unlocks a mutex. Enabling this
capability within a mutex ensures that while a task owns the mutex, the task cannot be deleted.
Typically protection from premature deletion is enabled by setting the appropriate initialization
options when creating the mutex.
Priority Inversion Avoidance
Priority inversion commonly happens in poorly designed real-time embedded applications.
Priority inversion occurs when a higher priority task is blocked and is waiting for a resource being
used by a lower priority task, which has itself been preempted by an unrelated medium-priority task.
In this situation, the higher priority task’s priority level has effectively been inverted to the lower
priority task’s level.
Enabling certain protocols that are typically built into mutexes can help avoid priority
inversion. Two common protocols used for avoiding priority inversion include:
• priority inheritance protocol ensures that the priority level of the lower priority task that
has acquired the mutex is raised to that of the higher priority task that has requested the mutex when
inversion happens.
The priority of the raised task is lowered to its original value after the task releases the mutex
that the higher priority task requires.
• ceiling priority protocol ensures that the priority level of the task that acquires the mutex
is automatically set to the highest priority of all possible tasks that might request that mutex when it
is first acquired until it is released.
When the mutex is released, the priority of the task is lowered to its original value.
12.4.2.3 Typical Semaphore Operations
Typical operations that developers might want to perform with the semaphores in an
application include:
• creating and deleting semaphores,
• acquiring and releasing semaphores,
• clearing a semaphore’s task-waiting list, and
• getting semaphore information.
Creating and Deleting Semaphores
Several things must be considered when creating and deleting semaphores. If a kernel
supports different types of semaphores, different calls might be used for creating binary, counting,
and mutex semaphores, as follows:
• binary: specify the initial semaphore state and the task-waiting order.
• counting: specify the initial semaphore count and the task-waiting order.
• mutex: specify the task-waiting order and enable task deletion safety, recursion, and
priority-inversion avoidance protocols, if supported.
Semaphores can be deleted from within any task by specifying their IDs and making
semaphore-deletion calls.
Deleting a semaphore is not the same as releasing it. When a semaphore is deleted, blocked
tasks in its task-waiting list are unblocked and moved either to the ready state or to the running state
(if the unblocked task has the highest priority). Any tasks, however, that try to acquire the deleted
semaphore return with an error because the semaphore no longer exists. Additionally, do not delete a
semaphore while it is in use (e.g., acquired). This action might result in data corruption or other
serious problems if the semaphore is protecting a shared resource or a critical section of code.
Acquiring and Releasing Semaphores
The operations for acquiring and releasing a semaphore might have different names,
depending on the kernel: for example, take and give, sm_p and sm_v, pend and post, and lock and
unlock. Regardless of the name, they all effectively acquire and release semaphores.
Tasks typically make a request to acquire a semaphore in one of the following ways:
• Wait forever: the task remains blocked until it is able to acquire the semaphore.
• Wait with a timeout: the task remains blocked until it is able to acquire the semaphore or until a
set interval of time, called the timeout interval, passes. At this point, the task is removed from the
semaphore’s task-waiting list and put in either the ready state or the running state.
• Do not wait: the task makes a request to acquire a semaphore token, but, if one is not available,
the task does not block.
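These three request styles map directly onto, for example, the POSIX semaphore API, shown below purely as a familiar illustration (an RTOS kernel will have its own equivalent calls):

    #include <semaphore.h>
    #include <time.h>

    void acquire_styles(sem_t *s)
    {
        struct timespec deadline;

        (void)sem_wait(s);                   /* wait forever                */

        clock_gettime(CLOCK_REALTIME, &deadline);
        deadline.tv_sec += 1;                /* one-second timeout interval */
        (void)sem_timedwait(s, &deadline);   /* wait with a timeout         */

        (void)sem_trywait(s);                /* do not wait: fails if empty */
    }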
Note that ISRs can also release binary and counting semaphores. Note that most kernels do
not support ISRs locking and unlocking mutexes, as it is not meaningful to do so from an ISR. It is
also not meaningful to acquire either binary or counting semaphores inside an ISR.
Any task can release a binary or counting semaphore; however, a mutex can only be released
(unlocked) by the task that first acquired (locked) it. Note that incorrectly releasing a binary or
counting semaphore can result in losing mutually exclusive access to a shared resource or in an I/O
device malfunction.
For example, a task can gain access to a shared data structure by acquiring an associated
semaphore. If a second task accidentally releases that semaphore, this step can potentially free a
third task waiting for that same semaphore, allowing that third task to also gain access to the same
data structure. Having multiple tasks trying to modify the same data structure at the same time
results in corrupted data.
12.5 Services
Along with objects, most kernels provide services that help developers create applications for
real-time embedded systems. These services comprise sets of API calls that can be used to perform
operations on kernel objects or can be used in general to facilitate timer management, interrupt
handling, device I/O, and memory management. Other services might also be provided; the ones
discussed here are those most commonly found in RTOS kernels.