
UNIT I

Introduction to Embedded Systems


What is the difference between an embedded system and a general computing
system?
Before looking at embedded systems, it helps to understand a general computing system such as a PC.

A personal computer runs on a high-speed microprocessor and is designed to run multiple tasks at a time.

An embedded system, in contrast, is typically built around a microcontroller that is designed to run a single dedicated task, and it is not meant to run many unrelated tasks at the same time.

An embedded system is a complex system in which one or more peripheral devices interact with each other using the available resources.

As an example, consider a currently popular technology: the IoT data logger.

Moreover, home weather stations are also being developed for health monitoring and
safety reasons.

Today, PCs are also involved in building embedded devices. But how are such systems built using IoT and embedded systems?

Coming back to the data logging application:

• The data logger is a device that collects data from the environment around us, for example outdoor parameters such as rainfall, temperature, humidity, sunshine duration and atmospheric pressure.
• The collected data is then transmitted to a server installed at a remote location, so the data can be monitored from anywhere.
Role of PC in embedded system

You can run an embedded application on its own, but to store its data you need a database.

For this purpose a database such as MySQL is used, and a web front end written in a language such as PHP is used to view the data in real time.
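
The flow described above (collect readings, store them in a database, query them for display) can be sketched in a few lines. This is only an illustration under stated assumptions: the text names MySQL, but the sketch uses Python's built-in sqlite3 module as a stand-in so that it runs without a database server, and `read_sensors`, the `readings` table and the fixed sample values are hypothetical.

```python
import sqlite3

def read_sensors():
    """Hypothetical stand-in for real sensor drivers; returns fixed sample values."""
    return {"temperature_c": 24.5, "humidity_pct": 61.0, "pressure_hpa": 1009.2}

def store_reading(conn, timestamp, reading):
    """Insert one reading into the (assumed) 'readings' table."""
    conn.execute(
        "INSERT INTO readings (ts, temperature_c, humidity_pct, pressure_hpa) "
        "VALUES (?, ?, ?, ?)",
        (timestamp, reading["temperature_c"], reading["humidity_pct"],
         reading["pressure_hpa"]),
    )
    conn.commit()

def latest_reading(conn):
    """The query a front end could run to show the most recent values."""
    return conn.execute(
        "SELECT ts, temperature_c, humidity_pct, pressure_hpa "
        "FROM readings ORDER BY ts DESC LIMIT 1"
    ).fetchone()

# In-memory database, used here only so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE readings (ts INTEGER, temperature_c REAL, "
    "humidity_pct REAL, pressure_hpa REAL)"
)
store_reading(conn, 1, read_sensors())
store_reading(conn, 2, read_sensors())
print(latest_reading(conn))
```

In a real deployment the logger and the database would sit on different machines, with the readings sent over a network link in between; that part is left out here.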

General computing systems are meant to be used for generic purposes. A desktop PC, for example, can be used for office work, as a multimedia console, for software development, as an email client and so on. Depending on the software you run, it serves different applications.

Embedded systems, on the other hand, are designed for specific functions. An embedded video streamer works as a video streaming device, a pulse oximeter measures oxygen saturation in the blood, a thermostat monitors and controls room temperature, and a simple TV remote just controls your TV. So embedded systems are designed for, and work on, specific functions.

Some embedded systems use generic computing hardware and a generic OS but still provide only specific functions. For example, I have worked on a digital video recorder (DVR) that uses more or less PC hardware and the Linux OS, but when the system boots it works as a DVR. All other user interfaces are disabled, and it runs only the video recorder application.

_________________________________________________________________

A Brief about Embedded System their Classifications and Applications
An embedded system is an electronic system in which software is embedded in computer hardware. It may be programmable or non-programmable, depending on the application. An embedded system can also be described as a way of working, organizing, and performing single or multiple tasks according to a set of rules. In an embedded system, all the units are assembled and work together according to the program. Examples of embedded systems include numerous products such as microwave ovens, washing machines, printers, automobiles and cameras. These systems use microprocessors and microcontrollers as well as processors such as DSPs. This section gives an overview of what an embedded system is and of the types of embedded systems.

The important characteristics of embedded systems are speed, size, power, reliability, accuracy and adaptability. When an embedded system performs its operations at high speed, it can be used for real-time applications. The size of the system and its power consumption should be very low, so that the system can be easily adapted to different situations.

What is an embedded system?

An embedded system is a combination of computer hardware and software. As with any electronic system, it requires a hardware platform, and that platform is built around a microprocessor or microcontroller. Embedded system hardware includes elements such as the user interface, input/output interfaces, display and memory. Generally, an embedded system comprises a power supply, processor, memory, timers, serial communication ports and application-specific circuits.

Embedded system software is usually written in a high-level language, then compiled into code that performs a specific function and stored in non-volatile memory in the hardware. It is designed with three limits in view: the available system memory, the processor speed and, when the system runs continuously, the need to limit power dissipation for events such as run, stop and wake-up.

Types of Embedded Systems

Embedded systems can be classified into different types based on their performance and functional requirements, and based on the performance of the microcontroller.

Embedded systems are classified into four categories based on their performance and functional requirements:

• Stand alone embedded systems
• Real time embedded systems
• Networked embedded systems
• Mobile embedded systems

Embedded systems are classified into three types based on the performance of the microcontroller:

• Small scale embedded systems
• Medium scale embedded systems
• Sophisticated embedded systems

Stand Alone Embedded Systems

A stand-alone embedded system does not require a host system such as a computer; it works by itself. It takes input from its input ports, either analog or digital, then processes, calculates and converts the data, and delivers the result through a connected device, which it controls, drives or uses as a display. Examples of stand-alone embedded systems are MP3 players, digital cameras, video game consoles, microwave ovens and temperature measurement systems.

Real Time Embedded Systems

A real-time embedded system is a system that gives the required output within a specified time. These embedded systems must meet time deadlines for the completion of a task. Real-time embedded systems are classified into two types: soft and hard real-time systems. In a hard real-time system a missed deadline counts as a system failure, while in a soft real-time system it only degrades the quality of the result.

Networked Embedded Systems

These embedded systems are connected to a network to access resources. The connected network can be a LAN, a WAN or the internet, and the connection can be wired or wireless. This is the fastest growing area in embedded system applications. An embedded web server is a type of system in which all embedded devices are connected to a web server and are accessed and controlled through a web browser. An example of a LAN-networked embedded system is a home security system in which all sensors are connected and communicate using the TCP/IP protocol.

Mobile Embedded Systems

Mobile embedded systems are used in portable embedded devices such as cell phones, digital cameras, MP3 players and personal digital assistants. The basic limitation of these devices is their limited memory and other resources.

Small Scale Embedded Systems

These embedded systems are designed with a single 8- or 16-bit microcontroller and may even be powered by a battery. For developing embedded software for small scale embedded systems, the main programming tools are an editor, an assembler, a cross assembler and an integrated development environment (IDE).

Medium Scale Embedded Systems

These embedded systems are designed with a single 16- or 32-bit microcontroller, RISC processor or DSP. They have both hardware and software complexities. For developing embedded software for medium scale embedded systems, the main programming tools are C, C++, Java, Visual C++, an RTOS, a debugger, a source code engineering tool, a simulator and an IDE.

Sophisticated Embedded Systems

These embedded systems have enormous hardware and software complexities and may need ASIPs, IPs, PLAs, or scalable or configurable processors. They are used for cutting-edge applications that need hardware and software co-design and components that have to be integrated in the final system.

Applications of Embedded Systems:

Embedded systems are used in different applications like automobiles, telecommunications, smart cards, missiles, satellites, computer networking and digital consumer electronics.

Embedded Systems in Automobiles and in telecommunications

• Motor and cruise control system
• Body or Engine safety
• Entertainment and multimedia in car
• E-Com and Mobile access
• Robotics in assembly line
• Wireless communication
• Mobile computing and networking

Embedded Systems in Smart Cards, Missiles and Satellites

• Security systems
• Telephone and banking
• Defence and aerospace
• Communication

Embedded Systems in Peripherals & Computer Networking

• Displays and Monitors
• Networking Systems
• Image Processing
• Network cards and printers

Embedded Systems in Consumer Electronics

• Digital Cameras
• Set top Boxes
• High Definition TVs
• DVDs

What are the characteristics of embedded system?
Embedded systems come in a variety of shapes and sizes, from the largest multiple-rack data storage or
networking powerhouses to tiny modules such as your personal MP3 player or cellular handset.
Following are some of the usual characteristics of an embedded system:

• Contains a processing engine, such as a general-purpose microprocessor
• Typically designed for a specific application or purpose
• Includes a simple (or no) user interface, such as an automotive engine ignition controller
• Often is resource-limited. For example, it might have a small memory footprint and no hard drive
• Might have power limitations, such as a requirement to operate from batteries
• Not typically used as a general-purpose computing platform
• Generally has application software built in, not user-selected
• Ships with all intended application hardware and software pre-integrated
• Often is intended for applications without human intervention
Most commonly, embedded systems are resource-constrained compared to the typical desktop PC. Embedded systems often have limited memory, small or no hard drives, and sometimes no external network connectivity. Frequently, the only user interface is a serial port and some LEDs. These and other issues can present challenges to the embedded system developer. With advancements in IoT, embedded systems are getting much more complex.

Components of an embedded system


Essentially an embedded system is a miniaturized computer. The following block diagram depicts a typical embedded system.

Modules in an embedded system


As with any computing device, it has the following components:
Processor:

The processor is the brain that controls the entire system. It holds the logic circuitry that responds to and processes the basic instructions that drive the system. It manipulates the control and data paths to achieve the expected functionality. Many application-specific and general-purpose microprocessors are available. Microprocessors are discussed in more detail in the section on embedded processors.
Memories:

Memories are the components that let the processor hold data temporarily or permanently, for immediate or later use. Memories are of two types: non-volatile (capable of retaining data after a power cycle) and volatile (not capable of persistent data storage). Memories come in various technologies and sizes that can be chosen based on the specific needs of the system.
Inputs/Outputs – IOs:

An embedded system responds to events from the external world, and these events result in certain actions being taken. This is accomplished with the I/Os (inputs and outputs). While a real-world event takes the form of a continuous analog signal (the temperature reading of a furnace) or a discrete digital signal (the on/off state of a switch), the input presented to the embedded system is generally digitized. This digitization is achieved using ADCs for an analog signal or comparator circuits for a digital signal, or the input may arrive already in digital form over a channel, such as an external message to the system. Similarly, outputs in digital form are converted to suit the real world using one of many available techniques, to be discussed later.
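
As a rough illustration of the digitization step, the sketch below models an ideal ADC and a simple comparator. The 5 V reference, the 10-bit resolution, the 2.5 V threshold and the function names are all assumptions chosen for the example; real converters differ in scaling conventions and error behaviour.

```python
def adc_convert(voltage, v_ref=5.0, bits=10):
    """Quantize an analog voltage into an integer code (idealized ADC)."""
    max_code = (1 << bits) - 1                 # e.g. 1023 for 10 bits
    clamped = min(max(voltage, 0.0), v_ref)    # inputs outside 0..v_ref saturate
    return int(clamped / v_ref * max_code)

def comparator(voltage, threshold=2.5):
    """Reduce a switch-like signal to one bit: 1 above the threshold, else 0."""
    return 1 if voltage > threshold else 0

# A furnace temperature sensor output (analog) and a switch line (digital).
print(adc_convert(2.5))    # mid-scale analog input
print(comparator(3.3))     # switch closed
print(comparator(0.2))     # switch open
```

The analog reading becomes a number the processor can compare and store, while the switch state collapses to a single bit.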
User Interface:

While the I/Os typically refer to the interaction between the embedded system and the controlled system, the user interface represents the interaction between the user and the system. Various technologies are available to implement the user interface, including LCDs, touch panels, keypads and buttons. An intuitive design enables easier and enhanced control of the system, and this is becoming a mandatory requirement rather than an option.
Power Supply:

Another very important design consideration is the power supply. The system may be powered directly from the power line or from a battery, depending on the nature of its use. The power supply design requires deep analysis and careful incorporation according to the system requirements. An improperly designed power supply may not only affect system performance but may also end up damaging the system. Furthermore, with today's emphasis on a greener world, it is all the more important to design a power-efficient system.
Mechanicals:

The mechanicals include the cabinet and the connectors. The cabinet protects the system from various external factors and provides a suitable environment for the internals. The connectors bring external signals into the system.

___________________________________________________________________________________

Timer/Counter
A timer is a specialized type of clock used to measure time intervals. A timer that counts upwards from zero to measure elapsed time is often called a stopwatch, while a device that counts down from a specified time interval is more usually called a timer and is used to generate a time delay. An hourglass, for example, is a timer.
A counter is a device that stores (and sometimes displays) the number of times a particular event or process has occurred, with respect to a clock signal. It is used to count events happening outside the microcontroller. In electronics, counters can be implemented quite easily using register-type circuits such as flip-flops.

Difference between a Timer and a Counter

The points that differentiate a timer from a counter are as follows −

Timer                                           Counter
----------------------------------------------  ----------------------------------------------
The register is incremented for every machine   The register is incremented on a 1-to-0
cycle.                                          transition at its corresponding external input
                                                pin (T0, T1).

Maximum count rate is 1/12 of the oscillator    Maximum count rate is 1/24 of the oscillator
frequency.                                      frequency.

A timer uses the frequency of the internal      A counter uses an external signal to count
clock and generates a delay.                    pulses.
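
The count rates in the table can be turned into a small worked calculation. The sketch below assumes the 8051-style timing the table describes (one machine cycle is 12 oscillator periods) and a 16-bit up-counting timer that overflows at 65536; the function names and the 12 MHz crystal are illustrative assumptions.

```python
def timer_reload(delay_s, f_osc_hz, clocks_per_machine_cycle=12, timer_bits=16):
    """Return (high byte, low byte) to preload so the timer overflows after delay_s."""
    counts = round(delay_s * f_osc_hz / clocks_per_machine_cycle)
    max_counts = 1 << timer_bits
    if not 0 < counts <= max_counts:
        raise ValueError("delay does not fit in a single timer overflow")
    reload = max_counts - counts      # timer counts up from here to overflow
    return reload >> 8, reload & 0xFF

def max_counter_rate_hz(f_osc_hz):
    """Fastest external pulse rate a counter can follow (1/24 of the oscillator)."""
    return f_osc_hz / 24

high, low = timer_reload(0.001, 12_000_000)   # 1 ms delay, assumed 12 MHz crystal
print(hex(high), hex(low))
print(max_counter_rate_hz(12_000_000))
```

With a 12 MHz oscillator the timer ticks once per microsecond, so a 1 ms delay needs 1000 counts before overflow.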

_________________________________________________________________________________

Memory Types

Many types of memory devices are available for use in modern computer systems. As an embedded
software engineer, you must be aware of the differences between them and understand how to use each
type effectively. In our discussion, we will approach these devices from the software developer's
perspective. Keep in mind that the development of these devices took several decades and that their
underlying hardware differs significantly. The names of the memory types frequently reflect the historical
nature of the development process and are often more confusing than insightful. Figure 1 classifies the
memory devices we'll discuss as RAM, ROM, or a hybrid of the two.

Figure 1. Common memory types in embedded systems

Types of RAM

The RAM family includes two important memory devices: static RAM (SRAM) and dynamic RAM
(DRAM). The primary difference between them is the lifetime of the data they store. SRAM retains its
contents as long as electrical power is applied to the chip. If the power is turned off or lost temporarily, its
contents will be lost forever. DRAM, on the other hand, has an extremely short data lifetime, typically
about four milliseconds. This is true even when power is applied constantly.

In short, SRAM has all the properties of the memory you think of when you hear the word RAM.
Compared to that, DRAM seems kind of useless. By itself, it is. However, a simple piece of hardware
called a DRAM controller can be used to make DRAM behave more like SRAM. The job of the DRAM
controller is to periodically refresh the data stored in the DRAM. By refreshing the data before it expires,
the contents of memory can be kept alive for as long as they are needed. So DRAM is as useful as SRAM
after all.
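
The refresh idea can be shown with a toy simulation. This is a sketch only: the `Dram` class, the millisecond time steps and the `run` helper are invented for illustration, and only the 4 ms data lifetime comes from the text.

```python
class Dram:
    """Toy DRAM: a stored value decays unless refreshed within RETENTION_MS."""
    RETENTION_MS = 4   # data lifetime quoted in the text

    def __init__(self):
        self.cells = {}   # address -> (value, time of last write or refresh)

    def write(self, addr, value, now_ms):
        self.cells[addr] = (value, now_ms)

    def read(self, addr, now_ms):
        value, stamp = self.cells[addr]
        if now_ms - stamp > self.RETENTION_MS:
            return None   # too long since the last refresh: data is lost
        return value

    def refresh_all(self, now_ms):
        """The DRAM controller's job: rewrite every cell that is still valid."""
        for addr, (value, stamp) in list(self.cells.items()):
            if now_ms - stamp <= self.RETENTION_MS:
                self.cells[addr] = (value, now_ms)

def run(refresh_period_ms, duration_ms=10):
    """Write one value at t=0, optionally refresh periodically, read at the end."""
    dram = Dram()
    dram.write(0x10, 0xAB, now_ms=0)
    for t in range(1, duration_ms + 1):
        if refresh_period_ms and t % refresh_period_ms == 0:
            dram.refresh_all(t)
    return dram.read(0x10, duration_ms)

print(run(None))   # no controller: the value has decayed
print(run(2))      # refreshed every 2 ms: the value survives
print(run(5))      # refreshed too late: the value is already gone
```

The third case shows why the refresh has to happen before the data expires; a late refresh has nothing left to preserve.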

When deciding which type of RAM to use, a system designer must consider access time and cost. SRAM
devices offer extremely fast access times (approximately four times faster than DRAM) but are much
more expensive to produce. Generally, SRAM is used only where access speed is extremely important. A
lower cost-per-byte makes DRAM attractive whenever large amounts of RAM are required. Many
embedded systems include both types: a small block of SRAM (a few kilobytes) along a critical data path
and a much larger block of DRAM (perhaps even Megabytes) for everything else.

Types of ROM

Memories in the ROM family are distinguished by the methods used to write new data to them (usually
called programming), and the number of times they can be rewritten. This classification reflects the
evolution of ROM devices from hardwired to programmable to erasable-and-programmable. A common
feature of all these devices is their ability to retain data and programs forever, even during a power
failure.

The very first ROMs were hardwired devices that contained a preprogrammed set of data or instructions.
The contents of the ROM had to be specified before chip production, so the actual data could be used to
arrange the transistors inside the chip. Hardwired memories are still used, though they are now called
"masked ROMs" to distinguish them from other types of ROM. The primary advantage of a masked
ROM is its low production cost. Unfortunately, the cost is low only when large quantities of the same
ROM are required.

One step up from the masked ROM is the PROM (programmable ROM), which is purchased in an
unprogrammed state. If you were to look at the contents of an unprogrammed PROM, you would see that
the data is made up entirely of 1's. The process of writing your data to the PROM involves a special piece
of equipment called a device programmer. The device programmer writes data to the device one word at a
time by applying an electrical charge to the input pins of the chip. Once a PROM has been programmed
in this way, its contents can never be changed. If the code or data stored in the PROM must be changed,
the current device must be discarded. As a result, PROMs are also known as one-time programmable
(OTP) devices.

An EPROM (erasable-and-programmable ROM) is programmed in exactly the same manner as a PROM. However, EPROMs can be erased and reprogrammed repeatedly. To erase an EPROM, you simply expose the device to a strong source of ultraviolet light. (A window in the top of the device allows the light to reach the silicon.) By doing this, you essentially reset the entire chip to its initial, unprogrammed state. Though more expensive than PROMs, their ability to be reprogrammed makes EPROMs an essential part of the software development and testing process.

Hybrids

As memory technology has matured in recent years, the line between RAM and ROM has blurred. Now,
several types of memory combine features of both. These devices do not belong to either group and can
be collectively referred to as hybrid memory devices. Hybrid memories can be read and written as
desired, like RAM, but maintain their contents without electrical power, just like ROM. Two of the
hybrid devices, EEPROM and flash, are descendants of ROM devices. These are typically used to store
code. The third hybrid, NVRAM, is a modified version of SRAM. NVRAM usually holds persistent data.

EEPROMs are electrically-erasable-and-programmable. Internally, they are similar to EPROMs, but the
erase operation is accomplished electrically, rather than by exposure to ultraviolet light. Any byte within
an EEPROM may be erased and rewritten. Once written, the new data will remain in the device forever--
or at least until it is electrically erased. The primary tradeoff for this improved functionality is higher cost,
though write cycles are also significantly longer than writes to a RAM. So you wouldn't want to use an
EEPROM for your main system memory.

Flash memory combines the best features of the memory devices described thus far. Flash memory
devices are high density, low cost, nonvolatile, fast (to read, but not to write), and electrically
reprogrammable. These advantages are overwhelming and, as a direct result, the use of flash memory has
increased dramatically in embedded systems. From a software viewpoint, flash and EEPROM
technologies are very similar. The major difference is that flash devices can only be erased one sector at a
time, not byte-by-byte. Typical sector sizes are in the range 256 bytes to 16KB. Despite this disadvantage,
flash is much more popular than EEPROM and is rapidly displacing many of the ROM devices as well.
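
From the software side, sector-only erase means that changing a single byte usually involves a read, erase and rewrite of the whole sector. The model below is a simplified sketch: the `Flash` class is hypothetical, the 256-byte sector is the smallest size mentioned above, and the rule that programming can only change bits from 1 to 0 is an assumption that holds for typical flash parts but should be checked against the device datasheet.

```python
class Flash:
    """Toy flash device: erased bytes read 0xFF, erase works per sector only."""
    SECTOR_SIZE = 256

    def __init__(self, sectors=4):
        self.data = bytearray([0xFF] * (sectors * self.SECTOR_SIZE))
        self.erase_count = [0] * sectors   # erase cycles are limited in real parts

    def erase_sector(self, sector):
        start = sector * self.SECTOR_SIZE
        self.data[start:start + self.SECTOR_SIZE] = b"\xFF" * self.SECTOR_SIZE
        self.erase_count[sector] += 1

    def program(self, addr, value):
        """Programming can only clear bits (1 -> 0), never set them."""
        self.data[addr] &= value

    def update_byte(self, addr, value):
        """Change one byte: copy the sector, erase it, then write everything back."""
        sector = addr // self.SECTOR_SIZE
        start = sector * self.SECTOR_SIZE
        copy = bytearray(self.data[start:start + self.SECTOR_SIZE])
        copy[addr - start] = value
        self.erase_sector(sector)
        for offset, byte in enumerate(copy):
            self.program(start + offset, byte)

flash = Flash()
flash.program(10, 0x0F)        # fresh byte: works as expected
flash.program(11, 0x55)
flash.program(10, 0xF0)        # overwrite without erase: bits can only be cleared
print(hex(flash.data[10]))
flash.update_byte(10, 0xF0)    # proper update costs one sector erase
print(hex(flash.data[10]), hex(flash.data[11]), flash.erase_count[0])
```

Each update consumes one erase cycle of the sector, which is one reason the table that follows lists flash erase cycles as limited.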

The third member of the hybrid memory class is NVRAM (non-volatile RAM). Nonvolatility is also a
characteristic of the ROM and hybrid memories discussed previously. However, an NVRAM is
physically very different from those devices. An NVRAM is usually just an SRAM with a battery backup.
When the power is turned on, the NVRAM operates just like any other SRAM. When the power is turned
off, the NVRAM draws just enough power from the battery to retain its data. NVRAM is fairly common
in embedded systems. However, it is expensive--even more expensive than SRAM, because of the
battery--so its applications are typically limited to the storage of a few hundred bytes of system-critical
information that can't be stored in any better way.

Table 1 summarizes the features of each type of memory discussed here, but keep in mind that different
memory types serve different purposes. Each memory type has its strengths and weaknesses. Side-by-side
comparisons are not always effective.

Type        Volatile?  Writeable?                      Erase Size   Max Erase Cycles             Cost (per Byte)             Speed
SRAM        Yes        Yes                             Byte         Unlimited                    Expensive                   Fast
DRAM        Yes        Yes                             Byte         Unlimited                    Moderate                    Moderate
Masked ROM  No         No                              n/a          n/a                          Inexpensive                 Fast
PROM        No         Once, with a device programmer  n/a          n/a                          Moderate                    Fast
EPROM       No         Yes, with a device programmer   Entire chip  Limited (consult datasheet)  Moderate                    Fast
EEPROM      No         Yes                             Byte         Limited (consult datasheet)  Expensive                   Fast to read, slow to erase/write
Flash       No         Yes                             Sector       Limited (consult datasheet)  Moderate                    Fast to read, slow to erase/write
NVRAM       No         Yes                             Byte         Unlimited                    Expensive (SRAM + battery)  Fast

Table 1. Characteristics of the various memory types

________________________________________________________________________________________________________________

Data and Address bus concepts


In computer architecture, a bus is defined as a system that transfers data between hardware components of a computer or between two separate computers.

The system bus is a single bus that lets all major components of a computer communicate with each other. It is made up of an address bus, a data bus and a control bus.

Data bus and Address bus?

The data bus carries the data to be stored, while the address bus carries the location where it should be stored.

1) Address Bus : The address bus is the part of the computer system bus that is dedicated to specifying a physical address.

--> When the computer processor needs to read or write from or to the memory, it uses the address bus to
specify the physical address of the individual memory block it needs to access (the actual data is sent
along the data bus).

--> More correctly, when the processor wants to write some data to the memory, it will assert the write
signal, set the write address on the address bus and put the data on to the data bus.

Similarly, when the processor wants to read some data residing in the memory, it will assert the read
signal and set the read address on the address bus. After receiving this signal, the memory controller will
get the data from the specific memory block (after checking the address bus to get the read address) and
then it will place the data of the memory block on to the data bus.
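
The read and write sequences just described can be mimicked with a toy model. The `SystemBus` class and its method names are hypothetical; the sketch only mirrors the order of events in the text (control signal, address, data) and ignores timing entirely.

```python
class SystemBus:
    """Toy model of the address, data and control lines between CPU and memory."""

    def __init__(self, size):
        self.memory = [0] * size
        self.address = 0       # address bus: driven only by the processor
        self.data = 0          # data bus: driven by the CPU on write, memory on read
        self.control = None    # control signal: "READ" or "WRITE"

    def _memory_controller(self):
        """Reacts to the control signal by using the address and data buses."""
        if self.control == "WRITE":
            self.memory[self.address] = self.data
        elif self.control == "READ":
            self.data = self.memory[self.address]

    def cpu_write(self, address, value):
        # Assert write, set the address, put the data on the data bus.
        self.control, self.address, self.data = "WRITE", address, value
        self._memory_controller()

    def cpu_read(self, address):
        # Assert read, set the address; memory places the data on the data bus.
        self.control, self.address = "READ", address
        self._memory_controller()
        return self.data

bus = SystemBus(size=16)
bus.cpu_write(3, 0x5A)
print(hex(bus.cpu_read(3)))
```

Note that only the processor ever sets `address`, while `data` is written from both sides.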

Good to know : The size of the memory that can be addressed by the system determines the width of the
address bus and vice versa. For example, if the width of the address bus is 32 bits, the system can address
2^32 memory blocks (that is equal to 4GB memory space, given that one block holds 1 byte of data).
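
The relationship between address bus width and addressable memory is simple arithmetic, sketched below. The function names are illustrative, and one byte per memory block is assumed by default, as in the example above.

```python
def addressable_bytes(address_bits, bytes_per_location=1):
    """Memory size reachable with the given address bus width."""
    return (1 << address_bits) * bytes_per_location

def address_bits_needed(memory_bytes, bytes_per_location=1):
    """Smallest address bus width that can reach every location."""
    locations = -(-memory_bytes // bytes_per_location)   # ceiling division
    return max(1, (locations - 1).bit_length())

print(addressable_bytes(16))                 # 16 address lines
print(addressable_bytes(32) // 1024**3)      # 32 address lines, in GB
print(address_bits_needed(64 * 1024))        # lines needed for 64 KB
```

Each extra address line doubles the memory that can be addressed.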

2) Data Bus : A data bus simply carries data.

Typically, the same data bus is used for both read/write operations.

--> When it is a write operation, the processor will put the data (to be written) on to the data bus.

--> When it is the read operation, the memory controller will get the data from the specific memory block
and put it in to the data bus.

Good to know :

--> The data bus typically consists of 8, 16, or 32 parallel lines.

--> The data bus is a bidirectional bus, which means data can be transferred from the CPU to main memory and vice versa.

--> The number of data lines in the data bus is equal to the size of the data word being written or read.

--> The data bus also connects the I/O ports and the CPU, so the CPU can write data to, or read it from, memory or the I/O ports.

What is the difference between Address Bus and Data Bus?

--> The data bus is bidirectional, while the address bus is unidirectional.

--> That means data travels in both directions, but addresses travel in only one direction. The reason is that, unlike the data, the address is always specified by the processor.

--> The width of the data bus is determined by the size of the individual memory block, while the width of the address bus is determined by the size of the memory that should be addressed by the system.

____________________________________________________________________________________

Embedded Processors and Their Types


The processor is the heart of an embedded system. It is the basic unit that takes inputs and produces an output after processing the data. For an embedded system designer, it is necessary to have knowledge of both microprocessors and microcontrollers.

Processors in a System
A processor has two essential units:

• Program Flow Control Unit (CU)
• Execution Unit (EU)

The CU includes a fetch unit for fetching instructions from the memory. The EU has circuits that implement the instructions pertaining to data transfer operations and data conversion from one form to another.
The EU includes the Arithmetic and Logical Unit (ALU) and also the circuits that execute instructions for a program control task such as an interrupt or a jump to another set of instructions.
A processor repeatedly runs a fetch-and-execute cycle, executing the instructions in the same sequence as they are fetched from memory.
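
A fetch-and-execute loop can be sketched with a made-up machine. The five-instruction set (LOAD, ADD, STORE, JUMP, HALT), the single accumulator and the separate program list are inventions for this example and do not correspond to any real processor.

```python
def run(program, memory):
    """Fetch and execute instructions until HALT; returns the accumulator."""
    pc = 0     # program counter, maintained by the control unit
    acc = 0    # accumulator, updated by the execution unit
    while True:
        opcode, operand = program[pc]    # fetch
        pc += 1                          # default: next instruction in sequence
        if opcode == "LOAD":             # execute: data transfer
            acc = memory[operand]
        elif opcode == "ADD":            # execute: ALU operation
            acc += memory[operand]
        elif opcode == "STORE":
            memory[operand] = acc
        elif opcode == "JUMP":           # execute: program control
            pc = operand
        elif opcode == "HALT":
            return acc
        else:
            raise ValueError(f"unknown opcode {opcode!r}")

memory = [2, 3, 0]
program = [("LOAD", 0), ("ADD", 1), ("STORE", 2), ("HALT", None)]
print(run(program, memory), memory)
```

Instructions run in the order they are fetched unless a program control instruction such as JUMP changes the program counter.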

Types of Processors
Processors can be of the following categories:

• General Purpose Processor (GPP)
    o Microprocessor
    o Microcontroller
    o Embedded Processor
    o Digital Signal Processor
    o Media Processor
• Application Specific System Processor (ASSP)
• Application Specific Instruction Processors (ASIPs)
• GPP core(s) or ASIP core(s) on either an Application Specific Integrated Circuit (ASIC) or a Very Large Scale Integration (VLSI) circuit.

Microprocessor
A microprocessor is a single VLSI chip having a CPU. In addition, it may also have other units such as caches, a floating-point arithmetic unit, and pipelining units that help in faster processing of instructions.
The fetch-and-execute cycle of earlier-generation microprocessors was driven by a clock frequency of the order of ~1 MHz. Processors now operate at clock frequencies of around 2 GHz.

Microcontroller
A microcontroller is a single-chip VLSI unit (also called microcomputer) which, although having
limited computational capabilities, possesses enhanced input/output capability and a number of on-chip
functional units.

Figure: On-chip units of a microcontroller (CPU, RAM, ROM, I/O port, timer, serial COM port)

Microcontrollers are particularly used in embedded systems for real-time control applications with on-chip program memory and devices.

Microprocessor vs. Microcontroller


Let us now take a look at the most notable differences between a microprocessor and a microcontroller.

Microprocessor                                  Microcontroller
----------------------------------------------  ----------------------------------------------
Multitasking in nature; can perform multiple    Single-task oriented. For example, a washing
tasks at a time. For example, on a computer     machine is designed for washing clothes only.
we can play music while writing text in a
text editor.

RAM, ROM, I/O ports, and timers can be added    RAM, ROM, I/O ports, and timers cannot be
externally and can vary in number.              added externally. These components are
                                                embedded together on a chip and are fixed in
                                                number.

Designers can decide the number of memory or    The fixed number of memory or I/O ports makes
I/O ports needed.                               a microcontroller ideal for a limited but
                                                specific task.

External memory and I/O ports make a            Microcontrollers are lightweight and cheaper
microprocessor-based system heavier and         than a microprocessor.
costlier.

External devices require more space and their   A microcontroller-based system consumes less
power consumption is higher.                    power and takes less space.

________________________________________________________________________________

Embedded System Design Process


The different steps in an embedded system design process include the following.

Embedded System Development Cycle

• Determine the requirements
• Design the system architecture
• Select the OS
• Choose the processor and peripherals
• Choose the development platform
• Code the applications and optimize
• Verify the software on the host system
• Verify the software on the target system
Determine the Requirements

Functional and nonfunctional

• Multimode or multifunctional system
• Size, cost, weight, etc.

Choosing the hardware components

• Application specific hardware
• External interfaces
• Input and output devices

Design the System Architecture

The architecture of an embedded system depends on:

• Whether the system is real time
• Whether an operating system needs to be embedded
• Cost, size, power consumption, etc.

Select the OS

If an operating system is needed, we can select from:

• Real-time operating systems such as RTLinux, VxWorks, pSOS, QNX, VRTX, etc.
• Non-real-time operating systems such as Windows CE, embedded Windows XP, etc.

Choose the Processor

The following processors can be used in the development of an embedded system:

• Microprocessors: 8085, 8086, Pentium
• Microcontrollers: PIC, MCS-51, MSP430, AVR
• Digital signal processors: dsPIC, SHARC, Blackfin, TigerSHARC

Choose the Development Platform

The development platform of an embedded system includes the following:

• The hardware platform
• The programming language
• The operating system
• The development tools

Code the Applications and Optimize

The coding of an embedded system can be done using the following programming languages:

• Assembly language
• C language
• Object oriented languages like C++, Java, etc.

Once written, the code is then optimized.

Verify the Software on the Host System

• Compile and assemble the source code into an object file
• Use a simulator to simulate the working of the system

Verify the Software on the Target System

• Download the program using a programmer device
• Use an emulator or on-chip debugging tools to verify the software

Install the Program in the Chip

To install the developed code into a microcontroller needs the following two items

A Programmer Hardware

The programmer hardware communicates with both the microcontroller and the PC. This allows it to
receive what the PC sends and write it to the microcontroller chip. Here, the USB interface is used
to communicate with the PC and the ISP interface is used to communicate with the MCU.

A Programmer Software

The programmer software runs on the PC on which you ran the IDE tool. The main function of this tool
is to read data from the hex file produced by a 'C' compiler and transfer it to the programmer
hardware connected to the USB port.
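
Before a programmer tool transfers a hex file, it usually checks each record for corruption. The
sketch below assumes the file is in the common Intel HEX format (the text only says "hex file"): each
record is a ':' followed by hex digit pairs for the length, address, type, data and checksum bytes,
and all of these bytes must add up to zero modulo 256. The function name ihex_record_ok is
hypothetical and does not belong to any particular programmer tool.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Convert one hex digit to its value, or return -1 if it is not a hex digit. */
static int hex_digit(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/*
 * Check one record of an Intel HEX file (assumed format).
 * 'line' must not contain the trailing newline.
 * Returns 1 if the record is well formed and its checksum is correct, else 0.
 */
int ihex_record_ok(const char *line)
{
    assert(line != NULL);
    if (line[0] != ':') return 0;

    size_t digits = strlen(line + 1);
    /* Shortest record: length, two address bytes, type, checksum = 5 bytes. */
    if (digits < 10 || digits % 2 != 0) return 0;

    unsigned sum = 0;
    for (size_t i = 1; i <= digits; i += 2) {
        int hi = hex_digit(line[i]);
        int lo = hex_digit(line[i + 1]);
        if (hi < 0 || lo < 0) return 0;
        sum += (unsigned)(hi * 16 + lo);
    }

    /* The first byte gives the number of data bytes; it must match the record size. */
    unsigned len = (unsigned)(hex_digit(line[1]) * 16 + hex_digit(line[2]));
    if (digits != 2 * ((size_t)len + 5)) return 0;

    /* All bytes, including the checksum byte, sum to 0 modulo 256. */
    return (sum & 0xFFu) == 0;
}
```

A tool would call ihex_record_ok on every line of the file and refuse to program the chip if any
record fails the check.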

A Development Board

The final and most essential piece is a development board. This board makes it easy to work with the
microcontroller throughout the learning phase. A simple hardware development board has some
important features.

Power Supply Circuit

The power supply circuit allows a simple connection to a DC adapter. It converts the 12 V from the
adapter to the 5 V needed to operate the microcontroller. It also makes this 5 V available on male
headers so that the user can power other parts of the design, for instance a module that needs to be
interfaced with the microcontroller. For convenience, this unit also includes a power switch to
switch the whole board ON/OFF and an LED to indicate the power status of the board.

Crystal Oscillator

The crystal oscillator is the heart of the microcontroller unit. For exact timing in your application, you
require a crystal oscillator. It provides a clock source that is largely independent of temperature and voltage.
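
The link between the crystal frequency and application timing can be sketched as a small
calculation. The example assumes a hypothetical timer that counts at the crystal frequency divided
by a prescaler; the function name and parameters are illustrative, and real microcontrollers differ
in how their timers are clocked.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Number of timer counts needed for a delay of delay_us microseconds,
 * for a timer clocked at f_osc_hz / prescaler (assumed timer model).
 */
uint32_t timer_ticks(uint32_t f_osc_hz, uint32_t prescaler, uint32_t delay_us)
{
    assert(prescaler != 0);
    /* Use 64-bit arithmetic so that f_osc_hz * delay_us cannot overflow. */
    uint64_t ticks = (uint64_t)f_osc_hz * delay_us
                     / ((uint64_t)prescaler * UINT64_C(1000000));
    assert(ticks <= UINT32_MAX);
    return (uint32_t)ticks;
}
```

For example, with a 16 MHz crystal and a prescaler of 64, timer_ticks(16000000, 64, 1000) gives the
250 counts that make up one millisecond.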

ISP Header

The ISP header is used to update the program of the MCU without removing it from the board. Here the
programmer is connected using a cable.

I/O Ports

I/O ports are used to connect the microcontroller board to the outside world; all the peripherals are
connected using these ports. They are provided as male headers so that the user can make a connection
to them very simply.

____________________________________________________________________________________

Programming Languages for Embedded Systems


C/C++
According to a 2016 survey by IEEE Spectrum, C and C++ took the top two spots as the most popular
and most used programming languages in embedded systems. This is unlikely to come as a surprise
to seasoned engineers, scientists, and hobbyists, who are almost guaranteed to have used one or both
of these languages extensively at some point.
C was created in the early 70s by Dennis Ritchie as a high-level programming language for UNIX
operating systems. At the time, assembly language was largely used, which required many lines of
code to accomplish a task. The B programming language was created to accomplish these tasks
with fewer lines of code, but it did not have data types or structures. C was then created to provide
the features missing in B, and it became the standard programming language on UNIX systems.
C++'s genesis began in the late 70s with a PhD student named Bjarne Stroustrup, who was using a
programming language called Simula. Eventually, he switched to C, which was much faster and
allowed low-level programming. Stroustrup then began to add features found in Simula to C. The
hybrid became “C with Classes” and, in 1983, was renamed C++. The '++' is the increment operator
in C, so the name C++ is a nod to the language's C foundation.
The difference between C and C++ is generally that C is a procedural language meant for system
programming and is more “lightweight” (requires less memory), whereas C++ is more
general and object-oriented.
Learning C or C++ is a great way to get started in embedded systems programming. Some say that if
you can learn C, you can learn any language. It also doesn't hurt that it's so widely used, even to this
day.

Rust
Just as C++ is to C, Rust is to C++. Rust is an open-source, general-purpose programming language
developed by Mozilla Research, primarily focusing on safety and integrity.
Rust began as a personal project by Graydon Hoare in 2006 and is a relatively new language, released
in 2015. It has quickly gained popularity and was voted 2016's and 2017's favorite language by the
Stack Overflow community.
Rust's features include algebraic data types, type inference, and pattern matching, just to name a few.
There is some expectation that Rust will eventually overtake C++ in widespread use.

Python
While not traditionally associated with embedded systems, Python is beginning to be taken more
seriously in embedded systems applications. It is often the first language students will learn in a
computer science degree program and you'll find all sorts of interesting Easter eggs and nods to popular
culture hidden in its nomenclature.
Python was created by Guido van Rossum in the late 80s and was named after “Monty Python's Flying
Circus”. It's a general-purpose, multi-paradigm language which focuses on readability
and writability, eliminating as much unnecessary writing as possible for straightforward code.
Out of the box, Python might not be as useful for embedded programming as C or C++, but with
numerous libraries available, it's easy to implement features that make it just as useful. It is excellent for
automating testing, and for collecting and analyzing data.

VHDL and Verilog


Hardware Description Languages (HDLs) are used a lot in FPGAs and in parallel programming applications.
They are quite different from many other types of languages in that they describe hardware rather than
a sequence of instructions, hence “hardware description”.
Verilog was invented in the early 80s as one of the first HDLs, used primarily in the modeling of
electronic systems. The language name is a shortened version of “VERIfication of LOGic”. A
design is built from a hierarchy of modules that together describe a system. Verilog may be easier to
learn if you already have experience with programming in C.
VHDL was developed by the US Department of Defense in the 80s, initially as a way to better
understand ASIC behavior. It eventually evolved into an HDL based on the Ada
programming language. VHDL is used frequently in industrial applications.

Embedded Systems – Tools


Compilers and Assemblers
Compiler
A compiler is a computer program (or a set of programs) that transforms the source code written in a
programming language (the source language) into another computer language (normally binary format).
The most common reason for conversion is to create an executable program. The name "compiler" is
primarily used for programs that translate the source code from a high-level programming language to a
low-level language (e.g., assembly language or machine code).

Cross-Compiler
If the compiled program can run on a computer having different CPU or operating system than the
computer on which the compiler compiled the program, then that compiler is known as a cross-compiler.

Decompiler
A program that can translate a program from a low-level language to a high-level language is called a
decompiler.

Language Converter
A program that translates programs written in different high-level languages is normally called a
language translator, source to source translator, or language converter.
A compiler is likely to perform the following operations −

 Preprocessing

 Parsing

 Semantic Analysis (Syntax-directed translation)

 Code generation

 Code optimization

Assemblers
An assembler is a program that takes basic computer instructions (called assembly language) and
converts them into a pattern of bits that the computer's processor can use to perform its basic operations.
An assembler creates object code by translating assembly instruction mnemonics into opcodes and resolving
symbolic names to memory locations. Assembly language uses a mnemonic to represent each low-level
machine operation (opcode).
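
The mnemonic-to-opcode translation at the core of an assembler can be sketched as a table lookup.
The sketch below is illustrative only: it covers five operand-less 8085 instructions, and the table
and function names are hypothetical. A real assembler also parses operands and resolves labels.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One row of the translation table: a mnemonic and its machine opcode. */
struct op_entry {
    const char *mnemonic;
    uint8_t     opcode;
};

/* A few operand-less 8085 instructions (illustrative subset). */
static const struct op_entry OP_TABLE[] = {
    { "NOP", 0x00 },  /* no operation      */
    { "HLT", 0x76 },  /* halt              */
    { "RET", 0xC9 },  /* return from call  */
    { "DI",  0xF3 },  /* disable interrupts */
    { "EI",  0xFB },  /* enable interrupts  */
};

/* Return the opcode for a mnemonic, or -1 if the mnemonic is unknown. */
int opcode_of(const char *mnemonic)
{
    assert(mnemonic != NULL);
    for (size_t i = 0; i < sizeof OP_TABLE / sizeof OP_TABLE[0]; i++) {
        if (strcmp(mnemonic, OP_TABLE[i].mnemonic) == 0) {
            return OP_TABLE[i].opcode;
        }
    }
    return -1;
}
```

Calling opcode_of("HLT") yields 0x76, the byte that the assembler would place in the object code.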

Debugging Tools in an Embedded System


Debugging is a methodical process to find and reduce the number of bugs in a computer program or a
piece of electronic hardware, so that it works as expected. Debugging is difficult when subsystems are
tightly coupled, because a small change in one subsystem can create bugs in another. The debugging
tools used in embedded systems differ greatly in terms of their development time and debugging
features. We will discuss here the following debugging tools −

 Simulators
 Microcontroller starter kits
 Emulator

Simulators
Code is tested for the MCU / system by simulating it on the host computer used for code development.
Simulators try to model the behaviour of the complete microcontroller in software.

Functions of Simulators
A simulator performs the following functions −
 Defines the processor or processing device family as well as its various versions for the target system.

 Monitors the detailed information of a source code part with labels and symbolic arguments as the
execution goes on for each single step.

 Provides the status of RAM and simulated ports of the target system for each single step execution.

 Monitors system response and determines throughput.

 Provides trace of the output of contents of program counter versus the processor registers.

 Provides the detailed meaning of the present command.

 Monitors the detailed information of the simulator commands as these are entered from the keyboard or
selected from the menu.

 Supports conditional (up to 8, 16 or 32 conditions) and unconditional breakpoints.

 Provides breakpoints and the trace which are together the important testing and debugging tool.

 Facilitates synchronizing the internal peripherals and delays.

Microcontroller Starter Kit


A microcontroller starter kit consists of −

 Hardware board (Evaluation board)


 In-system programmer
 Some software tools like compiler, assembler, linker, etc.
 Sometimes, an IDE and code size limited evaluation version of a compiler.
A big advantage of these kits over simulators is that they work in real time and thus allow easy
verification of input/output functionality. Starter kits are also entirely sufficient, and the
cheapest option, for developing simple microcontroller projects.

Emulators
An emulator is a hardware kit, a software program, or both, which emulates the functions of one
computer system (the guest) on another computer system (the host), different from the first one, so that
the emulated behavior closely resembles the behavior of the real system (the guest).
Emulation refers to the ability of a computer program in an electronic device to emulate (imitate) another
program or device. Emulation focuses on recreating an original computer environment. Emulators can
therefore stay closer to the authentic behavior of the original. An emulator lets the
user work with any kind of application or operating system on a platform in much the same way as the
software runs in its original environment.
_____________ End of Unit ONE ___________

UNIT II
CISC and RISC
CISC stands for Complex Instruction Set Computer. It is a computer that supports a large number of
instructions.
In the early 1980s, computer designers recommended that computers should use fewer instructions with
simple constructs so that they can be executed much faster within the CPU without having to use
memory. Such computers are classified as Reduced Instruction Set Computer or RISC.

CISC vs RISC
The following points differentiate a CISC from a RISC −

CISC | RISC
Larger set of instructions. Easy to program. | Smaller set of instructions. Difficult to program.
Simpler design of compiler, considering larger set of instructions. | Complex design of compiler.
Many addressing modes causing complex instruction formats. | Few addressing modes, fixed instruction format.
Instruction length is variable. | Instruction length is fixed.
More clock cycles per instruction. | Fewer clock cycles per instruction.
Emphasis is on hardware. | Emphasis is on software.
Control unit implements large instruction set using micro-program unit. | Each instruction is to be executed by hardware.
Slower execution, as instructions are to be read from memory and decoded by the decoder unit. | Faster execution, as each instruction is to be executed by hardware.
Pipelining is difficult. | Pipelining of instructions is possible, considering single clock cycle.

________________________________________________________________________________

Von-Neumann Architecture vs Harvard Architecture


When data and code lie in different memory blocks, the architecture is referred to
as Harvard architecture. When data and code lie in the same memory block, the architecture is
referred to as Von Neumann architecture.

Von Neumann Architecture
The Von Neumann architecture was first proposed by the computer scientist John von Neumann. In
this architecture, one data path or bus exists for both instructions and data. As a result, the CPU does one
operation at a time. It either fetches an instruction from memory, or performs a read/write operation on
data. An instruction fetch and a data operation cannot occur simultaneously, because they share a common bus.

Von-Neumann architecture supports simple hardware. It allows the use of a single, sequential memory.
Today's processing speeds vastly outpace memory access times, so a small amount of very fast
memory (cache) local to the processor is employed.

Harvard Architecture
The Harvard architecture offers separate storage and signal buses for instructions and data.
In its original form, this architecture had data storage entirely contained within the CPU, and there
was no access to the instruction storage as data. Programs needed to be loaded by an operator; the
processor could not boot itself. Computers built on this architecture have separate memory areas for
program instructions and data and use internal data buses, allowing simultaneous access to both
instructions and data. In a Harvard architecture, there is no need to make the two memories share properties.

Von-Neumann Architecture vs Harvard Architecture


The following points distinguish the Von Neumann Architecture from the Harvard Architecture.

Von-Neumann Architecture | Harvard Architecture
Single memory to be shared by both code and data. | Separate memories for code and data.
Processor needs to fetch code in a separate clock cycle and data in another clock cycle. So it requires two clock cycles. | Single clock cycle is sufficient, as separate buses are used to access code and data.
Slower in speed, thus more time-consuming. | Higher speed, thus less time-consuming.
Simple in design. | Complex in design.
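
The clock-cycle row of the comparison can be turned into a small worked calculation. The sketch
assumes a simplified model in which every instruction needs exactly one instruction fetch and one
data access, and ignores caches and pipelining; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Von Neumann: fetch and data access share one bus, so they take two cycles. */
uint64_t von_neumann_cycles(uint32_t instructions)
{
    return 2u * (uint64_t)instructions;
}

/* Harvard: separate buses let the fetch and the data access overlap in one cycle. */
uint64_t harvard_cycles(uint32_t instructions)
{
    return (uint64_t)instructions;
}

/* Ratio of the two cycle counts under this simplified model. */
double harvard_speedup(uint32_t instructions)
{
    assert(instructions > 0);
    return (double)von_neumann_cycles(instructions)
           / (double)harvard_cycles(instructions);
}
```

Under these assumptions a program of 1000 instructions needs 2000 cycles on the shared bus and 1000
cycles with separate buses. Real processors rarely reach this ideal factor of two.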

ARM Architecture www.ti.com

1.2 ARM Architecture


ARM cores are designed specifically for embedded systems. The needs of embedded systems can be
satisfied only if features of RISC and CISC are considered together for processor design. So ARM
architecture is not a pure RISC architecture. It has a blend of both RISC and CISC features.
Table 1.1. ARM Architecture Features and Benefits

Features | Benefits to embedded system
High performance | Ensures the system has a fast response
Low power consumption | Makes the system more energy efficient
Low silicon area | Reduces the size and also consumes less power
High code density | Helps embedded system to have less memory footprint
Load/store architecture | Used to load data from the memory to the ARM CPU register or store data from the CPU register to the memory; enables the memory access when required
Register bank with large number of working registers | Required to perform most of the operations within the CPU and provides faster context switch in multitasking applications

Fig 1.7. ARM 7 Architecture [an excerpt from Google image]

20 Embedded Systems and Features


1.2.1 Basic Architecture of the ARM7 Core


ARM7, the basic architecture of the ARM series of cores, is introduced in this section. A brief
introduction to each functional block of the ARM7 core architecture shown in Fig 1.7 is
presented below.
The register bank has sixteen general-purpose registers (R0-R15) and a current program status
register (CPSR), which are accessible by user applications. In addition, it has twenty
banked registers used specifically for the different operating modes of the ARM core. These are invisible to
user applications. The register bank has two read ports to read operand1 and operand2 and one write
port to write back the result of an operation to any register specified in the instruction. It has an
additional bidirectional port to update the program counter with the address register and incrementer.
The address register content is incremented at every sequential byte access by the incrementer, but the
program counter is incremented by four in the ARM state of the core, or by two in the Thumb state,
at every instruction access. ARM and Thumb states of the core are discussed in section
1.3. The address register is directly connected to the address bus.
 The barrel shifter can shift or rotate operand 2 by specified number of bits prior to arithmetic or
logic operations.
 The 32 bit ALU performs the arithmetic and logic functions.
 The data in and data out registers hold the input and output data from and to the memory.
 The instruction decoder and associated control logic generates appropriate control signals for
the data path after decoding the fetched instruction.
 The MAC unit is to multiply two register operands and accumulate with another register
holding the partial sum of the products.
The encoded instruction of the program stored in the code memory is fetched through the data
bus and first enters the data-in register of the ARM architecture, from where it is delivered to the
instruction decoder. After the instruction is decoded, appropriate control signals are generated for the
data path. The required registers are activated in the register bank and the operands flow out from the two
read ports of the register bank to the ALU: operand1 through the A-bus, and operand2 through the B-bus after
preprocessing at the barrel shifter. The result of the ALU operation is written back to the result register
through the write port of the register bank. For load/store instructions, after the instruction is decoded, the
data memory address is first calculated at the ALU as specified in the instruction and the pointer register is
updated in the register bank. The address in the pointer register is given to the address register to
access the memory and transfer data. If it is a load multiple or store multiple instruction, the core does
not halt before completing the required number of data transfers unless it is a reset exception.
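
The path of operand2 through the barrel shifter into the ALU, and the multiply and accumulate
operation of the MAC unit, can be modelled in a few lines. This is a behavioural sketch only: the
shift names LSL, LSR, ASR and ROR are the standard ARM shift operations, which the text does not
list, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum shift_op { SHIFT_LSL, SHIFT_LSR, SHIFT_ASR, SHIFT_ROR };

/* Shift or rotate a 32-bit value by 'amount' bits (0..31), like the barrel shifter. */
uint32_t barrel_shift(uint32_t value, enum shift_op op, unsigned amount)
{
    assert(amount < 32);
    if (amount == 0) return value;

    switch (op) {
    case SHIFT_LSL:  /* logical shift left */
        return value << amount;
    case SHIFT_LSR:  /* logical shift right, zeros shifted in */
        return value >> amount;
    case SHIFT_ASR:  /* arithmetic shift right, sign bit replicated */
        if (value & UINT32_C(0x80000000)) {
            return (value >> amount) | ~(UINT32_C(0xFFFFFFFF) >> amount);
        }
        return value >> amount;
    case SHIFT_ROR:  /* rotate right */
        return (value >> amount) | (value << (32 - amount));
    }
    return value;
}

/* ALU addition with operand2 preprocessed by the barrel shifter. */
uint32_t add_shifted(uint32_t op1, uint32_t op2, enum shift_op op, unsigned amount)
{
    return op1 + barrel_shift(op2, op, amount);
}

/* MAC unit: multiply two operands and accumulate with the partial sum. */
uint32_t mac(uint32_t acc, uint32_t a, uint32_t b)
{
    return acc + a * b;
}
```

For instance, add_shifted(1, 3, SHIFT_LSL, 2) computes 1 + (3 << 2) = 13 in one step, which is how a
single instruction can multiply operand2 by a power of two before adding it.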

1.2.2 Registers
Registers provide temporary data storage within the processor architecture. As shown in Fig 1.8, the ARM
processor has sixteen general-purpose registers, R0-R15, and a current program status
register (CPSR) defined for the user mode of operation. Each of these registers is 32 bits wide. Of these
registers, R13, R14 and R15 have special purposes.
R13: Used as the stack pointer that holds the address of the top of the stack in the current processor
mode.
R14: Used as the link register that saves the content of program counter on control transfer due to the
occurrence of exceptions or using the branch instructions in the program.
R15: Used as the program counter that points to the next instruction to be executed. In ARM state, all
instructions are of 32-bits (four bytes) for which, PC is always aligned to a word boundary. This means
that the least significant two bits of the PC are always zero. The PC can also be halfword (16-bit)
aligned for Thumb state (16 bit instructions) or byte aligned for Jazelle state (8-bit instructions)
supported by different versions of ARM architecture.
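
The alignment rule for the PC and its step per instruction follow directly from the instruction
size in each state. A minimal sketch, with hypothetical type and function names:

```c
#include <assert.h>
#include <stdint.h>

enum core_state { STATE_ARM, STATE_THUMB, STATE_JAZELLE };

/* Instruction size in bytes for each operating state. */
unsigned instruction_bytes(enum core_state state)
{
    switch (state) {
    case STATE_ARM:     return 4;  /* 32-bit instructions */
    case STATE_THUMB:   return 2;  /* 16-bit instructions */
    case STATE_JAZELLE: return 1;  /* 8-bit instructions  */
    }
    return 4;
}

/* The PC is aligned when it is a multiple of the instruction size. */
int pc_is_aligned(uint32_t pc, enum core_state state)
{
    return pc % instruction_bytes(state) == 0;
}

/* Address of the next sequential instruction. */
uint32_t next_pc(uint32_t pc, enum core_state state)
{
    assert(pc_is_aligned(pc, state));
    return pc + instruction_bytes(state);
}
```

In ARM state an address such as 0x8002 is not a valid PC value because its two least significant
bits are not zero, while the same address is valid in Thumb state.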


Fig 1.8. User mode register set

1.2.3 Current Program Status Register (CPSR)

Fig 1.9. A generic CPSR Format


CPSR, a 32-bit status register, holds the current state of the ARM core. As shown in Fig 1.9, the
register is divided into four fields of 8 bits each: flags, status, extension and control. The flags
field holds the four condition flags N, Z, C and V, which are used by arithmetic and logic
instructions.
 N (Negative flag) = 1 indicates a negative result from the ALU.
 Z (Zero flag) = 1 indicates a zero result from the ALU.
 C (Carry flag) = 1 indicates that the ALU operation generated a carry.
 V (Overflow flag) = 1 indicates that the ALU operation overflowed.
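
How the four flags follow from an ALU result can be sketched for a 32-bit addition. The bit
positions used here (N = bit 31, Z = bit 30, C = bit 29, V = bit 28) are the standard ARM CPSR
layout and are an assumption with respect to this text, which gives them only in the figure; the
macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define FLAG_N (UINT32_C(1) << 31)  /* negative result       */
#define FLAG_Z (UINT32_C(1) << 30)  /* zero result           */
#define FLAG_C (UINT32_C(1) << 29)  /* carry out of bit 31   */
#define FLAG_V (UINT32_C(1) << 28)  /* signed overflow       */

/* Return the N, Z, C and V flags produced by the 32-bit addition a + b. */
uint32_t add_flags(uint32_t a, uint32_t b)
{
    uint32_t result = a + b;  /* wraps modulo 2^32 */
    uint32_t flags = 0;

    if (result & UINT32_C(0x80000000)) flags |= FLAG_N;
    if (result == 0)                   flags |= FLAG_Z;
    /* An unsigned wrap-around means a carry was generated. */
    if (result < a)                    flags |= FLAG_C;
    /* Overflow: both operands have the same sign and the result's sign differs. */
    if (~(a ^ b) & (a ^ result) & UINT32_C(0x80000000)) flags |= FLAG_V;

    assert((flags & ~(FLAG_N | FLAG_Z | FLAG_C | FLAG_V)) == 0);
    return flags;
}
```

Adding 1 to 0x7FFFFFFF, the largest positive signed value, sets N and V: the result looks negative
although both operands were positive.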
Most ARM instructions can be executed conditionally. Condition codes appended to the instruction
mnemonics test these condition flags to control whether or not the instruction is executed. The
status and extension fields are reserved for future use. In the control field, the
least significant five bits hold the operating mode of the ARM core. The processor mode can
be changed by directly modifying these control bits. The most significant three bits of the control
field, I, F and T, have the following meaning:

 I = 1 indicates IRQ is disabled; I = 0 indicates IRQ is enabled.
 F = 1 indicates FIQ is disabled; F = 0 indicates FIQ is enabled.
 T = 1 indicates the Thumb state is active; T = 0 indicates the ARM state is active.

These are processor specific features and are explained in detail in section 1.4.
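
Reading and changing the control field is a matter of masking individual bits. The sketch below
places I, F and T in bits 7, 6 and 5 and the mode in bits 4 to 0, following the description of the
control field; the macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define CPSR_I_BIT     (UINT32_C(1) << 7)  /* 1 = IRQ disabled        */
#define CPSR_F_BIT     (UINT32_C(1) << 6)  /* 1 = FIQ disabled        */
#define CPSR_T_BIT     (UINT32_C(1) << 5)  /* 1 = Thumb state active  */
#define CPSR_MODE_MASK UINT32_C(0x1F)      /* mode bits [4:0]         */

int irq_enabled(uint32_t cpsr)  { return (cpsr & CPSR_I_BIT) == 0; }
int fiq_enabled(uint32_t cpsr)  { return (cpsr & CPSR_F_BIT) == 0; }
int thumb_active(uint32_t cpsr) { return (cpsr & CPSR_T_BIT) != 0; }

/* Return a copy of the CPSR value with IRQ disabled. */
uint32_t disable_irq(uint32_t cpsr)
{
    return cpsr | CPSR_I_BIT;
}

/* Return a copy of the CPSR value with the five mode bits replaced. */
uint32_t with_mode(uint32_t cpsr, uint32_t mode_bits)
{
    assert(mode_bits <= CPSR_MODE_MASK);
    return (cpsr & ~CPSR_MODE_MASK) | mode_bits;
}
```

As an example, the value 0xD3 has I = 1, F = 1 and T = 0 with mode bits 10011, so both interrupts
are disabled and the core is in ARM state.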

1.2.4 Operating States


There can be three operating states of the ARM core, depending on the implementation: ARM, Thumb
and Jazelle. Correspondingly, there are three instruction sets. The state of
the core determines which instruction set is currently used for execution. The T bit in
the CPSR indicates whether the core is in ARM or Thumb state, as explained in the previous
section. Similarly, bit 24 of the CPSR reflects the Jazelle state, if the implementation has it.
In ARM state, the core executes 32-bit ARM instructions. When the core enters Thumb
state, it executes 16-bit Thumb instructions. In Jazelle state, the instruction length is only 8 bits.
Since the instruction lengths in Thumb and Jazelle states are reduced to half or one fourth of the ARM
instruction length, high code density is achieved by using those states. Frequently called functions
are therefore written in Thumb state to reduce the program length.

1.2.5 Operating Modes


The ARM core has seven operating modes, used mainly to isolate user programs from protected
memory and OS services. The operating modes are: user, system, fast interrupt request (FIQ), interrupt
request (IRQ), abort, supervisor and undefined. Of these, only user mode is unprivileged; the
remaining six are privileged modes. The basic difference between privileged and unprivileged modes is
that only privileged modes have access to the protected area of memory and write access to CPSR_c.
All application programs run in user mode. All operating system kernel
functions and services run in system or supervisor mode. After reset, the core enters supervisor mode.
FIQ mode is for interrupts requiring a fast response and low latency, and IRQ mode corresponds to the
low-priority interrupt available on the processor itself. The processor enters abort mode to handle
memory access violations. When the processor encounters an instruction that is not supported
by the instruction set implementation, it enters undefined mode. All exceptions are handled in
privileged modes. Privileged modes have complete read and write access to both the flags and control
fields, but unprivileged user mode has only read access to the control field and both read and write
access to the flags field. The processor mode is changed automatically by the occurrence of exceptions
or, while in a privileged mode, by writing the binary pattern shown in Table 1.2 to the control bits
of the CPSR.
Table 1.2. Processor modes with binary patterns

Mode | Control bits [4:0]
Abort (abt) | 10111
Fast interrupt request (fiq) | 10001
Interrupt request (irq) | 10010
Supervisor (svc) | 10011
System (sys) | 11111
Undefined (und) | 11011
User (usr) | 10000
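
The mode table maps directly onto a decoder for the five control bits. A minimal sketch, with
hypothetical enum and function names, that also separates the privileged modes from user mode:

```c
#include <assert.h>
#include <stdint.h>

enum arm_mode {
    MODE_USR, MODE_FIQ, MODE_IRQ, MODE_SVC,
    MODE_ABT, MODE_UND, MODE_SYS, MODE_INVALID
};

/* Decode the mode from control bits [4:0] of a CPSR value. */
enum arm_mode decode_mode(uint32_t cpsr)
{
    switch (cpsr & UINT32_C(0x1F)) {
    case 0x10: return MODE_USR;  /* 10000 */
    case 0x11: return MODE_FIQ;  /* 10001 */
    case 0x12: return MODE_IRQ;  /* 10010 */
    case 0x13: return MODE_SVC;  /* 10011 */
    case 0x17: return MODE_ABT;  /* 10111 */
    case 0x1B: return MODE_UND;  /* 11011 */
    case 0x1F: return MODE_SYS;  /* 11111 */
    default:   return MODE_INVALID;
    }
}

/* Every valid mode except user mode is privileged. */
int mode_is_privileged(enum arm_mode mode)
{
    assert(mode != MODE_INVALID);
    return mode != MODE_USR;
}
```

The remaining bit patterns do not appear in the table, so the sketch reports them as MODE_INVALID.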

1.2.6 Programming Model


The programming model of a processor is basically the set of working registers used to perform the
operations defined in its instruction set. The ARM programming model has a total of 37 registers in its
register bank, which are segmented for different modes of operation as shown in Fig 1.10. The user mode
register set is also shared by system mode.
Each of the remaining privileged modes has a set of banked registers which are active and accessible
to the programmer only when the core enters the corresponding mode. The banked registers for a
particular mode are physical replicas of a few of the user mode registers, along with a saved program
status register (SPSR), shown by shading in the figure.
If the processor mode is changed, for example from user to FIQ mode due to the occurrence of a hardware
interrupt (fiq), the banked registers R8-R14 of FIQ mode replace the corresponding registers
of user mode, but the remaining user mode registers (R0-R7) can still be used in FIQ mode after saving
their previous contents.
This means that registers R8-R14 of user mode are unaffected by this mode change. The purpose of these
banked registers is to reduce the context saving overhead. There is only one dedicated PC (R15) and
one CPSR for all the operating modes.
When the mode is changed, the PC and CPSR contents are saved in the link register (R14) and SPSR
of the new mode respectively. On returning to the previous mode, special instructions are used to
restore the saved register contents. There is no SPSR in user mode, and one important
feature is that, when a mode change is forced, the CPSR content is not saved in the SPSR. That happens
only when an exception occurs.
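
The total of 37 registers can be checked by counting the shared and banked copies. The sketch states
its assumptions openly: FIQ mode banks R8-R14 as described above, while the assumption that IRQ,
supervisor, abort and undefined modes each bank only R13 and R14 comes from the standard ARM register
layout (the figure is not reproduced in this text). All names are hypothetical.

```c
#include <assert.h>

enum bank_mode {
    BANK_USR, BANK_SYS, BANK_FIQ, BANK_IRQ,
    BANK_SVC, BANK_ABT, BANK_UND, BANK_MODE_COUNT
};

/* 1 if register Rn (n = 0..15) has its own banked copy in the given mode. */
int is_banked(enum bank_mode mode, unsigned n)
{
    assert(n <= 15);
    switch (mode) {
    case BANK_FIQ:
        return n >= 8 && n <= 14;   /* R8-R14 */
    case BANK_IRQ:
    case BANK_SVC:
    case BANK_ABT:
    case BANK_UND:
        return n == 13 || n == 14;  /* stack pointer and link register (assumed) */
    default:
        return 0;                   /* user and system share one register set */
    }
}

/* Every mode except user and system has its own SPSR. */
int has_spsr(enum bank_mode mode)
{
    return mode != BANK_USR && mode != BANK_SYS;
}

/* Count all physical registers in the register bank. */
unsigned total_registers(void)
{
    unsigned total = 16 + 1;  /* R0-R15 plus the CPSR */
    for (int m = 0; m < BANK_MODE_COUNT; m++) {
        for (unsigned n = 0; n <= 15; n++) {
            total += (unsigned)is_banked((enum bank_mode)m, n);
        }
        total += (unsigned)has_spsr((enum bank_mode)m);
    }
    return total;
}
```

The count works out as 17 user-visible registers, 15 banked general-purpose registers and 5 SPSRs,
which agrees with the twenty banked registers and the total of 37 given in the text.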


Fig 1.10. Complete Register Bank

1.2.7 Interrupt and Exception Handling


Interrupt and exception handling mechanisms are used by any processor to respond to I/O requests,
or when CPU intervention is required to handle an error that occurs during program execution.
The CPU is forced to suspend normal execution so that it can respond to the exception or interrupt
quickly. The ARM core supports seven exceptions, including the hardware interrupts.

1.2.7.1 Interrupt Handling


An interrupt is used to get service from the CPU by generating a request only when it is required. There
are two types of interrupt: hardware and software. A hardware interrupt comes from the
peripherals or I/O devices connected in a system, and a software interrupt is raised by executing an
interrupt instruction. When an interrupt occurs during program execution, the CPU completes the current
instruction, saves the return address in a defined portion of memory called the stack, and then
responds to the interrupt. Control of program execution is transferred to the corresponding
interrupt vector address, where the interrupt service routine (ISR) is written. At the end of ISR execution,
the return address is retrieved from the stack and execution resumes at the address where it
left off. The ARM core has two hardware interrupts, FIQ and IRQ, but these are also considered exceptions,
since the core follows the same process for handling both interrupts and exceptions.

1.2.7.2 Exceptions Handling


An exception is any condition, unexpected event or error that needs to halt the normal execution of
instructions. The ARM core treats hardware interrupts as exceptions. It supports seven exceptions:
reset, data abort, prefetch abort, FIQ, IRQ, SWI and undefined. Each ARM
exception is associated with a particular operating mode and causes the ARM core to enter that
mode automatically on exception entry. The group of exceptions comprises supervisor calls
such as reset and software interrupts (SWI), the undefined instruction trap, memory access failures such as
data abort and prefetch abort, and hardware interrupts such as fast interrupt request (FIQ) and interrupt
request (IRQ). As more than one exception may occur simultaneously, exceptions are prioritized. In the
priority list given in Table 1.3, the SWI and undefined exceptions have the same priority, as they are
mutually exclusive: both are caused by an instruction entering the execution stage. When an
exception occurs, control transfers to the corresponding vector address. A branch instruction is
written at the vector address to reach the actual interrupt handler or ISR stored at a
different location. This is required because the vector addresses are separated by only four bytes, so only
one 32-bit ARM instruction can be written at each of them. The I and F bits in the control field of the CPSR
are set, in the combination shown in the table, to disable interrupts while the exception is handled.
All ARM exceptions are handled in the ARM state of the core. If the core is in Thumb state, it has to
switch back to ARM state to handle the exception. Table 1.3 gives information about all the attributes
discussed here.
Table 1.3. Exceptions and attributes

Exception | Operating mode | Priority | I-bit in CPSR | F-bit in CPSR | Vector address (low/high)
Reset | Supervisor | 1 | 1 | 1 | 0x00000000/0xffff0000
Undefined | Undefined | 6 | 1 | 0 | 0x00000004/0xffff0004
SWI | Supervisor | 6 | 1 | 0 | 0x00000008/0xffff0008
Prefetch Abort | Abort | 5 | 1 | 0 | 0x0000000C/0xffff000C
Data Abort | Abort | 2 | 1 | 0 | 0x00000010/0xffff0010
IRQ | IRQ | 4 | 1 | 0 | 0x00000018/0xffff0018
FIQ | FIQ | 3 | 1 | 1 | 0x0000001C/0xffff001C
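
The vector addresses and priorities in the table can be captured in two small lookup functions. The
values are taken from the table; the enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum arm_exception {
    EXC_RESET, EXC_UNDEFINED, EXC_SWI, EXC_PREFETCH_ABORT,
    EXC_DATA_ABORT, EXC_IRQ, EXC_FIQ, EXC_COUNT
};

/* Vector address of an exception; high_vectors selects the 0xffff0000 base. */
uint32_t vector_address(enum arm_exception exc, int high_vectors)
{
    static const uint32_t offset[EXC_COUNT] = {
        0x00, 0x04, 0x08, 0x0C, 0x10, 0x18, 0x1C
    };
    assert((unsigned)exc < EXC_COUNT);
    return (high_vectors ? UINT32_C(0xFFFF0000) : UINT32_C(0)) + offset[exc];
}

/* Priority of an exception; 1 is the highest priority. */
unsigned exception_priority(enum arm_exception exc)
{
    static const unsigned priority[EXC_COUNT] = { 1, 6, 6, 5, 2, 4, 3 };
    assert((unsigned)exc < EXC_COUNT);
    return priority[exc];
}

/* Of two pending exceptions, return the one that is handled first. */
enum arm_exception handled_first(enum arm_exception a, enum arm_exception b)
{
    return exception_priority(a) <= exception_priority(b) ? a : b;
}
```

If an IRQ and an FIQ are pending together, handled_first(EXC_IRQ, EXC_FIQ) selects the FIQ, whose
vector at offset 0x1C is the last entry of the vector table.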

Exception Entry
When an exception occurs, the ARM core goes through the following sequence of operations.
 It changes to the operating mode corresponding to the particular exception.
 It saves the return address, the content of the PC (r15), in the lr (r14) of the new mode.
 It saves the previous state of the core, the content of the CPSR, in the SPSR of the new mode.
 It disables IRQs by setting bit 7 of the CPSR and, if the exception is a reset or fast interrupt,
disables further fast interrupts by setting bit 6 of the CPSR.
 The PC is loaded with the vector address to begin executing the relevant exception handler.

Exception Exit
The user program is resumed once the interrupt handler execution is completed. The following steps
are followed in doing so.
 The saved context must be restored back from the handler's stack.
 The CPSR must be restored from the appropriate SPSR.
 The PC must be restored back from the link register of exception mode.
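
The entry and exit sequences can be sketched on a simplified model of the core. The model keeps a
single banked link register and SPSR for the exception mode, saves the PC unchanged as the return
address, and ignores the pipeline offset that a real core applies. The struct, macros and function
names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EXC_I_BIT     (UINT32_C(1) << 7)  /* IRQ disable bit     */
#define EXC_F_BIT     (UINT32_C(1) << 6)  /* FIQ disable bit     */
#define EXC_T_BIT     (UINT32_C(1) << 5)  /* Thumb state bit     */
#define EXC_MODE_MASK UINT32_C(0x1F)      /* mode bits [4:0]     */

/* Simplified core state: only what exception entry and exit touch. */
struct arm_core {
    uint32_t pc;        /* R15 */
    uint32_t cpsr;      /* current program status register */
    uint32_t lr_exc;    /* banked R14 of the exception mode */
    uint32_t spsr_exc;  /* SPSR of the exception mode */
};

/* Exception entry: save state, switch mode, mask interrupts, jump to the vector. */
void exception_enter(struct arm_core *core, uint32_t mode_bits,
                     uint32_t vector, int mask_fiq)
{
    assert(core != NULL && mode_bits <= EXC_MODE_MASK);
    core->lr_exc = core->pc;                      /* save return address   */
    core->spsr_exc = core->cpsr;                  /* save previous state   */
    core->cpsr = (core->cpsr & ~EXC_MODE_MASK) | mode_bits;
    core->cpsr |= EXC_I_BIT;                      /* disable IRQ           */
    if (mask_fiq) core->cpsr |= EXC_F_BIT;        /* reset and FIQ only    */
    core->cpsr &= ~EXC_T_BIT;                     /* handle in ARM state   */
    core->pc = vector;                            /* start the handler     */
}

/* Exception exit: restore the CPSR from the SPSR and the PC from the link register. */
void exception_exit(struct arm_core *core)
{
    assert(core != NULL);
    core->cpsr = core->spsr_exc;
    core->pc = core->lr_exc;
}
```

Entering IRQ mode (10010) from a user program at 0x8000 leaves the CPSR at 0x92 and the PC at the
IRQ vector 0x18; calling exception_exit afterwards restores the original PC and CPSR.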

1.2.8 ARM Instruction Set


In any processor architecture, an instruction includes an opcode that specifies the operation to
perform, such as adding the contents of two registers or moving data from a register to memory, together
with operands, which may specify registers, memory locations, or immediate data. The instruction set
of a processor gives information about the instructions, the addressing modes and the timing requirement
for the execution of each instruction. The instruction set is always specified by the processor designer.
Every processor implements its instruction set in the architecture. Since ARM Ltd is the designer of the
processor core and not the silicon manufacturer, it defines the instruction set to be implemented by the chip
manufacturers.
Features


ARM architecture has two instruction sets. The ARM instruction set and Thumb instruction set. In ARM
instruction set, all instructions are 32 bits wide and are aligned at 4-bytes boundaries in memory. On
the other hand, in thumb instruction set, all instructions are of 16 bits wide and are aligned at even or
two bytes boundaries in memory.
 The important features of the ARM and Thumb instruction set are:
 Most of the instructions are executed in one cycle.
 Load/Store architecture for accessing data from external memory with powerful auto-indexing
addressing modes.
 Inclusion of load and store multiple register instructions.
 3-address instructions: two source operand registers and the result register are all distinctly
specified.
 Data processing instructions act only on registers.
 Every ARM instruction can be conditionally executed, which improves performance and code
density by reducing the number of branch instructions.
 The ability to perform a barrel shift and an ALU operation within one instruction in a single
clock cycle.
 Inclusion of advanced DSP instructions in the ARM instruction set for the multiply and
accumulate (MAC) unit, which removes the need for a separate digital signal processor.
 Implementation of a coprocessor instruction set that extends the programming model.
 The Thumb instruction set is a 16-bit compressed representation of the ARM instructions that
provides high code density.
ARM Instructions can be categorized into following broad classes:
 Data movement instructions
 Data Processing Instructions
o Arithmetic/logic Instructions
o Barrel shifting instructions
o Comparison Instructions
o Multiply Instructions
 Branch Instructions
 Load and store Instructions
o Load and Store register instruction
o Load and Store multiple register instructions
o Stack instructions
o Swap register and memory content
 Program Status register Instructions
o Set the values of the conditional code flag
o Set the values of the interrupt enable bit
o Set the processor mode
 Exception generating Instructions
o Software Interrupt Instruction
o Software Break Point instruction

1.2.8.1 Barrel Shifter


A unique and powerful feature of the ARM processor is its ability to shift or rotate the data in one of
the source registers by a specified number of positions before it enters the ALU, which increases the
functionality of many data processing operations.
Two operands are read from the register bank for data processing instructions, as shown in Fig
1.11. Operand1 travels on the A-bus straight to the ALU, while operand2 travels on the B-bus and is
preprocessed by the barrel shifter before entering the ALU. The barrel shifter is a combinational logic
unit that shifts or rotates the data as specified in the instruction. The ALU then performs the arithmetic
or logic operation on these operands, and the result is written back into the register bank through the
ALU bus. Both the barrel shift and the ALU operation happen in the same instruction cycle.
The data processing instructions that do not use the barrel shifter are the MUL (multiply), CLZ (count
leading zeros), and QADD (signed saturated 32-bit add) instructions.

Fig 1.11. Barrel Shifter with ALU


Table 1.4 Barrel Shifting Operations
Mnemonic   Description              Shift      Result
LSL        Logical shift left       x LSL y    x << y
LSR        Logical shift right      x LSR y    (unsigned)x >> y
ASR        Arithmetic shift right   x ASR y    (signed)x >> y
ROR        Rotate right             x ROR y    ((unsigned)x >> y) | (x << (32 - y))
RRX        Rotate right extended    x RRX      (C flag << 31) | ((unsigned)x >> 1)
Table 1.5. Barrel shift operation syntax for data processing instructions.
Shift operation                                           Syntax
Logical shift left 'Rm' by an immediate value             Rm, LSL #shift_imm
Logical shift left 'Rm' by the amount in register 'Rs'    Rm, LSL Rs
Logical shift right 'Rm' by an immediate value            Rm, LSR #shift_imm
Logical shift right 'Rm' by the amount in register 'Rs'   Rm, LSR Rs
Arithmetic shift right 'Rm' by an immediate value         Rm, ASR #shift_imm
Arithmetic shift right 'Rm' by the amount in register 'Rs'  Rm, ASR Rs
Rotate right 'Rm' by an immediate value                   Rm, ROR #shift_imm
Rotate right 'Rm' by the amount in register 'Rs'          Rm, ROR Rs
Rotate right 'Rm' with extend                             Rm, RRX
*Rm is operand2 in data processing operations.

Example: Let the data in R1 = 0x00000080, R2 = 0x00000004 and R3 = 0x000000FF.

After executing the instruction MOV R3, R1, LSR R2:
R3 = 0x00000008, R1 = 0x00000080 and R2 = 0x00000004
Description:
 The data in register R1 is logically shifted right by the amount held in R2, that is,
(0x00000080) >> (0x00000004), which gives 0x00000008.
 This value is moved to register R3, so R3 = 0x00000008. The data in R1 and R2 remain
unchanged.
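
The shift results listed in Table 1.4 can be checked with a small software model. The following Python sketch is illustrative only: the function names and the `MASK32` constant are hypothetical, and it assumes shift amounts in the range 0 to 31 (the special cases for register-specified shifts of 32 or more are not modelled).

```python
MASK32 = 0xFFFFFFFF  # keep every result inside a 32-bit register

def lsl(x, y):
    """Logical shift left; bits shifted past bit 31 are discarded."""
    return (x << y) & MASK32

def lsr(x, y):
    """Logical shift right; zeros enter from the left."""
    return (x & MASK32) >> y

def asr(x, y):
    """Arithmetic shift right; the sign bit (bit 31) is replicated."""
    x &= MASK32
    if x & 0x80000000:
        x -= 1 << 32          # reinterpret the pattern as a negative integer
    return (x >> y) & MASK32

def ror(x, y):
    """Rotate right; bits leaving bit 0 re-enter at bit 31."""
    x &= MASK32
    y %= 32
    return ((x >> y) | (x << (32 - y))) & MASK32

def rrx(x, carry):
    """Rotate right by one position through the carry flag."""
    return ((carry & 1) << 31) | ((x & MASK32) >> 1)

# The worked example: MOV R3, R1, LSR R2 with R1 = 0x80 and R2 = 0x4
r1, r2 = 0x00000080, 0x00000004
r3 = lsr(r1, r2)
print(hex(r3))  # 0x8
```

Only the operand2 path is modelled here; in the real core the shifted value is passed on to the ALU within the same instruction.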
Table 1.6. Instruction set table

Mnemonic  Description                                        Example                   Working

1. Data movement instructions
Syntax: <instruction>{<cond>}{S} Rd, N
MOV       Move a 32-bit value into a register                MOV r1, r2, LSL #4        r1 = r2 << 4
MVN       Move the NOT of a 32-bit value into a register     MVN r1, r3                r1 = ~r3

2. Data processing instructions
i. Arithmetic instructions; Syntax: <instruction>{<cond>}{S} Rd, Rn, N
ADC       Add two 32-bit values and carry                    ADC r1, r2, r3            r1 = r2 + r3 + Carry
ADD       Add two 32-bit values                              ADD r4, r5, r3, LSR r1    r4 = r5 + (r3 >> r1)
RSC       Reverse subtract with carry of two 32-bit values   RSC r3, r2, r1            r3 = r1 - r2 - !Carry
RSB       Reverse subtract of two 32-bit values              RSB r3, r2, r1            r3 = r1 - r2
SBC       Subtract with carry of two 32-bit values           SBC r2, r4, r6            r2 = r4 - r6 - !Carry
SUB       Subtract two 32-bit values                         SUB r2, r4, r6            r2 = r4 - r6
ii. Logical instructions; Syntax: <instruction>{<cond>}{S} Rd, Rn, N
AND       Logical bitwise AND of two 32-bit values           AND r7, r5, r2            r7 = r5 & r2
ORR       Logical bitwise OR of two 32-bit values            ORR r6, r4, r1, LSR r2    r6 = r4 | (r1 >> r2)
EOR       Logical exclusive OR of two 32-bit values          EOR r5, r1, r2            r5 = r1 ^ r2
BIC       Logical bit clear (AND NOT)                        BIC r3, r1, r4            r3 = r1 & ~r4
iii. Comparison instructions; Syntax: <instruction>{<cond>} Rn, N
CMN       Compare negated                                    CMN r1, r2                Flags set as a result of r1 + r2
CMP       Compare                                            CMP r1, #0xFF             Flags set as a result of r1 - 0xFF
TEQ       Test for equality of two 32-bit values             TEQ r3, r5                Flags set as a result of r3 ^ r5
TST       Test bits of a 32-bit value                        TST r1, r2                Flags set as a result of r1 & r2
iv. Multiply instructions; Syntax: MLA{<cond>}{S} Rd, Rm, Rs, Rn; MUL{<cond>}{S} Rd, Rm, Rs
MLA       Multiply and accumulate                            MLA r1, r2, r3, r4        r1 = (r2 * r3) + r4
MUL       Multiply                                           MUL r3, r7, r6            r3 = r7 * r6

3. Branch instructions
Syntax: B{<cond>} label; BL{<cond>} label; BX{<cond>} Rm; BLX{<cond>} label | Rm
B         Branch                                             B label                   PC = label
BL        Branch with link                                   BL label                  PC = label; lr = address of the next instruction after BL
BX        Branch exchange                                    BX r5                     PC = r5 & 0xfffffffe; T = r5 & 1
BLX       Branch exchange with link                          BLX r6                    PC = r6 & 0xfffffffe; T = r6 & 1; lr = address of the next instruction after BLX
4. Load/Store instructions
I. Single register transfer; Syntax: <LDR|STR>{<cond>} Rd, Address
LDR       Load register from memory                          LDR r0, [r2, #0x8]        Load r0 with the content of the memory address r2 + 0x8
STR       Store register to memory                           STR r1, [r4], #0x10       Store r1 at the memory address in r4, then update r4 to r4 + 0x10
II. Multiple register transfer; Syntax: <LDM|STM>{<cond>}<addressing mode> Rn{!}, {registers}
Addressing modes: IA (increment after), IB (increment before), DA (decrement after), DB (decrement before).
The memory pointer is incremented or decremented after or before each data transfer.
LDM       Load multiple registers from memory                LDMIA r6!, {r2-r4}        r2 = [r6]; r3 = [r6+4]; r4 = [r6+8]; r6 is updated to r6 + 12
STM       Store multiple registers to memory                 STMDB r1!, {r3-r5}        [r1-4] = r5; [r1-8] = r4; [r1-12] = r3; r1 is updated to r1 - 12
III. Stack operations; Syntax: <LDM|STM><addressing mode> SP{!}, {registers}
Addressing modes: FA (full ascending), FD (full descending), EA (empty ascending), ED (empty descending).
LDM       Load multiple registers from stack memory          LDMED sp!, {r1-r3}        r1 = [sp+4]; r2 = [sp+8]; r3 = [sp+12]; sp is updated to sp + 12
STM       Store multiple registers to stack memory           STMFD sp!, {r4-r6}        [sp-4] = r6; [sp-8] = r5; [sp-12] = r4; sp is updated to sp - 12
IV. Swap instruction; Syntax: SWP{B}{<cond>} Rd, Rm, [Rn]
SWP       Swap a word between memory and a register          SWP r0, r1, [r2]          Load a 32-bit word (SWP) or a byte (SWPB) from the memory
SWPB      Swap a byte between memory and a register          SWPB r0, r1, [r2]         address in r2 into r0 and store the data in r1 to the memory
                                                                                       address in r2.
5. Program status register instructions
Syntax: MRS{<cond>} Rd, <cpsr|spsr>; MSR{<cond>} <cpsr|spsr>_<fields>, Rm;
MSR{<cond>} <cpsr|spsr>_<fields>, #immediate
MRS       Move the content of cpsr or spsr to a register     MRS r1, CPSR              Move the content of the CPSR to r1
MSR       Move immediate data or a register to a specific    MSR CPSR_f, r1            Update the flag field of the CPSR with the content of r1
          field of cpsr or spsr
6. Exception generating instructions
Software interrupt instruction; Syntax: SWI{<cond>} SWI_number (24-bit immediate)
SWI       Software interrupt for an operating system         SWI 0x123456              Execute software interrupt 0x123456 in the ARM state
          routine. Changes to Supervisor mode, saves the                               of the core (T = 0 in CPSR).
          CPSR in the SPSR, and branches to the interrupt
          vector.
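
The "Working" column for the multiple register transfers in Table 1.6 can be reproduced with a short model. The Python sketch below is an illustration under stated assumptions: registers are a dictionary keyed by register number, memory is a dictionary keyed by byte address, and the helper names `ldmia` and `stmdb` are hypothetical. The case where the base register also appears in the register list is not modelled.

```python
def ldmia(regs, mem, base, reg_list, writeback=True):
    """LDMIA Rn{!}, {regs}: load from ascending addresses, starting at Rn."""
    addr = regs[base]
    for r in sorted(reg_list):            # lowest register gets the lowest address
        regs[r] = mem[addr]
        addr += 4                         # increment after each transfer
    if writeback:
        regs[base] = addr

def stmdb(regs, mem, base, reg_list, writeback=True):
    """STMDB Rn{!}, {regs}: store to descending addresses below Rn."""
    addr = regs[base]
    for r in sorted(reg_list, reverse=True):   # highest register gets the highest address
        addr -= 4                              # decrement before each transfer
        mem[addr] = regs[r]
    if writeback:
        regs[base] = addr

# LDMIA r6!, {r2-r4}
regs = {2: 0, 3: 0, 4: 0, 6: 0x1000}
mem = {0x1000: 0xA, 0x1004: 0xB, 0x1008: 0xC}
ldmia(regs, mem, base=6, reg_list=[2, 3, 4])
print(regs[2], regs[3], regs[4], hex(regs[6]))   # 10 11 12 0x100c

# STMDB r1!, {r3-r5}
regs2 = {1: 0x2000, 3: 0x33, 4: 0x44, 5: 0x55}
mem2 = {}
stmdb(regs2, mem2, base=1, reg_list=[3, 4, 5])
print(hex(regs2[1]))                             # 0x1ff4
```

STMFD on a full descending stack behaves like STMDB, which is why the STMFD row in the table shows the same address pattern.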

1.2.8.2 Addressing modes


An addressing mode is the way an instruction specifies the location of its data or operand. Every
processor instruction set offers several addressing modes for determining the address of operands.
The fundamental addressing modes used by most processors are register addressing, immediate
addressing, direct addressing and register indirect addressing. In register addressing mode, the
operand is held in a register that is specified in the instruction. In immediate addressing mode, the
operand is held in the instruction itself. In direct addressing mode, the operand resides in memory
at an address that is specified in the instruction. Similarly, in register indirect addressing mode, the
operand is held in memory at an address that resides in a register specified in the instruction.
Section 1.2.8.2.1 describes the addressing modes supported by the ARM instruction set.

1.2.8.2.1 ARM Addressing modes:


 Register Addressing: The operands are in registers.
MOV R1, R2 // move the content of R2 to R1 //

SUB R0, R1, R2 // subtract the content of R2 from R1 and move the result to R0 //

 Relative Addressing: The target address is given in the instruction as a label, which is
encoded as an offset relative to the PC.

B subroutine1 // branch to subroutine1 //

BEQ LOOP // branch to LOOP if the previous instruction set the zero flag, i.e. Z = 1 //

 Immediate Addressing: Operand2 is an immediate value.

SUB R0, R0, #1 // save (R0 - 1) to R0 //

MOV R0, #0xFF00 // put 0xFF00 into R0 //


 Register Indirect Addressing: The address of the memory location that holds the operand is
in a register.
LDR R1, [R2] // load R1 with the data pointed to by register R2 //

LDR R3, [R2] // ARM is a load/store architecture, so a memory operand is first loaded into a register //
ADD R0, R1, R3 // add R1 to the loaded data and put the result into R0 //

 Register Offset Addressing: Operand2 is a register whose value is shifted before it is used.

MOV R0, R2, LSL #3 // (R2 << 3), then move to R0 //

AND R0, R1, R2, LSR R3 // (R2 >> R3), logically AND with R1 and move the result to R0 //

 Register based with Offset Addressing: The effective memory address is calculated
from a base address and an offset. The offset can be an immediate offset, a register offset or a
scaled register offset.
 Pre-Indexed Addressing
LDR R2, [R3, #0x0F] // Immediate offset //
// Add 0x0F to the value in R3, use the sum as the address and load the data from that address
into R2. //

STR R1, [R0, -R2] // Register offset //
// Use (R0 - R2) as the memory address and store the data in R1 to that address. //

LDR R3, [R1, R2, LSR #8] // Scaled register offset //
// Use (R1 + (R2 >> 8)) as the address and load the data from that address into R3. //

 Pre-Indexed with write back, also called auto-indexing with pre-indexed addressing. The '!'
symbol indicates that the instruction saves the calculated address in the base address
register.
LDR R0, [R1, #4]! // Immediate offset //
// Use (R1 + 4) as the address, load the data from that address into R0 and update R1 to
(R1 + 4). //

STR R1, [R2, R0]! // Register offset //
// Use (R2 + R0) as the address and store the data in R1 to that address. Update R2 to
(R2 + R0). //

STR R3, [R1, R2, LSL #4]! // Scaled register offset //
// Use (R1 + (R2 << 4)) as the address and store the data in R3 to that address. Update R1 to
(R1 + (R2 << 4)). //

 Post-Indexed, also called auto-indexing with post-indexed addressing.

LDR R0, [R1], #4 // Immediate offset //
// Load the data pointed to by R1 into R0 and then update R1 to (R1 + 4). //

STR R1, [R3], R4 // Register offset //
// Store the data in R1 to the memory location pointed to by R3 and then update R3 to
(R3 + R4). //

LDR R2, [R0], -R3, LSR #4 // Scaled register offset //
// Load the data from the address pointed to by R0 into R2 and then update R0 to
(R0 - (R3 >> 4)). //
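
The difference between the three indexed forms comes down to which address is used for the access and what is left in the base register afterwards. The following Python sketch makes that explicit. It is a simplified illustration: the function names are hypothetical, each function returns the pair (address accessed, new base register value), and the offset is assumed to have been computed already (immediate, register, or scaled register).

```python
MASK32 = 0xFFFFFFFF  # addresses wrap at 32 bits

def pre_indexed(base, offset):
    """[Rn, offset]: access base + offset, base register unchanged."""
    address = (base + offset) & MASK32
    return address, base

def pre_indexed_writeback(base, offset):
    """[Rn, offset]!: access base + offset and write that address back to Rn."""
    address = (base + offset) & MASK32
    return address, address

def post_indexed(base, offset):
    """[Rn], offset: access the old base, then update Rn to base + offset."""
    return base, (base + offset) & MASK32

# LDR R2, [R3, #0x0F] with R3 = 0x3000
print([hex(v) for v in pre_indexed(0x3000, 0x0F)])            # ['0x300f', '0x3000']
# LDR R0, [R1, #4]! with R1 = 0x1000
print([hex(v) for v in pre_indexed_writeback(0x1000, 4)])     # ['0x1004', '0x1004']
# LDR R2, [R0], -R3, LSR #4 with R0 = 0x2000 and R3 = 0x100
print([hex(v) for v in post_indexed(0x2000, -(0x100 >> 4))])  # ['0x2000', '0x1ff0']
```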

The Cortex-M Processor family
The Cortex-M processor family is more focused on the lower end of the performance scale.
However, these processors are still quite powerful when compared to other typical processors used
in most microcontrollers. For example, the Cortex-M4 and Cortex-M7 processors are being used in
many high-performance microcontroller products, with maximum clock frequency going up to over
200MHz.

Of course, performance is not the only factor when selecting a processor. In many applications, low
power and cost are the key selection criteria. Therefore, the Cortex-M processor family contains
various products to address different needs:

Processor    Description
Cortex-M0    A very small processor (starting from 12K gates) for low-cost, ultra-low-power
             microcontrollers and deeply embedded applications.
Cortex-M0+   The most energy-efficient processor for small embedded systems. Similar size and
             programmer's model to the Cortex-M0 processor, but with additional features such as a
             single-cycle I/O interface and vector table relocation.
Cortex-M1    A small processor optimized for FPGA designs that provides a Tightly Coupled
             Memory (TCM) implementation using memory blocks on the FPGA. Same
             instruction set as the Cortex-M0.
Cortex-M3    A small but powerful embedded processor for low-power microcontrollers, with
             a rich instruction set that lets it handle complex tasks more quickly. It has a
             hardware divider and Multiply-Accumulate (MAC) instructions, as well as
             comprehensive debug and trace features that help software developers
             develop their applications more quickly.
Cortex-M4    Provides all the features of the Cortex-M3, with additional instructions targeted at
             Digital Signal Processing (DSP) tasks, such as Single Instruction Multiple Data
             (SIMD) and faster single-cycle MAC operations. It also has an optional
             single-precision floating point unit that supports the IEEE 754 floating point standard.
Cortex-M7    High-performance processor for high-end microcontrollers and processing-
             intensive applications. It has all the ISA features available in the Cortex-M4, with
             additional support for double-precision floating point, as well as additional
             memory features such as cache and Tightly Coupled Memory (TCM).
Table: The Cortex-M processor family

Microcontroller profile (Cortex -M)


The Cortex-M series covers two architectures: ARMv6-M, implemented by the Cortex-M0, M0+ and M1,
and ARMv7-M, implemented by the Cortex-M3, M4 and their successors. These architectures were
developed for the deeply embedded microcontroller profile and offer the lowest gate count, and
therefore the smallest silicon area. They are flexible and powerful designs whose nested vectored
interrupt controller (NVIC) provides completely predictable and deterministic interrupt handling. The
small instruction sets support high code density and simplified software development, so developers
can achieve 32-bit performance at an 8-bit price. The very low gate count of the Cortex-M0 facilitates
its deployment in analog and mixed-signal devices. To serve more demanding applications that require
even better energy efficiency, the Cortex-M0+ was designed with a two-stage pipeline; it achieves high
performance with very low dynamic power consumption, a reduced branch shadow and fewer flash
memory accesses. The Cortex-M1 was designed for implementation in FPGAs. It is functionally a subset
of the Cortex-M3 and runs the ARMv6-M instruction set with OS extension options. It has a 32-bit
AHB-Lite bus interface, a separate tightly coupled memory interface and a JTAG interface to facilitate
debugging. It has a three-stage pipeline and a configurable NVIC for reducing interrupt latency.

ARMv7-M architecture:
Key features for ARMv7-M architectures are:
 Enables implementations that meet industry-leading power, performance and silicon area
constraints with a simple pipeline design.
 Highly predictable and deterministic operation, with single or low cycle instruction execution and
minimum interrupt latency, thanks to a cacheless memory design.
 Exception handlers are standard C/C++ functions that follow ARM's programming standard.
 Debug and software profiling support.
The Cortex-M3 was the first processor of the ARMv7-M profile. The architecture was subsequently
enhanced with DSP extensions in the Cortex-M4. The Cortex-M3 is a general purpose CPU with
debug options optimized for microcontroller applications. It has only a Thumb-2 processing
core, which blends 32-bit ARM and 16-bit Thumb instructions; this removes the need
for ARM-Thumb interworking and offers high code density at high energy efficiency. A
hardware divide instruction was introduced in the instruction set, and a number of multiply
instructions are available to improve data processing performance. It supports only two
modes of operation, thread mode and handler mode. User programs run in thread mode, and
exceptions are handled in handler mode, which is privileged. All exceptions can be
programmed in C/C++. The NVIC is part of the Cortex-M3 macrocell. It is a 32-bit core with 18 working
registers: r0-r7 as low registers, r8-r12 as high registers, three special purpose registers (r13
the stack pointer, r14 the link register and r15 the program counter), one program status register
(xPSR) and one more stack pointer banked for handler mode. The Cortex-M3 and Cortex-M4
processors also support unaligned data accesses, a feature previously available only in high-end
processors. The Cortex-M4 comes under the ARMv7E-M nomenclature. It was developed
as a high performance digital signal controller, with 72 DSP instructions added while
retaining the Cortex-M3 instruction set. Single-cycle execution of multiply and accumulate
instructions provides a 45% speed improvement compared to the Cortex-M3.

1.2.11.1 Cortex M4 Features:


 Thumb2 instruction set delivers the significant benefits of high code density of Thumb with
32-bit performance of ARM.
 Optional IEEE754-compliant single-precision Floating Point Unit.
 Code-patch ability for memory system updates.
 Power control optimization by integrating sleep and deep sleep modes.
 Hardware division and fast multiply and accumulate for SIMD DSP instructions.
 Saturating arithmetic for noise cancellation in signal processing.
 Deterministic, low latency interrupt handling for real time-critical applications.
 Optional Memory Protection Unit(MPU) for safety-critical applications
 Extensive implementation of debug, trace and code profiling capabilities.
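
To see why saturating arithmetic matters in signal processing, compare it with ordinary wrapping addition. The Python sketch below is a simplified illustration, with hypothetical function names, of a signed 32-bit saturating add in the style of QADD next to a wrapping add. It returns a flag alongside the result to indicate that saturation occurred, which is the condition the Q flag records.

```python
INT32_MIN = -(1 << 31)
INT32_MAX = (1 << 31) - 1

def saturating_add(a, b):
    """Signed 32-bit add that clamps at the limits instead of wrapping.

    Returns (result, saturated).
    """
    total = a + b
    if total > INT32_MAX:
        return INT32_MAX, True
    if total < INT32_MIN:
        return INT32_MIN, True
    return total, False

def wrapping_add(a, b):
    """Ordinary two's-complement 32-bit add, for comparison."""
    total = (a + b) & 0xFFFFFFFF
    return total - (1 << 32) if total & 0x80000000 else total

# A loud sample plus a small increment: wrapping flips the sign, saturation clips.
print(wrapping_add(INT32_MAX, 1))     # -2147483648
print(saturating_add(INT32_MAX, 1))   # (2147483647, True)
```

A wrapped sample appears as a full-scale jump of the opposite sign, which is audible as a click or visible as a spike; a clipped sample is a far milder distortion.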

Fig 1.14. Cortex M4 core architecture


The ARM Cortex-M4 architecture is built on a high-performance processing core with a 3-stage
pipeline. Its Harvard architecture, optional IEEE 754-compliant single-precision floating-point unit,
range of single-cycle and SIMD multiplication and multiply-with-accumulate capabilities, saturating
arithmetic and dedicated hardware division make it well suited to high precision digital
signal processing applications. The processor delivers excellent energy efficiency at high code density
and significantly improves interrupt handling and system debug capabilities. A generic system-on-chip
architecture of the Cortex-M4 is shown in Fig 1.14. A brief description of each functional block is given
below.
Nested Vectored Interrupt Controller (NVIC):
Tightly integrated with the processor core, the NVIC is a configurable interrupt controller that delivers
excellent real time interrupt performance. Very low interrupt latency is achieved through hardware
stacking of registers. The processor automatically saves and restores its state on exception entry and
exit, removing that code overhead from ISRs. It can also interrupt the load and store
multiple instructions, which provides a faster interrupt response. The NVIC includes a Non
Maskable Interrupt (NMI) and can provide up to 256 interrupt priority levels for each of the 240
interrupts it supports. A higher priority interrupt can preempt the currently running ISR, which
enables interrupt nesting.
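
The nesting rule can be sketched in a few lines of Python. This is a deliberately simplified model, not a description of the NVIC hardware: it assumes the Cortex-M convention that a lower numeric priority value means a higher priority, it ignores priority grouping and masking registers, and the function names are hypothetical.

```python
def can_preempt(pending_priority, active_priority):
    """A pending interrupt preempts only if it is strictly more urgent.

    active_priority is None when no handler is running (thread mode).
    Assumption: a lower number means a higher priority.
    """
    return active_priority is None or pending_priority < active_priority

def next_to_run(pending, active_priority=None):
    """Pick the interrupt to take next, or None if nothing may preempt.

    pending maps interrupt number -> priority value. Ties on priority are
    broken in favour of the lower interrupt number.
    """
    candidates = [(prio, irq) for irq, prio in pending.items()
                  if can_preempt(prio, active_priority)]
    return min(candidates)[1] if candidates else None

# An ISR with priority 2 is running; IRQ 7 (priority 1) may nest, IRQ 5 (priority 3) must wait.
print(next_to_run({5: 3, 7: 1}, active_priority=2))   # 7
print(next_to_run({5: 3}, active_priority=2))         # None
```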
Wake Up Interrupt Controller (WIC):
To optimize low-power designs, the NVIC integrates with an optional peripheral called the Wake-up
Interrupt Controller, which implements sleep modes and an optional deep sleep function. When the
WIC is enabled, the power management unit powers down the processor and puts it into deep sleep
mode. When the WIC receives an interrupt, it takes a few clock cycles to wake up the processor and
restore its state, which adds to the interrupt latency in deep sleep mode. The WIC is not
programmable and operates entirely through hardware signals.
Memory Protection Unit:
In an embedded OS, the MPU safeguards the memory used for kernel functions against unauthorized
access by user programs. When an untrusted user program tries to access memory protected by the
MPU, the processor generates a memory management fault, which raises a fault
exception. The MPU divides the memory map into a number of regions and defines memory attributes
for each. It separates and protects the code, data and stack of each task, as required for safety
critical embedded systems, and can be used to enforce privilege access rules and to separate tasks. It
is an optional block in the Cortex-M4.
Bus Matrix:
The processor contains a bus matrix that arbitrates the processor core and optional Debug
Access Port (DAP) memory accesses to both the external memory system, the internal System Control
Spaces and to various debug components. It arbitrates requests from different bus masters in the
system. Bus matrix is connected to the code interface for accessing the code memory, SRAM and
peripheral interface for data memory and other peripherals and the optional MPU for managing
different memory regions.
Debug Access Port (DAP):
The DAP, the implementation of the ARM debug interface, enables debug access to various master
ports on the ARM SoC. It provides system access for the debugger tool using AHB-AP, APB-AP and
JTAG-AP without halting the processor. The Embedded Trace Macrocell (ETM) generates instruction
trace. The Instrumentation Trace Macrocell (ITM) carries software-generated debug messages and
also generates timestamp information. The Data Watchpoint and Trace (DWT) unit can be used to
generate data trace, event trace and profiling trace information. The Flash Patch and Breakpoint
(FPB) unit implements hardware breakpoints and patches code and data from Code space to System
space. The Serial Wire Viewer (SWV) is a one-bit ETM port. It provides different types of information,
such as program counter values, data read and write cycles, peripheral values, event counters and
exceptions.
Floating Point Unit (FPU):
The Cortex-M4 architecture provides an optional FPU that is IEEE 754 single-precision compliant. The
core instruction set supports various signal processing operations. It executes single instruction
multiple data (SIMD) instructions with 16-bit data types. The floating point core supports addition,
multiplication and hardware division. It has a 32x32 multiply and accumulate (MAC) unit that produces
64-bit results. Embedded signal processing applications that involve data compression, statistical
signal processing, or measuring, filtering and compressing real world analog signals can use the
Cortex-M4 with FPU.
The floating point unit supports:
 Conversions between fixed point and floating point data formats, and instructions with floating
point immediate data.
 Saturation math.
 A decoupled 3-stage pipeline.
 Three modes of operation: full compliance mode, flush-to-zero mode and default NaN mode.
 Being disabled when it is not in use, to conserve energy.

1.2.12 Operating States and Operating Modes:

Fig 1.15. Operating States of ARM Core


The Cortex-M4 has two operating states, Thumb state and debug state, as shown in Fig 1.15. Whenever
it is executing instructions, it is in Thumb state. The core uses Thumb-2 technology: most
instructions are 16-bit Thumb instructions, complemented by a number of 32-bit instructions. When a
debug request is received from the debugger on the host computer, or the execution flow hits a
breakpoint instruction, the core halts and enters debug state. The Cortex-M4 supports both JTAG and
Serial Wire Debug (SWD) ports. When the debug condition is removed, the core is unhalted and
re-enters Thumb state.

Fig 1.16. Operating Modes of ARM Core


The Cortex-M4 has two operating modes, thread mode and handler mode, as shown in Fig 1.16.
After reset, the core enters thread mode and executes the OS kernel or the initialization
code with a privileged access level. Thread mode can switch from the privileged access level to an
unprivileged access level under program control, to safeguard the trusted system software. However,
thread mode at the unprivileged access level cannot switch itself back to the privileged access level;
to do so, the processor core has to use the exception mechanism. When an exception
occurs, the NVIC automatically saves the user program context to the selected stack of thread mode.
The core then enters handler mode and executes the exception handler code. In handler mode, the
core always has the privileged access level. When it returns from the exception handler, it goes back
to thread mode after restoring its state and context.
The privileged and unprivileged access mechanism protects memory accesses to critical regions of
code. In a system with an embedded OS, the kernel executes at the privileged access level and
application tasks execute at the unprivileged access level. Memory access permissions can be assigned
appropriately to application tasks using the Memory Protection Unit (MPU), which prevents
corruption of shared memory.

1.2.13 Programming Model:


The programming model of the Cortex-M4 processors has 18 working registers shown in Fig 1.17.
Thirteen of them are general purpose 32-bit registers, three have special uses and two stack pointers.

Fig 1.17. Programming model


R0 - R12
Registers R0 to R12 are general purpose registers. The first eight (R0 - R7) are called low registers
and used by most of the 16-bit instructions due to the limited available bits in the instruction encoding
format. The high registers (R8 - R12) can be used with 32-bit instructions, and a few 16-bit
instructions.

R13: stack pointer (SP)


There are two different Stack Pointers in the register bank. The Main Stack Pointer (MSP) and the
process stack pointer (PSP). After reset, the processor core enters thread mode. MSP is the default
stack pointer selected. It is also used when the processor enters handler mode. The PSP can only be
used in Thread Mode by the user application code with unprivileged access level. In normal program
flow, only one of these Stack Pointers is visible. PSP is basically used when the stack of the OS kernel
and application tasks are needed to be separated. All applications may not require embedded OS. In
such cases PSP is not used and MSP is used both in thread and handler modes.

R14: link register (LR)


The link register holds the return address when a control transfer instruction calls
a function or subroutine. At the end of the subroutine, the value of LR is loaded into the program
counter (PC), so that program control can resume in the calling program. If a function needs to call
another function, it must save the value of LR on the stack before entering the new function.

R15/ program counter (PC)


The Program Counter points to the next instruction to be executed. It can also be used as a general
purpose register that can be both read and updated. When the PC is used as a destination register, it
causes a branch. Because instructions are 16 or 32 bits long, instruction addresses are aligned to a
half-word or word boundary, so the least significant bit of the PC is always zero.
At reset, the PC is loaded from the reset vector; bit[0] of that value is loaded into the EPSR T-bit and
must be 1.
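
The split between the instruction address and the Thumb state bit follows the BX row of Table 1.6 (PC = Rm & 0xfffffffe, T = Rm & 1). The Python sketch below illustrates it. The function name and the use of a Python exception to stand in for a processor fault are assumptions made for this example; on a Thumb-only core such as the Cortex-M4 the T bit must stay 1.

```python
def branch_exchange(target, thumb_only=True):
    """Split a branch target into (new PC, T bit) as BX does.

    thumb_only=True models a core that cannot leave Thumb state, so a
    target with bit[0] clear is rejected (a fault on real hardware).
    """
    pc = target & 0xFFFFFFFE      # bit[0] never becomes part of the address
    t_bit = target & 1            # bit[0] selects the instruction set state
    if thumb_only and t_bit == 0:
        raise ValueError("T bit would be 0 on a Thumb-only core")
    return pc, t_bit

# A function pointer to Thumb code at 0x08000100 is stored as 0x08000101.
pc, t = branch_exchange(0x08000101)
print(hex(pc), t)   # 0x8000100 1
```

This is why reset vectors and function pointers for Thumb code hold odd values even though the code itself sits at even addresses.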

Program status registers (PSR)


The Program Status Register shown in Fig 1.18 is composed of three status registers:

 Application PSR (APSR)


 Interrupt PSR (IPSR)
 Execution PSR (EPSR)

Fig 1.18 xPSR diagram


The first row of the PSR diagram shows the 32-bit APSR. N, Z, C, V and Q are the negative, zero, carry,
overflow and DSP saturation flags respectively. They reflect the current state of the condition flags.
The second row shows the IPSR, whose bits [8:0] give the current exception handler number. The third
row is the EPSR, which contains the Thumb state bit T and the execution state bits for either an
If-Then (IT) instruction or an interruptible-continuable instruction (ICI).
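
Because the three registers occupy different bit fields of one 32-bit word, a combined xPSR value can be decoded with shifts and masks. The Python sketch below assumes the standard Cortex-M bit positions (N, Z, C, V, Q in bits 31 down to 27, T in bit 24, exception number in bits [8:0]); the function name is hypothetical and the IT/ICI fields are left out for brevity.

```python
def decode_xpsr(xpsr):
    """Extract the APSR flags, the EPSR T bit and the IPSR exception number."""
    return {
        "N": (xpsr >> 31) & 1,        # negative
        "Z": (xpsr >> 30) & 1,        # zero
        "C": (xpsr >> 29) & 1,        # carry
        "V": (xpsr >> 28) & 1,        # overflow
        "Q": (xpsr >> 27) & 1,        # DSP saturation
        "T": (xpsr >> 24) & 1,        # Thumb state
        "exception": xpsr & 0x1FF,    # bits [8:0], 0 in thread mode
    }

fields = decode_xpsr(0x61000000)
print(fields)
# {'N': 0, 'Z': 1, 'C': 1, 'V': 0, 'Q': 0, 'T': 1, 'exception': 0}
```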

TM4C123x Microcontrollers

Introduction
The TM4C123x MCUs provide a broad portfolio of connected Cortex®-M4 microcontrollers. Designers
who migrate to the TM4C123x MCUs benefit from a balance between the floating-point performance
needed to create highly responsive mixed-signal applications and the low-power architecture
required to enable increasingly aggressive power budgets. TM4C123x MCUs are supported by
TivaWare™ for C Series software, designed specifically for those customers who want to get started
easily, write production-ready code quickly, and minimize their overall cost of software ownership.

Key highlights
• ARM Cortex-M4 core with floating point
• CPU speed up to 80 MHz
• Up to 256-KB Flash
• Up to 32-KB single-cycle SRAM and 2-KB EEPROM
• Two high-speed 12-bit ADCs up to 1 MSPS
• Up to two CAN 2.0 A/B controllers
• Optional full-speed USB 2.0 OTG/Host/Device
• Up to 40 PWM outputs
• Serial communication with up to: 8 UARTs, 6 I2Cs, 4 SPI/SSI
• Intelligent low-power design power consumption as low as 1.6 µA

Benefits
• 12-bit ADC accuracy achievable at the full 1 MSPS rating without any hardware averaging,
eliminating performance tradeoffs
• First ARM Cortex-M MCU in advanced 65-nm process technology provides the right balance
between higher performance and low power consumption
• ARM Cortex-M4 with floating point accelerates math-intensive operations and simplifies digital
signal processing implementations
• Range of pin-compatible memory and package configurations enables optimal selection of devices

Applications
• Connectivity
• Sensor aggregation
• Security and access control
• Home and building automation
• Industrial automation
• Human machine interface
• Lighting control
• Energy
• Data acquisition
• System management

TM4C123x block diagram (temperature options: 85°C, 105°C)
Core: ARM® Cortex®-M4, up to 80 MHz; FPU; MPU; NVIC; ETM; SWD/T
Debug: Real-time JTAG
Memory: Up to 256 KB Flash; up to 32 KB SRAM; 2 KB EEPROM; ROM
Power & Clocking: Precision Oscillator; RTC; Battery-Backed Hibernate
DMA (32 ch)
System Modules: 6× 32-bit Timer/PWM/CCP; 6× 64-bit Timer/PWM/CCP; Systick Timer; 2× Watchdog Timer
Control Peripherals: 2× Quadrature Encoder Inputs; 16× PWM Outputs
Comms Peripherals: 8× UART; 4× SSI/SPI; 6× I2C; 2× CAN; USB Full Speed (Host/Device/OTG)
Analog: 2× 12ch, 12-bit ADCs, 1 MSPS; LDO Voltage Regulator; 3× Analog Comparators; Temperature Sensor

2 | TM4C Microcontrollers • 2014 Texas Instruments


TM4C129x Microcontrollers

Introduction
The TM4C129x product line will allow designers to develop a new class of highly connected products
using the first ARM® Cortex®-M4 MCU with integrated Ethernet MAC+PHY, along with on-chip
communication peripherals. Engineers will have the ability to enhance product features and
communicate to industrial and HMI applications with integrated data protection, robust memory and
LCD controller. They can further control and differentiate products with TivaWare™, including 50+
software application examples, along with TI's strong development ecosystem.

Key highlights
• ARM Cortex-M4 core with floating point
• CPU speed up to 120 MHz
• Up to 1-MB Flash
• 256-KB SRAM and 6-KB EEPROM
• 10/100 Ethernet with embedded MAC and PHY
• LCD controller
• AES, DES, SHA/MD5 and CRC hardware acceleration
• Four tamper inputs
• Two 12-bit ADCs up to 2 MSPS
• Two CAN 2.0 A/B controllers
• Full-speed USB 2.0 OTG/Host/Device and high-speed USB ULPI interface
• Serial communication with up to: 8 UARTs, 10 I2Cs, 4 QSPI/SSI, 1-Wire master interface

Benefits
• Connect to and communicate with products and services with 10/100 Ethernet MAC+PHY with
advanced line diagnostics. Integrated CAN and USB provide high-speed connectivity, allowing the
creation of seamless gateway solutions.
• Control outputs and manage multiple events with 10 I2C ports, dual 12-bit ADCs, three on-chip
comparators, and the external peripheral interface
• Address varying application memory needs with pin-for-pin compatibility across the TM4C129x
portfolio. With 256 KB of integrated SRAM and 6-KB EEPROM along with a scalable 512 KB to 1 MB
Flash memory with 100,000 program cycle endurance for extended in-field updates and
reliable operation.
• Save board space and design
smaller products with integrated
TM4C129x Temperatures 85°C 105°C
Ethernet MAC+PHY, USB and LCD
® controller.
ARM® Memory Power & Clocking • Add data protection to applications
Cortex -M4 Up to 1 MB Flash Precision Oscillator and reduce processing overhead
Up to 120 MHz Up to 256 KB SRAM RTC Battery-Backed Hibernate with the hardware acceleration of key
6 KB EEPROM encryption/decryption
FPU MPU ROM System Modules
Applications
NVIC ETM SWD/T DMA (32 ch) 8× 32-bit Timer/PWM/CCP • Solar inverters
EPI • Industrial sensors
System Management
LCD • Industrial automation
1-Wire Systick Timer • Security access systems
Debug 2× Watchdog Timer • Industrial motor control
Real-time JTAG • Communications adapters/
Analog concentrators
Control Peripherals Comms Peripherals • Networked industrial meters/
2× 12ch, 12-bit ADCs
Quadrature Encoder Inputs 8× UART
up to 2 MSPS controllers
8× PWM Outputs 4× QSSI/SPI
LDO Voltage Regulator • Industrial HMI control panels/
2 3× Analog Comparators displays
10× I C
Data Protection Temperature Sensor • Networked residential/SoHo systems
2× CAN
4× Tamper Inputs • Vending machines
10/100 Ethernet MAC/PHY
CRC Accelerator (IEEE 1588)
AES, DES, SHA & MD5 USB Full/High Speed
Accelerators (Host/Device/OTG)

Figure 1-1. Tiva™ TM4C123GH6PM Microcontroller High-Level Block Diagram

[Block diagram; the blocks shown are listed below.]
• Core: ARM® Cortex™-M4F (80 MHz) with ETM, FPU, NVIC and MPU; JTAG/SWD
• ROM: Boot Loader, DriverLib, AES & CRC
• System Control and Clocks (with precision oscillator)
• Memory: Flash (256 KB), SRAM (32 KB)
• Buses: ICode bus, DCode bus, System Bus, Bus Matrix, Advanced High-Performance Bus (AHB), Advanced Peripheral Bus (APB)
• System peripherals: Watchdog Timer (2), DMA, EEPROM (2K), Hibernation Module, GPIOs (43), General-Purpose Timer (12)
• Serial peripherals: USB OTG (FS PHY), UART (8), SSI (4), I2C (4), CAN Controller (2)
• Analog peripherals: Analog Comparator (2), 12-bit ADC Channels (12)
• Motion control peripherals: PWM (16), QEI (2)

ARM Cortex-M4F Processor Core
All members of the Tiva™ C Series, including the TM4C123GH6PM microcontroller, are designed around
an ARM Cortex-M processor core. The ARM Cortex-M processor provides the core for a high-performance,
low-cost platform that meets the needs of minimal memory implementation, reduced pin count, and low
power consumption, while delivering outstanding computational performance and exceptional system
response to interrupts.

Nested Vectored Interrupt Controller (NVIC)


The TM4C123GH6PM controller includes the ARM Nested Vectored Interrupt Controller (NVIC). The
NVIC and Cortex-M4F prioritize and handle all exceptions in Handler Mode. The processor state is
automatically stored to the stack on an exception and automatically restored from the stack at the end of the
Interrupt Service Routine (ISR). The interrupt vector is fetched in parallel to the state saving, enabling
efficient interrupt entry. The processor supports tail-chaining, meaning that back-to-back interrupts can be
performed without the overhead of state saving and restoration. Software can set eight priority levels on 7
exceptions (system handlers) and 78 interrupts.
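
Eight priority levels correspond to a 3-bit priority field. The sketch below assumes the usual Cortex-M layout in which each 32-bit NVIC priority register holds four 8-bit slots and the 3-bit field sits in the upper bits of its slot. The function names are hypothetical and only model the bit arithmetic; they do not touch real hardware registers.

```python
PRIORITY_BITS = 3        # eight priority levels need 3 bits
SLOTS_PER_REGISTER = 4   # assumed: four 8-bit priority slots per 32-bit register


def priority_location(irq):
    """Return (register index, bit shift) of the priority field for an interrupt number."""
    reg_index = irq // SLOTS_PER_REGISTER
    # The field occupies the top PRIORITY_BITS bits of the interrupt's 8-bit slot.
    shift = (irq % SLOTS_PER_REGISTER) * 8 + (8 - PRIORITY_BITS)
    return reg_index, shift


def set_priority(reg_value, irq, level):
    """Return the 32-bit register value with the interrupt's priority set (0 = highest)."""
    if not 0 <= level < (1 << PRIORITY_BITS):
        raise ValueError("level must be 0..7")
    _, shift = priority_location(irq)
    mask = ((1 << PRIORITY_BITS) - 1) << shift
    return (reg_value & ~mask & 0xFFFFFFFF) | (level << shift)
```

For example, under these assumptions interrupt number 5 lands in priority register 1 at bits 15:13, so `set_priority(0, 5, 3)` yields `0x6000`.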

Memory Protection Unit (MPU)


The MPU supports the standard ARMv7 Protected Memory System Architecture (PMSA) model. The MPU
provides full support for protection regions, overlapping protection regions, access permissions, and
exporting memory attributes to the system.

Floating-Point Unit (FPU)


The FPU fully supports single-precision add, subtract, multiply, divide, multiply and accumulate, and square
root operations. It also provides conversions between fixed-point and floating-point data formats, and
floating-point constant instructions.

On-Chip Memory
The TM4C123GH6PM microcontroller is integrated with the following set of on-chip memory and features:
■ 32 KB single-cycle SRAM
■ 256 KB Flash memory
■ 2 KB EEPROM
■ Internal ROM loaded with TivaWare™ for C Series software:
– TivaWare™ Peripheral Driver Library
– TivaWare Boot Loader
– Advanced Encryption Standard (AES) cryptography tables
– Cyclic Redundancy Check (CRC) error detection functionality

Flash Memory
The TM4C123GH6PM microcontroller provides 256 KB of single-cycle on-chip Flash memory. The Flash
memory is organized as a set of 1-KB blocks that can be individually erased. Erasing a block causes the
entire contents of the block to be reset to all 1s. These blocks are paired into a set of 2-KB blocks that can be
individually protected. The blocks can be marked as read-only or execute-only, providing different levels of
code protection. Read-only blocks cannot be erased or programmed, protecting the contents of those blocks
from being modified. Execute-only blocks cannot be erased or programmed, and can only be read by the
controller instruction fetch mechanism, protecting the contents of those blocks from being read by either the
controller or by a debugger.
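
The block arithmetic described above can be sketched as follows. This is a minimal model, assuming Flash starts at address 0 and using a small byte array in place of real Flash; the helper names are hypothetical.

```python
ERASE_BLOCK = 1024           # 1-KB blocks can be erased individually
PROTECT_BLOCK = 2048         # pairs of blocks form 2-KB protection units
FLASH_SIZE = 256 * 1024      # 256 KB of on-chip Flash


def erase_block_index(addr):
    """Index of the 1-KB erase block that contains addr."""
    return addr // ERASE_BLOCK


def protect_block_index(addr):
    """Index of the 2-KB protection block that contains addr."""
    return addr // PROTECT_BLOCK


def erase(flash, addr):
    """Model an erase: every byte of the block containing addr becomes 0xFF (all 1s)."""
    start = erase_block_index(addr) * ERASE_BLOCK
    flash[start:start + ERASE_BLOCK] = b"\xff" * ERASE_BLOCK
```

With these sizes the device has 256 erase blocks and 128 protection blocks, and address 0x0C00 falls in erase block 3 and protection block 1.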

ROM
The TM4C123GH6PM ROM is preprogrammed with the following software and programs:
■ TivaWare Peripheral Driver Library
■ TivaWare Boot Loader
■ Advanced Encryption Standard (AES) cryptography tables
■ Cyclic Redundancy Check (CRC) error-detection functionality

Serial Communications Peripherals


The TM4C123GH6PM controller supports both asynchronous and synchronous serial communications with:
■ Two CAN 2.0 A/B controllers
■ USB 2.0 OTG/Host/Device
■ Eight UARTs with IrDA, 9-bit and ISO 7816 support.
■ Four I2C modules with four transmission speeds including high-speed mode
■ Four Synchronous Serial Interface modules (SSI)
The following sections provide more detail on each of these communications functions.

Controller Area Network (CAN)


Controller Area Network (CAN) is a multicast shared serial-bus standard for connecting electronic control
units (ECUs). CAN was specifically designed to be robust in electromagnetically noisy environments and
can utilize a differential balanced line like RS-485 or twisted-pair wire. Originally created for automotive
purposes, it is now used in many embedded control applications (for example, industrial or medical). Bit
rates up to 1 Mbps are possible at network lengths below 40 meters. Decreased bit rates allow longer
network distances (for example, 125 kbps at 500 m). A transmitter sends a message to all CAN nodes
(broadcasting). Each node decides on the basis of the identifier received whether it should process the
message. The identifier also determines the priority that the message enjoys in competition for bus access.
Each CAN message can transmit from 0 to 8 bytes of user information.
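
How the identifier decides bus access can be illustrated with a small simulation. In CAN, a 0 bit is dominant and a 1 bit is recessive, so when several nodes start transmitting together the numerically lowest identifier wins. The sketch assumes standard 11-bit identifiers and at least one contender; the function name is hypothetical.

```python
def arbitrate(identifiers, width=11):
    """Return the identifier that wins bitwise arbitration (sent MSB first)."""
    contenders = set(identifiers)
    for bit in range(width - 1, -1, -1):
        # The bus behaves like a wired-AND: a single dominant 0 overrides any 1.
        bus = min((ident >> bit) & 1 for ident in contenders)
        # Nodes that sent a recessive 1 but read back a 0 stop transmitting.
        contenders = {ident for ident in contenders if (ident >> bit) & 1 == bus}
    return contenders.pop()
```

Calling `arbitrate([0x123, 0x0F0, 0x7FF])` returns `0x0F0`, the lowest identifier and therefore the highest-priority message.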

Universal Serial Bus (USB)


Universal Serial Bus (USB) is a serial bus standard designed to allow peripherals to be connected and
disconnected using a standardized interface without rebooting the system. The TM4C123GH6PM
microcontroller supports three configurations in USB 2.0 full and low speed: USB Device, USB Host, and
USB On-The-Go (negotiated on-the-go as host or device when connected to other USB-enabled systems).

UART
A Universal Asynchronous Receiver/Transmitter (UART) is an integrated circuit used for RS-232C serial
communications, containing a transmitter (parallel-to-serial converter) and a receiver (serial-to-parallel
converter), each clocked separately. The TM4C123GH6PM microcontroller includes eight fully
programmable 16C550-type UARTs. Although the functionality is similar to a 16C550 UART, this UART
design is not register compatible. The UART can generate individually masked interrupts from the Rx, Tx,
modem flow control, and error conditions. The module generates a single combined interrupt when any of
the interrupts are asserted and are unmasked.
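
The relationship between individual interrupt sources, their mask bits, and the single combined interrupt can be sketched as below. The bit positions are illustrative assumptions for this example, not values taken from the text above.

```python
# Illustrative bit positions (assumed for this sketch only).
RX = 1 << 4
TX = 1 << 5
RX_TIMEOUT = 1 << 6
OVERRUN = 1 << 10


def masked_status(raw_status, interrupt_mask):
    """Sources that are both asserted and unmasked (enabled)."""
    return raw_status & interrupt_mask


def combined_interrupt(raw_status, interrupt_mask):
    """The module raises one interrupt line if any unmasked source is asserted."""
    return masked_status(raw_status, interrupt_mask) != 0
```

With only `RX` enabled in the mask, a pending `TX` condition leaves the combined interrupt inactive, while a pending `RX` condition raises it; the handler then inspects the masked status to find the cause.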

I2C
The Inter-Integrated Circuit (I2C) bus provides bi-directional data transfer through a two-wire design (a
serial data line SDA and a serial clock line SCL). The I2C bus interfaces to external I2C devices such as
serial memory (RAMs and ROMs), networking devices, LCDs, tone generators, and so on. The I2C bus may
also be used for system testing and diagnostic purposes in product development and manufacture. Each
device on the I2C bus can be designated as either a master or a slave. The I2C module supports both sending and
receiving data as either a master or a slave and can operate simultaneously as both a master and a slave. Both
the I2C master and slave can generate interrupts.

SSI
Synchronous Serial Interface (SSI) is a four-wire bi-directional communications interface that converts data
between parallel and serial. The SSI module performs serial-to-parallel conversion on data received from a
peripheral device, and parallel-to-serial conversion on data transmitted to a peripheral device. The SSI
module can be configured as either a master or slave device. As a slave device, the SSI module can also be
configured to disable its output, which allows a master device to be coupled with multiple slave devices. The
TX and RX paths are buffered with separate internal FIFOs. The SSI module also includes a programmable
bit rate clock divider and prescaler to generate the output serial clock derived from the SSI module's input
clock. Bit rates are generated based on the input clock and the maximum bit rate is determined by the
connected peripheral.
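
The effect of the prescaler and the clock divider on the serial clock can be sketched as follows. The sketch assumes the bit rate equals the input clock divided by prescaler × (1 + divider), with an even prescaler in the range 2 to 254 (called CPSDVSR here) and a divider in the range 0 to 255 (called SCR here). Treat the formula and names as assumptions to check against the device datasheet.

```python
def ssi_bit_rate(input_clk_hz, cpsdvsr, scr):
    """Serial clock rate for a given prescaler (cpsdvsr) and clock divider (scr)."""
    if cpsdvsr % 2 or not 2 <= cpsdvsr <= 254:
        raise ValueError("prescaler must be an even value from 2 to 254")
    if not 0 <= scr <= 255:
        raise ValueError("divider must be from 0 to 255")
    return input_clk_hz / (cpsdvsr * (1 + scr))


def find_dividers(input_clk_hz, target_hz):
    """Return (cpsdvsr, scr) giving the fastest rate not above target_hz, or None."""
    best = None
    for cpsdvsr in range(2, 255, 2):
        for scr in range(256):
            rate = ssi_bit_rate(input_clk_hz, cpsdvsr, scr)
            if rate <= target_hz and (best is None or rate > best[0]):
                best = (rate, cpsdvsr, scr)
    return None if best is None else (best[1], best[2])
```

For an 80 MHz input clock and a peripheral limited to 1 Mbps, `find_dividers(80_000_000, 1_000_000)` selects a prescaler of 2 and a divider of 39.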

Direct Memory Access


The TM4C123GH6PM microcontroller includes a Direct Memory Access (DMA) controller, known as
micro-DMA (μDMA). The μDMA controller provides a way to offload data transfer tasks from the Cortex-
M4F processor, allowing for more efficient use of the processor and the available bus bandwidth. The
μDMA controller can perform transfers between memory and peripherals. It has dedicated channels for each
supported on-chip module and can be programmed to automatically perform transfers between peripherals
and memory as the peripheral is ready to transfer more data.

System Control and Clocks


System control determines the overall operation of the device. It provides information about the device,
controls power-saving features, controls the clocking of the device and individual peripherals, and handles
reset detection and reporting.

Hibernation Module (HIB)


The Hibernation module provides logic to switch power off to the main processor and peripherals and to
wake on external or time-based events.

Watchdog Timers
A watchdog timer is used to regain control when a system has failed due to a software error or to the failure
of an external device to respond in the expected way. The TM4C123GH6PM Watchdog Timer can generate
an interrupt, a non-maskable interrupt, or a reset when a time-out value is reached. In addition, the Watchdog
Timer is ARM FiRM-compliant and can be configured to generate an interrupt to the microcontroller on its
first time-out, and to generate a reset signal on its second timeout. Once the Watchdog Timer has been
configured, the lock register can be written to prevent the timer configuration from being inadvertently
altered. The TM4C123GH6PM microcontroller has two Watchdog Timer modules: Watchdog Timer 0 uses
the system clock for its timer clock; Watchdog Timer 1 uses the PIOSC as its timer clock.
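
The two-stage time-out behavior (interrupt first, reset second) can be modeled with a small simulation. This is a simplified sketch with a hypothetical class, not the register-level interface of the device.

```python
class Watchdog:
    """Down-counter that interrupts on the first time-out and resets on the second."""

    def __init__(self, load):
        self.load = load
        self.counter = load
        self.interrupt_pending = False
        self.reset_asserted = False

    def feed(self):
        """Service the watchdog: reload the counter and clear the pending interrupt."""
        self.counter = self.load
        self.interrupt_pending = False

    def tick(self):
        """Advance one timer clock."""
        if self.reset_asserted:
            return
        self.counter -= 1
        if self.counter == 0:
            if self.interrupt_pending:
                self.reset_asserted = True      # second time-out: reset
            else:
                self.interrupt_pending = True   # first time-out: interrupt
                self.counter = self.load
```

Software that calls `feed()` more often than every `load` ticks never sees the interrupt. If it stops, the interrupt fires after `load` ticks and the reset follows after another `load` ticks.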

Programmable GPIOs
General-purpose input/output (GPIO) pins offer flexibility for a variety of connections. The
TM4C123GH6PM GPIO module comprises six physical GPIO blocks, each corresponding to an
individual GPIO port. The GPIO module is FiRM-compliant (compliant to the ARM Foundation IP for
Real-Time Microcontrollers specification) and supports 0-43 programmable input/output pins. The number of
GPIOs available depends on the peripherals being used.

PWM
The TM4C123GH6PM microcontroller contains two PWM modules, each with four PWM generator blocks
and a control block, for a total of 16 PWM outputs. Pulse width modulation (PWM) is a powerful technique
for digitally encoding analog signal levels. High-resolution counters are used to generate a square wave, and
the duty cycle of the square wave is modulated to encode an analog signal. Typical applications include
switching power supplies and motor control. Each TM4C123GH6PM PWM module consists of four PWM
generator blocks and a control block. Each PWM generator block contains one timer (16-bit down or
up/down counter), two comparators, a PWM signal generator, a dead-band generator, and an interrupt/ADC-
trigger selector. Each PWM generator block produces two PWM signals that can either be independent
signals or a single pair of complementary signals with dead-band delays inserted.
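
A simplified model of a count-down PWM generator and a complementary pair with dead-band is sketched below. It assumes the output is high while the counter is below the compare value and that dead-band delays each rising edge by a fixed number of ticks. The actual generator actions on the device are configurable, so this is an illustration of the idea rather than the hardware behavior.

```python
def pwm_waveform(load, compare):
    """One period of a down-counter (load..0); output is 1 while counter < compare."""
    return [1 if counter < compare else 0 for counter in range(load, -1, -1)]


def duty_cycle(load, compare):
    """Fraction of the period during which the output is high."""
    wave = pwm_waveform(load, compare)
    return sum(wave) / len(wave)


def with_dead_band(wave, dead_band):
    """Derive complementary outputs A and B whose rising edges are delayed.

    The waveform is treated as periodic. A goes high only after the input has
    been high for dead_band extra ticks; B does the same for the inverted input,
    so the two outputs are never high at the same time.
    """
    n = len(wave)
    a = [int(all(wave[(i - k) % n] for k in range(dead_band + 1))) for i in range(n)]
    b = [int(all(not wave[(i - k) % n] for k in range(dead_band + 1))) for i in range(n)]
    return a, b
```

With `load = 999` and `compare = 250` the period is 1000 ticks and `duty_cycle` returns 0.25.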

QEI
A quadrature encoder, also known as a 2-channel incremental encoder, converts linear displacement into a
pulse signal. By monitoring both the number of pulses and the relative phase of the two signals, the position,
direction of rotation, and speed can be tracked. In addition, a third channel, or index signal, can be used to
reset the position counter. The TM4C123GH6PM quadrature encoder with index (QEI) module interprets
the code produced by a quadrature encoder wheel to integrate position over time and determine direction of
rotation. In addition, it can capture a running estimate of the velocity of the encoder wheel. The input
frequency of the QEI inputs may be as high as 1/4 of the processor frequency (for example, 20 MHz for an
80-MHz system).
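
Tracking position and direction from the two phase signals can be sketched in software as follows. Each sample packs the two channels as (A << 1) | B, and the order 00, 01, 11, 10 is assumed to be the forward direction. Which physical direction that corresponds to depends on the wiring. The function names are hypothetical.

```python
_FORWARD_SEQUENCE = [0b00, 0b01, 0b11, 0b10]  # assumed forward order of (A << 1) | B


def decode(samples, position=0):
    """Integrate position from a non-empty list of successive (A, B) samples."""
    prev = samples[0]
    for cur in samples[1:]:
        step = (_FORWARD_SEQUENCE.index(cur) - _FORWARD_SEQUENCE.index(prev)) % 4
        if step == 1:
            position += 1          # one step forward
        elif step == 3:
            position -= 1          # one step backward
        elif step == 2:
            raise ValueError("both channels changed at once: a transition was missed")
        prev = cur
    return position


def max_input_frequency(processor_hz):
    """Highest encoder input frequency: one quarter of the processor frequency."""
    return processor_hz / 4
```

One full forward cycle `[0b00, 0b01, 0b11, 0b10, 0b00]` advances the position by 4, and the same samples in reverse order move it back by 4.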

Analog
The TM4C123GH6PM microcontroller provides analog functions integrated into the device, including:
■ Two 12-bit Analog-to-Digital Converters (ADC), with a total of 12 analog input channels and each with a
sample rate of one million samples/second
■ Two analog comparators
■ On-chip voltage regulator
The following provides more detail on these analog functions.

ADC
An analog-to-digital converter (ADC) is a peripheral that converts a continuous analog voltage to a discrete
digital number. The TM4C123GH6PM ADC module features 12-bit conversion resolution and supports 12
input channels plus an internal temperature sensor. Four buffered sample sequencers allow rapid sampling of
up to 12 analog input sources without controller intervention. Each sample sequencer provides flexible
programming with fully configurable input source, trigger events, interrupt generation, and sequencer
priority. Each ADC module has a digital comparator function that allows the conversion value to be diverted
to a comparison unit that provides eight digital comparators.
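
The 12-bit conversion itself can be illustrated with a short calculation. The sketch assumes an ideal converter and a 3.3 V reference, which is an assumption of this example; the function names are hypothetical.

```python
def adc_code(voltage, vref=3.3, bits=12):
    """Ideal conversion of a voltage to a digital code, clamped to the valid range."""
    full_scale = (1 << bits) - 1                  # 4095 for 12 bits
    code = int(voltage / vref * (1 << bits))
    return max(0, min(full_scale, code))


def code_to_voltage(code, vref=3.3, bits=12):
    """Voltage at the lower edge of the step represented by a code."""
    return code * vref / (1 << bits)
```

With 12 bits there are 4096 steps, so each step is about 0.8 mV at a 3.3 V reference, and an input at half the reference voltage converts to code 2048.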

Analog Comparators
An analog comparator is a peripheral that compares two analog voltages and provides a logical output that
signals the comparison result. The TM4C123GH6PM microcontroller provides two independent integrated
analog comparators that can be configured to drive an output or generate an interrupt or ADC event. The
comparator can provide its output to a device pin, acting as a replacement for an analog comparator on the
board, or it can be used to signal the application via interrupts or triggers to the ADC to cause it to start
capturing a sample sequence. The interrupt generation and ADC triggering logic is separate. This means, for
example, that an interrupt can be generated on a rising edge and the ADC triggered on a falling edge.
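
The separate edge selection for interrupts and ADC triggers can be sketched with a small simulation. It uses the configuration from the example above (interrupt on the rising edge, ADC trigger on the falling edge); the function name and event labels are hypothetical.

```python
def comparator_events(v_plus, v_minus):
    """Return (sample index, event) pairs for each change of the comparator output."""
    events = []
    prev = None
    for i, (p, m) in enumerate(zip(v_plus, v_minus)):
        out = p > m                     # comparator output for this sample
        if prev is not None and out != prev:
            # Rising edge raises the interrupt; falling edge triggers the ADC.
            events.append((i, "interrupt" if out else "adc_trigger"))
        prev = out
    return events
```

For an input that rises above a 1 V reference and later falls below it, `comparator_events([0, 2, 2, 0], [1, 1, 1, 1])` reports the interrupt at sample 1 and the ADC trigger at sample 3.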

JTAG and ARM Serial Wire Debug
The Joint Test Action Group (JTAG) port is an IEEE standard that defines a Test Access Port and
Boundary Scan Architecture for digital integrated circuits and provides a standardized serial interface
for controlling the associated test logic. The TAP, Instruction Register (IR), and Data Registers (DR)
can be used to test the interconnections of assembled printed circuit boards and obtain manufacturing
information on the components. The JTAG Port also provides a means of accessing and controlling
design-for-test features such as I/O pin observation and control, scan testing, and debugging. Texas
Instruments replaces the ARM SW-DP and JTAG-DP with the ARM Serial Wire JTAG Debug Port
(SWJ-DP) interface. The SWJ-DP interface combines the SWD and JTAG debug ports into one
module providing all the normal JTAG debug and test functionality plus real-time access to system
memory without halting the core or requiring any target resident code.

Address space

The ARM architecture uses a single, flat address space of 2^32 8-bit bytes. Byte
addresses are treated as unsigned numbers, running from 0 to 2^32 - 1.

This address space is regarded as consisting of 2^30 32-bit words, each of whose
addresses is word-aligned, which means that the address is divisible by 4. The word
whose word-aligned address is A consists of the four bytes with addresses A, A+1, A+2
and A+3. The address space can also be considered as consisting of 2^31 16-bit halfwords,
each of whose addresses is halfword-aligned, which means that the address is divisible by
2. The halfword whose halfword-aligned address is A consists of the two bytes with
addresses A and A+1.
While instruction fetches are always halfword-aligned, some load and store instructions
support unaligned addresses. This affects the access address A, such that A[1:0] in the
case of a word access and A[0] in the case of a halfword access can have non-zero
values.
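
The alignment rules above reduce to simple divisibility checks, sketched here with hypothetical helper names.

```python
ADDRESS_SPACE = 1 << 32      # 2^32 byte addresses


def is_word_aligned(address):
    """Word-aligned: divisible by 4, so bits [1:0] are zero."""
    return address % 4 == 0


def is_halfword_aligned(address):
    """Halfword-aligned: divisible by 2, so bit [0] is zero."""
    return address % 2 == 0


def word_bytes(address):
    """Byte addresses A, A+1, A+2, A+3 that make up the word at aligned address A."""
    if not is_word_aligned(address):
        raise ValueError("address is not word-aligned")
    return [address + i for i in range(4)]
```

Dividing the 2^32 bytes into groups of four or two gives the 2^30 words and 2^31 halfwords mentioned above.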
Address calculations are normally performed using ordinary integer instructions. This
means that they normally wrap around if they overflow or underflow the address space.
Another way of describing this is that any address calculation is reduced modulo 2^32.
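
The wrap-around can be shown with a one-line helper (a hypothetical name) that reduces every result modulo 2^32, as a 32-bit integer add or subtract would.

```python
ADDRESS_MASK = 0xFFFFFFFF    # keeps the low 32 bits, i.e. reduces modulo 2^32


def add_address(base, offset):
    """Add a signed offset to a 32-bit address with wrap-around."""
    return (base + offset) & ADDRESS_MASK
```

Adding 8 to 0xFFFFFFFC overflows the top of the address space and gives 0x00000004, while subtracting 4 from 0 underflows and gives 0xFFFFFFFC.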
Normal sequential execution of instructions effectively calculates:

(address_of_current_instruction) + (2 or 4)    /* 16- and 32-bit instr mix */

after each instruction to determine which instruction to execute next. If this calculation
overflows the top of the address space, the result is UNPREDICTABLE. In ARMv7-M
this condition cannot occur because the top of memory is defined to always have the
eXecute Never (XN) memory attribute associated with it. See The system address map on
page B3-2 for more details. An access violation will be reported if this scenario occurs.
The above only applies to instructions that are executed, including those which fail their
condition code check. Most ARM implementations prefetch instructions ahead of the
currently-executing instruction.
LDC, LDM, LDRD, POP, PUSH, STC, STRD, and STM instructions access a sequence of
words at increasing memory addresses, effectively incrementing a memory address by 4 for
each register load or store. If this calculation overflows the top of the address space, the
result is UNPREDICTABLE.
Any unaligned load or store whose calculated address is such that it would access the byte at
0xFFFFFFFF and the byte at address 0x00000000 as part of the instruction is
UNPREDICTABLE.
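
The two UNPREDICTABLE cases above can be expressed as checks on the byte range an access would touch. This is a sketch with hypothetical helper names. It raises an error where the architecture leaves the result UNPREDICTABLE, which is a choice made for this example only.

```python
TOP = 1 << 32    # one past the highest byte address, 0xFFFFFFFF


def crosses_top(address, size):
    """True if an access of size bytes at address would run past 0xFFFFFFFF."""
    return address + size > TOP


def transfer_addresses(base, register_count):
    """Word addresses touched by a multi-register load or store starting at base."""
    if crosses_top(base, 4 * register_count):
        raise ValueError("UNPREDICTABLE: transfer overflows the top of the address space")
    # Each register load or store advances the address by 4.
    return [base + 4 * i for i in range(register_count)]
```

A two-register transfer at 0xFFFFFFF8 ends exactly at the last byte and is allowed by this check, whereas the same transfer at 0xFFFFFFFC would need the bytes at both 0xFFFFFFFF and 0x00000000 and is rejected.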