Embedded Software Concepts
Ahmed Tolba
xgameprogrammer@hotmail.com
Vienna, Austria.
This is a tutorial handbook, which means you will need to do a good deal of your own research on the topics presented. I have provided information and applications as starting points; expect to do a lot of additional work and searching on your own. This is not a textbook that explains every detail of each subject. I have grouped material from several books, which are mentioned in the reference section, and tried my best to collect every piece of information that I wanted to learn about.
DI Ahmed Tolba
Introduction to Software Architecture in
Embedded Software
Architectural Coupling
Coupling refers to how closely related different modules or classes are to each other
and the degree to which they are interdependent. The degree to which the
architecture is coupled determines how well a developer can achieve their
architectural goals. For example, if I want to develop a portable architecture, I need
to ensure that my architecture has low coupling.
There are several different types and causes for coupling to occur in a software
system. First, common coupling occurs when multiple modules have access to the
same global variable(s). In this instance, code can’t be easily ported to another
system without bringing the global variables along. In addition, the global variables
become dangerous because any module can access them in the system. Easy access
encourages “quick and dirty” access to the variables from other modules, which then
increases the coupling even further. The modules have a dependency on those
globally shared variables.
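To make this concrete, here is a small hypothetical C sketch. The first pair of functions share a raw global (common coupling), while the second hides the same state behind a narrow accessor interface. All names here are illustrative, not from any particular code base.

```c
#include <stdint.h>
#include <assert.h>

/* Common coupling: both "modules" below depend on the same raw global,
 * so neither can be ported without dragging system_mode along. */
uint8_t system_mode;

void motor_update(void)   { if (system_mode == 1) { /* spin up motor */ } }
void display_update(void) { if (system_mode == 1) { /* show run icon */ } }

/* Lower-coupling alternative: the state is private to one module and
 * reached only through a narrow accessor interface. */
static uint8_t mode;                     /* no longer globally visible */
void    mode_set(uint8_t m) { mode = m; }
uint8_t mode_get(void)      { return mode; }
```

With the accessor version, porting the module means taking two small functions along, not an ambient global that any file might be writing to.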
Another type of coupling that often occurs is content coupling. Content coupling
is when one module accesses another module’s functions and APIs. While at first
this seems reasonable because data might be encapsulated, developers have to be
careful how many function calls the module depends on. It’s possible to create not
just tightly coupled dependencies but also circular dependencies that can turn the
software architecture into a big ball of mud.
Coupling is most easily seen when you try to port a feature from one code base to
another. I think we’ve all gone through the process of grabbing a module, dropping
it into our new code base, compiling it, and then discovering a ton of compilation
errors. Upon closer examination, there is a module dependency that was overlooked.
So, we grab that dependency, put it in the code, and recompile. More compilation
errors! Adding the new module quadrupled the number of errors! It made things
worse, not better. Weeks later, we finally decided it’s faster to just start from scratch.
Software architects must carefully manage their coupling to ensure they can
successfully meet their architecture goals. Highly coupled code is always a
nightmare to maintain and scale. I would not want to attempt to port highly coupled
code, either. Porting tightly coupled code is time-consuming, stressful, and not fun!
Architectural Cohesion
Coupling is only the first part of the story. Low module coupling doesn’t
guarantee that the architecture will exhibit good characteristics and meet our goals.
Architects ultimately want to have low coupling and high cohesion. Cohesion
refers to the degree to which the module or class elements belong together.
In a microcontroller environment, a low-cohesion example would be lumping
every microcontroller peripheral function into a single module or class. The
resulting module would be large and unwieldy. Instead, a base class could be created
that defines the common interface for interacting with peripherals. Each peripheral
could then inherit from that interface and implement the peripheral-specific
functionality. The result is a highly cohesive architecture, low coupling, and other
desirable characteristics like reusable, portable, scalable, and so forth. Cohesion is
really all about putting “things” together that belong together. Code that is highly
cohesive is easy to follow because everything needed is in one place. Developers
do not have to search and hunt through the code base to find related code. For
example, I often see developers using an RTOS (real-time operating system)
spread their task creation code throughout the application. The result is low
cohesion. Instead, I pull all my task creation code into a single module so that I
only have one place to go. The task creation code is highly cohesive, and it’s easily
ported and configured as well.
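As a sketch of what that can look like, the following hypothetical C module keeps every task definition in one configuration table and creates them all in a single function. The task names and the create_task() stub are my own assumptions; a real system would call its RTOS's create API (for example, FreeRTOS's xTaskCreate) at that one point instead.

```c
#include <stddef.h>
#include <assert.h>

/* One cohesive place describing every task in the application. */
typedef struct {
    const char *name;
    void (*entry)(void);
    int priority;
} task_config_t;

static void sensor_task(void)  { /* read sensors  */ }
static void comms_task(void)   { /* handle comms  */ }
static void display_task(void) { /* update the UI */ }

static const task_config_t task_table[] = {
    { "sensor",  sensor_task,  3 },
    { "comms",   comms_task,   2 },
    { "display", display_task, 1 },
};

static int tasks_created;

/* Stand-in for the real RTOS task-creation call. */
static void create_task(const task_config_t *cfg) {
    (void)cfg;
    ++tasks_created;
}

/* The single place where every task is created; returns the count. */
int tasks_init(void) {
    for (size_t i = 0; i < sizeof task_table / sizeof task_table[0]; ++i)
        create_task(&task_table[i]);
    return tasks_created;
}
```

Adding, removing, or retuning a task now means editing one table, and porting the whole task setup means moving one file.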
Now that we have a fundamental understanding of the characteristics we are
interested in as embedded architects and developers, let’s examine architectural
design patterns that are common in the industry.
The Unstructured Monolithic Architecture
The great sage Wikipedia describes a monolithic application as “a single-tiered
software application in which the user interface and data access code are combined
into a single program from a single platform.” In an unstructured monolithic
architecture, everything is lumped together with no layering at all.
The architecture is tightly coupled, which makes it extremely difficult to reuse,
port, and maintain. Individual modules or classes might have high cohesion, but
the coupling is out of control.
Contrast this with a layered architecture. First, note how we can exchange the
driver layer and work with nearly any hardware by using a hardware abstraction
layer (HAL). For example, a common HAL today can be found in Arm’s CMSIS.
The HAL again decouples the hardware drivers from the above code, breaking the
dependencies.
Next, notice how we don’t even allow the application code to depend on an RTOS
or OS. Instead, we use an operating system abstraction layer (OSAL). If the team
needs to change RTOSes, which does happen, they can just integrate the new
RTOS without having to change a bunch of application code. I’ve encountered
many teams that directly make calls to their RTOS APIs, only later to decide they
need to change RTOSes. An example of OSAL can be found in CMSIS-RTOS2.
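A minimal OSAL sketch, with assumed osal_* names and a stubbed-out RTOS backend standing in for a real kernel such as FreeRTOS or Zephyr, might look like this:

```c
#include <assert.h>

/* Opaque OSAL handle; the application never sees the RTOS's own types. */
typedef void *osal_mutex_t;

int lock_count; /* visible here only so the stub can be exercised */

/* --- stubbed RTOS backend: the only part that changes per RTOS --- */
static osal_mutex_t rtos_mutex_create(void) { static int m; return &m; }
static void rtos_mutex_lock(osal_mutex_t m)   { (void)m; ++lock_count; }
static void rtos_mutex_unlock(osal_mutex_t m) { (void)m; --lock_count; }

/* --- OSAL: the only API the application is allowed to call --- */
osal_mutex_t osal_mutex_create(void)  { return rtos_mutex_create(); }
void osal_mutex_lock(osal_mutex_t m)  { rtos_mutex_lock(m); }
void osal_mutex_unlock(osal_mutex_t m){ rtos_mutex_unlock(m); }
```

Swapping RTOSes then means reimplementing the handful of rtos_* forwarding calls, while every application file keeps calling the same osal_* names.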
Next, the board support package exists outside the driver layer! At first, this may
seem counterintuitive. Shouldn’t hardware like sensors, displays, and so forth be in
the driver layer? I view the hardware and driver layers as dedicated solely to the
microcontroller. Any sensors and other devices connected to the microcontroller
should be communicated with through the HAL. For example, a sensor might sit on
the I2C bus; the sensor driver would then depend on the I2C HAL, not the low-level hardware.
The abstraction dependency makes it easier for the BSP to be ported to other
applications.
Finally, we can see that even the middleware should be wrapped in an abstraction
layer. If someone is using a TLS library or an SD card library, you don’t want your
application to be dependent on these. Again, I look at this as a way to make code
more portable, but it also isolates the application so that it can be simulated and
tested off target.
Event-Driven Architectures
Event-driven architectures make a lot of sense for real-time embedded applications
and applications concerned with energy consumption. In an event-driven
architecture, the system is generally in an idle state or low-power state unless an
event triggers an action to be performed. For example, a widget may be in a
low-power idle state until a button is clicked. Clicking the button triggers an event that
sends a message to a message processor, which then wakes up the system.
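A host-side sketch of that idea, with hypothetical event names and a return value standing in for "wake up and handle" versus "stay asleep", might look like:

```c
#include <stdbool.h>
#include <assert.h>

#define EVQ_SIZE 8

typedef enum { EV_NONE = 0, EV_BUTTON, EV_TIMER } event_t;

static event_t evq[EVQ_SIZE];
static int ev_head, ev_tail;

/* Called from an ISR or another task when something happens. */
bool event_post(event_t e) {
    int next = (ev_head + 1) % EVQ_SIZE;
    if (next == ev_tail) return false;   /* queue full: event dropped */
    evq[ev_head] = e;
    ev_head = next;
    return true;
}

static event_t event_get(void) {
    if (ev_head == ev_tail) return EV_NONE;
    event_t e = evq[ev_tail];
    ev_tail = (ev_tail + 1) % EVQ_SIZE;
    return e;
}

/* One pass of the message processor: returns 1 if an event was handled,
 * 0 if the system could have stayed in its low-power state. */
int event_loop_once(void) {
    event_t e = event_get();
    if (e == EV_NONE) return 0;          /* nothing pending: sleep */
    /* dispatch e to its handler here */
    return 1;
}
```

On a real part, the "sleep" branch would execute a wait-for-interrupt instruction instead of returning, so the CPU burns no cycles until the next event arrives.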
Embedded software has steadily become more and more complex. As businesses
focus on joining the IoT, the need for an operating system to manage low-level
hardware, memory, and time has steadily increased. Approximately 65% of
embedded systems implement a real-time operating system. The remaining
systems are simple enough for bare-metal scheduling techniques to achieve the
system requirements.
Real-time systems require not only the logical correctness of the computations
but also timely responses. There are many scheduling algorithms that developers
can use to get real-time responses, such as:
1. Run to completion schedulers
2. Round-robin schedulers
3. Time slicing
4. Priority-based scheduling
RTOSes are much more compact than general-purpose operating systems like
Android or Windows, which can require gigabytes of storage space to hold the
operating system. A good RTOS typically requires a few kilobytes of storage
space, depending on the specific application needs. (Many RTOSes are
configurable, and the exact settings determine how large the build gets.)
An RTOS provides developers with several key capabilities that can be time-
consuming and costly to develop and test from scratch. For example, an RTOS will
provide:
A multithreading environment
At least one scheduling algorithm
Mutexes, semaphores, queues, and event flags
Middleware components (generally optional)
While an RTOS can provide developers with a great starting point and several
tools to jump-start development, designing an RTOS-based application can be
challenging the first few times developers use one. There are common questions
that developers encounter, such as:
How do I figure out how many tasks to have in my application?
How much should a single task do?
Can I have too many tasks?
How do I set my task priorities?
Tasks, Threads, and Processes
An RTOS application is typically broken up into tasks, threads, and processes.
These are the primary building blocks available to developers; therefore, we must
understand their differences.
A task has several definitions that are worth discussing. First, a task is a
concurrent and independent program that competes for execution time on a
CPU. This definition tells us that tasks are isolated programs that don’t interact
with other tasks in the system but may compete with them for CPU time. They also
appear as if they are the only program running on the processor. This
definition is helpful, but it doesn’t fully represent what a task is on an embedded system.
The second definition, I think, is a bit more accurate. A task is a semi-
independent portion of the application that carries out a specific duty. This
definition of a task fits well. From it, we can gather that there are several
characteristics we can expect from a task:
It is a separate “program.”
It may interact with other tasks (programs) running on the system.
It has a dedicated function or purpose.
For most developers working with an RTOS, a thread and a task are synonyms!
Surveying several different RTOSes available in the wild, you’ll find that there are
several that provide thread APIs, such as Azure RTOS, Keil RTX, and Zephyr.
These operating systems provide similar capabilities that compete with RTOSes
that use task terminology.
A process is a collection of tasks or threads and associated memory that runs in
an independent memory region. A process will often leverage a memory
protection unit (MPU) to protect the various elements that make up the process. These
elements can consist of:
Flash memory locations that contain executable instructions or data
RAM locations that include executable instructions or data
Peripheral memory locations
Shared RAM, where data is stored for interprocess communication
For example, the features of an IoT thermostat might include:
Display
Touch screen
LED backlight
Cloud connectivity
Temperature measurement
Humidity measurement
HVAC controller
Most teams will create a list of features the system must support when they
develop their stakeholder diagrams and identify the system requirements. This
effort can also be used to determine the tasks that make up the application
software.
Feature-based task decomposition can be very useful, but sometimes it can
result in an overly complex system. For example, if we create tasks based on all the
system features, it would not be uncommon to quickly identify upward of a hundred tasks
in the system! This isn’t necessarily wrong, but it could result in an overly
complex system that uses more code space and RAM than required.
When using the feature-based approach, it’s critical that developers also go
through an optimization phase to see where identified tasks can be combined based
on common functionality. For example, tasks may be specified for measuring
temperature, pressure, humidity, and so on. However, having a task for each individual
measurement will overcomplicate the design. Instead, these measurements could all be
combined into a single sensor task.
A feature-based task decomposition for an IoT thermostat, for example, would
map each of the features listed above to its own task in the software.
Using features is not the only way to decompose tasks. One of my favorite
methods is the outside-in approach, where tasks are identified from the devices
and interfaces at the edge of the system. For example:
Humidity/temperature sensor
Gesture sensor
Touch screen
Analog sensors
Connectivity devices (Wi-Fi/Bluetooth)
LCD/display
Fan/motor control
Backlight
Etc.
Explaining those algorithms is beyond the scope of this tutorial; you can refer to
any RTOS book out there. We recommend the book Operating System Concepts.
Math Concepts for Embedded Software Engineers
Look-Up Tables
In school we are taught to use math to solve real-world problems, especially
mechanical concepts like acceleration and velocity. Many embedded projects make
heavy use of trigonometric functions like sin(x) and cos(x). Consider a GPS
navigation project that calculates the distance between points on a sphere (the
Earth): it needs to compute a lot of trig functions in real time! These functions are
terribly slow when executed on a microcontroller running at 8 MHz.
These functions are usually computed using a Taylor series, which approximates
complicated functions such as sin(x). Using the Taylor series, sin(x) can be
computed with the following expansion:
sin(x) = x - x^3/3! + x^5/5! - x^7/7! + ...
As you can see, you need to calculate terms like factorials and powers, which is a
disaster if you want to compute them on a microcontroller.
One solution, used back in the 1980s when computer game programmers worked
on 8-bit machines like the Atari, is the look-up table. If you really must work with
trig functions, look-up tables come to the rescue.
A look-up table holds precomputed values of a computation that you know you’ll
perform at run time. You simply compute all possible values at startup and then
run the embedded software. For example, say you need the sine and cosine of the
angles from 0 to 359 degrees. Computing them with sin() and cos() on every call
would kill your performance, especially without a floating-point unit, but with a
look-up table your code can fetch sin() or cos() in a few cycles because it’s just a
table look-up. Here’s an example:
On the ARM architecture, things can get strange. There are ARM parts with an
FPU and without, and you may want a universal application that runs on both. For
that case there is a tricky (and slow) scheme: the application uses FPU instructions,
and if the CPU has no FPU, each such instruction triggers an exception in which
the OS emulates the instruction, clears the error bit, and returns control to the
application. This scheme turned out to be very slow and is not commonly used.
Floating point is just scientific notation in base 2. Both the mantissa and exponent
are integers, and soft-float libraries break floating-point operations into
operations on the mantissa and exponent, which can use the CPU’s integer
support.
Remember from school that we can have a number like 10.7, but can we represent
it as an integer? Not directly; there is no decimal point in integer values. We can,
however, scale it up by 10, and the result is 107, which is an integer. You scale
numbers by some factor and make sure to take that scale into consideration
when doing the math.
We will use 32-bit integers for our fixed-point representation. There are many
fixed-point formats, depending on the size of the integer used. We will show an
example of 16.16 fixed-point math.
You put the whole part in the upper 16 bits and the fractional part in the lower 16
bits. Hence, you’re scaling all numbers by 2^16, or 65,536. To extract the integer
portion of a fixed-point number, you shift down and mask the upper 16 bits, and to
get the fractional portion, you mask off the lower 16 bits. Here are some working
types for fixed-point math:
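Here is one possible set of 16.16 types and helpers (the names are illustrative, not from a specific library):

```c
#include <stdint.h>
#include <assert.h>

/* 16.16 fixed point: 16 integer bits, 16 fractional bits,
 * i.e., every value is scaled by 2^16 = 65,536. */
typedef int32_t fixed16_t;

#define FP_SHIFT 16
#define FP_ONE   (1 << FP_SHIFT)

#define INT_TO_FP(i)   ((fixed16_t)((i) << FP_SHIFT))
#define FP_TO_INT(f)   ((int32_t)((f) >> FP_SHIFT))
#define FLOAT_TO_FP(x) ((fixed16_t)((x) * FP_ONE))
#define FP_TO_FLOAT(f) ((float)(f) / FP_ONE)

/* Addition and subtraction work on the raw values directly.
 * Multiplication needs a 64-bit intermediate (the product is scaled by
 * 2^32) and a shift back down; division pre-shifts the dividend up. */
static inline fixed16_t fp_mul(fixed16_t a, fixed16_t b) {
    return (fixed16_t)(((int64_t)a * b) >> FP_SHIFT);
}

static inline fixed16_t fp_div(fixed16_t a, fixed16_t b) {
    return (fixed16_t)((((int64_t)a) << FP_SHIFT) / b);
}
```

Note the 64-bit intermediates: without them, multiplying two values larger than 1.0 would overflow the 32-bit container before the corrective shift.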
Stack Painting
Within the reset handler, a short section can be added before invoking main. Its
first assembly instruction stores the current value of the stack pointer into the
variable sp, to ensure that the painting covers the area only up to the last unused
address in the stack, just below the region already in use:
The current stack usage can be checked periodically at runtime, for instance in the
main loop, to detect the area painted with the recognizable pattern. The areas that
are still painted have never been used by the execution stack so far, and indicate
the amount of stack still available.
This mechanism may be used to verify the amount of stack space required by the
application to run comfortably. Depending on the design, this information can be
used later on to set a safe lower limit on the segment that can be used for the stack.
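Since the real painting code is target-specific (it runs in the reset handler and uses linker-provided stack boundaries), here is a host-side simulation of the idea, with a plain array standing in for the stack region and an invented paint pattern:

```c
#include <stdint.h>
#include <assert.h>

#define STACK_WORDS 256
#define PAINT_WORD  0xDEADC0DEu   /* any recognizable pattern works */

/* Stand-in for the linker-defined stack region on a real target. */
static uint32_t fake_stack[STACK_WORDS];

/* On a real part this runs in the reset handler, before main(). */
void stack_paint(void) {
    for (int i = 0; i < STACK_WORDS; ++i)
        fake_stack[i] = PAINT_WORD;
}

/* Periodic runtime check: words still holding the pattern have never
 * been touched by the execution stack, so they are still available. */
int stack_free_words(void) {
    int free_words = 0;
    for (int i = 0; i < STACK_WORDS; ++i)
        if (fake_stack[i] == PAINT_WORD)
            ++free_words;
    return free_words;
}
```

On hardware, the scan direction matters because the stack grows downward; the free region is the still-painted span at the low end of the stack segment.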
Data Alignment
Most computer systems have some alignment requirement on the starting memory
address of a variable. The memory address of a C variable often must be aligned,
as listed in Table . The smallest unit exchanged between the processor and the
memory is a byte (8 bits), and thus the memory address is always in terms of bytes.
A variable is n-byte aligned in memory if its starting memory address is some
multiple of n. Typically, n is a power of 2, such as 2 (halfword aligned), 4 (word
aligned), and 8 (double word aligned). Suppose a 32-bit variable is word aligned. If
the address of the next available byte in memory is 0x8001, the variable is then
stored in a continuous span of 4 bytes from 0x8004 to 0x8007. The compiler or the
program inserts three bytes at memory addresses 0x8001, 0x8002, and 0x8003.
These three bytes are called padding bytes.
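One way to check this on your own platform is with C11's alignof. The small helper below (my own, for illustration) tests whether an address is n-byte aligned, matching the 0x8001/0x8004 example above:

```c
#include <stdint.h>
#include <stdalign.h>
#include <assert.h>

/* A pointer p is n-byte aligned if its address is a multiple of n. */
int is_aligned(const void *p, uintptr_t n) {
    return ((uintptr_t)p % n) == 0;
}
```

The compiler guarantees that the address of a variable satisfies the alignment its type requires, which is what makes the alignof-based check below always hold.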
Data alignment is enforced to improve memory performance. A memory
system consists of multiple storage units, and the processor typically distributes
data among these units in a round-robin fashion. Because the number of pins
available on a processor is limited, these memory units typically share some pins on
the memory address bus. To allow these memory units to transfer data
concurrently, the target data stored in all memory units needs to share a portion of
their memory addresses. Data alignment ensures that all bytes of a variable
stored in different memory units meet this requirement.
When the processor reads a properly aligned variable, only one access is required
to transfer the data out of these memory units. Otherwise, two separate memory
accesses might be necessary, slowing down the processor performance.
Suppose the data memory is organized into four banks that feed a 32-bit
data bus. Four bytes in the same row of all banks can be loaded into the processor
concurrently. In this arrangement, if the data 0x78563412 is not aligned with word
boundaries, the processor takes two memory accesses to load it into a register,
while it takes only one memory access to load the word-aligned data
0x44332211.
Align the Data Structure
As mentioned in the previous section, each CPU has its own word length (bit
width), and it varies from one platform to another. Given the broad range of embedded
CPUs, the word length can be as small as 8 bits (such as the Intel 8051) or as large as
32 bits (like the popular ARM processors); some high-end products can even
afford 64-bit CPUs. Most systems also allow memory accesses narrower than the
native word length (for example, most 32-bit CPUs can read and write memory a
single byte at a time, with degraded efficiency). Consequently, this gives
rise to the alignment issue for all data structures. From the performance standpoint,
memory accesses that are aligned to native word length are always preferred (i.e.,
the beginning address is at the boundary of native word length, and the R/W length
is a multiple of native word length.) Thus most compilers will do some
optimization by inserting space padding into the data structure when they see the
chance of misalignment, as demonstrated
typedef struct {
U8 a;
U32 b;
U16 c;
} STRUCT_NOT_PACKED;
sizeof(STRUCT_NOT_PACKED) == 12
For a 32-bit target CPU, the GNU C compiler will insert three bytes of padding
after field a and two bytes of padding after field c in the struct defined previously, so that
struct member b is aligned to a 32-bit boundary and the total size of struct
STRUCT_NOT_PACKED becomes 12. Under GNU C, such alignment padding
can be disabled by using the packed attribute, as illustrated below. After applying the
packed attribute, the size of the data structure is reduced to 7. However, struct
member b is now stored across a 32-bit boundary and carries a performance penalty
if accessed individually.
#define PACKED __attribute__((packed))
typedef struct {
U8 a;
U32 b;
U16 c;
} PACKED STRUCT_PACKED;
GNU C Compiler, 32 bit target CPU:
sizeof(STRUCT_PACKED) == 7
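The padding behavior can be verified with offsetof(). The offsets and sizes in the comments below assume GCC or Clang with typical 32-bit alignment rules and may differ on other compilers or ABIs:

```c
#include <stddef.h>
#include <stdint.h>
#include <assert.h>

typedef uint8_t  U8;
typedef uint16_t U16;
typedef uint32_t U32;

typedef struct {
    U8  a;   /* offset 0, followed by 3 padding bytes       */
    U32 b;   /* offset 4: aligned to its 4-byte boundary    */
    U16 c;   /* offset 8, followed by 2 trailing pad bytes  */
} STRUCT_NOT_PACKED;     /* total size: 12 */

typedef struct {
    U8  a;   /* offset 0                                    */
    U32 b;   /* offset 1: misaligned, slower to access      */
    U16 c;   /* offset 5                                    */
} __attribute__((packed)) STRUCT_PACKED;   /* total size: 7 */
```

Running the assertions below on a typical GCC/Clang host confirms both the inserted padding and its removal by the packed attribute.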
As we all know, RAM access is much slower than register access. If you know
beforehand that a certain variable will be used frequently, you would like it to be
stored in a register to speed up subsequent accesses.
One way to achieve such acceleration in C is to specify the storage class of that
variable as register instead of auto, so that the compiler will store the variable in a
register if one is available. However, such manual optimization has its limitations and
is usually unnecessary in practice. Instead, you can let the compiler figure out the
best register-allocation scheme by turning on the optimization switch; most
modern C compilers do a good job in this regard.
Let’s do some experiments to get an idea of how compilers optimize
memory access.
The results shown here were produced with a 32-bit Cygwin on a 64-bit x86
machine running Windows 10.
The object file format is pei-i386. I chose 32-bit Cygwin since most embedded
processors are 32 bits or fewer. If you run it with 64-bit Cygwin, the object file
format will be pei-x86-64, and the final assembly code will use 64-bit
instructions, but the same idea of optimization applies.
Check the generated assembly again after enabling optimization flags in your compiler.
Replacing memory access with register access can have harmful side effects,
which in turn lays the groundwork for inconsistencies between the debug build and the
release build.
Access Peripheral Registers
In hardware, peripheral registers are usually mapped into the memory space. The
following mixed use of const and volatile is recommended when these registers are
being operated on:
• Use volatile unsigned int * const to specify a register address: In practice,
unsigned int can also be replaced by U32 or U16. Here’s an example of such a
practice, where the control register is written, followed by waiting on the busy flag.
#define BUSY_FLAG (1 << 0)
...
volatile U32* const REG_CONTROL = (U32*) 0xABCD0000;
volatile U32* const REG_STATUS = (U32*) 0xABCD0004;
(*REG_CONTROL) = ... ; // correct statement
REG_CONTROL = ... ; // will produce compile error
while((*REG_STATUS) & BUSY_FLAG); // wait on flag
...
• Use const volatile type * to access a read-only data buffer: If the peripheral
exposes a buffer of read-only data to the CPU, a pointer of const volatile type can
be used to walk the data, with an additional sanity check from the compiler to
prevent inadvertent writes to the buffer.
const volatile U32 *p;
U32 data1, data2, data3;
volatile U32* const DATA_BUFFER = (U32*) 0xABCD0008;
p = DATA_BUFFER;
data1 = *p++;
data2 = *p++;
data3 = *p++;
(*p) = ...; // will produce compile error
• Use volatile type * to read/write a data buffer: If the peripheral exposes a
bidirectional buffer to the CPU, a pointer of volatile type * can be used to access
the data.
#define DATA_BUFFER 0xABCD0008
volatile U32 *p = (volatile U32 *) DATA_BUFFER;
U32 data;
data = *p;   /* read from the buffer */
*p = data;   /* write back: no compile error this time */
Developers should consider developing their drivers and their application code in
an object-oriented manner. The C programming language is not an object-oriented
programming language. C is a procedural programming language where the
primary focus is to specify a series of well-structured steps and procedures within
its programming context to produce a program. An object-oriented programming
language, on the other hand, is a programming language that focuses on the
definition of data and the operations that are performed on that data.
There are several characteristics that set an object-oriented programming language
apart from a procedural language. These include:
• Abstraction
• Encapsulation
• Objects
• Classes
• Inheritance
• Polymorphism
Despite C not being object-oriented, developers can still implement some concepts
in their application that will dramatically improve their software. While there are
ways to create classes, inheritance, and polymorphism in C, if these features are
required, developers would be better off just using C++. Applications can benefit
greatly from using abstractions and encapsulation.
OOP IN C EXAMPLE
#ifndef Sensor_H
#define Sensor_H
/*## class Sensor */
typedef struct Sensor Sensor;
struct Sensor {
int filterFrequency;
int updateFrequency;
int value;
};
int Sensor_getFilterFrequency(const Sensor* const me);
void Sensor_setFilterFrequency(Sensor* const me, int p_filterFrequency);
int Sensor_getUpdateFrequency(const Sensor* const me);
void Sensor_setUpdateFrequency(Sensor* const me, int p_updateFrequency);
int Sensor_getValue(const Sensor* const me);
Sensor * Sensor_Create(void);
void Sensor_Destroy(Sensor* const me);
#endif /* Sensor_H */

/* Sensor.c */
#include <stdlib.h>
#include "Sensor.h"
void Sensor_Init(Sensor* const me) {
}
void Sensor_Cleanup(Sensor* const me) {
}
int Sensor_getFilterFrequency(const Sensor* const me) {
return me->filterFrequency;
}
void Sensor_setFilterFrequency(Sensor* const me, int p_filterFrequency) {
me->filterFrequency = p_filterFrequency;
}
int Sensor_getUpdateFrequency(const Sensor* const me) {
return me->updateFrequency;
}
void Sensor_setUpdateFrequency(Sensor* const me, int p_updateFrequency) {
me->updateFrequency = p_updateFrequency;
}
int Sensor_getValue(const Sensor* const me) {
return me->value;
}
Sensor * Sensor_Create(void) {
Sensor* me = (Sensor *) malloc(sizeof(Sensor));
if(me!=NULL)
{
Sensor_Init(me);
}
return me;
}
void Sensor_Destroy(Sensor* const me) {
if(me!=NULL)
{
Sensor_Cleanup(me);
}
free(me);
}
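A short usage sketch follows, condensed here so it is self-contained; in the real layout the struct and functions live in Sensor.h and Sensor.c as shown above.

```c
#include <stdlib.h>
#include <assert.h>

/* Condensed Sensor "class" from the example above. */
typedef struct Sensor {
    int filterFrequency;
    int updateFrequency;
    int value;
} Sensor;

Sensor *Sensor_Create(void) { return (Sensor *)calloc(1, sizeof(Sensor)); }
void Sensor_Destroy(Sensor *const me) { free(me); }
void Sensor_setFilterFrequency(Sensor *const me, int f) { me->filterFrequency = f; }
int  Sensor_getFilterFrequency(const Sensor *const me)  { return me->filterFrequency; }

/* Typical client code: create, use through the API, destroy. */
int sensor_demo(void) {
    Sensor *s = Sensor_Create();
    if (s == NULL) return -1;          /* always check the allocation */
    Sensor_setFilterFrequency(s, 50);
    int f = Sensor_getFilterFrequency(s);
    Sensor_Destroy(s);
    return f;
}
```

Note that the client never touches the struct members directly; everything goes through the Sensor_* functions, which is exactly the encapsulation the section argues for.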
Polymorphism and Virtual Functions
Polymorphism is a valuable feature of object-oriented languages. It allows for the
same function name to represent one function in one context and another function
in a different context. In practice, this means that when either the static or dynamic
context of an element changes, the appropriate operation can be called.
One common approach uses a switch statement over a type tag; another uses
function pointers stored in the struct, as the Queue example below demonstrates.
Please refer to a book on object-oriented programming in C for a deeper treatment.
Inheritance
Inheritance models an “is-a” relationship, much like a parent-child relationship: a
Car extends Vehicle, for example. In C, it can be emulated by embedding the base
struct inside the derived struct, as the CachedQueue example below shows.
#ifndef QUEUE_H_
#define QUEUE_H_
#define QUEUE_SIZE 10
/* class Queue */
typedef struct Queue Queue;
struct Queue {
int buffer[QUEUE_SIZE]; /* where the data things are */
int head;
int size;
int tail;
int (*isFull)(Queue* const me);
int (*isEmpty)(Queue* const me);
int (*getSize)(Queue* const me);
void (*insert)(Queue* const me, int k);
int (*remove)(Queue* const me);
};
/* Constructors and destructors:*/
void Queue_Init(Queue* const me,int (*isFullfunction)(Queue* const me),
int (*isEmptyfunction)(Queue* const me),
int (*getSizefunction)(Queue* const me),
void (*insertfunction)(Queue* const me, int k),
int (*removefunction)(Queue* const me) );
void Queue_Cleanup(Queue* const me);
/* Operations */
int Queue_isFull(Queue* const me);
int Queue_isEmpty(Queue* const me);
int Queue_getSize(Queue* const me);
void Queue_insert(Queue* const me, int k);
int Queue_remove(Queue* const me);
Queue * Queue_Create(void);
void Queue_Destroy(Queue* const me);
#endif /*QUEUE_H_*/
#include <stdio.h>
#include <stdlib.h>
#include "queue.h"
void Queue_Init(Queue* const me,int (*isFullfunction)(Queue* const me),
int (*isEmptyfunction)(Queue* const me),
int (*getSizefunction)(Queue* const me),
void (*insertfunction)(Queue* const me, int k),
int (*removefunction)(Queue* const me) ){
/* initialize attributes */
me->head = 0;
me->size = 0;
me->tail = 0;
/* initialize member function pointers */
me->isFull = isFullfunction;
me->isEmpty = isEmptyfunction;
me->getSize = getSizefunction;
me->insert = insertfunction;
me->remove = removefunction;
}
/* operation Cleanup() */
void Queue_Cleanup(Queue* const me) {
}
/* operation isFull() */
int Queue_isFull(Queue* const me){
return (me->head+1) % QUEUE_SIZE == me->tail;
}
/* operation isEmpty() */
int Queue_isEmpty(Queue* const me){
return (me->head == me->tail);
}
/* operation getSize() */
int Queue_getSize(Queue* const me) {
return me->size;
}
/* operation insert(int) */
void Queue_insert(Queue* const me, int k) {
if (!me->isFull(me)) {
me->buffer[me->head] = k;
me->head = (me->head+1) % QUEUE_SIZE;
++me->size;
}
}
/* operation remove */
int Queue_remove(Queue* const me) {
int value = -9999; /* sentinel value */
if (!me->isEmpty(me)) {
value = me->buffer[me->tail];
me->tail = (me->tail+1) % QUEUE_SIZE;
--me->size;
}
return value;
}
Queue * Queue_Create(void) {
Queue* me = (Queue *) malloc(sizeof(Queue));
if(me!=NULL)
{
Queue_Init(me, Queue_isFull, Queue_isEmpty, Queue_getSize,
Queue_insert, Queue_remove);
}
return me;
}
void Queue_Destroy(Queue* const me) {
if(me!=NULL)
{
Queue_Cleanup(me);
}
free(me);
}
#ifndef CACHEDQUEUE_H_
#define CACHEDQUEUE_H_
#include "queue.h"
typedef struct CachedQueue CachedQueue;
struct CachedQueue {
Queue* queue; /* base class */
/* new attributes */
char filename[80];
int numberElementsOnDisk;
/* aggregation in subclass */
Queue* outputQueue;
/* inherited virtual functions */
int (*isFull)(CachedQueue* const me);
int (*isEmpty)(CachedQueue* const me);
int (*getSize)(CachedQueue* const me);
void (*insert)(CachedQueue* const me, int k);
int (*remove)(CachedQueue* const me);
/* new virtual functions */
void (*flush)(CachedQueue* const me);
void (*load)(CachedQueue* const me);
};
#endif /*CACHEDQUEUE_H_*/
How does the RTOS achieve multitasking? Each thread in an RTOS has a
dedicated private context in RAM, consisting of a private stack area and a thread
control block (TCB). The context for every thread must be that big because, in
sequential code, the context must remember the whole nested function call
tree and the exact place in the code, that is, the program counter. In a Blink thread,
for example, the contexts of two calls to RTOS_delay() will have identical
call stacks but will differ in the value of the program counter (PC). Every time a
thread makes a blocking call, such as RTOS_delay(), the RTOS saves the CPU
registers on that thread's stack and updates its TCB. The RTOS then finds the next
thread to run, in a process called scheduling. Finally, the RTOS restores the CPU
registers from that next thread's stack. At this point the next thread resumes
execution and becomes the current thread. The whole context-switch process is
typically coded in CPU-specific assembly language and takes a few microseconds
to complete.
Compared to a “superloop”, an RTOS kernel brings a number of very important
benefits:
1. It provides a “divide-and-conquer” strategy, because it allows you to partition
your application into multiple threads. Each of these threads is much easier to
develop and maintain than one “kitchen sink” superloop.
2. Threads that wait for events are efficiently blocked and don't consume CPU
cycles. This is in contrast to the wasteful polling loops often used in a superloop.
3. Certain schedulers, most notably preemptive, priority-based schedulers, can
execute your application such that the timing of high-priority threads is
insensitive to changes in low-priority threads (if the threads don't share
resources). This is because under these conditions, high-priority threads can
always preempt lower-priority threads. This enables you to apply formal timing
analysis methods, such as Rate Monotonic Analysis (RMA), which can guarantee
that (under certain conditions) all your higher-priority threads will meet their
deadlines.
EVENT BASED SYSTEM
Remember Windows Event Loop?
(Figure: the JavaScript event loop, taken from the Internet.)
There are three common design mechanisms for moving data between a peripheral
and the application:
• Polling
• Interrupts
• Direct memory access (DMA)
Each of these design mechanisms, in turn, has several design patterns that can
be used to ensure data loss is not encountered. Let’s explore these mechanisms
now.
Peripheral Polling
The most straightforward design mechanism to collect data from a peripheral is to
simply have the application poll the peripheral periodically to see if any data is
available to manage and process.
Unfortunately, there are more disadvantages to using polling than there are
advantages. First, polling tends to waste processing cycles. Developers must
allocate processor time to go out and check on the peripheral whether there is data
there or not. In a resource-constrained or low-power system, these cycles can add
up significantly. Second, there can be a lot of jitter and latency in processing the
peripheral depending on how the developers implement their code.
For example, if developers decide that they are going to create a while loop that
just sits and waits for data, they can get the data very consistently with low latency
and jitter, but it comes at the cost of a lot of wasted CPU cycles. Waiting in this
manner is polling using blocking. On the other hand, developers can use a
nonblocking method in which other code executes between polls; but if the data
arrives while that other code is running, there can be a delay before the
application gets to the data, adding latency. Furthermore, if the data comes in at
nonperiodic rates, the latency can vary, causing jitter in the processing time.
Depending on the application, the jitter may or may not affect other parts of the
embedded system or cause system instability.
Despite the disadvantages of polling, sometimes polling is just the best solution. If
a system doesn’t have much going on and it doesn’t make sense to add the
complexity of interrupts, then why add them? Debugging a system that uses
interrupts is often much more complicated. If polling fits, then it might be the right
solution; however, if the system needs to minimize response times and latency, or
must wake up from low-power states, then interrupts might be a better solution.
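The blocking and nonblocking polling styles described above can be sketched as follows. The peripheral here is simulated with two static variables so the example is self-contained; on real hardware, uart_data_ready() and uart_read_byte() (hypothetical names) would be volatile reads of status and data registers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simulated peripheral state for illustration only. */
static uint8_t sim_data;
static bool    sim_ready;

static bool    uart_data_ready(void) { return sim_ready; }
static uint8_t uart_read_byte(void)  { sim_ready = false; return sim_data; }

/* Blocking poll: lowest latency and jitter, but every cycle spent
 * spinning in the while loop is wasted CPU time. */
uint8_t read_byte_blocking(void)
{
    while (!uart_data_ready()) {
        /* spin until data arrives */
    }
    return uart_read_byte();
}

/* Nonblocking poll: returns immediately so other code can run, at the
 * cost of added (and variable) latency before the data is noticed. */
bool read_byte_nonblocking(uint8_t *out)
{
    if (!uart_data_ready()) {
        return false;   /* nothing yet; caller does other work */
    }
    *out = uart_read_byte();
    return true;
}
```

The trade-off in the text is visible directly in the code: the blocking version never misses data but monopolizes the CPU, while the nonblocking version frees the CPU but sees the data only on its next pass.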
Peripheral Interrupts
Interrupts are a fantastic tool available to designers and developers to overcome
many disadvantages polling presents. Interrupts do precisely what their name
implies; they interrupt the normal flow of the application to allow an interrupt
handler to run code to handle an event that has occurred in the system. For
example, an interrupt might fire for a peripheral when data is available, has been
received, or even transmitted.
The following Figure shows an example sequence diagram for what we can expect
from an interrupt design.
The advantages of using an interrupt are severalfold. First, there is no need to
waste CPU cycles checking to see if data is ready. Instead, an interrupt fires when
there is data available. Next, the latency to get the data is deterministic. It takes the
same number of clock cycles when the interrupt fires to enter and return from the
interrupt service routine (ISR). The latency for a lower priority interrupt can vary
though if a higher priority interrupt is running or interrupts it during execution.
Finally, jitter is minimized and only occurs if multiple interrupts are firing
simultaneously. In this case, the interrupt with the highest priority gets executed
first. The jitter can potentially become worse as well if the interrupt fires when an
instruction is executing that can’t be interrupted.
Despite interrupts solving many problems associated with polling, there are still
some disadvantages to using interrupts. First, interrupts can be complicated to set
up. While interrupt usage increases complexity, the benefits usually overrule this
disadvantage. Next, designers must be careful not to use interrupts that fire too
frequently. For example, trying to use interrupts to debounce a switch can cause an
interrupt to fire very frequently, potentially starving the main application and
breaking its real-time performance. Finally, when interrupts are used to receive
data, developers must carefully manage what they do in the ISR. Every clock cycle
spent in the ISR is a clock cycle away from the application. As a result, developers
often need to use the ISR to handle the immediate action required and then offload
processing and non-urgent activities to the application, causing software design
complexity to increase.
There’s also a good chance that the data just received needs to be combined
with past or future data to be useful. We can’t do all those operations in a timely
manner within an interrupt. We are much better served by saving the data and
notifying the application that data is ready to be processed. When this happens, we
need to reach for design patterns that allow us to get the data quickly, store it, and
get back to the main application as soon as possible.
Designers can leverage several such patterns used on bare-metal and RTOS-
based systems. A few of the most exciting patterns include:
• Linear data store
• Ping-pong buffers
• Circular buffers
• Circular buffer with semaphores
• Circular buffer with event flags
• Message queues
Now, if you’ve been designing and developing embedded software for any
period, you’ll realize that linear data stores can be dangerous! Linear data stores
are where we often encounter race conditions because access to the data store
needs to be carefully managed so that the ISR and application aren’t trying to read
and write from the data store simultaneously. In addition, the variables used to
share the stored data between the application and the ISR need to be declared
volatile; because an interrupt can change them at any point in the application's
flow, the compiler must not optimize away the reads and writes.
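A minimal linear data store might look like the sketch below. The names are illustrative; the "ISR" is simulated by an ordinary function call so the example is self-contained, and the critical-section calls are shown as comments because their form depends on the target.

```c
#include <stdbool.h>
#include <stdint.h>

/* Shared between ISR and application: both volatile, because the ISR
 * can change them at any point in the application's execution. */
static volatile uint16_t adc_sample;   /* the linear data store */
static volatile bool     sample_ready; /* set by ISR, cleared by app */

/* Called from the ADC interrupt (simulated here by a plain call). */
void adc_isr(uint16_t raw)
{
    adc_sample   = raw;
    sample_ready = true;
}

/* Application-side read. On real hardware, the commented calls would
 * briefly disable the interrupt so the copy cannot be torn mid-read. */
bool sample_get(uint16_t *out)
{
    if (!sample_ready) {
        return false;
    }
    /* disable_irq();  -- hypothetical critical-section entry */
    *out = adc_sample;
    sample_ready = false;
    /* enable_irq();   -- hypothetical critical-section exit  */
    return true;
}
```

The critical section is exactly the race-condition window the text warns about: without it, the ISR could overwrite adc_sample between the flag test and the copy.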
Data stores often require the designer to build mutual exclusion into the data store.
Mutual exclusion is needed because data stores have a critical section where if the
application is partway through reading the data when an interrupt fires and changes
it, the application can end up with corrupt data. We don’t care how the developers
implement the mutex at the design level, but we need to make them aware that the
mutex exists. I often do this by putting a circular symbol on the data store
containing either an “M” for a mutex or a key symbol, as shown in the following
Figure. Unfortunately, at this time, there is no standard that defines an official
nomenclature for representing a mutex.
Now, at first, having two data stores might seem like an opportunity just to double
the trouble, but it’s a potential race condition saver. A ping-pong buffer is so
named because the data buffers are used back and forth in a ping-pong-like
manner. For example, at the start of the application, both buffers are writable,
and the ISR stores incoming data in the first data store. When the ISR is done
and the data is ready for the application code to read, it marks that data store
as ready to read. While the application reads that data, the ISR stores any
additional incoming data in the second data store. The process then repeats.
Circular Buffer Design Pattern
One of the simplest and most used patterns to get and use data from an interrupt is
to leverage a circular buffer. A circular buffer is a data structure that uses a single,
fixed-size buffer as if it were connected end to end. Circular buffers are often
represented as a ring, as shown in the following Figure. Microcontroller memory is
not circular but linear. When we build a circular buffer in code, we specify the start
and stop addresses, and once the stop address is reached, we loop back to the
starting address.
The idea with the circular buffer is that the real-time data we receive in the
interrupt can be removed from the peripheral and stored in a circular buffer. As a
result, the interrupt can run as fast as possible while allowing the application code
to process the circular buffer at its discretion. Using a circular buffer helps ensure
that data is not lost, the interrupt is fast, and we still process the data reasonably.
The most straightforward design pattern for a circular buffer can be seen in Figure
5-7. In this pattern, we are simply showing how data moves from the peripheral to
the application. The data starts in the peripheral, is handled by the ISR, and is
stored in a circular buffer. The application can come and retrieve data from the
circular buffer when it wants to. Of course, the circular buffer needs to be sized
appropriately, so the buffer does not overflow.
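A minimal circular buffer along these lines can be sketched as below. The head/tail naming is conventional rather than taken from the text, and a hardware version would typically disable the relevant interrupt around the application-side read if the index type were not updated atomically.

```c
#include <stdbool.h>
#include <stdint.h>

#define CB_SIZE 16   /* must be sized so the buffer does not overflow */

/* Linear memory treated as a ring: indices wrap back to the start
 * once they reach the end of the array. */
typedef struct {
    uint8_t           data[CB_SIZE];
    volatile uint16_t head;   /* ISR writes here */
    volatile uint16_t tail;   /* app reads here  */
} CircularBuffer;

/* Producer side, typically called from the ISR. */
bool cb_put(CircularBuffer *cb, uint8_t byte)
{
    uint16_t next = (uint16_t)((cb->head + 1) % CB_SIZE);
    if (next == cb->tail) {
        return false;             /* full: caller must handle overflow */
    }
    cb->data[cb->head] = byte;
    cb->head = next;              /* wrap handled by the modulo above  */
    return true;
}

/* Consumer side, called by the application at its discretion. */
bool cb_get(CircularBuffer *cb, uint8_t *out)
{
    if (cb->tail == cb->head) {
        return false;             /* empty */
    }
    *out = cb->data[cb->tail];
    cb->tail = (uint16_t)((cb->tail + 1) % CB_SIZE);
    return true;
}
```

The ISR stays fast because cb_put only copies one byte and advances an index; all heavier processing happens later, when the application drains the buffer.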
Using a semaphore is not the only method to signal the application that data is
ready to be processed. Another approach is to replace the semaphore with an event
flag. An event flag is an individual bit that is usually part of an event flag group
that signals when an event has occurred. Using an event flag is more efficient in
most real-time operating systems than using a semaphore. For example, a designer
can have 32 event flags in a single RAM location on an Arm Cortex-M processor.
In contrast, just a single semaphore with its semaphore control block can easily be
a few hundred bytes.
Semaphores are often overused in RTOS applications because developers jump
straight into the coding and often don’t take a high-level view of the application.
The result is a bunch of semaphores scattered throughout the system. I’ve also
found that developers are less comfortable with event flags because they aren’t
covered or discussed as often in classes or engineering literature.
An example design pattern for using event flags and interrupts can be seen in
Figure. We can represent an event flag version of a circular buffer with a
notification design pattern. As you can see, the pattern itself does not change, just
the tool we use to implement it. The implementation here results in fewer clock
cycles being used and less RAM.
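The space efficiency of event flags follows directly from their representation: one 32-bit word can carry 32 independent events. The sketch below is deliberately RTOS-agnostic; a real kernel (e.g., FreeRTOS event groups) sets the bits atomically and lets the waiting thread block instead of polling, and all names here are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

/* A bare-bones event-flag group: 32 events in one 32-bit word, which
 * is why a flag group is far cheaper than one semaphore per event. */
static volatile uint32_t event_flags;

#define EVENT_UART_RX  (1u << 0)   /* illustrative event names */
#define EVENT_ADC_DONE (1u << 1)

/* ISR side: signal that an event occurred. */
void event_set(uint32_t mask)
{
    event_flags |= mask;   /* a real kernel does this atomically */
}

/* Application side: consume an event if it is pending. A real RTOS
 * call would block the thread here instead of returning false. */
bool event_take(uint32_t mask)
{
    if ((event_flags & mask) == 0) {
        return false;
    }
    event_flags &= ~mask;
    return true;
}
```

Pairing event_set() with the circular buffer's cb_put() in the ISR gives the "circular buffer with event flags" pattern: the flag tells the application that the buffer has data worth draining.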
Struct Overlays
The following example code shows a struct overlay for a timer peripheral. If a
peripheral's registers do not align correctly, reserved members can be included in
the struct. Thus, in the following example, an extra field that you'll never refer to is
included at offset 4 so that the control field lies properly at offset 6.
typedef struct
{
uint16_t count; /* Offset 0 */
uint16_t maxCount; /* Offset 2 */
uint16_t _reserved1; /* Offset 4 */
uint16_t control; /* Offset 6 */
} volatile timer_t;
It is very important to be careful when creating a struct overlay to ensure that the
sizes and addresses of the underlying peripheral's registers map correctly.
The bitwise operators shown earlier to test, set, clear, and toggle bits can also be
used with a struct overlay. The following code shows how to access the timer
peripheral's registers using the struct overlay. For example, assuming pTimer
points at the peripheral's base address, the following sets bit 4 and toggles
bit 7 of the control register:
pTimer->control |= 0x10; /* set bit 4 */
pTimer->control ^= 0x80; /* toggle bit 7 */
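To make the overlay concrete, here is a sketch of how the pointer might be declared. On real hardware you would cast the peripheral's base address from the datasheet; here, so the example is self-contained and testable (and to avoid clashing with the POSIX name timer_t), the struct is renamed timer_reg_t and backed by an ordinary RAM instance. The address shown in the comment is hypothetical.

```c
#include <stdint.h>

typedef struct
{
    uint16_t count;      /* Offset 0 */
    uint16_t maxCount;   /* Offset 2 */
    uint16_t _reserved1; /* Offset 4: padding so control lands at 6 */
    uint16_t control;    /* Offset 6 */
} volatile timer_reg_t;

/* On real hardware the base address comes from the datasheet, e.g.:
 *     #define TIMER_BASE 0x40001000u            (hypothetical)
 *     timer_reg_t * const pTimer = (timer_reg_t *)TIMER_BASE;
 * For illustration, back the overlay with RAM instead. */
static timer_reg_t timer_instance;
timer_reg_t * const pTimer = &timer_instance;
```

The pointer itself is const (it must never be redirected away from the peripheral), while the struct members are volatile, because the hardware can change them behind the compiler's back.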
Design by Contract
Design-by-contract is a methodology developers can use to specify the
pre-conditions, post-conditions, side effects, and invariants associated with an
interface. Every component then has a contract that must be adhered to in order
for the component to integrate into the application successfully.
As developers, we must examine a component’s inputs, outputs, and the work (the
side effects) that will be performed. The pre-conditions describe what conditions
must already exist within the system prior to performing an operation with the
component.
For example, a GPIO pin state cannot be toggled unless it first has the GPIO clock
enabled. Enabling the clock would be a pre-condition or a pre-requisite for the
GPIO component. Failing to meet this condition would result in nothing happening
when a call to perform a GPIO operation occurs.
A side effect is basically just that: something in the system changes. Maybe a
memory region is written or read, an I/O state is altered, or data is simply returned.
Something useful happens by interacting with the component’s interface. The
resulting side effect then produces post-conditions that a developer can
expect. The system state has changed into a desired state.
Finally, the outputs for the component are extracted. Perhaps the interface returns
a success or a failure flag—maybe even an error code. Something is returned to let
the caller know that everything proceeded as expected and the resulting side effect
should now be observable.
Assertions
The best definition for an assertion that I have come across is
“An assertion is a Boolean expression at a specific point in a program that will
be true unless there is a bug in the program.”
There are three essential points that we need to note about the preceding
definition:
1. An assertion evaluates an expression as either true or false.
2. An assertion is an assumption about the state of the system at a specific
point in the code.
3. An assertion validates a system assumption that, if not true, reveals a
bug in the code. (It is not an error handler!)
As you can see from the definition, assertions are particularly useful for verifying
the contract for an interface, function, or module.
Each programming language that supports assertions does so in a slightly different
manner. For example, in C/C++, assertions are implemented as a macro named
assert that can be found in assert.h. There is quite a bit to know about assertions,
which we will cover in this chapter, but before we move on to everything we can
do with assertions, let’s first look at how they can be used in the context of Design-
by-Contract. First, take a moment to review the contract that we specified in the
documentation in the following Listing
The Doxygen documentation for a function can specify the conditions in the
Design-by-Contract for that interface call
/*************************************************************
* Function : Dio_Init()
*//**
* \b Description:
*
* This function is used to initialize the Dio based on the
* configuration table defined in dio_cfg module.
* PRE-CONDITION: Configuration table needs to be populated
* (sizeof > 0) <br>
* PRE-CONDITION: NUMBER_OF_CHANNELS_PER_PORT > 0 <br>
* PRE-CONDITION: NUMBER_OF_PORTS > 0 <br>
* PRE-CONDITION: The MCU clocks must be configured and
* enabled.
*
* POST-CONDITION: The DIO peripheral is setup with the
* configuration settings.
*
* @param[in] Config is a pointer to the configuration
* table that contains the initialization for the peripheral.
*
* @return void
*
* \b Example:
* @code
* const DioConfig_t *DioConfig = Dio_ConfigGet();
* Dio_Init(DioConfig);
* @endcode
*
* @see Dio_Init
* @see Dio_ChannelRead
* @see Dio_ChannelWrite
* @see Dio_ChannelToggle
* @see Dio_ChannelModeSet
* @see Dio_ChannelDirectionSet
* @see Dio_RegisterWrite
* @see Dio_RegisterRead
* @see Dio_CallbackRegister
*************************************************************/
Documentation can be an excellent way to create a contract between the interface
and the developer. However, it suffers from one critical defect; the contract cannot
be verified by executable code. As a result, if a developer doesn’t bother to read
the documentation or pay close attention to it, they can violate the contract, thus
injecting a bug into their source code that they may or may not be aware of. At
some point, the bug will rear its ugly head, and the developer will likely need to
spend countless hours hunting down their mistake. Various programming
languages deal with Design-by-Contract concepts differently, but for embedded
developers working in C/C++, we can take advantage of a built-in language feature
known as assertions. Assertions can be used within the Dio_Init function to verify
that the contract specified in the documentation is met. For example, if you were to
write the function stub for Dio_Init and include the assertions, it would look
something like the following Listing. As you can see, for each precondition, you
have an assertion. We could also use assertions to perform the checks on the
postcondition and invariants. This is a bit stickier for the digital I/O example
because we may have dozens of I/O states that we want to verify, including pin
multiplexing. I will leave this up to you to consider and work through some
example code on your own.
Listing An example contract definition using assertions in C
void Dio_Init(DioConfig_t const * const Config)
{
assert(Config != NULL); /* sizeof a pointer is always > 0, so test the pointer */
assert(NUMBER_OF_CHANNELS_PER_PORT > 0);
assert(NUMBER_OF_PORTS > 0);
assert(Mcu_ClockState(GPIO) == true);
/* TODO: Define the implementation */
}
Defining Assertions
The use of assertions goes well beyond creating an interface contract. Assertions
are interesting because they are pieces of code developers add to their applications
to verify assumptions and detect bugs directly. When an assertion is found to be
true, there is no bug, and the code continues to execute normally. However, if the
assertion is found to be false, the assertion calls to a function that handles the failed
assertion.
Each compiler and toolchain tend to implement the assert macro slightly
differently. However, the result is the same. The ANSI C standard dictates how
assertions must behave, so the differences are probably not of interest if you use an
ANSI C–compliant compiler. I still find it interesting to look and see how each
vendor implements it, though. For example, The next Listing shows how the
STM32CubeIDE toolchain defines the assert macro. Also the next Listing
demonstrates how Microchip implements it in MPLAB X. Again, the result is the
same, but how the developer maps what the assert macro does if the result is false
will be different.
Next, when we are using the full definition for assert, you’ll notice that there is an
assertion failed function that is called. Again, the developer defines the exact
function but is most likely different between compilers. For STM32CubeIDE, the
function is __assert_func, while for MPLAB X the function is __assert_fail.
Several key results occur when an assertion fails, which include
1. Collecting the filename and line number where the assertion failed
2. Printing out a notification to the developer that the assertion failed and where it
occurred
3. Stopping program execution so that the developer can take a closer look
The assertion tells the developer where the bug was detected and halts the program
so that they can review the call path and memory states and determine what exactly
went wrong in the program execution. This is far better than waiting for the bug to
rear its head in how the system behaves, which may occur instantly or take
considerable time.
When and Where to Use Assertions
The assert macro is an excellent tool to catch bugs as they occur. That also
preserves the call stack and the system in the state it was in when the assertion
failed. This helps us to pinpoint the problem much faster, but where does it make
sense to use assertions? Let’s look at a couple proper and improper uses for
assertions.
First, it’s important to note that we can use assertions anywhere in our program
where we want to test an assumption, meaning assertions could be found just about
anywhere within our program. Second, I’ve found that one of the best uses for
assertions is verifying function preconditions and postconditions, as discussed
earlier in the chapter. Still, they can also be “sprinkled” throughout the function
code.
As a developer writes their drivers and application code, in every function, they
analyze what conditions must occur before this function is executed for it to run
correctly. They then develop a series of assertions to test those preconditions. The
preconditions become part of the documentation and form a contract with the caller
on what must be met for everything to go smoothly.
A great example of this is a function that changes the state of a system variable in
the application. For example, an embedded system may be broken up into several
different system states that are defined using an enum. These states would then be
passed into a function like System_StateSet to change the system's operational
state, as shown in the following Listing
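The referenced listing is not reproduced here, but a sketch of such a function follows. The state names and the System_StateSet/System_StateGet signatures are illustrative; the point is the assertion guarding the precondition that the requested state is a valid member of the enum.

```c
#include <assert.h>

/* Hypothetical system states for illustration. */
typedef enum {
    SYSTEM_STATE_BOOT,
    SYSTEM_STATE_RUN,
    SYSTEM_STATE_LOW_POWER,
    SYSTEM_STATE_MAX_STATES  /* bookkeeping value, not a real state */
} SystemState_t;

static SystemState_t system_state = SYSTEM_STATE_BOOT;

/* Precondition (the contract): the requested state must be a valid
 * enum member. Any other value is a bug in the caller, not an error
 * to be handled at runtime. */
void System_StateSet(SystemState_t state)
{
    assert(state < SYSTEM_STATE_MAX_STATES);
    system_state = state;
}

SystemState_t System_StateGet(void)
{
    return system_state;
}
```

If a caller ever passes a corrupted or out-of-range value, the assertion fires at the moment of the violation, which is exactly the early-detection behavior the chapter advocates.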
Implementing assert_failed
Once we have found how the assertion is implemented, we need to create the
definition for the function. assert.h makes the declaration, but nothing useful will
come of it without defining what that function does. There are four things that we
need to do, which include
Copy the declaration from assert.h and paste it into a source module
Turn the new declaration into a function definition
Output something so that the developer knows the assertion failed
Stop the program from executing
For a developer using Keil MDK, their assertion failed function would look
something like the code in Listing
void __aeabi_assert(const char *expr, const char *file,
int line)
{
Uart_printf(UART1, "Assert failed in %s at line %d\n",
file, line);
}
we copied and pasted the declaration from assert.h and turned it into a function
definition. (Usually, you can right-click the function call in the macro, which will
take you to the official declaration. You can just copy and paste this instead of
stumbling to define it yourself.) The function, when executed, will print out a
message through one of the microcontrollers' UARTs to notify the developer that
an assertion failed. A typical printout notifies the developer of the file in
which the assertion failed and the line number. This tells the developer exactly
where the problem is.
This brings us to an interesting point. You can create very complex-looking
assertions that test for multiple conditions within a single assert, but then you’ll
have to do a lot more work to determine what went wrong. I prefer to keep my
assertions simple, checking a single condition within each assertion. There are
quite a few advantages to this, such as
First, it’s easier to figure out which condition failed.
Second, the assertions become clear, concise documentation for how the function
should behave.
Third, maintaining the code is more manageable.
Finally, once we have notified the developer that something has gone wrong, we
want to stop program execution. There are several ways to do this. First,
and perhaps the method I see the most, is just to use an empty while (true) or for(;;)
statement. At this point, the system “stops” executing any new code and just sits in
a loop. This is okay to do, but from an IDE perspective, it doesn’t show the
developer that something went wrong. If my debugger can handle flash
breakpoints, I prefer to place a breakpoint in this function, or I’ll use the assembly
instruction __BKPT to halt the processor. At that point, the IDE will stop and
highlight the line of code. Using __BKPT can be seen in the following Listing
void __aeabi_assert(const char *expr, const char *file,
int line)
{
Uart_printf(UART1, "Assert failed in %s at line %d\n",
file, line);
__asm("BKPT");
}
As discussed earlier, you want to be able to detect a bug the moment that it
occurs, but you also don’t want to break the system or put it into an unsafe state.
Real-time assertions are essentially standard assertions, except that they don't
use a “BKPT” instruction or an infinite loop to stop program execution. Instead,
the assertion needs to record the failure and report it without halting the
system.
There are many ways developers can do this, but let's look at four tips that
should aid you when getting started with real-time assertions.
You’ll notice in the Listing that I’ve also started adding conditional compilation
statements to define different ways the assertion function can behave. For example,
if ASSERT_UART is true, we just print the standard assertion text to the UART.
Otherwise, we call Log_Append, which will store additional information and log
the details in another manner. Once the log information is stored in RAM, there
would be some task in the main application that would periodically store the RAM
log to nonvolatile memory such as flash, an SD card, or other media.
The simplest techniques tend to not be reusable or portable, while the more
complex techniques are. There are several memory-mapping techniques that are
commonly used in driver design.
These methods include the following:
• Direct memory mapping
• Using pointers
• Using structures
• Using pointer arrays
Let’s examine the different methods that can be used to map a driver to memory.
Mapping Memory Directly
Once a developer has thought through the different driver models that can be used
to control the microcontroller peripherals, it is time to start writing code. There are
multiple techniques that a developer could use to map their driver into the
peripherals’ memory space, such as directly writing registers or using pointers,
structures, or pointer arrays.
The simplest technique to use—and the least reusable—is to write directly to a
peripheral’s register. For example, let’s say that a developer wants to configure
GPIO Port C. In order to set up and read the port, a developer can examine the
register definition file, find the correct identifier, and then write code similar to that
seen in Figure
PORTC_SET_PIN_2();
Writing code in this manner is very manual and labor intensive. The code is written
for a single and very specific setup. The code can be ported, but there are
opportunities for the wrong values to be written, which can lead to a bug and then
time spent debugging. Very simple applications that won’t be reused often use this
direct register write method for setting up and controlling peripherals. Directly
writing to registers in this manner is also fast and efficient, and it doesn’t require a
lot of flash space.
While directly writing to registers can be useful, the technique is often employed
for software that will not be reused or that is written on a very resource-constrained
embedded system, such as a simple 8-bit microcontroller. A technique that is
commonly used when reuse is necessary is to use pointers to map into memory. An
example declaration to map into the GPIO Port C register—let’s say it’s the data
register—can be seen in Figure
A PROBLEM?!
In order to resolve this issue, developers need to use the volatile keyword.
Volatile essentially tells the compiler that the data being read can change out of
sequence at any time without any code changing the value. There are three places
that volatile is typically used:
• Variables that are being mapped to hardware registers
• Data being shared between interrupt service routines and application
code
• Data being shared between multiple threads
Volatile basically tells the compiler to not optimize out the read but instead make
sure that the data stored in the memory location is read every time the variable is
encountered.
With the volatile keyword in the correct place, we now know the compiler won’t
optimize out reading the variable. However, there is still a problem with the
declaration as it has been written. Take a moment to examine the code shown
in Figure.
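The figure is not reproduced here, but the remaining problem is conventionally where the qualifiers sit in the declaration: the pointed-to data must be volatile, and the pointer itself should be const so it can never be redirected away from the register. A sketch, using a hypothetical register address, of the conventional form:

```c
#include <stdint.h>

/* Hypothetical GPIO Port C data register address from a datasheet. */
#define GPIOC_PDOR_ADDR (0x400FF080u)

/* Read right to left: pGpioCData is a constant pointer to a volatile
 * uint32_t. The data behind it can change at any time (volatile), but
 * the pointer itself must never be reassigned (const). */
uint32_t volatile * const pGpioCData =
        (uint32_t volatile *)GPIOC_PDOR_ADDR;
```

Dereferencing this pointer only makes sense on the actual hardware; off-target, the declaration itself still compiles and documents the intent.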
According to the manual, USUB8 subtracts each of the four bytes of val2 from the
corresponding bytes of val1 in a single instruction. That means we work on 4
bytes at a time, which is multiple data, and that enhances the performance a lot,
as we will see.
The example is shown in the following figure:
The value 0x64 is 100 in decimal. As you can see, we process 4 bytes each time
the loop executes, hence the division by 4. Recall the image representation
A8R8G8B8, in which each channel consists of one byte. So the value 100 is
subtracted from each channel of each pixel using that instruction.
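A sketch of the kind of loop the figure shows is below. On a Cortex-M4 build, CMSIS provides the real intrinsic uint32_t __USUB8(uint32_t, uint32_t); to keep this example runnable off-target, a portable helper emulates its per-byte behavior (each byte of the second operand subtracted from the matching byte of the first, wrapping modulo 256). The function names are illustrative.

```c
#include <stdint.h>

/* Off-target emulation of the Cortex-M4 USUB8 instruction. On target,
 * replace calls to usub8() with the CMSIS __USUB8() intrinsic. */
static uint32_t usub8(uint32_t op1, uint32_t op2)
{
    uint32_t result = 0;
    for (int i = 0; i < 4; i++) {
        uint8_t a = (uint8_t)(op1 >> (8 * i));
        uint8_t b = (uint8_t)(op2 >> (8 * i));
        result |= (uint32_t)((uint8_t)(a - b)) << (8 * i);
    }
    return result;
}

/* Subtract 100 (0x64) from every byte of an A8R8G8B8 image, handling
 * four channel bytes per loop pass -- hence one pass per 32-bit pixel
 * instead of four per-byte operations. */
void image_darken(uint32_t *pixels, int count)
{
    for (int i = 0; i < count; i++) {
        pixels[i] = usub8(pixels[i], 0x64646464u);
    }
}
```

Note that USUB8 wraps on underflow; if channels below 100 should clamp to zero instead, the saturating variant __UQSUB8 is the usual choice.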
The performance is shown in the following figure
As you can see, the width of the pulse is 3.3 ms, which is a huge performance
gain! To conclude, you can consult the SIMD instruction documentation in the ARM
Cortex-M4 manual, and you can also read about NEON, ARM's more powerful SIMD
extension found on Cortex-A processors. There are many situations where you will
want to work on multiple data items at once to gain performance, as we showed in
this chapter.
VARIOUS TOPICS IN EMBEDDED
SOFTWARE:
Endianness
Endianness is the attribute of a system that indicates whether integers are
represented from left to right or right to left.
Endianness comes in two varieties: big and little. A big-endian representation has a
multibyte integer written with its most significant byte on the left; a number
represented thus is easily read by English-speaking humans. A little-endian
representation, on the other hand, places the most significant byte on the right. Of
course, computer architectures don't have an intrinsic "left" or "right." These
human terms are borrowed from our written forms of communication. The
following definitions are more precise:
Big-endian
Means that the most significant byte of any multibyte data field is stored at
the lowest memory address, which is also the address of the larger field
Little-endian
Means that the least significant byte of any multibyte data field is stored at the
lowest memory address, which is also the address of the larger field
It matters only when two computers are trying to communicate. Every processor
and every communication protocol must choose one type of endianness or the
other. Thus, two processors with different endianness will conflict if they
communicate through a memory device. Similarly, a little-endian processor trying
to communicate over a big-endian network will need to do software-byte
reordering.
A C programming example that prints the bytes of a 32-bit integer in memory
order, from the lowest address upward:
#include <stdint.h>
#include <stdio.h>
int main(void)
{
    uint32_t value = 0x0a0b0c0d;
    uint8_t *p = (uint8_t *)&value;
    printf("%02x %02x %02x %02x\n", p[0], p[1], p[2], p[3]);
    return 0;
}
Modern little-endian: 0d 0c 0b 0a
Modern big-endian: 0a 0b 0c 0d
If you develop software for a microcontroller, it is likely that you will measure
some physical quantity (e.g., acceleration or temperature) and sample it at regular
time intervals with an analog-to-digital converter to obtain digital data. Now,
regardless of what you are measuring, you will need to take into account the
endianness of the AD-converter.
The Web is full of data sheets for many AD-converters. I picked the following one
(almost at random):
www.analog.com/media/en/technical-documentation/data-sheets/AD7981.pdf
It is an industrial converter, designed to operate at high temperatures, that can
perform 600 kSPS (i.e., six hundred thousand samples per second) of an input
voltage between 0V and 5.1V. But those details are irrelevant for the purposes of
this book. What matters is that it converts the input voltage to a 16-bit number
We need to confirm that any value placed on the data bus by the processor is
correctly received by the memory device at the other end. The most obvious way to
test that is to write all possible data values and verify that the memory device
stores each one successfully. However, that is not the most efficient test available.
A faster method is to test the bus one bit at a time. The data bus passes the test if
each data bit can be set to 0 and 1, independently of the other data bits.
A good way to test each bit independently is to perform the so-called walking 1's
test. The following Table shows the data patterns used in an 8-bit version of this
test. The name walking 1's comes from the fact that a single data bit is set to 1 and
"walked" through the entire data word. The number of data values to test is the
same as the width of the data bus. This reduces the number of test patterns from
2^n to n, where n is the width of the data bus.
00000001
00000010
00000100
00001000
00010000
00100000
01000000
10000000
Because we are testing only the data bus at this point, all of the data values can be
written to the same address. Any address within the memory device will do.
However, if the data bus splits as it makes its way to more than one memory chip,
you will need to perform the data bus test at multiple addresses, one within each
chip. To perform the walking 1's test, simply write the first data value in the table,
verify it by reading it back, write the second value, verify, and so on. When you
reach the end of the table, the test is complete. This time, it is okay to do the read
immediately after the corresponding write because we are not yet looking for
missing chips. In fact, this test may provide meaningful results even if the memory
chips are not installed!
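As a concrete sketch, a walking 1's data bus test for an 8-bit bus might look like this (the function name and the return convention are assumptions for illustration):

```c
#include <stdint.h>

/*
 * Walking 1's test for an 8-bit data bus. Each single-bit pattern is
 * written to one address and read back immediately. Returns 0 if every
 * bit passed, or the first failing pattern otherwise. The pointer type
 * would match the actual width of your data bus.
 */
uint8_t test_data_bus(volatile uint8_t *address)
{
    uint8_t pattern;

    for (pattern = 1; pattern != 0; pattern <<= 1) {
        *address = pattern;         /* write the walking-1 pattern */
        if (*address != pattern) {  /* read it back right away     */
            return pattern;         /* report the failing bit      */
        }
    }
    return 0;                       /* all eight bits passed       */
}
```

On an 8-bit `uint8_t`, shifting the pattern left out of the top bit yields 0, which ends the loop after exactly eight iterations, one per data line.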
A Queue is a linear collection of data elements in which the element inserted first
will be the element that is taken out first; that is, a queue is a FIFO (First In First
Out) data structure. A queue is a popular linear data structure in which the first
element is inserted from one end called the REAR end (also called the tail end),
and the deletion of the element takes place from the other end called the FRONT
end (also called the head).
Practical Application:
For a simple illustration of a queue, think of a line of people standing at a bus
stop waiting for the bus: the first person standing in the line will get on the bus
first.
Queues are also used to buffer data from a keypad or a serial port, and for video
buffering (as on YouTube).
A Stack is a linear collection of data elements in which insertion and deletion take
place only at the top of the stack. A stack is a Last In First Out (LIFO) data
structure, because the last element pushed onto the stack will be the first element to
be deleted from the stack. Three operations can be performed on the stack, which
includes PUSH, POP, and PEEP operations. The PUSH operation inputs an
element into the top of the stack, while the POP operation removes an element
from the stack. The PEEP operation returns the value of the topmost element in the
stack without deleting it from the stack. Every stack has a variable TOP which is
associated with it. The TOP pointer stores the address of the topmost element in
the stack. The TOP is the position from where insertion and deletion take place.
Practical Application:
A real-life example of a stack is a pile of plates arranged on a table: a person will
pick up the plate at the top of the pile first.
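A minimal array-based version of the PUSH, POP, and PEEP operations might look like this (a sketch; the fixed STACK_SIZE and the error-code convention are assumptions for illustration):

```c
#define STACK_SIZE 10

/* TOP is -1 when the stack is empty; it always holds the index of
 * the topmost element, the position where insertion and deletion
 * take place. */
static int stack[STACK_SIZE];
static int top = -1;

int push(int value)                 /* 0 on success, -1 if full   */
{
    if (top == STACK_SIZE - 1) return -1;
    stack[++top] = value;
    return 0;
}

int pop(int *value)                 /* 0 on success, -1 if empty  */
{
    if (top == -1) return -1;
    *value = stack[top--];
    return 0;
}

int peep(int *value)                /* reads the top, no removal  */
{
    if (top == -1) return -1;
    *value = stack[top];
    return 0;
}
```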
Linked List
The major drawback of the array is that the size or the number of elements must be
known in advance. Thus, this drawback gave rise to the new concept of a linked
list. A Linked list is a linear collection of data elements. These data elements are
called nodes, which point to the next node using pointers. A linked list is a
sequence of nodes in which each node contains one or more than one data field and
a pointer which points to the next node. Also, linked lists are dynamic; that is,
memory is allocated as and when required.
Pictured as a diagram, each node of a linked list is divided into two slots:
1.The first slot contains the information/data.
2.The second slot contains the address of the next node.
Practical Application:
A simple real-life example is a train; here each coach is connected to its previous
and next coach (except the first and last coach).
Snake game, consider using a linked list for connecting each node
We have already learned that an array is a collection of data elements stored in
contiguous memory locations. Also, we studied that arrays were static in nature;
that is, the size of the array must be specified when declaring an array, which limits
the number of elements to be stored in the array. For example, if we have an array
declared as int array[15], then the array can contain a maximum of 15 elements and
not more than that. This method of allocating memory is good when the exact
number of elements is known, but if we are not sure of the number of elements
then there will be a problem, as in data structures our aim is to make programs
efficient by consuming less memory space along with minimal time. To overcome
this problem, we will use linked lists.
As described above, a linked list is a linear collection of data elements called
nodes, connected by pointers. A linked list can be used to implement other data
structures such as stacks, queues, trees, and so on. Linked lists are dynamic in
nature: memory is allocated as and when required, so there is no need to know the
exact size or exact number of elements in advance, as in the case of arrays. A
simple linked list might contain five nodes, each divided into two parts:
1. The first part contains the information/data.
2. The second part contains the address of the next node.
The last node has no next node connected to it, so its pointer field stores a special
value called NULL (in C, NULL is the null pointer, defined as 0). Therefore, the
NULL pointer represents the end of the linked list. There is also a special pointer,
START, that stores the address of the first node of the linked list; START
therefore represents the beginning of the list. If START = NULL, the linked list is
empty. Because each node points to another node of the same type, a linked list is
known as a self-referential data type, or self-referential structure.
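Such a self-referential node, together with insertion at the front and a simple traversal, can be sketched in C like this (the function names are assumptions for illustration):

```c
#include <stdlib.h>

/* One data field plus a pointer to the next node: the two-part node
 * layout described above. The struct refers to its own type, which is
 * why it is called a self-referential structure. */
struct node {
    int data;
    struct node *next;
};

/* Insert a new node at the front; returns the new START pointer. */
struct node *insert_front(struct node *start, int data)
{
    struct node *n = malloc(sizeof(struct node));
    if (n == NULL) return start;    /* allocation failed; unchanged */
    n->data = data;
    n->next = start;
    return n;
}

/* Count the nodes by following next pointers until NULL (the end). */
int list_length(const struct node *start)
{
    int count = 0;
    while (start != NULL) {
        count++;
        start = start->next;
    }
    return count;
}
```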
It may be useful to think in terms of data memory in C and C++ as being divided
into three separate spaces:
Static memory. This is where variables, which are defined outside of functions,
are located. The keyword static does not generally affect where such variables are
located; it specifies their scope to be local to the current module. Variables that are
defined inside of a function, which are explicitly declared static, are also stored in
static memory. Commonly, static memory is located at the beginning of the RAM
area. The actual allocation of addresses to variables is performed by the embedded
software development toolkit: a collaboration between the compiler and the linker.
Normally, program sections are used to control placement, but more advanced
techniques, like Fine Grain Allocation, give more control. Commonly, all the
remaining memory, which is not used for static storage, is used to constitute the
dynamic storage area, which accommodates the other two memory spaces.
Automatic variables. Variables defined inside a function, which are not declared
static, are automatic. There is a keyword to explicitly declare such a variable – auto
– but it is almost never used. Automatic variables (and function parameters) are
usually stored on the stack. The stack is normally located using the linker. The end
of the dynamic storage area is typically used for the stack. Compiler optimizations
may result in variables being stored in registers for part or all of their lifetimes; this
may also be suggested by using the keyword register.
The heap. The remainder of the dynamic storage area is commonly allocated to the
heap, from which application programs may dynamically allocate memory, as
required.
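The three memory spaces can be illustrated in a few lines of C (a sketch; the exact placement of each object is decided by the toolchain, and on a small target malloc() may of course fail):

```c
#include <stdlib.h>

int global_count;                   /* static memory: defined outside any function */

/* Returns how many times it has been called. The static counter
 * persists between calls; the automatic variable restarts at zero
 * every time the function is entered. */
int demo(void)
{
    static int calls;               /* static memory: explicit 'static' in a function */
    int local = 0;                  /* automatic: on the stack (or in a register)     */
    int *heap_obj = malloc(sizeof(int));  /* heap: allocated at run time              */

    calls++;
    local++;
    global_count = local;           /* always 1: 'local' does not persist */
    free(heap_obj);                 /* heap blocks must be explicitly freed */
    return calls;
}
```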
Dynamic Memory in C
In C, dynamic memory is allocated from the heap using some standard library
functions. The two key dynamic memory functions are malloc() and free().
The malloc() function takes a single parameter, which is the size of the requested
memory area in bytes. It returns a pointer to the allocated memory. If the allocation
fails, it returns NULL. The prototype for the standard library function is like this:
void *malloc(size_t size);
The free() function takes the pointer returned by malloc() and de-allocates the
memory. No indication of success or failure is returned. The function prototype is
like this:
void free(void *pointer);
To illustrate the use of these functions, here is some code to statically define an
array and set the fourth element’s value:
int my_array[10];
my_array[3] = 99;
The following code does the same job using dynamic memory allocation:
int *pointer;
pointer = malloc(10 * sizeof(int));
pointer[3] = 99;
Note that pointer[3] and *(pointer+3) are equivalent ways to access the fourth
element.
When the array is no longer needed, the memory may be de-allocated thus:
free(pointer);
pointer = NULL;
Assigning NULL to the pointer is not compulsory, but it is good practice, as it will
cause an error to be generated if the pointer is erroneously used after the memory
has been de-allocated.
The amount of heap space actually allocated by malloc() is normally one word
larger than that requested. The additional word is used to hold the size of the
allocation and is for later use by free(). This “size word” precedes the data area to
which malloc() returns a pointer.
There are two other variants of the malloc() function: calloc() and realloc().
The calloc() function does basically the same job as malloc(), except that it takes
two parameters – the number of array elements and the size of each element –
instead of a single parameter (which is the product of these two values). The
allocated memory is also initialized to zeros. Here is the prototype:
void *calloc(size_t nelements, size_t elementSize);
The realloc() function resizes a previously allocated memory area, preserving its
contents.
In C++, dynamic memory is allocated and de-allocated with the new and delete
operators. The new operator has three forms:
p_var = new typename;
p_var = new typename(initializer);
p_array = new typename[size];
In the first two cases, space for a single object is allocated; the second one includes
initialization. The third case is the mechanism for allocating space for an array of
objects. De-allocation uses two forms of the delete operator:
delete p_var;
delete[] p_array;
The first is for a single object; the second deallocates the space used by an array. It
is very important to use the correct de-allocator in each case.
Here is the code to dynamically allocate an array and initialize the fourth element:
int* pointer;
pointer = new int[10];
pointer[3] = 99;
delete[] pointer;
pointer = NULL;
Again, assigning NULL to the pointer after deallocation is just good programming
practice. Another option for managing dynamic memory in C++ is the use of the
Standard Template Library (STL). This may be inadvisable for real-time embedded
systems.
There are a number of problems with dynamic memory allocation in a real time
system. The standard library functions (malloc() and free()) are not normally
reentrant, which would be problematic in a multithreaded application. If the source
code is available, this should be straightforward to rectify by locking resources
using RTOS facilities (like a semaphore). A more intractable problem is associated
with the performance of malloc(). Its behavior is unpredictable, as the time it takes
to allocate memory is extremely variable. Such nondeterministic behavior is
intolerable in real time systems.
Without great care, it is easy to introduce memory leaks into application code
implemented using malloc() and free(). This is caused by memory being allocated
and never being deallocated. Such errors tend to cause a gradual performance
degradation and eventual failure. This type of bug can be very hard to locate.
Memory Fragmentation
To see how fragmentation arises, suppose the heap contains 10K bytes and
consider this sequence of allocations:
#define K (1024)
char *p1;
char *p2;
p1 = malloc(3*K);
p2 = malloc(4*K);
Some time later, the first memory allocation, pointed to by p1, is de-allocated:
free(p1);
Then a further allocation is requested:
p1 = malloc(4*K);
This results in a failure – NULL is returned into p1 – because, even though 6K of
memory is available, there is not a 4K contiguous block available. This is memory
fragmentation.
A real time operating system may provide a service which is effectively a reentrant
form of malloc(). However, it is unlikely that this facility would be deterministic.
Memory management facilities that are compatible with real time requirements –
i.e. they are deterministic – are usually provided. This is most commonly a scheme
which allocates blocks – or “partitions” – of memory under the control of the OS.
STATUS NU_Create_Partition_Pool(NU_PARTITION_POOL *pool, CHAR *name,
VOID *start_address, UNSIGNED pool_size, UNSIGNED partition_size,
OPTION suspend_type);
For example (the pool name here is arbitrary):
status = NU_Create_Partition_Pool(&MyPool, "pool", (VOID *)0xB000, 2000, 40,
NU_FIFO);
This creates a partition pool with the descriptor MyPool, containing 2000 bytes of
memory, filled with partitions of size 40 bytes (i.e., there are 50 partitions). The
pool is located at address 0xB000. The pool is configured such that, if a task
attempts to allocate a block when none are available, and it requests to be
suspended on the allocation API call, suspended tasks will be woken up in a first-
in, first-out order. The other option would have been task priority order.
A partition is then allocated like this:
status = NU_Allocate_Partition(&MyPool, &ptr, NU_SUSPEND);
This requests the allocation of a partition from MyPool. When successful, a pointer
to the allocated block is returned in ptr. If no memory is available, the task is
suspended, because NU_SUSPEND was specified; other options, which might have
been selected, would have been to suspend with a timeout or to simply return with
an error.
status = NU_Deallocate_Partition(ptr);
Additional API calls are available which can provide the application code with
information about the status of the partition pool – for example, how many free
partitions are currently available. Care is required in allocating and de-allocating
partitions, as the possibility for the introduction of memory leaks remains.
The potential for programmer error resulting in a memory leak when using
partition pools is recognized by vendors of real time operating systems. Typically,
a profiler tool is available which assists with the location and rectification of such
bugs.
Dynamic Memory
In general, a heap manager allows the program to allocate blocks of variable size,
but in this section we will develop a simplified heap manager that handles just
fixed-size blocks. In this example, the block size is specified by SIZE. The
initialization will create a linked list of all the free blocks. A list is a collection of
dissimilar objects, typically implemented in C with a struct. In this case, the list is
an array where the first element is a pointer and the remaining elements are the
memory to be allocated. A linked list is a collection of lists that are connected
together with pointers, as shown in the figure.
FreePt points to a linear linked list of free blocks. Initially these free blocks are
contiguous and in order, but as the manager is used the positions and order of the
free blocks can vary. It will be the pointers that will thread the free blocks together.
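The fixed-block scheme described above can be sketched as follows (the names pool, free_pt, block_alloc(), and block_free(), and the SIZE/NUM values, are assumptions for illustration; a real implementation would also guard against freeing an invalid pointer):

```c
#include <stddef.h>

#define SIZE 32                     /* bytes per block (must hold a pointer) */
#define NUM  8                      /* number of blocks in the pool          */

/* Each free block's first bytes hold a pointer to the next free block,
 * so the free list is threaded through the pool itself. */
static char pool[NUM][SIZE];
static void *free_pt;               /* head of the free list */

void heap_init(void)
{
    int i;
    for (i = 0; i < NUM - 1; i++)   /* link each block to the next  */
        *(void **)pool[i] = pool[i + 1];
    *(void **)pool[NUM - 1] = NULL; /* last block ends the list     */
    free_pt = pool[0];
}

void *block_alloc(void)             /* NULL when the pool is empty  */
{
    void *block = free_pt;
    if (block != NULL)
        free_pt = *(void **)block;  /* unlink the first free block  */
    return block;
}

void block_free(void *block)        /* push back onto the free list */
{
    *(void **)block = free_pt;
    free_pt = block;
}
```

Both allocation and de-allocation are a fixed, small number of operations, which is what makes this scheme deterministic and suitable for real-time use.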
Conclusions
C and C++ use memory in various ways, both static and dynamic. Dynamic
memory includes stack and heap.
Using the facilities provided by most real time operating systems, a dynamic
memory facility may be implemented which is deterministic, immune from
fragmentation and with good error handling.
Consider sending 50 bytes to a peripheral using polling:
for (int i = 0; i < 50; i++) {
    while (peripheral_is_busy());
    peripheral_send_byte(data[i]);
}
This loop is simple. For each of the 50 bytes, it first waits until the peripheral isn't
busy, then tells the peripheral to send it. You can imagine what implementations of
the peripheral_is_busy() and peripheral_send_byte() functions might look like.
While you’re transmitting these 50 bytes, the rest of your program can’t run
because you’re busy in this loop making sure all of the bytes are sent correctly.
What a waste, especially if the data transmission rate is much slower than your
microcontroller! (Typically, that will be the case.) There are so many more
important tasks your microcontroller could be doing in the meantime than sitting in
a loop waiting for a slow transmission to complete. The solution is to buffer the
data and allow it to be sent in the background while the rest of your program does
other things.
How to buffer the data
So how do you buffer the data? You create a buffer that will store data waiting to
be transmitted. If the peripheral is busy, rather than waiting around for it to finish,
you put your data into the buffer. When the peripheral finishes transmitting a byte,
it fires an interrupt. Your interrupt handler takes the next byte from the buffer and
sends it to the peripheral, then immediately returns back to your program. Your
program can then continue to do other things while the peripheral is transmitting.
You will periodically be interrupted to send another byte, but it will be a very short
period of time — all the interrupt handler has to do is grab the next byte waiting to
be transmitted and tell the peripheral to send it off. Then your program can get
back to doing more important stuff.
This is called interrupt-driven I/O, and it’s awesome. The original code I showed
above is called polled I/O.
A really easy way to implement a queue is by creating a ring buffer, also called
a circular buffer or a circular queue. It's a regular old array, but when you reach the
end of the array, you wrap back around to the beginning. You keep two indexes:
head and tail. The head is updated when an item is inserted into the queue, and it is
the index of the next free location in the ring buffer. The tail is updated when an
item is removed from the queue, and it is the index of the next item available for
reading from the buffer. When the head and tail are the same, the buffer is empty.
As you add things to the buffer, the head index increases. If the head wraps all the
way back around to the point where it’s right behind the tail, the buffer is
considered full and there is no room to add any more items until something is
removed. As items are removed, the tail index increases until it reaches the head
and it’s empty again. The head and tail endlessly follow this circular pattern–the
tail is always trying to catch up with the head–and it will catch up, unless you’re
constantly transmitting new data so quickly that the tail is always busy chasing the
head.
To implement the queue, you need three things:
• An array
• A head
• A tail
These will all be accessed by both the main loop and the interrupt handler, so they
should all be declared as volatile. Also, updates to the head and updates to the
tail each need to be an atomic operation, so they should be the native size of your
architecture. For example, if you’re on an 8-bit processor like an AVR, it should be
a uint8_t (which also means the maximum possible size of the queue is 256 items).
On a 16-bit processor it can be a uint16_t, and so on. Let’s assume we’re on an 8-
bit processor here, so ring_pos_t in the code below is defined to be a uint8_t.
#define RING_SIZE 64
typedef uint8_t ring_pos_t;
One final thing before I give you code: it’s a really good idea to use a power of two
for your ring size (16, 32, 64, 128, etc.). The reason for this is because the
wrapping operation (where index 63 wraps back around to index 0, for example) is
much quicker if it’s a power of two. I’ll explain why. Normally a programmer
would use the modulo (%) operator to do the wrapping. For example:
tail = (tail + 1) % RING_SIZE;
If your tail began at 60 and you repeated this line above multiple times, the tail
would do the following:
61, 62, 63, 0, 1, 2, …
That works perfectly, but the problem with this approach is that modulo is pretty
slow because it’s a divide operation. Division is a pretty slow operation on
computers. It turns out when you have a power of two, you can do the equivalent
of a modulo by doing a bitwise AND, which is a much quicker operation. It works
because if you take a power of two and subtract one, you get a number which can
be represented in binary as a string of all 1 bits. In the case of a queue of size 64,
bitwise ANDing the head or tail with 63 (binary 00111111) will always keep the
index between 0 and 63:
tail = (tail + 1) & (RING_SIZE - 1);
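Putting the pieces together, here is a minimal sketch of such a ring buffer (the function names are assumptions; with this full/empty convention, one slot is always left unused, so the queue holds at most RING_SIZE - 1 items). In a transmit driver, ring_add() would be called from the main loop and ring_remove() from the interrupt handler:

```c
#include <stdint.h>

#define RING_SIZE 64                /* a power of two, for the & trick  */

typedef uint8_t ring_pos_t;         /* native size => atomic updates on
                                       an 8-bit processor               */

static volatile ring_pos_t ring_head;
static volatile ring_pos_t ring_tail;
static volatile char ring_data[RING_SIZE];

/* Add a byte at the head; returns -1 if the buffer is full. */
int ring_add(char c)
{
    ring_pos_t next_head = (ring_head + 1) & (RING_SIZE - 1);
    if (next_head == ring_tail)     /* head right behind tail: full */
        return -1;
    ring_data[ring_head] = c;
    ring_head = next_head;          /* single atomic index update   */
    return 0;
}

/* Remove a byte from the tail; returns -1 if the buffer is empty. */
int ring_remove(char *c)
{
    if (ring_tail == ring_head)     /* equal indexes: empty */
        return -1;
    *c = ring_data[ring_tail];
    ring_tail = (ring_tail + 1) & (RING_SIZE - 1);
    return 0;
}
```

Because the producer only writes the head and the consumer only writes the tail, each index has a single writer, which is what makes the scheme safe between a main loop and an interrupt handler.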
A FIFO is also used when you ask the computer to print a file. Rather than waiting
for the actual printing to occur character by character, the print command will put
the data in a FIFO. Whenever the printer is free, it will get data from the FIFO. The
advantage of the FIFO is that it allows you to continue to use your computer while
the printing occurs in the background. To implement this magic of background
printing, we will need interrupts.
The classic producer/consumer problem has two threads. One thread produces data
and the other consumes data. For an input device, the background thread is the
producer because it generates new data, and the foreground thread is the consumer
because it uses the data up. For an output device, the data flows in the other
direction so the producer/consumer roles are reversed. It is appropriate to pass data
from the producer thread to the consumer thread using a FIFO queue
A graphics display uses two buffers called a front buffer and a back buffer. The
graphics hardware uses the front buffer to create the visual image on the display,
i.e., the front buffer contains the data that you see. The software uses the back
buffer to create a new image, i.e., the back buffer contains the data that you see
next. When the new image is ready, and the time is right, the two buffers are
switched (the front becomes the back and the back becomes the front). In this way,
the user never sees a partially drawn image.
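The buffer switch itself can be as simple as swapping two pointers (a sketch; the buffer names and dimensions are assumptions, and on real hardware the swap would be synchronized with the display refresh, e.g. during vertical blanking):

```c
#define WIDTH  320
#define HEIGHT 240

/* Two frame buffers; hypothetical names for this sketch. */
static char buffer_a[WIDTH * HEIGHT];
static char buffer_b[WIDTH * HEIGHT];

static char *front = buffer_a;      /* the image currently displayed */
static char *back  = buffer_b;      /* the image being drawn         */

/* Swap the two buffers; called when the new image is complete. */
void swap_buffers(void)
{
    char *tmp = front;
    front = back;
    back = tmp;
}
```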
And God is the one whose help is sought.
DI Ahmed TOLBA
References:
1. Tricks of the Windows Game Programming Gurus
2. Design Patterns: Elements of Reusable Object-Oriented Software (Gang of Four)
3. Operating System Concepts
4. Embedded Software Design
5. The C Programming Language
And all the references here