
Cortex-M4 Overview

MKEL1123 Advanced Microprocessor Systems

Dr Usman Ullah Sheikh & Muním Zabidi


School of Electrical Engineering, UTM
Table of Contents

1 ARM Architecture

2 Cortex

3 STM32

4 Nucleo64 Boards

5 Embedded Software Development

6 Software Development for STM32

7 Take Home

© Dr Usman Ullah Sheikh & Muním Zabidi 2


ARM Architecture
Why ARM?

Popular for portable devices:


Low power consumption
Fast execution per watt
Inexpensive
Choice of offerings

Source: NVIDIA
ARM History BBC Micro

1980s
Acorn Computer built the BBC Micro microcomputer
series for the BBC Computer Literacy project.
BBC Micro used the 6502 microprocessor.

Advert in Interface Age magazine, November 1983: ’The BBC Microcomputer Is Here!’



ARM History ARM1

Late 1983
Steve Furber and Sophie Wilson, two Acorn engineers, visited and observed chip designers in America.
In 18 months and 6 man-years of effort, they produced the ARM1 (Acorn RISC Machine 1).
Used as a coprocessor for the BBC Micro.
Fabricated by VLSI Technology with roughly 24,800 transistors on a 3,000 nm (3 µm) process. Designed for a 1 W budget, the chip drew under 100 mW in typical use.
For comparison, the 2020 Apple M1 has 8 CPU cores, a 7- or 8-core GPU and a 16-core Neural Engine, with 16 billion transistors on a 5 nm process. Power dissipation: 10 W.
The ARM1 was essentially a prototype and was quickly followed by the ARM2.
ARM2 was used to launch Acorn’s Archimedes family of computers.

Figure: The prototype ARM1 as a BBC Micro coprocessor (Wikipedia)



ARM History New Company

1990
ARM Ltd founded
Jointly owned by Acorn, Apple and VLSI Technology (now part of NXP)
ARM now stands for Advanced RISC Machines
Objective: to supply new CPU for Apple
Personal Digital Assistant (PDA)

ARM HQ was a barn in Cambridge (1990)



ARM History Newton

1993
Apple Newton
Used ARM610 which implements Armv3 ISA
Technologically innovative, but its high price and early problems with handwriting recognition limited sales and led to its eventual cancellation
Eventually led Apple to create multi-touch devices: iPhone and iPad
Poor Newton sales pushed ARM toward its IP licensing business model, which was uncommon at the time
Apple Newton MessagePad



ARM Licensing

Arm is fabless: it is a hardware design and intellectual property company, not a chip manufacturer
Arm develops the architecture and licenses it to other companies

ARM Licensing

Core License:
Licensees create microcontrollers, CPUs and SoCs based on ARM-designed cores
The original design manufacturer combines these with other parts to produce a complete device

Architecture License:
Allows companies to design their own CPU cores using ARM instruction sets
Cores must comply fully with the ARM architecture
Fewer than 10 licensees

https://www.anandtech.com/show/7112/the-arm-diaries-part-1-how-arms-business-model-works
ARM Ecosystem Oppo Mobile Phones

https://stratechery.com/2020/nvidias-integration-dreams/



ARM and Intel StrongARM

Mid-1990s
Digital Equipment Corporation (DEC) developed StrongARM, which implements the ARMv4 architecture.
StrongARM targeted the upper end of the low-power embedded market.
1997
DEC agreed to sell StrongARM to Intel as part of a lawsuit settlement.
Intel used StrongARM to replace its ailing lines of RISC processors, the i860 and i960.
2002
The XScale, a successor to the StrongARM core, was introduced by Intel.
2006
Intel sold the XScale PXA business to Marvell Technology Group.
Marvell holds a full architecture license for ARM.
Intel still holds an ARM architecture license even after the sale of XScale.

Figure: DEC StrongARM SA-110 microprocessor



ARM Architectural Features

Heavily influenced by Berkeley RISC and 6502.


The ARM architecture has typical RISC features:
Load/store architecture
Large 16 × 32-bit register file
Fixed instruction width of 32 bits to ease decoding and pipelining, at the cost of decreased code density
Mostly single-cycle execution
Differences from original RISC:
Rejects delayed branches & register windows
Adds conditional execution
Adds variable-length instructions in latest versions for better code density



ARM Processor Evolution
ARM ISA Evolution

Instruction set | Bit width | Family
ARMv1 | 32 | ARM1
ARMv2 | 32 | ARM2, ARM3
ARMv3 | 32 | ARM6, ARM7
ARMv4 | 32 | ARM7TDMI, ARM9TDMI
ARMv5 | 32 | ARM7EJ, ARM9E, ARM10E
ARMv6 | 32 | ARM11, Cortex-M0/-M0+/-M1
ARMv7 | 32 | Cortex-M3/-M4/-M7, Cortex-R4/-R5/-R7/-R8, Cortex-A5/-A7/-A8/-A9/-A12/-A15/-A17
ARMv8 | 64 | Cortex-A35/-A53/-A57/-A72/-A73



Thumb ISA Evolution

ARM instruction set
32 bits wide
Conditional execution
Fastest but uses most code space
Thumb instruction set
16 bits wide
Greater code density
Performance drop due to inability to access certain architectural features
Special instructions switch to and from ARM state
Thumb-2
A blend of 16- and 32-bit instructions
No state-switching overhead, saving time and code space
No need for separate ARM and Thumb source code files, making software development easier

Which cores run what:
Original ARM had only the 32-bit ARM instruction set
ARM7 and ARM9 can execute the ARM and Thumb instruction sets
Cortex-M executes Thumb-2 and Thumb only
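
To make the "blend of 16- and 32-bit instructions" concrete, here is a minimal host-side Python sketch of how a Thumb-2 decoder can tell the two lengths apart without any state switch. It relies on the architectural rule that a halfword whose top five bits are 0b11101, 0b11110 or 0b11111 is the first half of a 32-bit instruction. The function names are hypothetical and chosen for illustration only.

```python
def thumb_instruction_size(first_halfword: int) -> int:
    """Return the instruction length in bytes (2 or 4) from its first halfword."""
    if not 0 <= first_halfword <= 0xFFFF:
        raise ValueError("expected a 16-bit halfword")
    top5 = first_halfword >> 11  # bits [15:11]
    return 4 if top5 in (0b11101, 0b11110, 0b11111) else 2


def split_instructions(halfwords):
    """Group a halfword stream into (byte_offset, size_in_bytes) pairs."""
    result, i = [], 0
    while i < len(halfwords):
        size = thumb_instruction_size(halfwords[i])
        result.append((i * 2, size))
        i += size // 2  # skip the second halfword of a 32-bit instruction
    return result


# NOP (16-bit), BL (32-bit, two halfwords), BX LR (16-bit)
stream = [0xBF00, 0xF000, 0xF800, 0x4770]
layout = split_instructions(stream)
```

Because the length is visible in the first halfword, 16- and 32-bit instructions can be freely mixed in one code stream.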



Cortex
ARM Architecture: For Diverse Embedded Processing Needs
Cortex Profiles

Feature | Cortex-A | Cortex-R | Cortex-M
Profile | Application (highest performance) | Real-time (fast response) | Microcontroller (smallest/lowest power)
Operating system | Rich OS/RTOS | Bare metal/RTOS | Bare metal/RTOS
64-bit ARM ISA | • | - | -
32-bit ARM ISA | • | • | -
Thumb ISA | • | • | •
Interrupts | SW-managed | Deterministic, SW-managed | HW-managed
Pipeline length | Long | Long to medium | Short
Cache memory | • | • | -
Market | General-purpose computing | Real-time signal processing & control | Cost-sensitive microcontrollers
Applications | Mobile computing, smart phone, servers, high-end processors | Industrial MCU, storage controllers, modems, networking | IoT, smart sensors, mixed-signal devices, automotive
Cortex-M Power, Area, Performance

Core | Dynamic power (µW/MHz) | Area (mm²) | Dhrystone, official (DMIPS/MHz) | Dhrystone, max options (DMIPS/MHz) | CoreMark (CoreMark/MHz)
Cortex-M0 | 4 | 0.01 | 0.84 | 1.21 | 2.33
Cortex-M0+ | 3 | 0.009 | 0.94 | 1.31 | 2.42
Cortex-M3 | 7 | 0.03 | 1.25 | 1.89 | 3.32
Cortex-M4 | 8 | 0.04 | 1.25 | 1.95 | 3.40
Cortex-M7 | - | - | 2.14 | 2.55 | 5.01

Static power <0.7 µW/MHz
M0/M0+ use a von Neumann architecture; M3/M4/M7 use a Harvard architecture



Cortex-M4 Features

Architecture: ARMv7E-M
Bus interface: 3x AMBA AHB-Lite
ISA: Thumb-2
Pipeline: 3-stage + branch speculation
DSP: Single-cycle 16/32-bit MAC; single-cycle dual 16-bit MAC; 8/16-bit SIMD arithmetic; hardware divide (2-12 cycles)
HLL support: Designed to be programmed fully in C
Floating-point: Single-precision IEEE 754
Interrupt handling: Nested Vectored Interrupt Controller (NVIC); Non-Maskable Interrupt (NMI) + 1 to 240 physical interrupts; 8 to 256 priority levels
Cortex-M4 Extra Features

Bit manipulation: Bit-field processing instructions; bit banding
Wake-Up Interrupt Controller: Optional
Sleep modes: Integrated WFI and WFE instructions; Sleep-On-Exit capability
Memory protection: Optional Memory Protection Unit (MPU)
Debugging support: Optional JTAG & Serial Wire Debug ports; up to 8 breakpoints & 4 watchpoints
Trace support: Optional instruction trace (ETM); optional data trace (DWT); optional instrumentation trace (ITM)
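
Bit banding maps every bit in the first 1 MB of the SRAM and peripheral regions to its own 32-bit word in an alias region, so a single bit can be read or written with an ordinary word access. The Python sketch below shows the address arithmetic only; it runs on a PC and touches no hardware. The region bases are the standard Cortex-M3/M4 values, while the function name is a hypothetical one for illustration.

```python
# (bit-band region base, alias region base) pairs for Cortex-M3/M4
BITBAND_REGIONS = (
    (0x2000_0000, 0x2200_0000),  # SRAM
    (0x4000_0000, 0x4200_0000),  # peripherals
)
BITBAND_REGION_SIZE = 0x0010_0000  # 1 MB of bit-addressable memory per region


def bitband_alias(byte_addr: int, bit: int) -> int:
    """Return the alias word address for one bit of a byte in a bit-band region."""
    if not 0 <= bit <= 7:
        raise ValueError("bit must be 0..7 within the addressed byte")
    for base, alias_base in BITBAND_REGIONS:
        if base <= byte_addr < base + BITBAND_REGION_SIZE:
            byte_offset = byte_addr - base
            # each byte expands to 8 words (32 bytes); each bit to one word (4 bytes)
            return alias_base + byte_offset * 32 + bit * 4
    raise ValueError("address is outside the bit-band regions")


alias_first = bitband_alias(0x2000_0000, 7)  # bit 7 of the first SRAM byte
alias_last = bitband_alias(0x200F_FFFF, 0)   # bit 0 of the last bit-band SRAM byte
```

Writing 1 or 0 to the alias word sets or clears just that bit, which avoids a read-modify-write sequence in software.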
Cortex-M Low Power

Cortex-M4 is designed for low-power and small footprint

ARM Cortex-M4 Implementation Data

Parameter | 180ULL | 90LP | 40G
Process | 180 nm ultra low power; 7-track, typical 1.8 V, 25 °C | 90 nm low power; 7-track, typical 1.2 V, 25 °C | 40 nm G process; 9-track, typical 0.9 V, 25 °C
Dynamic power | 157 µW/MHz | 33 µW/MHz | 8 µW/MHz
Floorplan area | 0.56 mm² | 0.17 mm² | 0.04 mm²
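
The µW/MHz figures scale linearly with clock frequency, so an estimate of core dynamic power is a single multiplication. The sketch below uses the three coefficients from the implementation data; the 10 MHz clock is an arbitrary value picked for illustration, and the function name is hypothetical. It covers the core only, not flash, peripherals or static power.

```python
# Dynamic power coefficients from the implementation data, in µW/MHz
DYNAMIC_UW_PER_MHZ = {"180ULL": 157, "90LP": 33, "40G": 8}


def dynamic_power_mw(process: str, clock_mhz: float) -> float:
    """Estimate core dynamic power in mW: coefficient (µW/MHz) x clock (MHz)."""
    return DYNAMIC_UW_PER_MHZ[process] * clock_mhz / 1000


# Illustrative comparison at an assumed 10 MHz clock
estimates = {name: dynamic_power_mw(name, 10) for name in DYNAMIC_UW_PER_MHZ}
```

At the same clock, the 40G implementation draws roughly one twentieth of the dynamic power of the 180ULL one (8 vs. 157 µW/MHz).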


STM32
STM32 Intro

The STM32 family of microcontrollers comes from STMicroelectronics
Based on the ARM Cortex-M 32-bit processor architecture
Very popular
All STM32 variants come with internal Flash and RAM memories
Basic variants include the STM32F0 and STM32F1 sub-series, with clock speeds as low as 24 MHz and packages as small as 16 pins
High-performance variants include the STM32H7, which goes up to 400 MHz, in packages as big as 240 pins
The STM32L series is designed specifically for low-power portable applications
Some models come with Floating Point Units (FPU)

STM32 series | Cortex-Mx | Max clock (MHz) | Performance (DMIPS)
F0 | M0 | 48 | 38
F1 | M3 | 72 | 61
F3 | M4 | 72 | 90
F2 | M3 | 120 | 150
F4 | M4 | 180 | 225
F7 | M7 | 216 | 462
H7 | M7 | 400 | 856
L0 | M0 | 32 | 26
L1 | M3 | 32 | 33
L4 | M4 | 80 | 100
L4+ | M4 | 120 | 150



STM32 ARM Cortex Line Card



STM32F4 Intro

Based on an ARM Cortex-M4 processor core
The number after the ‘F’ in an STM32 part number often, but not always, matches the ‘M’ number of the Cortex core.
All STM32F0 controllers use Cortex-M0 processor cores, and STM32F4 and STM32F7 controllers use M4 and M7 cores.
Exceptions: STM32F1 and STM32F2 controllers use an M3 core, and STM32F3 controllers use an M4 core.
The Cortex-M4 is a more advanced core than the M0.
An M4 performs about 50% better than an M0 at the same clock speed (1.25 vs. 0.84 DMIPS/MHz).
Cortex-M4F (STM32F4) versions have a Floating Point Unit (FPU).
Faster MCUs cost more, consume more power, and significantly complicate development, so only use them when absolutely required.



STM32 Naming Conventions



STM32F4 Product Line



STM32F446 Intro

STM32F446 is a Cortex-M4 with FPU and DSP instructions
Flash memory: 256-512 KB
RAM: 128 KB
Digital supply and I/O supply: VDD = 2.4 to 3.6 V
Analog supply: VDDA = VDD to 3.6 V
Power-on/power-down reset: MCU is reset when VDD < 2 V
Low-power modes:
Sleep mode: CPU is stopped. Interrupts can wake the CPU. Consumes the most power of the three modes.
Stop mode: all clocks are disabled, but the contents of SRAM and registers are retained. Interrupts can wake the CPU.
Standby mode: all clocks are stopped, the internal 1.8 V regulator is switched off, and SRAM contents are lost. Lowest power consumption.
Clock speed up to 180 MHz, performance 225 Dhrystone MIPS
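
The 225 DMIPS figure follows directly from the Cortex-M4's official Dhrystone score of 1.25 DMIPS/MHz multiplied by the 180 MHz maximum clock. A small Python sketch of that arithmetic is shown below; the function name is hypothetical, and the same ratio also gives the "about 50% better than an M0" comparison (1.25 vs. 0.84 DMIPS/MHz).

```python
def dmips(dmips_per_mhz: float, clock_mhz: float) -> float:
    """Dhrystone MIPS = per-MHz score x clock frequency in MHz."""
    return dmips_per_mhz * clock_mhz


M4_DMIPS_PER_MHZ = 1.25  # Cortex-M4, official Dhrystone score
M0_DMIPS_PER_MHZ = 0.84  # Cortex-M0, official Dhrystone score

stm32f446_dmips = dmips(M4_DMIPS_PER_MHZ, 180)        # 225.0
m4_over_m0 = M4_DMIPS_PER_MHZ / M0_DMIPS_PER_MHZ      # about 1.49
```

This is a peak figure: it assumes the core is never stalled waiting for flash, which is why the memory accelerator matters at high clock rates.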



STM32 Architecture Quirks

On the smaller packages, many I/O ports are not available.


The STM32F446 running at 180 MHz has about 38 times the clock rate of the original IBM PC (4.77 MHz)!
But flash memory is S-L-O-W
To minimize power, unused interface circuits are turned off or run slowly
Some architectural creativity needed:
Phase Locked Loop
Adaptive Real-Time (ART) memory accelerator
Multi-Advanced High-speed Bus (AHB) switching Matrix
Complex system clock
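
As a rough illustration of why the PLL is needed, the sketch below shows the usual STM32F4-style main PLL arithmetic: the input clock is divided by M, multiplied by N in the VCO, then divided by P to give the system clock. The divider names PLLM, PLLN and PLLP follow the STM32F4 clock-control registers, but the 8 MHz input and the chosen divider values are assumptions for illustration; the legal ranges must be checked against the reference manual for the exact part.

```python
from fractions import Fraction


def pll_sysclk_hz(f_in_hz: int, pllm: int, plln: int, pllp: int) -> Fraction:
    """System clock from the main PLL: f_in / PLLM * PLLN / PLLP."""
    if pllm <= 0 or plln <= 0:
        raise ValueError("PLLM and PLLN must be positive")
    if pllp not in (2, 4, 6, 8):  # assumed set of output divider values
        raise ValueError("PLLP must be 2, 4, 6 or 8")
    vco_in = Fraction(f_in_hz, pllm)   # clock fed to the VCO
    vco_out = vco_in * plln            # multiplied VCO output
    return vco_out / pllp              # divided down to the system clock


# Assumed example: 8 MHz input -> 2 MHz VCO input -> 360 MHz VCO -> 180 MHz
sysclk = pll_sysclk_hz(8_000_000, pllm=4, plln=180, pllp=2)
```

A slow, cheap oscillator can therefore drive the core at its maximum rate, at the cost of a more involved clock setup.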



STM32F4 Architecture
Nucleo64 Boards
STM32 Boards

Discovery board
Most comprehensive: includes LCD, microphone, audio DAC & motion sensor
Older design
License prohibits use in commercial products
Crystal installed so it can run at max speed

Nucleo-144
Extended Arduino pinout
Leverages low-cost Arduino shields
Some models lack a crystal, so the µC runs off the internal 8 MHz RC oscillator
PLL can be used to run faster
Many µC versions use the same layout
STM32 Boards

Nucleo-64
Arduino Uno pinout
Additional I/O through Morpho connectors
Some models lack a crystal
Many µC versions available

Nucleo-32
Arduino Nano size
Can be plugged into breadboards
Lacks a crystal (except STM32Lxxx)

Blue Pill
Not made by ST
Cheapest
Arduino Nano size
Debugger/programmer is not included
One of many third-party STM32 boards
Nucleo-F446RE Overview

The Nucleo board adds 4 features to make experimentation easier:
ST-Link debugger
2 buttons and 1 LED
Arduino connector for stacking Arduino shields
Morpho connector to access all STM32 I/O pins

For more details, refer to the STM32 Nucleo-64 Board User Manual (https://www.st.com/resource/en/user_manual/dm00105823-stm32-nucleo-64-boards-mb1136-stmicroelectronics.pdf)
Nucleo Block Diagram
Nucleo Layout
Nucleo-F411RE Arduino Connectivity



Arduino Shields

https://randomnerdtutorials.com/25-arduino-shields/
Nucleo-F411RE Morpho Connectivity
Embedded Software Development
How is Embedded Programming Different?

Must know about the software tools


Must know about the hardware
How is the memory addressed?
How to interface with sensors, actuators?
Does the code need to be portable?
How are the pins accessed?
How to get the program into the system?
How to handle memory & speed limitations?
How are errors handled?

You rarely have to think about any of these when programming for PCs or mobile devices



Development Toolchain

The toolchain is a set of tools to generate binaries from the source code.
Compiler: Translates human-readable code into assembly language or opcodes for a particular processor. Produces an object file.
Cross-compiler: A compiler that runs on platform A (x86 PC) but generates executables for platform B (ARM target board).
Assembler: Translates assembly language into opcodes. Produces an object file.
Linker: Organizes the object files, necessary libraries, and other data and produces an executable. Figures out the memory organization of the target device and touches up (relocates) the addresses in the input object files so that the program can run under the new memory organization.
Debugger: Allows you to see what is going on inside your program while it executes, or conduct a post-mortem to see what your program was doing at the moment it crashed.

Flow: C/C++ → Compiler → Object; Assembly → Assembler → Object; Objects → Linker → Executable
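
To illustrate what "touching up addresses" means, here is a deliberately simplified toy linker in Python. It places object files one after another starting at the STM32 main flash base (0x08000000) and converts each symbol's object-relative offset into an absolute address. The object names, sizes and symbols are invented for illustration, and a real linker also handles separate sections, the vector table, RAM placement and relocation records, none of which is modelled here.

```python
FLASH_BASE = 0x0800_0000  # start of main flash on STM32 devices


def link(objects, base=FLASH_BASE, align=4):
    """Place objects sequentially and return {symbol: absolute_address}.

    objects: list of (name, size_in_bytes, {symbol: offset_within_object}).
    """
    symbols = {}
    cursor = base
    for name, size, local_symbols in objects:
        cursor = (cursor + align - 1) & ~(align - 1)  # round up to alignment
        for symbol, offset in local_symbols.items():
            if symbol in symbols:
                raise ValueError(f"duplicate symbol {symbol!r} in {name}")
            symbols[symbol] = cursor + offset  # relocate: base of object + offset
        cursor += size
    return symbols


# Hypothetical object files produced by the compiler and assembler
objects = [
    ("startup.o", 0x1A, {"Reset_Handler": 0x00}),
    ("main.o", 0x40, {"main": 0x00, "delay": 0x24}),
]
symbol_table = link(objects)
```

The same object files linked for a device with a different memory map would simply get a different `base`, which is why relocation is left to the final link step.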
Typical Program Generation Flow

Source: J. Yiu, The Definitive Guide to ARM Cortex-M3 and Cortex-M4 Processors


Integrated Development Environments (IDEs)

Integrated Development Environments (IDEs) combine all of the necessary tools into an integrated environment.
A common IDE is Keil MDK-ARM (µVision5 IDE)
Can be downloaded for free
Allows development of code up to 32 KB
For developing larger programs, a licensed version needs to be purchased
Based on armcc, the official ARM compiler
Download here: http://www2.keil.com/mdk5/install
The free STM32CubeIDE from STMicroelectronics is an alternative
Based on the open-source gcc compiler
Download here: https://www.st.com/en/development-tools/stm32cubeide.html
mbed
Arduino-like simplicity
Download here: https://os.mbed.com/platforms/ST-Nucleo-F446RE/
IAR
System Workbench
etc.



C or Assembly?

Ideally we would never need to write assembly language.


Coding in assembly requires more effort
But C is an abstraction.
What you see may not be what’s really happening behind the scenes.
With assembly:
You know what is really happening ’back there’
You can make the machine do things that are impossible in C
You can optimize parts of the code (smaller code size and/or higher speed)



Embedded Programming Approaches Superloop

Easy to set up
OK for simple tasks
Low CPU utilization
Low hardware requirements
’Bare metal’
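
The superloop structure is simple enough to sketch in a few lines: initialise once, then call every task in a fixed order, forever. The Python version below is a host-side simulation only. The task names and the shared `state` dictionary are hypothetical, and the loop is bounded by `iterations` so the example terminates, whereas real firmware would loop indefinitely.

```python
def run_superloop(init, tasks, iterations):
    """Run init once, then call each task in order on every pass."""
    init()
    for _ in range(iterations):  # firmware would use an endless loop here
        for task in tasks:
            task()


state = {"ticks": 0, "led": False}


def init():
    state["ticks"] = 0
    state["led"] = False


def count_tick():
    state["ticks"] += 1


def toggle_led_every_third_tick():
    if state["ticks"] % 3 == 0:
        state["led"] = not state["led"]


run_superloop(init, [count_tick, toggle_led_every_third_tick], iterations=4)
```

Every task runs on every pass whether or not it has work to do, so the response time to any event is bounded by the duration of one full pass.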



Embedded Programming Approaches Interrupt Driven

Low power usage


CPU sleeps when no processing
required
Interrupts wake up the CPU
Different devices can have different
priorities
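
A minimal Python sketch of priority-based interrupt selection is given below. It follows the Cortex-M convention that a lower priority value is more urgent and that ties go to the lower interrupt number. The interrupt numbers and priority values are made up, and preemption of a running handler is not modelled; this only shows the order in which simultaneously pending interrupts would be serviced.

```python
def next_interrupt(pending):
    """Pick the pending IRQ to service next, or None if the CPU may sleep.

    pending: dict mapping IRQ number -> priority value (lower = more urgent).
    """
    if not pending:
        return None
    return min(pending, key=lambda irq: (pending[irq], irq))


def service_order(pending):
    """Return the order in which all currently pending IRQs are serviced."""
    remaining = dict(pending)  # work on a copy
    order = []
    while remaining:
        irq = next_interrupt(remaining)
        order.append(irq)
        del remaining[irq]
    return order


# Hypothetical pending interrupts: IRQ number -> priority value
pending = {7: 2, 3: 0, 12: 2, 5: 1}
order = service_order(pending)
```

When `next_interrupt` returns `None`, there is nothing to do and the CPU can go back to sleep until the next interrupt arrives.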



Embedded Programming Approaches RTOS

Real-Time Operating System


RTOS handles multiple concurrent
application processes
High CPU utilization



Software Development for STM32
In-Circuit Serial Programmer (ICSP)

ICSP is required to program and test the code on the actual microcontroller.
STM32 supports 2 protocols:
Joint Test Action Group (JTAG)
Serial Wire Debug (SWD)
Nucleo-64 boards come with an on-board ST-LINK

PC → USB → ICSP → JTAG/SWD → STM32 MCU



Developing Applications

1. Start with basic code framework


2. Add code for specific model of MCU
3. Add code for specific application

STM32CubeMX from STMicroelectronics:


An initialization code generator
Configure the peripherals on the multiplexed pins of the MCU
Needed because each MCU pin can perform multiple functions

Configure ports on STM32CubeMX → Develop application on IDE → Program the device



What to Download
Keil IDE:
http://www2.keil.com/mdk5
Keil Board Support Package for STM32F446RE:
https://www.keil.com/dd2/stmicroelectronics/stm32f446retx/
STM32CubeMX:
https://www.st.com/en/development-tools/stm32cubemx.html
STM32CubeIDE (use either this or Keil):
STM32CubeIDE with integrated STM32CubeMX:
https://www.st.com/en/development-tools/stm32cubeide.html
Drivers:
STM32Cube MCU Package for STM32F4 series:
https://www.st.com/en/embedded-software/stm32cubef4.html
ST-Link Drivers:
https://www.st.com/en/development-tools/stsw-link009.html
Java Runtime Environment:
https://www.java.com/en/download
Blinky

Programmers print out ’Hello World’ from their first program


Embedded designers create a blinky instead

https://www.youtube.com/watch?v=hyZS2p1tW-g



Take Home
Answer The Following Questions

1. Can the Nucleo board operate from a 3.3 V supply?
2. How many LEDs does the board have? What are their labels?
3. How many jumpers does the board have? How are they labeled?
4. Does the board output 3.3 V?
5. How many buttons are on the board? How are they labeled?
6. How many oscillators are on the board? How are they labeled?
7. What type of package does the STM chip use?
8. How many pins does the MCU have?
9. Which port is the BLUE button connected to?
10. PC10 is connected to which pin on the chip?
11. What is the value of the resistor directly connected to the green LED?
12. Looking at the board, where are SB29 and SB42, and are they closed or open?
13. Which I/O port is LD2 connected to?
14. How many pins (in total) are on each Morpho connector?
15. Which pin on the Morpho connector is connected to PC10? What is the reference designator for this Morpho connector?
16. On your board, is JP6 fitted? What is it used for (see the schematic)?
17. How many ports are available on the board (e.g., Px)? How many bits does each port have?
References

Official ARM Cortex-M4 Overview


http://ccrs.hanyang.ac.kr/webpage_limdj/embedded/Cortex-M.pdf
Programming Your NucleoBoard
https://web.sonoma.edu/users/f/farahman/sonoma/courses/es310/310_arm/labs/Lab5/Lab5_all/ProgramingNucleoBoard_2.pdf
ARM ISA
https://www.youtube.com/watch?v=15z_vn4H41U

[1] G. Brown, Discovering the STM32 Microcontroller. Indiana University, 2016. Available: https://legacy.cs.indiana.edu/~geobrown/book.pdf

[2] T.O.M.A.S (Technically Oriented Microcontroller Application Services), “STM32Cube MX HAL MOOC presentation 16:9,” v0.03, 2018. Available: https://www.mikrocontroller.net/attachment/319720/STM32Cube-MX-HAL-MOOC-presentation-16-9.pdf

[3] STMicroelectronics, Description of STM32F4 HAL and low-layer drivers, UM1725 ed., 2020. Available: https://www.st.com/resource/en/user_manual/dm00105879-description-of-stm32f4-hal-and-ll-drivers-stmicroelectronics.pdf

