
11/6/2015

SYSTEM ENGINEERING

Engineering 180
Systems Engineering
Embedded Processing Case Study

Lecture 1
May 21, 2015
Steve Kirsch
2015 Steve Kirsch- All Rights Reserved

Outline for Lectures on Real-Time Embedded Processing

Lecture 1
Overview of Embedded Subsystem Design
Case Study: Problem statement

Lecture 2
Conceptual Design

Lecture 3
Preliminary Design

Lecture 4
Detailed Design / Integration and Test

Lecture 1: Agenda


Overview

What is an Embedded Processing System


Characteristics
Examples of Embedded computers
Summarize Design Challenges

Case Study: The road to SDR: Applying the System Engineering Process

Problem statement
Identify stakeholders
Top-level requirements
Key Performance Parameters (KPPs)

Homework

Overview: What is an embedded processing system



Wikipedia:
An embedded system is a computer system with a
dedicated function within a larger mechanical or
electrical system, often with real-time computing
constraints
It is embedded as part of a complete device often
including hardware and mechanical parts.
Embedded systems control many devices in
common use today


Overview: continued

Embedded processing sub-system


Consists of one or more digital
processors integrated with other parts of
a complex system (sensors, actuators,
user interface, etc.)
Arranged in a tightly coupled
architecture, designed to perform a
specific set of functions

Generic System with Embedded Digital Processor:
Sensor -> Amp -> ADC -> Embedded Digital Processor -> DAC -> Amp -> Load

Overview: continued

System design, as previously discussed, is hierarchical

A system is broken down into subsystems, which in turn are broken down into further subsystems
The embedded computer subsystem is itself a complex system, designed using the same principles employed at the system level

Overview: continued

Embedded processing subsystems are unique because their requirements are implemented in both hardware and software components
An early task of the embedded processing subsystem engineer is to flow down high-level requirements and allocate them to hardware and software components
An embedded processing subsystem engineer therefore needs broad knowledge of both hardware and software systems

Overview: continued

Embedded subsystems today are perhaps the most critical and complex part of the system to get right, and to get right early in the system design process
Functionality that was classically implemented in hardware is today implemented as both hardware and software components (trending toward more and more software)
Human and environmental interfaces are sensed and controlled by the interaction of hardware and software components
Real-time operation is a function of hardware and software interaction
Deterministic behavior is critical

Overview: The embedded system engineer's job is very challenging

The embedded subsystem engineer is required to have broad expertise

Hardware and software development management processes and tools
Mechanical / structural (enclosure designs)
Thermodynamics (cooling of electronics is critical to system design)
Materials (enclosures, backplanes, modules, connectors, etc.)
E&M (electromagnetic radiation and protection)
Computer hardware architecture (interface standards, networks, memory architecture, processor architecture, communication protocols, electronic components such as GPGPU, FPGA, and ASIC technology, etc.)
Computer software architecture (interface standards, communication protocols, operating systems, development tools, computer languages, and programming models such as parallel processing, streaming, object oriented, scripting, etc.)
System operational theory (e.g. communication or radar theory, with an understanding of the processing algorithms)
System simulation tools (e.g. MATLAB)

Overview: Hardware Properties



Key flow-down system requirements of an embedded processing system

Size
Weight
Power
Life cycle costs
Non-recurring development cost
Recurring cost

Rugged operating environment
Durability / Reliability
Maintainability
Supportability
Development schedule
Development test and integration environment
Functional requirements (many application specific)

Overview: Software Properties



Key flow-down system requirements of an embedded processing system

Life cycle costs
Non-recurring development cost
Recurring cost

Durability / Reliability
Maintainability
Supportability
Development schedule
Development test and integration environment
Infrastructure requirements (many application specific)
Build-time (drivers, libraries, interfaces)
Run-time (services, clients, servers, etc.)

Functional requirements (application specific)

Overview: Software Properties


Key Embedded Processing Application Characteristics
Complex Algorithms
Environment sensing and filtering
Visualization
Tracking

User interfaces
Human to computer
computer to computer

Realtime operation
Hard realtime sec - msec response times
Testability / observability

Multirate (asynchronous events)


Overview: Balanced design (CRISP)

Cost of ownership (life cycle cost)
Development cost, production cost, support cost (training, spares, repair, system upgrade, ...)

Risks (technology risk, production risk, obsolescence, ...)
Installation (weight, size, power, style, transportation, ...)
Supportability (reliability, maintainability, ...)
Performance (functionality, ease of use, throughput, ...)

Overview: Real Design is a Compromise Among Conflicting Needs and Desires

Design has to balance multiple desires and constraints

Functional performance: detection performance, tracking accuracies, ID capabilities, weapon support, map characteristics
Mission environment: offboard info, communication requirements, weapon characteristics
Support: maintenance concept, reliability, maintainability measures, built-in test capability
Physical characteristics: weight, size (O&M), prime power utilization, cooling required, dissipation, EMI/C characteristics
Physical environment: operating temperatures, storage temperatures, coolant characteristics, vibration levels, shock, prime power characteristics, EMI/C requirements
Cost: recurring cost, development cost, life-cycle cost
Programmatic characteristics: development plan, production plan, risks, technology maturity
Add your need here

Source: Raytheon

Overview: Embedded computing examples



Whirlwind I: Lessons learned from the 40s!

Core Memory Controller

Overview: Embedded computing examples



ENIAC: First fully electronic Turing machine

Overview: Embedded examples



Intel 4004: First commercially available microprocessor

Overview: Embedded examples



HP-35 (Wikipedia):
The HP-35 was Hewlett-Packard's first pocket calculator and the
world's first scientific pocket calculator (a calculator
with trigonometric and exponential functions). Like some of HP's
desktop calculators, it used reverse Polish notation. Introduced at
US$395, the HP-35 was available from 1972 to 1975.

Overview: Embedded examples



PlayStation 3, based on the Cell processor (developed by Sony, Toshiba, and IBM) and first released in late 2006, was way ahead of its time

Overview: Embedded Examples



Tianhe-2: World's fastest computer as of 11/2014

Tianhe-2 is built with Intel Xeon Phi coprocessors

Knights Ferry / Knights Landing
14 nm process

Overview: Embedded examples



Qualcomm SoC -- Snapdragon 800 processors


Overview: Single Chip Computer

The single-chip computer, or processor, is the foundation of embedded computing today
Embedded computational systems today are constructed with
A single processor chip
An array of processor chips
An SoC (system on a chip) that contains processor cores

Understanding the key aspects of a processor is therefore fundamental for an embedded system engineer

Overview: Physics of Software



Computing is a physical act.
Computers abstract information but in fact do their work by moving electrons
This is fundamentally why it takes time and energy to compute

Software performance and energy consumption are where we connect embedded computing to the real world
Embedded engineers make high-level decisions about the structure of their programs to greatly improve their real-time performance and power consumption

Overview: Challenges in Embedded Computing System Design

How much hardware do we need?


How do we meet deadlines?
How do we minimize power consumption?
How do we design for upgradability?
Does it really work and meet the requirements?
How can we get the job done within the budget and schedule constraints?

Case Study:
The Need Phase Proposal Phase


The proposal phase usually consists of


Defining the problem:

Identify customers and stakeholders


Understand their needs
Understand and develop the operational concept
Identify the constraints

Defining the system (or product) to be procured or built


Building the system specification (procurement spec)
Make sure the problem is solvable. Identify risks and risk mitigation plans
Investigate potential system designs

Preliminary system modeling and performance assessment


Preliminary program plan and schedule
Development cost projection

Develop product testing and evaluation strategies


Writing a proposal
Winning the contract

It is marketing, it is management, it is a lot of engineering, and it is about managing risk

Case Study: Problem Statement



The government, in conjunction with a prime contractor, intends to design and build a surveillance and reconnaissance radar system that will be compatible with existing UAVs as well as new, more advanced UAVs of the future
The radar's primary function is an air-to-ground capability to locate and disarm ground moving troops, equipment, and air defense systems

Future system: Joint US Navy / US Air Force unmanned combat air vehicle
Current system: Predator

Case Study: Nested Design Process
Radar Processing Embedded Subsystem: Nested Design Process for Complex Systems

[Diagram: nested design process. System level: Conceptual Design produces the System Specification; Preliminary Design produces the System Architecture and the Subsystem Specifications for the level 1 subsystems (A, B). For subsystem B the cycle repeats: conceptual design for subsystem B, then Preliminary Design B produces the Subsystem Architecture for the level 2 subsystems (B1, B2, B3).]

Case Study: Problem Statement



Radar system engineers are responsible for taking the system-level specifications and decomposing the requirements into level 1 requirements so that they can be allocated to the major subsystem components
Antenna / Beam steering computer
Receiver / Exciter
Processor subsystem (this is our task)
Many additional requirements not explicitly specified in the level 1 spec, referred to as derived requirements, are also flowed down to the next level of the major subsystem components

Case Study: Problem Statement



Radar System Block Diagram

[Block diagram. Antenna Subsystem: Antenna. Receiver/Exciter Subsystem: Stable Microwave Source, Frequency Synthesis / WF Gen, Power Amplifier, Low Noise Amplifier, Down-Conversion, A/D Conversion and Timing Generator. Processing Subsystem: Digital Signal Processing; Control, Interface, and Data Processing. Signal labels: Control, Image Information, Detected Objects. System Interfaces: Commands, Motion Data; Radar Results, Health Info.]

System Design Process (One view)


[Diagram: Need + Desires -> Potential Solutions -> Baseline Solution -> Detailed, Documented Baseline. Conceptual Design leads to the baseline solution; Preliminary Design leads to the detailed, documented baseline.]

Includes:
Elicitation of need and requirements
Design through insight, invention, and successive
refinement
Management of complexity through partitioning and
creating well-posed lower-level design problems

Case Study: Why is system engineering so challenging?

The most important decisions are made early in the design cycle, with the least amount of detailed information

[Figure: Acquisition framework across the system life cycle. Phases: Concept Refinement, Technology Development, System Development & Demonstration, Production & Deployment, Operations & Support. Organizations: Combat Developer (TRADOC); Materiel Developer (PM, Total Life Cycle System Manager); Army Materiel Command. Ability to influence life cycle cost (LCC) declines across the phases: high (70-75% of cost decisions made), less (85% of cost decisions made), little (90-95% of cost decisions made), minimum (95% of cost decisions made). Other labels: A; 10%-15%; 5%-10%; 28% life cycle cost; 72% life cycle cost.]

Case Study: Problem Statement



Customer system level specification describes

Mission scenarios
Threats
Operational environment
Platform resource allocation for the Radar system
Space
Weight
Cooling capacity

Operator interfaces
Mission stability (system shall run continuously for N hours)
Plus many more "-ilities" requirements

Case Study: Problem Statement



The system-level specification and requirements allocation is a complex task
Results of this work are documented and reviewed at the SDR

The major subsystem responsible engineers (REA) are part of the radar system team that does the allocation
Involvement of stakeholders is required

Program manager (also part of the radar system engineering team)
Establishes the Work Breakdown Structure (WBS)
Allocates budget to each of the WBS line items
Creates an integrated master plan (IMP)
Creates an integrated master schedule (IMS)

Case Study: System Conceptual Design Phase Products

Baseline design
Performance, risk, cost, schedule, other high-level attributes
Characterized by top-level budgets and supporting analysis
Supported by enough lower-level design to give confidence in the numbers
Hardware, algorithms, signal processing sizing, software sizing

Top-level program plan
Schedule
Headcounts vs. time
Critical item development plans
Top-level understanding of programmatic issues

Subsystem spec

Case Study: System Conceptual Design Review (SDR)

Review of the concept and supporting documents


Concept Analysis Review
System modeling and Simulation results
Compare and contrast conceptual designs and review
justifications for selected baseline
Risk mitigation plan going forward

Establish the functional baseline


Approve the system specification


Case Study: Post-SDR flow-down requirements

Processor subsystem Level 1 requirements specification

Space
Weight
Power
Cooling
"-ilities"
Transmit waveform specifications (PRF, number of coherent pulses transmitted/collected, sample rates, number of receive channels, phase coding, etc.)
Processing algorithms (preliminary)
Interfaces (sensor data, sensor command, nav, mission computer, instrumentation system)

Case Study: Definition of terms



[Figure: ground area of interest divided into Swath 1, Swath 2, and Swath 3; axes are azimuth and range]

CPI (Coherent Processing Interval): a series of pulses with a known phase relationship, transmitted and collected so that they can be coherently processed
PDI (Post Detection Integration): multiple CPIs are non-coherently integrated
Dwell time: time to radiate a single beam position on the ground
Bars: number of beam positions needed to radiate a swath
PRF (pulse repetition frequency): rate at which pulses are transmitted
PRI (pulse repetition interval) = 1/PRF: time from the start of one pulse to the start of the next
Pulse modulation: amplitude and phase superimposed on the pulse over the duration of the pulse
Receive channels: radar antennas are typically partitioned into subarrays with physically offset phase centers, each connected to a separate receiver and A/D

Case study: GMTI
Ground Moving Target Indicator waveform

[Waveform diagram: CPI 1 through CPI M, each containing pulses 0 through N]

Key parameters for embedded subsystem design

Number of CPIs/Dwell
Number of pulses/CPI
Pulse modulation: LFM (linear frequency modulation)
Number of receive channels
PRF
Number of swaths/scan area
Scan area rate
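
The key parameters above can be tied together with a little arithmetic. The following is a minimal sketch, not part of the original lecture: the class name `GmtiWaveform`, the field `samples_per_pulse`, and the numbers in the example are hypothetical, and it assumes a constant PRF with back-to-back CPIs and no gaps between them.

```python
from dataclasses import dataclass


@dataclass
class GmtiWaveform:
    """Hypothetical container for the key waveform parameters of one dwell."""
    prf_hz: float            # pulse repetition frequency
    pulses_per_cpi: int      # coherent pulses per CPI
    cpis_per_dwell: int      # CPIs non-coherently integrated per dwell
    rx_channels: int         # number of receive channels
    samples_per_pulse: int   # A/D samples (range bins) per receive window, assumed

    @property
    def pri_s(self) -> float:
        # PRI = 1 / PRF
        return 1.0 / self.prf_hz

    @property
    def cpi_s(self) -> float:
        # Duration of one CPI, assuming a constant PRF
        return self.pulses_per_cpi * self.pri_s

    @property
    def dwell_s(self) -> float:
        # Duration of one dwell, assuming CPIs are back to back
        return self.cpis_per_dwell * self.cpi_s

    @property
    def samples_per_dwell(self) -> int:
        # Total A/D samples collected across all channels in one dwell
        return (self.rx_channels * self.samples_per_pulse
                * self.pulses_per_cpi * self.cpis_per_dwell)


# Illustrative numbers only
w = GmtiWaveform(prf_hz=2000.0, pulses_per_cpi=128, cpis_per_dwell=4,
                 rx_channels=4, samples_per_pulse=1000)
print(w.pri_s, w.cpi_s, w.dwell_s, w.samples_per_dwell)
```

The sample count per dwell is the quantity that later drives the data rate and memory estimates.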


Case study: GMTI Processing Algorithm (CPI processing)

[Block diagram. Blocks shown: I/Q Formation, Pulse Compression, Clutter Cancellation, Motion Compensation, Noise Estimation, Target Detection, Doppler Filtering, PDI Processing]

Case study: GMTI Processing Algorithm (PDI processing)

[Block diagram. Blocks shown: CPI Processing, False Alarm Control / M of N Processing, Ambiguity Resolving, Angle Estimation, Noise Estimation, Sidelobe Detection Rejection, Target Parameter Estimation, Hit List]

Homework

Text: Computers as Components: Principles of Embedded Computing System Design
By Professor Wayne Wolf
Text link: Available in CCLE 15S-ENGR180-1 Information Folder
http://ceng2.ktu.edu.tr/~ulutas/Courses/EmbeddedSystems/0123743974.pdf

Read: Chapter 1 Embedded Computing
Introduction
1.1 Complex Systems and Microprocessors

Write up to a 1-page discussion answering
Why are microprocessors used in complex system designs?


Engineering 180
Systems Engineering
Embedded Processing Case Study

Lecture 2
May 26, 2015
Steve Kirsch

Outline for Lectures on Real-Time Embedded Processing

Lecture 2
Conceptual Design

Lecture 2: Agenda: Conceptual Design Process



Review Homework
Starting point for conceptual design of the
embedded processing
Feasibility and Requirements Analysis
Embedded processing design synthesis
Subsystem concept design review process
Homework


Homework 1: review

Why are microprocessors used in embedded computing systems?

There are many examples of processor use in embedded computing
Perhaps experience tells us that something about this approach is fundamental
Large variety of processors to choose from
High potential that there is a best fit

The alternative to processors is custom hardwired logic. Advantages of processors over this alternative:
Easier to design and debug
Allows for the possibility of upgrades and adding new functionality
More efficient than custom logic
A custom design will have some logic dedicated to sub-functions that aren't active all the time. A microprocessor's logic is reused for all sub-functions
Microprocessors are application agnostic, so we can leverage huge investments made by others. Application-specific logic can be implemented in software
Microprocessors can be faster than custom logic (seems almost counterintuitive!)
Utilize the latest manufacturing processes
Resources are available for access to the best experts and large design teams
Can overcome the overhead of interpreting instructions with clever utilization of parallelism

Homework 1: review

What differentiates embedded computing from other forms of computing?

Program must meet deadlines
Must be fast enough
Needs to have deterministic behavior to guarantee it will be fast enough

To understand the real-time behavior of an embedded computing system, one needs to understand its components from the lowest level to the highest level of the system. What are the 5 components, from the lowest to the highest?

CPU (processor plus memory)
Platform (CPU scaffolding): components supporting the CPU (e.g. buses, I/O devices)
Program: programs can be very large, and the CPU sees only a very small window of the program at any one time. We must consider the structure of the program to determine the overall behavior of the system
Tasks: we generally run several programs simultaneously on a CPU, creating a multitasking system. Tasks interact with each other in ways that have profound implications for performance
Multiprocessors: a system can have many microprocessors, all interacting with each other and potentially with accelerators. The interaction can be very complex to analyze and determines the overall system performance.

Concept Design Phase: Embedded Processing



A BAA (Broad Agency Announcement) typically precedes the RFP (request for proposal) when contracting with the US government
This is a head start on preparing for the RFP

RFP let
Procurement specs reviewed and analyzed
Enormous effort applied at this stage to develop a system design concept or concepts
Proposal written and submitted

Contract won!
Often leveraging years of IR&D

System engineers decomposed and allocated the level 1 requirements
Produced preliminary subsystem specifications

Subsystem development team leads identified
Program manager
Subsystem architect (head technical subsystem engineer)
Development team leads (tech leads)
Hardware unit lead
Mode software lead
Infrastructure software lead

This is our starting point for the embedded processing concept design phase

The Conceptual Design Process



Embedded Processing Concept Design: Stakeholders Requirements C1

Who are the stakeholders?
Our subsystem team
Subsystem Program Manager
Subsystem Architect
Tech Leads and their development teams
Customers
System Team
System Program Manager
Contracting organization
Vendors and Suppliers
Test and Integration team

Requirement sources
Procurement spec
Subsystem specs (generated by tier 1 system team)
KPPs (Key Performance Parameters identified by customer or system team)
System TPMs (Technical Performance Measures)
SRD (system requirements document)
SDD (system design description)
SRR (system requirements review material)
Vendor component specifications
Legacy system components *
Standards *
Laws of physics
Company development procedures, ethics, rules
Laws of the land and point of deployment
Common sense
* Potential requirement source

Embedded Processing Concept Design: Feasibility Analysis C2

Identify the possible processing solutions
Study the viability of these solutions according to the flow-down requirements
Performance, cost, schedule, risk, supportability, ...

Key questions:
Can we design the embedded processing to run in real time while meeting the SWaP-C requirements?
SWaP-C (Space, Weight, and Power, plus Cost)
What are the key risks?
How can the risks be reduced?
Are the preliminary subsystem specifications reasonable? Could they be modified to reduce the risks and still meet the main system objectives?

Embedded Processing Concept Design: Feasibility & Req Analysis C2 & C3 -- Step 1:

[Figure: Area of Interest (AoI) divided into Swath 1, Swath 2, and Swath 3]

Understand the requirements, and focus first on the primary requirements that will likely drive the top-level design
For our case study, the real-time signal processing requirement is key
The system requirement is to scan an area of interest in N seconds, process the real-time data, and produce a hit report of all ground movers within the AoI with a false alarm rate of R and a probability of detection P.
The system flow-down requirements have specified the waveforms and the signal processing algorithms that can achieve this system performance
As one begins to drill down to the next level of detail, some requirements might not be achievable within the scope of other requirements
Requirements can be modified at this stage to help achieve the primary system goals

It is up to you to only accept requirements that can be achieved
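
One quick way to test whether a scan-time requirement can be accepted is to turn it into a time budget per beam position. The sketch below is an illustration only: the function names and the example numbers are hypothetical, and it assumes the scan time is divided evenly over swaths times bars beam positions, ignoring beam steering overhead and interleaving with other modes.

```python
def dwell_budget_s(scan_time_s: float, swaths: int, bars_per_swath: int) -> float:
    """Time available per beam position if the AoI must be scanned in scan_time_s."""
    beam_positions = swaths * bars_per_swath
    return scan_time_s / beam_positions


def waveform_fits(dwell_s: float, scan_time_s: float,
                  swaths: int, bars_per_swath: int) -> bool:
    """True if the dwell time required by the waveform fits inside the budget."""
    return dwell_s <= dwell_budget_s(scan_time_s, swaths, bars_per_swath)


# Illustrative numbers only: 30 s scan, 3 swaths, 20 bars per swath
budget = dwell_budget_s(30.0, 3, 20)          # 0.5 s per beam position
print(budget, waveform_fits(0.256, 30.0, 3, 20))
```

If the waveform's dwell time exceeds the budget, that is a signal to push back on either the scan time or the waveform before accepting the requirement.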



Embedded Processing Concept Design: Feasibility & Req Analysis C2 & C3 -- Step 2:

Derive the key performance parameters for the embedded processing subsystem
Data rates
Memory requirements
Processing throughput requirements based on the required processing algorithms

[Block diagram: Coherent GMTI Processing Algorithm. Blocks shown: I/Q Formation, Pulse Compression, Clutter Cancellation, Motion Compensation, Noise Estimation, Target Detection, Doppler Filtering, PDI Processing]

Deep dive into data rates:



Data rates can help the embedded processing subsystem engineer understand a lot about the problem
In our case study, the sensor produces a very high-rate input stream of data (10s of Gsamples/sec)

A/D rates
A/D sample word size (often a function of data rate)
Number of input data channels
REX-to-processor network bandwidth and protocol
How is the data packaged and shipped?
How much extra bandwidth is needed for the protocol (e.g. error correction coding)?
What is the receive duty? (How much of the total time is data streaming?)

Synchronous or asynchronous data flow
Flow control
How much rate buffering is required?
How is data synchronization achieved?

Data rates will drive memory and processing requirements
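
The questions above combine into a first-order data rate estimate. This is a minimal sketch under stated assumptions, not the lecture's own method: the function name, its parameters, and the example numbers are hypothetical, the protocol overhead is modeled as a simple fraction of the payload, and the receive duty is the fraction of time data is streaming.

```python
def input_data_rates_bps(adc_rate_sps: float, bits_per_sample: int, channels: int,
                         protocol_overhead: float = 0.0,
                         receive_duty: float = 1.0) -> dict:
    """First-order input data rates in bits per second.

    payload_peak: raw A/D output while a receive window is open
    link_peak:    payload plus protocol overhead (fraction of payload)
    average:      link rate averaged over time using the receive duty
    """
    payload_peak = adc_rate_sps * bits_per_sample * channels
    link_peak = payload_peak * (1.0 + protocol_overhead)
    average = link_peak * receive_duty
    return {"payload_peak": payload_peak, "link_peak": link_peak, "average": average}


# Illustrative numbers only: 1 Gsample/s per channel, 16-bit words, 4 channels,
# 25% protocol overhead, data streaming 50% of the time
rates = input_data_rates_bps(1e9, 16, 4, protocol_overhead=0.25, receive_duty=0.5)
print(rates)
```

The gap between the peak and average rates indicates how much rate buffering could smooth the input to the processor.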



Deep dive into data rates: Receive Duty



Typical Radar Processing: One Beam Position (Dwell)

[Figure: N pulses, each followed by a receive window; the A/D samples (range bins) from the receive windows feed a Doppler filter bank]

Deep dive into data rates: System Front End Duty

System requires multimode operation


Tight interleaving of frontend resources desired for
best system performance

[Timing diagram: data collection time alternates between Mode 1, Mode 2, and Mode 1]

Deep dive into data rates:



Pipeline processing of an AoI


Complete Dwell of data collected prior to processing
Memory required to hold a complete set of data (how big would this be?)
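
To answer "how big would this be?", the buffer for one complete dwell can be sized from the waveform parameters. The sketch below is illustrative only: the function names and numbers are hypothetical, and the two-buffer figure assumes one dwell is being collected while the previous one is processed.

```python
def dwell_buffer_bytes(samples_per_pulse: int, pulses_per_cpi: int,
                       cpis_per_dwell: int, channels: int,
                       bytes_per_sample: int) -> int:
    """Memory needed to hold every A/D sample of one complete dwell."""
    return (samples_per_pulse * pulses_per_cpi * cpis_per_dwell
            * channels * bytes_per_sample)


def pipeline_memory_bytes(dwell_bytes: int, buffers: int = 2) -> int:
    """Total input memory when whole dwells are buffered.

    Two buffers is the assumed minimum for a pipeline: one dwell is filled
    by the front end while the previous dwell is being processed.
    """
    return dwell_bytes * buffers


# Illustrative numbers only: 1000 range bins, 128 pulses/CPI, 4 CPIs/dwell,
# 4 receive channels, 4 bytes per sample
one_dwell = dwell_buffer_bytes(1000, 128, 4, 4, 4)
print(one_dwell, pipeline_memory_bytes(one_dwell))
```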
[Timing diagram: dwells D0, D1, D2. Data collection time: Mode 1 collection, Mode 2 collection, Mode 1 collection. Data processing time: Mode 1 D0 processing, then Mode 2 D1 processing, each starting after its dwell has been completely collected]

Deep dive into data rates:



Pipeline with overlap processing
Data is processed while it is being collected
Required memory size is reduced

[Timing diagram: dwells D0, D1, D2. Data collection time: Mode 1 collection, Mode 2 collection, Mode 1 collection. Data processing time: Mode 1 D0 processing, Mode 2 D1 processing, Mode 1 D2 processing, each overlapping the collection of its own dwell]

Deep dive into data rates:



Parallel Processing
Required memory size could be larger than for the fully pipelined architecture
Required processing performance per processor is reduced
Notice that the time to get the results from processing dwell D0 (latency) is longer in this case

[Timing diagram: dwells D0, D1, D2. Data collection time: Mode 1 collection, Mode 2 collection, Mode 1 collection. Data processing time: Processor 0 runs Mode 1 D0 processing, Processor 1 runs Mode 2 D1 processing, Processor 2 runs Mode 1 D2 processing]

Deep dive into data rates: What have we learned?



Keeping up with the real-time input data rate presents architecture trade-offs that can be used to balance the requirements for memory and processor performance
Parallel processing approaches that use an array of potentially slower processors can add processing latency but deliver more overall throughput
Latency is more important for some applications than others
Examples:
Air-to-air alert/confirm modes need very short latency
SAR ground maps typically have very loose latency requirements

Many opportunities exist to exploit processing parallelism once system requirements are fully understood
For highly computation-intensive signal processing, exploiting parallelism is typically required to achieve system requirements
Exploiting parallelism can be a cost effective approach for many less
computation intensive applications as well
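
The throughput versus latency trade-off can be checked with a simple timing model. This is a sketch under stated assumptions rather than a full scheduler: the function names and numbers are hypothetical, dwells are assumed to arrive back to back and be handed to processors round-robin, and processing of a dwell is assumed to start only after it is fully collected.

```python
def keeps_up(collect_ms: int, process_ms: int, processors: int = 1) -> bool:
    """Sustained real time: each processor receives one dwell every
    processors * collect_ms and must finish before its next one arrives."""
    return process_ms <= processors * collect_ms


def result_latency_ms(collect_ms: int, process_ms: int) -> int:
    """Time from the start of collecting a dwell until its results are ready,
    assuming processing starts once the dwell is fully collected."""
    return collect_ms + process_ms


# Illustrative numbers only: 100 ms of collection per dwell
print(keeps_up(100, 100, processors=1), result_latency_ms(100, 100))  # one fast processor
print(keeps_up(100, 300, processors=1))                                # one slow processor
print(keeps_up(100, 300, processors=3), result_latency_ms(100, 300))  # three slow processors
```

In this model, three slower processors sustain the same input rate as one fast processor, but each dwell's results arrive later.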

Deep dive into processor performance:



Software Performance Drivers


Algorithm design - Clever way of doing a computation can sometimes
provide dramatic improvement in system processing performance
Compiler efficiency - Optimized compiler can generate efficient code
(fewer hardware instructions needed per high order language
instruction)
OS responsiveness - Real-time Operating System (RTOS) can be
designed to require minimal resources for the OS itself

Hardware Performance Drivers


Processor execution speed - Optimized processor can execute more
hardware instructions per second (or per watt)
Memory effective bandwidth - Memory and memory bus can provide
sufficient data and instruction access to keep up with the processor
I/O system effective bandwidth - External interfaces/network provide
sufficient input and output bandwidth to keep processor busy
Balanced design requires each of these factors to be considered in allocating system resources
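
A rough way to see whether a design is balanced is to estimate how long each resource would need for one unit of work and look for the slowest one. The sketch below is a simplification offered for illustration: the function names and numbers are hypothetical, and it assumes compute, memory traffic, and I/O overlap perfectly so that the slowest resource sets the pace.

```python
def resource_times_s(ops: float, mem_bytes: float, io_bytes: float,
                     ops_per_s: float, mem_bw_bps: float, io_bw_bps: float) -> dict:
    """Time each resource needs for one unit of work (for example one dwell).

    Bandwidths here are in bytes per second.
    """
    return {
        "compute": ops / ops_per_s,
        "memory": mem_bytes / mem_bw_bps,
        "io": io_bytes / io_bw_bps,
    }


def bottleneck(times: dict) -> str:
    """Name of the resource that takes the longest."""
    return max(times, key=times.get)


# Illustrative numbers only
times = resource_times_s(ops=2e9, mem_bytes=4e9, io_bytes=1e9,
                         ops_per_s=10e9, mem_bw_bps=8e9, io_bw_bps=4e9)
print(times, bottleneck(times))
```

In this example the processor is idle more than half the time waiting on memory, so a faster CPU alone would not improve throughput.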

Deep dive into processor performance: Requirement allocation can greatly affect performance
A critical part of the embedded processing concept design is the allocation of functional requirements to hardware vs. software
In many cases this allocation is quite simple
Memory storage (hardware)
ALU functions (the lowest level of a computation engine) (hardware)
Basic operating system functions (mutexes, semaphores, thread scheduling) (software)
Many functions will have both hardware and software components
Ethernet interfaces
DMA controllers (discussed later in detail)

It is very important to define the boundary between hardware and software
ISAs (Instruction Set Architectures) are commonly used to define the boundary between hardware and software for a programmable device
To the software, it is an abstraction of the hardware
To the hardware, it is a specification of what the hardware is required to do

Hardware Performance Drivers



Key performance drivers of an embedded processor

Memory Bandwidth
Network Bandwidth
I/O Bandwidth
CPU OPs (Operations per sec)
Signal processing performance usually expresses performance in FLOPS (Floating
point operations per sec
I/O Ports
Program
Memory

Classic Harvard Architecture


Data
Memory

CPU
Clock

How can we optimize for performance?


Exploit parallelism
Utilize a CPU that best fits the job
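
The CPU OPs driver listed above can be roughed out before any benchmark is run. The sketch below is a minimal illustration, assuming peak operations per second can be approximated as cores x clock x SIMD lanes x operations per lane per cycle. The function name and the example numbers are hypothetical and are not taken from the lecture.

```python
def peak_ops_per_sec(cores, clock_hz, simd_lanes, ops_per_lane_per_cycle):
    """Theoretical peak operations per second (an upper bound, not a sustained rate)."""
    return cores * clock_hz * simd_lanes * ops_per_lane_per_cycle


# Hypothetical processor: 4 cores at 1 GHz, 4-wide SIMD,
# 2 floating-point operations per lane per cycle (fused multiply-add).
peak = peak_ops_per_sec(cores=4, clock_hz=1.0e9, simd_lanes=4,
                        ops_per_lane_per_cycle=2)
print(f"Peak: {peak / 1e9:.1f} GFLOPS")
```

A figure like this only bounds the compute side; memory and I/O bandwidth still have to keep the processor fed for the peak to matter.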

Exploiting Parallelism To Improve Performance

Parallelism is present in multiple forms

Thread or Task Level Parallelism (TLP)
Wikipedia:
Task parallelism (also known as function parallelism and control parallelism) is a form of parallelization of computer code across multiple processors in parallel computing environments. Task parallelism focuses on distributing execution processes (threads) across different parallel computing nodes. It contrasts to data parallelism as another form of parallelism.

Thread Level Parallelism: cont.

Independent execution streams can execute in parallel, all working toward a single goal
Example: the multiple-processor example shown earlier, processing multiple AoIs in parallel

Simultaneous multithreaded operation is commonly supported within modern processors
Multiple cores running independent threads
Multiple hardware threads within a single core (SMT, simultaneous multithreading, also called hyper-threading)

Most modern operating systems support simultaneous multithreading
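
As a small illustration of thread level parallelism, the sketch below hands independent areas of interest (AoIs) to a pool of worker threads. The AoI names, sample values, and the `process_aoi` function are hypothetical stand-ins. Note that in CPython, CPU-bound threads are interleaved rather than run truly in parallel, so this shows the structure of the approach, not a speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def process_aoi(aoi):
    """Stand-in for the signal processing of one area of interest (AoI)."""
    name, samples = aoi
    return name, sum(s * s for s in samples)  # e.g. total energy


# Hypothetical input: three independent AoIs.
aois = [("aoi0", [1, 2, 3]), ("aoi1", [4, 5]), ("aoi2", [6])]

# Each AoI is an independent execution stream working toward one goal.
with ThreadPoolExecutor(max_workers=3) as pool:
    results = dict(pool.map(process_aoi, aois))

print(results)
```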


Exploiting Parallelism To Improve Performance

Data Level Parallelism (DLP)
Wikipedia:
Data parallelism is a form of parallelization of computing across multiple processors in parallel computing environments. Data parallelism focuses on distributing the data across different parallel computing nodes. It contrasts to task parallelism as another form of parallelism.

Data level parallelism: cont.

Data is organized so that the same operation can be performed on the whole data set at the same time
This form of parallelism is abundant in radar signal processing (will be discussed later when we return to the GMTI algorithm)
It can be exploited in two typical ways
Multi-core processor
SIMD (single instruction, multiple data) processing cores

[Figure: two diagrams. Multi-core µP: Input Data, Core 1 ... Core N, Output Data. µP Core: Instruction, Reg File, ALU 0 ... ALU N]
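
The multi-core form of data level parallelism can be sketched as follows: the data set is split into chunks and every worker applies the same operation to its own chunk. This is a minimal illustration; `scale_chunk`, `split`, and the data are hypothetical, and a thread pool stands in for the cores.

```python
from concurrent.futures import ThreadPoolExecutor


def scale_chunk(chunk, gain=2):
    """The same operation applied to every element of one chunk."""
    return [gain * x for x in chunk]


def split(data, n_chunks):
    """Split data into n_chunks nearly equal contiguous pieces."""
    size, extra = divmod(len(data), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


data = list(range(10))
with ThreadPoolExecutor(max_workers=4) as pool:
    pieces = list(pool.map(scale_chunk, split(data, 4)))

# Reassemble the per-worker outputs in their original order.
output = [x for piece in pieces for x in piece]
print(output)
```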


Other Hardware Architecture Parallelism

DMA (Direct Memory Access) is a form of this parallelism

Without DMA capability, CPU cycles are required to move data from memory to an I/O device
With DMA, a few CPU cycles are used to set up the transfer, and the CPU can then do other work in parallel with the data movement

[Figure: Transfers without DMA vs. Transfers with DMA]
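
The benefit of DMA can be estimated with a simple timing model. The sketch below assumes that without DMA the compute and transfer cycles serialize on the CPU, and that with DMA the transfer overlaps fully with compute after a fixed setup cost. The function names and cycle counts are hypothetical.

```python
def time_without_dma(compute_cycles, transfer_cycles):
    """CPU moves every word itself, so compute and transfer serialize."""
    return compute_cycles + transfer_cycles


def time_with_dma(compute_cycles, transfer_cycles, setup_cycles):
    """CPU pays only the setup cost; the transfer overlaps with compute."""
    return setup_cycles + max(compute_cycles, transfer_cycles)


compute, transfer, setup = 10_000, 8_000, 200
print(time_without_dma(compute, transfer))      # 18000
print(time_with_dma(compute, transfer, setup))  # 10200
```

Under this model the gain disappears when either the compute or the transfer dominates, which is one reason a balanced design matters.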

Optimizing Performance by: Utilizing the Best Fit CPU Type

Some multicore µPs contain SIMD engines
Freescale, Intel

DSP chips (Digital Signal Processing)
Texas Instruments

GPGPUs (General Purpose Graphics Processing Units)
Nvidia, Intel, AMD/ATI

FPGAs (Field Programmable Gate Arrays)
Altera, Xilinx

ASICs (Application Specific Integrated Circuits)
VLSI, Softchip, Micronix Integrated Systems

Trade-offs between GPPs, DSPs, and FPGAs

| | General Purpose Processors (GPPs) | Digital Signal Processors (DSPs) | Field Programmable Gate Arrays (FPGAs) |
| Throughput per Watt | Lowest | More than GPPs | Highest (fixed point arithmetic) |
| Ease of Application Programming | Full-featured support of HOL programming | Limited support of HOL programming | VHDL required |
| Support for Floating Point Arithmetic | Yes. Some high-end GPPs have SIMD floating point vector units | Limited number of products available with floating point | Only with large performance penalty (products with optimized floating point starting to appear) |
| Recurring Cost per Component | High-performance GPPs more expensive than DSPs | Lowest | Highest |
| When to Consider Using | When time to market is the dominant issue or when high performance floating point arithmetic is essential | When recurring cost is most important | When (fixed point) throughput per watt is most important |

Comparing ASICs and FPGAs

An Application-Specific Integrated Circuit (ASIC) is an integrated circuit (IC) customized for a particular use, rather than intended for general-purpose use.
A Field-Programmable Gate Array (FPGA) contains programmable logic blocks and programmable interconnects that allow the same FPGA to be used in many different applications.

| Property | ASIC | FPGA |
| Requires foundry run(s) for each application | Yes | No |
| Typical Development Cost | High | Moderate |
| Typical Development Schedule | Lengthy | Moderate |
| Recurring Cost | Moderate | High |
| Power Consumption per Function | Lower | Higher |
| Maximum Clock Frequency | Higher | Lower |
| Functional Density | High | Moderate |

The Conceptual Design Process



System Concept High Level: Embedded Processing System Architecture

[Figure: system block diagram. Labels: Mission Computer, INS, GPS, REX, High Speed Instrumentation System, Embedded Processor, General Purpose Processing, Signal Processing]

Subsystem Level Synthesis

This is where the rubber meets the road
Multiple options are created and assessed
A baseline architecture is selected and its performance is estimated (usually roughly at this stage)

Performance of relevant, similar systems can be a useful guide
Scaling validated performance from an earlier fielded system or from IR&D efforts can greatly reduce risk

Performance assessment at this development stage
Difficult for revolutionary designs
Easier for evolutionary designs

The full set of flow-down requirements will be too much to evaluate in detail at this stage
Focus on the KPPs (Key Performance Parameters)
Use SMEs (subject matter experts) to help focus on the highest-risk, largest-impact requirements
Identify the key items to evaluate
For our case study, we know from experience that the signal processing will utilize the bulk of the SWAP-C and drive system performance

Evaluating CPU Performance

Best approach would seem to be to run the actual application software workload on each candidate processor and measure the required CPU time for each
Possible complications with this approach
Application software may not be available early in the development process when the CPU must be selected
Application software may include dependencies on the operating system and on external interfaces
Multiple versions of the application software may need to be created
May be difficult to remove effects of CPU idle time due to waiting for external events and I/O transfers

Candidate processor may not exist yet. Evaluation may have to be done on a slowly-executing simulation

Often the best way to obtain a comparison is to use one or more software benchmarks that adequately represent the application

Using Benchmarks to Evaluate CPU Performance

Benchmarking strategy is to obtain (or create if necessary) relatively simple sequences of code that together represent the most computationally-intensive algorithms of the application
Requires some insight on the part of the subsystem engineer to be able to identify these a priori

Use of multiple benchmarks creates an understanding of how well each candidate does on each algorithm
The best CPU on one algorithm may not be the best on other algorithms
CPU selection will need to be based on balanced design principles, considering best overall benchmark performance as well as many other factors
In particular, power consumption will be important for most embedded applications

Benchmark results may suggest compiler optimizations or even CPU architectural enhancements that will dramatically improve performance
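
A benchmark harness along these lines can be very small. The sketch below times two hypothetical kernels with the standard `timeit` module; the kernels, input data, and repeat counts are illustrative assumptions, and a real evaluation would use kernels drawn from the application and run on each candidate processor.

```python
import timeit


def fir_filter(samples, taps):
    """Direct-form FIR filter, a typical signal processing kernel."""
    n_taps = len(taps)
    return [
        sum(taps[k] * samples[i - k] for k in range(n_taps))
        for i in range(n_taps - 1, len(samples))
    ]


def sort_kernel(samples):
    """A non-numeric kernel, to show that rankings can differ by algorithm."""
    return sorted(samples)


samples = [float((i * 37) % 101) for i in range(2000)]
taps = [0.25, 0.5, 0.25]

benchmarks = {
    "fir": lambda: fir_filter(samples, taps),
    "sort": lambda: sort_kernel(samples),
}

# Taking the best of several repeats reduces noise from other activity.
results = {
    name: min(timeit.repeat(fn, number=5, repeat=3)) / 5
    for name, fn in benchmarks.items()
}
for name, seconds in results.items():
    print(f"{name}: {seconds * 1e3:.3f} ms per run")
```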

SPEC: Industry Standard Benchmarks

Standard Performance Evaluation Corporation (SPEC) established in 1988 by a consortium of computer vendors to create standard benchmarks for computer systems (www.spec.org)
Originally intended to benchmark performance of servers and workstations, using CPU-intensive benchmarks
Has since expanded to include benchmarks for graphics, Java applications, client-server models, mail systems, file systems, and Web servers
CPU vendors normally execute the benchmark suite and provide documented results
Serious effort made to produce benchmarks that avoid misleading comparisons, with strictly specified execution rules and reporting requirements

SPEC Benchmark Suites

SPEC CPU2006 contains two benchmark suites:
CINT2006 for measuring and comparing compute-intensive integer performance
CFP2006 for measuring and comparing compute-intensive floating point performance
Performance is expressed as the number of times the benchmark algorithm can be executed per unit time by the CPU being evaluated
Note: SPEC benchmarks measure the combined performance of the CPU and its compiler code generation capability

Besides SPEC, many other benchmarks are available, and it is usually feasible to create application-specific benchmarks when needed
SPEC provides a good model of how to construct and use benchmarks to make fair apples-to-apples comparisons between candidate CPUs
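
One way to combine several benchmark results into a single figure of merit is to normalize each run time against a reference machine and take the geometric mean of the ratios, which is the general approach SPEC uses for its overall CPU scores. The sketch below is illustrative only; the benchmark names and times are hypothetical.

```python
import math


def speed_ratios(reference_times, measured_times):
    """Ratio > 1 means the candidate is faster than the reference machine."""
    return {name: reference_times[name] / measured_times[name]
            for name in reference_times}


def geometric_mean(values):
    """Geometric mean, so no single benchmark dominates the overall score."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical run times in seconds.
reference = {"fft": 100.0, "viterbi": 80.0, "matrix": 50.0}
candidate = {"fft": 25.0, "viterbi": 40.0, "matrix": 50.0}

ratios = speed_ratios(reference, candidate)
score = geometric_mean(ratios.values())
print(ratios, f"overall score: {score:.2f}")
```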


EDN Embedded Microprocessor Benchmark Consortium (EEMBC)

Less well-known than SPEC, but more relevant to most embedded systems
Non-profit consortium supported by member dues and license fees
Real-world benchmark software helps designers select the right embedded processors for their systems
Standard benchmarks and methodology ensure fair and reasonable comparisons
EEMBC Technology Center manages development of new benchmark software and certifies benchmark test results
Originally started under the sponsorship of Electrical Design News (EDN)
Formed in 1997 to develop meaningful performance benchmarks for the hardware and software used in embedded systems

EEMBC Benchmarks (Partial List)



Digital Entertainment

AES
DES
High-Pass Gray-Scale Filter
Huffman Decoding
MP3 Decode
MPEG-2 Decode
MPEG-2 Encode
MPEG-4 Decode
MPEG-4 Encode
RGB to CMYK Conversion
RGB to YIQ Conversion
RSA

Telecom Version 1.1

Autocorrelation
Bit Allocation
Convolutional Encoder
Fast Fourier Transform (FFT)
Viterbi Decoder


Signal processing performance evaluation

We will need to exploit data level parallelism
The SIMD engines in our CPU are a good candidate for exploiting this type of parallelism
SIMD performance can't be evaluated by target-agnostic benchmarks
Vectorized libraries are required to utilize SIMD engines
Use the processor vendor's characterized library timing to estimate the clock cycles needed to process a dataset of a particular size
Examples:
VSIPL standard signal processing library
Mercury Computer Systems SAL
Intel MKL

Memory bandwidth to feed the CPU is critical when using SIMD engines
Evaluate data rates from the memory system and the CPU memory interfaces
Determine the number of processor clocks needed to move the data

Determine the performance driver
Compute cycles
Memory bandwidth
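
The comparison between compute cycles and memory bandwidth can be sketched as a first-order estimate. The function below assumes the slower of the two bounds the processing time; its name, the cycles-per-sample figure (which would come from vendor library timing), and the other numbers are hypothetical.

```python
def performance_driver(n_samples, cycles_per_sample, bytes_per_sample,
                       clock_hz, mem_bandwidth_bytes_per_sec):
    """Return (compute time, memory time, limiting factor) for one kernel."""
    compute_time = n_samples * cycles_per_sample / clock_hz
    memory_time = n_samples * bytes_per_sample / mem_bandwidth_bytes_per_sec
    if compute_time >= memory_time:
        driver = "compute cycles"
    else:
        driver = "memory bandwidth"
    return compute_time, memory_time, driver


# Hypothetical kernel: 1M complex samples, 2 cycles each on a 1 GHz core,
# 16 bytes moved per sample (8 read, 8 written) over a 4 GB/s memory path.
compute_time, memory_time, driver = performance_driver(
    n_samples=1_000_000, cycles_per_sample=2, bytes_per_sample=16,
    clock_hz=1.0e9, mem_bandwidth_bytes_per_sec=4.0e9)
print(f"compute {compute_time * 1e3:.1f} ms, "
      f"memory {memory_time * 1e3:.1f} ms, driver: {driver}")
```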

Signal Processing performance evaluation

Which library functions are the important ones?

[Figure: signal processing chain. Blocks: I/Q Formation, Pulse Compression, Clutter Cancellation, Motion Compensation, Noise Estimation, Target Detection, Doppler Filtering, PDI Processing]

We will explore this in more detail in the detailed design phase

Subsystem Level Synthesis (C4): Summary

Purpose: To select a subsystem level functional design.
Many trade studies are required
Modeling and simulations can be effective tools
Use of SMEs is critical to help focus on key requirements
Conceptual design is an iterative process.
The subsystem requirements are often revised based on the lessons learned during the design synthesis process
Requirement flow-down and traceability are key to this process.

The Conceptual Design Process



Subsystem Design Review

Typically done after the system level SRR
Major program milestone

Subsystem Concept Review is often part of the system level PDR
Objectives are very similar to those at the system level
Determine what needs to be done
Establish the baseline for the next design phase
Show how the baseline will meet the requirements

Subsystem Design Review: Specific objectives

Show an understanding of the complete set of flow-down requirements
Specify derived requirements that constrain the design
Show traceability back to the system requirements

Present the baseline architecture
Where options are still under consideration, show the multiple approaches that will be selected from in the PDR stage

Document and review the analysis that led to the baseline architecture
Identify risks
Create a risk mitigation plan
Generate a preliminary requirements compliance matrix
Identify the subsystem TPMs (Technical Performance Measures)

Subsystem Design Review: TPMs, Risk Mitigation, Requirements Compliance

TPMs
Identified at this early stage in the design
Will be monitored and tracked at each subsequent design phase (PDR, CDR)
They are an important tool to manage technical risk.
TPMs are dropped and added depending on the uncertainty and risk factors

Risk management plans should be in place for high priority TPMs.
Examples: SWAP, processing margin
Plans should list the tasks that will be done to mitigate the risks with the highest probability and highest impact

Requirements Compliance Matrix
Shows flow-down requirements
Shows derived requirements and their linkage back to the higher level spec
Shows the test method for each requirement
Validation by analysis
Validation at unit test level
Validation in the system integration lab
Validation in a deployed environment
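
A requirements compliance matrix can be kept as simple structured data that is checked automatically. The sketch below is one possible layout, not a prescribed format; the requirement IDs, texts, and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# The four validation methods listed on the slide.
METHODS = {"analysis", "unit test", "integration lab", "deployed environment"}


@dataclass
class Requirement:
    req_id: str
    text: str
    parent_id: Optional[str]  # link back to the higher level spec
    method: str               # how compliance will be validated


def check_matrix(rows):
    """Return a list of problems found in the matrix (empty if none)."""
    problems = []
    for r in rows:
        if r.method not in METHODS:
            problems.append(f"{r.req_id}: unknown validation method '{r.method}'")
        if not r.parent_id:
            problems.append(f"{r.req_id}: no trace to a higher-level requirement")
    return problems


matrix = [
    Requirement("SP-001", "Process one dwell within one dwell time", "SYS-010", "analysis"),
    Requirement("SP-002", "Power draw within allocated budget", "SYS-022", "unit test"),
    Requirement("SP-003", "Processing margin reserved for growth", None, "integration lab"),
]
problems = check_matrix(matrix)
print(problems)
```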


Subsystem Concept Design Phase: Summary

This early design phase is extremely important
Misinterpretation of requirements can result in a system that doesn't meet customer expectations
Overlooked design alternatives can result in a sub-optimal, non-competitive system

Risks that are missed and only discovered later in the design phase can be very costly
Flawed analyses can result in a system that just doesn't work
Chances for success will be greatly enhanced by following a sound system design process

Homework

Read: Computers as Components by Wayne Wolf
Section 4.5 Designing with Microprocessors
Section 4.7 System-level Performance

Write a short discussion answering
What are a few important processing subsystem performance drivers? Discuss how you would analyze these performance drivers for our Radar embedded processor case study.

Engineering 180
Systems Engineering
Embedded Processing Case Study

Lecture 3
May 28, 2015
Steve Kirsch

Outline for Lectures on Real-Time Embedded Processing

Lecture 3
Preliminary Design

Lecture 3: Agenda: Preliminary Design Process

Starting point for preliminary design of the embedded processing
Hardware Architecture Design
Software Architecture Design
Architecture Performance Analysis
Homework

The Goal of Preliminary Design

Thoroughly understand the system level requirements
Allocate the top level subsystem architecture requirements
Identify the next level down subsystems and interfaces
Flow down subsystem level requirements to these lower level subsystems

The main output from preliminary design is the allocated baseline (hardware and software baselines)
Design description and analysis
Requirements flow-down traceability
Draft of a test compliance matrix
PDR -- Preliminary Design Review

Nested Design Process for Complex Systems

System design will have completed preliminary design
Major subsystem interfaces defined
Behavioral functionality of the subsystems defined
Allocation of SWAP to subsystems defined
Allocation of "-ilities" to subsystems defined
Master Development Plan updated with more detail

[Figure: nested design process diagram, annotated "This step completed" and "We are at this step"]

Preliminary Design Process

P1. Subsystem Requirements Analysis
Preliminary Subsystem Architecture
P2. Requirements Allocation
P3. Interface identification/design
P4. Subsystem-level synthesis
P5. Preliminary design review
To detailed design

Preliminary Design Phase: Embedded Processing

System design in detail design phase
Specs revised as system design evolves
System design risks in process of being mitigated
Analysis results becoming available
Discovery of new unexpected problems arising
Customer requirements potentially changing
SOW changes
Due to cost and schedule updates
System test and integration details developing
Impacts on subsystem design
New requirements

Embedded subsystem concept design
Rough idea of interface requirements
Rough idea of processing algorithms
Baseline architecture
Coarse performance analysis
Risks identified and mitigation plan defined

This is our starting point for the Embedded Processing Preliminary design phase

Embedded Processing Preliminary Design: Requirements Analysis

Coherent GMTI Processing Algorithm

Drill down into processing algorithms (focus on the stressing pieces)
Algorithm laydown on target architecture required to get a more precise estimate of performance
Programming model selected, initial target processor selected

Interface specification detailed
All radar waveforms finalized by system CDR (not quite there yet)
Explore the full range of variability on interfaces

Functional behavioral descriptions detailed
Functional capabilities assessed and renegotiated with system team

Processor Block Diagram: From Concept Design Phase

[Figure: processor block diagram. Labels: Signal Processing Modules, Control Processing Module, high speed point-to-point mesh network, PCIe x8, sFPDP x8, REX Data I/F, REX Cntrl I/F, 10 Gb Ethernet, Ethernet Controllers, System I/O, Custom I/F]

Signal Processor: IBM Cell Processor

The Cell multi-core processor was a joint development between Sony, Toshiba, and IBM
First application was Sony's PlayStation 3
First chips (90 nm version) available in 2005
65 nm version in 2007 and 45 nm version in 2009 (first chip used in Sony PlayStation)
Chip performance was way ahead of its time in 2005

This attracted the attention of the Radar embedded processing team!

Theoretical Peak Performance in Ops/sec

IBM Cell Processor: Chip Spec

IBM Cell Features

IBM Cell 8 SPE: High Performance Engine Ideal for this Radar Application

Why is Cell Processor So Fast

Basics of Parallel Programming Models: SIMD Single Instruction Multiple Data Model

Wikipedia:
Single instruction, multiple data (SIMD), is a class of parallel computers in Flynn's taxonomy. It describes computers with multiple processing elements that perform the same operation on multiple data points simultaneously. Thus, such machines exploit data level parallelism, but not concurrency: there are simultaneous (parallel) computations, but only a single process (instruction) at a given moment.

[Figure: SIMD core. Labels: Register File, Instruction Register, Decoder, four ALUs]
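
The SIMD model can be imitated in plain Python to make the idea concrete: one "instruction" operates on several data elements at once. This is only an emulation for illustration; the lane count and function names are hypothetical, and real SIMD hardware performs the lane operations simultaneously rather than in a loop.

```python
LANES = 4  # hypothetical vector width


def simd_add(a, b):
    """One 'instruction': add all LANES element pairs in a single step."""
    return [x + y for x, y in zip(a, b)]


def vector_add(a, b):
    """Add two arrays LANES elements per step; returns (result, step count)."""
    if len(a) != len(b) or len(a) % LANES:
        raise ValueError("inputs must be equal length and a multiple of LANES")
    out, instructions = [], 0
    for i in range(0, len(a), LANES):
        out.extend(simd_add(a[i:i + LANES], b[i:i + LANES]))
        instructions += 1
    return out, instructions


a = list(range(8))
b = [10] * 8
result, instructions = vector_add(a, b)
print(result, f"({len(a)} additions in {instructions} instructions)")
```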

Fundamentals of utilizing the multi-processor capability of the embedded processing system

TLP (Thread Level Parallelism) provides the opportunity to achieve high levels of parallelism in the sensor processing domain
However, data movement between threads, if not done correctly, could be the kiss of death
System performance can be brought to a halt waiting for data to be moved or reorganized across a parallel processing architecture

When utilizing TLP, concurrency, data synchronization, and data reorganization are key to performance!

So what are concurrency, data synchronization, and data reorganization?

Concurrency: Does it Imply Parallelism?

Sequential program
A single thread of control that executes one instruction at a time
The next instruction isn't executed until the prior one has completed

Concurrent program
A collection of autonomous sequential threads executing logically in parallel

Concurrency is not necessarily parallelism

Interleaved concurrency
Logically simultaneous processing
Interleaved execution on a single processor

Parallelism
Physically simultaneous processing
Requires a multi-processor, not just a multi-threaded single processor
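
Interleaved concurrency can be demonstrated without any threads at all. In the sketch below, two sequential tasks (written as generators) are advanced one step at a time by a round-robin scheduler on a single processor, so they are logically simultaneous but never physically parallel. The task names and scheduler are illustrative assumptions.

```python
def task(name, steps):
    """A sequential thread of control; each yield is a possible switch point."""
    for i in range(steps):
        yield f"{name}{i}"


def run_interleaved(tasks):
    """Round-robin scheduler: one processor, one step of one task at a time."""
    trace = []
    queue = list(tasks)
    while queue:
        current = queue.pop(0)
        try:
            trace.append(next(current))
            queue.append(current)  # not finished: back of the line
        except StopIteration:
            pass                   # finished: drop the task
    return trace


trace = run_interleaved([task("A", 3), task("B", 2)])
print(trace)  # ['A0', 'B0', 'A1', 'B1', 'A2']
```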


Data Synchronization

Not every possible interleaving of threads will lead to a correct program!
Concurrent programs require synchronization so that data produced by one processing step won't be consumed until a complete, coherent set of data is stored.
Synchronization serves two purposes
Thread safety for access to shared resources
Avoids race conditions
Coordinates actions of threads
Parallel computation
Event notification
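
Both purposes of synchronization can be shown in a few lines. In this sketch, a lock protects the shared buffer (thread safety) and an event tells the consumer that the complete data set has been stored (event notification). The buffer, sizes, and function names are hypothetical.

```python
import threading

buffer = []
buffer_lock = threading.Lock()   # thread safety for the shared buffer
data_ready = threading.Event()   # event notification: full data set stored
N_SAMPLES = 100
consumed = []


def producer():
    for i in range(N_SAMPLES):
        with buffer_lock:
            buffer.append(i)
    data_ready.set()  # signal only after the complete set is stored


def consumer():
    data_ready.wait()  # block until the producer signals
    with buffer_lock:
        consumed.append(sum(buffer))


# The consumer is started first on purpose; it still waits for the data.
threads = [threading.Thread(target=consumer), threading.Thread(target=producer)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(consumed[0])  # 4950
```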

Data Organization

To efficiently process very large datasets when utilizing thread level parallelism, the data must be organized in distributed memories so that it can be accessed at the highest possible rate

Coherent GMTI Processing Algorithm

Data Organization: Inter-process Communication Fundamentals

Parallel programs need to share data and results processed by different processors. There are two typical ways to pass data
Shared memory architecture
Message passing architecture

[Figure: two diagrams. Shared Memory Architecture: several processors connected to one global memory. Message Passing Architecture: several processor + memory nodes connected by an interconnection network]
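
The two styles can be contrasted in miniature with threads. In the sketch below, the shared memory workers write into one common structure guarded by a lock, while the message passing workers keep their data private and send results through a queue. Worker names and data are hypothetical, and threads in one process only approximate separate processors.

```python
import queue
import threading

# Shared memory style: workers write results into one common structure.
shared_results = {}
shared_lock = threading.Lock()


def shared_worker(worker_id, data):
    partial = sum(data)
    with shared_lock:
        shared_results[worker_id] = partial


# Message passing style: workers own their data and send results as messages.
mailbox = queue.Queue()


def message_worker(worker_id, data):
    mailbox.put((worker_id, sum(data)))


chunks = {0: [1, 2, 3], 1: [4, 5, 6]}

for worker in (shared_worker, message_worker):
    threads = [threading.Thread(target=worker, args=(i, c))
               for i, c in chunks.items()]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

# Collect one message per worker from the mailbox.
received = dict(mailbox.get() for _ in chunks)
print(shared_results, received)
```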


Processing Domains

Processing Domains refers to the organization of data in memory
Example shows data organized for processing in the Fast Time dimension

[Figure: data cube with axes Channel, Slow Time, and Fast Time; slices T0 through T7 are divided between Thread 0 and Thread 1; annotated "Data in Sequential Memory Locations"]

Data Corner Turn

Data cube was rotated around the Slow Time / Fast Time plane
Each thread now requires different data within its virtual address space
Data must be moved between these address spaces

[Figure: rotated data cube with axes Fast Time, Channel, and Slow Time; slices T0 through T7 are divided between Thread 0 and Thread 1; annotated "Data in Sequential Memory Locations"]
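
At its core a corner turn is a transpose of the data cube. The sketch below swaps the fast time and slow time axes of a small cube held as nested lists, so that samples along slow time become adjacent. The cube dimensions and axis order are assumptions for illustration; in a distributed system the same reorganization also moves data between processors.

```python
def corner_turn(cube):
    """Swap the two innermost axes: [channel][slow][fast] -> [channel][fast][slow]."""
    return [[list(column) for column in zip(*channel)] for channel in cube]


# Hypothetical cube: 1 channel, 2 pulses (slow time), 3 range samples (fast time).
cube = [[[0, 1, 2],
         [10, 11, 12]]]

turned = corner_turn(cube)
print(turned)  # [[[0, 10], [1, 11], [2, 12]]]
```

Applying `corner_turn` twice returns the original cube, which is a convenient sanity check.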


Identify the Processing Domains

Best functional performance is achieved by processing all signal processing steps within a single domain prior to redistributing data

Embedded Processing: Software Architecture, Signal Processing Programming Model

Baseline target hardware attributes
Based on performance analysis so far in the design process
SIMD engines (data level parallelization)
DMA engines (data movement) (recall discussion from last lecture)
Multiple processors (thread level parallelization)
High bandwidth main memory
High bandwidth network interfaces (point-to-point simultaneous data flow)

Desired Programming Model Attributes
Can explicitly express an algorithm's available parallelism
Can exploit the hardware attributes
Can hide or isolate low level programming details so that the application programmer doesn't need to be concerned with things that can be automated
Can express when to utilize shared memory (fast memory) communication
Can express when to utilize message passing (slower memory) communication

Parallel Programming Approaches

Auto-vectorization for data level parallelism (DLP) extraction has been difficult to automate
Many attempts (Intel C++ compiler, GCC, Green Hills MULTI tools)
Experience shows these tools aren't particularly good

Source-to-source compilation for thread level parallelism (TLP) extraction
Still a big research area (too risky for our case study)

A plethora of programming languages and parallelism-abstracting compilers have been developed
Most focus on a particular form of parallelism or architecture
Shared memory
Message passing
GPU specific architecture
-- Data Level Parallelism
-- Thread Level Parallelism

Graph Programming Model for parallel programming has proven to be particularly good for the sensor signal processing domain

Programming Model: Directed Acyclic Graphs (DAG)

Wikipedia Definition
In mathematics and computer science, a directed acyclic graph (commonly abbreviated to DAG), is a directed graph with no directed cycles. That is, it is formed by a collection of vertices and directed edges, each edge connecting one vertex to another, such that there is no way to start at some vertex v and follow a sequence of edges that eventually loops back to v again.

[Figure: example DAG with vertices labelled 2, 3, 5, 7, 9, 10, 11]
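
The acyclic property is easy to check in code: a graph is a DAG exactly when its vertices can be put in an order where every vertex comes after all of its predecessors. The sketch below uses the standard `graphlib` module (Python 3.9 or later). The edge set is hypothetical, since the slide figure's edges are not recoverable from the text.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical edges: each key maps to the set of vertices it depends on.
dependencies = {
    11: {5, 7},
    8: {7, 3},
    2: {11},
    9: {11, 8},
    10: {11, 3},
}

order = list(TopologicalSorter(dependencies).static_order())
position = {v: i for i, v in enumerate(order)}
# In a valid order, every vertex appears after all of its predecessors.
valid_order = all(position[p] < position[v]
                  for v, preds in dependencies.items() for p in preds)
print(order, valid_order)

# Adding an edge that closes a loop (5 -> 11 -> 2 -> 5) breaks the property.
cyclic = dict(dependencies)
cyclic[5] = {2}
try:
    list(TopologicalSorter(cyclic).static_order())
    is_acyclic = True
except CycleError:
    is_acyclic = False
print("cyclic graph accepted:", is_acyclic)
```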


Programming Model: Directed Acyclic Graphs (DAG)

Placing the vertices with no inputs on the left, those with no outputs on the right, and those with both inputs and outputs in the middle provides a more intuitive data flow diagram
Acyclic nature becomes obvious

[Figure: the example DAG with vertices 2, 5, 7, 8, 9, 10, 11, shown before and after being redrawn as a left-to-right data flow diagram]

Signal Processing Programming Model Is Derived From DAGs

Directed Acyclic Graph (DAG) methodology is a perfect match for signal processing abstraction
DAGs are a good method for expressing parallelism and data flow relationships
The signal processing programming model is focused on exposing parallelism and processing precedence relationships

A vertex represents one or more signal processing functions, and directed edges are the data flow paths from one processing step to the next
The acyclic nature of a DAG is key to achieving an efficient processing structure
The invocation of the processing at a vertex depends only on the availability of its input data
Once processing at a vertex has been invoked, it will run to completion uninterrupted
Data flows through the processing steps at a rate determined solely by the latency of the processing at each vertex

An efficient DAG will perform as much processing as possible in a single vertex
Data should only be passed to a downstream vertex if new dependencies exist

[Figure: Poor Design vs. Good Design, with a vertex labelled "1+2"]
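
The firing rule described above (a vertex runs as soon as all of its input data is available, and then runs to completion) can be sketched as a tiny data flow executor. This is a minimal single-threaded illustration, not the lecture's graph runtime; the vertex names and functions are hypothetical stand-ins for processing steps.

```python
from collections import deque


def run_graph(functions, inputs_of, source_data):
    """Fire each vertex once, as soon as all of its inputs are available."""
    outputs = dict(source_data)   # data that is already available
    pending = deque(functions)    # vertices that have not fired yet
    fired = []
    while pending:
        progressed = False
        for _ in range(len(pending)):
            v = pending.popleft()
            if all(u in outputs for u in inputs_of[v]):
                # The vertex runs to completion before anything else fires.
                outputs[v] = functions[v](*[outputs[u] for u in inputs_of[v]])
                fired.append(v)
                progressed = True
            else:
                pending.append(v)
        if not progressed:
            raise ValueError("graph has a cycle or a missing input")
    return outputs, fired


# Vertices are deliberately listed out of order: data availability,
# not listing order, decides when each one fires.
functions = {
    "detect": lambda d: [x for x in d if x > 4],
    "doppler": lambda p: [2 * x for x in p],
    "compress": lambda raw: [x + 1 for x in raw],
}
inputs_of = {"compress": ["raw"], "doppler": ["compress"], "detect": ["doppler"]}

outputs, fired = run_graph(functions, inputs_of, {"raw": [0, 1, 2]})
print(fired, outputs["detect"])
```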

Graph Programming Model: Derived From Directed Acyclic Graphs (DAG)

DAGs form the basis for the Graph Programming Model
Used to express
DLP (data level parallelism)
TLP (thread level parallelism)
Precedence relationships
Data reorganization

Design the signal processing graph: Analyze Performance

Utilize the graph programming model to express DLP
Group all the processing within a single processing domain into a jobClass
A jobClass will have multiple instances, called jobs, that can consume DLP
Jobs will utilize the multi-core capability of a single processor

Utilize the graph programming model to express TLP
Group multiple data-independent jobClasses into a subgraph
Subgraphs will be allocated to groups of processors and will run in parallel utilizing multiple processors

GMT Functional Requirements Allocation: JobClasses and Subgraphs

Graph design for GMT mode
Allocation of processing functions to jobClasses based on corner turn boundaries, and to subgraphs based on TLP opportunities

Mode laydown in Graph Programming Model

Processing of a swath requires 1 dwell of radiation and collection
Multiple dwells are collected back to back with no gaps in collection time

Processing of 1 dwell of data requires 2 graphs
CPI Graph
Coherent processing
1 subgraph per receive channel
PDI Graph
Post Detection Integration processing
1 subgraph per graph

[Figure: a CPI Graph containing Subgraphs 0 through 3, one for each of channels Ch 0 through Ch 3, and a second graph containing a single Subgraph 0]

Subgraph Laydown on Target Hardware

Real-time constraints identified
subgraph 0-3 processing time < Dwell time
subgraph 4 < 2*Dwell time

[Figure: processing timeline for Dwells 0, 1, and 2. For every dwell, CPI subgraphs 0, 1, 2, and 3 run on processors P0,0, P0,1, P1,0, and P1,1 respectively; PDI subgraph 0 runs on P2,0 and P2,1. In PX,Y, X = module number and Y = processor number]

Preliminary design review

Objectives:
Make sure the functional baseline requirements have been adequately addressed by the preliminary design
Physical architecture
Interfaces
Subsystem functional requirements
Real-time constraints
SWAP
"-ilities"

Key documents:
Subsystem description
Interface control documents (ICDs)
Preliminary Timing Analysis
Requirements traceability
Draft Requirement Compliance Matrix
Design review package

Oh No! Houston, we have a problem! We aren't making the real-time requirement

Updated analysis just prior to PDR found
subgraph 0-3 processing time > Dwell time
Each dwell's processing is falling further and further behind!!

Next lecture will address this problem
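
Why the backlog keeps growing can be seen with a short calculation. The sketch below assumes dwells are collected back to back and processed in order on one set of processors; the function name and the timing values are hypothetical.

```python
def completion_lag(dwell_time, processing_time, n_dwells):
    """Lag from the end of each dwell's collection to the end of its processing."""
    lags = []
    processor_free_at = 0.0
    for k in range(n_dwells):
        data_ready_at = (k + 1) * dwell_time  # dwell k fully collected
        start = max(data_ready_at, processor_free_at)
        processor_free_at = start + processing_time
        lags.append(processor_free_at - data_ready_at)
    return lags


# Processing faster than the dwell time: the lag stays constant.
print(completion_lag(dwell_time=1.0, processing_time=0.75, n_dwells=4))
# Processing slower than the dwell time: the lag grows with every dwell.
print(completion_lag(dwell_time=1.0, processing_time=1.25, n_dwells=4))
```

When the processing time exceeds the dwell time, each dwell adds the difference to the lag, so no amount of buffering can keep up indefinitely.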


Homework
SYSTEM ENGINEERING

Read Paper:
Hybrid processor architectures meet demands for SWaP
By John Keller
Available in CCLE 15S-ENGR180-1 Information Folder

Write a 1 page discussion answering

What are the pros and cons of using a hybrid processor architecture for our case study of
a Radar embedded processor?
Is a hybrid architecture a good potential solution to resolve our processing timeline
issue?

Read Paper: HPEC2012 Kirsch.pdf


Graph Programming Model: An Efficient Approach for Sensor Signal Processing
By Steve Kirsch

Available in CCLE 15S-ENGR180-1 Information Folder


Backup Slides

Embedded Processing Use Case: Preliminary design

Hardware Architecture
Signal Processing Software Architecture
Performance of the Architecture
Getting good performance requires a system approach
Let's drill down into the architecture

IBM Cell Processor Component 1 PPE

Signal Processing Stressing Algorithm: Understand Behavioral Requirements

Expanded view of the GMT algorithm defined in the conceptual design phase

Functional Behavioral Requirements: Parameterizing the variability

Drilling down into the functional behavior of the processing steps

Parameterize the functionality based on the system waveform definition

Signal Processing Libraries and Performance

SIMD architecture can be utilized via two methods

1. Standard programming languages (e.g. C++), if compiler technology supports automated vectorization of code
2. Predesigned signal processing libraries

Today's compiler technology is very poor at automated vectorization of code

Best choice today is the use of signal processing libraries

Signal processing libraries are target-dependent code written utilizing SIMD instruction sets

SIMD instruction sets

Are used through what is basically assembly-level code that can access the ISA (instruction set architecture) of the target processor
Examples of SIMD instruction sets are:
AltiVec (PowerPC architecture)
SSE (x86 architecture)
SPE intrinsics (IBM Cell SPE)

Signal processing libraries are implemented with a SIMD instruction set

Examples of signal processing libraries

Mercury SAL
VSIPL (http://www.omgwiki.org/hpec/vsipl)
LAPACK
BLAS
FFTW


Engineering 180
Systems Engineering
Embedded Processing Case Study

Lecture 4
June 2, 2015
Steve Kirsch

Outline for Lectures on Real-Time Embedded Processing

Lecture 4
Detailed Design / Integration and Test

Lecture 4: Agenda: Detailed Design Process

Starting point for detailed design of the embedded processing
Hardware Architecture Design Improvement
Software Architecture Design Improvement
Detailed Performance Analysis
Detailed Design and CDR
Test and Integration

The Goal of Detailed Design

Synthesize the detailed design

Fully define all interfaces

Fully define the functional behavior of all subcomponents

Detailed analysis of performance

Update of TPMs

Detailed analysis of SWAP and -ilities

Define test and integration approach

Refine recurring cost estimate


Refine non-recurring cost estimate and development schedule
The main output from detailed design is the baseline design (hardware and software
designs)

Design description and analysis

Requirements flow-down traceability updated

Test compliance matrix and test procedure documents

CDR -- Critical Design Review

GMT Functional Requirements Allocation: JobClasses and Subgraphs From Preliminary Design Phase

Graph design for GMT mode


Allocation of processing functions to jobClass and subgraphs

PDR: Performance was identified as a big risk!

Reported at PDR

Subgraph 0-3 processing time > Dwell time

Processing time will be longer than the collection time, so processing will not keep up with real time

[Figure: timeline comparing data collection and data processing for dwells 1 and 2]

Processor Block Diagram: Preliminary Design Had Margin For Growth

Processor Enclosure
5 module slots available
4 used in baseline + 1 spare

Sufficient spare prime power
Sufficient total power dissipation margin
Weight limit can accommodate a module in the spare slot

Signal Processing Module

Sufficient board real estate for additional components
Power regulation could accommodate additional components

System Design

Has insufficient SWAP for an additional processor enclosure
Processing subsystem firm requirement

Performance Growth: Source of real-time performance issue

Doppler Tune preliminary assessment accounted only for the application of the tuning parameters
Computation to generate the tuning parameters was initially ignored, resulting in a large unaccounted processing load

Pulse compression estimate greatly increased

Performance was dominated by data movement, not computation cycles
Analysis focused on computation cycles

Main memory bandwidth became a bottleneck for many processing steps
Initial analysis didn't account for simultaneous data flow of REX data to main memory and data produced between processing steps

Deeper Dive into Radar Processing

Pulse compression basics

Pulse energy is transmitted as a long pulse due to the limit on the transmitter's total instantaneous output power
Signal processing compresses the return signal for better range resolution
Total energy in the long pulse = total energy in the compressed pulse

Signal processing consists of passing the signal through a matched filter
Pulses are phase coded for better compression
LFM (Linear Frequency Modulation)
Barker codes (Discrete Phase Codes)
Arbitrary phase and amplitude codes

[Figures: Linear Frequency Modulation; pulse compression of a phase-coded pulse]

Intuitive Approach to Pulse Compression: Matched Filter Utilizing Autocorrelation

[Figure: transmitted pulse Tx (samples T0 to T3), received energy Rx (samples R0 to R15), and time-shifted replicas of Tx aligned against Rx to produce outputs S0 to S15.]

Convolution / correlation of Tx with Rx: each output Sn is the dot product of a time-shifted replica of Tx with Rx

S0 = T0*R0 + T1*R1 + T2*R2 + T3*R3
S1 = T0*R1 + T1*R2 + T2*R3 + T3*R4
S2 = T0*R2 + T1*R3 + T2*R4 + T3*R5, and so on
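
The sliding dot product can be sketched in a few lines. This is an illustration, not the case-study implementation: the 4-sample code, the echo delay, and the function name are assumptions, and only the fully overlapped shifts are computed (13 outputs instead of the 16 shown on the slide).

```python
def matched_filter(tx, rx):
    """Slide tx across rx; output n is the dot product of tx with rx[n:n+len(tx)]."""
    n_out = len(rx) - len(tx) + 1
    return [sum(t * rx[n + k] for k, t in enumerate(tx)) for n in range(n_out)]


tx = [1, 1, 1, -1]                 # hypothetical 4-sample phase-coded pulse (T0..T3)
rx = [0] * 16                      # received samples R0..R15
delay = 5                          # hypothetical echo delay in samples
rx[delay:delay + len(tx)] = tx     # noise-free echo of the transmitted pulse

s = matched_filter(tx, rx)
peak = max(range(len(s)), key=lambda n: s[n])
print(s)
print("peak at sample", peak)
```

The output peaks where the replica lines up with the echo, which is how the long coded pulse is compressed into a narrow response.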

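Pulse compression math

Pulse compression is achieved by performing continuous time domain convolution (with $x$ the received signal and $h$ the matched-filter impulse response)

$$ y(t) = (x * h)(t) = \int_{-\infty}^{\infty} x(\tau)\, h(t - \tau)\, d\tau $$

Discrete form of the convolution

$$ y[n] = (x * h)[n] = \sum_{k} x[k]\, h[n - k] $$

Definition of the convolution theorem

$$ \mathcal{F}\{x * h\} = \mathcal{F}\{x\} \cdot \mathcal{F}\{h\} $$

where $\mathcal{F}\{x\}$ denotes the Fourier transform of $x$

Therefore one can do a Fast Convolution

$$ x * h = \mathcal{F}^{-1}\{\mathcal{F}\{x\} \cdot \mathcal{F}\{h\}\} $$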
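
A minimal sketch of fast convolution, checked against direct time-domain convolution. It uses a small textbook radix-2 FFT written with the standard library only; the function names and test vectors are assumptions for illustration, and a production system would call an optimized library FFT instead.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT; len(x) must be a power of two.
    The inverse transform is returned unscaled (caller divides by N)."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + w
        out[k + n // 2] = even[k] - w
    return out


def fast_convolve(x, h):
    """Linear convolution: zero-pad, multiply the spectra, inverse transform."""
    n_out = len(x) + len(h) - 1
    n = 1
    while n < n_out:
        n *= 2
    spec_x = fft(list(x) + [0] * (n - len(x)))
    spec_h = fft(list(h) + [0] * (n - len(h)))
    y = fft([a * b for a, b in zip(spec_x, spec_h)], inverse=True)
    return [v / n for v in y[:n_out]]


def direct_convolve(x, h):
    """Time-domain convolution, y[n] = sum_k x[k] * h[n - k]."""
    y = [0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for k, hk in enumerate(h):
            y[i + k] += xi * hk
    return y


x = [1, 2, 3, 4]
h = [1, -1, 2]
print(direct_convolve(x, h))
print([round(v.real, 9) for v in fast_convolve(x, h)])
```

Both routes give the same result up to floating-point rounding; the FFT route wins on operation count once the sequences are long.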

Discrete Fourier Transform (DFT)

DFT is Computationally Intensive

Given a set of N complex input samples $x_n$, where $n = 0, \ldots, N-1$, the DFT filters are:

$$ F_m = \sum_{n=0}^{N-1} W'_{mn}\, x_n, \quad \text{where } m = 0, \ldots, N-1 $$

and

$$ W'_{mn} = \exp\!\left(\frac{-2\pi j\, mn}{N}\right) = W^{mn}, \quad \text{where } W = \exp\!\left(\frac{-2\pi j}{N}\right) $$

Assuming the W factors can be pre-computed, an N-point DFT requires:

N^2 complex multiplies + N^2 complex adds
complex multiply = 6 real ops (4 multiplies + 2 adds)
complex add = 2 real ops
For example, a single 1024-point DFT takes: (1024^2) * 8 ≈ 8.4e6 ops
Straightforward computation of an N-point DFT requires ~N^2 complex multiplications and ~N^2 complex additions, for a total of ~8N^2 real arithmetic operations
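
A direct DFT written straight from the definition, together with the operation count used above. This is a sketch with hypothetical function names, intended only to make the N^2 cost visible.

```python
import cmath


def dft(x):
    """Direct DFT: F[m] = sum over n of W**(m*n) * x[n], with W = exp(-2*pi*j/N)."""
    n_pts = len(x)
    # Pre-computed W factors; W**(m*n) only depends on (m*n) mod N.
    w = [cmath.exp(-2j * cmath.pi * k / n_pts) for k in range(n_pts)]
    return [sum(w[(m * n) % n_pts] * x[n] for n in range(n_pts))
            for m in range(n_pts)]


def dft_real_ops(n_pts):
    """N^2 complex multiplies (6 real ops each) + N^2 complex adds (2 real ops each)."""
    return n_pts * n_pts * (6 + 2)


print(dft([1, 1, 1, 1]))      # constant input: all energy in bin 0
print(dft_real_ops(1024))     # real operations for a 1024-point DFT
```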

An Algorithm for Rapidly Computing DFTs: The Fast Fourier Transform (FFT)

FFT is a clever algorithm for computing DFTs, published by Cooley and Tukey in 1965

It takes advantage of symmetry in the computation, greatly reducing the number of operations

N-point FFT ops = (N/2) * log2(N) * 10 ops

1024-point FFT = 51,200 ops

FFT results are identical to those of the corresponding DFT up to floating-point rounding (not an approximation)
Advantage of FFT grows with increasing DFT size
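
The growth of the FFT advantage can be tabulated directly from the two operation-count formulas in these slides. The function names and the chosen sizes are for illustration only.

```python
import math


def dft_ops(n):
    """~8 * N^2 real operations for a direct N-point DFT."""
    return 8 * n * n


def fft_ops(n):
    """(N/2) * log2(N) butterflies at 10 real operations each; N a power of two."""
    return (n // 2) * int(math.log2(n)) * 10


for n in (64, 1024, 16384):
    print(n, dft_ops(n), fft_ops(n), round(dft_ops(n) / fft_ops(n), 1))
```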

Pulse Compression by Fast Convolution

Assume 1024 complex floating point collected samples
Assume pulse width of 256 samples

Time domain convolution
FLOPs = 256 (complex multiply-adds per sample) * 1024 collected samples
= 256 * 8 FLOPs * 1024
= 2,097,152 FLOPs

Fast Convolution, N = 1024
FLOPs = ( (N/2 * log2 N) * 10 FLOPs ) * 2 (forward and inverse)
= 51,200 FLOPs * 2
= 102,400 FLOPs
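
Evaluating the slide's two formulas directly gives the comparison below. This is a sketch with hypothetical function names; like the slide, the fast-convolution count covers only the forward and inverse FFTs and leaves out the N complex multiplies applied in the frequency domain.

```python
import math


def time_domain_flops(n_samples, pulse_len, flops_per_mac=8):
    """One complex multiply-add (6 + 2 real ops) per pulse sample per collected sample."""
    return pulse_len * flops_per_mac * n_samples


def fast_convolution_flops(n, flops_per_butterfly=10):
    """Forward plus inverse N-point FFT, N a power of two."""
    one_fft = (n // 2) * int(math.log2(n)) * flops_per_butterfly
    return 2 * one_fft


td = time_domain_flops(1024, 256)
fc = fast_convolution_flops(1024)
print(td, fc, round(td / fc, 1))
```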

Class Group Discussion

Using your newly acquired embedded system engineering skills, how would you attack the processing performance risk we have identified in our case study?
Divide into 4 teams
15 minutes to discuss
Nominate a spokesman for your group

Create an approach for resolving the performance issue

What are the trades to consider?

Address the root causes of the performance issues

Utilize your knowledge from the last homework assignment
Root causes of the real-time performance issues

Doppler Tune preliminary assessment accounted only for the application of the tuning parameters
Computation to generate the tuning parameters was initially ignored, resulting in a large unaccounted processing load

Pulse compression estimate greatly increased
Performance was dominated by data movement, not computation cycles

Main memory bandwidth became a bottleneck for many processing steps
Initial analysis didn't account for simultaneous data flow of REX data to main memory and data produced between processing steps

Attacking the Real-time Performance Issue

Was the performance analysis correct? Is a more thorough analysis needed?
Understand the real-time system requirements

How tightly spaced do the spots need to be?
Can the processing fall behind and then catch up?

Could a system requirement be modified in some way, without major system performance impact, to resolve the embedded processing limitation?
Could the system requirements allocation be modified?

Could pulse compression processing be done in the REX prior to sending data to the processor?

Once system solutions appear to be a dead end, focus on subsystem solutions

Can margin that was planned to reduce risk later in the program be used now to solve this performance problem?
If the spare slot is used for an additional signal processing card, will it solve the performance issues?
What are the options for increasing throughput and memory bandwidth on the signal processing card?

Increased development cost and NRE might be a big driver for the solution

Performance trade studies and risk analysis affect cost assessments

Next, let's look at the trades and results

Performance Trades: Second Look at Requirements and Assumptions

Modifying system requirements and allocations wasn't acceptable

Tailoring system requirements to address only the specific known Radar mode was deemed a poor choice
Design requirement to accommodate new undefined applications is very important

Adding additional processing units to the system

Though this approach could meet the SWAP requirements of the first application of the system, it was deemed too expensive and would exceed the SWAP for other potential applications.
Partitioning a mode across multiple units, given limited box-to-box bandwidth, potentially wouldn't solve all the performance issues

Utilizing the spare slot for the additional performance would violate the processing margin requirement

Intent of the spare is for future programs and risk reduction during the test and integration phase

Best option was to increase signal processing module performance within the module SWAP allocation

Program resources could be reallocated (i.e. $$, schedule, and engineering talent)
Module SWAP margin was a lower risk, and the margin could be used earlier in the program

Next step is trade studies for the best way to improve module performance

Review of Performance Analysis

Error in performance analysis discovered!

Programming model was not well understood by the engineer doing the performance modeling

Key aspect of the programming model: it utilizes DMA and double buffering to parallelize data movement with computation cycles
Use of DMA requires target-specific software design

[Figure: Ping Pong Buffer. A dataset is divided into tiles t1 to t8, with data-dependent and data-independent processing domains marked. A timeline shows DMA to the Ping buffer, DMA to the Pong buffer, and processing, so that one tile is processed while the next tile is being transferred.]

Processing is fully parallelized with data movement if compute cycles take the same amount of time as data movement

This technique of overlapping data movement with processing is called tiling
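
A small timing model of the ping-pong scheme. It is a sketch under stated assumptions, not the case-study performance model: one DMA engine, one compute unit, a buffer becomes reusable once the tile it holds has been processed, and output DMA is not modeled. The function name and the timings are hypothetical.

```python
def pipeline_time(n_tiles, dma_time, compute_time, buffers=2):
    """Total time to DMA in and process n_tiles using `buffers` input buffers."""
    dma_end = []
    comp_end = []
    for i in range(n_tiles):
        dma_free = dma_end[i - 1] if i >= 1 else 0.0                 # DMA engine is idle
        buf_free = comp_end[i - buffers] if i >= buffers else 0.0    # target buffer is empty
        dma_end.append(max(dma_free, buf_free) + dma_time)
        cpu_free = comp_end[i - 1] if i >= 1 else 0.0                # compute unit is idle
        comp_end.append(max(dma_end[i], cpu_free) + compute_time)
    return comp_end[-1]


# Eight tiles, as in the figure, with equal (hypothetical) DMA and compute times.
print(pipeline_time(8, dma_time=1.0, compute_time=1.0, buffers=1))  # no overlap
print(pipeline_time(8, dma_time=1.0, compute_time=1.0, buffers=2))  # ping-pong
print(pipeline_time(8, dma_time=2.0, compute_time=1.0, buffers=2))  # DMA-bound
```

With equal DMA and compute times, double buffering hides all but the first transfer; when DMA is slower than compute, total time is set by data movement.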


Data Movement vs Throughput In Determining Performance

FFT example
N = 1024 complex floating point samples
Total FLOPs to perform pulse compression via fast convolution = (N/2 * log2 N) * 10 * 2 = 102,400 FLOPs
Assume CPU executes 1 FLOP/ns
Fast convolution time = 102,400 FLOPs / (1 FLOP/ns)
= 102.4 µsec
Assume memory bandwidth = 100 MB/sec
Complex floating point sample = 8 bytes

Data movement time = 1024 * 8 bytes * 2 (in and out) / (100 MB/sec)
= 16,384 bytes / (100 MB/sec)
≈ 164 µsec
Data movement time is longer than computation time
Overall processing time is driven by data movement time
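
The comparison can be reproduced by evaluating the stated assumptions directly (1 FLOP/ns, 100 MB/sec, 8 bytes per complex sample, forward plus inverse FFT at 10 FLOPs per butterfly). The function name and parameter names are hypothetical.

```python
import math


def fast_convolution_timing(n, flops_per_ns=1.0, bandwidth_bytes_per_s=100e6,
                            bytes_per_sample=8):
    """Return (compute time, data movement time) in microseconds."""
    flops = 2 * (n // 2) * int(math.log2(n)) * 10       # forward + inverse FFT
    compute_us = flops / flops_per_ns / 1e3             # ns -> microseconds
    moved_bytes = n * bytes_per_sample * 2              # samples in and results out
    move_us = moved_bytes / bandwidth_bytes_per_s * 1e6
    return compute_us, move_us


compute_us, move_us = fast_convolution_timing(1024)
print(f"compute {compute_us:.1f} us, data movement {move_us:.1f} us")
print("bound by", "data movement" if move_us > compute_us else "computation")
```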

Trade-study results

Analysis error accounted for only a small fraction of the performance issue
Re-allocation of processing requirements from the Cell to a new Front-end processor looks promising

Front-end high data rate processing characteristics

Very few processing functions require > 50% of processing performance
FIR (Finite Impulse Response) filter for IQ formation or IQ calibration
Phase ramp generator and complex multiply
Large FFTs

Large data rate reduction after front-end processing (reducing processing load on following stages)
Application-specific design tends to have the highest performance per SWAP

Trades Conclusion

Additional investment to develop an application-specific solution for front-end processing functions
FPGA (Field Programmable Gate Array) solution is the best choice (other contenders: GPGPUs and DSP-specific COTS chips)
Biggest bang for the buck!
Front-end processing is fairly consistent between different mode applications
Greatly reduces load on IBM Cell

Add more on-module memory bandwidth

Decouple REX data ingest from the rest of IBM Cell processing

Processor Block Diagram: Update from Preliminary Design Phase

New Features (Distributed GBM, Front-end Processor)

[Figure: processor block diagram. Each of three Signal Processing Modules contains a CPU (IBM Cell), Main Memory, Distributed Global Bulk Memory, a Front-end Processor, and a Network Interface Controller. Other labeled elements: high speed point-to-point mesh network, sFPDP x8, REX Data I/F, REX Cntrl I/F, Ethernet Controllers, 10 Gb Ethernet, Custom I/F, System I/O, and Control Processing Module.]

Solution is a Hybrid Architecture

Front-end Processor functions (very large, computationally intensive functions)

FIR filter (I/Q formation / calibration)
Phase ramp generator and complex multiplier
Large efficient FFT

Front-end Processor implementation

Application-specific FPGA (Field Programmable Gate Array) based design
High-bandwidth memory interface
Designed as an offload engine

GBM functions
REX data is stored in GBM instead of main memory
Decouples the high-bandwidth REX interface from impacting Cell computations

Front-end processor accesses data directly from GBM

Reduces competition for main memory bandwidth between processor types
Hybrid design addresses all three of the key performance issues in the available SWAP
1) Doppler tuning parameter generation
2) Large FFT computational speed
3) Memory bandwidth limitations
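
The slides do not give the Doppler tune equations, so the following is only a generic sketch of what a phase ramp generator followed by a complex multiplier does: it generates a unit-magnitude complex ramp and multiplies it sample by sample into the data, shifting the signal in frequency. All names, the tone frequency, and the sample rate are assumptions.

```python
import cmath


def phase_ramp(n_samples, freq_hz, sample_rate_hz):
    """Unit-magnitude ramp exp(-j*2*pi*f*n/fs) for n = 0 .. n_samples-1.
    Generating this ramp is the 'tuning parameter generation' step."""
    step = -2.0 * cmath.pi * freq_hz / sample_rate_hz
    return [cmath.exp(1j * step * n) for n in range(n_samples)]


def apply_ramp(samples, ramp):
    """Complex multiply of each sample by its ramp value (the 'application' step)."""
    return [s * r for s, r in zip(samples, ramp)]


# Hypothetical example: a 250 Hz tone sampled at 1 kHz is shifted down to DC.
fs, f0, n = 1000.0, 250.0, 8
tone = [cmath.exp(2j * cmath.pi * f0 * k / fs) for k in range(n)]
tuned = apply_ramp(tone, phase_ramp(n, f0, fs))
print([round(abs(v - 1), 12) for v in tuned])
```

Both steps are regular, streaming, multiply-heavy operations, which is the kind of workload that maps well onto an FPGA offload engine.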

Lessons Learned

Fully understand all requirements as thoroughly as possible and as early in the design process as possible

Hardware requirements
Software requirements
Interaction between hardware and software

Perform as thorough a performance analysis as early as practical

Problems discovered later in the design process are much more costly (e.g. if performance issues had been found in integration, the fix would have been very expensive)

Explore higher level requirements as well as lower level allocations when resolving issues

Though in this case we weren't able to change the system requirements, it was worth exploring

Use risk analysis when performing performance trades

A lower cost solution might have been to give up design margin, but the consequences were too high and the probability of an occurrence wasn't low enough

Often application-specific designs are general enough to have wide applicability if scope is limited

Application-specific designs can be more SWAP efficient than general solutions, but are in general more costly

Integration and Test

Requirements flowdown and allocation to subsystems includes requirement validation documentation
Requirements Compliance Matrix: specifies the test method

Deployed system field test
System Integration Lab (SIL) test
Unit level test
Analysis
Inspection

(Listed from highest to lowest complexity and cost of validation)

Test Description Document

Detailed description of the tests and the support equipment required to do the test

Test Procedure Document

Specifies how to do the test and the expected results

Integration and Test

Key concepts to keep in mind when planning for integration and test
Sufficient visibility for unit testing and system integration lab testing

Is there support for inspecting memory?
Is there support for monitoring system state while in operation?
Is there support for monitoring bus activity?
Is there support for monitoring operation of application-specific implementations (e.g. inside of an FPGA)?

Real-time debug tools for unit test and system integration lab
Does the IDE (Integrated Development Environment) support non-intrusive monitoring of OS and application software? (example next slide)

System Level Instrumentation (support for both SIL and field testing)
At the full system level, are there sufficient interfaces and capability provided for non-intrusive real-time access?
Is there sufficient support for data reduction tools?
Sorting and understanding of the data of interest

Example of an IDE: Real-time Non-Intrusive Debug Tool

Intel VTune Performance Profiler

Hotspot (statistical call tree), call counts (statistical)
Thread profiling with lock and waits analysis
Cache miss, bandwidth analysis
OpenCL kernel tracing & GPU offload on Windows

Example of an IDE: Real-time Non-Intrusive Debug Tool

Green Hills IDE Event Analyzer

EventAnalyzer displays the length and frequency of RTOS and user events, making it quickly apparent what operations take the most time and where optimization efforts should be focused

Critical Design Review (CDR)

CDR purpose
Final design review prior to the official acceptance of the design
Opportunity for all stakeholders to assess the design's compliance with requirements
Opportunity to review risk assessment and mitigation results
All risks should be well understood and accepted at this time

To force the detailed design documentation effort
To refine non-recurring and recurring costs

Goal of a CDR
- Demonstrate that the design meets the functional and performance requirements
- Assure that the test and evaluation strategies, procedures and support are in place for the next development phase
- Establish the Product Baseline
Successful completion of CDR is the green light for the next development phases
- Building hardware
- Writing of application software
- Unit test
- System test

Embedded Processing Case Study: Summary

The last 4 lectures stepped through the design development process for embedded processing design

Concept development
Preliminary design
Detailed design
Integration and Test (briefly)

The case study utilized a real application for real-time high performance embedded processing in a highly SWAP-constrained environment
The goal was to provide insight into the system engineering process, the myriad complexities that the embedded system engineer needs to be aware of, and the skill set required

Final takeaway on the role of an Embedded Processing Engineer

1) Requires broad technical knowledge of both hardware and software technologies
2) Requires excellent team skills
3) There is no system design process that can replace experience!
4) High demand for engineers with this skill set!
The Embedded Subsystem Engineer's job is very challenging and very rewarding