
Soft Real Time and High Availability, the AXE approach

- AXE applications
- Control system structure
- Hard real time vs. soft real time
- Event driven execution, soft real time and parallel processes
- Fault tolerance and recovery
- Upgrade
- Scalability
- Operation under overload (separate presentation)
2004-10-18 1

AXE Applications in Telecom Networks


[Figure: telecom network overview. Node and network labels: ADSL, AN, IN, SCP, CCN, ILR, HLR, FNR, GSM, MS, BTS, BSC, PCU, MSC, GMSC, TDMA, GPRS, SGSN, GGSN, MGW, 3G UMTS, RNC, CSCF, HSS, AS, TeS, Internet, IPMM/SIP, ATM Back bone, OSS. Platforms named: AXE, TSP, CPP, WPP, AXD, EAR, TMOS/CIF.]

AXE Control System Structure


[Figure: AXE control system structure. Labels: Central Processor (two shown), HDLC/Ethernet, Regional Processor (< 1024), Adjunct Proc. (I/O) (two shown), Application Hardware, DP.]

AXE Hard Real Time vs. Soft Real Time


[Figure: the control system structure annotated with real-time domains. Labels: Soft Real Time, Central Processor (two shown), Hard Real Time (~1 ms), Adjunct Proc. (I/O) (two shown), Regional Processor, Application Hardware, DP.]

Soft Real Time: Event (Signal) Driven Execution


[Figure: External Events (subject to load control) and Internal Events enter the Event Buffer (typically 2 µs/event), which feeds the SW Units. Response: ms level.]

Soft Real Time: Event (Signal) Driven Execution


- Each event executes until the next event is generated; that is, processing is not interrupted by a time-sharing system
- The execution time is limited by design rules (and checks)
- The number of internal events is known at system design
- The occupancy level of the event buffer is subject to load control
- All events share the same level
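The rules above can be illustrated with a minimal sketch of a run-to-completion event loop. Everything named here (`EventBuffer`, `external_limit`, the `seize` and `connect` events) is a hypothetical stand-in and not AXE code; the load control is reduced to a single occupancy threshold for external events.

```python
from collections import deque


class EventBuffer:
    """FIFO event buffer with a simple occupancy-based load control (hypothetical)."""

    def __init__(self, capacity, external_limit):
        self.capacity = capacity              # hard size of the buffer
        self.external_limit = external_limit  # occupancy at which external events are refused
        self.queue = deque()

    def post_external(self, event):
        # Load control: new external work is refused once the buffer is too full.
        if len(self.queue) >= self.external_limit:
            return False
        self.queue.append(event)
        return True

    def post_internal(self, event):
        # Internal events belong to work that was already accepted.
        if len(self.queue) >= self.capacity:
            raise OverflowError("event buffer full")
        self.queue.append(event)


def run(buffer, handlers):
    """Dispatch events one at a time; each handler runs to completion."""
    trace = []
    while buffer.queue:
        name, payload = buffer.queue.popleft()
        trace.append(name)
        # A handler returns its follow-up internal events; it is never preempted.
        for follow_up in handlers[name](payload):
            buffer.post_internal(follow_up)
    return trace


handlers = {
    "seize": lambda call: [("connect", call)],
    "connect": lambda call: [],
}
buf = EventBuffer(capacity=8, external_limit=2)
accepted = [buf.post_external(("seize", n)) for n in range(3)]
trace = run(buf, handlers)
```

In this sketch the third external event is refused by the load control, and the events of the two accepted calls are intermixed in the trace because all events share one queue.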


Soft Real Time: Event (Signal) Driven Execution


=> The response time is well controlled, even at high load or overload
=> All events are intermixed and share the processing time on equal terms => no starvation
=> Requires very fast context switching!


AXE HW Redundancy

[Figure: the control system structure with redundant units. Labels: Central Processor (two shown), HDLC/Ethernet, Regional Processor, Adjunct Proc. (I/O) (two shown), Application Hardware, DP.]

AXE HW Redundancy
- RP: Duplicated with simple fail-over, or pooled. Data loss is limited to temporary data.
- AP (I/O): Duplicated, secure data on RAID disks.
- CP (classic systems): Duplicated, synchronous mode with transparent fail-over.
- CP (modern systems): Duplicated, non-synchronous, warm stand-by with the possibility of a Soft Side Switch for maintenance purposes (repair and upgrade).


AXE HW Redundancy, Soft Side Switch


[Figure: A-side memory and B-side memory. Writes on the A-side are transferred to the B-side; one area is marked "Frequent write".]

Transfer all pages;
LOOP: Transfer all modified pages UNTIL Hot area stable;
HALT execution;
Transfer hot area;
RESUME on B-side;
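The transfer loop of the Soft Side Switch resembles a pre-copy memory migration, and a minimal sketch under that assumption follows. The memory model (dicts of pages), the `run_slice` callback and the stability test (the set of modified pages is small and unchanged between two rounds) are illustrative choices, not the APZ implementation.

```python
def soft_side_switch(a_side, b_side, run_slice, hot_limit=2, max_rounds=100):
    """Copy a_side to b_side while the A-side keeps executing (pre-copy sketch).

    a_side, b_side: dicts mapping page number -> contents.
    run_slice: hypothetical callback that lets the A-side execute for a while
               and returns the set of pages it wrote during that slice.
    """
    # Transfer all pages once.
    b_side.update(a_side)
    dirty = run_slice()
    rounds = 0
    # LOOP: transfer modified pages UNTIL the hot area is stable.
    while rounds < max_rounds:
        for page in dirty:
            b_side[page] = a_side[page]
        new_dirty = run_slice()
        rounds += 1
        stable = len(new_dirty) <= hot_limit and new_dirty == dirty
        dirty = new_dirty
        if stable:
            break
    # HALT execution (no further run_slice calls), transfer the hot area,
    # after which execution could resume on the B-side.
    for page in dirty:
        b_side[page] = a_side[page]
    return dirty, rounds


# Simulated workload: the set of written pages shrinks to a hot area {0, 1}.
a = {page: 0 for page in range(8)}
b = {}
writes = iter([{0, 1, 2, 5}, {0, 1, 5}, {0, 1}, {0, 1}])


def run_slice():
    pages = next(writes, {0, 1})
    for page in pages:
        a[page] += 1
    return set(pages)


hot, rounds = soft_side_switch(a, b, run_slice)
```

The point of the loop is that execution only has to be halted for the time it takes to copy the small hot area, not the whole memory.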

AXE SW Recovery.
SW recovery actions are:
- Selective, depending on severity, possibility to recover and system state (history)
- Coordinated/consistent all over the system


AXE SW Recovery. Levels and Escalation


- No action / register irregularity
- Perform low level recovery = single transaction (a call) fails
- Suppressed/delayed system restart, raise alarm
- Small system restart = transactions in dynamic states are lost (calls not yet established are lost, established calls are checked)
- Large system restart = all transactions are lost (all calls)
- Large system restart with reload from back-up copy
- Large system restart with reload from old back-up copy

Escalation to the next restart level if a problem recurs.
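The escalation rule can be sketched as a small state machine. The level names follow the slide; the recurrence window of 600 seconds, the `Escalator` class and the choice to escalate only from the small system restart upwards are assumptions made for illustration.

```python
from enum import IntEnum


class Level(IntEnum):
    NO_ACTION = 0
    LOW_LEVEL_RECOVERY = 1
    DELAYED_RESTART = 2
    SMALL_RESTART = 3
    LARGE_RESTART = 4
    LARGE_RESTART_RELOAD = 5
    LARGE_RESTART_RELOAD_OLD = 6


class Escalator:
    """Raise the restart level when a problem recurs within a time window (assumed)."""

    def __init__(self, window_s=600.0):
        self.window_s = window_s
        self.last_level = None
        self.last_time = None

    def decide(self, requested, now):
        if requested < Level.SMALL_RESTART:
            return requested  # non-restart actions are not escalated in this sketch
        level = requested
        recurring = (self.last_time is not None
                     and now - self.last_time <= self.window_s
                     and self.last_level >= requested)
        if recurring:
            # Step to the next level, but never beyond the highest one.
            level = Level(min(self.last_level + 1, max(Level)))
        self.last_level, self.last_time = level, now
        return level


esc = Escalator(window_s=600.0)
first = esc.decide(Level.SMALL_RESTART, now=0.0)
second = esc.decide(Level.SMALL_RESTART, now=100.0)
third = esc.decide(Level.SMALL_RESTART, now=200.0)
later = esc.decide(Level.SMALL_RESTART, now=2000.0)
```

A problem that recurs shortly after a small restart is answered with a large restart, then with a reload; once the window has passed, the sequence starts again from the requested level.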

AXE SW Recovery. Low Level Recovery


An identity (ID) is tied to each resource included in a transaction, typically a call or a command. The processing platform provides support for creation of ID and linking to application SW. In case of an execution error, the platform identifies all SW units concerned and orders release over a standard interface.
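A minimal sketch of this mechanism, assuming a registry that maps each transaction ID to the resources linked to it, could look as follows. The class names, the `seize`/`release` methods and the resource names are hypothetical and only mirror the roles described above.

```python
import itertools
from collections import defaultdict


class SWUnit:
    """Application SW unit owning resources that are tagged with a transaction ID."""

    def __init__(self, name):
        self.name = name
        self.resources = {}  # resource name -> owning transaction ID

    def seize(self, resource, tid, handler):
        self.resources[resource] = tid
        handler.link(tid, self, resource)

    def release(self, resource):
        # Stand-in for the standard release interface called during recovery.
        self.resources.pop(resource, None)


class LowLevelRecoveryHandler:
    """Platform side: creates IDs and remembers which units hold what."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._links = defaultdict(list)  # transaction ID -> [(unit, resource)]

    def new_id(self):
        return next(self._ids)

    def link(self, tid, unit, resource):
        self._links[tid].append((unit, resource))

    def execution_error(self, tid):
        # Identify every SW unit concerned and order release of its resources.
        released = []
        for unit, resource in self._links.pop(tid, []):
            unit.release(resource)
            released.append((unit.name, resource))
        return released


handler = LowLevelRecoveryHandler()
trunk, switch = SWUnit("trunk"), SWUnit("switch")
call_a, call_b = handler.new_id(), handler.new_id()
trunk.seize("line-1", call_a, handler)
switch.seize("path-7", call_a, handler)
trunk.seize("line-2", call_b, handler)
released = handler.execution_error(call_a)
```

Only the failing transaction is torn down; the resources of the other call stay seized.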


AXE SW Recovery. Low Level recovery


[Figure: a transaction with identity Ix is linked to resources in several SW units, each tagged Ix, and to the low level recovery handler. On "Execution Error! ID=Ix" the low level recovery handler orders Release of the resources tagged Ix.]

AXE SW Recovery. The Reality


[Figure: SW errors pass through a filter with three outcomes: Low Level Recovery (99,8%), No Action and System Restart. The two remaining shares shown are 0,1% and < 0,1%.]

AXE SW Upgrade
Two different methods are used for SW upgrade: corrections/patches and new SW packages.
- Corrections/patches are local changes of code inserted at assembler level when the CP is idle => no disturbance
- New SW packages are introduced when major changes, including new data structures, are required. The new versions of the SW units inherit data from the old versions and are switched in with a system restart => at least yearly disturbance of new calls (~1 min. down time)

AXE SW Upgrade, Data Inheritance


[Figure: data is carried from the Old unit to the New unit in different upgrade cases, each guided by Data Change Information.]

AXE Scalability
The traditional AXE is scalable only in the RP region. For the CP only a low-end/high-end option exists. In modern applications the need for HW-related RP scalability is decreasing, but the need for CP scalability is increasing. To achieve better scalability AXE uses two approaches:
1) Parallel multi-threaded execution with common memory
2) Clusters of CPs with network interfaces


AXE Scalability, Multi-threaded Execution


- Must be 101% compatible with application SW (includes fault compatibility!)
- The problem is not to make it work
- The real problem is to make an efficient implementation with limited overhead, including the cost for cache coherency
=> minimize true concurrent execution
=> combine concurrency with functional distribution! = (CMX-FD)


CMX-FD. Simplified View

AXE Scalability, Concurrent Multi Execution


[Figure: four Processor Cores, each with its own Memory and marked FD/CMX. One core hosts the cluster of APZ-OS and Platform SW; the others host clusters of Application Modules.]

FD-mode: Functional Distribution, that is, each function is allocated to execute on one Processor Core only.

CMX-mode: Concurrent (Multi) eXecution, that is, each SW unit is allowed to execute on all Processor Cores, but only on one at a time.

Certain sequencing rules must be obeyed in order to make each call execute as it would in a single-CP system.
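The two modes can be contrasted in a small sketch. Here each Processor Core is modelled as a single-worker thread pool, an FD-mode unit is pinned to one core, and a CMX-mode unit may run on any core but is guarded by a per-unit lock so that it executes on only one core at a time. The unit names, the round-robin placement and the use of Python threads are illustrative assumptions; the sequencing rules mentioned above are not modelled.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class SWUnit:
    def __init__(self, name, fd_core=None):
        self.name = name
        self.fd_core = fd_core         # core index in FD-mode, None in CMX-mode
        self._lock = threading.Lock()  # one execution of this unit at a time
        self.cores_used = set()
        self.handled = 0

    def execute(self, core, signal):
        with self._lock:
            self.cores_used.add(core)
            self.handled += 1


def dispatch(work, n_cores=4):
    """Run (unit, signal) pairs on simulated cores and wait for completion."""
    cores = [ThreadPoolExecutor(max_workers=1) for _ in range(n_cores)]
    futures = []
    for i, (unit, signal) in enumerate(work):
        # FD-mode: always the allocated core. CMX-mode: any core (round robin here).
        core = unit.fd_core if unit.fd_core is not None else i % n_cores
        futures.append(cores[core].submit(unit.execute, core, signal))
    for future in futures:
        future.result()
    for pool in cores:
        pool.shutdown()


fd_unit = SWUnit("charging", fd_core=0)   # hypothetical FD-mode unit
cmx_unit = SWUnit("call-control")         # hypothetical CMX-mode unit
work = [(fd_unit, n) for n in range(8)] + [(cmx_unit, n) for n in range(8)]
dispatch(work)
```

After the run, the FD-mode unit has only ever executed on core 0, while the CMX-mode unit has been spread over all four cores with its lock serializing access to its data.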


AXE Scalability. Cluster Systems


Clusters address
- scalability
- down time at upgrade
- down time at node failure

[Figure: N+1 cluster. Labels: Network protocols, Protocol Term. + Dispatcher, Call Control.]

AXE 10 Minutes, CP Classic vs. Modern


[Figure: two layered stacks compared. Both carry the same Application SW on the APZ-CP OS. Below that, one stack shows MIP and HW; the other shows ASA compiler, APZ-VM, OS (Tru64) and HW (µ-processor).]
