
AAS 08-031

GN&C Fault Protection Fundamentals

Robert D. Rasmussen Jet Propulsion Laboratory, California Institute of Technology


February 1-6, 2008 Breckenridge, Colorado

Sponsored by Rocky Mountain Section


AAS Publications Office, P.O. Box 28130 - San Diego, California 92198



Robert D. Rasmussen *

Addressing fault tolerance for spacecraft Guidance, Navigation, and Control has never been easy. Even under normal conditions, these systems confront a remarkable blend of complex issues across many disciplines, with primary implications for most essential system functions. Moreover, GN&C must deal with the peculiarities of spacecraft configurations, disturbances, environment, and other physical mission-unique constraints that are seldom under its full control, all while promising consistently high performance. Adding faults in all their insidious variety to this already intricate mix creates a truly daunting challenge. Appropriate tactical recovery must be ensured without compromise to mission or spacecraft integrity, even during energetic activities or under imminent critical deadlines. If that were not enough, a seemingly prudent move can have profoundly negative long-term consequences, if chosen unwisely, so there is often a major strategic component to GN&C fault tolerance, as well. Therefore, it is not surprising that fault protection for GN&C has an enduring reputation as one of the more complex and troublesome aspects of spacecraft design, one that will only be compounded by the escalating ambitions of impending space missions.

Despite these difficulties, experience has suggested methods of attack that promise good results when followed consistently and implemented rigorously. Upon close scrutiny, it is strikingly clear that these methods have roots in the same fundamental concepts and principles that have successfully guided normal GN&C development. Yet it is disappointing to note that the actual manifestation of these ideas in deployed systems is rarely transparent. The cost of this obfuscation has been unwarranted growth in complexity, poorly understood behavior, incomplete coverage, brittle design, and loss of confidence.

The objective of this paper is to shed some light on the fundamentals of fault tolerant design for GN&C. The common heritage of ideas behind both faulted and normal operation is explored, as is the increasingly indistinct line between these realms in complex missions. Techniques in common practice are then evaluated in this light to suggest a better direction for future efforts.

INTRODUCTION

Volumes have been written about how to do GN&C, when everything is more or less right. It is an incredibly rich topic steeped in physics, math, engineering discipline, and decades of practice. Rather less though has been written about how to make GN&C work, even when things go seriously wrong. That’s understandable. Success is hard enough without adding the complication of faults. Yet even without faults, it takes arbitrarily more effort to guarantee all is just right than to tolerate imperfection. Faults or not then, making things work, even when things aren’t right, is really the essence of engineering.

Conventional engineering accounts for normal production variations, environment uncertainty, wear, and other meanders within the engineering design space. In GN&C, these are things like misalignments, measurement and disturbance noise, and small modeling simplifications. One typically characterizes such variations through sensitivity analyses of various sorts around selected design points, striving to make a system that serves its intended function and performs correctly within these expected tolerances. This aspect of conventional design can be thought of as variation tolerance.


* Jet Propulsion Laboratory, California Institute of Technology


Fault tolerance is similar, but with a larger scope. Variations handled through fault tolerance are generally more extreme, precipitous, or dangerous than normal variations; and because they involve departures from intended functionality and correct performance, tolerating them tends to involve higher or [...] design accommodations than do normal variations. Even so, the basic issues are the same: finding ways to assess and contain the effects of variation, such that functionality and performance are preserved.

Given this kinship, one might expect to see a close family resemblance when comparing the design practices of conventional variation tolerance with those of fault tolerance. Regrettably, it often isn’t there.

Conventional GN&C brings a mature understanding of dynamics and statistical modeling, measurement and estimation, control, planning, optimization, and other design elements, in each case grounded in solid theoretical foundations. But fault tolerant GN&C has a different story to tell. Many lessons have been learned over many years and many projects (usually something like, “Don’t do that again”), but progress has been slow. Theoretical grounding for fault tolerance, as generally practiced, has significant ground to make up in comparison to its conventional cousin. This is the central theme explored here. Fundamental issues of fault tolerant GN&C design are considered, with the goal of reiterating basic guidelines commonly understood by those working in this field. The larger objective, however, is to underscore the conceptual bonds between conventional GN&C functions and GN&C fault tolerance with the hope that this suggests a direction for further growth in the latter.

This is of particular interest now, because in many ways we have reached the end of an era, where it might be said that customary methods have been carried to their logical extreme. In fact, by some assessments, standard fault tolerant design is in crisis, as the same litany of problems recurs on project after project (late delivery, fragile behavior, poor operability, incomplete testing, and so on), and this is before considering the implications of new mission types that will push even harder. Solutions are not likely to come merely by polishing existing methods, starting them earlier, integrating them better, or wrapping them within tighter processes. The roots of the problem are deeper than this, so a different path is required.

There are many sides to this puzzle. Identifying hazards and evaluating risks are clearly important, as are redundancy and cross-strapping trades, sensor selection and placement, verification processes and methods, preventative maintenance, and other related topics. However, for the sake of brevity and [...], the emphasis here is on the fault protection* elements themselves: the integrated control functions, active in a live system, that make the difference between failure and success, when things go wrong.


The colloquial view of fault protection is simple. Detection of fault X triggers corrective action Y (e.g., a redundancy swap). Then the system just tidies up a little and reports home. Why isn’t fault protection always this easy? Well, from a cursory view of architectures deployed against faults in space systems, one might get the mistaken impression that this is indeed all there is to it. Except where active redundancy is used (common in launch vehicles, for example), typical fault protection architectures consist of a monitoring system for detecting when something is not right, linked to a response system to isolate the fault and either retreat to a safe mode or restore the lost function (sometimes both). Topics abound when discussion turns to fault tolerance in general, but monitors and responses are still the dominant theme in actual implementation, leaving the general impression that fault protection is mainly about base reflexes, like pulling your hand away from a hot stove. Even the few supporting functions typically follow this pattern. For example, responses are often aided by a redundancy management system to facilitate selecting backups. In addition, a logging system records key details, such as which monitors tripped at what time, how severely, what response was triggered, etc. Of course, there are also the obligatory flags to individually turn monitors and responses on or off. The monitor-response theme though is ever present, describing, more or less, most systems that have ever flown in space.
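The monitor-response pattern just described can be sketched in a few lines. This is only an illustration of the colloquial view, not any flight architecture: the names `Monitor` and `FaultProtection`, the telemetry keys, the thresholds, and the response strings are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Telemetry = Dict[str, float]


@dataclass
class Monitor:
    name: str
    check: Callable[[Telemetry], bool]  # returns True when something looks wrong
    response: str                       # name of the response this monitor triggers
    enabled: bool = True                # the "obligatory flag" for this monitor


@dataclass
class FaultProtection:
    monitors: List[Monitor]
    responses: Dict[str, Callable[[], str]]
    response_enabled: Dict[str, bool] = field(default_factory=dict)
    log: List[Tuple[int, str, str]] = field(default_factory=list)

    def step(self, tick: int, telemetry: Telemetry) -> List[str]:
        """Eavesdrop on telemetry; fault X triggers corrective action Y."""
        actions = []
        for m in self.monitors:
            if m.enabled and m.check(telemetry):
                if self.response_enabled.get(m.response, True):
                    actions.append(self.responses[m.response]())
                    # Record which monitor tripped, when, and what it triggered.
                    self.log.append((tick, m.name, m.response))
        return actions


# Hypothetical monitors and responses, for illustration only.
fp = FaultProtection(
    monitors=[
        Monitor("gyro_rate_out_of_range",
                lambda t: abs(t["gyro_rate"]) > 5.0, "swap_gyro"),
        Monitor("bus_undervoltage",
                lambda t: t["bus_volts"] < 24.0, "safe_mode"),
    ],
    responses={
        "swap_gyro": lambda: "swap to backup gyro",
        "safe_mode": lambda: "enter safe mode",
    },
)

nominal = fp.step(0, {"gyro_rate": 0.1, "bus_volts": 28.0})
faulted = fp.step(1, {"gyro_rate": 9.0, "bus_volts": 28.0})
```

Note how little the structure says about the system being protected: each monitor is an isolated reflex, which is precisely the limitation the discussion goes on to examine.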

Another common aspect of fault protection implementation is that this collection of monitor-response functions usually appears as a distinct appendage to normal functionality: monitoring functions generally eavesdrop on existing data streams; response systems commonly usurp existing command-sequencing functions; redundancy is often masked from normal functions through virtualization of some sort, largely the separate developments of a fault protection team. As fault protection experience has accumulated, the importance of early integration into the larger system has become apparent. Nonetheless, with few exceptions, the fault protection task and its associated architecture have tended to remain largely detached.

* The term “fault protection”, as used here, is essentially equivalent to “fault management”, “safety and health management”, and similar terms in common use. It can be taken broadly or not, to encompass a variety of topics. A middle-of-the-road view is adopted here to focus on online capability, while broadening consideration to implications on general system operation.

In this conventional picture, most fault protection architecture (as distinct from fault protection design or implementation) resides primarily in the monitor-response machinery. This may seem like a shortsighted assertion, but major architectural features peculiar to fault protection or embodying fault protection principles are generally scarce, outside the monitor-response structure.* For example, the resumption of disrupted critical activities, although usually considered a substantial part of the fault protection engineers’ task, is nonetheless implemented separately in most systems from the monitor-response architecture; and it generally requires custom design (e.g., command sequences or state machines) with at most trivial specific architectural support identifiably dedicated to matters of fault recovery. Similarly, when the time comes for fault protection verification (regularly a time of revelation, it seems), focus tends, at least initially, to descend from a global view back to monitors and responses in the “touch all paths” spirit of testing, with the hope that this is somehow adequate. For such reasons, anything outside the monitor-response system is routinely treated as something other than fault protection architecture, at least from an implementation point of view. The handful of exceptions usually consists of simple retry mechanisms, data error masking, filtering to tolerate temporary data outages, and other highly localized measures of that sort, all mostly ad hoc, thus with little supporting architecture.

Given this unadorned picture of fault protection as something with boundaries, distinct from the rest of the system, it has become a common management exercise then to count the number of fault monitors and responses to get some idea of the scope of “the fault protection problem”. Invariably this number is assessed as “too high” (designers trying too hard, too much redundancy, too many verification tests, and so on), and that, it is often said, is why fault protection is complex. Neglected in such simplistic accountings though are the abundant ways in which real fault protection departs from the simplistic orthodoxy suggested by monitor-response architectures. Common issues are listed in Figure 1.

Sorting one’s way through this maze is not easy, even with few failure modes or limited redundancy. However, with little beyond the basic apparatus of monitor-response architectures for support, fault protection designs tend to require a lot of clever improvisation. In the resulting confusion of ad hoc solutions, fault protection systems can become overly complicated, difficult to understand or analyze, capable of unforeseen emergent behaviors (usually for the worse), impossible to test thoroughly, and so brittle that the suggestion of even trivial change is enough to raise alarms.

Loss of Architectural Integrity

These problems are all signs of lost architectural integrity: the absence of conceptual grounding and regular patterns of design that embody and enforce fundamental principles of the discipline. In the worst of circumstances, where these issues are not under control, fault protection may even be detrimental to the reliability of the system in defiance of its very purpose.

This shouldn’t happen. Anyone familiar with thoughtful, well-run fault protection development appreciates the profound implications of this effort on the quality and robustness of the overall design, even when everything is normal. It is rare to find anyone more intimately familiar with the system as a system than fault protection engineers. Their job by necessity requires them to understand the functionality of the whole system, the objectives it must fulfill, the scenarios in which it operates, its quirks of behavior, the intricacies of interaction among its parts and with its environment, the constraints within which it will perform correctly, the management of its resources, the ways in which it will be operated, and the design of the software that controls it. The sheer effort of assembling this picture into a coherent whole, when done properly, is undoubtedly a great contribution to systems engineering. Fault protection designers then take this a step further by extending the boundaries of system understanding into regions outside the design space, making changes that improve design robustness even under normal conditions. Fault protection, done right, greatly improves the safety and reliability of a system, whether or not a fault ever occurs.

* Even “rule-based” architectures generally offer little more than generic programming support to the monitor-response paradigm, with nothing intrinsic to say about fault protection, the issues surrounding it, or the principles supporting it.

• Many scattered symptoms may appear concurrently from one root cause, often masking the real culprit.
• Some faults may be hard to distinguish from normal operation, yet still cause long-term trouble.
• What appears to be wrong during one mission phase may be expected or acceptable in another.
• There may be several possible explanations for observed difficulties, each with different ramifications or requiring competing responses.
• An apparent fault in one area may in fact be due to a sensor fault in another area.
• A fault may not manifest itself when it occurs, emerging only much later, possibly during a subsequent unrelated emergency.
• An apparent problem may not have a previously identified cause or may not express itself in an anticipated manner.
• False alarms may provoke responses that are as disruptive as a genuine fault, so there is pressure to compromise on safety.
• Misbehavior may be a consequence of design or modeling errors, including failure to address certain system-level interactions.
• Environmental or operational extremes may be pushing a system beyond its design range.
• Operator errors may be the primary cause of a fault, creating a conflict between doing what is directed and doing what is right.
• Overt action may be needed to diagnose a problem before corrective action can be decided.
• Complex incremental tactics may be needed to identify and isolate faults, or recover safe operation.
• Faults may create an urgent hazard to safety or critical operations that must also be handled at the same time.
• Concurrent response actions may be necessary, with potentially shifting priorities and conflicting demands on the system.
• Many more ways may be needed to restore operation after a fault than the typically small number of ways needed to perform the same operation normally.
• Functional backups, when full redundancy is not available, may involve degraded modes or characteristics of operation with broad implications on subsequent activities.
• Fault containment may not be effective, resulting in secondary faults that must also be handled.
• Isolation mechanisms may be insufficient to eliminate all adverse effects, even when redundancy is present.
• Other functions (often by design) may be masking symptoms or interfering with recovery.
• Quirks or omissions in underlying functionality may conspire against the accomplishment of otherwise reasonable actions.
• Late breaking developments in other system and software areas can invalidate fault protection models or tests, disrupting or invalidating V&V efforts.
• Actions must be chosen to preserve resources and maintain mission opportunities, even under severe deadlines or other constraints.
• Control authority needed to ensure safety and restore operation may be compromised.
• Timely responses may require substantial anticipation in order to ready potentially required assets and alternative tactics.
• The long-term consequences of short-term expediencies may be dire.
• Essential activities may need to be resumed, even though time or capability has been lost.
• Even when a mission is lost, it is generally important to attempt reporting the problem to operators, often a daunting task on its own.
• Extended autonomous operation under adverse or degraded conditions may be needed, yet still requiring a large portion of the system’s functionality at some level.

Figure 1  Issues Routinely Complicating Fault Protection Design

The difficulty we face, then, is not one of finding ways to substantially reduce the scope of the fault protection task, for the merit of this activity is generally appreciated, even when its complexities are not. Rather, what we need is a way to make sure the right things happen on every project. Unfortunately though, having accepted that fault protection is hard, emphasis in the search for solutions has tended to shift generically to software engineering, primarily viewing the need as one of improved specifications, more reliable design methods and tools, tighter configuration management, better verification, and other measures aimed at process rather than architecture. There is surely no questioning the merit of such an enterprise, since concerns over unmanaged software complexity and consequent reductions in reliability are valid (fault protection being only one of many contributors). The message here, however, is that better software engineering is not enough, and in fact, doesn’t even touch the topic of fault protection itself.

To preserve architectural integrity, which in the final analysis is a precondition for all other measures, there must first be Architecture. Not just any architecture, but Architecture that is up to the challenges of fault tolerance in all its dimensions. Otherwise, recurring appeal to ideas that have predictably fallen short will guarantee that each new system struggles with the dangerous irony of unsafe fault protection.

Fault Protection as a Principled Control Function

So, what do fault protection woes have to do with GN&C? From one point of view, GN&C merely compounds the problem. Due to its unusually many and diverse interactions with other system elements, GN&C tends to be involved in half or more of all the fault protection implemented on space vehicles. Issues of problematic system-level interactions, which plague many designs, increasingly find GN&C in their midst. Meanwhile, supplying robust GN&C capability hasn’t been getting any easier, as ever more complex missions arise, calling for better performance and more autonomy in new uncertain environments. It is clear that good fault protection will endure as a vital ingredient in meeting these needs, so any solution to the fault protection problem as a whole will naturally strengthen GN&C as a key constituent.

The essential question here though is whether the reverse is also true. That is, are there principles to be learned from GN&C that might shed light on the fault protection problem? To be clear, the notion is not to address GN&C fault protection alone, with the condescending notion that all other fault protection problems fall by extension. Rather, the idea is to view fault protection as fundamentally a control problem, the principles of which have been happily refined through GN&C and similar system control disciplines. In this approach lies the benefit of structuring fault protection architecture through the same well-established patterns that have aided GN&C, but which go beyond present common practice in fault protection. With these principled patterns in place it also becomes quite clear that fault protection belongs as an integral and harmonious part of a collective approach to system control and operation, not as an appendage. This insight is essential for any attempt to establish an architectural foundation for further advances in autonomy, fault protection or otherwise. Yet this is by no means a new idea. Thus, there is a certain irony that this potential has yet to be broadly exploited, especially in GN&C fault protection.

Fundamental, stable patterns and sound principles are at the heart of good architecture. The foundations of these patterns and principles must be deeply understood so that departures from them can be recognized, if we are to deploy systems with architectural integrity. Let’s take a quick look at those guiding GN&C.


The conceptual roots of modern GN&C lie deep in systems theory*, which concerns itself with the nature of behavior and interactions in complex systems. It is very much focused on the properties and behavior of the whole, recognizing that any description of the system at lower levels through a reductionist approach is necessarily incomplete. This acknowledgment gives rise to the idea of emergence. In fact, a system without emergent behavior is only trivially a proper system by usual definitions.

Naturally, this raises questions regarding the extent to which such behaviors are to be engineered, rather than merely endured. Unfortunately, many system designs leave key issues of emergence to chance. Engineers who have found themselves in the midst of a systems integration task that is as much an exercise of discovery as of verification will recognize the sinking feeling brought on by this realization. That sentiment certainly echoes the theme here. Hence, to the extent systems theory has anything to say about the topic, it is worth a brief reiteration of the basic ideas and terminology, despite their familiarity. This indulgence will be repaid shortly, as the relationship of these ideas to the topic of GN&C fault protection becomes clear.

Basic Ideas and Terminology

State and Behavior. Systems theory begins with the essential notion of a system as a dynamical entity, something that changes over time. All changes are reflected in characteristics referred to as states (things like attitude, temperature, operating mode, and so on). States describe what can change, but not what changes are possible: the principal interest of systems theory. The latter are captured in behaviors, the rules, constraints, or other “laws” that determine which histories of system state over time are possible. Descriptions of behavior are often referred to as models. State and behavior are complete descriptions of the dynamic aspects of a system, no exceptions.
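As a small illustration of the distinction between state and behavior, the sketch below treats a single pointing angle as the state and a rate limit as the behavior (the model). The rate limit, the time step, and the function name `history_is_possible` are assumptions made only for this example.

```python
from typing import List

DT = 1.0        # s, assumed sampling interval of the state history
MAX_RATE = 0.5  # deg/s, assumed physical limit on how fast the angle can change


def history_is_possible(angles: List[float]) -> bool:
    """Behavior (model): successive angles may differ by at most MAX_RATE * DT.

    The state says what can change (the angle); the behavior says which
    histories of that state are possible.
    """
    limit = MAX_RATE * DT + 1e-12  # small tolerance for floating point rounding
    return all(abs(b - a) <= limit for a, b in zip(angles, angles[1:]))


smooth = [0.0, 0.4, 0.8, 1.2]  # consistent with the assumed rate limit
jumpy = [0.0, 0.4, 3.0, 3.1]   # contains a 2.6 deg jump in one step
```

Under these assumptions `smooth` is a possible history and `jumpy` is not; an observed history like `jumpy` would mean either the measurement or the model is wrong.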

Hierarchy and Scope. From systems theory we also get our basic notions of system decomposition and subsystems, interfaces, and hierarchies. All references to “system” here include subsystems anywhere in a hierarchy. A key concept in this view is that most interesting systems, and essentially all subsystems, are open, being subject to external influence (i.e., not closed). Absence of any obvious closure raises the question of system scope: deciding what is properly inside a system and what is not. Given the openness of systems, issues of emergent system behavior do not resolve themselves as the circumference of consideration is widened, so there really is no correct answer to the question of scope. One must choose.

* Historically, systems theory appeared as a generalization and consolidation of earlier ideas from dynamics, control, filtering, communications, and so on. This term is taken loosely here to correspond with notions from cybernetics, control theory, signal processing, and others. Precise alignment with any of these domains of thought is not essential to the central points of this paper.

Objectives. When a system is directed toward some purpose, it is said to have an objective. Other recognizable words describing this vital notion from our systems engineering vocabulary are function, task, target, role, responsibility, aim, goal, intention, constraint, plan, etc. Like behaviors, objectives constrain the history of state, but rather than constraining what changes of state are possible over time, they constrain what changes are acceptable. Saying a system has acceptable behavior is equivalent to saying it meets its objective. A system that fails, therefore, is necessarily one that violates its objective, and conversely.
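One way to make this definition concrete is to write an objective as an explicit constraint on state history, so that failure is exactly the violation of that constraint. The sketch below is illustrative only: the `Objective` class, its interval form, and the sample pointing-error histories are assumptions, not something prescribed by the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Objective:
    """Assumed form: keep the state within [low, high] over steps start..end."""
    low: float
    high: float
    start: int
    end: int

    def is_met_by(self, history: List[float]) -> bool:
        window = history[self.start:self.end + 1]
        return all(self.low <= x <= self.high for x in window)


def has_failed(history: List[float], objective: Objective) -> bool:
    # Failure and objective violation are the same thing, by definition.
    return not objective.is_met_by(history)


# Hypothetical objective: pointing error within +/-0.1 deg during steps 2..4.
hold_pointing = Objective(low=-0.1, high=0.1, start=2, end=4)

settled = [0.5, 0.2, 0.05, -0.02, 0.08]  # large early error is outside the window
upset = [0.5, 0.2, 0.05, 0.30, 0.08]     # excursion at step 3 violates the objective
```

Both histories may be perfectly possible under the system's behavior; only the objective distinguishes the acceptable one from the failed one.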

Control. The deliberate exercise of influence on an open system to achieve an objective is called control. In composite systems, it may be possible to exercise control on a subsystem only indirectly via chains of influence. Even if directly controllable, a subsystem may be disturbed by other such chains of influence. Engineered systems generally have some identifiable structure in their composition that tips the balance in favor of meeting objectives. The hierarchy of functional decomposition is immediately recognizable in this picture. Controlling a system generally requires managing influence through many, if not most, of its chains of influence. Manipulating these chains of influence directly is called open loop control. This is complicated considerably by random influences, and by a web of interactions that is far more tangled in typical systems than any top-down functional decomposition alone would ever indicate. Addressing this generally calls for additional structure.

Control Loops. The fundamental response to such complications is the introduction of negotiated, closed loop control. Closing loops is done by augmenting a controlled system with a control system, which applies knowledge of the state and behavior of the system under control to determine appropriate influences on it, so that it will meet its objective. This pairing is called a closed loop system. By regularly adjusting to observations of the controlled system as it is, the control system provides variation tolerance, as defined earlier, making the closed loop system more predictable than the behavior of the underlying system being controlled. Moreover, since a closed loop system is influenced by a direct statement of its objective, the apparent complexity of the original system from an external point of view is substantially hidden. (That’s important, because closed loop systems are still open systems.) Negotiation arises from the normalization of direction to closed loop systems, whereby all external entities wanting to influence a controlled system element can do so through statements to its control system expressed in the common language of objectives (i.e., constraints on state). Since objectives express intent, they may be compared, in order to detect conflicting intents, and combined, in order to meet shared intent. The normalized behaviors of closed loop systems make prediction, planning, and coordination easier to do with confidence and efficiency, and hence easier to automate, if desired. A system operating on these principles is said to be goal-based.¹
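The variation tolerance gained by closing a loop can be seen in a toy comparison. The sketch below assumes a trivially simple plant (a pure integrator), a constant unmodeled disturbance, and a proportional control law; the plant, the gain, and all numbers are illustrative assumptions rather than anything taken from a real GN&C design.

```python
DT = 0.1     # s, assumed time step
STEPS = 100  # number of steps (10 s total)


def plant_step(x: float, u: float, disturbance: float) -> float:
    """Assumed plant: state integrates the commanded influence plus a disturbance."""
    return x + DT * (u + disturbance)


def run_open_loop(target: float, disturbance: float, x0: float = 0.0) -> float:
    # The influence is planned once, from the model alone, assuming no disturbance.
    u = (target - x0) / (DT * STEPS)
    x = x0
    for _ in range(STEPS):
        x = plant_step(x, u, disturbance)
    return x


def run_closed_loop(target: float, disturbance: float,
                    gain: float = 2.0, x0: float = 0.0) -> float:
    # The influence is recomputed each step from the observed state and the objective.
    x = x0
    for _ in range(STEPS):
        u = gain * (target - x)
        x = plant_step(x, u, disturbance)
    return x


TARGET = 1.0
DISTURBANCE = 0.2  # unmodeled constant influence
open_error = abs(run_open_loop(TARGET, DISTURBANCE) - TARGET)
closed_error = abs(run_closed_loop(TARGET, DISTURBANCE) - TARGET)
```

With these assumed values the open loop plan accumulates the full effect of the disturbance, while the closed loop settles to a small residual offset of roughly the disturbance divided by the gain. An outside party only had to state the objective (`TARGET`) to the closed loop version, not plan the influence itself.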

Cognizance. Through its knowledge of state, behavior, and objectives, a control system is in some meaningful way cognizant of the system it controls. Such knowledge may not appear overtly in a design, but it is present in the sense that, were state or objective to change, the control system would respond to it, and were behavior to change, control system plans or design would have to be revisited. In its cognizance role a closed loop control system solves the major problems of how to achieve an objective, while hiding these details from external entities. However, it leaves behind the vital problems of understanding what objectives mean exactly, what objectives are actually plausible, and what to do when objectives fail.

The key to addressing these concerns is transparency. When we put a system together, it should be clear which basic control concepts are being applied and whether basic principles in their use are being followed. The premise here is that these concepts and principles are in broad, productive use within normal GN&C, so making them transparent in GN&C fault protection should accrue similar benefits.

Making These Ideas Transparent

A control system needn’t be too sophisticated before basic ideas of state, behavior, and objectives become more transparent. Even in simple early space systems, where control was exercised through manipulation of sensor biases, control systems were operating at a commensurate level of cognizance. For example, an objective presented to an attitude control system might have been to achieve a particular sun sensor output voltage (corresponding to some angle). Nonetheless, accomplishing this required knowledge of the voltage, through measurement, and use of an association between voltage, attitude, and thruster-derived torque, which amounted to knowledge of behavior (if only in the designers’ minds).

In more sophisticated systems, it is common to see models and the state we care about appear explicitly, as for instance gravitational dynamics in orbit determination. Likewise, objectives may be more clearly articulated, as in target-relative pointing. The merits of such approaches are well established, having arisen during the early years of systems theory development. Therefore, the elementary notions outlined here have deep and broadly understood implications that hardly bear reciting, especially in the context of GN&C. Somehow though, these ideas still seem to lose focus in many designs, especially in fault protection (and system management in general), so it is important to reiterate them. Here are some of the basic ideas.

Transparency of Objectives. Given the definition of behavior, an objective on a closed loop system is nothing more or less than a model of desired behavior for the system under control. That is, as described above, there are certain histories of the state of the system under control that are acceptable (i.e., satisfy the intent of the issuer), and the objective of the control system is to achieve one of them. An objective to point a camera, for example, can be achieved by any meander among the orientations that align the boresight within some tolerance of the target direction. It is far better to give a pointing control system such pointing objectives than to have to issue commands to thrusters, gimbals, and so on to make the same thing happen.
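
The camera example can be sketched as a predicate over a history of states. This is only an illustration under stated assumptions: the names `PointingObjective`, `accepts`, and `satisfied_by` are hypothetical, the history is a sampled list of boresight vectors, and rotation about the boresight is deliberately left unconstrained.

```python
import math
from dataclasses import dataclass
from typing import Sequence, Tuple

Vector = Tuple[float, float, float]


def angle_between(a: Vector, b: Vector) -> float:
    """Angle in radians between two nonzero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return math.acos(max(-1.0, min(1.0, dot / norm)))


@dataclass(frozen=True)
class PointingObjective:
    """Hypothetical objective: keep the boresight near a target direction."""
    target: Vector
    tolerance_rad: float

    def accepts(self, boresight: Vector) -> bool:
        """True if a single orientation lies within tolerance of the target."""
        return angle_between(boresight, self.target) <= self.tolerance_rad

    def satisfied_by(self, history: Sequence[Vector]) -> bool:
        """True if every sampled state in the history is acceptable."""
        return all(self.accepts(b) for b in history)
```

Any meander that stays inside the tolerance cone satisfies the objective; the issuer never has to mention thrusters or gimbals.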

One sees then that closed loop control does not eliminate the need to know the behavior of a system. Rather, it simplifies the invocation of desired behavior by permitting direct commanding of it in the form of objectives, at least when objectives are transparent. In reality, the way objectives are commonly defined can leave certain aspects of behavior undefined, implicit, or conditional. Little omissions (as above where rotation around the boresight is ignored) may make someone unhappy, but far more serious infractions are possible. Failure to appreciate this can lead to dangerous vulnerabilities.

There are reciprocating aspects to this. First, an objective is not transparent if a control system produces behavior other than that expressed in its objective, while claiming otherwise. This much is clear. In addition though, when specifying an objective, anything short of expressing full intent is also not transparent. For example, if one intends a latch valve to be open for the duration of a propulsion activity, then a directive to open the valve is not the real objective. A valve control system could certainly act on such a directive, but not knowing the intent, it would have no subsequent basis for rejecting competing directives, determining whether the actual intent of the directive had been met, or taking any corrective action if it had not.

Real transparency therefore amounts to a sort of contract between the issuer of objectives and the control system achieving them, where objectives are expressed in a way that makes success or failure mutually obvious to both parties. Unfortunately, this is frequently overlooked. Delivered behavior can be different from what is expected, and the difference won’t necessarily be apparent. Closed loop behaviors that are quirky, prone to surprises, hard to model, unreliable, or otherwise opaque about their capabilities are of questionable benefit. In fact, this is a primary concern behind reservations over increasing autonomy in space systems (including fault protection), which essentially amounts to permitting greater closed loop control on the vehicle. Operators are more comfortable dealing with the arcane details and peculiarities of a transparent open loop system than with the broken promises of an opaque closed loop system.

Resolving this dilemma does not require abandoning direction by objectives. It simply requires more careful attention to making objectives say what they mean, and mean what they say. Indeed, substantial progress has been made in this regard, having been a principal motivation behind the gradual modularization and abstraction of capabilities within GN&C systems over the years, enabled by greater computational power. For example, many present day attitude control systems respond directly to profiled attitude objectives that bound motion with minimal transients. They require no compensation for biases or other errors, unbiased knowledge of attitude having been derived by applying knowledge of the sensor misalignments and biases. Similarly, influence on attitude can be exercised by setting secondary objectives on torque, which are met in thruster control systems by commanding thrusters on and off appropriately, adjusting for specific impulse and transients, mapping around failed thrusters, and so on.


Even where intent is clear though, systems still generally fall short in making overt connections between setting objectives and checking for their success. For example, pointing-profile objectives rarely include an overt criterion for how well this must be done, while fault monitors looking for excessive control errors rarely have any idea of their relevance to current objectives. Similar problems occur for tasks where the only indication that there might be an objective is the fault monitor (e.g., “too much” thruster firing). In still other cases, direction is given to systems with no explicit objective at all regarding the states under control, placing objectives instead on the state of the control system itself (e.g., “cruise mode”).

In all such cases, responsibility for meeting objectives has been divided, and connections between objectives and their success criteria, where they exist, are generally implicit: hidden assumptions that are vulnerable to abuse. Intent is opaque, so predictable coordination issues arise. Consider what might happen, for instance, if transients at objective transitions became problematic. If gadgets, such as persistence filters, are added in the monitoring system to ride out such events, the system becomes vulnerable to real problems that occur during transitions and is overly desensitized to errors generally. On the other hand, if monitors are given access to control system internals (such as deadbands, modes, or gains) and other information they may need to be more discriminating, the result is a proliferation of ad hoc interfaces, an Achilles heel for any design. Thus, separating monitors from control cannot be defended on the grounds that it is simpler. Hidden assumptions, opaque intent, potential inconsistencies, fault coverage gaps, coordination problems, tangled interfaces, and so on are not simpler. They are merely symptoms of the fault protection problem.

There are two parts then to making objectives truly transparent: 1) be explicit about the existence and full meaning (success versus failure) of every objective, and 2) give full responsibility for managing objectives, including their failure, to the control system responsible for achieving them in the first place.
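
The two parts can be illustrated with the latch valve example given earlier. The sketch below is hypothetical throughout: `HoldOpenObjective`, `ValveController`, and `Status` are illustrative names, time is a plain number of seconds, and the controller is assumed to receive its belief about the valve from an estimator elsewhere.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    PENDING = "pending"
    SATISFIED = "satisfied"
    FAILED = "failed"


@dataclass(frozen=True)
class HoldOpenObjective:
    """Full intent: the valve is to be open throughout [start, end]."""
    start: float
    end: float


class ValveController:
    """Owns the objective, including the judgment of its success or failure."""

    def __init__(self) -> None:
        self.objective: Optional[HoldOpenObjective] = None
        self.status: Optional[Status] = None

    def set_objective(self, objective: HoldOpenObjective) -> None:
        self.objective = objective
        self.status = Status.PENDING

    def allows_close(self, t: float) -> bool:
        """A competing close directive is admissible only outside the hold interval."""
        obj = self.objective
        return obj is None or not (obj.start <= t <= obj.end)

    def update(self, t: float, believed_open: bool) -> Status:
        """Judge the objective against current state knowledge at time t."""
        obj = self.objective
        if obj is None:
            raise ValueError("no objective in force")
        if self.status is Status.PENDING:
            if obj.start <= t <= obj.end and not believed_open:
                self.status = Status.FAILED      # intent violated during the hold
            elif t > obj.end:
                self.status = Status.SATISFIED   # hold interval completed intact
        return self.status
```

Because the objective carries its own success criterion, the same controller can reject a competing directive and report failure without a separate monitor guessing at intent.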

Transparency of Models. Models play an essential role in acquiring state knowledge by providing expectations against which evidence of state can be compared. Assuming a model is right, any departure from expectations is an indication that a knowledge correction of some sort is in order. Adjustments are typically small to accommodate measurement and disturbance noise, but may need to be large if the only way to align observations with a model is to hypothesize a discrete change of state, such as a fault. When models are viewed in this way, it becomes clear that there is no transparent way to separate models of normal and abnormal behavior, and hence to divide normal state determination from fault diagnosis.

The division of responsibilities described above frequently arises from the careless conflation of the two distinct sorts of error one sees in a control system. Large control errors indicate a problem in meeting objectives, whereas large expectation errors suggest a problem in state knowledge. The first calls for a fault response, while the latter calls for a correction in knowledge, such as the diagnosis of a fault. The former is not a fault diagnosis, and the latter is not an objective failure. However, when both types of excessive error are treated the same, they start looking different from everything else and will then tend to be split off from the rest of the implementation. Thus, the division of responsibility for objectives in control functions described above has an analog in the division of responsibility for knowledge in estimation functions. In both cases, models of behavior have been broken and scattered, resulting in a loss of model transparency.
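
The distinction between the two kinds of error can be made concrete with a small sketch. The function `assess`, the type `ErrorAssessment`, and the scalar tolerances are assumptions introduced for illustration only; they are not drawn from any particular flight system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorAssessment:
    objective_at_risk: bool   # large control error: calls for a fault response
    knowledge_suspect: bool   # large expectation error: calls for a knowledge correction


def assess(target: float, estimate: float,
           measurement: float, predicted: float,
           control_tol: float, expectation_tol: float) -> ErrorAssessment:
    """Keep the two error types separate instead of lumping them together."""
    control_error = abs(estimate - target)          # estimated state versus objective
    expectation_error = abs(measurement - predicted)  # evidence versus model prediction
    return ErrorAssessment(
        objective_at_risk=control_error > control_tol,
        knowledge_suspect=expectation_error > expectation_tol,
    )
```

Keeping the two flags apart lets an objective failure be routed to the control function that owns the objective, and a suspect expectation be routed to estimation, where a fault diagnosis belongs.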

Similar problems of transparency arise in the connections among controlled systems. In order for control systems to do their jobs well, it helps to close loops within subordinate functions as well, making them transparent in all the respects described here. However, since such dependent behaviors affect what control systems can do, models of these dependencies should be reflected in the behaviors control systems pledge in turn to others. Unless this is accomplished in an open and disciplined manner that allows integrity to be preserved and verified, subtle errors can creep into implementations that are hard to spot. Consider, for example, the pressurization or venting of a propulsion system. Doing this properly depends on adequate settling of pressure transients, but the relationship between this objective and the valve actions taken to accomplish it is frequently implied only through command sequence timing. Even in systems with conditional timing, this dependency is often still implicit, the functional connection having been made elsewhere (e.g., in an uplink sequencing system), but beyond the awareness of the control system responsible for its accomplishment. Nothing in the normal way the propulsion activity is done carries any information about the effects of altered timing or actions in the sequence. Therefore, any interruption that puts fault protection in control, perhaps for totally unrelated reasons, suddenly exposes the missing connections. What must a fault response do then to safely secure propulsion capability, once the interruption is handled? Resuming the sequence is problematic, since the assumed timing no longer applies; and restarting may not work, because initial conditions are different. Making the objective of the activity transparent would at least provide a basis for recognizing an unsafe condition, but gaining insight and concocting a reasonable response has to be done with no help from normal procedures. The result yet again is an ad hoc growth of scattered models, functions, and interfaces, which may not always get things right.

Adding to such problems is the typical absence of good architectural mechanisms to negotiate potentially competing objectives from multiple sources. Neglecting to articulate and negotiate potential conflicts, such as wanting to minimize thruster firing while achieving good antenna pointing, both involving objectives on attitude, will result in disappointing someone, usually the one without the explicitly stated objective. Looking around many implementations, it is common to see liberal manipulation of modes, enables and disables, priority or threshold tunings, locks or serialization of responses, and other mechanisms for accomplishing such coordination. However, these are all indirect, low-level devices for managing software programs, not explicit, high-level architectural features for coordinating system objectives.

Another common mistake arising from neglect of this principle is the dispersal of essential behavioral information among complex parameter sets. No doubt buried there, for both designers and operators to sort out, are important relationships among objectives and behaviors. However, not only can their implications to actual behavior be opaque, but there is also typically no structure provided to ensure consistency of objectives with represented behaviors, consistency of the parameters among themselves, or consistency of implied behaviors with actual behavior. Various mitigating processes can be tossed over this problem, but the haphazard treatment that created the problem in the first place leaves quite a mess under the carpet. The behaviors that emerge can be quite unpleasant.

Finally, there is the problem of shifting system capabilities. Such issues become particularly severe for fault tolerance. While most variation tolerance can be accomplished within a set range of objectives, and is thereby hidden from the rest of the system, fault tolerance (as with system management in general) frequently requires the rest of the system to adjust to changes in the available objectives. This may be temporary, until operation within normal operational bounds is restored, or it may be prolonged, if the system has irreversibly lost capability. Either way though, the system after a fault is not the one for which operation was prepared, and there are many more combinations of such abnormal cases than there are of normal cases. This means that for every potential alteration of available objectives in one place, there must be corresponding accommodations among all issuers of these objectives. These accommodations may result in further changes of behavior, and so on, rippling through the system until the effect is contained.

There are different ways to manage such propagating effects. A common one is to try to contain them with margin. If less is expected than a system is actually capable of delivering under better circumstances, then there is no need to announce reduced capability. In this approach, systems suffer reduced capability all the time in order to be able to tolerate it some of the time. This is common, especially for critical activities. Another approach is to pass along information about capability changes, not to the elements directly impacted by it, but to a central authority that reduces or withdraws objectives across the entire system in order to increase tolerance to the reduced capabilities. This is part of the ubiquitous “safing” approach. Building back to something useful (a task typically turned over to operators) may require design changes. A more forgiving and flexible approach is to establish a network of information exchange within the system in advance that lets affected elements adjust as necessary, within a flexible range they have been designed to accommodate. This becomes essential in systems operating in more dynamic contexts, but is also broadly useful in reducing operations complexity. In reality, all of these approaches appear in modern systems to some degree, each having merit depending on the nature of the risks involved.

Properly chosen and implemented, these methods work well. The systems in real danger are those where propagation effects have not been adequately considered or characterized, or where an assessment of this coverage is obfuscated by a design that fails to make them transparent. Unfortunately, as with negotiation mechanisms, most deployed systems show little overt architectural support for this principle. Instead, one typically finds minimal mechanistic support (e.g., command rejection, killing or rolling back sequences, and the like), accompanied by a plethora of ad hoc mitigation. Without the supporting structure of a principled architecture looking after this basic notion of closed loop control, the job gets a lot harder.

For transparency of models then, one must 1) expose and consolidate models, including the way they relate to the achievement of objectives, and 2) be explicit about how the rest of the system depends on, becomes aware of, and accommodates new information about what objectives are plausible.

Transparency of Knowledge. The dependence of control decisions on state knowledge suggests a logical division of closed loop control functions into two distinct parts, one being to maintain required state knowledge, and the other to make decisions about the appropriate control action. The acquisition of state knowledge is commonly referred to as state determination or estimation, but other activities, such as calibration and fault diagnosis, also fall under this definition. A benefit of this division is that system knowledge appears overtly at the interface between the two parts, making it easier to share across other control functions, easier to avoid divergence of opinion across the system, and easier to understand the reason behind control decisions. That is, state knowledge and its usage are made transparent.

It is essential to note that the state referred to here is not the state of the control system. Rather, it is the control system’s knowledge of the state of the system it controls. In fact, it can be argued persuasively that aside from this knowledge and its knowledge of its own objective, a control system should have no other state of its own. Practical issues work against that ideal, but it is something to keep in mind as a control system comes together. Every bit of state in a control system beyond the basic need for knowledge is a source of incidental complexity that must be managed carefully, and eliminated, if possible.

The provision of state knowledge is not nearly as elementary as typical implementations would lead one to believe. For instance, since state spans time, knowledge of it should also span time in a manner consistent with behavior models. However, in most implementations this is accomplished instead merely by interpolation or extrapolation, and then often implicitly and carelessly. Extrapolation, for instance, often amounts to simply using old data, assuming it will be used or replaced soon enough. Indeed, for most state information, and the evidence used to derive it, little if any thought is typically given to this.

In this light it is clear that events (measurements, detections…) are not state knowledge. They are merely evidence, which should contribute to state knowledge only through the application of behavior models. Even a simple measurement, invariably used only after some delay (small as that may be), requires a model-mediated extrapolation from event to knowledge in order to presume its usability over time. That is a simple model of behavior, regarding both the changeability of the states being measured and latency in the control system’s use of the measurement. Such models are usually implicit in standard practice, but models they are nonetheless, since they describe how knowledge of state is to be carried forward in time. Therefore, they are in potential disagreement with other models, and could be flat out wrong, as recurrent problems with stale measurement data indicate. Avoiding this requires transparency.

Besides corruption from models that disagree, state knowledge may also appear in multiple versions within a system, and in fact often does (e.g., from different measurement sources). This is problematic because of the potential for control decisions working at cross-purposes due to different views of system state. There is more to this problem than just conflicting state knowledge. Separate opinions can never be as good as a shared opinion gained by considering all information sources together. Thus, there being no point in making matters worse than necessary, it is useful in any system to strive for a single source of truth for each item of state knowledge. That includes exercising the discipline not to copy and store state information around the system, but rather, to always return to its source on each use, if feasible.
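
One familiar way to see why a shared opinion beats separate ones is inverse-variance weighting of independent estimates. This is offered only as an illustrative sketch, not as the method of any particular system: the function `fuse` is a hypothetical name, and the estimates are assumed to be scalar, unbiased, and to have independent errors with known variances.

```python
from typing import Sequence, Tuple


def fuse(estimates: Sequence[Tuple[float, float]]) -> Tuple[float, float]:
    """Combine (value, variance) pairs into one shared (value, variance).

    Assumes independent, unbiased estimates with strictly positive variances.
    """
    weights = [1.0 / variance for _, variance in estimates]
    total = sum(weights)
    value = sum(w * v for w, (v, _) in zip(weights, estimates)) / total
    return value, 1.0 / total
```

Under these assumptions the fused variance is never larger than the smallest input variance, so a single consolidated estimate is at least as good as any one source taken alone.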

Another key aspect of state knowledge is that it is always imperfect or incomplete, there being no direct, continuous way to access the actual state itself. Such knowledge can be determined only indirectly through limited models and the collection and interpretation of ambiguous or noisy evidence from the system. Thus, while a system is truly in a unique state at any instant, a control system’s knowledge of this state must necessarily allow for a range of possibilities, which are more or less credible depending on their ability to explain available evidence. Moreover, among these possibilities must generally be an allowance for behavior outside of understood possibilities. That is, a control system should be able to recognize when the system it controls is in a state for which confident expectations of behavior are apparently lacking. By representing such modeling issues in state knowledge, appropriately cautious control decisions are possible.


When confronted with multiple possibilities, control decisions become more complicated, but to edit the possibilities before presenting a selected one for control action amounts to making precisely this sort of decision anyway. This is one of many insidious ways that abound in systems to confuse the boundary between state determination and control, and hence the location of state knowledge (more on this in the next subsection). In doing so, transparency is compromised and the system is a step further on the slippery slope to lost architectural integrity. It is far better to acknowledge knowledge uncertainty, represent it honestly, and make control decisions accordingly with proper consideration of the risks involved.

Part of representing knowledge uncertainty honestly is also acknowledging that this uncertainty degrades further with time unless refreshed. This too should be overtly dealt with. For example, beyond some point in time, knowledge may need to be declared entirely invalid unless updated with new evidence. To keep the line between knowledge and control clean, this must be the responsibility of the producer of the data, not its consumer. Following this principle eliminates the problem of uninitialized or stale data.
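
A minimal sketch of producer-owned timeliness follows. Everything here is an assumption made for illustration: the class `Knowledge`, the linear growth of uncertainty, the constant-value carry-forward (the simplest possible behavior model), and the producer-declared `valid_for` horizon.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Knowledge:
    """Hypothetical state estimate published by its producer."""
    value: float
    sigma: float       # 1-sigma uncertainty at time `stamp`
    stamp: float       # time of the supporting evidence, in seconds
    growth: float      # assumed growth of sigma per second without new evidence
    valid_for: float   # producer-declared lifetime, in seconds

    def at(self, t: float) -> Optional[Tuple[float, float]]:
        """Return (value, sigma) carried forward to time t, or None if invalid."""
        age = t - self.stamp
        if age < 0.0 or age > self.valid_for:
            return None  # the producer declares the knowledge invalid here
        return self.value, self.sigma + self.growth * age
```

A consumer that asks for knowledge at the time of use either receives an honestly aged estimate or is told that none is available; it never has to decide on its own whether old data is still good.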

Yet another concern is the choice of state representations, transparent choices being easiest to use correctly. With this in mind, certain obvious violations stand out, such as the absence of explicitly defined units or frames of reference for physical values. However, given the topic here, it is useful to mention some of the recurring offenders in fault protection. We can begin with persistence counters, the most basic of which tally consecutive instances of an error. If such a counter is necessary, then presumably lone errors are insignificant, a statistical possibility under normal conditions perhaps. If that’s the idea behind persistence, then it is apparently a measure of likelihood that the system is in an abnormal state. That’s actually a good start, since uncertainty in state knowledge has entered the picture, as it ought. Moreover, before the persistence threshold is reached, one often finds systems rightfully discarding suspect data, indicating that the diagnosis of a potential fault has already been made, if not overtly. Yet, upon asking how a persistence threshold is selected, it quickly becomes clear in many systems that any of a variety of criteria might apply: allowance for an occasional operational fluke; the level of system error tolerance; a delay to control precedence or timing of fault responses; an empirical value that avoids false alarms (often multiple reasons), depending on the motive for its last adjustment. Thus, poor overloaded persistence actually represents almost anything but likelihood, as commonly used, and its threshold has as much to do with control decisions as with making a fault diagnosis. As a result, persistence thresholds join the ranks of a great many parameters in typical fault protection systems as tuning knobs, not directly relatable to states, models, or objectives, and consequently beyond help from any of these fundamentals.
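
If persistence is meant to express likelihood, that likelihood can be represented directly. The sketch below uses a simple Bayesian update as one possible illustration; it is not taken from the paper. The function name `update_fault_belief` and the two probabilities (how often an error indication appears with and without a fault) are assumed quantities that a real design would have to justify from models.

```python
def update_fault_belief(prior: float, error_seen: bool,
                        p_error_given_fault: float,
                        p_error_given_nominal: float) -> float:
    """Return the probability of a fault after one more monitor sample.

    Assumes samples are conditionally independent given the true state.
    """
    if error_seen:
        like_fault = p_error_given_fault
        like_nominal = p_error_given_nominal
    else:
        like_fault = 1.0 - p_error_given_fault
        like_nominal = 1.0 - p_error_given_nominal
    evidence = like_fault * prior + like_nominal * (1.0 - prior)
    return like_fault * prior / evidence
```

Unlike a bare counter, the resulting belief is state knowledge with an explicit meaning, so the threshold at which a response is warranted can be argued separately as a control decision about acceptable risk.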

Error monitors in general tend to have analogous problems when errors are not interpreted through models or correlated and reconciled with other evidence. When such a diagnostic layer is absent, and responses are triggered directly by monitor events, it becomes hard to put one’s finger on what exactly a system believes it is responding to. This is an invitation for all kinds of interesting emergent behavior.

Similar issues occur when poor models of behavior are used. As described above, for example, timers in general have little if anything to say about the nature or conditionality of behavior. One particular troublemaker is the common command-loss timer. Do such timers represent objectives, state knowledge, measurements, models, or what? From the gyrations operators go through to manage them, and the inevitable mistakes made due to their typically opaque, raw character, it is clear that no one quite knows how such appliances fit into the control framework, or how they relate to other elements of the system.

There are many other problematic ways to represent state knowledge in a system. Others are described in the next subsection on control. A final one worth mentioning here though, is the disable flag for a fault monitor. If one disables a monitor, is this because the fault is no longer credible, or that the monitor is believed to be dangerously incorrect, or that its false trips have become a nuisance, or that the fault it detects is no longer considered a threat, or that triggering the fault under present circumstances would conflict with another activity, or what? The first few possibilities clearly refer to modeling issues; so it is prudent to have a way (arguably not this one) to temporarily preempt an incorrect model. The latter ones though amount to nothing less than direction for the control system to promulgate incorrect state knowledge. That’s not transparency, it’s cover-up, yet not unheard of in some fault protection systems.

For transparency of knowledge then, it is essential to 1) make knowledge explicit, 2) represent it clearly with honest representation of its timeliness and uncertainty, and 3) strive for a single source of truth.


Transparency of Control. In principle, given knowledge of controlled system behavior, knowledge of objectives (both imposed and imposable), and knowledge of the states of the controlled systems and those influencing it, no other information is required to choose control actions from among those that are plausible. There may be additional criteria for choosing options within the range of possibilities admitted by this principle (e.g., optimization), but in most cases that is a refinement somewhat beside the point here. Other exceptions involve cases where a system finds itself in a quandary among many poor choices, making it necessary to appeal to some meta-level policy that doesn’t quite fit into the present schema. This gets a brief mention below, but otherwise the assertion above applies for most purposes.

Basing control decisions on information other than behavior, objectives, and states can actually be detrimental to a system. Presumably the only motive for using other data would be either that some other “real” objective besides the imposed one is known, or that state and behavior knowledge has not had the benefit of other available evidence or expertise. However, these conditions violate the principles of transparent objectives, models, and knowledge. Using alternative objectives subverts the very notion of control. Similarly, using extra data effectively puts the reconciliation of multiple information sources into the control function, creating a second, different version of system knowledge. As diversity of authority and opinion proliferates through such practice, the integrity of control decisions becomes increasingly suspect. Transparency of control consequently requires adherence to the principle that control decisions depend only on the canonical three items, for which some consensus has been established.

Unfortunately, violating this principle turns out to be an easy mistake, even with the most innocent of intentions. For example, having noted only that a previous error remains, a control function that tries an alternative action has violated the principle, effectively having made an independent determination of the state of the system. The proper approach would be for estimation processes to monitor control actions, making appropriate modifications to state knowledge to account for the observed results. Control actions would then be chosen differently on the next attempt, given the new state knowledge. For similar reasons raw measurements should be off-limits to control functions, certain arcane issues of stability aside. Control modes are also problematic, as described above under transparent objectives, and likewise for disable flags on fault responses, when misused as control mechanisms (like disabling fault protection during critical activities), and not just stoppers on modeling or implementation glitches. There are many other examples.
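
As a rough sketch of the "proper approach" described above, the fragment below keeps estimation and control separate. It is illustrative only and not the paper's method: the function names, the Bayes-rule update, and all probabilities and thresholds are assumptions. Estimation observes the outcome of each command and revises a health belief; control never looks at the error history and chooses only from the resulting state knowledge.

```python
def update_health_belief(p_healthy: float, command_succeeded: bool,
                         p_success_if_healthy: float = 0.95,
                         p_success_if_failed: float = 0.05) -> float:
    """Estimation: Bayes update of P(device healthy) from one observed outcome."""
    if command_succeeded:
        like_healthy, like_failed = p_success_if_healthy, p_success_if_failed
    else:
        like_healthy = 1.0 - p_success_if_healthy
        like_failed = 1.0 - p_success_if_failed
    numerator = like_healthy * p_healthy
    return numerator / (numerator + like_failed * (1.0 - p_healthy))


def choose_action(p_healthy: float, threshold: float = 0.5) -> str:
    """Control: decide from state knowledge alone (assumed threshold policy)."""
    return "command_primary" if p_healthy >= threshold else "command_backup"


belief = 0.9
first_choice = choose_action(belief)                       # before any failure
belief = update_health_belief(belief, command_succeeded=False)
second_choice = choose_action(belief)                      # after one failure
```

The second attempt differs from the first only because state knowledge changed, not because the control function drew its own conclusion from a lingering error.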

The flip side of using extraneous data is to ignore the state data you have. All kinds of commonly used mechanisms share this problem, such as time-based sequences that make no appeal to state knowledge, relying instead on fault monitors to catch dangerous missteps. Also problematic are responses with no basis at all, other than poor behavior. For instance, if errors are large but nothing else seems broken, resetting the offending algorithm makes things different for sure, and with luck might actually accomplish something. However, there is no basis for such a response from the point of view of control principles. (Reset and similar actions at a meta-level of control are different, because they involve the health of the control system itself.) A more likely candidate for puzzling problems of this sort is a modeling error, at which point the system should either shift to a more defensive posture or try to correct the model. Admittedly, the latter is a tall order, out of scope for many systems, but the first line of defense at that point, having validated models, has already been breached. To avoid getting into this situation one must honor the principle of control transparency. Non-specific responses signal an opaque design that demands scrutiny.


A useful byproduct of control transparency is that much of the complexity one often sees in control, when viewed properly, turns out to be a state or behavior knowledge problem. That’s no consolation to the providers of this knowledge, though it does mean that knowledge is likely to become easier to understand as transparency is improved. Other benefits appear on the control side, too, such as a more straightforward decomposition of activities, and consequent simplifications in planning.

Control transparency is consequently fairly easy to achieve, simply by 1) avoiding any basis for decision besides knowledge of objective, behavior, and state.


Fault protection is clearly a control function, overseeing the state of a system, and trying to achieve objectives of safety and success. However, as control systems, many fault protection implementations impose a decidedly non-conforming model of control over the systems in their charge. Having submerged basic principles of control, it is of little surprise that fault protection remains problematic in many systems. In exploring the basic notions of system control, it is alleged here that a number of these fault protection issues result from failure to apply control principles transparently, especially in comparison to normal GN&C design. In keeping with that theme, the following additional guidelines are suggested as ways to bring this discipline, broadly appreciated within the GN&C community, into GN&C fault protection.

Do not separate fault protection from normal operation of the same functions. Because faulty behavior is just a subset of overall behavior, normal and failed operation are intertwined in many ways: measurements of normal states are also affected by health states; robustness of control functions beyond normal tolerances aids fault tolerance; incremental tactics of normal algorithms play a central role in bootstrapping capability after a fault, etc. Thus, diagnosing the failure of a device is just part of estimating the overall state of the device; retrying a command is just part of controlling the device, given its state and the desired objective, and so on. The main difference seems only to be that decisions along paths of abnormal behavior are more sweeping and, if we are fortunate, invoked less often than those along paths of normal behavior.

One often hears explanations of fault protection complexity as being related to its interactions with so many other spacecraft functions. This way of looking at the issue is problematic, since it views normal operation and fault protection as separate functions. In fact, fault protection is often perceived as above normal operation: a sort of supervisory level function that watches over and asserts control when normal functions misbehave. In most cases, this is inappropriate, artificially and unnecessarily creating broad functional relationships over and above what are already there by virtue of the physical relationships within the system under control. By excluding fault protection from the control function for each system element, these control functions are unable to fully assume their cognizance role. This unnecessarily divides functionality, adding interfaces and complicating or confounding interactions all around the system.

Another way to appreciate the value of uniting management of normal and abnormal behaviors is to consider the ever-diminishing distinction between them in more complex systems. This is well illustrated in reactive, in situ systems, such as rovers, where novel situations are routinely encountered and revision of plans is frequently necessary, as initial attempts are thwarted. In the pursuit of viable alternatives, dynamic considerations of risk, priority, resource use, and potential for lost options are shared under both normal and abnormal conditions. Since putting two parallel management systems in place for such systems would make little sense, one sees instead the inevitable joining of fault protection with normal operation. Other systems are also gradually taking this route [2], as the merits of merger become apparent.

Strive for function preservation, not just fault protection. In keeping with the recommendation to keep fault protection and normal control together in pursuit of their shared intent of meeting objectives, the aim should not be merely to react mechanically to some a priori list of failure modes with rote responses, but instead to acknowledge any threat to objectives, striving to preserve functionality no matter what the cause.

This insight creates a different mind-set about fault protection. It helps one see, for instance, that the purpose of fault trees, failure modes and effects analyses, hazards analyses, risk assessments, and the like is not for the assembly of a list of monitors needing responses. That approach unveils only a limited model of certain aspects of certain chains of influence in certain circumstances. Rather these must be thought of as just part of an encompassing effort to fully model all behavior, both normal and faulty, that will help clarify what is happening and what is at stake. That is, it is equally important to understand all the things that must work right for the system to succeed, recognizing and responding to their loss, even if the manner of this loss is unforeseen. This is what a true control system does, when fully cognizant of the system it controls.

This mode of thought should also disabuse engineers of notions that fault protection has no role after it responds to the “first”* fault, or that fault protection V&V is finished when everything on the monitor and response lists has been checked off, or that operator errors or environmentally induced errors count as “first” faults that absolve fault protection of further action, or that latent faults don’t count at all. When the perceived role of fault protection is to preserve functionality, such incidentals are irrelevant. Any project choosing to depart from this principle had better spell it out plainly in its fault tolerance policy, so everyone understands the risk. This is a common point of miscommunication.


* This could be “second”, if there’s a two fault tolerance policy in effect, and so on.


Test systems, not fault protection; test behavior, not reflexes. If fault protection is an integral part of a system, it should be tested that way. Unfortunately, this often doesn’t happen the way it should, a natural consequence of viewing fault protection as a distinct supervisory function. Systems tend to be integrated and tested from the bottom up, so in the conventional view of things, fault protection gets integrated and tested last, and specifically after normal capability has been integrated and tested. Even when fault protection functions are present earlier, it is common to do much testing with fault protection disabled, or if enabled, to ignore indicators of its performance (e.g., detection margins), because tests are focused on normal functions. The net result is that nothing is really tested right. The magnitude of this blow to system integrity is brought into focus when one views the role of fault protection as function preservation. In this role, objectives are not just commands to be executed; they are conditions to be monitored for achievement. Similarly, diagnosis models don’t just signal faults; they predict normal behavior too. Consequently, in this integral role, fault protection fulfills the essential role of guarding expectations for the system’s performance. One of the greatest benefits of fault protection during testing, therefore, is to help watch for aberrations across the system, even if a test is focused on normal functionality in a particular area.

This control system view of fault protection has the additional advantage of elevating attention, beyond rote reflexive responses, to the behavioral characteristics that provide a basis for confidence in the integrity of the underlying design. This is similar to the way normal GN&C control functions are analyzed and verified.

The idea, for instance, is not to test that the right thruster fires when a control error threshold is exceeded, but rather to test that pointing objectives are consistently met in a reasonable manner. Applying the same idea to fault protection shifts the emphasis from futile attempts at exhaustive testing toward a more relevant exploration of emergent behavior and robustness. That in turn motivates proper mechanisms for well-behaved transparent control from the start, closing the circle back to sound principled system architecture.

Review all the data. Of course, the presence of fault protection does not relieve testers of responsibility to review data, but here again an altered point of view about the role of fault protection is helpful. If fault monitoring is focused solely on triggering fault responses, a limited kind of diagnostic capability results. However, if monitoring is focused on general, model-based expectations, making note of even small departures, while leaving control functions to decide whether departures merit action, then a much more transparent system results, and the testers’ task gets easier. If that seems a substantial increase in scope, keep in mind that not all such capabilities must reside on the flight system. Fault protection should be considered just as integral a part of ground functionality as any other. The same misperceptions and opaque implementations divide ground systems too though, with the same consequent loss of insightfulness.

Cleanly establish a delineation of mainline control functions from transcendent issues. Despite moving most fault protection out of a separate supervisory role, there remains a need for supervisory level fault protection in systems. These functions should in general be limited though to managing the platforms on which other control functions run, and to exercising certain meta-level policies, as alluded to earlier, that are beyond resolution by usual means. Examples of the former include overseeing installation of new software, monitoring software execution, switching from a failed computer to a backup, and so on. An example of the latter is the regularly occurring catch-22 during critical activities (e.g., orbit insertion), where a choice must be made between preserving evidence of a mishap, or daring a final attempt at success that is almost certain to destroy the system along with its evidence. No objective seeks the latter outcome, yet this is the path most projects take. Aside from the meta-level issues involved with such functions, another good reason for their delineation is to ensure they get appropriate visibility, given their far-reaching implications. (Limitations of space here require neglecting certain thorny implications to fault protection from meta-level control, involving preservation of credible state data, tolerance to control outages, etc.)

Solve problems locally, if possible; explicitly manage broader impacts, if not. Localization is actually a natural byproduct of keeping normal and faulty behavior management together. The advantage of localization is clearly to contain the complexity of the behavior associated with faults. In the best case (e.g., when functionality can be quickly restored, or an equivalent backup is available) this can be minimally disruptive to the rest of the system. The one item that cannot be localized, however, is the resulting change of behavior of the element. Any reduction in the range of objectives the element is able to accept, even if it is temporary, must be accommodated by other elements dependent on this capability. These effects may ripple broadly across the system and over time, disrupting plans. Thus, the new capability landscape after a fault results in new system level behavior. This is one of the emergent implications of failure, which must be handled through transparent management of model dependencies, as described above.

Respond to the situation as it is, not as it is hoped to be. There are a number of aspects to this, but the bottom line is that action calls for continual reassessment and revision. State and objectives are both subject to change at any time, so in keeping with the principle of responding only to these items, any action that ignores their change is wrong. A common violation of this is to embark on a course of action that requires blocking out other responses or reassessments of state until the action is over. This presumes that the initial decision will remain the correct one, with a predictable outcome, no matter what develops in the meantime.

Distinguish fault diagnosis from fault response initiation. Detection and diagnosis are state determination functions, not separable from determining normal functionality, so they should have nothing to do with deciding, or even triggering, control actions. Diagnosis of a failure is merely the impartial assertion of new knowledge regarding some failure to satisfy design functionality expectations. It is unconcerned with implications to the system. Deciding whether or not a fault diagnosis merits a response is a control action, and control actions are driven by objectives. Therefore, the only role of fault diagnosis is to supply the information used in a control function’s determination that an objective may be threatened or have failed. In the absence of an objective, or in the presence of a fault that does not threaten objectives, no response is warranted. Likewise, if an objective is threatened, response is warranted, fault diagnosis or not.
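
A small sketch may make this separation concrete. Everything in it is hypothetical (the `Objective` structure, the capability names, and the two functions are illustrative, not the paper's design). Diagnosis only asserts which capabilities are believed lost; a separate, objective-driven control function decides whether anything needs a response.

```python
from dataclasses import dataclass


@dataclass
class Objective:
    name: str
    needs: frozenset       # capabilities this objective depends on (illustrative)
    achieved: bool = True  # whether the objective is currently being met


def diagnose(health_reports: dict) -> set:
    """Estimation side: assert which capabilities are believed lost. No actions."""
    return {cap for cap, healthy in health_reports.items() if not healthy}


def objectives_needing_response(objectives, lost_capabilities) -> list:
    """Control side: respond only where an objective has failed or is threatened."""
    return [o.name for o in objectives
            if not o.achieved or (o.needs & lost_capabilities)]


lost = diagnose({"wheel_1": False, "heater_3": True})
active = [
    Objective("hold_attitude", frozenset({"wheel_1", "wheel_2"})),
    Objective("keep_battery_warm", frozenset({"heater_3"})),
]
threatened = objectives_needing_response(active, lost)
```

With no active objectives the same diagnosis produces no response, and an objective that is failing produces one even when nothing has been diagnosed, which mirrors the last two sentences of the paragraph above.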

The reverse of this is also important. In every decision is the risk of being wrong, so a decision to act on one of multiple possibilities is not the same as a determination that that possibility is in fact true. Knowledge remains as uncertain right after the decision as it was before. Later, having embarked on a particular path, success or failure in making headway against a problem can be taken as evidence that the conjecture behind the choice was correct or incorrect, respectively. The way to gain this knowledge, however, is not to simply assert that conclusion in the fault response (as many fault protection systems do), thereby mixing estimation with control and dividing opinion, but rather for estimation functions to observe all actions taken, as they would any other evidence, and revise state knowledge accordingly. Subsequent control actions can then be adjusted according to the updated knowledge, as in principle they should.

Use control actions to narrow uncertainty, if possible. A familiar situation is to encounter an error that clearly indicates something is wrong, but lack the ability to discriminate among various possibilities. Similarly, there are often situations where the difference in behavior between normal and faulty operation is hard to see. There are two general strategies that can be followed in such cases to resolve the uncertainty.

The first is to remove sources of ambiguity, given that the problem may be far more apparent under other, typically simpler conditions. In GN&C systems, this tends to appear in the form of regression to simpler modes of control involving fewer parts, and incrementally rebuilding to full capability. The fewer the parts involved, the fewer the potential culprits, should the symptoms remain; and should symptoms disappear in a simpler mode, options narrow through a process of elimination. By moving to the simplest modes first and rebuilding, ambiguity and vulnerability to symptoms are both minimized; and of course, there is always the option of stopping part way and waiting for ground help, once minimal capability for safing is established. Fault protection systems usually accomplish such things by rote, through safing and normal reacquisition steps, but in keeping with the principle of transparency, it is better to connect the acts of simplifying and rebuilding with the presence of uncertainty and with threats to objectives.
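
The process of elimination described above can be sketched as a walk up a ladder of modes, simplest first. This is only an illustration under stated assumptions: the mode names, part names, and the `symptom_present` callback (a stand-in for observing behavior in each mode) are all hypothetical.

```python
def rebuild_capability(modes, symptom_present):
    """Step up from the simplest mode until symptoms appear.

    modes: list of (mode_name, set_of_parts), ordered simplest first.
    symptom_present: callable taking the set of parts in use and returning
        True if the symptom is observed (placeholder for real observation).
    Returns (highest mode reached without symptoms, remaining suspect parts).
    """
    last_good, cleared = None, set()
    for name, parts in modes:
        if symptom_present(parts):
            # Parts already exercised without symptoms are no longer suspects.
            return last_good, parts - cleared
        last_good, cleared = name, cleared | parts
    return last_good, set()


ladder = [
    ("sun_point", {"sun_sensor", "thrusters"}),
    ("inertial_hold", {"sun_sensor", "thrusters", "gyro"}),
    ("fine_point", {"sun_sensor", "thrusters", "gyro", "star_tracker", "wheels"}),
]
# Pretend the gyro is the hidden culprit.
mode, suspects = rebuild_capability(ladder, lambda parts: "gyro" in parts)
```

Here the walk stops at the first mode that shows symptoms, leaving the system in the last clean mode with a narrowed suspect list, which is also a natural point to wait for ground help.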

Sometimes the reverse strategy is preferable, where it is advisable to bring in additional evidence to corroborate one alternative versus another, rather than making a series of arbitrary choices, expecting that the situation will resolve itself eventually. In general, better knowledge leads to better decisions, which is ultimately the safest route to follow, when the option is available. The common thread between these two approaches is simplification. Make diagnosis easier by removing complications and adding information.

Make objectives explicit for everything. Nothing a control system is responsible for should be in response to an unstated objective. Moreover, whether or not such objectives are being met should be explicitly apparent, as with any other objective. In this way, enforcing operational constraints and flight rules, avoiding hazards, managing resources, and so on all become a normalized part of the control functionality of the system, as opposed to additional overriding layers of control or awkwardly integrated side-functions.


Make sure objectives express your full intent. Fault protection is less likely to respond to problems correctly, if the objectives provided are not what are actually required. A good example of this is the typical specification of an orbit insertion objective in terms of a delta-velocity vector. If the insertion burn could be guaranteed to occur at the right time, this would be okay, but critical objectives also need to be satisfied when things go wrong. A better objective, therefore, would be delta-energy, which is significantly less sensitive to timing, and is nearly as easy to accomplish if velocity information is already available. This not only permits greater freedom in choosing how to respond, but the resulting response can have a smaller penalty to the system. One can apply this notion in many places with similar benefit, and the only necessary enabler is to make objectives more transparent, as basic principles would suggest anyway.
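
To illustrate what an energy-based objective might look like, the sketch below uses the standard two-body specific orbital energy, v²/2 − μ/r. The scenario is an assumption made for illustration, not the paper's formulation: an impulsive retrograde burn along the velocity direction, an approximate Mars gravitational parameter, and made-up numbers. The burn still required is computed from the current state and the target energy, rather than from a fixed delta-velocity prescribed in advance.

```python
import math

MU_MARS = 4.2828e13  # m^3/s^2, approximate gravitational parameter of Mars


def specific_energy(r: float, v: float, mu: float = MU_MARS) -> float:
    """Two-body specific orbital energy (J/kg) at radius r (m) and speed v (m/s)."""
    return 0.5 * v * v - mu / r


def speed_for_energy(r: float, energy: float, mu: float = MU_MARS) -> float:
    """Speed at radius r that corresponds to the given specific energy."""
    return math.sqrt(2.0 * (energy + mu / r))


def retro_dv_for_target_energy(r, v, target_energy, mu=MU_MARS):
    """Impulsive along-track retro burn needed now to reach the target energy."""
    return v - speed_for_energy(r, target_energy, mu)


# Illustrative arrival state (hyperbolic) and a captured-orbit energy target.
r_now, v_now = 3.7e6, 5500.0
target = -3.0e6
dv_needed = retro_dv_for_target_energy(r_now, v_now, target)
energy_after = specific_energy(r_now, v_now - dv_needed)
```

If the burn starts late or is interrupted, the same function can simply be re-evaluated at the new radius and speed, so the objective remains meaningful even when the original timing assumptions no longer hold.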

Transparency of objectives can be accomplished in GN&C in many other ways, as well, with consequent benefit to fault protection. For example, in a well-designed, feed-forward, profiled motion system, the motion expected is just that specified in the objective. Feedback need only accommodate disturbances, so any significant control error becomes a clear sign of trouble, indicating departure of behavior from both the objective and the underlying model. Other systems that rely instead on a controller’s transient characteristics to profile motion are substantially harder to assess, due to this loss of transparency. Similar observations apply regarding state determination systems, where substantial transparency gains can be made through more direct application of models to define expectations.

Reduce sensitivity to modeling errors. As discussed, closed loop control exists in large measure to diminish the effects of variability in a system. Therefore, one of the side effects of poor transparency in a control system is that the effects of variability are poorly contained. This can happen if a design is too dependent on the detailed correctness of the models it uses. Given a choice, robustness always favors less sensitive designs. To decide whether this criterion is met, it pays to assess sensitivity in all its dimensions and pay close attention to those aspects that dominate. Alternatives might be found that reduce sensitivity, if you’re looking for them; and if not, it will become obvious where extra-healthy margins are a good idea.

Alternatively, a more proactive approach may be worthwhile. The profiled motion system mentioned above, for example, will begin to show significant transients, even with the feed-forward actions in place, if the feed-forward model is incorrect. Therefore, a control system can substantially improve its transparency by attempting to estimate the parameters of this model (essentially as additional system state) and incorporate the results into its feed-forward actions. This not only improves its ability to recognize faults, but also promises to maintain operation within a more linear realm, where the system is better behaved and easier to characterize. Moreover, the additional insight gained by the system via the additional estimated states can be used as a forewarning of future problems or an indicator of reduced margins that alone may require action. It is always better to respond to a problem early than to wait idly for a crisis.
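
One way to picture estimating a feed-forward model parameter as extra state is a scalar recursive least-squares update. The sketch assumes, purely for illustration, a single-axis model torque = J × angular acceleration with J the unknown parameter; the function name, initial values, and noise-free data are hypothetical, and a flight estimator would be considerably more involved.

```python
def rls_step(theta, p, accel, torque, forgetting=1.0):
    """One scalar recursive-least-squares step for torque = theta * accel.

    theta: current parameter estimate (e.g., inertia J)
    p: current estimate covariance (scalar)
    Returns (new_theta, new_p, innovation), where innovation is the
    prediction error observed before the update.
    """
    innovation = torque - theta * accel
    gain = p * accel / (forgetting + accel * p * accel)
    theta = theta + gain * innovation
    p = (p - gain * accel * p) / forgetting
    return theta, p, innovation


true_inertia = 2.0        # made-up "truth" used only to generate data
theta, p = 1.0, 1000.0    # poor initial guess with large uncertainty
for accel in [0.5, -0.3, 0.8, 0.2, -0.6]:
    theta, p, innovation = rls_step(theta, p, accel, true_inertia * accel)
```

The updated `theta` would feed the feed-forward term, while a persistently large `innovation` or a drifting `theta` could serve as the kind of forewarning the paragraph above describes.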

Follow the path of least regret. Transparent models require honesty about likelihoods. Unlike most of the random things in normal GN&C, which are expected to occur and usually have well characterized statistics, few faults are viewed in advance as likely, and their statistics are rarely known, even crudely. Moreover, most faults that happen are either unanticipated or transpire in surprising ways. Therefore, any presumption about which faults are more likely than others must be considered flawed. In general, fault protection decisions ought to be based on criteria other than fault likelihood. What should be of more concern is the list of things one thinks the system might be able to recover from, if given a fighting chance to do so.

Sometimes, even when the odds seem dramatically in favor of one possibility, one needs to create one’s own luck. For example, what is extremely likely to be just a sensor false alarm might in fact be a genuine fault in the sensed system that could turn out to be catastrophic. Getting a second chance in this case may require a suboptimal first choice with all its nuisance value. Given such possibilities, one might suppose it’s always best to assume the worst and sort out the details later, but this course can also lead to trouble, when actions taken by fault protection are not benign or reversible (such as wetting a backup propulsion branch). False alarms commonly present such dilemmas (not to mention desensitizing operators). With enough foresight (and budget) a system can often be instrumented well enough to eliminate such ambiguities, but in the end there will still be a few that just need to be thought through with great care. The art in fault protection is not always to be right, but rather to be wrong as painlessly as possible.
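
One common way to formalize "being wrong as painlessly as possible" is a minimax-regret rule, which needs no fault likelihoods at all. The paper does not prescribe this rule; it is offered here only as a sketch, and the actions, hypotheses, and penalty numbers below are invented for illustration.

```python
def least_regret_action(cost):
    """Pick the action with the smallest worst-case regret.

    cost[action][hypothesis] is the penalty of taking that action if the
    hypothesis turns out to be true. Regret is the penalty in excess of the
    best achievable penalty under the same hypothesis.
    """
    hypotheses = next(iter(cost.values())).keys()
    best = {h: min(cost[a][h] for a in cost) for h in hypotheses}
    regret = {a: max(cost[a][h] - best[h] for h in hypotheses) for a in cost}
    return min(regret, key=regret.get), regret


# Hypothetical penalties (arbitrary units) for a suspicious sensor reading.
penalties = {
    "ignore":         {"false_alarm": 0,  "real_fault": 100},
    "swap_to_backup": {"false_alarm": 20, "real_fault": 10},
    "safe_and_wait":  {"false_alarm": 5,  "real_fault": 15},
}
choice, regrets = least_regret_action(penalties)
```

In this made-up table the rule selects the action that is mildly suboptimal under either hypothesis rather than the one that is best if the alarm is false but catastrophic if it is not.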


Take the analysis of all contingencies to their logical conclusion. The problems of viewing fault protection as distinct from other control functions have already been cited. However, there are also problems with viewing fault protection as confined to flight system health. As with most other systems, fault protection is not closed, so it should never be treated this way, even though much of what it does must be autonomous. The true scope of fault protection extends to the entire operations system and supporting ground elements, and possibly even to other missions. Moreover, what may be a short-term victory could in the long run threaten the mission or deprive operators of all hope of ever finding out what went wrong. Therefore, it is important to follow all branches in the contingency space and to carry them to their absolute logical conclusion in order to see whether the fully integrated design (and plans for it) supports them.

Successful completion of a fault response is consequently not a logical conclusion. Spacecraft have been lost by neglecting a larger issue, even though the flight system managed to enter a safe mode that could be maintained indefinitely. To be truly finished, one must carry responses all the way to final resolution, including all the side effects, trap states, and so on. One must consider mission operations’ role and preparedness in monitoring the system and responding to problems, restoration of routine operation (including all necessary replanning), the long-term viability of the system and mission in the recovered state (if restoration takes much time), threats to consumable resources, to upcoming critical activities, and to planetary protection, and adequacy and availability of data to support both a timely operations response and a complete post facto assessment. Such concerns are all related to control, so just because they involve processes somewhere besides the onboard system doesn’t mean there are different rules for them. The larger system should strive for the same level of transparency as any other control system.

Never underestimate the value of operational flexibility. Another particular aspect of driving contingencies to their logical conclusion also deserves mention. This is the vital importance of operational flexibility. Many systems have been recovered from the brink through astute actions and clever workarounds. Therefore, any design feature that forecloses options also removes one of the most valuable assets a spacecraft can possess. If the logical conclusion of a contingency path is the inability to act because alternate spacecraft capability cannot be applied, it is time to revisit the design. Unfortunately, such exploration of contingencies often occurs late in the development cycle, rather than early when the design is malleable. (This is partly due to the misconception that reprogrammable software is all that’s needed.) Furthermore, late design changes for other reasons tend to be made without fully revisiting contingency capabilities. The fact that so many systems have been recovered nonetheless reflects a deep-seated appreciation among engineers for the importance of this principle.

There are a variety of ways to do this. These include making it possible to safely operate prime and backup equipment at the same time; creating finer grain fault containment regions; avoiding rigid modal designs and canned procedures in favor of flexibly exercisable modular capabilities; making parameter changes easy and safe, even under spotty telecommunication capability; being able to confidently direct the system at every level in the design; having very tolerant safing systems; and so on.

Allow for all reasonable possibilities, even the implausible ones. The logical conclusion of some conditions may very well appear hopeless, but it is good practice to keep trying, while certain basic system objectives such as safety and communication remain unsatisfied. Having exhausted all likely suspects without resolving a problem, it is important to try the unlikely ones, as well, even if they have been deemed “not credible”. As stressed above, fault responses are not exhausted just because all the identified failure modes have been addressed. The objective is to preserve functionality, even when the unexpected happens.

This concern applies not just to what faults might occur, but also to what conditions might ensue, from which recovery will be necessary. For dynamic states, this could include high or reversed rates, flat spin, incomplete or unlatched deployments, excessively energetic deployments, toppled passive attitudes (e.g., in gravity gradients), incorrect or unexpected products of inertia, excessive (even destabilizing) flexibility or slosh, inadequate damping, thermal flutter, unexpectedly large variations (e.g., in misalignment, external disturbances, torque or force imbalance, or mass center offset), and so on. Other kinds of states and models are also vulnerable, whether they be device modes, sensor obscuration or interference, or whatever. Even common-mode problems should be considered.


Pursuing this idea further, one must also conclude that what is usually considered essential, certain knowledge of the system may need to be abandoned. For example, setting a sun sensor detection threshold is a straightforward activity, performed automatically in some interplanetary systems as a function of distance from the Sun, derived from trajectory data; but suppose the Sun cannot be detected, despite every appeal to redundancy, search, and so on. At some point, one might very well suspect that information regarding the distance to the Sun is wrong, even though there would be little reason to doubt it. As a last resort then, it would be advisable to discard this state knowledge anyway, and try setting the threshold by trial and error. This is another application of the admonition to simplify as far as possible.
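The sun sensor example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the inverse-square threshold law, the `margin` factor, the fallback threshold list, and the `detect` callable (standing in for a complete search at a given threshold) are all hypothetical and are not taken from any flight system.

```python
def threshold_from_distance(distance_au, threshold_at_1au=1.0, margin=0.5):
    """Nominal detection threshold, assuming solar intensity falls off as 1/r^2.

    The reference level and margin are illustrative assumptions.
    """
    if distance_au <= 0:
        raise ValueError("distance must be positive")
    return margin * threshold_at_1au / distance_au ** 2


def find_sun(detect, distance_au, fallback_thresholds):
    """Try the model-derived threshold first, then fall back to trial and error.

    detect(threshold) is a hypothetical callable that performs the whole
    search (redundant units, scans, and so on) at the given threshold and
    returns True if the Sun was found.
    """
    nominal = threshold_from_distance(distance_au)
    if detect(nominal):
        return nominal, "model"
    # Last resort: stop trusting the distance knowledge and sweep thresholds,
    # highest first, so the least permissive setting that works is chosen.
    for threshold in fallback_thresholds:
        if detect(threshold):
            return threshold, "trial-and-error"
    return None, "not-found"
```

In this sketch, if the onboard distance estimate is badly wrong, the nominal threshold fails and the sweep still finds a usable setting, which is the point of discarding the suspect state knowledge.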

Note that the sorts of things listed above would largely fall under the category of design or operator errors, and many would argue, given the daunting scope that is suggested, that fault protection needn’t protect against such things, since there are processes in place to avoid them. Unfortunately, history cautions against such a point of view. If conditions such as these are physically plausible, some gremlin is likely to solve the puzzle for entering them, even when system designers can’t. The rather sobering reality, in fact, is that more space systems are lost to such errors than to random faults per se [3]. Therefore, to presume that the design precludes them is not particularly insightful. It takes extraordinary effort to be absolutely sure (and right) about such things, which raises its own issues of scope and suggests merit for a second thought.

So why attempt to cover such possibilities? It’s hard enough getting a GN&C design to work under normal conditions; so throwing in all these abnormal possibilities in all their uncharacterized variability may seem like an impractical and unaffordable stretch. A little effort will sometimes go a long way though, and may make the difference between any sort of survival and no survival at all. For example, stabilizing rotation rates is simply a matter of damping rotational energy, which can generally be done quite simply with the right system configuration. Most systems already have this capability in order to handle initial launch vehicle tip-off, so applying it to fault recovery may be an easy extension, if anticipated early enough. Likewise, even crude control may be enough to ensure a power-positive, thermally safe attitude, and with slow enough responses could be made stable under most conditions. Safing capabilities are always planned anyway and can be tuned for minimal requirements, if permitted by the design, so it’s worth a trade study to decide how far into the space of abnormality it may be worthwhile to venture. Safing should provide fallbacks that are as absolutely rock bottom foolproof as possible. Then operational flexibility might come to the rescue, once operators can get involved. So don’t surrender; simplify, and try to get by.
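The remark about damping rotational energy can be made concrete with a small simulation. For a rigid body with torque τ, rotational kinetic energy changes at the rate ω·τ, so a torque opposing the body rates, τ = −kω, gives dE/dt = −k|ω|² ≤ 0 whatever the initial tumble. The sketch below assumes principal-axis inertias, an ideal torque actuator with no saturation, and an arbitrary gain; none of these values come from a real vehicle.

```python
def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def kinetic_energy(omega, inertia):
    """Rotational kinetic energy for principal moments of inertia."""
    return 0.5 * sum(i * w * w for i, w in zip(inertia, omega))


def rate_derivative(omega, inertia, gain):
    """Euler's equations in principal axes with damping torque -gain * omega."""
    momentum = tuple(i * w for i, w in zip(inertia, omega))
    gyroscopic = cross(omega, momentum)
    return tuple((-g - gain * w) / i
                 for g, w, i in zip(gyroscopic, omega, inertia))


def damp_rates(omega, inertia, gain, dt, steps):
    """Integrate with a midpoint step; return final rates and energy history."""
    energies = [kinetic_energy(omega, inertia)]
    for _ in range(steps):
        k1 = rate_derivative(omega, inertia, gain)
        midpoint = tuple(w + 0.5 * dt * k for w, k in zip(omega, k1))
        k2 = rate_derivative(midpoint, inertia, gain)
        omega = tuple(w + dt * k for w, k in zip(omega, k2))
        energies.append(kinetic_energy(omega, inertia))
    return omega, energies
```

Calling `damp_rates((0.5, -0.3, 0.2), (100.0, 150.0, 80.0), gain=5.0, dt=0.1, steps=2000)` drives the rates toward zero from an arbitrary three-axis tumble. The control law needs no attitude knowledge at all, which is why it is a plausible candidate for a rock bottom fallback.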

Design spacecraft to be safe in safing. Don’t wait to figure this out later. Many of the difficulties in supplying a truly robust safing capability, as described above, go away if considerations for safing are part of the original design concept. Consider, for instance, the passively stable orientation and rotation of a spacecraft. There’s a fair chance the spacecraft will find its way into this state at some point and stay there for a while, whether one wants it to or not. If safing does not use such a trap state, then safing will always have this threat against it. So much the better then, if there is only one such trap and it happens to be one that can be survived indefinitely with reasonable support for telecommunications. It’s always better not to have to fight upstream, so design the system so it helps you. The same argument can be applied to other critical states. Making these things happen is a direct consequence of putting fault protection and normal operation together from the outset. In the same way that no GN&C designer would ever allow a spacecraft configuration that was not a cooperative partner in control, fault protection designers must take the same ownership of the system behavior they are expected to manage, both normal and abnormal.

Include failsafe hardware features. In the same vein, various failsafe hardware features can also help. For example, it should never be possible to request a torque, force, or rate indefinitely; such commands should always time out quickly unless reinforced. There should never be illegal combinations of bits that can cause action. One should never need to know a priori what the state of a device is in order to be able to command it to a safe state. There should be a watchdog for command source liveness. Resets and watchdog triggers should always establish a safe hardware state. And so on. Such features dramatically reduce opportunities for problems to proliferate for lack of attention, while other serious matters are disrupting operation.
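The timeout and reset rules above describe hardware behavior, but the logic can be illustrated in software. The following Python sketch is a hypothetical model, not an actual device interface: the class name `TimedCommand`, its methods, and the choice of zero as the safe value are assumptions made for illustration. Time is passed in explicitly so the behavior is easy to examine.

```python
class TimedCommand:
    """Actuator command that lapses to a safe value unless reinforced.

    Hypothetical illustration of a failsafe command interface.
    """

    def __init__(self, timeout_s, safe_value=0.0):
        if timeout_s <= 0:
            raise ValueError("timeout must be positive")
        self.timeout_s = timeout_s
        self.safe_value = safe_value
        self.reset()

    def reset(self):
        """Reset or watchdog trigger: always establishes the safe state."""
        self._value = self.safe_value
        self._expires_at = None

    def request(self, value, now_s):
        """Accept a command; re-issuing it before expiry reinforces it."""
        self._value = value
        self._expires_at = now_s + self.timeout_s

    def output(self, now_s):
        """Value applied to the actuator at time now_s."""
        if self._expires_at is None or now_s >= self._expires_at:
            self.reset()  # command source went quiet: fall back to safe
        return self._value
```

With a two second timeout, a torque requested at t = 0 and reinforced at t = 1.5 is still applied at t = 3.0 but has lapsed to the safe value by t = 3.5. The safe state is reached without knowing what was previously commanded, in keeping with the rule that no a priori knowledge of device state should be needed.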

Check out safing systems early in flight. As a final check, it is always prudent to never leave a spacecraft unattended after launch until safe mode has been demonstrated. And of course, in case something does go wrong, the contingency capability to deal with it must be in place. This includes operators on duty, adequate telemetry and ability to uplink (even under adverse conditions), good margins, readily adjustable behavior (e.g., parameters, mode and device selections, and so on), good testbeds where problems can be readily duplicated, and access to the right experts when you need them. As described above, supporting capabilities for such contingencies should be added early in development.

Carefully validate all models. Modeling is a double-edged sword. Good models are clearly necessary for proper diagnosis and response, but having the ability to work around at least some imperfection in models has also been emphasized, since knowledge of behavior is never perfect. Some may argue, in fact, that model-based designs (especially model-based fault protection) are vulnerable precisely for this reason, though this begs the question of how any design at all is possible without expectations of behavior. The question then, whether models appear explicitly in an implementation or only in the most indirect way through designs, is how one knows they are valid (i.e., good enough to use).

Model validation must certainly appeal to expertise. Those most familiar with a modeled item should develop or at least review its model, with full awareness of how the model will be used and the need to keep it current. Consequently, communicating models effectively and consistently is essential. Model validations aren’t closed any more than systems are.

Ultimately though, model validation is best when supported by data, and this is where it is critical to understand the fundamental irony of model validation: data can only invalidate models. Model validation is consequently an exercise in looking for contradictory data. One does this by considering a broad set of conditions to test the model’s predictive powers, and considering data sets capable of discriminating among competing models. Models that are mere data fits, or that only loosely appeal to physical phenomena, have very little explicatory power and are prone to mislead or mask issues. How one produces such data to validate failure modes (including operation outside of operating range) naturally raises many questions. Where feasible, it is always best to get the data anyway, even if difficult, and where not feasible, one should double up on experts. Space systems tend to be merciless when you get it wrong. One thing is certain though. Testing a spacecraft with all fault protection always enabled is the best way to discover whether or not your models at least get normal behavior right.
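The idea that data can only invalidate models, and that a useful data set is one that discriminates among competitors, can be illustrated with a toy example. In the Python sketch below, the two candidate models (a linear response and a saturating one), the observations, and the tolerance are all invented for illustration.

```python
def surviving_models(models, observations, tolerance):
    """Return names of models not contradicted by any observation.

    models maps a name to a callable predict(x); observations is a list of
    (x, y) pairs. A model is rejected as soon as one prediction misses an
    observation by more than the tolerance. Survival is not proof of validity.
    """
    survivors = []
    for name, predict in models.items():
        contradicted = any(abs(predict(x) - y) > tolerance
                           for x, y in observations)
        if not contradicted:
            survivors.append(name)
    return survivors


# Hypothetical competing models of some device response.
candidates = {
    "linear": lambda x: 2.0 * x,
    "saturating": lambda x: min(2.0 * x, 10.0),
}

# Data confined to the normal operating range cannot tell the models apart.
nominal_data = [(1.0, 2.0), (3.0, 6.1)]

# One point beyond the normal range contradicts the linear model.
extended_data = nominal_data + [(8.0, 10.2)]
```

Here `surviving_models(candidates, nominal_data, 0.5)` keeps both candidates, while the same call with `extended_data` eliminates the linear model. Only the data taken outside the normal operating range has any power to discriminate.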

How good a model must be is more subjective. One technique for deciding is to exaggerate the effects of everything that comes to mind in order to see if anything could ever be a problem, eliminating only those effects with provably implausible significance. What one can always count on though is that assumptions regarding a “credible” range of operation are always suspect, especially where faults are involved.

Three more cautions are in order, the first being to avoid the blunder of validating a model against itself. No one would foolishly do this in any blatant way, of course, but it happens from time to time, nonetheless, when a testbed includes in its simulation a model from the same source or assumptions as those behind the flight design. When errors or inappropriate simplifications are repeated between the two, the system will obligingly work fine, until it flies. The second caution is related. Suspect validation by similarity or extrapolation of prior experience. Even modest departures can be fatal if handled incorrectly. Besides, your source might have been wrong. Finally, in a similar vein, piecemeal testing is less satisfactory than end-to-end testing, if you have to make a choice, and both are required in order to find compensating errors.

Given a choice, design systems that are more easily modeled. This advice is easily appreciated, but not always the most obvious to apply. Every engineer has seen a seemingly simple system concept devolve into endless analysis and testing, as the enormity of the operating space that it allows, and therefore within which it may need to operate, is revealed. There is a tendency in this age of ultra-sophisticated modeling tools and supercomputer simulations to soldier on in confidence of the brute strength offered by these techniques. Nothing, however, will ever substitute for an elegant form that is simpler to analyze. Therefore, one must be prepared to face the reality that often the best concept is the one that can be modeled (and have the models validated) more easily, even if it requires a few more parts and a bit more control.

Elegance of form is useful in control system architectures as well. All of the principles of transparency discussed here have the ultimate goal of making control systems more easily modeled, which makes systems easier to understand, and hence less complex. This can result in great simplifications elsewhere, especially in fault protection, a worthy note on which to end this discussion.


There is much that could be added to this exposition, which has been a necessarily simplistic race through issues that are deep and complex. Essential topics have been neglected. The objective, however, is not to provide a comprehensive guide or rigorous theoretical treatment, but rather to demonstrate that there is value in adhering to ideas from the roots of our discipline, even in today’s complex world.


Managing the complexity of a large system requires confident control, and one of the most difficult aspects of that control is fault tolerance. The premise here has been that fault protection, as fundamentally a control issue, will be well served through a more careful grounding in the same patterns of thought and principles of design that have guided successful GN&C systems through decades of development.

However, this modest exploration of connections is an attempt, not just to suggest some merit to that idea, but also to argue that this is not uniquely a fault protection concern. That is, it appears that fault protection complexity must be addressed, not merely as an integrated control element of a system, but as an integral part of a unified approach to system control.

If this is correct, then it follows that solving the fault protection problem requires nothing short of solving the complexity problem overall. That insight is important, because we are moving into a new era of unprecedented complexity, where present issues of fault protection provide only a sampling of what is to come for systems at large. One sees this, for instance, in space systems moving into more uncertain environments, where the differences diminish greatly between responding to random faults and responding to random environmental influences that obstruct progress. What was once required only of fault protection is slowly but surely becoming the norm for routine operation in such systems, especially as demands for responsiveness and efficiency escalate. This is just one of many such challenges we can anticipate.

However, if past experience with fault protection is any guide, difficulty in satisfying this evolution of needs will hold the future out of reach, for we have yet to truly come to grips with the complexity issue. The solution to any complexity problem lies in understanding, but understanding requires a transparent framework for ideas. Returning to the roots of systems theory and principles of control is offered here as a significant part of this framework. By looking at GN&C fault protection fundamentals in this light, it is hoped that this review of principles will be useful both on its own merit, and as a guide to future developments in both fault protection and system management. Techniques of most relevance to this approach, where the fundamental notions of state, models, and objectives appear in their most transparent form, are the state- and model-based, goal-driven approaches to systems engineering and control.


The key to applying these ideas, however, is not just to embrace another set of rules. No concept maintains its integrity during development without the structure of formal architecture and rigorous systems engineering methodology. Providing this is not the easiest thing we can do, but simplistic notions do not confront complexity; they merely shift it elsewhere, as much of our experience with fault protection demonstrates. The approach recommended here is to weave ideas of transparent control deeply into our architectures, so we can take space systems to the next level.


This work was performed at the Jet Propulsion Laboratory, California Institute of Technology, under contract with the National Aeronautics and Space Administration. I thank Matthew Bennett, Mark Brown, Dan Dvorak, Lorraine Fesq, Kenny Meyer, and Dave Nichols for their helpful comments.


  • 1. D. Dvorak, M. Ingham, J.R. Morris, J. Gersh, “Goal-Based Operations: An Overview,” Proceedings of AIAA Infotech@Aerospace Conference, Rohnert Park, California, May 2007.

  • 2. E. Seale, “SPIDER: A Simple Emergent System Architecture For Autonomous Spacecraft Fault Protection,” Proceedings of AIAA Space 2001 Conference and Exposition, Albuquerque, NM, Aug 2001.

  • 3. J. Newman, “Failure-Space, A Systems Engineering Look at 50 Space System Failures,” Acta Astronautica, 48, 517-527, 2001.

  • 4. L. Fesq, M. Ingham, M. Pekala, J. Eepoel, D. Watson, B. Williams, “Model-based Autonomy for the Next Generation of Robotic Spacecraft,” Proceedings of 53rd International Astronautical Congress of the International Astronautical Federation, Oct 2002.

  • 5. M. Bennett, R. Knight, R. Rasmussen, M. Ingham, “State-Based Models for Planning and Execution,” Proceedings of 15th International Conference on Planning and Scheduling, Monterey, CA, Jun 2005.

  • 6. R. Rasmussen, M. Ingham, D. Dvorak, “Achieving Control and Interoperability Through Unified Model-Based Engineering and Software Engineering,” Proceedings of AIAA Infotech@Aerospace Conference, Arlington, VA, Sep 2005.