
Lessons Learned

Abstract

Failures in avionics systems are the result of specific errors and mistakes. Studying the root causes of avionics system failures yields valuable lessons, and the lessons learnt from failures help us build better systems. Inadequate capture of requirements, insufficient margins, differences between test and flight conditions, polarity/sign reversals, incorrect application of components and disregard of early warning signals are some of the important reasons for failures. Systemic deficiencies such as overconfidence, complacency, unsustainable project schedules, flaws in the review process and inadequate documentation also add to the causes of failures.

Introduction

The success of launch vehicles and satellites also depends on the performance of the avionics system. Worldwide, in the decade since 2000, about 30% of launch failures were caused by the malfunctioning of avionics systems, including software (Table 1). In addition, during the final phase of the launch campaign of many missions, anomalies in the electronics systems have caused anxiety and launch delays. Many good lessons have been learned from design, development, qualification and acceptance testing, final integrated tests and flight experience. A few cases are discussed here.

Lesson 1: “Space is unforgiving; thousands of good decisions can be undone by a single engineering flaw or workmanship error, and these errors and flaws can result in catastrophe… It is always the simple stuff that kills you.”

The failures of launch vehicles are quite different from failures of other systems like T & M equipment, computers or communication systems. In fact, launch failures are considered accidents rather than reliability failures caused by random failures of components. They are mainly due to design errors or workmanship errors in fabrication. The history of ISRO launch failures shows that very simple and silly mistakes were the reasons behind these accidents. Table 1 gives the summary of ISRO launch failures. As stated above, design errors or workmanship errors in fabrication are the reasons for these failures. Attending to minute details during design and avoiding deviations and mistakes during fabrication are the key factors in ensuring mission success.

Failure Reasons                     Percentage
Propulsion                          54%
Guidance and Navigation             4%
Software and computing systems      21%
Electrical systems                  8%
Structures                          0%
Ordnance                            0%
Pneumatics & Hydraulics             0%

Table 1 Worldwide scenario of launch failures
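
The "about 30 %" figure quoted in the Introduction can be cross-checked against Table 1 with simple arithmetic. The sketch below assumes that the avionics share is the sum of the Guidance and Navigation, Software and computing systems, and Electrical systems rows; that grouping is an assumption made here for illustration and is not stated in the text.

```python
# Percentages copied from Table 1.
table1 = {
    "Propulsion": 54,
    "Guidance and Navigation": 4,
    "Software and computing systems": 21,
    "Electrical systems": 8,
    "Structures": 0,
    "Ordnance": 0,
    "Pneumatics & Hydraulics": 0,
}

# Assumption: these three rows together make up "avionics including software".
AVIONICS_ROWS = (
    "Guidance and Navigation",
    "Software and computing systems",
    "Electrical systems",
)

avionics_share = sum(table1[row] for row in AVIONICS_ROWS)
listed_total = sum(table1.values())

print(f"Avionics-related share: {avionics_share}%")   # 4 + 21 + 8 = 33
print(f"Total of listed rows:   {listed_total}%")     # 54 + 33 = 87
```

Under this grouping the avionics share comes to 33 %, which agrees with the rounded figure of about 30 %. The listed rows add up to 87 %, so the table as reproduced here does not account for every failure.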

Lesson 2: Robust Design – Essential to mission success

We all know that the first and foremost factor determining the reliability of a system is good design. Major factors of a good avionics system design are the following:

- All the requirements are clearly and unambiguously specified at the outset. Scope and major specifications are properly defined.
- A very clear specification document of the interfaces between the systems is made.
- The right electronic parts are selected and applied correctly, as recommended by the part manufacturer.
- Sufficient de-rating is provided for the parts.
- Design for testability.
- Good PCB layout with good grounding, guarding of low level signals against noise, good timing design with adequate margins and attention to signal integrity issues.
- Good thermal design with adequate margins.
- Good mechanical packaging with respect to vibration and shock.

Lesson 3: Systems Requirements – To be adequately captured.

One major concern during the system design phase is that the requirements of the system and sub-systems are not adequately defined and detailed. Normally, very detailed requirements, down to the minute levels, are not made initially, and often many new requirements are discovered during the proto model evaluation. This is all the more important in the case of software intensive systems as well as systems with FPGA devices. Giving special attention to the details of the requirements and having a thorough discussion in this regard will go a long way towards a good system.

One classical example is the much publicised Mars Polar Lander failure. The three legs of the Polar Lander, which were kept in stowed position, had to be deployed at about 1500 metres above ground during the landing phase. The shock sensors mounted on the legs were used to sense the touchdown and then shut down the engine. The engine also had to be shut down within 50 ms of the legs touching the Mars surface. The system designers were aware that, when the stowed legs were deployed, the sensors on the legs would produce a momentary signal similar to that on touchdown. However, during the design, the designers failed to implement the requirement that the processing of the leg sensor data shall not begin until 12 metres above the ground. As a result, the engine was shut down when the leg sensors generated the false momentary signal as the legs were deployed at about 1500 metres, and hence a 2 year long mission was lost.

Lesson 4: Wrong application of avionics parts – a major concern.

Incorrect usage, wrong application, inadequate de-rating, not following the guidelines of the component manufacturer, etc. are major causes of malfunctions of avionics systems. The problems caused by these reasons are actually much more numerous than the system failures observed due to component quality and reliability related issues. The important point is that the design engineers should read and understand all the datasheets, application guidelines, layout guidelines, precautions, etc. of every device used in the design. Even copying an already proven circuit from an old design may cause problems in a new design, unless all these are properly understood and applied. A few case studies are given below:

- Recently in an avionics package, it was observed that, when the package is switched ON, an overshoot of up to 22 V is seen in the +15 V supply line for a duration of about 2 to 3 ms. This overshoot may become catastrophic, as the absolute maximum voltage specification for many devices is 16 V or 18 V. On analysis it is seen that the data sheet of the Interpoint make DC-DC converter MHF+ 2815D specifies that the maximum capacitance across its output shall be less than 10 µF, while in the package about 80 µF of capacitance is put across the supply line.
- Recently in one of the telemetry packages, excessive spikes in the output data were observed, as the RS 485 opto-coupled transceiver device was not provided with the necessary bypass capacitors as suggested by the respective manufacturer.
- Disregarding the inverse current gain of transistors in a relay driver circuit has resulted in sneak paths in the system, which were actually detected only after about 15 years of usage.
- In certain cases, problems were seen with regard to data corruption in EEPROM devices due to wrong or incorrect data protection schemes employed in the circuits.
- Inadequate or incorrect power on reset circuits have caused problems in many packages. Many a time, improper termination of unused pins results in intermittent malfunctioning of circuits.
- Many of the new devices are very fast and extreme care is needed during layout. The line lengths should be kept very short and proper terminations are to be provided for each interconnection to avoid signal integrity related issues like overshoot and ringing. Special care in layout design is required with regard to the timing capacitors used with certain devices, as these inputs may be more susceptible to noise. Qualification models passing the tests and flight models later developing problems, especially during thermal tests, is a phenomenon very common nowadays, and it is attributable to such signal integrity related reasons.
- Not adhering to good workmanship practices during the assembly of devices on to the PCB is a major problem causing many latent failures. Tinning and hand soldering of surface mount CDR type ceramic capacitors was a regular practice in many work centres, and this has resulted in failures of packages even at the launch pad. In fact, chip capacitor manufacturers recommend avoiding hand soldering, mainly because the thermal gradient during soldering causes cracks in the layers of the multilayer ceramic capacitors, which may develop into capacitor shorts.
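
The first case study under Lesson 4 comes down to two comparisons that could have been made from the datasheets before power was applied. The following is a minimal sketch of such a check, using the numbers quoted in the text (10 µF limit, about 80 µF fitted, 22 V peak, 16 V taken as the lowest absolute maximum rating). The class and function names are hypothetical and are not taken from any design tool.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RailCheck:
    """Datasheet limits and as-built values for one supply rail (illustrative)."""
    converter_max_output_cap_uf: float  # converter datasheet: output capacitance must stay below this
    fitted_output_cap_uf: float         # total capacitance actually placed across the rail
    observed_turn_on_peak_v: float      # worst-case peak seen at switch ON
    lowest_load_abs_max_v: float        # smallest absolute-maximum rating among the loads


def rail_violations(check: RailCheck) -> List[str]:
    """Return a list of datasheet violations; an empty list means none were found."""
    problems = []
    if check.fitted_output_cap_uf >= check.converter_max_output_cap_uf:
        problems.append("output capacitance is not below the converter datasheet limit")
    if check.observed_turn_on_peak_v > check.lowest_load_abs_max_v:
        problems.append("turn-on peak exceeds the absolute maximum rating of a load")
    return problems


# Values quoted in the case study for the +15 V line.
case = RailCheck(
    converter_max_output_cap_uf=10.0,
    fitted_output_cap_uf=80.0,
    observed_turn_on_peak_v=22.0,
    lowest_load_abs_max_v=16.0,
)

for problem in rail_violations(case):
    print("VIOLATION:", problem)
```

With the case-study values both checks fail. A table of such limits, filled in from the datasheet of every device in the package, is one possible way of making the "read and understand all the datasheets" advice reviewable.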

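Returning to the Mars Polar Lander example under Lesson 3, the missed requirement was that leg sensor data shall not be processed until the lander is 12 metres above the ground. A simplified sketch of what that gating means is given below. It is not the flight software; the function names and the descent profile are hypothetical, and only the 12 m and 1500 m figures come from the text.

```python
from typing import Iterable, Optional, Tuple

SENSOR_ENABLE_ALTITUDE_M = 12.0  # from the text: no leg sensor processing above 12 m


def shutdown_commanded(altitude_m: float, leg_signal: bool) -> bool:
    """Gated logic: a leg signal counts only at or below the enable altitude."""
    return leg_signal and altitude_m <= SENSOR_ENABLE_ALTITUDE_M


def first_shutdown_altitude(
    profile: Iterable[Tuple[float, bool]], gated: bool
) -> Optional[float]:
    """Return the altitude at which engine shutdown is first commanded, if any."""
    for altitude_m, leg_signal in profile:
        if gated:
            if shutdown_commanded(altitude_m, leg_signal):
                return altitude_m
        elif leg_signal:  # ungated logic: any leg signal shuts the engine down
            return altitude_m
    return None


# Hypothetical descent samples: (altitude in metres, leg sensor signal present).
# The transient at 1500 m represents the leg deployment shock.
profile = [
    (1600.0, False),
    (1500.0, True),
    (1400.0, False),
    (40.0, False),
    (12.0, False),
    (0.0, True),
]

print("Ungated shutdown at:", first_shutdown_altitude(profile, gated=False), "m")
print("Gated shutdown at:  ", first_shutdown_altitude(profile, gated=True), "m")
```

Without the altitude gate the deployment transient commands shutdown at 1500 m; with it, shutdown occurs only at touchdown. A one-line requirement of this kind is easy to lose unless it is written down explicitly and traced into the implementation.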
Lesson 5: Use FPGA devices in critical applications with extreme care.

There is an overall increase in the usage of FPGA designs for space applications. FPGA devices have been in use in our launch vehicle projects for more than a decade. Initially, low capacity devices (1 to 8 K gates) were used. Currently, 100 to 200 K gate designs are implemented using FPGA devices. This high gate count and the associated design complexity is one of the major problems. However, the design methodology has not improved significantly and, at the same time, the risk involved is not fully appreciated. The design methodology followed for FPGA design matches neither the good software engineering practices followed in software design nor the ASIC design methodologies. In both these cases well matured engineering practices exist to avoid mistakes during design and to find and correct errors and bugs before the product is released. Today the FPGA design process is approached casually, as it is felt that design errors can be corrected easily compared with the cost of correction in an ASIC. It is important to follow a good design methodology, as envisaged in the DO-254 standard for complex electronics, with documents like the requirements specification and design document, third party independent verification and validation, and a very detailed review process. Some of the very important lessons learned from the usage of FPGA designs for onboard applications are given below:

- The power ON behaviour of FPGA devices has to be studied very carefully.
- Initialising all the flip-flops in an FPGA device is good, but may not be necessary, as it may increase the usage of routing resources. However, all the flip-flops driving the outputs shall be initialised during power ON.
- Using an external Schmitt trigger inverter or buffer is recommended to route the reset signal to the input of the FPGA.
- Asynchronous assertion and synchronous de-assertion of the reset make the FPGA design more robust and able to tolerate start up delays of oscillators and other internal delays.
- Internal clock buffers are to be used to route the reset signal as well as the clock inside the FPGA. As the available clock buffers are finite, limit the number of clocks used inside the FPGA.
- Avoid derived clocks and gated clocks, as these force the designers to use normal lines instead of clock buffers for driving the clock inputs of flip-flops. This increases the clock skews and reduces the testability of the circuits.
- Metastability related issues may develop when signals are transferred across different clock domains. Necessary precautions have to be taken to avoid this.
- Unused inputs such as test and mode pins have to be properly terminated.
- During synthesis, the SAFE option shall be enabled to ensure recovery from illegal states.
- The design documents, VHDL codes, test benches, synthesis tool configuration, fuse maps, etc. have to be version and configuration controlled.
- All the designs have to be thoroughly reviewed by a team of experts in both FPGA design and system design.

The important lesson is that the designers of FPGA designs should be experts in the area, with detailed knowledge of all the intricacies of the devices, design methodologies, and verification and validation practices, and above all experts in the design and verification tools.

Lesson 6: Experience with Industrial Grade Plastic Encapsulated Microcircuit (PEM) devices is really good.

It is true that more, and more important, lessons are learned from failures. However, we can learn lessons from successes also. There was a myth that only Mil grade / space grade devices are suitable for launch vehicle applications. However, the bold decision to use industrial grade PEM devices in launch vehicles, especially in less critical telemetry applications, has paid dividends. Currently about 60 to 70% of the total number of semiconductor devices used in onboard applications are industrial grade semiconductor parts. The failure rate and reliability levels are comparable to those of Mil devices. The major benefits of using industrial grade devices are availability, lower cost, increased functionality of the components, etc. The important lessons learned from the usage of PEMs are as below:

- Select components from reputed manufacturers only.
- Employ the components only after proper evaluation and qualification tests to establish margins.
- Provide protection against moisture during storage.

Lesson 7: Take care of interfaces

One major problem in large systems, like launch vehicles and satellites, is in defining and implementing the proper electrical and mechanical interfaces between systems. Maintaining the proper interfaces between the different working teams is also very important and demanding. In fact there was a major failure because one of the teams provided the data in FPS units and the other team interpreted the numbers in MKS units. It is important to have a well defined interface document where all the interface specifications are provided without ambiguity. This document has to be reviewed and approved by all the teams working on the project. This is to be done in the early phase of the project itself.

Lesson 8: Demonstrate design margins.

Demonstrating the design margins is as important as robust design for ensuring mission success. Design margins are to be compliant with environment, interfaces, tolerances and uncertainties. It is important to analytically determine the margins prior to testing the system. A proper derating analysis of every component in the system, with regard to voltage, current, power and thermal characteristics, will give good assurance with regard to electrical stresses. Vibration analysis of the chassis or packaging, and of the mounting details of the unit on to the launch vehicle, has to be done to ascertain the margins available. After ensuring through design analysis that sufficient margins exist, the actual systems have to be put to the required tests.

Lesson 9: There is no alternative to testing.

A product or subsystem is designed and developed to perform defined functions meeting the requirements, specifications, interface definitions, environmental conditions, etc. Testing is the only process to validate that all the above are satisfactorily met. The system has to be tested for all that the system should do and should not do. There are many good lessons learnt with regard to testing. Some lessons learned are:

- All tests are to be done based on a test plan, which is reviewed and approved by experts from the design and test agencies.
- The tests are to be representative of flight conditions: “Test as you fly and fly as you test”. A frequent cause of maiden flight failures is that the ground tests do not truly represent the flight conditions.
- Use real flight systems instead of simulations wherever feasible, as simulations may sometimes miss some important points.
- Assumptions used in tests and simulations are to be fully understood.
- It may not always be possible to meet the above guidelines. In such cases a systematic analysis of the differences between the test and flight conditions has to be carried out to understand the limitations of the ground test, and thus assess the risks involved and find ways to mitigate them.

Lesson 10: Test induced failures are also a concern.

Test induced failure is a reality, and hence extreme care is to be exercised while testing the flight systems. Many avionics systems have failed during testing due to operational errors, wrong test conditions, faulty test equipment, etc. Some common problems are: excess neutral to earth voltage, isolation degradation between onboard and ground systems, improper over voltage / over current settings, improper power up sequences, inadequate ESD control, thermal chambers not having humidity control, A/C not working, and poor training, inexperience or fatigue of the operators.

- Never perform a new test on the flight system for the first time. It is first to be performed on a ground model before doing it on the flight unit.
- The test equipment or checkout systems should have the necessary safety interlocks to prevent accidental damage to the test article.

It is to be noted that realising flight hardware, meeting all the quality norms, is not an easy task. While it is very essential to test every subsystem as described above, it is also extremely important to ensure that the testing is done very carefully.
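
Several of the test precautions in Lessons 9 and 10 can be expressed as a software interlock that a checkout system evaluates before power is applied to a flight unit. The sketch below is one possible form of such a check. All names and numeric values are hypothetical; a real checkout system would take its limits from the approved test plan.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BenchSetup:
    """State of a test bench before power-up (all fields are illustrative)."""
    supply_ovp_v: float             # over-voltage protection setting of the bench supply
    supply_ocp_a: float             # over-current protection setting of the bench supply
    article_max_input_v: float      # maximum input voltage allowed for the test article
    article_max_input_a: float      # maximum input current allowed for the test article
    test_plan_approved: bool        # plan reviewed by design and test agencies
    rehearsed_on_ground_model: bool # test already performed on a ground model
    esd_controls_verified: bool     # wrist straps, mats and grounding checked


def interlock_faults(setup: BenchSetup) -> List[str]:
    """Return the reasons why the flight unit must not be powered; empty if none."""
    faults = []
    if setup.supply_ovp_v > setup.article_max_input_v:
        faults.append("over-voltage setting is above the article limit")
    if setup.supply_ocp_a > setup.article_max_input_a:
        faults.append("over-current setting is above the article limit")
    if not setup.test_plan_approved:
        faults.append("test plan is not approved")
    if not setup.rehearsed_on_ground_model:
        faults.append("test has not been rehearsed on a ground model")
    if not setup.esd_controls_verified:
        faults.append("ESD controls are not verified")
    return faults


# Hypothetical bench state with two problems.
setup = BenchSetup(
    supply_ovp_v=40.0,
    supply_ocp_a=2.0,
    article_max_input_v=32.0,
    article_max_input_a=3.0,
    test_plan_approved=True,
    rehearsed_on_ground_model=False,
    esd_controls_verified=True,
)

faults = interlock_faults(setup)
print("Power-up permitted" if not faults else "Power-up blocked:")
for fault in faults:
    print(" -", fault)
```

A check of this kind does not replace operator training or the review of the test plan; it only keeps a known set of mistakes from reaching the test article.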