Schneider Electric University Transcript

Slide 1
Welcome to the course: Fundamentals of Availability

Slide 2: Welcome
For best viewing results, we recommend that you maximize your browser window now. The screen controls allow you to navigate through the eLearning experience. Using your browser controls may disrupt the normal play of the course. Click the attachments link to download supplemental information for this course. Click the Notes tab to read a transcript of the narration.

Slide 3: Learning Objectives
At the end of this course, you will be able to:
• Understand the key terms associated with availability
• Understand the difference between availability and reliability
• Recognize threats to availability
• Calculate the cost of downtime

Slide 4: Introduction
In our rapidly changing business world, highly available systems and processes are of critical importance and are the foundation upon which successful businesses rely. So much so that, according to the National Archives and Records Administration in Washington, D.C., 93% of businesses that have lost availability in their data center for 10 days or more have filed for bankruptcy within one year. The cost of one episode of downtime can cripple an organization. Take, for example, an e-business. In a case of downtime, not only would it potentially lose thousands or even millions of dollars in revenue, but its top competitor is only a mouse-click away. Therefore, downtime translates not only into lost revenue but also into a loss of customer loyalty. The challenge of maintaining a highly available network is no longer just the responsibility of the IT department; rather, it extends to management and department heads, as well as the boards that govern company policy.
For this reason, having a sound understanding of the factors that lead to high availability, threats to availability, and ways to measure availability is imperative, regardless of your business sector.

Fundamentals of Availability | Page 1 | © 201 Schneider Electric. All rights reserved. All trademarks provided are the property of their respective owners.

Slide 5: Measuring Business Value
Measuring business value begins first with an understanding of the Physical Infrastructure. Physical Infrastructure is the foundation upon which Information Technology (IT) and telecommunication networks reside. Physical Infrastructure consists of the Racks, Power, Cooling, Fire Prevention/Security, Management, and Services.

Slide 6: Measuring Business Value
Business value for an organization, in general terms, is based on three core objectives:
1. Increasing revenue
2. Reducing costs
3. Better utilizing assets
Regardless of the line of business, these three objectives ultimately lead to improved earnings and cash flow. Investments in Physical Infrastructure are made because they both directly and indirectly impact these three business objectives. Managers purchase items such as generators, air conditioners, physical security systems, and Uninterruptible Power Supplies to serve as "insurance policies." For any network or data center, there are risks of downtime from power, security, and thermal problems, and investing in Physical Infrastructure mitigates these and other risks. So how does this impact the three core business objectives above (revenue, cost, and assets)? When systems are down, revenue streams are slowed or stopped, business costs and expenses are incurred, and assets are underutilized or unproductive. Therefore, the more efficient a strategy is at reducing downtime from any cause, the more value it has to the business in meeting all three objectives.
Slide 7: Measuring Business Value
Historically, assessment of Physical Infrastructure business value was based on two core criteria: availability and upfront costs. Increasing the availability (uptime) of the Physical Infrastructure system, and ultimately of the business processes, allows a business to continue to bring in revenue and better optimize the use (or productivity) of assets. Imagine a credit card processing company whose systems are unavailable: credit card purchases cannot be processed, halting the revenue stream for the duration of the downtime. In addition, employees are not able to be productive without their systems online. Minimizing the upfront cost of the Physical Infrastructure, in turn, results in a greater return on that investment. If the Physical Infrastructure cost is low and the risk/cost of downtime is high, the business case becomes easier to justify.

While these arguments still hold true, today's rapidly changing IT environments are dictating two additional criteria for assessing Physical Infrastructure business value. One is agility. Business plans must be agile to deal with changing market conditions, opportunities, and environmental factors. Investments that lock up resources limit the ability to respond in a flexible manner, and when this flexibility or agility is not present, lost opportunity is the predictable result.

The other is sustainability. It is imperative that data center owners have a solid action plan to achieve sustainability goals and commitments:
1. Develop a plan that includes a bold and actionable strategy with clear objectives and prioritized actions.
2. Implement efficient designs, investing in technologies that improve energy efficiency and lower the carbon footprint, such as SF6-free switchgear and liquid cooling, which could reduce overall IT and infrastructure energy consumption by 15 percent.
3. Drive operational efficiency with connected systems that collect data to provide visibility, track energy usage, and benchmark performance.
4. Buy renewable energy, which can be accomplished in three main ways: credits, on-site build, and off-site build.
5. Decarbonize your supply chain: choose vendors that embrace the circular economy, with circularity designed into their products.

Slide 8: Five 9's of Availability
A term that is commonly used when discussing availability is "5 Nines." Although often used, this term is very misleading and often misunderstood. 5 9's refers to a network that is accessible 99.999% of the time. We'll explain why the term is misleading a little later in the course.

Slide 9: Key Terms
There are many additional terms associated with availability, business continuity, and disaster recovery. Before we go any further, let's define some of these terms.

Reliability is the ability of a system or component to perform its required functions under stated conditions for a specified period of time. Availability, on the other hand, is the degree to which a system or component is operational and accessible when required for use. It can be viewed as the likelihood that the system or component is in a state to perform its required function under given conditions at a given instant in time. Availability is determined by a system's reliability, as well as its recovery time when a failure does occur. When systems have long continuous operating times, failures are inevitable. Availability is often emphasized because, once a failure does occur, the critical variable becomes how quickly the system can be recovered.
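To make the 5 9's figure and the reliability/availability distinction more concrete, here is a minimal Python sketch (not part of the original course) that converts an availability fraction into expected downtime per year, and estimates steady-state availability from mean time between failures (MTBF) and mean time to recover (MTTR), the two measures defined in Slide 11, using the standard relation A = MTBF / (MTBF + MTTR). It assumes a 365-day year in which every minute is required uptime; the function names are illustrative only.

```python
# Minimal sketch: relate an availability figure to annual downtime, and
# estimate steady-state availability from MTBF and MTTR (both in hours).
# Assumption: a 365-day year; leap years and planned maintenance are ignored.

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def annual_downtime_minutes(availability: float) -> float:
    """Expected minutes of downtime per year for an availability between 0 and 1."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * MINUTES_PER_YEAR


def availability_from_mtbf_mttr(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: share of each failure/recovery cycle spent up."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


if __name__ == "__main__":
    for label, a in [("three 9's", 0.999), ("four 9's", 0.9999), ("five 9's", 0.99999)]:
        print(f"{label}: {annual_downtime_minutes(a):.2f} minutes/year")
    # Hypothetical unit: fails on average every 100,000 hours, takes 1 hour to recover.
    print(f"availability: {availability_from_mtbf_mttr(100_000, 1):.6%}")
```

For five 9's, the first function gives about 5.26 minutes of downtime per year, consistent with the roughly 5 minutes of annual outage in the Slide 12 example. The second function shows the trade-off described in Slide 11: raising MTBF or lowering MTTR both push availability up, and neither number alone tells you how often the system fails.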
In the data center, having a reliable system design is the most critical variable, but when a failure occurs, the most important consideration must be getting the IT equipment and business processes up and running as fast as possible to keep downtime to a minimum.

Slide 10: Key Terms
Upon considering any availability or reliability value, one should always ask for a definition of failure. Moving forward without a clear definition of failure is like advertising the fuel efficiency of an automobile as "miles per tank" without defining the capacity of the tank in liters or gallons. To address this ambiguity, one should start from one of the two basic definitions of a failure given by the IEC (International Electrotechnical Commission):
1. The termination of the ability of the product as a whole to perform its required function.
2. The termination of the ability of any individual component to perform its required function, but not the termination of the ability of the product as a whole to perform.

Slide 11: Key Terms
MTBF, Mean Time Between Failures, is a basic measure of a system's reliability. It is typically expressed in hours. The higher the MTBF, the higher the reliability of the product.

MTTR, Mean Time to Recover (or Repair), is the expected time to recover a system from a failure. This may include the time it takes to diagnose the problem, the time it takes to get a repair technician onsite, and the time it takes to physically repair the system. Like MTBF, MTTR is expressed in hours. MTTR impacts availability, not reliability. The longer the MTTR, the worse off a system is. Simply put, if it takes longer to recover a system from a failure, the system is going to have lower availability. As the MTBF goes up, availability goes up.
As the MTTR goes up, availability goes down.

Slide 12: The Limitations of 99.999%
As mentioned before, 5 9's is a misleading term because its use has become diluted. 5 9's has been used to refer to the amount of time that data center systems are available. In other words, a data center that has achieved 5 9's is functioning 99.999% of the time. However, the frequency of failure is only one part of the equation. The other part of the availability equation is how long it takes to recover from a failure.

Let's take, for example, two data centers that are both considered 99.999% available. In one year, Data Center A lost power once, but the outage lasted a full 5 minutes. Data Center B lost power 10 times, but for only 30 seconds each time. Both data centers were without power for a total of 5 minutes. The missing detail is the recovery time. Any time systems fail, there is a recovery time to get back to an operational state, which includes the time for servers to be rebooted, data to be recovered, and corrupted systems to be repaired. This recovery could take minutes, hours, days, or even weeks. Now, if you consider the two data centers again, you will see that Data Center B, which had 10 outages, will actually have a much longer total duration of downtime than Data Center A, which had only one, because Data Center B must recover from failure 10 times. It is because of this dynamic that reliability is equally important to this discussion of availability. The reliability of a data center speaks to the frequency of downtime in a given time frame. Reliability and time have an inverse relationship: as the time frame increases, reliability decreases. Availability, however, only expresses the percentage of time a system is up over a given duration; it says nothing about how often the system failed.

Slide 13: Factors that Affect Availability and Reliability
Numerous factors affect data center availability and reliability.
Some of these include AC power conditions, lack of adequate cooling in the data center, equipment failure, natural and artificial disasters, and human error.

Slide 14: AC Power Conditions
Let's look first at AC power conditions. Power quality anomalies are organized into seven categories based on wave shape:
1. Transients
2. Interruptions
3. Sag/undervoltage
4. Swell/overvoltage
5. Waveform distortion
6. Voltage fluctuations
7. Frequency variations

Slide 15: Inadequate Cooling
Another factor that poses a significant threat to availability is a lack of cooling in the IT environment. IT equipment like servers and storage generates heat. In the data center environment, where a large quantity of heat is being generated, the potential exists for significant downtime unless this heat is removed from the space.

Slide 16: Inadequate Cooling
Cooling systems are needed in the data center to remove this heat; however, if the cooling is not distributed properly, hot spots can occur.

Slide 17: Inadequate Cooling
Hot spots within the data center further threaten availability. In addition, inadequate cooling significantly detracts from the lifespan and availability of IT equipment. When designing the data center layout, a hot aisle/cold aisle configuration is recommended. Hot spots can also be alleviated by properly sized cooling systems and by supplemental spot coolers and air distribution units.

Slide 18: Equipment Failures
The health of IT equipment is an important factor in ensuring a highly available system, as equipment failures pose a significant threat to availability. Failures can occur for a variety of reasons, including damage caused by prolonged exposure to improper utility power.
Other causes include prolonged exposure to elevated or reduced temperatures, humidity, component failure, and equipment age.

Slide 19: Natural and Artificial Disasters
Disasters also pose a significant threat to availability. Hurricanes, tornadoes, floods, and the blackouts that often follow these disasters all create tremendous opportunity for downtime. In many of these cases, downtime is prolonged due to damage sustained by the power grid or by the physical site of the data center itself.

Slide 20: Human Error
According to Gartner Group, the largest single cause of downtime is human error or personnel issues. One of the most common causes of intermittent downtime in the data center is poor training. Data center staff and contractors should be trained on procedures for application failures/hangs, system updates/upgrades, and other tasks that can create problems if not done correctly.

Slide 21: Human Error
Another problem is poor documentation. As staff sizes have shrunk, and with all the changes in the data center due to rapid product cycles, it's harder and harder to keep documentation current. Patches can go awry as incorrect software versions are updated. Hardware fixes can fail if the wrong parts are used.

Slide 22: Human Error
Another area of potential downtime is the management of systems. System management has fragmented from a single point of control to vendors, partners, ASPs (application service providers), outsource suppliers, and even a number of internal groups. With a variety of vendors, contractors, and technicians freely accessing the IT equipment, errors are inevitable.
Technologies like AI and data analytics are enabling a reduction in human error, as maintenance programs shift from calendar-based to condition-based.

Slide 23: Cost of Downtime
It is important to understand the cost of downtime to a business and, specifically, how that cost changes as a function of outage duration. Lost revenue is often the most visible and easily identified cost of downtime, but it is only the tip of the iceberg when discussing the real costs to the organization. In many cases, the cost of downtime per hour remains constant. In other words, a business that loses at a rate of 100 dollars per hour in the first minute of downtime will also lose at the same rate of 100 dollars per hour after an hour of downtime. An example of a company that might experience this type of profile is a retail store, where a constant revenue stream is present. When the systems are down, there is a relatively constant rate of loss.

[Figure: Cost of downtime per hour vs. duration of outage]

Slide 24: Cost of Downtime
Some businesses, however, may lose the most money within the first 500 milliseconds of downtime and then lose very little thereafter. For example, a semiconductor fabrication plant loses the most money in the first moments of an outage because, when the process is interrupted, the silicon wafers that were in production can no longer be used and must be scrapped.

[Figure: Cost of downtime per hour vs. duration of outage]

Slide 25: Cost of Downtime
Still others may lose at a lower rate during a short outage (since revenue is not lost but simply delayed), but as the duration lengthens, there is an increased likelihood that the revenue will not be recovered. Regarding customer satisfaction, a short outage may often be acceptable, but as the duration increases, more customers will become increasingly upset.
An example of this might be a car dealership, where customers are willing to delay a transaction for a day. With significant outages, however, public knowledge often results in damaged brand perception and inquiries into company operations. All of these activities result in a downtime cost that begins to accelerate quickly as the duration becomes longer.

[Figure: Cost of downtime per hour vs. duration of outage]

Slide 26: Cost of Downtime
Costs associated with downtime can be classified as direct and indirect. Direct costs are easily identified and measured in terms of hard dollars. Examples include:
1. Wages and costs of employees who are idled due to the unavailability of the network. Although some employees will be idle, their salaries and wages continue to be paid. Other employees may still do some work, but their output will likely be diminished.
2. Lost revenues, which are the most obvious cost of downtime, because if you cannot process customers, you cannot conduct business. Electronic commerce magnifies the problem, as eCommerce sales are entirely dependent on system availability.
3. Wages and cost increases due to induced overtime or time spent checking and fixing systems. The employees who were idled by the system failure are probably the same employees who will go back to work and recover the system via data entry. They not only have to do their "day job" of processing current data, but they must also re-enter any data that was lost due to the system crash, or enter new data that was handwritten during the system outage. This means additional hours of work, most often on an overtime basis.
4. Legal costs. Depending on the nature of the affected systems, the legal costs associated with downtime can be significant. For example, if downtime problems result in a significant drop in share price, shareholders may initiate a class-action suit if they believe that management and the board were negligent in protecting vital assets. In another example, if two companies form a business partnership in which one company's ability to conduct business is dependent on the availability of the other company's systems, then, depending on the legal structure of the partnership, the company whose systems went down may be liable to its partner for profits lost during any significant downtime event.

Indirect costs are not easily measured, but they impact the business just the same. In 2000, Gartner Group estimated that 80% of all companies calculating downtime were including indirect costs in their calculations for the first time. Examples include reduced customer satisfaction; lost opportunity from customers who may have gone to direct competitors during the downtime event; damaged brand perception; and negative public relations.

Slide 27: Cost of Downtime by Industry Sector
A business's downtime costs are directly related to its industry sector.

Industry Sector | Revenue/Hour
Energy | $2,817,846
Telecommunications | $2,066,245
Manufacturing | $1,610,654
Financial Institutions | $1,495,134
Information Technology | $1,344,461
Insurance | $1,202,444
Retail | $1,107,274
Pharmaceuticals | $1,082,252

For example, Energy and Telecommunications organizations may experience lost revenues on the order of 2 to 3 million dollars an hour. Manufacturing, Financial Institutions, Information Technology, Insurance, Retail, and Pharmaceuticals all stand to lose over 1 million dollars an hour.
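The per-hour figures in the Slide 27 table turn into a first-order loss estimate by multiplying a loss rate by outage duration, which is the approach Slide 28 formalizes for lost revenue and lost productivity. The following is a minimal Python sketch of those two calculations, not a tool from the course. It assumes the loss rate stays constant for the whole outage (the retail-style profile of Slide 23), so it understates front-loaded or accelerating losses. The `employees` headcount multiplier and the example wage figures are hypothetical.

```python
# Minimal sketch (not from the course): first-order downtime cost estimates.
# Assumption: the loss rate is constant for the whole outage (Slide 23 profile),
# so results understate front-loaded or accelerating losses.

# Revenue per hour by industry sector, in US dollars, from the Slide 27 table.
SECTOR_REVENUE_PER_HOUR = {
    "Energy": 2_817_846,
    "Telecommunications": 2_066_245,
    "Manufacturing": 1_610_654,
    "Financial Institutions": 1_495_134,
    "Information Technology": 1_344_461,
    "Insurance": 1_202_444,
    "Retail": 1_107_274,
    "Pharmaceuticals": 1_082_252,
}


def revenue_lost(normal_hourly_sales: float, hours_down: float) -> float:
    """Revenue lost = normal hourly sales x hours of downtime."""
    return normal_hourly_sales * hours_down


def lost_productivity(avg_hourly_salary: float, hourly_benefits: float,
                      hourly_overhead: float, hours_down: float,
                      employees: int = 1) -> float:
    """(Salary + benefits + overhead per hour) x hours of downtime.

    `employees` is a hypothetical headcount multiplier; leave it at 1 if the
    hourly inputs already cover the whole affected group.
    """
    hourly_cost = avg_hourly_salary + hourly_benefits + hourly_overhead
    return hourly_cost * hours_down * employees


if __name__ == "__main__":
    outage_hours = 0.5  # hypothetical 30-minute outage
    for sector, rate in SECTOR_REVENUE_PER_HOUR.items():
        print(f"{sector:<24}${revenue_lost(rate, outage_hours):>14,.0f}")
    # Hypothetical group: 50 staff at $40 salary, $12 benefits, $8 overhead per hour.
    cost = lost_productivity(40, 12, 8, outage_hours, employees=50)
    print(f"Lost productivity: ${cost:,.0f}")
```

Adding the two results gives only a floor on the total cost: the indirect costs listed in Slide 26 are not captured, and, as Slide 28 notes, employees usually contribute more value than they cost.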
Slide 28: Calculating Cost of Downtime
There are many ways to calculate the cost of downtime for an organization. For example, one way to estimate the revenue lost due to a downtime event is to look at normal hourly sales and then multiply that figure by the number of hours of downtime:

Revenue lost = Normal hourly sales × Hours of downtime

Remember, however, that this is only one component of a larger equation and, by itself, seriously underestimates the true loss. Another example is loss of productivity. The most common way to calculate the cost of lost productivity is to first take an average of the hourly salary, benefits, and overhead costs for the affected group. Then, multiply that figure by the number of hours of downtime:

Lost productivity = (Average hourly salary + Benefits + Overhead costs) × Hours of downtime

Because companies are in business to earn profits, the value employees contribute is usually greater than the cost of employing them. Therefore, this method provides only a very conservative estimate of the labor cost of downtime.

Slide 29: Summary
• To stay competitive in today's global marketplace, businesses must strive to achieve high levels of availability and reliability. 99.999% availability is a commonly stated target for most businesses.
• Power outages, inadequate cooling, natural and artificial disasters, and human error pose a significant barrier to high availability.
• The direct and indirect costs of downtime in many business sectors can be exorbitant, and are often enough to bankrupt an organization.
• Therefore, it is critical for businesses today to calculate their level of availability in order to reduce risks and increase overall reliability and availability.

Slide 30: Thank You!
Thank you for participating in this course.