Welcome to Scribd, the world's digital library. Read, publish, and share books and documents. See more ➡
Download
Standard view
Full view
of .
Add note
Save to My Library
Sync to mobile
Look up keyword
Like this
0Activity
×
0 of .
Results for:
No results containing your search query
P. 1
What is the #1 cause of Downtime and Security Breaches?

What is the #1 cause of Downtime and Security Breaches?

Ratings: (0)|Views: 70|Likes:
Published by Amy Marion
Companies are investing in high-availability systems and performance monitoring solutions for data centers, but are failing to follow best practice procedures to avoid human errors. As complexity grows in IT infrastructure, administrators are searching for solutions that will help them effectively monitor and maintain these environments. But oddly enough, the simple question “Who last accessed the server and what did he do?” remains one of the toughest questions to answer. This is despite the variety of system management tools in use today. It is not enough to just monitor servers and applications when the #1 cause for server downtime is human error.
Companies are investing in high-availability systems and performance monitoring solutions for data centers, but are failing to follow best practice procedures to avoid human errors. As complexity grows in IT infrastructure, administrators are searching for solutions that will help them effectively monitor and maintain these environments. But oddly enough, the simple question “Who last accessed the server and what did he do?” remains one of the toughest questions to answer. This is despite the variety of system management tools in use today. It is not enough to just monitor servers and applications when the #1 cause for server downtime is human error.

More info:

Published by: Amy Marion on Dec 15, 2011
Copyright:Attribution Non-commercial

Availability:

Read on Scribd mobile: iPhone, iPad and Android.
download as PDF, TXT or read online from Scribd
See More
See less

12/15/2011

pdf

text

original

 
 
`
People Auditing
Why Are We Ignoring the #1 Cause of Downtime and Security Breaches?
An ObserveIT Whitepaper
March 2008© Copyright 2008 ObserveIT Ltd.
 
 People Auditing: Why Are We Ignoring the #1 Cause of Server Downtime?www.observeit-sys.com
2
Table of Contents
Executive Summary .............................................................................................................................. 2The Challenge of Human Errors ............................................................................................................ 3The Limitations of Log-Based Troubleshooting ...................................................................................... 4People Auditing .................................................................................................................................... 5Conclusion ............................................................................................................................................ 6About ObserveIT................................................................................................................................... 6
Executive Summary
Companies are investing in high-availabilitysystems and performance monitoring solutions fordata centers, but are failing to follow best practiceprocedures to avoid human errors.As complexity grows in IT infrastructure,administrators are searching for solutions that willhelp them effectively monitor and maintain theseenvironments. But oddly enough, the simplequestion “Who last accessed the server and whatdid he do?” remains one of the toughest questionsto answer. This is despite the variety of systemmanagement tools in use today. It is not enough to just monitor servers and applications when the #1cause for server downtime is human error. Ask anexpert about high availability, and the conversationquickly turns to the subject of human error.The problem of maintaining uptime is exacerbatedby the increased dependency on outsourcing,offshore consultants, temporary employees anddevelopers that administer servers and softwarefrom multiple vendors. This situation brings adecrease in direct accountability. Furthermore, thegeographic dispersion of admin-level access makesverbal investigation more complicated by orders of magnitude.To achieve efficient operations, administratorsneed a holistic view of the entire IT infrastructure,including monitoring of human factor.By adding user activity auditing as a core propertyof system monitoring platforms, IT departmentscan easily achieve higher availability, through fasterproblem identification and time to resolution. Inaddition, thorough People Auditing demonstratesthat employees and partners are meetingestablished guidelines for information access,transaction integrity and intellectual propertyprotection.
 
The Challenge of Human Errors
Incorporating Human Error Monitoring Into the System Management Infrastructure
In spite of the adoption of monitoring platforms, the processes available for troubleshooting service outages and securing IT servers has remained out-of-sync with the natural human analysis of IT administrators.
IT Concerns
Service Outage: Smart Users Make “Smart” Mistakes
Recent studies show that human errors account forapproximately 34% of total outages when countingthe number of outages. Factoring in the total effectof each incident (as measured by Mean Time toRepair and amount of lost data), 56% of totalserver outage impact comes as a result of humanerror.
OutageType% of IncidentsTotal Impact(Downtime /Data Loss)System Error
66% 44%
Human Error
34% 56%
For system errors, the tedious process of loganalysis has been alleviated partially by theadoption of system monitoring platforms andsoftware profiling utilities.But for human errors, administrators remain at aloss for assistance when searching for potentialcauses. There is no indicator that “This is probablya human error”, or even the simple statement that“Someone touched this server around the sametime as the start of the problem.”A corollary of this issue is that smart users makethe “smartest” mistakes. They know the nooks andcrannies of arcane configuration files that mighttweak an extra 5% of performance out of thesystem. These nooks and crannies are subsequentlythe most difficult to find when they go wrong.
Compliance Auditing and Security 
In today’s world of compliance regulations, anadditional trend of outsourcing, temporarycontract employees and remote IT support staff creates a significant loophole to compliancy efforts:This rise of outsourcing brings with it a rise inmission-critical data being handled via RemoteDesktop Protocol (RDP).Of course, the bottom-line complianceresponsibilities remain solely in the hands of thecore IT team. You can outsource Support, but youcan’t outsource Compliance. IT managers todaymay know how their platforms are secured fromexternal access, but the question “How can we besure that third-party vendors with RDP access arenot accessing data that they should not betouching?” is more difficult to answer.
How Human Error Differs from System Error 
There are a few fundamental differences betweensystem errors and human errors, and the way thatthey each need to be managed.The first basic difference is repeatability. Forsystem errors, finding the problem is equivalent tofixing the problem. If your troubleshooting processleads to a conclusion that a NIC card is not working,then swapping in a new card closes the issue, andyou can sleep well that evening.Human errors are not this direct. If you find animproper configuration file and swap it with thecorrect configuration file, the problem may besolved temporarily. But when you go to bed thatnight, you’re probably still scratching your head,wondering who or what caused this error, andwhether it will happen again tomorrow.

You're Reading a Free Preview

Download
/*********** DO NOT ALTER ANYTHING BELOW THIS LINE ! ************/ var s_code=s.t();if(s_code)document.write(s_code)//-->