Example¹ Root Cause Analysis Report


Focal Point: Customer Complaints

Report Number: RCA 2012.67


Report Date: 4/2/2012
RCA Owner: Problem Manager

Problem Statement
Focal Point: Customer Complaints

When
Date: 03/05/2012
Time: 8:44am - 2:31pm
Unique Timing: While database admin was on vacation

Where
System: Company website, Company IT infrastructure
Location: Philadelphia, PA

Impact

                    Actual Impact      Potential Impact          Cost
Revenue:            High               Much higher               $1,500,000
Customer Service:   Negative impact    More customers impacted   N/A
Publicity:          Negative effect    More negative exposure    Unknown

Total:                                                           $1,500,000
Frequency: Two times overall
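
As a rough illustration of the figures above, the outage window and the cost per minute of downtime can be worked out as follows. This is only a sketch: it assumes the full $1,500,000 is attributable to the single 8:44am to 2:31pm window, which the report does not state explicitly, and the function name is hypothetical.

```python
from datetime import datetime

def outage_minutes(start_24h: str, end_24h: str) -> int:
    """Length of a same-day outage in whole minutes (24-hour 'HH:MM' strings)."""
    fmt = "%H:%M"
    delta = datetime.strptime(end_24h, fmt) - datetime.strptime(start_24h, fmt)
    return int(delta.total_seconds() // 60)

# 8:44am and 2:31pm from the Problem Statement, written in 24-hour form.
minutes = outage_minutes("08:44", "14:31")   # 347 minutes, i.e. 5 h 47 min

# Assumption: the whole reported cost belongs to this one outage window.
reported_cost = 1_500_000
cost_per_minute = reported_cost / minutes

print(f"Outage length: {minutes // 60} h {minutes % 60} min")
print(f"Approximate cost per minute: ${cost_per_minute:,.0f}")
```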

Cause and Effect Summary

On March 5, 2012 we received numerous complaints from customers about our website being down
while they were attempting to use it. The website was down from approximately 8:44am to 2:31pm
EST. Customers were unable to use our site because they were receiving "500"-type errors from our
web server. "500" errors prevent users from accessing the website. The server was returning "500"
errors because the application server which processes requests was timing out, and we have only one
application server.

The application server was timing out because it was receiving requests while the associated database
was not working. The database was not working because the SQL server was not processing queries.
The SQL server could not process queries because the transaction log had stopped growing. The
log couldn't grow because the T:Drive was full and we were using only one database cluster. There
was only one database cluster in use because we have only two, and the other cluster was being used
for UAT. The drive was full because its capacity is fixed, the log files kept growing, and the logs,
which must be truncated to reduce storage needs, were not truncated. The logs
weren't truncated because the database administrator (DBA) is tasked with manually truncating them,
and he was on vacation. The backup DBA was not aware the logs needed truncating because there
was no process in place to inform the backup DBA of critical tasks.

¹ Note: This is an example only! The main source of information for this report is a Sologic RCA client, but specific information has been omitted to ensure anonymity.

Solutions

ID Label Detail
1 Cause: No process in place to inform backup DBA
Solution: Implement process to notify backup DBA of critical tasks when taking
over duties.
Assigned: Jennifer Elderberry
Due: No due date assigned – example only!
Term: Medium
Notes: This would be an automated process to notify the backup DBA.
Est. Cost: No estimated cost available – example only!

2 Cause: Backup DBA not informed about truncating needs
Solution: Create a document highlighting DBA duties for use in case of turnover or
when an emergency backup DBA is appointed.
Assigned: Jennifer Elderberry
Due: No due date assigned – example only!
Term: Medium
Notes: This would work in conjunction with Solution #1, but would be
delivered to the backup DBA well in advance.
Est. Cost: No estimated cost available – example only!

3 Cause: Logs are manually truncated by DBA
Solution: Explore automating log truncation options and make a
recommendation to management.
Assigned: Dave Flynn
Due: No due date assigned – example only!
Term: Short
Notes: Not really a “solution”, but if we can make this happen it will help
reduce the risk of recurrence.
Est. Cost: No estimated cost available – example only!

4 Cause: Only one database cluster in use
Solution: Use multiple databases for application servers
Assigned: Jennifer Elderberry
Due: No due date assigned – example only!
Term: Medium
Est. Cost: No estimated cost available – example only!

 
5 Cause: T:Drive at zero bytes free
Solution: Increase space on T:Drives
Assigned: Ted Dezember
Due: No due date assigned – example only!
Term: Medium
Est. Cost: No estimated cost available – example only!
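
The note on Solution 1 mentions an automated process to notify the backup DBA. One possible shape for such a check is sketched below: it looks at free space on the log drive and produces a message when space runs low. The 20% threshold, the function names, and the message wording are all assumptions made for illustration; the report does not specify how the notification would work or how it would be delivered.

```python
import shutil
from typing import Optional

def needs_alert(free_bytes: int, total_bytes: int,
                min_free_fraction: float = 0.20) -> bool:
    """Return True when free space has dropped below the chosen fraction."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return free_bytes / total_bytes < min_free_fraction

def drive_alert_message(path: str,
                        min_free_fraction: float = 0.20) -> Optional[str]:
    """Build a notification for the backup DBA, or None if space is adequate."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    if needs_alert(usage.free, usage.total, min_free_fraction):
        return (f"Log drive {path}: {usage.free} of {usage.total} bytes free; "
                "transaction logs may need truncating")
    return None

# Hypothetical usage on the database host (the drive letter comes from the
# report; how the message reaches the backup DBA is left open):
#     message = drive_alert_message("T:\\")
#     if message is not None:
#         print(message)
```

A check like this addresses the notification gap only; it does not truncate the logs or add capacity, so it would complement Solutions 3 and 5 rather than replace them.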

 
[Cause and Effect Chart: box labels only]

Customer Complaints
Customers need/want to access site
Customers attempted to access site
Chose to contact web support with complaint
Customers not able to access our web site
"500" errors prevent access to website
Web server returned error ("500"-type)
Time outs result in "500" errors
The application server was timing out
Only one application server exists
People visiting website
Site was live
Requests made of application server
Application server processes requests
Application server relies on working database
Database not working
Functional database relies on working SQL server
SQL server was not processing queries
SQL trans. log needs to grow to process queries
Transaction log was unable to grow
Transaction log located on T:Drive
Only one database cluster in use
We only have two SQL clusters
Other SQL cluster being used for UAT testing
Storage required for log to grow
T:Drive damaged OR T:Drive at zero bytes free
Storage file size fixed
Logs were not truncated
Logs are manually truncated by DBA
Database Admin (DBA) was on vacation
Company only has one DBA
Backup DBA not aware logs needed truncating
No process in place to inform backup DBA
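
The causal chain described in the Cause and Effect Summary can also be written down as a small graph and walked to find the causes that have no further cause beneath them. The sketch below is a simplified reading of the summary, not a faithful copy of the chart: the exact edges between boxes are an assumption, several supporting boxes are left out, and the names `CAUSES` and `root_causes` are hypothetical.

```python
# Effect -> list of direct causes (simplified; edges are an assumption
# based on the written summary, not on the chart's connecting lines).
CAUSES = {
    "Customer complaints": ["Customers not able to access website"],
    "Customers not able to access website": ['Web server returned "500"-type errors'],
    'Web server returned "500"-type errors': [
        "Application server was timing out",
        "Only one application server exists",
    ],
    "Application server was timing out": ["Database not working"],
    "Database not working": ["SQL server was not processing queries"],
    "SQL server was not processing queries": ["Transaction log was unable to grow"],
    "Transaction log was unable to grow": [
        "T:Drive at zero bytes free",
        "Only one database cluster in use",
    ],
    "Only one database cluster in use": ["Other SQL cluster being used for UAT testing"],
    "T:Drive at zero bytes free": ["Storage file size fixed", "Logs were not truncated"],
    "Logs were not truncated": [
        "Logs are manually truncated by DBA",
        "Database Admin (DBA) was on vacation",
        "Backup DBA not aware logs needed truncating",
    ],
    "Backup DBA not aware logs needed truncating": [
        "No process in place to inform backup DBA",
    ],
}

def root_causes(effect, graph):
    """Depth-first walk returning causes with no recorded cause of their own."""
    direct = graph.get(effect)
    if not direct:
        return [effect]
    found = []
    for cause in direct:
        for root in root_causes(cause, graph):
            if root not in found:  # keep first occurrence, preserve order
                found.append(root)
    return found

for root in root_causes("Customer complaints", CAUSES):
    print(root)
```

In this simplified graph, three of the end points (manual truncation, no process to inform the backup DBA, and only one database cluster in use via the UAT branch) line up with the causes targeted in the Solutions table.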
