Professional Documents
Culture Documents
SoCC15 Jian PDF
SoCC15 Jian PDF
Jian Huang
Xuechen Zhang† Karsten Schwan
†
Why Issue Study Matters?
Scalable distributed systems are complex [Yuan et al., OSDI’14]
Complicated System
2
Why Issue Study Matters?
Scalable distributed systems are complex [Yuan et al., OSDI’14]
+
Complicated System Error-prone
2
Why Issue Study Matters?
Scalable distributed systems are complex [Yuan et al., OSDI’14]
+ +
Complicated System Error-prone Hard to Debug
2
Why Issue Study Matters?
Scalable distributed systems are complex [Yuan et al., OSDI’14]
+ +
Complicated System Error-prone Hard to Debug
Issue Study
Issue Pattern
2
Why Issue Study Matters?
Scalable distributed systems are complex [Yuan et al., OSDI’14]
+ +
Complicated System Error-prone Hard to Debug
Issue Study
+
Issue Pattern Better Software & Debugging Tools
2
Hadoop: A Representative Distributed System
3
Hadoop: A Representative Distributed System
HDFS (Storage) MapRedue (Computation)
10
Number of Reported Issues
8
6
(x1000)
4
2
0
2008 2009 2010 2011 2012 2013 2014 2015
The Evolution of Apache Hadoop
3
Hadoop: A Representative Distributed System
HDFS (Storage) MapRedue (Computation)
10
Number of Reported Issues
8
6
……
(x1000)
4
2
0
2008 2009 2010 2011 2012 2013 2014 2015
The Evolution of Apache Hadoop
3
Hadoop: A Representative Distributed System
HDFS (Storage) MapRedue (Computation)
10
Number of Reported Issues
8
6
……
(x1000)
4
2
0
2008 2009 2010 2011 2012 2013 2014 2015
The Evolution of Apache Hadoop
4
What Can We Learn From Issues?
Related Work
[Gunawi et al., SoCC’14]
What Bugs Live in the Cloud?
Programming
Systems Tools
4
Our Findings
5
Our Findings
MapReduce HBase
HDFS
Hadoop Ecosystem
6
Methodology Used in Our Study
Hive … Pig Flume
HCatalog … Mahout Cascading
Computation MapReduce HBase
Storage HDFS
Hadoop Ecosystem
6
Methodology Used in Our Study
Hive … Pig Flume
HCatalog … Mahout Cascading
Computation MapReduce HBase
Storage HDFS
Hadoop Ecosystem
6
Methodology Used in Our Study
Hive … Pig Flume
HCatalog … Mahout Cascading
Computation MapReduce HBase
Storage HDFS
Hadoop Ecosystem
6
Methodology Used in Our Study
Hive … Pig Flume
HCatalog … Mahout Cascading
Computation MapReduce HBase
Storage HDFS
Hadoop Ecosystem
6
Methodology Used in Our Study
Issues
Labeling
External Correlation
correlated issues appear in other systems
A
Internal Correlation
correlated issues appear in the same system
B C
B C
A
A significant number of issues are independent.
B C
A
Half of them are from YARN.
B C
B C
B C
8
How the Issues Are Correlated?
Similar Causes
Issues have similar causes
Fix on Fix
Issues are caused by fixing other issues
8
How the Issues Are Correlated?
30
20
10
0
HDFS MapReduce 8
How the Issues Are Correlated?
30
20
10
0
HDFS MapReduce 8
How the Issues Are Correlated?
30
20
10
0
HDFS MapReduce 8
On the Issue Correlations with System Characteristics
Programming
Systems Tools
9
On the Issue Correlations with System Characteristics
27%
47% 26%
Programming
Systems Tools
9
How Issues Relate to Systems?
10
How Issues Relate to Systems?
100
90
80
70
Percentage (%)
60
50
40
30
20
10
0
HDFS MapReduce
security networking storage file system memory cache 10
How Issues Relate to Systems?
100
90
80
GC is still the No.1 concern,
70
Percentage (%)
60
50 File system semantic:
40 namespace management, file permission,
30 consistency (e.g., fsck), etc.
20
10
0
HDFS MapReduce
security networking storage file system memory cache 10
How Issues Relate to Systems?
100 The statement of the 99.99% of data
90 reliability in cloud storage is challenged.
80
70
Percentage (%)
leak !
60 Peer peer = newTcpPeer(dnAddr);
50 - return newBlockReader(…)
+ try{
40 + reader = newBlockReader(…)
30 + return reader
20 + } catch (IOException ex) {
+ throw ex;
10 + } finally {
0 + if(reader == null) closeQuietly(peer);
HDFS MapReduce + }
60
50
40
30
20
10
0
HDFS MapReduce
typo lock interface maintenance
11
How Issues Relate to Programming?
100
90
80
70
Percentage (%)
60
50 Half of them relate to code maintenance.
40
30
20
10
0
HDFS MapReduce
typo lock interface maintenance
11
How Issues Relate to Programming?
100
90 Mainly caused by interface changes.
80
70
Percentage (%)
60
50
40
30
20
10
0
HDFS MapReduce
typo lock interface maintenance
11
How Issues Relate to Programming?
100 5.6% of programming issues
90 are caused by typos !
80
70
Percentage (%)
60
50
40
30
20
10
0
HDFS MapReduce
configuration debugging documents testing 12
How Issues Relate to Tools?
100
Logs are misleading:
90
incorrect, incomplete, indistinct output.
80
70
Percentage (%)
60
50
40
30
20
10
0
HDFS MapReduce
configuration debugging documents testing 12
How Issues Relate to Tools?
100
Logs are misleading:
90
incorrect, incomplete, indistinct output.
80
70
Percentage (%)
60
50 Accessing a non-exist file via WebHDFS,
40 FileNotFoundException is expected, but we get this
30
20
10
0
HDFS MapReduce Logs
Programming
Systems Tools
13
Thanks!
Jian Huang
jian.huang@gatech.edu
Q&A