Software Analytics
Data Analytics for Software Engineering
gebeyehu2009@gmail.com
Software big data represents data about the coding, design, flow, usability, and other attributes of software and services.
BDU: Bahir Dar Institute of Technology: Computing Faculty
New era … Software itself is changing
Sources of software big data include the software lifecycle itself, intelligent devices, usability data, practitioners, developers, and users.
[Figure: from individual, isolated, code-centric development with in-lab testing to social, data-pervasive development with debugging in the large]
A wealth of data exists across the software lifecycle, including source code, feature specifications, bug reports, test cases, execution traces/logs, and real-world user feedback.
Data plays an essential role in modern software development, because hidden in the
data is information about the quality of software and services as well as the
dynamics of software development.
System quality, such as reliability, performance, and security, is key to the success of modern software systems.
As system scale and complexity increase, larger amounts of data, e.g., run-time traces and logs, are generated, and this data has become a critical medium for monitoring, analyzing, understanding, and improving system quality.
Usage data collected from the real world reveals how users interact with software
and services.
[Figure: Software System and Software Users, covering different areas of the software domain]
Software data analysis works with records that are updated in real time; because software data change from day to day, the analysis must be updated on a daily basis.
For each research topic, as proposed, we have to locate the data features that make the data palatable for computational analysis.
How should we organize the data so that an analyst wishing to study trends related to our research goals and interests can narrow the data down and perform the necessary analytics to gain value?
Keep in mind that the data come in varied formats (numbers, x-y addresses, text, databases, video, audio).
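One way to make heterogeneous data easy to narrow down is to attach the same small set of metadata to every record, whatever its format, and filter on that metadata. The sketch below assumes a hypothetical `SoftwareRecord` schema (source, format, research topics, collection day) and a hypothetical `narrow` helper; neither comes from the slides.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Iterable, List, Optional, Set


@dataclass
class SoftwareRecord:
    """Hypothetical schema: one piece of software data plus filterable metadata."""
    source: str                                    # e.g. "bug_report", "execution_log"
    fmt: str                                       # e.g. "text", "number", "video", "audio"
    topics: Set[str] = field(default_factory=set)  # research topics it relates to
    day: date = date(1970, 1, 1)                   # when the record was collected
    payload: object = None                         # the raw data itself, in any format


def narrow(records: Iterable[SoftwareRecord],
           topic: Optional[str] = None,
           fmt: Optional[str] = None,
           since: Optional[date] = None) -> List[SoftwareRecord]:
    """Keep only the records that match every filter that was supplied."""
    selected = []
    for r in records:
        if topic is not None and topic not in r.topics:
            continue
        if fmt is not None and r.fmt != fmt:
            continue
        if since is not None and r.day < since:
            continue
        selected.append(r)
    return selected
```

For example, `narrow(records, topic="reliability", fmt="text")` would keep only textual records tagged with the reliability topic; the raw payload stays untouched, so video or audio data can be filtered the same way as text.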
Output – insightful information
• Conveys meaningful and useful understanding or knowledge
towards completing the target task
• Not easily attainable by directly investigating raw data without the aid of
analytics technologies
• Examples
– It is easy to count the number of re-opened bugs, but how can we find the
primary reasons they were re-opened?
– When the availability of an online service drops below a threshold, how can we
localize the problem?
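A minimal sketch of the second example, assuming the service's request logs can be reduced to hypothetical `(component, succeeded)` pairs: compute availability per component and report the components that fall below the threshold, worst first. This is only a first localization step; it narrows the search to suspect components rather than finding the root cause.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def availability_by_component(requests: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Fraction of successful requests seen by each component."""
    total: Dict[str, int] = defaultdict(int)
    ok: Dict[str, int] = defaultdict(int)
    for component, succeeded in requests:
        total[component] += 1
        if succeeded:
            ok[component] += 1
    return {c: ok[c] / total[c] for c in total}


def localize(requests: Iterable[Tuple[str, bool]],
             threshold: float = 0.99) -> List[Tuple[str, float]]:
    """Components whose availability is below `threshold`, worst first."""
    avail = availability_by_component(requests)
    suspects = [(c, a) for c, a in avail.items() if a < threshold]
    return sorted(suspects, key=lambda item: item[1])
```

The default threshold of 0.99 is an arbitrary illustrative value; in practice it would come from the service's availability target.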
Output – actionable information
• Enables software practitioners to come up with concrete solutions
towards completing the target task
• Examples
– Why were bugs re-opened?
• A list of bug groups, each sharing the same reason for re-opening
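The actionable output for the re-opened-bug question is a list of bug groups, each sharing one re-opening reason. The sketch below produces that shape of output using simple keyword rules on the re-open comment; the rules, reason names, and the `group_reopened` helper are hypothetical stand-ins, not the analysis technique from the course. A real study would more likely derive reasons from the data, for example by clustering.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical keyword rules mapping a re-open comment to a reason.
REASON_KEYWORDS = {
    "incomplete fix": ("still happens", "not fixed", "regression"),
    "wrong environment": ("different os", "other platform", "config"),
    "missing test": ("no test", "untested"),
}


def reopen_reason(comment: str) -> str:
    """Return the first reason whose keywords appear in the comment."""
    text = comment.lower()
    for reason, keywords in REASON_KEYWORDS.items():
        if any(k in text for k in keywords):
            return reason
    return "unknown"


def group_reopened(bugs: Dict[int, str]) -> Dict[str, List[int]]:
    """Map each reason to the IDs of re-opened bugs that share it.

    `bugs` maps a bug ID to the comment left when the bug was re-opened.
    """
    groups: Dict[str, List[int]] = defaultdict(list)
    for bug_id, comment in sorted(bugs.items()):
        groups[reopen_reason(comment)].append(bug_id)
    return dict(groups)
```

Bugs that match no rule land in an "unknown" group, which is itself a useful signal that the rule set needs extending.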
Getting real
• As modern software systems grow more complex, and given the limited
time and resources available before a software release, development-site testing and
debugging are increasingly insufficient to ensure satisfactory software
performance.
Performance debugging in the large
[Figure: pipeline from the network through trace collection, trace storage, and trace analysis to bug filing in a bug database, with a problematic-pattern repository and bug updates feeding back. Annotations: "Key to issue discovery", "Bottleneck of scalability", "How many issues are still unknown?", "Which trace file should I investigate first?"]
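The trace-analysis step of this pipeline can be sketched as follows, under loudly stated assumptions: each trace is a hypothetical list of `(operation, duration_ms)` pairs, a trace's "problematic pattern" is simply the ordered list of operations slower than a fixed cutoff, and the pattern repository is a dictionary from pattern to bug ID. Known patterns lead to a bug update, unseen ones to bug filing, and traces whose pattern occurs most often are suggested for investigation first.

```python
from collections import Counter
from typing import Dict, List, Tuple

Trace = List[Tuple[str, float]]      # hypothetical: (operation name, duration in ms)
Pattern = Tuple[str, ...]


def signature(trace: Trace, slow_ms: float = 100.0) -> Pattern:
    """Hypothetical problematic pattern: the ordered operations slower than slow_ms."""
    return tuple(op for op, ms in trace if ms > slow_ms)


def triage(traces: Dict[str, Trace],
           pattern_repo: Dict[Pattern, int]) -> Tuple[List[str], Dict[str, str]]:
    """Match traces against a problematic-pattern repository.

    Known patterns lead to a bug update; unseen patterns lead to bug filing
    and are added to pattern_repo (which is modified in place).
    Returns the investigation order and the action taken for each trace.
    """
    sigs = {name: signature(t) for name, t in traces.items()}
    sigs = {name: s for name, s in sigs.items() if s}   # drop traces with no slow ops
    freq = Counter(sigs.values())
    actions: Dict[str, str] = {}
    for name, sig in sorted(sigs.items()):
        if sig in pattern_repo:
            actions[name] = f"update bug {pattern_repo[sig]}"
        else:
            # Hypothetical ID scheme: next integer after the largest known bug ID.
            pattern_repo[sig] = max(pattern_repo.values(), default=0) + 1
            actions[name] = f"file bug {pattern_repo[sig]}"
    # Investigate first the traces whose pattern occurs most often.
    order = sorted(sigs, key=lambda name: (-freq[sigs[name]], name))
    return order, actions
```

Ranking by pattern frequency is one simple answer to "which trace file should I investigate first?"; it reflects the slide's point that domain knowledge (here, what counts as a problematic pattern) has to guide the analysis.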
Combination of expertise
• Generic machine learning tools without domain
knowledge guidance do not work well
[Figure: detect a service issue, then investigate the problem, then fix the root cause via postmortem analysis; contrasts a controlled environment with a debugger against live data with no debugger]
[Figure: Information Visualization; labels: Software System, Software Users, Vertical]
Software big data refers to software datasets so large that they overflow ordinary data-management systems.
Software big data is data that references software and its services, to which common analytics techniques, mapping, and software analytics can be applied.
Software big data methods allow multidimensional screening and “data mining” to locate the parts of the mass that show interesting relationships, trends, or comparisons.
These interesting parts of a software big data set can be sorted into small datasets, to which more powerful traditional analysis methods can be applied.
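The screen-then-extract idea can be sketched in a few lines, assuming each record is a dictionary of hypothetical dimensions (such as component and release) plus one numeric metric. Every one- and two-dimensional slice is screened, and a slice is flagged when its mean metric exceeds a chosen factor times the overall mean; the flagging rule and the `screen` helper are illustrative assumptions, not a method prescribed by the slides.

```python
from collections import defaultdict
from itertools import combinations
from statistics import mean
from typing import Dict, List, Tuple

Row = Dict[str, object]


def screen(rows: List[Row], dims: List[str], metric: str,
           factor: float = 2.0) -> Dict[Tuple, List[Row]]:
    """Screen every 1- and 2-dimensional slice of `rows`.

    A slice is flagged when its mean `metric` exceeds `factor` times the
    overall mean; each flagged slice is returned as a small dataset, keyed
    by the (dimension, value) pairs that define it.
    """
    overall = mean(r[metric] for r in rows)
    flagged: Dict[Tuple, List[Row]] = {}
    for k in (1, 2):
        for combo in combinations(dims, k):
            groups: Dict[Tuple, List[Row]] = defaultdict(list)
            for r in rows:
                key = tuple((d, r[d]) for d in combo)
                groups[key].append(r)
            for key, members in groups.items():
                if mean(m[metric] for m in members) > factor * overall:
                    flagged[key] = members
    return flagged
```

Each flagged slice is small enough for conventional statistics or manual inspection, which is the hand-off from big-data screening to traditional analysis described above.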
Question: