
Metrics for Decision Support

Introduction
• Focus on causal models
• So far we have measured how much effort is needed, how much time is
needed, and which software tools are needed to build a quality
product, fix bugs, and so on.
• One more ultimate goal for software metrics is to help software
professionals (be they developers, testers, managers, or maintainers)
make decisions under uncertainty.
• Software can be said to have zero defects only once it is fully
completed and has been used extensively by customers.
1. FROM CORRELATION AND REGRESSION TO CAUSAL MODELS
• Correlation measures the strength of a linear relationship between
two quantitative variables (e.g., height and weight).
• Positive correlation is a relationship in which both variables move in
the same direction: as one variable increases the other increases, and as
one decreases the other decreases. For example, the more you exercise,
the more calories you burn.
• Negative correlation is a relationship in which one variable increases
as the other decreases, and vice versa.
• If the values of one variable can be predicted from the values of the
other with a reasonably high level of accuracy, the relationship between
the two variables is described as a strong correlation (e.g., drying of
clothes).
• A weak correlation is one where, on average, the values of one variable
are related to the other, but there are many exceptions (e.g., purchase
intention on seeing an advertisement).
• R-value and p-value in correlation:
• The R-value (correlation coefficient) gives the direction and strength
of the correlation between two variables (positive, negative, or zero).
It ranges from -1 to +1.
• The p-value is a statistical measurement used to validate a hypothesis
against observed data. It tells us whether the result of an experiment is
statistically significant. It is a probability, so it lies between 0 and
1, and it is used to decide whether or not to reject the null hypothesis.
By convention, if p < 0.05 the null hypothesis is rejected.
• The paired data behind a correlation coefficient are plotted using a
scatter diagram.
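
The R-value and p-value described above can be illustrated with a small sketch. It computes the Pearson correlation coefficient directly from its definition and then estimates a two-sided p-value with an exact permutation test. The permutation test is a choice made for this sketch because it needs only the standard library; the slides do not say which significance test is intended. The data values and the function names `pearson_r` and `permutation_p_value` are hypothetical.

```python
import math
from itertools import permutations


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


def permutation_p_value(xs, ys):
    """Two-sided p-value: share of all reorderings of ys whose |r| is at
    least as large as the observed |r| (null hypothesis: no association)."""
    observed = abs(pearson_r(xs, ys))
    extreme = 0
    total = 0
    for shuffled in permutations(ys):
        total += 1
        if abs(pearson_r(xs, shuffled)) >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Hypothetical data: minutes of exercise and calories burned.
minutes = [10, 20, 30, 40, 50]
calories = [95, 210, 290, 405, 500]

r = pearson_r(minutes, calories)
p = permutation_p_value(minutes, calories)
print(f"r = {r:.4f}, p = {p:.4f}")
```

With only five points the exact test enumerates 120 orderings, so it is practical only for very small samples; larger data sets would normally use a sampled permutation test or a t-based test instead.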
Need for Correlation:

• To measure reliability, validity, data analysis and so on.

• To test whether certain data is consistent with hypothesis.

• To predict one variable on the basis of the knowledge of the other(s).

• To build psychological and educational models and theories.


Regression:
• Correlation, as the name suggests, determines the
interconnection or co-relationship between
variables (the relation between X and Y).
• Regression explains how an independent variable is
numerically associated with the dependent variable.
• In correlation, there is no distinction between
independent and dependent variables.
• Simple linear regression relates X to Y through an
equation of the form Y = a + bX.
• Key differences
• Regression attempts to establish how X causes Y to change, and the results of
the analysis will change if X and Y are swapped. With correlation, the X and Y
variables are interchangeable.
• Regression assumes X is fixed with no error, such as a dose amount or
temperature setting. With correlation, X and Y are typically both random
variables, such as height and weight or blood pressure and heart rate.
• Correlation is a single statistic, whereas regression produces an entire
equation.
• Key advantage of correlation
• Correlation is a more concise (single value) summary of the relationship
between two variables than regression.

• Key advantage of regression


• Regression provides a more detailed analysis which includes an equation
which can be used for prediction and/or optimization.
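
As a rough illustration of the equation Y = a + bX and of the point that swapping X and Y changes the result, the sketch below fits a line by ordinary least squares. The function name `fit_line` and the dose/response numbers are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of ys = a + b * xs; returns (a, b)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("cannot fit a line when all x values are equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx              # slope
    a = mean_y - b * mean_x    # intercept
    return a, b


# Hypothetical data: dose (X) and measured response (Y).
dose = [1, 2, 3, 4, 5]
response = [2, 4, 5, 4, 5]

a, b = fit_line(dose, response)      # regress Y on X
a2, b2 = fit_line(response, dose)    # regress X on Y (roles swapped)

print(f"Y = {a:.2f} + {b:.2f} X")
print(f"X = {a2:.2f} + {b2:.2f} Y")
print(f"predicted response at dose 6: {a + b * 6:.2f}")
```

The second fit is not the algebraic inverse of the first, which is why regression, unlike correlation, depends on which variable is treated as the response.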
| Topic | Correlation | Regression |
|---|---|---|
| When to use | For a quick and simple summary of the direction and strength of pairwise relationships between two or more numeric variables. | To predict, optimize, or explain a numeric response Y from X, a numeric variable thought to influence Y. |
| Quantifies direction of relationship | Yes | Yes |
| Quantifies strength of relationship | Yes | Yes |
| X and Y interchangeable | Yes | No |
| Prediction and optimization | No | Yes |
| Equation | No | Yes |
| Extension to curvilinear fits | No (works for straight line) | Yes |
| Cause and effect | No | Attempts to establish |


Example:
• Which tool, correlation or regression, would you use in each of these
scenarios?
• You have two measuring systems and you want to see how well they agree
with each other, so you measure the same 20 parts with each measuring
system. Answer: correlation.
• You want to predict blood pressure for different doses of a drug.
Answer: regression.
Causal Relationship:
• Causality means that there is a clear cause-effect relationship between two
variables.

• A causal relationship exists when one variable causes a change in
another variable. These relationships are investigated by
experimental research to determine whether changes in one variable
actually result in changes in another variable.
• Example: seasonal change drives the sales of ice cream.
• The independent variable is the cause and the dependent variable is the
effect. Example: drinking and driving (cause) leads to an accident (effect).
• In causality analysis, the direction of the interaction between
variables can be determined: x may determine y, and y may also determine x.
• In regression analysis, the interaction is one-sided: there is a
dependent variable and one or more independent variables.
2. BAYES THEOREM AND BAYESIAN NETWORKS

• In statistics and probability theory, Bayes’ theorem (also known as
Bayes’ rule) is a mathematical formula used to determine the conditional
probability of events. Essentially, Bayes’ theorem describes
the probability of an event based on prior knowledge of the conditions
that might be relevant to the event.


• P(A|B) = P(B|A) P(A) / P(B), where A is the cause and B is the evidence.
Ex:
• Three acres of land have the labels A, B, and C. One acre has reserves
of oil below its surface, while the other two do not.
• The prior probability of oil being found on acre C is one third, or
0.333.
• But if a drilling test is conducted on acre B and the results indicate
that no oil is present there, then the posterior probability of oil
being found on acre A or on acre C becomes 0.5 each, as each acre now
has one chance out of two.
• A posterior probability distribution should be a better reflection of the
underlying truth of a data-generating process than the prior probability, since
the posterior includes more information.
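
The three-acre example can be checked with a few lines of code. This is a minimal sketch that assumes the drilling test is perfectly reliable (it never misses oil and never reports oil that is absent); the function name `posterior` is hypothetical.

```python
from fractions import Fraction


def posterior(priors, likelihoods):
    """Apply Bayes' theorem over a set of mutually exclusive hypotheses.

    priors[h]      = P(h)
    likelihoods[h] = P(evidence | h)
    returns          P(h | evidence) for every h
    """
    joint = {h: priors[h] * likelihoods[h] for h in priors}
    evidence = sum(joint.values())
    if evidence == 0:
        raise ValueError("evidence has zero probability under every hypothesis")
    return {h: joint[h] / evidence for h in joint}


# Prior: the oil is equally likely to be under acre A, B, or C.
priors = {"A": Fraction(1, 3), "B": Fraction(1, 3), "C": Fraction(1, 3)}

# Evidence: the test on acre B finds no oil.
# Assumption for this sketch: the test is perfectly reliable, so the
# evidence is certain if the oil is under A or C and impossible under B.
likelihoods = {"A": Fraction(1), "B": Fraction(0), "C": Fraction(1)}

post = posterior(priors, likelihoods)
for acre, prob in post.items():
    print(acre, prob)
```

Using `Fraction` keeps the arithmetic exact, so the posterior for acres A and C comes out as exactly one half.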


• Bayes' theorem can be used in many applications, such as medicine,
finance, and economics.
• In finance, Bayes' theorem can be used to update a previous belief
once new information is obtained. Prior probability represents what is
originally believed before new evidence is introduced, and posterior
probability takes this new information into account.
• In medicine, Bayes' theorem can be used to determine the accuracy of
medical test results by taking into consideration how likely any given
person is to have a disease and the general accuracy of the test.
• Bayes' rule is a prominent principle used in artificial intelligence to
calculate the probability of a robot's next steps.
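
The medical test application can be made concrete with a short calculation. The prevalence, sensitivity, and false positive rate below are hypothetical numbers picked for illustration, not figures from any real test, and `probability_of_disease_given_positive` is a made-up helper name.

```python
def probability_of_disease_given_positive(prevalence, sensitivity, false_positive_rate):
    """P(disease | positive test) from Bayes' theorem.

    prevalence          = P(disease), the prior
    sensitivity         = P(positive | disease)
    false_positive_rate = P(positive | no disease)
    """
    true_positives = sensitivity * prevalence
    false_positives = false_positive_rate * (1.0 - prevalence)
    return true_positives / (true_positives + false_positives)


# Hypothetical values: 1% of people have the disease, the test detects
# 95% of true cases, and 5% of healthy people test positive anyway.
p = probability_of_disease_given_positive(
    prevalence=0.01, sensitivity=0.95, false_positive_rate=0.05
)
print(f"P(disease | positive) = {p:.3f}")
```

Under these assumed numbers a positive result still leaves the disease fairly unlikely, because the low prior dominates; this is the sense in which the prior and the test accuracy must be considered together.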


• Hypotheses: The events E1, E2, …, En are called the hypotheses.
• Prior probability: The probability P(Ei) is the prior (a priori)
probability of hypothesis Ei.
• Posterior probability: The probability P(Ei|A) is the posterior
(a posteriori) probability of hypothesis Ei, given the observed event A.
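
In terms of these definitions, the general form of Bayes' theorem can be written as follows. This assumes, as is standard, that the hypotheses E1, …, En are mutually exclusive and together cover all possibilities, and that A has non-zero probability.

```latex
P(E_i \mid A) = \frac{P(E_i)\, P(A \mid E_i)}{\sum_{k=1}^{n} P(E_k)\, P(A \mid E_k)},
\qquad i = 1, \dots, n
```

The denominator is P(A) expanded over all hypotheses, so the posteriors of E1, …, En sum to 1.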
Bayesian Network
• A Bayesian network (also known as a Bayes network, belief network, or
decision network) is a probabilistic graphical model that represents a set of
variables and their conditional dependencies via a directed acyclic graph (DAG).

• Bayesian networks (BNs) are an increasingly popular technology for software
testing, cognitive engineering (reasoning, judgement, memory, creativity,
problem solving), and support systems, because probability plays a
major role in each.
• A Bayesian network is a directed, acyclic graph whose nodes
represent random variables and arcs represent direct dependencies.
The arcs often, but not always, also represent direct causal
connections between the variables.

• The goal is to calculate the posterior conditional probability
distribution of each of the possible unobserved causes given the
observed evidence.
• When its arcs are causal, the network is a causal graph: it represents
cause-and-effect relationships.


Bayesian Network to calculate DRE (Defect Removal Efficiency)

[Figure: Directed acyclic graph representing two independent possible causes of a defect removal efficiency. Node labels: Cause (A), Effect/Evidence (B).]
Advantages of Bayesian network
• Bayesian networks offer a graphical representation that is reasonably
interpretable and easily explainable.
• Relationships captured between variables in a Bayesian network are more
complex, yet hopefully more informative, than those in a conventional model.
• Models can reflect both statistically significant information (learned from
the data) and domain expertise simultaneously.
• Multiple metrics can be used to measure the significance of
relationships and help identify the effect of specific actions.
• They offer a mechanism for suggesting counterfactual actions and for
combining actions without aggressive independence assumptions.
• A counterfactual explanation describes a causal situation in the form "If X
had not occurred, Y would not have occurred". For example: "If I hadn't taken
a sip of this hot coffee, I wouldn't have burned my tongue."
3. APPLYING BAYESIAN NETWORKS TO THE PROBLEM OF SOFTWARE DEFECTS PREDICTION
• BNs offer the following benefits:
• Explicitly model causal factors.
• Reason from effect to cause and vice versa.
• Overturn previous beliefs in the light of new evidence (also called “explaining
away”).
• Make predictions with incomplete data.
• Combine diverse types of evidence including both subjective beliefs and
objective data.
• Arrive at decisions based on visible, auditable reasoning.
3.1 A Very Simple BN for Understanding Defect Prediction

Node states in the model:
• Defects: Low, Medium, High
• Testing quality: Good, Poor
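
To show how such a network reasons from effect to cause, here is a minimal sketch of inference by enumeration. The two nodes and their states come from the slide; everything else is an assumption made for illustration: the third node "defects found in testing" (Few or Many), the arc structure (Defects and Testing quality as independent parents of defects found), and every probability in the tables.

```python
from itertools import product

# Hypothetical priors for the two parent nodes.
P_DEFECTS = {"Low": 0.3, "Medium": 0.4, "High": 0.3}
P_QUALITY = {"Good": 0.7, "Poor": 0.3}

# Hypothetical conditional table: P(found = "Many" | defects, quality).
P_MANY_FOUND = {
    ("Low", "Good"): 0.10, ("Low", "Poor"): 0.02,
    ("Medium", "Good"): 0.50, ("Medium", "Poor"): 0.10,
    ("High", "Good"): 0.90, ("High", "Poor"): 0.20,
}


def joint(defects, quality, found):
    """Joint probability of one full assignment of the three nodes."""
    p_many = P_MANY_FOUND[(defects, quality)]
    p_found = p_many if found == "Many" else 1.0 - p_many
    return P_DEFECTS[defects] * P_QUALITY[quality] * p_found


def posterior_defects(found, quality=None):
    """P(Defects | found) or, if quality is given, P(Defects | found, quality)."""
    scores = {d: 0.0 for d in P_DEFECTS}
    for d, q in product(P_DEFECTS, P_QUALITY):
        if quality is not None and q != quality:
            continue  # inconsistent with the observed testing quality
        scores[d] += joint(d, q, found)
    total = sum(scores.values())
    return {d: s / total for d, s in scores.items()}


few_only = posterior_defects("Few")
few_and_poor = posterior_defects("Few", quality="Poor")

print("P(Defects | few found):            ", few_only)
print("P(Defects | few found, poor tests):", few_and_poor)
```

Under these assumed numbers, finding few defects initially favours a low defect count, but learning that testing was poor explains the low count and pushes belief back toward more defects. Enumeration is only practical for tiny networks; real tools use more efficient inference algorithms.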
3.2. A Full Model for Software Defects and Reliability Prediction
Using this model we can assess:
1. Defects
2. Project complexity
3. Testing quality
4. Operational usage (the level of usage of the product by its users)

These in turn predict the reliability level of the software.


3.3 Commercial Scale Versions of the Defect Prediction Models

• These models are based around a sequence of testing phases, such as
system testing, integration testing, and acceptance testing.
• Corresponding to each phase is a “subnet”, where a subnet is a
component of the BN with interface nodes to connect the component
subnet to other parent and child subnets.
• For the final “operational” phase, there is no need to include the
nodes associated with defect fixing and insertion.
