
Probability and Statistics

INTRODUCTION

STATISTICS

Statistics is the science of conducting studies to collect, organize, summarize, analyze, and draw conclusions from data.

It can also be defined as the mathematics of the collection, organization, and interpretation of numerical data, especially the analysis of population characteristics from sample datasets.

STATISTICAL SOFTWARE ENGINEERING
APPLICATIONS OF STATISTICS IN S.E.
Software Measurements
• Establishment of measurements, analysis, and forecasting of future events, for example:
  • Measurement of bug density
  • Measurement of invalid processing of bugs
  • Measurement of bugs reported by the client
  • Measurement of project issues
  • Measurement of rejected baseline requests

Quality Management
• Statistical methods for quality control
• Analysis of bugs, NCs, and issues

APPLICATIONS OF STATISTICS IN GENERAL
Analysis of results of surveys
Quantitative Research Methodology
• Descriptive Statistics
• Correlation
• Regression
• Hypothesis Testing
Quality Control
• P charts and other control charts

Analysis of computation of algorithms
• Complexity and performance
Analysis of network traffic
Analysis of CPU and memory utilization
Progress reporting to top management
• Graphs and charts
Prediction and forecasting of future events

Development of research instruments
• Reliability analysis
• Validity analysis
Analysis of quality of software

Types of Statistics

Descriptive Statistics
• Descriptive statistics consists of the collection, organization, summarization, and presentation of data.
• Charts, graphs, tables, mean, median, mode, etc.

Inferential Statistics
• Inferential statistics consists of generalizing from samples to populations, performing hypothesis tests, determining relationships among variables, and making predictions.
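
As a rough illustration of the distinction above, the following Python sketch summarizes a small sample (descriptive statistics) and then takes a first inferential step by computing the standard error of the mean. The bug counts are made-up values chosen only for illustration; they are not data from these slides.

```python
import statistics

# Hypothetical sample: bugs found in 10 releases (illustrative values only)
bugs_per_release = [12, 15, 9, 15, 20, 11, 15, 8, 14, 11]

# Descriptive statistics: summarize the sample itself
mean = statistics.mean(bugs_per_release)
median = statistics.median(bugs_per_release)
mode = statistics.mode(bugs_per_release)
stdev = statistics.stdev(bugs_per_release)  # sample standard deviation

# Inferential step: the standard error of the mean indicates how far the
# sample mean may lie from the mean of all releases (the population)
std_error = stdev / len(bugs_per_release) ** 0.5

print(f"mean={mean}, median={median}, mode={mode}")
print(f"stdev={stdev:.2f}, standard error={std_error:.2f}")
```

The first group of values only describes the ten observed releases; the standard error is what a confidence interval or hypothesis test about all releases would build on.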

VARIABLE

A variable is a characteristic or attribute that can assume different values.
• For example, if the durations of 30 activities were measured, then duration would be a variable.

CLASSIFICATION OF VARIABLES

QUALITATIVE VARIABLES
Qualitative variables are variables that can be placed into distinct categories, according to some characteristic or attribute.
e.g. Gender, Geographical Location of team, Nature of Project, Designation of employee

QUANTITATIVE VARIABLES
Quantitative variables are numerical and can be ordered or ranked.
e.g. Professional Experience of Employee (in years), Budget of Project, No. of bugs in a release


Quantitative variables can be further classified into two groups: discrete and continuous.

1. DISCRETE VARIABLES

Discrete variables can be assigned values such as 0, 1, 2, 3 and are said to be countable, e.g.
• No. of software projects completed by a company
• No. of software engineers in a team (in a matrix-based organization)

2. CONTINUOUS VARIABLES

Continuous variables can assume an infinite number of values between any two specific values. They often include fractions and decimals, e.g.
• Budget of a software project
• Computed bug density for a release/build
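
To make the discrete/continuous distinction concrete, here is a minimal Python sketch. It assumes bug density is computed as open bugs per thousand lines of code (KLOC), which is one common definition and not one stated in these slides; the numbers are hypothetical.

```python
# Discrete variable: a count, which can only take whole-number values
open_bugs = 42  # hypothetical count for one build

# Continuous variable: a measurement that can take any value in a range
size_kloc = 12.5  # hypothetical build size in thousands of lines of code

# Dividing a count by a size gives a continuous quantity
# (assumed definition: bugs per KLOC)
bug_density = open_bugs / size_kloc

print(f"{open_bugs} bugs / {size_kloc} KLOC = {bug_density:.2f} bugs per KLOC")
```

The count can only move in steps of one bug, while the density can land anywhere between two given values.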

CLASSIFICATION OF VARIABLES w.r.t. MEASUREMENTS

In addition to being classified as qualitative or quantitative, variables can be classified by how they are categorized, counted, or measured.
The measurement scale has four levels:
• Nominal
• Ordinal
• Interval
• Ratio


1. NOMINAL LEVEL OF MEASUREMENT


The nominal level of measurement classifies data into mutually exclusive (non-overlapping) categories in which no order or ranking can be imposed on the data, e.g.
• Gender
• Projects completed by a company
• Skills of an employee
• Cities

1.1 DICHOTOMOUS
A dichotomous variable is a special type of nominal variable that has only two possible values, e.g.
• Gender (Male, Female)
• Unit Test Result (Pass, Fail)
• Sanity Testing Result (Pass, Fail)


2. ORDINAL LEVEL OF MEASUREMENT


The ordinal level of measurement classifies data into categories that can be ranked (mutually exclusive groups + order), e.g.
• Severity of Bugs (Level-1, Level-2, Level-3, Level-4)
• Priority of Change Request (High, Medium, Low)
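
The sketch below shows what "categories that can be ranked" allows in practice: ordinal values can be sorted and a middle (median) category can be found, but they cannot meaningfully be averaged. The list of change requests is hypothetical.

```python
# Explicit rank order for the ordinal categories, lowest to highest
PRIORITY_ORDER = ["Low", "Medium", "High"]

# Hypothetical priorities of seven change requests
requests = ["High", "Low", "Medium", "Low", "High", "Medium", "Low"]

# Ranking is meaningful for ordinal data: sort by position in the order
ranked = sorted(requests, key=PRIORITY_ORDER.index)

# The median category is also meaningful (middle element of the ranking)
median_priority = ranked[len(ranked) // 2]

print(ranked)
print("Median priority:", median_priority)
```

An arithmetic mean is deliberately not computed here, because the distance between Low and Medium is not known to equal the distance between Medium and High.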


3. INTERVAL LEVEL OF MEASUREMENT


The interval level of measurement ranks data, and precise differences between units of measure do exist; however, there is no meaningful zero.
Interval variables have ordered categories that are equally spaced, e.g.
• Temperature (73 °F)
• Calculated Bugs Density
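
A short Python sketch can show why the missing true zero matters. Using temperature, the interval-level example above, equal differences stay equal when the unit changes, but ratios do not, so a statement such as "80 °F is twice as hot as 40 °F" has no fixed meaning. The three temperatures are arbitrary illustrative values.

```python
def fahrenheit_to_celsius(deg_f: float) -> float:
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (deg_f - 32) * 5 / 9


low_f, mid_f, high_f = 40.0, 60.0, 80.0  # arbitrary example temperatures
low_c, mid_c, high_c = (fahrenheit_to_celsius(t) for t in (low_f, mid_f, high_f))

# Differences are meaningful: equal steps in Fahrenheit are equal steps in Celsius
step_1_c = mid_c - low_c
step_2_c = high_c - mid_c

# Ratios are not meaningful: they change with the choice of unit
ratio_f = high_f / low_f
ratio_c = high_c / low_c

print(f"Celsius steps: {step_1_c:.2f} and {step_2_c:.2f}")
print(f"Ratio in Fahrenheit: {ratio_f:.1f}, ratio in Celsius: {ratio_c:.1f}")
```

For a variable with a true zero, the same ratio would come out identical in any unit.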

4. RATIO LEVEL OF MEASUREMENT


The ratio level of measurement possesses all the characteristics of interval measurement, and there exists a true zero, e.g.
• No. of Bugs
• Estimated effort for a new project
• Duration of a project
• Schedule delay

TYPES OF VARIABLES IN SPSS

Nominal Variable → Nominal
Ordinal Variable → Ordinal
Interval & Ratio Variables → Scale

SOME MORE DEFINITIONS
Data
• Data are the values (measurements or observations) that the variables can assume.
Data Set
• A collection of data values forms a data set. Each value in the data set is called a data value or a datum.

Population
• A population consists of all subjects (human or otherwise) that are being studied.
Sample
• A sample is a group of subjects selected from a population.
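
The relationship between the two definitions can be sketched in Python as a simple random sample drawn without replacement. The employee IDs, the population size of 200, the sample size of 20, and the seed are all hypothetical choices made for illustration.

```python
import random

# Hypothetical population: every employee of a software house
population = [f"EMP-{i:03d}" for i in range(1, 201)]

# Fixed seed (arbitrary) so the sketch gives the same sample on each run
rng = random.Random(42)

# Sample: a group of subjects selected from the population,
# drawn without replacement so no employee appears twice
sample = rng.sample(population, k=20)

print(f"Population size: {len(population)}, sample size: {len(sample)}")
print(sample[:5])
```

Simple random sampling is only one way to select a sample; it is used here because every subject has the same chance of being chosen.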

CASE STUDY NO. 1 (HR POLICY)
Read the following HR policy of a software house regarding annual increments of employees, and answer the questions.
• Employees who meet their deadlines 95-100% of the time usually receive Rs. 20k as an increment in their salary. Employees who meet their deadlines 80-90% of the time usually receive Rs. 10k, and employees who meet their deadlines less than 80% of the time usually receive Rs. 5k as an increment in their salary.

Based on this information, ‘Meeting deadlines’ and ‘Annual increments’ are related. The more often you meet deadlines, the more likely it is that you will receive a higher increment. If you improve your performance and meet the deadlines of most of your tasks, your annual increment will probably improve.

QUESTIONS
1. What are the variables under study?
2. What are the data in the study?
3. Are descriptive, inferential, or both
types of statistics used?
4. What is the population under study?
5. Was a sample collected? If so, from
where?
6. From the information given, comment
on the relationship between the variables.

ANSWERS
1. The variables are ‘Meeting deadlines’ and ‘Annual increments’.
2. The data consist of the ‘Percentage of deadlines met’ and the ‘Amount of increment’.
3. These are descriptive statistics; however, an inferential statement is also present (i.e. Based on this information, ‘Meeting deadlines’ and ‘Annual increments’ are related), so inferential statistics are used as well.
4. The population under study is the employees of the software house.
5. Not specified.
6. Based on the data, it appears that, in general, the more often you meet deadlines, the higher your annual increment will be.

CASE STUDY NO. 2 (PROJECTS QUALITY)
The Quality Management department of a software house published the number of open bugs in five ‘In-Progress’ software projects during the Annual Quality review meeting.

Project Name    No. of Open Bugs
Project 1       500
Project 2       600
Project 3       350
Project 4       265
Project 5       1325

QUESTIONS
1. What are the variables under study?
2. Categorize each variable as
quantitative or qualitative.
3. Categorize each quantitative
variable as discrete or continuous.
4. Identify the level of measurement
for each variable.

5. ‘Project 4’ shows minimum number
of ‘Open Bugs’. Does that mean
‘Project 4’ is most successful project
among all 5 projects?

ANSWERS
1. The variables are ‘Project Name’ and ‘No. of Open Bugs’.
2. ‘Project Name’ is a qualitative variable, while ‘No. of Open Bugs’ is a quantitative variable.
3. ‘No. of Open Bugs’ is a discrete variable.
4. ‘Project Name’ is nominal, while ‘No. of Open Bugs’ is ratio.
5. ‘Project 4’ shows minimum number
of ‘Open Bugs’: However, there may
be other things to consider, Size of
Project, Schedule of Project,
Compliance with client requirements.
Therefore, it is not necessary that a
project with minimum Open bugs is
most successful project of company.
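
The point about project size can be sketched in Python. The open-bug counts below are the ones given in the case study; the project sizes in KLOC are invented purely for illustration, since the case study does not report them.

```python
# Open bugs per project, as published in the case study
open_bugs = {
    "Project 1": 500,
    "Project 2": 600,
    "Project 3": 350,
    "Project 4": 265,
    "Project 5": 1325,
}

# HYPOTHETICAL project sizes in KLOC (not given in the case study)
size_kloc = {
    "Project 1": 100,
    "Project 2": 200,
    "Project 3": 50,
    "Project 4": 20,
    "Project 5": 530,
}

# Normalize the raw counts by size: open bugs per KLOC
density = {name: open_bugs[name] / size_kloc[name] for name in open_bugs}

fewest_bugs = min(open_bugs, key=open_bugs.get)
lowest_density = min(density, key=density.get)

print("Fewest open bugs:", fewest_bugs)
print("Lowest bugs per KLOC:", lowest_density)
```

Under these assumed sizes, the project with the fewest open bugs is not the one with the lowest bug density, which is why the raw count alone cannot settle which project is most successful.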
