
▪ Statistics
The science of collecting, organizing, presenting, analyzing, and interpreting data to assist in making more effective decisions.
▪ Descriptive statistics – Methods of collecting, organizing, summarizing, and presenting data in an informative way.

▪ Inferential statistics – Methods of analyzing and interpreting sample data in order to draw conclusions about the population from which the sample was taken.
▪ A population is the collection of all objects, outcomes, responses, measurements, or counts under consideration.

▪ A sample is a subset of a population.


Example:
▪ In a recent survey, 250 college students at IFM were asked if they smoke cigarettes. 35 of the students said yes. Identify the population and the sample.

Population: all students at IFM.
Sample: the responses of the 250 students in the survey.
Data sets can consist of two types of data: Qualitative
data and Quantitative data.
▪ Qualitative data: consists of attributes, labels, or nonnumerical entries.
▪ Quantitative data: consists of numerical measurements or counts.
Example:
▪ The average GPAs (grade point averages) of students in five programmes at IFM are listed in the table below. Which data are qualitative and which are quantitative?

Programme    Average GPA
BTCBF A      3.22
BTCBF B      3.98
BTCBA        2.75
BTCISP       2.24
BTCS         3.84

Qualitative data: the programme names.
Quantitative data: the average GPAs.
The level of measurement determines which statistical
calculations are meaningful. The four levels of
measurement are:

a. Nominal (lowest)
b. Ordinal
c. Interval
d. Ratio (highest)
Data at the nominal level of measurement are
qualitative only.
Nominal level: it is concerned with names, labels, or qualities. No mathematical computations can be made at this level.
Examples of these classifications include gender, nationality, ethnicity, language, etc.
Data at the ordinal level of measurement are
qualitative or quantitative.

Ordinal level: data can be arranged in order, but differences between data entries are not meaningful.
Examples:
▪ Class standings: freshman, junior, senior
▪ Numbers on the back of each player's shirt
▪ Top 50 songs played on the radio
Data at the interval level of measurement are
quantitative. A zero entry simply represents a position on
a scale; the entry is not an inherent zero.
Interval level: data can be arranged in order, and the differences between data entries can be calculated.
Examples:
▪ Temperatures
▪ Years on a timeline
Data at the ratio level of measurement are similar to the
interval level, but a zero entry is meaningful.

Ratio level: a ratio of two data values can be formed, so one data value can be expressed as a multiple of another.
Examples:
▪ Ages
▪ Grade point averages
▪ Weights
Level of       Put data in   Arrange data   Subtract      Determine if one data value
measurement    categories    in order       data values   is a multiple of another
Nominal        Yes           No             No            No
Ordinal        Yes           Yes            No            No
Interval       Yes           Yes            Yes           No
Ratio          Yes           Yes            Yes           Yes
▪ Primary data: data collected directly from the original source specifically for the desired analysis.

▪ Primary data can be obtained by:
a. Questionnaire
b. Interviewing
c. Observation
d. Measurement
e. Focus group
▪ Secondary data: data that have already been compiled and are available for statistical analysis.
▪ Secondary data can be obtained from:
a. Various Government Reports
b. Statistical Authorities
c. Trade Associations
d. Various Publications
e. University Research Bureaus
f. Commercial and Financial Reports, etc.
▪ When data have been collected from a sample or population, they need to be summarized and presented in a readable form which can be easily understood.
▪ Data can be presented in the following forms:
a. Tabular form
b. Graphical form
▪ The following table shows the distribution of 50 patients at the surgical department of Alexandria hospital in May 2018 according to their ABO blood groups.

Blood group   Frequency   %
A             12          24
B             18          36
AB            5           10
O             15          30
Total         50          100
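The percentage column of such a table is each frequency divided by the total frequency, times 100. A minimal Python sketch, using the blood-group counts from the table (the variable names are illustrative only):

```python
# Frequencies taken from the blood-group table.
frequencies = {"A": 12, "B": 18, "AB": 5, "O": 15}

total = sum(frequencies.values())  # total frequency, 50 patients

# Relative frequency of each group expressed as a percentage.
percentages = {group: 100 * f / total for group, f in frequencies.items()}

for group, f in frequencies.items():
    print(f"{group:<5} {f:>3} {percentages[group]:>6.1f}")
print(f"{'Total':<5} {total:>3} {sum(percentages.values()):>6.1f}")
```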
▪ Construction of a grouped frequency distribution table involves the following steps:
a. Determine both the highest value, H, and the lowest value, L.
b. Determine the range: R = H – L.
c. Determine the number of classes, K.
d. Calculate the class size (class width): $C = \frac{R}{K} = \frac{\text{Range}}{\text{No. of classes}}$
e. Choose the starting point, making sure that the smallest value is included.
A class was given a test in Set Theory and the following
are the grades obtained by the students.

75 32 65 73 82 70 45 76 70 54
64 72 67 75 65 60 50 87 83 40
93 89 75 58 89 70 75 55 61 78
85 65 51 43 59 38 73 71 75 85
63 45 97 49 55 60 65 75 69 35
Group these grades into a Frequency distribution using 7 classes.
▪ We calculate the range: R = H – L = 97 – 32 = 65.
▪ The number of classes is K = 7.
▪ The class width is C = R/K = 65/7 ≈ 9.29, which is rounded up to 10.
▪ Let us choose 30 as the starting point of the lowest class.
Classes Frequency

30 – 39 3
40 – 49 5
50 – 59 7
60 – 69 11
70 – 79 15
80 – 89 7
90 – 99 2
Total 50
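The construction steps can be checked with a short Python sketch. It assumes integer grades and inclusive class limits (30 – 39, 40 – 49, ...), as in the table; the function name `grouped_frequency_table` is hypothetical and only used for illustration.

```python
grades = [
    75, 32, 65, 73, 82, 70, 45, 76, 70, 54,
    64, 72, 67, 75, 65, 60, 50, 87, 83, 40,
    93, 89, 75, 58, 89, 70, 75, 55, 61, 78,
    85, 65, 51, 43, 59, 38, 73, 71, 75, 85,
    63, 45, 97, 49, 55, 60, 65, 75, 69, 35,
]


def grouped_frequency_table(values, num_classes, start, width):
    """Return [(lower_limit, upper_limit, frequency), ...] for integer data."""
    table = []
    for k in range(num_classes):
        lower = start + k * width
        upper = lower + width - 1  # inclusive upper class limit
        frequency = sum(lower <= v <= upper for v in values)
        table.append((lower, upper, frequency))
    return table


data_range = max(grades) - min(grades)  # R = H - L
raw_width = data_range / 7              # C = R / K, before rounding
width = 10                              # rounded up to a convenient width, as in the text

table = grouped_frequency_table(grades, num_classes=7, start=30, width=width)
for lower, upper, frequency in table:
    print(f"{lower} - {upper}: {frequency}")
print("Total:", sum(f for _, _, f in table))
```

Rounding the width up rather than to the nearest value keeps the highest grade inside the last class.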
▪ HISTOGRAM
This is one of the methods by which data can be presented.

It is constructed by putting frequencies on the vertical axis and class boundaries on the horizontal axis.
Example:
Distribution of a group of cholera patients by age

Age (years) Frequency %


25-30 3 14.3
30-40 5 23.8
40-45 7 33.3
45-60 4 19.0
60-65 2 9.5
Total 21 100
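[Figure: histogram of the age distribution of the cholera patients; vertical axis: %, horizontal axis: Age (years), with class boundaries 25, 30, 40, 45, 60, 65]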
[Figure: bar chart, "Distribution of 100 cholera patients"; vertical axis: %, horizontal axis: Marital status (Single, Married, Divorced, Widowed)]
Pie chart
[Figure: pie chart with sectors Deletion, Inversion, and Translocation (79%); the other two sectors are 3% and 18%]
Graphical Presentation

▪ FREQUENCY POLYGON

It is constructed by putting frequencies on the vertical axis and class marks (CM) on the horizontal axis.

See the following example.


Age      Males (%)   Females (%)   CM
20-30    12          10            25
30-40    36          30            35
40-50    8           25            45
50-60    16          15            55
60-70    8           20            65

[Figure: frequency polygon, "Distribution of 45 patients by age and sex"; vertical axis: %, horizontal axis: Age (class marks 25, 35, 45, 55, 65); one line for males and one for females]


▪ It is useful to define numerical measures that describe important features of the data set.
▪ The most commonly used measures of central tendency (location) are the:
I. Mean
II. Median, and
III. Mode
Common notation used frequently:

▪ X - Data value
▪ f - Frequency of the data value
▪ N - Total number of observations (total frequency)

▪ Σ - Sum of the values


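▪ Arithmetic Mean for Ungrouped data
We use the symbol $\bar{X}$ to denote the sample mean (arithmetic mean).
1: Direct Method
$$\bar{X} = \frac{x_1 + x_2 + \dots + x_N}{N}$$
i.e.
$$\bar{X} = \frac{\sum x}{\sum f} = \frac{\sum fx}{\sum f}, \quad \text{where } \sum f = N$$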
▪ Find the arithmetic mean of the following data: 12 34 56 34 21 23 1 19 17 12 34 53
$$\bar{X} = \frac{\sum x}{\sum f} = \frac{316}{12} \approx 26.3$$
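As a quick check of the direct method, the sketch below computes the mean of the same twelve values in two ways: as the sum of the values over N, and as Σfx / Σf after tallying repeated values. The use of `collections.Counter` for the tally is a choice made for this illustration.

```python
from collections import Counter

data = [12, 34, 56, 34, 21, 23, 1, 19, 17, 12, 34, 53]

# Direct method: sum of the values divided by the number of observations N.
mean_direct = sum(data) / len(data)

# Frequency form: tally each distinct value x with its frequency f,
# then divide the sum of f*x by the sum of f.
tally = Counter(data)
mean_from_frequencies = sum(x * f for x, f in tally.items()) / sum(tally.values())

print(sum(data), len(data))      # 316 12
print(round(mean_direct, 1))     # 26.3
```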
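▪ The median is the middle value when the data are arranged in ascending or descending order.
For ungrouped (discrete) data, the median is
$$\text{Median} = \left(\frac{N+1}{2}\right)^{\text{th}} \text{ value}$$
When N is even, this position falls halfway between the two middle values, and the median is their average.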
For instance, given the following data 4,6,12,8,6,7,3.
What is its Median?
Solution
Arrange the values in either ascending or
descending order

3, 4, 6, 6, 7, 8, 12
Then,
$$\text{Median} = \left(\frac{N+1}{2}\right)^{\text{th}} \text{ value} = \left(\frac{7+1}{2}\right)^{\text{th}} \text{ value} = 4^{\text{th}} \text{ value}$$
Median = 6
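The same procedure can be sketched in Python: sort the values, then pick the value in position (N + 1)/2. For an even number of values the sketch averages the two middle values, which is the usual convention when (N + 1)/2 is not a whole number. The function name `median_ungrouped` is hypothetical.

```python
def median_ungrouped(values):
    """Median of raw (ungrouped) data."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd N: the ((N + 1) / 2)-th value, i.e. index N // 2 when counting from 0.
        return ordered[middle]
    # Even N: average of the two middle values.
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median_ungrouped([4, 6, 12, 8, 6, 7, 3]))  # 6
```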
▪ The median for grouped data is given by
$$\text{Median} = L + \left(\frac{\frac{N}{2} - f_b}{f_m}\right) C$$

Where:
L – Lower boundary of the median class
N – Total frequency
$f_b$ – Cumulative frequency below the median class
$f_m$ – Frequency of the median class
C – Median class width
▪ Mode is the value with the highest frequency; in other words, the mode is the value which occurs most often.

▪ It is useful when it is not possible to calculate the mean and median.
▪ The following formula is used to calculate the mode for grouped data.
$$\text{Mode} = L + \frac{C\,t_1}{t_1 + t_2}$$
Where,
L – Lower boundary of the modal class
$t_1$ – The difference between the frequency of the modal class and the frequency of the preceding class
$t_2$ – The difference between the frequency of the modal class and the frequency of the post-modal class
C – Modal class width
Income ($)   No. of workers
300-309      9
310-319      20
320-329      24
330-339      38
340-349      48
350-359      27
360-369      17
370-379      6
Income($) Frequency Cumulative Frequency
299.5 – 309.5 9 9
309.5 – 319.5 20 29
319.5 – 329.5 24 53
329.5 – 339.5 38 91
339.5 – 349.5 48 139
349.5 – 359.5 27 166
359.5 – 369.5 17 183
369.5 – 379.5 6 189
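▪ The median is given by
$$\text{Median} = L + \left(\frac{\frac{N}{2} - f_b}{f_m}\right) C$$

The median class is 339.5 – 349.5, because it is the first class whose cumulative frequency (139) reaches N/2 = 94.5.

Thus,
▪ L = 339.5
▪ N = 189
▪ $f_b$ = 91
▪ $f_m$ = 48
▪ C = 10
▪ Thus,
$$\text{Median} = 339.5 + \left(\frac{\frac{189}{2} - 91}{48}\right) \times 10 \approx 340.23$$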
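A minimal Python sketch of the grouped-median calculation follows. It locates the median class as the first class whose cumulative frequency reaches N/2 and then applies the formula. The function name `grouped_median` and the list-of-tuples layout for the class boundaries are assumptions made for this illustration.

```python
def grouped_median(boundaries, frequencies):
    """boundaries: list of (lower, upper) class boundaries; frequencies: matching counts."""
    n = sum(frequencies)
    cumulative = 0  # cumulative frequency below the current class (fb)
    for (lower, upper), f in zip(boundaries, frequencies):
        if cumulative + f >= n / 2:
            # Median = L + ((N/2 - fb) / fm) * C
            return lower + ((n / 2 - cumulative) / f) * (upper - lower)
        cumulative += f
    raise ValueError("empty frequency distribution")


# Class boundaries 299.5 - 309.5, ..., 369.5 - 379.5 from the income table.
boundaries = [(299.5 + 10 * k, 309.5 + 10 * k) for k in range(8)]
frequencies = [9, 20, 24, 38, 48, 27, 17, 6]

median = grouped_median(boundaries, frequencies)
print(round(median, 2))  # 340.23
```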

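▪ The mode is given by
$$\text{Mode} = L + \frac{C\,t_1}{t_1 + t_2}$$

The modal class is 339.5 – 349.5, the class with the highest frequency (48).

▪ L = 339.5
▪ C = 10
▪ $t_1$ = 48 – 38 = 10
▪ $t_2$ = 48 – 27 = 21
$$\text{Mode} = 339.5 + \frac{10 \times 10}{10 + 21} = 339.5 + \frac{100}{31}$$
Mode ≈ 342.73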
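The grouped-mode formula can be checked with a short Python sketch using the same income table. With t1 = 10 and t2 = 21 it evaluates to 339.5 + 100/31, about 342.73. The function name `grouped_mode` is hypothetical, and treating a missing neighbouring class as frequency 0 is an assumption of this sketch, not something stated in the text.

```python
def grouped_mode(boundaries, frequencies):
    """Mode = L + C * t1 / (t1 + t2) for the class with the highest frequency."""
    m = max(range(len(frequencies)), key=frequencies.__getitem__)  # index of modal class
    lower, upper = boundaries[m]
    # Assumption: a missing neighbouring class counts as frequency 0.
    before = frequencies[m - 1] if m > 0 else 0
    after = frequencies[m + 1] if m + 1 < len(frequencies) else 0
    t1 = frequencies[m] - before  # modal frequency minus preceding class frequency
    t2 = frequencies[m] - after   # modal frequency minus following class frequency
    return lower + (upper - lower) * t1 / (t1 + t2)


boundaries = [(299.5 + 10 * k, 309.5 + 10 * k) for k in range(8)]
frequencies = [9, 20, 24, 38, 48, 27, 17, 6]

mode = grouped_mode(boundaries, frequencies)
print(round(mode, 2))  # 342.73
```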
