The Universe: we can never really understand what is going on here; it is just too big.
The Sample: a representative part of the universe; it is nice and small, and we can understand this.
Statistics: the mathematical description of the sample.
[Diagram: Universe → (participant selection) → Sample → (analysis) → Statistics → (inference) → Universe]
Types of data
Constant Variables
Variable: A variable is a factor whose value changes from time to time, place to place,
or individual to individual.
Variables are things that vary and change.
Variable
• Age of study participants
• Height of students
• No. of customers of a company
• No. of daily transactions
• No. of invoices
• Hair color
• Exam result
For example, suppose a company is being audited for invoice errors.
Instead of examining all 15,472 invoices produced by the company
during a given year, an auditor may select and examine just 100
invoices. If he is interested in the "invoice error status," he would
record (measure) the status (error or no error) of each sampled
invoice.
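The auditor's procedure can be sketched in Python. This is only an illustration: it assumes the invoices are numbered 1 to 15,472 and that the 100 invoices are drawn as a simple random sample (the example does not say how they are chosen), and the `has_error` rule is a made-up placeholder, not real audit data.

```python
import random

POPULATION_SIZE = 15_472   # invoices produced in the year (from the example)
SAMPLE_SIZE = 100          # invoices the auditor actually examines

# Assumption: invoices are identified by consecutive numbers 1..15,472.
invoice_ids = range(1, POPULATION_SIZE + 1)

# Assumption: simple random sampling without replacement.
rng = random.Random(42)    # fixed seed only so the sketch is reproducible
sampled_ids = rng.sample(invoice_ids, SAMPLE_SIZE)


def has_error(invoice_id: int) -> bool:
    """Hypothetical stand-in for the auditor's manual check of one invoice."""
    return invoice_id % 25 == 0   # placeholder rule, not real data


# The variable being measured is the "invoice error status" of each sampled invoice.
error_status = {
    inv: ("error" if has_error(inv) else "no error") for inv in sampled_ids
}
sample_error_rate = (
    sum(status == "error" for status in error_status.values()) / SAMPLE_SIZE
)
print(f"Sampled {len(error_status)} invoices; "
      f"sample error rate = {sample_error_rate:.2%}")
```

The recorded statuses describe only the sample; saying anything about all 15,472 invoices is the inference step.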
A store located in a densely populated metropolitan area may have higher sales
than a store in a sparsely populated rural area.
Similarly, customers may go shopping when the weather is pleasant, but few would
venture outside in stormy or snowy weather.
Some variables have a circular relationship with sales. For example, sales
depend on advertising, but the level of advertising expenses also depends
on sales.
Variable
  Qualitative
    Nominal
    Ordinal
  Quantitative
    Discrete
    Continuous
Quantitative: Discrete variables
Examples:
Number of children in a family
Number of rooms in a school
No. of customers of a company
No. of years of education of the people
No. of daily invoices
Quantitative: continuous variables
Example
Height of a person up to a fraction of an inch,
weight up to grams etc.
Age up to months
Temperature
Qualitative: Nominal variables
Example
Eye color
Male-Female
Sick-well
Married-Single-Divorced
Blood group
Qualitative: Ordinal variables
Example
Student grades (A+, A, B+, B, …)
Socioeconomic status (Low, Middle, High)
Education level (Low, High)
Olympic medals (Gold, Silver, Bronze)
Shoe quality (Ordinary, good, best)
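One practical difference between the two qualitative types is that ordinal categories can be sorted meaningfully, while nominal categories can only be counted. A small Python sketch using the medal and eye-color examples; the `MEDAL_RANK` encoding and the observation lists are illustrative assumptions:

```python
from collections import Counter

# Ordinal variable: categories have a natural order, so ranks can be assigned.
MEDAL_RANK = {"Bronze": 1, "Silver": 2, "Gold": 3}   # assumed encoding, low to high

medals = ["Silver", "Gold", "Bronze", "Gold"]        # made-up observations
medals_sorted = sorted(medals, key=MEDAL_RANK.get)
print(medals_sorted)   # ['Bronze', 'Silver', 'Gold', 'Gold']

# Nominal variable: categories are only labels; counting is meaningful,
# ranking is not.
eye_colors = ["brown", "blue", "brown", "green"]     # made-up observations
eye_color_counts = Counter(eye_colors)
print(eye_color_counts)
```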
Identify each variable as quantitative (discrete or continuous) or qualitative
(nominal or ordinal):
• Blood group
• Number of students
• Student grades
• Height of students
• No. of flights
• Computerized National Identity Card No.
• Smoking status
• Gender
• Salary
• Race
• No. of transactions
Σ (Greek capital letter sigma): shorthand for addition
                     Sample   Population
Mean                 x̄        μ
Standard deviation   s        σ
Variance             s²       σ²
Size                 n        N
Numerical Data Properties
  Central Tendency
    Mean
    Median
    Mode
  Dispersion
    Range
    Variance
    Standard Deviation
In most cases, the data tend to gather around a middle value.
In other words, a single central value summarizes a key
characteristic of the data.
You go around and talk to your classmates and find out that the average
score on the quiz was 43%.
In this instance, your score was significantly higher than those of your
classmates. Since your teacher grades on a curve, your 60% becomes an
A.
• 1. Measure of Central Tendency
• 2. Most Common Measure
• 3. Acts as ‘Balance Point’
• 4. Affected by Extreme Values (‘Outliers’)
• 5. Formula (Sample Mean): x̄ = Σx / n = (x1 + x2 + … + xn) / n
Following weights (in kg) represent the data of 10 students of a
randomly selected class of IoBM
X = 45 48 52 54 51 59 60 58 57 48
Mean
Mean = 532 / 10 = 53.2 kg
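As a check on the arithmetic, the mean of these ten weights can be computed directly. A minimal Python sketch (the variable names are illustrative):

```python
import statistics

# Weights (kg) of the 10 students from the example.
weights_kg = [45, 48, 52, 54, 51, 59, 60, 58, 57, 48]

n = len(weights_kg)        # number of observations
total = sum(weights_kg)    # sum of all observations
mean_kg = total / n        # mean = sum / n

print(f"n = {n}, sum = {total}, mean = {mean_kg} kg")

# The standard library gives the same result.
library_mean_kg = statistics.mean(weights_kg)
```

For this data set the sum is 532, so the mean is 53.2 kg.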
Median
Ordered values
X = 45 48 48 51 52 54 57 58 59 60
Number of observations (10) is even
(n/2)th term = 5th term = 52
and (n/2 + 1)th term = 6th term = 54
Median = (52 + 54) / 2 = 53 kg
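The position rule used here can be written as a short Python function. This is a sketch; the odd-n branch (take the single middle value) is the standard rule and is not shown in the slides.

```python
import statistics

weights_kg = [45, 48, 52, 54, 51, 59, 60, 58, 57, 48]


def median_by_position(values):
    """Median via the ordered-position rule for even and odd n."""
    ordered = sorted(values)
    n = len(ordered)
    if n % 2 == 0:
        # Even n: average the (n/2)th and (n/2 + 1)th ordered values (1-based).
        lower = ordered[n // 2 - 1]
        upper = ordered[n // 2]
        return (lower + upper) / 2
    # Odd n: the single middle value.
    return ordered[n // 2]


median_kg = median_by_position(weights_kg)
print(median_kg)   # 53.0

# Cross-check against the standard library.
library_median_kg = statistics.median(weights_kg)
```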
Following weights (in kg) represent the data of 10 students of a
randomly selected class of IoBM
X = 82 48 52 54 51 59 60 58 88 48
n = number of observations
Here n = 10
Median
Ordered values
X = 48 48 51 52 54 58 59 60 82 88
Number of observations (10) is even
(n/2)th term = 5th term = 54
and (n/2 + 1)th term = 6th term = 58
Median = (54 + 58) / 2 = 56 kg
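This second data set contains two unusually large weights (82 and 88). The sketch below compares it with the earlier data set (45 48 52 54 51 59 60 58 57 48) to illustrate how extreme values pull the mean more than the median; the variable names are illustrative.

```python
import statistics

first_set = [45, 48, 52, 54, 51, 59, 60, 58, 57, 48]
second_set = [82, 48, 52, 54, 51, 59, 60, 58, 88, 48]   # includes 82 and 88

for name, data in [("first", first_set), ("second", second_set)]:
    print(name,
          "mean =", statistics.mean(data),
          "median =", statistics.median(data))

# How far each measure moves between the two data sets.
mean_shift = statistics.mean(second_set) - statistics.mean(first_set)
median_shift = statistics.median(second_set) - statistics.median(first_set)
print(f"mean shift = {mean_shift:.1f} kg, median shift = {median_shift:.1f} kg")
```

The mean moves by about 6.8 kg while the median moves by 3 kg, which is why the mean is described as affected by outliers.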
Mode
The most frequent value is 48
(48 occurs twice)
Hence the mode is 48 kg
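Finding the mode amounts to counting how often each value occurs. A minimal Python sketch for the data set X = 82 48 52 54 51 59 60 58 88 48, returning every value that reaches the highest count in case of ties:

```python
from collections import Counter

weights_kg = [82, 48, 52, 54, 51, 59, 60, 58, 88, 48]

counts = Counter(weights_kg)             # value -> number of occurrences
highest_count = max(counts.values())     # frequency of the most common value
modes = [value for value, count in counts.items() if count == highest_count]

print(f"mode(s) = {modes}, each occurring {highest_count} times")
```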
• The dispersion of data reveals how the observations
are spread out or scattered on each side of the
center.
• Measuring the dispersion (scatter, or variation) of the
data is as important as locating the central
tendency.
• If the dispersion is small, it indicates high uniformity
of the observations in the data set.
• Absence of dispersion in the data indicates perfect
uniformity. This situation arises when all
observations in the data are identical.
[Figure: two data sets, A and B, with different spreads; individual values not recoverable]
           Data set A   Data set B
Mean           71           71
Median         71           71
Mode           71           71
The three measures of central tendency have the same
value, 71. Nonetheless, it is clear that the two data sets are
quite different.
Range = largest value − smallest value = 60 − 45 = 15 kg
The range is used when
• you have ordinal data, or
• you are presenting your results to people with little or
no knowledge of statistics.
The range is rarely used in scientific work as it is
fairly insensitive.
It depends on only two values in the data set: the largest (XL) and the smallest (XS).
Two very different sets of data can have the same range:
1 1 1 1 9 vs 1 3 5 7 9
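A quick Python check of this point, using the two five-value data sets just shown (the function name `data_range` is illustrative):

```python
def data_range(values):
    """Range = largest value minus smallest value."""
    return max(values) - min(values)


set_1 = [1, 1, 1, 1, 9]
set_2 = [1, 3, 5, 7, 9]

range_1 = data_range(set_1)
range_2 = data_range(set_2)
print(range_1, range_2)   # 8 8

# The weight data used earlier: 60 - 45 = 15.
weights_kg = [45, 48, 52, 54, 51, 59, 60, 58, 57, 48]
weights_range = data_range(weights_kg)
print(weights_range)      # 15
```

Both small data sets have a range of 8 even though four of the five values in the first set are identical, so the range alone says little about how the values in between are spread.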
If x1, x2, x3, …, xn are n values of a variable x, then
the variance can be defined as
Variance = Σ(xi − x̄)² / n = [(x1 − x̄)² + (x2 − x̄)² + … + (xn − x̄)²] / n
where x̄ is the mean of the n values.
First subtract the mean from each observation.
This difference is called the deviate (deviation) of the observation.
The deviate tells us how far a given score is from the
mean.
Square the deviates.
Variance is the mean of the squared deviates.
Observation   Deviate                Squared deviate
45            (45 − 53.2) = −8.2     (−8.2)² = 67.24
48            (48 − 53.2) = −5.2     (−5.2)² = 27.04
52            (52 − 53.2) = −1.2     (−1.2)² = 1.44
54            (54 − 53.2) = 0.8      (0.8)² = 0.64
51            (51 − 53.2) = −2.2     (−2.2)² = 4.84
59            (59 − 53.2) = 5.8      (5.8)² = 33.64
60            (60 − 53.2) = 6.8      (6.8)² = 46.24
58            (58 − 53.2) = 4.8      (4.8)² = 23.04
57            (57 − 53.2) = 3.8      (3.8)² = 14.44
48            (48 − 53.2) = −5.2     (−5.2)² = 27.04
Mean = 53.2   Sum = 0                Sum = 245.60
Variance = mean of the squared deviates = 245.60 / 10 = 24.56 kg²
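The same table can be reproduced step by step in Python. This sketch follows the worked example in dividing by n; note that `statistics.variance` in the standard library divides by n − 1 instead (the usual sample variance), so it gives a slightly larger number for the same data.

```python
import statistics

weights_kg = [45, 48, 52, 54, 51, 59, 60, 58, 57, 48]

n = len(weights_kg)
mean_kg = sum(weights_kg) / n                          # 53.2

deviates = [x - mean_kg for x in weights_kg]           # observation minus mean
squared_deviates = [d ** 2 for d in deviates]          # squared deviates

# Variance as in the worked table: mean of the squared deviates (divide by n).
variance_kg2 = sum(squared_deviates) / n

for x, d, sq in zip(weights_kg, deviates, squared_deviates):
    print(f"{x:>3}  {d:>6.1f}  {sq:>6.2f}")
print(f"sum of deviates = {sum(deviates):.1f}")
print(f"variance (divide by n) = {variance_kg2:.2f} kg^2")

# For comparison only: divide-by-(n - 1) sample variance.
sample_variance_kg2 = statistics.variance(weights_kg)
print(f"variance (divide by n - 1) = {sample_variance_kg2:.2f} kg^2")
```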
A large variance means that the observations are
far from the mean, and a small variance
means that the observations are close to the
mean.
When the deviations are squared to compute the variance, their unit of
measure is squared as well.
E.g. if people's weights are measured in pounds, then the
variance of the weights is expressed in pounds² (or
squared pounds).
Since squared units of measure are often awkward to deal
with, the square root of the variance is often used instead.
The standard deviation is the square root of the variance.
Following weights (in kg) represent the data of 10 students of a
randomly selected class of IoBM
X = 45 48 52 54 51 59 60 58 57 48
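For these weights, the standard deviation follows directly from the variance. A minimal Python sketch, assuming the divide-by-n variance used in the worked example (`statistics.pstdev` uses the same convention):

```python
import math
import statistics

weights_kg = [45, 48, 52, 54, 51, 59, 60, 58, 57, 48]

n = len(weights_kg)
mean_kg = sum(weights_kg) / n
variance_kg2 = sum((x - mean_kg) ** 2 for x in weights_kg) / n   # 24.56 kg^2

# Standard deviation = square root of the variance; the unit is kg again.
std_dev_kg = math.sqrt(variance_kg2)
print(f"standard deviation = {std_dev_kg:.2f} kg")   # about 4.96 kg

# Cross-check with the standard library (population standard deviation).
library_std_dev_kg = statistics.pstdev(weights_kg)
```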
Coefficient of Variation
The coefficient of variation (CV) represents the ratio of the standard
deviation to the mean, and it is a useful statistic for comparing the
degree of variation from one data series to another.
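As an illustration, the CV can be computed for the two weight data sets used earlier. This is a sketch: it assumes the divide-by-n standard deviation from the worked variance example and expresses the ratio as a percentage, which is a common but not universal convention.

```python
import statistics


def coefficient_of_variation(values):
    """CV = standard deviation / mean (divide-by-n standard deviation assumed)."""
    return statistics.pstdev(values) / statistics.mean(values)


first_set = [45, 48, 52, 54, 51, 59, 60, 58, 57, 48]
second_set = [82, 48, 52, 54, 51, 59, 60, 58, 88, 48]

cv_first = coefficient_of_variation(first_set)
cv_second = coefficient_of_variation(second_set)

print(f"CV of first set  = {cv_first:.1%}")
print(f"CV of second set = {cv_second:.1%}")
```

The second data set has the larger CV (roughly 22% against roughly 9%), so its weights vary more relative to their mean.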
When two data sets have the same unit and also the same standard deviation,
           Data set A   Data set B
Mean           71           71
Median         71           71
Mode           71           71
Now calculate the coefficient of variation of both data sets A and B
Box and Whisker Plot