BY
PROF. A.BHATTACHERJEE
• Statistics deals with the collection, presentation, analysis, and interpretation of numerical
data. Modern statistics deals primarily with statistical inference.
• The objective of statistical inference is to draw conclusions about a population based on the
information obtained from one or more samples.
• One of the most common and useful presentations of a data set is the frequency table and its
corresponding graph, the histogram.
• A frequency table records how often observed values fall within certain intervals, or classes.
The frequency table can also be presented graphically in the form of a histogram.
• The important features of most histograms can be captured by a few summary statistics, such as
measures of location, measures of spread and measures of shape.
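As a rough illustration of how a frequency table and its histogram relate, the following Python sketch counts observations per class and prints one asterisk per observation. The data values, the class width and the function name `frequency_table` are assumptions made for this example only; they do not come from these notes.

```python
def frequency_table(values, lower, width, n_classes):
    """Return (class_start, class_end, frequency) for classes [start, end)."""
    counts = [0] * n_classes
    for v in values:
        k = int((v - lower) // width)  # index of the class containing v
        if 0 <= k < n_classes:         # values outside all classes are ignored
            counts[k] += 1
    return [(lower + k * width, lower + (k + 1) * width, counts[k])
            for k in range(n_classes)]


# Hypothetical data, chosen only to illustrate the idea.
data = [3, 7, 8, 12, 13, 14, 18, 21, 22, 27]
table = frequency_table(data, lower=0, width=10, n_classes=3)

# Text histogram: one asterisk per observation in the class.
for start, end, freq in table:
    print(f"{start:>3} to {end:<3} | {freq:>2} | {'*' * freq}")
```

Each class here is closed on the left and open on the right, which is one common convention; other conventions are equally valid as long as every value falls in exactly one class.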
Measures of location
• These statistics tell us, in the first instance, where various parts of the
distribution lie. The mean, median and mode are the most useful statistics for measuring the
location of the centre of the distribution.
Mean
The sample mean, $\bar{x}$, is the arithmetic average of the data values:
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
where n is the number of data values and $x_1, x_2, \ldots, x_n$ are the individual data values.
Hence $\hat{\mu} = \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
For grouped data
Let $x_i$ = midpoint of the i-th class interval
$f_i$ = frequency associated with the i-th class interval
n = total number of observations
$\bar{x}$ = sample mean
$$\bar{x} = \frac{\sum f_i x_i}{\sum f_i}$$
$\bar{x}$ = 14.8
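The two forms of the mean can be sketched in Python as follows. The ungrouped example uses the x column of the observation table given later in these notes. The grouped midpoints and frequencies are hypothetical, since the table behind the value 14.8 is not reproduced here, and the function names are illustrative.

```python
def sample_mean(values):
    """Arithmetic average: (1/n) * sum of x_i."""
    return sum(values) / len(values)


def grouped_mean(midpoints, frequencies):
    """Grouped-data mean: sum(f_i * x_i) / sum(f_i)."""
    total = sum(frequencies)
    return sum(f * x for f, x in zip(frequencies, midpoints)) / total


# x column of the observation table in these notes.
x = [3.0, 2.6, 3.0, 1.2, 2.0]
mean_x = sample_mean(x)

# Hypothetical class midpoints and frequencies (not from these notes).
midpoints = [5, 15, 25]
frequencies = [3, 4, 3]
mean_grouped = grouped_mean(midpoints, frequencies)

print(round(mean_x, 2), mean_grouped)
```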
Point And Interval Estimates Of Population Mean
Point estimate of µ: $\hat{\mu} = \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
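Case 1: When the sample size is small (n < 30)
Interval estimation of µ:
$$\bar{x} - t_{1-\alpha/2}\,\frac{s}{\sqrt{n}} \le \mu \le \bar{x} + t_{1-\alpha/2}\,\frac{s}{\sqrt{n}}$$
Case 2: When the sample size is large (n ≥ 30)
Interval estimation of µ:
$$\bar{x} - z_{1-\alpha/2}\,\frac{s}{\sqrt{n}} \le \mu \le \bar{x} + z_{1-\alpha/2}\,\frac{s}{\sqrt{n}}$$
where $\bar{x}$ is the sample mean,
s is the sample standard deviation,
µ is the population mean,
n is the number of observations in the sample,
$t_{1-\alpha/2}$ and $z_{1-\alpha/2}$ are the standard variates (from the t distribution with n − 1 degrees of freedom and from the standard normal distribution) at a (1 − α) confidence level. By symmetry, $t_{\alpha/2} = -t_{1-\alpha/2}$ and $z_{\alpha/2} = -z_{1-\alpha/2}$.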
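A minimal Python sketch of the large-sample (z) interval is given here, using only the standard library. The summary values (mean 50, standard deviation 8, n = 64) are hypothetical and the function name `z_interval` is illustrative. The small-sample case would replace the normal quantile with a t quantile on n − 1 degrees of freedom, which needs a statistics table or an external library.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(xbar, s, n, alpha=0.05):
    """Large-sample (1 - alpha) confidence interval for the population mean."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{1 - alpha/2}
    half_width = z * s / sqrt(n)
    return xbar - half_width, xbar + half_width


# Hypothetical summary statistics, for illustration only.
lower, upper = z_interval(xbar=50.0, s=8.0, n=64, alpha=0.05)
print(f"{lower:.2f} <= mu <= {upper:.2f}")
```

With these assumed inputs the half-width is z · 8/√64 = z, so the interval is roughly 50 ± 1.96.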
Median: The median, M, is the midpoint of the observed values when they are arranged in
increasing order. Once the data are ordered so that $x_1 \le x_2 \le \ldots \le x_n$, the median can be
calculated from one of the following equations:
$M = x_{(n+1)/2}$ if n is odd
$M = (x_{n/2} + x_{n/2+1})/2$ if n is even
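For grouped data:
$$\text{Median} = L + \frac{w}{f_m}\,(0.5n - cf_b)$$
where w = interval width, L = lower limit of the median class, $f_m$ = frequency of the median class, and $cf_b$ = cumulative frequency of the classes before the median class.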
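The median rules can be sketched in Python as follows. The ungrouped example reuses the x column of the observation table in these notes. The grouped classes are hypothetical, and the median class is taken to be the first class whose cumulative frequency reaches 0.5n, which is the usual convention but is stated here as an assumption.

```python
def median(values):
    """Median of ungrouped data using the odd/even rule."""
    xs = sorted(values)
    n = len(xs)
    if n % 2 == 1:
        return xs[(n + 1) // 2 - 1]           # x_{(n+1)/2}, shifted to 0-based
    return (xs[n // 2 - 1] + xs[n // 2]) / 2  # mean of x_{n/2} and x_{n/2+1}


def grouped_median(lower_limits, frequencies, width):
    """Median = L + (w / f_m) * (0.5 n - cf_b) for grouped data."""
    n = sum(frequencies)
    if n == 0:
        raise ValueError("frequency table is empty")
    cf_before = 0  # cumulative frequency before the current class
    for L, f in zip(lower_limits, frequencies):
        if cf_before + f >= 0.5 * n:          # first class reaching 0.5 n
            return L + (width / f) * (0.5 * n - cf_before)
        cf_before += f
    raise ValueError("median class not found")


x = [3.0, 2.6, 3.0, 1.2, 2.0]  # x column of the observation table
median_x = median(x)

# Hypothetical classes 0-10, 10-20, 20-30 with frequencies 3, 4, 3.
median_grouped = grouped_median([0, 10, 20], [3, 4, 3], width=10)
print(median_x, median_grouped)
```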
Mode: The mode is the most frequent value. A data set can have more than one mode.
For histograms, a mode is a relative maximum.
Measures of Spread
The spread of a distribution can be measured by the variance, the standard deviation and the range.
Variance: The variance is a measure of how spread out a distribution is. It is computed as
the average squared difference of the observed values from their mean. Since it involves
squared differences, the variance is sensitive to erratic high values. The sample variance, $s^2$,
is given by
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$$
The sample variance $s^2$ is an estimate of the population variance $\sigma^2$.
Standard deviation: The standard deviation, σ, is simply the square root of the variance. It is
often used instead of the variance since its units are the same as the units of the variable
being described. The sample standard deviation, s, provides an estimate of the population
standard deviation:
$$s = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}}$$
Range: The range is the simplest possible measure of dispersion and is defined as the difference
between the values of the extreme items of the series. Thus, for ordered data,
Range = $x_n - x_1$
where $x_n$ is the highest of the data values and $x_1$ is the lowest. The
utility of the range is that it gives an idea of the variability very quickly.
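The three measures of spread can be checked numerically with a short Python sketch. It uses the x column of the observation table in these notes, for which the notes report a sample variance of 0.588. The function names are illustrative.

```python
from math import sqrt


def sample_variance(values):
    """s^2 = sum((x_i - mean)^2) / (n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)


def sample_std(values):
    """s = square root of the sample variance."""
    return sqrt(sample_variance(values))


def value_range(values):
    """Range = highest value minus lowest value."""
    return max(values) - min(values)


x = [3.0, 2.6, 3.0, 1.2, 2.0]  # x column of the observation table
var_x = sample_variance(x)
std_x = sample_std(x)
range_x = value_range(x)
print(round(var_x, 3), round(std_x, 3), round(range_x, 1))
```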
Measures of shape
Skewness – Skewness is the measure of asymmetry and shows the manner in which the items
are clustered around the average. A distribution is skewed if one of its tails is longer than the
other. It is measured by:
$$\text{skewness} = \frac{E[(X - \mu)^3]}{\sigma^3}$$
For a symmetric distribution, skewness is zero.
Kurtosis – Kurtosis is the measure of flat-toppedness of a curve. The following is the
formula to calculate kurtosis:
$$\text{kurtosis} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^4}{n\,\sigma^4} - 3$$
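Both shape measures can be sketched in Python as moment ratios. One assumption is made loudly here: σ is estimated with the divide-by-n form of the standard deviation, so that the sample skewness mirrors E[(X − µ)³]/σ³. Other estimators with small-sample corrections exist. The data are hypothetical and symmetric, so the skewness should come out as zero.

```python
def _mean_and_sigma(values):
    """Mean and divide-by-n standard deviation (an assumption of this sketch)."""
    n = len(values)
    mean = sum(values) / n
    sigma = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return mean, sigma


def skewness(values):
    """Sample analogue of E[(X - mu)^3] / sigma^3."""
    n = len(values)
    mean, sigma = _mean_and_sigma(values)
    return sum((v - mean) ** 3 for v in values) / (n * sigma ** 3)


def kurtosis(values):
    """sum((x_i - mean)^4) / (n * sigma^4) - 3."""
    n = len(values)
    mean, sigma = _mean_and_sigma(values)
    return sum((v - mean) ** 4 for v in values) / (n * sigma ** 4) - 3


# Hypothetical symmetric data, for illustration only.
data = [1, 2, 3, 4, 5]
skew = skewness(data)
kurt = kurtosis(data)
print(skew, round(kurt, 2))
```

The negative kurtosis for this evenly spread data set reflects a flatter top than the normal curve, for which the formula gives zero.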
Observation no.    x      y
1                  3.0    2.1
2                  2.6    2.5
3                  3.0    3.4
4                  1.2    1.0
5                  2.0    1.6
[Figure: scatter plot of y against x for the five observations]
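Covariance (X, Y) = $\sigma_{xy} = E[(X - \mu_x)(Y - \mu_y)]$
Sample covariance: $s_{xy} = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})$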
$\bar{y}$ = 2.12
$\bar{x}$ = 2.36
$s_x^2$ = 0.588
$s_y^2$ = 0.827
$s_{xy}$ = 0.596
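Correlation coefficient: $\rho_{xy} = \frac{\mathrm{cov}(x, y)}{\sigma_x \sigma_y}$
• Sample correlation coefficient: $r_{xy} = \frac{s_{xy}}{s_x s_y}$
• $= \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2 \sum_{i=1}^{n}(y_i - \bar{y})^2}}$
$r_{xy}$ = 0.855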
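The covariance and correlation values quoted for the five observations can be reproduced with a short Python sketch. The data are the x and y columns of the observation table in these notes. The function names are illustrative.

```python
from math import sqrt


def sample_covariance(xs, ys):
    """s_xy = sum((x_i - xbar) * (y_i - ybar)) / (n - 1)."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    return sum((a - xbar) * (b - ybar) for a, b in zip(xs, ys)) / (n - 1)


def sample_correlation(xs, ys):
    """r_xy = s_xy / (s_x * s_y)."""
    s_x = sqrt(sample_covariance(xs, xs))  # covariance of x with itself = s_x^2
    s_y = sqrt(sample_covariance(ys, ys))
    return sample_covariance(xs, ys) / (s_x * s_y)


x = [3.0, 2.6, 3.0, 1.2, 2.0]
y = [2.1, 2.5, 3.4, 1.0, 1.6]

s_xy = sample_covariance(x, y)
r_xy = sample_correlation(x, y)
print(round(s_xy, 3), round(r_xy, 3))
```

Because the (n − 1) factors cancel in the ratio, the same r would result from the sum-of-products form of the formula.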
Correlation Analysis
Step 3: Decision
If p-value < significance level: Reject H0 in favour of H1
If p-value > significance level: Fail to reject H0
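The decision step amounts to a single comparison, sketched here in Python. The function name `reject_h0`, the default significance level of 0.05 and the two p-values are assumptions for illustration. Computing the p-value itself belongs to the earlier steps of the test and is not shown.

```python
def reject_h0(p_value, significance_level=0.05):
    """Return True when H0 is rejected in favour of H1."""
    return p_value < significance_level


# Hypothetical p-values, for illustration only.
decision_small_p = reject_h0(0.01)  # p-value below the significance level
decision_large_p = reject_h0(0.20)  # p-value above the significance level
print(decision_small_p, decision_large_p)
```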