STATISTICAL ANALYSIS
TRP: An example
Television rating point (TRP) is a tool used to judge which programmes are viewed the most. It
gives an index of viewers' choices and of the popularity of a particular channel.
To calculate it, a device is attached to the TV sets in a few thousand viewers' houses across
different geographic and demographic sectors.
◦ The device is called a People's Meter. It records the time and the programme that a viewer
watches on a particular day for a certain period.
An average is then taken over a period, for example 30 days.
This can be further augmented with a Personal Interview Survey (PIS), which becomes the
basis for many studies and decisions.
Note:
A data set is composed of information from a set of units.
An observation consists of one or more pieces of information about a unit; these are called
variables.
STATISTICS PREPARED BY CHITRAPRIYA N., CSE DEPT 4
Defining Population
Note:
1. A population does not have to be all the people in a country or the world; it is the full set of units under study.
Example: All students studying in Class XII form a population, whereas the Class XII students
belonging to a given school form a sample.
Note:
◦ Normally, in inferential statistics, a sample is obtained in such a way as to be representative
of the population.
An unbiased sampling procedure is one in which each observation in the population has an
equal chance of being chosen for the sample.
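A minimal sketch of an unbiased (simple random) sampling procedure, assuming the population can be listed in full. The unit IDs, the sample size and the seed are hypothetical and chosen only for illustration.

```python
import random

# Hypothetical population: 100 units identified by the numbers 1..100.
population = list(range(1, 101))

# A seeded generator makes the illustration repeatable (the seed is arbitrary).
rng = random.Random(42)

# random.sample draws without replacement; every unit has the same
# chance of ending up in the sample.
sample = rng.sample(population, k=10)

print(sorted(sample))
```

Because every unit is equally likely to be drawn, no part of the population is systematically favoured.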
◦ Measures of dispersion
When each sample value $x_i$ is associated with a weight $w_i$, for i = 1,2,…,n, the weighted mean
is defined as
$\bar{x}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$
Note
When all weights are equal, the weighted mean reduces to the simple mean.
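The note above can be checked with a short sketch. It assumes the usual definition of the weighted mean, the sum of weight times value divided by the sum of the weights. The marks and credit weights are hypothetical.

```python
def weighted_mean(values, weights):
    """Return sum(w_i * x_i) / sum(w_i) for paired values and weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


# Hypothetical marks and credit weights, for illustration only.
marks = [70, 80, 90]
credits = [1, 2, 3]

weighted = weighted_mean(marks, credits)         # (70 + 160 + 270) / 6
equal_weights = weighted_mean(marks, [1, 1, 1])  # every weight the same
simple = sum(marks) / len(marks)                 # ordinary mean

print(weighted, equal_weights, simple)
```

With equal weights the two results coincide, as the note states.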
(Distributive Measure)
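If a new observation $x_{n+1}$ is added to a sample of size n with mean $\bar{x}$, the new mean is given by
$\bar{x}_{new} = \frac{n\bar{x} + x_{n+1}}{n + 1}$
If m observations with mean $\bar{x}_m$ are added to (removed from) a sample of size n with mean $\bar{x}$,
then the new mean is given by
$\bar{x}_{new} = \frac{n\bar{x} \pm m\bar{x}_m}{n \pm m}$
with + for addition and − for removal.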
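A small sketch of why the mean is a distributive measure: it can be updated from the sizes and means alone, without revisiting the raw data. It assumes the standard update rule, in which the old total n × mean is combined with the total m × mean of the added or removed group. The function names and the sample values are hypothetical.

```python
def mean_after_adding(n, mean, m, mean_m):
    """New mean after adding m observations whose mean is mean_m."""
    return (n * mean + m * mean_m) / (n + m)


def mean_after_removing(n, mean, m, mean_m):
    """New mean after removing m observations whose mean is mean_m."""
    if m >= n:
        raise ValueError("cannot remove all (or more than all) observations")
    return (n * mean - m * mean_m) / (n - m)


# Hypothetical data, for illustration only.
original = [2, 4, 6, 8]   # n = 4, mean = 5
extra = [10, 12]          # m = 2, mean = 11

added = mean_after_adding(4, 5.0, 2, 11.0)
direct = sum(original + extra) / len(original + extra)
restored = mean_after_removing(6, added, 2, 11.0)

print(added, direct, restored)
```

Adding a single observation is the special case m = 1, where the mean of the added group is just the new value.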
[Figure: skewed distribution with the mean and median marked, illustrating the case Mean < median]
You and your friends have just measured the heights of your dogs (in millimeters):
The heights (at the shoulders) are: 600mm, 470mm, 170mm, 430mm and 300mm.
Find out the Mean, the Variance, and the Standard Deviation.
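1. To calculate the Mean, add the heights and divide by their count:
Mean = (600 + 470 + 170 + 430 + 300)/5 = 1970/5 = 394
2. To calculate the Variance, take each height's difference from the Mean, square it, and then average the results: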
Variance
σ² = (206² + 76² + (−224)² + 36² + (−94)²)/5
= (42436 + 5776 + 50176 + 1296 + 8836)/5
= (108520)/5
= 21704
And the Standard Deviation is just the square root of Variance, so:
Standard Deviation
σ = √21704
= 147.32...
So, using the Standard Deviation we have a "standard" way of knowing what is normal, and what is
extra large or extra small.
The larger the variance or standard deviation, the greater the variation in the data
around its mean.
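The dog-height calculation can be reproduced with a short sketch. It follows the worked example in treating the five dogs as the whole population, so the squared differences are divided by n rather than n − 1. The variable names are illustrative.

```python
import math

heights_mm = [600, 470, 170, 430, 300]
n = len(heights_mm)

# Mean: sum of the heights divided by their count.
mean = sum(heights_mm) / n

# Difference of each height from the mean.
deviations = [h - mean for h in heights_mm]

# Population variance: average of the squared differences.
variance = sum(d ** 2 for d in deviations) / n

# Standard deviation: square root of the variance.
std_dev = math.sqrt(variance)

print(mean, variance, round(std_dev, 2))
```

Heights within one standard deviation of the mean (roughly 247 mm to 541 mm here) can be read as "normal"; values outside that band are unusually large or small.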
Frequency (f)   6   16   24   25   17
◦ Percentile
◦ The percentile of a set of ordered data can be defined as follows:
o Given an ordinal or continuous attribute x and a number p between 0 and 100, the pth percentile is
a value $x_p$ of x such that p% of the observed values of x are less than $x_p$.
o Example: The 50th percentile is the value $x_{50}$ such that 50% of all values of x are less than $x_{50}$.
◦ Note: The median is the 50th percentile.
If i = (p/100)·n is an integer, where n is the number of sorted values, the pth percentile is the average of the values in positions i and i+1.
We need to find the 6th and 7th numbers in the sorted data set.
11
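A hedged sketch of a percentile calculation follows. It assumes the common textbook rule: compute the position index i = (p/100)·n for n sorted values, average the values in positions i and i+1 when i is an integer, and otherwise round i up and take the value in that position. The data set is hypothetical, with 12 values so that the median needs the 6th and 7th sorted numbers.

```python
import math


def percentile(data, p):
    """pth percentile using the position index i = (p / 100) * n."""
    if not data:
        raise ValueError("data must not be empty")
    if not 0 < p < 100:
        # Assumption: endpoints are excluded so positions i and i+1 exist.
        raise ValueError("p must be strictly between 0 and 100")
    ordered = sorted(data)
    n = len(ordered)
    i = p * n / 100
    if i == int(i):
        i = int(i)
        # Integer index: average the values in positions i and i+1 (1-based).
        return (ordered[i - 1] + ordered[i]) / 2
    # Non-integer index: round up and take the value in that position.
    return ordered[math.ceil(i) - 1]


# Hypothetical data, for illustration only.
values = [13, 4, 8, 2, 20, 5, 10, 16, 4, 7, 15, 11]

median = percentile(values, 50)  # i = 6, so average the 6th and 7th values
q1 = percentile(values, 25)      # i = 3, so average the 3rd and 4th values
p90 = percentile(values, 90)     # i = 10.8, so take the 11th value

print(median, q1, p90)
```

Other conventions for interpolating percentiles exist, and statistical libraries may return slightly different values for the same data.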
In this formula, X represents the independent variable, Y represents the dependent variable, N
represents the number of data points in the sample, x-bar represents the mean of X, and
y-bar represents the mean of Y.
The result is positive, meaning that the variables are positively related.
EXAMPLE
To calculate the correlation
A positive correlation coefficient less than one indicates a less than perfect positive
correlation.
EXAMPLE:
Suppose you take a sample of stock returns from the Excelsior Corporation and the
Adirondack Corporation for the years 2008 to 2012, as shown here.
What are the covariance and correlation between the stock returns?
Year    Excelsior Corp. annual return    Adirondack Corp. annual return
        (percent) (X)                    (percent) (Y)
2008     1                                3
2009    −2                                2
2010     3                                4
2011     0                                6
2012     3                                0
The negative result shows that there is a weak negative correlation between the stock
returns of Excelsior and Adirondack. If two variables are perfectly negatively correlated (they
always move in opposite directions), their correlation is −1. If two variables are
independent (unrelated to each other), their correlation is 0. The correlation between
the returns to Excelsior and Adirondack stock is −0.2108, which indicates that the two
variables show a slight tendency to move in opposite directions.
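The figures for the stock-return example can be reproduced with a short sketch. It assumes the sample form of the covariance and standard deviations, dividing by N − 1. The correlation comes out the same if N is used throughout, because the divisor cancels. The variable names are illustrative.

```python
import math

# Annual returns (percent), 2008 to 2012, from the table above.
x = [1, -2, 3, 0, 3]   # Excelsior Corp.
y = [3, 2, 4, 6, 0]    # Adirondack Corp.
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Sum of the products of paired deviations from the means.
cross = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Sample covariance (assumption: divisor N - 1).
covariance = cross / (n - 1)

# Sample standard deviations, using the same divisor.
s_x = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
s_y = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))

# Correlation: covariance scaled by both standard deviations.
correlation = covariance / (s_x * s_y)

print(covariance, round(correlation, 4))
```

The covariance is negative, and dividing by the two standard deviations rescales it to the correlation of about −0.2108 quoted above.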