The collected data is usually contained in schedules and questionnaires, but that is
not an easily assimilable form. The answers require some analysis if their salient
points are to be brought out. As a rule, the first step in the analysis is to classify and
tabulate the information collected or, if published statistics have been used, to
rearrange these into new groups and tabulate the new arrangement. In some
investigations, classification and tabulation give such a clear picture of
the significance of the material that no further analysis is required. In other cases
these processes, though they may materially assist the analysis, are not a sufficient
presentation of the facts. They are, however, very important. The schedules may have
been very carefully drawn up and the answers may be both complete and accurate, but
until these answers are all brought together into the classes to which they belong and
the whole information is displayed in tabular form, no one will be a great deal wiser
as to the contents of the replies.
• Tabulation reduces the bulk of information, i.e. raw data, to a simplified and
meaningful form so that it can be easily understood by a common man in less time.
• Tables serve as the best source of organized data for further statistical
analysis.
• The task of computing average, dispersion, correlation, etc. becomes much
easier if data is presented in the form of a table.
Many times it is not easy or feasible to find the frequency of data from a very large
dataset. So to make sense of the data we make a frequency table and graphs. Let us
take the example of the heights of ten students in centimeters.
139, 145, 150, 145, 136, 150, 152, 144, 138, 138
This frequency table will help us make better sense of the data given. Also when the
data set is too big (say if we were dealing with 100 students) we use tally marks for
counting. It makes the task more organized and easy. Below is an example of how we
use tally marks.
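As a rough illustration, the frequency table for the ten heights above can be built with a short Python sketch. The helper name `tally` and the plain-text layout are choices made here for illustration, not part of the original notes. A full group of five is shown as five strokes, since the usual diagonal stroke cannot be drawn in plain text.

```python
from collections import Counter

# Heights (cm) of the ten students listed above.
heights = [139, 145, 150, 145, 136, 150, 152, 144, 138, 138]


def tally(count):
    """Render a count as tally marks in groups of five, e.g. 7 -> '||||| ||'."""
    groups, rest = divmod(count, 5)
    parts = ["|||||"] * groups
    if rest:
        parts.append("|" * rest)
    return " ".join(parts)


# Counter maps each distinct height to the number of times it occurs.
frequency = Counter(heights)

print("Height  Tally     Frequency")
for height in sorted(frequency):
    print(f"{height:>6}  {tally(frequency[height]):<8}  {frequency[height]}")
```

The frequencies always add up to the number of observations (ten here), which is a quick check that no value was missed while tallying.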
Mean
The mean is what most people commonly refer to as an average. It is the
number you obtain when you sum up a given set of numbers and then divide this
sum by the count of numbers in the set. More precisely, it is called the
arithmetic mean.
Solution
The first step is to count how many numbers there are in the set, which we shall
call n.
The next step is to add up all the numbers in the set.
The last step is to find the actual mean by dividing the sum by n.
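The steps above can be sketched in Python. Because the worked example's numbers are not reproduced here, the sketch reuses the ten student heights from the frequency-table example as assumed input.

```python
# Assumed input: the ten student heights (cm) from the frequency-table example.
heights = [139, 145, 150, 145, 136, 150, 152, 144, 138, 138]

n = len(heights)      # count how many numbers are in the set
total = sum(heights)  # add up all the numbers
mean = total / n      # divide the sum by n

print(n, total, mean)
```

For this data n is 10 and the sum is 1437, so the mean height is 143.7 cm.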
Median
The median is defined as the number in the middle of a given set of numbers
arranged in order of increasing magnitude. When given a set of numbers, the median
is the number positioned in the exact middle of the list when you arrange the
numbers from the lowest to the highest. Like the mean, the median is a measure of
average (central tendency), not of dispersion. The median is important because half
of the values lie at or below it and half at or above it, and it is not pulled
toward a few extreme values the way the mean is.
Example 3
Find the median in the set of numbers given below
Solution
From the definition of median, the first step is to rearrange the given set of
numbers in order of increasing magnitude, i.e. from the lowest to the highest.
Then we inspect the set to find the number that lies in the exact middle. If the set
has an even count of numbers, the median is the mean of the two middle numbers.
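A minimal sketch of the two steps (sort, then pick the middle) is given below. It assumes the usual convention of averaging the two middle values when the count is even, and it reuses the student heights from the frequency-table example because the numbers of Example 3 are not reproduced here.

```python
def median(values):
    """Return the middle value of the data once sorted in increasing order."""
    ordered = sorted(values)  # step 1: arrange from lowest to highest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: a single value sits in the exact middle.
        return ordered[mid]
    # Even count: take the mean of the two middle values (assumed convention).
    return (ordered[mid - 1] + ordered[mid]) / 2


heights = [139, 145, 150, 145, 136, 150, 152, 144, 138, 138]
print(median(heights))
```

The sorted heights are 136, 138, 138, 139, 144, 145, 145, 150, 150, 152, so the two middle values are 144 and 145 and the median is 144.5.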
Mode
The mode is defined as the element that appears most frequently in a given set of
elements. Using the definition of frequency given above, mode can also be defined
as the element with the largest frequency in a given data set.
For a given data set, there can be more than one mode. As long as those elements all
have the same frequency and that frequency is the highest, they are all the modal
elements of the data set.
Example 5
Solution
Mode = 3 and 15
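The idea that a data set can have several modes can be sketched as follows. The function name `modes` is chosen here for illustration, and the input is the student heights from the frequency-table example, since the data of Example 5 is not reproduced.

```python
from collections import Counter


def modes(values):
    """Return every element whose frequency equals the highest frequency."""
    counts = Counter(values)        # element -> frequency
    highest = max(counts.values())  # the largest frequency in the data set
    return sorted(v for v, c in counts.items() if c == highest)


heights = [139, 145, 150, 145, 136, 150, 152, 144, 138, 138]
print(modes(heights))
```

In this data 138, 145 and 150 each occur twice and nothing occurs more often, so all three are modal elements.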
Range
The range is defined as the difference between the highest and lowest number in a
given data set.
Example 7
Solution
Range = 20 − 3 = 17
1. Range: It is simply the difference between the maximum value and the
minimum value in a data set. Example: 1, 3, 5, 6, 7 => Range = 7 − 1 = 6
2. Variance: Subtract the mean from each value in the set, square each
difference, add the squares, and divide the sum by the total number of
values in the data set. Variance: $\sigma^2 = \frac{\sum (X - \mu)^2}{N}$
3. Standard Deviation: The square root of the variance is known as the standard
deviation, i.e. S.D. = $\sigma = \sqrt{\sigma^2}$.
4. Quartiles and Quartile Deviation: The quartiles are values that divide a list of
numbers into quarters. The quartile deviation is half of the distance between
the third and the first quartile.
5. Mean and Mean Deviation: The average of the numbers is known as the mean,
and the arithmetic mean of the absolute deviations of the observations from a
measure of central tendency is known as the mean deviation.
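Items 1, 2, 3 and 5 of the list can be worked through on the small data set 1, 3, 5, 6, 7 from the range example. This is only a sketch: it treats the data as a whole population (dividing by N) and takes the mean deviation about the mean, both of which are assumptions made here.

```python
import math

data = [1, 3, 5, 6, 7]
n = len(data)

# 1. Range: maximum minus minimum.
data_range = max(data) - min(data)

# 2. Variance: subtract the mean, square, add the squares, divide by N.
mean = sum(data) / n
squared_deviations = [(x - mean) ** 2 for x in data]
variance = sum(squared_deviations) / n

# 3. Standard deviation: square root of the variance.
std_dev = math.sqrt(variance)

# 5. Mean deviation (about the mean): average of the absolute deviations.
mean_deviation = sum(abs(x - mean) for x in data) / n

print(data_range, mean, variance, std_dev, mean_deviation)
```

Here the mean is 4.4, the squared deviations add up to 23.2, and dividing by 5 gives a variance of about 4.64. The absolute deviations add up to 9.6, so the mean deviation is about 1.92.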
Relative Measure of Dispersion:
The relative measures of dispersion are used to compare the distributions of two or
more data sets. These measures are unit-free, so they allow comparison between
values recorded in different units. Common relative dispersion methods include:
1. Coefficient of Range
2. Coefficient of Variation
3. Coefficient of Standard Deviation
4. Coefficient of Quartile Deviation
5. Coefficient of Mean Deviation
Coefficient of Dispersion
The coefficients of dispersion are calculated along with the measure of dispersion
when two series that differ widely in their averages are compared. The dispersion
coefficient is also used when two series with different units of measurement are
compared. It is denoted as C.D.
The common coefficients of dispersion are:
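As a rough illustration, two of these coefficients can be sketched in Python. The formulas used are the standard textbook definitions, assumed here because the notes do not spell them out: coefficient of range = (max − min) / (max + min), and coefficient of variation = (standard deviation / mean) × 100. The function names are hypothetical.

```python
import math


def coefficient_of_range(values):
    """(max - min) / (max + min): a unit-free version of the range."""
    return (max(values) - min(values)) / (max(values) + min(values))


def coefficient_of_variation(values):
    """Population standard deviation as a percentage of the mean."""
    n = len(values)
    mean = sum(values) / n
    std_dev = math.sqrt(sum((x - mean) ** 2 for x in values) / n)
    return std_dev / mean * 100


data = [1, 3, 5, 6, 7]
print(coefficient_of_range(data))
print(coefficient_of_variation(data))

# The same data expressed in a unit 100 times smaller gives the same coefficient.
rescaled = [x * 100 for x in data]
print(coefficient_of_variation(rescaled))
```

Because the unit cancels in each ratio, rescaling the data (for example from metres to centimetres) leaves both coefficients unchanged, which is what makes series in different units comparable.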
The sum of all the numbers in a group, divided by the number of items in that
group, is known as the Arithmetic Mean or Mean of the group. For example, the mean of
the numbers 5, 7, 9 is 7, since 5 + 7 + 9 = 21 and 21 divided by 3 [there are three
numbers] is 7.
$\bar{X} = \frac{1}{N} \sum_{i=1}^{N} X_i$
Quartile Formula
Quartiles divide a set of observations into 4 equal parts. The middle term
between the first term and the median is known as the first or lower quartile and is
written as Q1. Similarly, the middle term that lies between the median and the last
term is known as the third or upper quartile and is denoted as Q3. The second quartile
is the median and is written as Q2.
$Q_1 = \left(\frac{n+1}{4}\right)^{\text{th}}$ term
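The position formula can be sketched as below. Two things in this sketch are assumptions rather than part of the notes: the positions k(n+1)/4 for Q2 and Q3 extend the Q1 formula, and a fractional position is resolved by interpolating linearly between the two neighbouring terms. The input is the student heights from the frequency-table example.

```python
def quartile(values, k):
    """Return the k-th quartile (k = 1, 2, 3) as the k(n+1)/4-th ordered term."""
    ordered = sorted(values)
    n = len(ordered)
    position = k * (n + 1) / 4  # 1-based position, may be fractional
    lower = int(position)       # whole part of the position
    fraction = position - lower
    if lower < 1:
        return ordered[0]
    if lower >= n:
        return ordered[-1]
    # Interpolate between the lower-th and (lower+1)-th terms (assumed rule).
    below, above = ordered[lower - 1], ordered[lower]
    return below + fraction * (above - below)


heights = [139, 145, 150, 145, 136, 150, 152, 144, 138, 138]
print(quartile(heights, 1), quartile(heights, 2), quartile(heights, 3))
```

With n = 10 the first quartile sits at position 2.75, between the 2nd and 3rd ordered heights (both 138), so Q1 = 138. Q2 falls at position 5.5 and equals the median 144.5, and Q3 at position 8.25 is 150.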
The standard deviation formula is used to measure how widely the values of a data
set are dispersed. In simple words, the standard deviation describes the typical
deviation of the values from their mean. A lower standard deviation indicates that the
values are very close to their average, whereas a higher value means the values are far
from the mean. It should be noted that the standard deviation can never
be negative.
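Variance, and with it standard deviation, is of two types:
Variance of a population
Variance of a sample
The variance of a population is denoted by $\sigma^2$ and the variance of a sample by $s^2$.
$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$
$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
Here $\mu$ is the population mean, $N$ the population size, $\bar{x}$ the sample mean and $n$ the sample size. In each case the standard deviation is the square root of the variance.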
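The difference between the two divisors can be seen with Python's standard `statistics` module, where `pvariance` and `pstdev` divide by the number of values and `variance` and `stdev` divide by one less. The data set 1, 3, 5, 6, 7 is reused from the range example as an assumed input.

```python
import statistics

data = [1, 3, 5, 6, 7]

# Population formulas: divide the sum of squared deviations by N.
population_variance = statistics.pvariance(data)
population_sd = statistics.pstdev(data)

# Sample formulas: divide the sum of squared deviations by n - 1.
sample_variance = statistics.variance(data)
sample_sd = statistics.stdev(data)

print(population_variance, population_sd)
print(sample_variance, sample_sd)
```

The squared deviations add up to 23.2 in both cases. Dividing by 5 gives about 4.64 and dividing by 4 gives about 5.8, so the sample variance is always the larger of the two for the same data.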
The interquartile range (IQR) is a measure of variability, based on dividing a data set
into quartiles. The values that separate the parts are called the first, second, and third
quartiles, and they are denoted by Q1, Q2, and Q3, respectively.
Q1 is the “middle” value in the first half of the rank-ordered data set.
Q2 is the median value in the set.
Q3 is the “middle” value in the second half of the rank-ordered data set.
The formula for the interquartile range is given below:
IQR = Q3 − Q1
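A minimal sketch of this definition follows, taking Q1 and Q3 as the medians of the lower and upper halves of the rank-ordered data. When the count is odd, the overall median is left out of both halves. That is one common convention and an assumption here, since the notes do not say how to split an odd-sized set.

```python
import statistics


def interquartile_range(values):
    """IQR = Q3 - Q1, with Q1 and Q3 taken as the medians of the two halves."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower_half = ordered[:half]
    # For an odd count, skip the middle value itself (assumed convention).
    upper_half = ordered[half:] if n % 2 == 0 else ordered[half + 1:]
    q1 = statistics.median(lower_half)
    q3 = statistics.median(upper_half)
    return q3 - q1


heights = [139, 145, 150, 145, 136, 150, 152, 144, 138, 138]
print(interquartile_range(heights))
```

For the ten student heights the lower half is 136, 138, 138, 139, 144 with middle value 138, and the upper half is 145, 145, 150, 150, 152 with middle value 150, so the IQR is 150 − 138 = 12.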