Statistics is a branch of mathematics that deals with collecting, organizing, analyzing,
interpreting, and presenting data. It covers the collection of data, its classification, its
representation for easy interpretation, and its further analysis. Statistics also refers to
drawing conclusions from sample data collected through surveys or experiments.
Fields such as psychology, sociology, and geology rely on statistics, and it is closely
tied to probability.
Mathematical Statistics
Statistics is used mainly to gain an understanding of data and to support various
applications. It is the process of collecting data, evaluating it, and summarizing it in
mathematical form. Initially, statistics was the science of the state: it was used to
collect and analyze facts and data about a country, such as its economy and
population. Mathematical statistics applies mathematical techniques such as linear
algebra, differential equations, mathematical analysis, and probability theory.
There are two methods of analyzing data in mathematical statistics that are used on a
large scale:
• Descriptive Statistics
• Inferential Statistics
Descriptive Statistics
The descriptive method of statistics is used to describe the collected data and
summarize its properties using the measures of central tendency and the measures of
dispersion.
Inferential Statistics
This method of statistics is used to draw conclusions from the data. Inferential statistics
requires statistical tests performed on samples, for example to decide whether two
groups differ. A test calculates a p-value, which is compared with a chosen significance
level α (the probability of chance), commonly 0.05. If the p-value is less than α, the
result is said to be statistically significant.
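The comparison of a p-value with α can be sketched in code. The example below uses an exact permutation test for a difference in means, chosen here only because it needs no external library; the text does not name a specific test. The function name `permutation_p_value` and the two small samples are hypothetical.

```python
from itertools import combinations


def permutation_p_value(group_a, group_b):
    """Two-sided exact permutation test for a difference in means."""
    pooled = list(group_a) + list(group_b)
    size_a = len(group_a)

    def mean(values):
        return sum(values) / len(values)

    observed = abs(mean(group_a) - mean(group_b))

    extreme = 0
    total = 0
    # Enumerate every way of relabelling the pooled values into two groups.
    for chosen in combinations(range(len(pooled)), size_a):
        chosen_set = set(chosen)
        first = [pooled[i] for i in chosen]
        second = [pooled[i] for i in range(len(pooled)) if i not in chosen_set]
        difference = abs(mean(first) - mean(second))
        # Count relabellings at least as extreme as the observed split.
        if difference >= observed - 1e-12:
            extreme += 1
        total += 1
    return extreme / total


ALPHA = 0.05  # significance level used in the text
control = [5.1, 4.9, 5.6, 5.8, 6.0]   # hypothetical measurements
treated = [6.9, 7.2, 7.5, 6.8, 7.1]   # hypothetical measurements

p_value = permutation_p_value(control, treated)
significant = p_value < ALPHA
print(f"p-value = {p_value:.4f}, significant at alpha = {ALPHA}: {significant}")
```

Because every treated value exceeds every control value, only the observed split and its mirror image are as extreme as the data, so the p-value is small and the result is significant at α = 0.05.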
The collection of observations and facts is known as data. These observations and facts
can be in the form of numbers, measurements, or statements. There are two different
kinds of data: qualitative data and quantitative data. Qualitative data is descriptive or
categorical, and quantitative data is numerical. Once the data has been collected, we
represent it in different forms of graphs such as a bar graph, line graph, pie chart,
stem and leaf plot, scatter plot, and so on. Before the data is analyzed, outliers that are
due to variability in the measurements are removed.
Let us look at different kinds of data representation in statistics.
Data Representation - Description
Bar Graph - A group of data represented with rectangular bars whose lengths are
proportional to the values is a bar graph.
Pie Chart - The pie chart is a type of graph in which a circle is divided into sectors,
where each sector represents a proportion of the whole.
Line Graph - The line graph represents the data as a series of points, called markers,
that are connected with straight line segments.
Pictograph - Data shown in the form of pictures is a pictograph. Pictorial symbols for
words, objects, or phrases can each stand for a different number of items.
Histogram - The histogram is a type of graph that consists of rectangles, where the area
of each rectangle is proportional to the frequency of a variable and the width is equal to
the class interval.
Frequency Distribution - The frequency distribution table in statistics lists the data
values in ascending order along with their corresponding frequencies. The frequency of
a value is often represented by f.
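As a small illustration of a frequency distribution table, the sketch below counts how often each value occurs and lists the values in ascending order with their frequency f. The helper name `frequency_table` and the sample scores are hypothetical.

```python
from collections import Counter


def frequency_table(observations):
    """Return (value, frequency) pairs sorted by value in ascending order."""
    counts = Counter(observations)
    return sorted(counts.items())


scores = [3, 1, 2, 3, 3, 2, 5, 1, 3]  # hypothetical sample
table = frequency_table(scores)

print("value\tf")
for value, f in table:
    print(f"{value}\t{f}")
```

The frequencies always add up to the number of observations, which is a quick check that no value was dropped.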
Statistics is a broad term, and different statistical models are used in different
settings. Listed below are a few of them:
ANOVA Statistics - ANOVA stands for Analysis of Variance. It tests whether the means
of two or more groups differ by comparing the variation between the groups with the
variation within them. It can be used, for example, to compare the performance of
stocks over a period of time.
Degrees of freedom - The degrees of freedom are the number of values in a calculation
that are free to vary when a parameter is estimated.
Regression Analysis - In this model, the statistical process determines the relationship
between the variables. It describes how a dependent variable changes when an
independent variable changes.
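The idea behind regression analysis can be sketched with the simplest case, a straight line fitted by least squares to one independent variable. This is only one form of regression, picked here for illustration; the function name `fit_line` and the data are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


hours = [1, 2, 3, 4, 5]          # independent variable (hypothetical)
marks = [52, 54, 56, 58, 60]     # dependent variable, exactly 50 + 2 * hours

slope, intercept = fit_line(hours, marks)
print(f"marks = {intercept:.1f} + {slope:.1f} * hours")
```

The slope is the estimated change in the dependent variable for a one-unit change in the independent variable.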
Measures of Central Tendency in Statistics
The measures of central tendency and the measures of dispersion are considered the
basis of descriptive statistics. A measure of central tendency is a representative value
for the given data that gives us an idea of where the data points are centered. The
measures of dispersion then describe how the data are scattered around this central
value. We use the mean, median, and mode as measures of central tendency. In our
day-to-day life, we find the average height of students, the average income, or the
average score in exams or of a player. The different measures of central tendency for
the data are:
• Arithmetic Mean
• Median
• Mode
• Geometric Mean
• Harmonic Mean
The mean is the arithmetic average of a data set, found by adding the numbers in the
set and dividing by the number of observations. The median is the middle number of the
data set when it is listed in either ascending or descending order. Lastly, the mode is
the value that occurs most often in the data set. For n observations, we have
• Mean: $\bar{x} = \frac{\sum x}{n}$
• Median = $\left(\frac{n+1}{2}\right)$th term if n is odd.
• Median = $\frac{\left(\frac{n}{2}\right)\text{th term} + \left(\frac{n}{2}+1\right)\text{th term}}{2}$ if n is even.
• Mode = The value which occurs most frequently
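The three formulas above can be written out directly. This is a minimal sketch with hypothetical function names and data; Python's standard `statistics` module offers equivalent functions.

```python
from collections import Counter


def mean(data):
    """Sum of the observations divided by their number n."""
    return sum(data) / len(data)


def median(data):
    """Middle term for odd n, average of the two middle terms for even n."""
    ordered = sorted(data)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # the ((n + 1) / 2)th term, counting from 1
    return (ordered[mid - 1] + ordered[mid]) / 2


def mode(data):
    """Most frequent value; on a tie, the value seen first is returned."""
    value, _count = Counter(data).most_common(1)[0]
    return value


sample = [2, 3, 3, 5, 7, 10]  # hypothetical data, n = 6
print(mean(sample), median(sample), mode(sample))
```

For this sample the mean is 30 / 6 = 5, the median is (3 + 5) / 2 = 4, and the mode is 3.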
Measures of Dispersion in Statistics
The measures of central tendency do not suffice to describe the complete information
about given data. Thus we need to describe the variability by a value called the
measure of dispersion. The measures of dispersion discussed here are the mean
deviation, the variance, the standard deviation, and the coefficient of variation.
In statistics, the frequency distribution of data can be discrete or continuous. For n
individual observations $x_1, x_2, x_3, \ldots, x_n$, the mean deviation about the mean
is calculated from the absolute deviations as follows (for the mean deviation about the
median, replace $\bar{x}$ with the median):
Mean Deviation for ungrouped data = sum of absolute deviations / number of observations
= $\frac{\sum_{i=1}^{n} |x_i - \bar{x}|}{n}$
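As a worked sketch, the mean deviation of ungrouped data can be computed about either the mean or the median. The function name `mean_deviation` and the data are hypothetical.

```python
def mean_deviation(data, center):
    """Average absolute deviation of the observations from `center`."""
    return sum(abs(x - center) for x in data) / len(data)


data = [3, 6, 6, 7, 8, 11, 15]  # hypothetical data, n = 7

mean_value = sum(data) / len(data)            # 56 / 7 = 8
median_value = sorted(data)[len(data) // 2]   # middle term, since n is odd

md_about_mean = mean_deviation(data, mean_value)      # 20 / 7
md_about_median = mean_deviation(data, median_value)  # 19 / 7
print(md_about_mean, md_about_median)
```

The absolute value matters: without it, the deviations from the mean would always sum to zero.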
In a discrete frequency distribution, the individual values of the data are shown
explicitly together with their frequencies. Let there be n distinct data points
$x_1, x_2, x_3, \ldots, x_n$ occurring with frequencies $f_1, f_2, f_3, \ldots, f_n$.
• We find the mean $\bar{x}$ using
$\bar{x} = \frac{\sum_{i=1}^{n} x_i f_i}{\sum_{i=1}^{n} f_i}$.
This is the ratio of the sum of the products of the observations $x_i$ and their
respective frequencies $f_i$ to the sum of the frequencies.
• Writing $N = \sum_{i=1}^{n} f_i$ for the total frequency, this is
$\bar{x} = \frac{1}{N}\sum_{i=1}^{n} x_i f_i$.
• After which, find the deviations of the observations $x_i$ from the mean $\bar{x}$ and
take their absolute values, i.e. $|x_i - \bar{x}|$ for all i = 1, 2, 3, ..., n.
• Mean Deviation
= $\frac{\sum_{i=1}^{n} f_i |x_i - \bar{x}|}{\sum_{i=1}^{n} f_i} = \frac{1}{N}\sum_{i=1}^{n} f_i |x_i - \bar{x}|$
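The steps for a discrete frequency distribution can be sketched as follows, weighting each value and each absolute deviation by its frequency. The function names and the small table of values and frequencies are hypothetical.

```python
def grouped_mean(values, frequencies):
    """Sum of x_i * f_i divided by the total frequency N."""
    total_frequency = sum(frequencies)
    weighted_sum = sum(x * f for x, f in zip(values, frequencies))
    return weighted_sum / total_frequency


def grouped_mean_deviation(values, frequencies):
    """Sum of f_i * |x_i - mean| divided by the total frequency N."""
    center = grouped_mean(values, frequencies)
    total_frequency = sum(frequencies)
    weighted_abs = sum(f * abs(x - center) for x, f in zip(values, frequencies))
    return weighted_abs / total_frequency


values = [1, 2, 3, 4]        # distinct data points x_i (hypothetical)
frequencies = [2, 3, 4, 1]   # their frequencies f_i, so N = 10

x_bar = grouped_mean(values, frequencies)        # 24 / 10
md = grouped_mean_deviation(values, frequencies)  # 8 / 10
print(x_bar, md)
```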
In a continuous frequency distribution, the data points can take any value within a
range. They are measured rather than counted, so the individual values are not listed
one by one. Instead, they are grouped into class intervals on the real number line, and
a frequency is recorded for each class.
Other prominent measures of dispersion in statistics are the variance and the standard
deviation. The mean deviation about the mean or the median uses absolute values to
handle the signs of the deviations. Another way to overcome this difficulty is to take the
squares of all the deviations, which leads to the following observations.
• If $\sum_{i=1}^{n}(x_i - \bar{x})^2$ is zero, then every observation equals the mean
and there is no dispersion at all.
• If the sum is small, the observations are closer to the mean, indicating a lower
degree of dispersion.
• If the sum is large, there is a higher degree of dispersion of the observations
from the mean $\bar{x}$.
• Thus this sum is a reasonable indicator of the degree of dispersion. Dividing it by the
number of observations n gives the proper measure of dispersion, denoted as
$\sigma^2$ and termed the variance:
$\sigma^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n}$.
• The positive square root of the variance is called the standard deviation:
$\sigma = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n}}$.
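The variance and standard deviation formulas translate directly into code. This sketch divides by n, matching the formulas in the text (the population form); the function names and the data are hypothetical.

```python
import math


def variance(data):
    """Mean of the squared deviations from the mean (divides by n)."""
    n = len(data)
    x_bar = sum(data) / n
    return sum((x - x_bar) ** 2 for x in data) / n


def standard_deviation(data):
    """Positive square root of the variance."""
    return math.sqrt(variance(data))


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data with mean 5

print(variance(data), standard_deviation(data))
```

Here the squared deviations sum to 32, so the variance is 32 / 8 = 4 and the standard deviation is 2.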
Coefficient of Variation
The coefficient of variation is used to compare the variability of two or more frequency
distributions. It is the ratio of the standard deviation to the mean, expressed as a
percentage.
$CV = \frac{\sigma}{\bar{x}} \times 100$.
The distribution that has a greater coefficient of variation has more variability around the
central value than the distribution having a smaller value of the coefficient of variation.
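A comparison of two data sets by their coefficients of variation might look like the sketch below. It uses `statistics.pstdev` and `statistics.mean` from the Python standard library; the function name `coefficient_of_variation` and both series are hypothetical.

```python
import statistics


def coefficient_of_variation(data):
    """Standard deviation (dividing by n) over the mean, as a percentage."""
    return statistics.pstdev(data) / statistics.mean(data) * 100


series_a = [2, 4, 4, 4, 5, 5, 7, 9]  # mean 5, standard deviation 2
series_b = [48, 50, 52]              # larger values, small relative spread

cv_a = coefficient_of_variation(series_a)
cv_b = coefficient_of_variation(series_b)

more_variable = "A" if cv_a > cv_b else "B"
print(f"CV of A = {cv_a:.2f}%, CV of B = {cv_b:.2f}%, more variable: {more_variable}")
```

Series A has the greater coefficient of variation, so it is more variable relative to its mean, even though series B contains the larger values.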
Important Notes