
Chapter 10

Data Representation and Visualization


INTRODUCTION

Statistics is broadly categorised into the following two types:
• Descriptive statistics
• Inferential statistics

Descriptive and Inferential Statistics
• Descriptive statistics involves summarising and organising the data so that they can be easily understood and used, whereas inferential statistics attempts to draw inferences from a sample about the whole population. Descriptive statistics are ‘just descriptive’ and do not involve generalisation beyond the data under consideration.


• Major types of descriptive statistics are:
• Measures of central tendency, such as mean, median and mode.
• Measures of variability, such as range, variance and standard deviation.
• Measures of shape, such as skewness and kurtosis.
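As a minimal sketch, the measures listed above can be computed with Python's standard `statistics` module. The data set is hypothetical, chosen so that the results come out exactly. Skewness and kurtosis are computed here as the moment coefficients $m_3/\sigma^3$ and $m_4/\sigma^4$ (where $m_k$ is the $k$-th central moment); this is one common convention among several, assumed for illustration.

```python
import statistics

# Hypothetical data set, chosen so the results are exact.
data = [2, 4, 4, 4, 5, 5, 7, 9]

def central_moment(values, k):
    """k-th central moment: the mean of (x - mean)**k over all observations."""
    m = sum(values) / len(values)
    return sum((x - m) ** k for x in values) / len(values)

summary = {
    # Measures of central tendency
    "mean": statistics.mean(data),
    "median": statistics.median(data),
    "mode": statistics.mode(data),
    # Measures of variability (population versions, dividing by n)
    "range": max(data) - min(data),
    "variance": statistics.pvariance(data),
    "std_dev": statistics.pstdev(data),
}

# Measures of shape (moment coefficients, an assumed convention)
sigma = summary["std_dev"]
summary["skewness"] = central_moment(data, 3) / sigma ** 3
summary["kurtosis"] = central_moment(data, 4) / sigma ** 4

for name, value in summary.items():
    print(f"{name:>9}: {value}")
```

For this data the mean is 5, the median 4.5, the mode 4, the range 7, the variance 4 and the standard deviation 2.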
Types of Data: Primary Data

The information, facts or figures collected by investigators personally, with a definite purpose or objective, are called primary data. Thus, these data are original and reliable in the sense that they are first-hand information.

For example, if you collect information regarding the number of boys and girls in 20 families in your locality for a social science project, then it is primary data.
Types of Data: Secondary Data

The information, facts or figures which have been collected by some other agency or individual in some context and are used by the investigator in another context are called secondary data.

For example, if you refer to the population census or to data collected by the government to show the male-to-female ratio in India, then it is secondary data.
Types of Data: Raw Data

The numerical data recorded or obtained in their original form, as collected by investigators or received from some source, are called raw data.

Raw data are represented exactly as they are collected from the source, without aggregating, transforming or performing calculations on them.
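To illustrate the difference between raw data and aggregated data, the sketch below keeps a hypothetical list of raw observations exactly as recorded and then summarises it into a frequency table with `collections.Counter`. The marks are invented for illustration.

```python
from collections import Counter

# Hypothetical raw data: marks of 12 students, in the order they were recorded.
raw_marks = [7, 5, 8, 7, 6, 9, 7, 5, 8, 6, 7, 8]

# Aggregating the raw data into a frequency distribution (value -> frequency).
frequency = Counter(raw_marks)

print("Raw data:", raw_marks)
print("Mark  Frequency")
for mark in sorted(frequency):
    print(f"{mark:>4}  {frequency[mark]:>9}")
```

The list `raw_marks` is the raw data; the table printed from `frequency` is no longer raw, because the observations have been counted and grouped.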
Types of Data: Univariate Data
• Data which consist of only one variable are called univariate data. Univariate data consist of observations of only a single characteristic or attribute. The analysis of univariate data is the simplest form of analysis, as the information deals with change in only one quantity. It does not deal with causes or relationships. Various characteristics of populations, such as age, height and weight, are some examples of univariate data.
• Consider the weights of 10 students in a class:
• Weight (in kg): 52, 51.5, 49, 53, 55, 57.5, 47, 63, 59, 60
• Here there is only one variable, weight, and it does not deal with any cause or relationship. Thus, it is univariate data.
Types of Data: Bivariate Data
• Data which consist of two variables are called bivariate data. In bivariate data, each value of one variable is linked with a value of the other variable. The analysis of bivariate data deals with causes and relationships, and through this analysis the relationship between the two variables is determined.
• Examples include age and weight, income and expenditure, and price and demand.
• Consider the temperature and the units of ice cream sold, as given below.

[Table: temperature and units of ice cream sold]

We observe that the temperature and the units of ice cream sold are positively related: as the temperature increases, the number of units of ice cream sold also increases.
Here we have two variables, and the data involve comparison, relationship and cause. Thus, it is bivariate data.
Bivariate data can be represented through tables or by plotting the two variables on coordinate axes, where one variable is independent and the other is dependent.
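Because each value of one variable is linked with a value of the other, bivariate data are naturally stored as pairs. The readings below are hypothetical. The sketch sorts the pairs by the independent variable (temperature) and checks whether the dependent variable (units sold) rises with it.

```python
# Hypothetical paired observations: (temperature in °C, units of ice cream sold).
readings = [(25, 120), (30, 175), (20, 80), (35, 230), (28, 150)]

# Order the pairs by the independent variable (temperature).
readings.sort(key=lambda pair: pair[0])

temperatures = [t for t, _ in readings]
units_sold = [u for _, u in readings]

# Positive relationship check: sales strictly rise each time temperature rises.
is_increasing = all(a < b for a, b in zip(units_sold, units_sold[1:]))

for t, u in readings:
    print(f"{t:>3} °C -> {u:>4} units")
print("Sales increase with temperature:", is_increasing)
```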
Types of Data: Multivariate Data
• Data which consist of three or more variables are called multivariate data. Such data may contain, for example, one independent variable and several dependent variables. In the analysis of multivariate data, the variables may be correlated with each other, and their statistical dependence is taken into account.
• Suppose a doctor collects data on the blood pressure, cholesterol, blood sugar level and eating habits of patients, and analyses these data to establish the relationship between these measures of health and eating habits. Then these data are multivariate data involving three dependent variables (measures of health) and one independent variable (eating habits).
Data on Various Scales
• A scale is a device or an object used to measure or quantify an object or event. In statistics, variables are categorised using different scales of measurement. Each scale of measurement has a specific purpose and properties that determine its use in statistical analysis. There are four scales of measurement, as follows:
• 1. Nominal Scale
• 2. Ordinal Scale
• 3. Interval Scale
• 4. Ratio Scale
• The first two scales, nominal and ordinal, are termed categorical scales, and the other two, interval and ratio, are termed numerical scales.
Dispersion
• Measures of dispersion (variation) in a distribution
• The dispersion of data is measured on the basis of the observations and the type of measure of central tendency used. Several measures of dispersion are available; the most common are:
• (i) Range (R)
• (ii) Quartile Deviation (QD)
• (iii) Mean Deviation (MD)
• (iv) Standard Deviation (SD)
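RANGE
• The range is the difference between the largest value (L) and the smallest value (S) of the observations: $R = L - S$.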
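Applied to the weights of the 10 students from the univariate-data example, the range is a one-line computation. A minimal sketch:

```python
# Weights (in kg) of the 10 students from the univariate-data example.
weights = [52, 51.5, 49, 53, 55, 57.5, 47, 63, 59, 60]

largest, smallest = max(weights), min(weights)
data_range = largest - smallest  # R = L - S

print(f"L = {largest}, S = {smallest}, R = {data_range}")
```

Here $L = 63$ and $S = 47$, so $R = 16$ kg.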
 
Quartile Deviation (QD)

Quartile deviation is half of the difference between the upper quartile ($Q_3$) and the lower quartile ($Q_1$):

$$\text{Q.D.} = \frac{Q_3 - Q_1}{2}$$

It is also called the semi-interquartile range.

For comparing the variability of two or more series, the quartile deviation by itself is not useful. For this we calculate the coefficient of Q.D.:

$$\text{Coefficient of Q.D.} = \frac{Q_3 - Q_1}{Q_3 + Q_1}$$
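A sketch for the same weights follows. Quartiles can be located in several ways; this one assumes the common textbook convention that $Q_1$ and $Q_3$ are the $\frac{n+1}{4}$th and $\frac{3(n+1)}{4}$th observations of the sorted data (interpolating between neighbours), which is what `statistics.quantiles(..., method="exclusive")` computes (Python 3.8 or later). Other conventions give slightly different quartiles.

```python
import statistics

weights = [52, 51.5, 49, 53, 55, 57.5, 47, 63, 59, 60]

# Quartiles at positions (n+1)/4, 2(n+1)/4 and 3(n+1)/4 of the sorted data.
q1, q2, q3 = statistics.quantiles(weights, n=4, method="exclusive")

quartile_deviation = (q3 - q1) / 2        # semi-interquartile range
coefficient_qd = (q3 - q1) / (q3 + q1)    # unit-free, for comparing series

print(f"Q1 = {q1}, Q3 = {q3}")
print(f"Q.D. = {quartile_deviation}")
print(f"Coefficient of Q.D. = {coefficient_qd:.4f}")
```

With $Q_1 = 50.875$ and $Q_3 = 59.25$, the quartile deviation is $(59.25 - 50.875)/2 = 4.1875$ kg.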
MEAN DEVIATION
• The mean deviation of a statistical data is the arithmetic mean of the numerical values of the deviations of the items from some average (mean or median).
• By numerical values of the deviations, we mean absolute deviations.
• For ungrouped data: $\text{M.D.}(a) = \frac{1}{n}\sum_{i=1}^{n} |x_i - a|$ (here $a$ is the average)
• For grouped data: $\text{M.D.}(a) = \frac{1}{N}\sum_{i=1}^{n} f_i |x_i - a|$, where $N = \sum_{i=1}^{n} f_i$
• If the mean deviation is calculated about the mean, it is written as M.D.($\bar{x}$):
• For an individual series: $\text{M.D.}(\bar{x}) = \frac{1}{n}\sum_{i=1}^{n} |x_i - \bar{x}|$
• For a frequency distribution: $\text{M.D.}(\bar{x}) = \frac{1}{N}\sum_{i=1}^{n} f_i |x_i - \bar{x}|$
• If the mean deviation is calculated about the median $M$, it is written as M.D.(Median):
• For an individual series: $\text{M.D.(Median)} = \frac{1}{n}\sum_{i=1}^{n} |x_i - M|$
• For a frequency distribution: $\text{M.D.(Median)} = \frac{1}{N}\sum_{i=1}^{n} f_i |x_i - M|$
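The formulas above differ only in the average used and in whether each deviation is weighted by a frequency, so one helper covers all four cases. In this sketch the function name `mean_deviation` and the small frequency distribution are hypothetical; the weights are the 10 students' weights from the univariate-data example.

```python
import statistics

def mean_deviation(values, about, freqs=None):
    """Mean of the absolute deviations |x - about|, optionally weighted by frequencies."""
    if freqs is None:
        freqs = [1] * len(values)   # individual series: every frequency is 1
    total = sum(freqs)              # N = sum of f_i
    return sum(f * abs(x - about) for x, f in zip(values, freqs)) / total

weights = [52, 51.5, 49, 53, 55, 57.5, 47, 63, 59, 60]
mean_w = statistics.mean(weights)
median_w = statistics.median(weights)

print("M.D.(mean)   =", round(mean_deviation(weights, mean_w), 4))
print("M.D.(median) =", round(mean_deviation(weights, median_w), 4))

# Hypothetical frequency distribution: values x_i with frequencies f_i.
x = [2, 4, 6]
f = [1, 2, 1]
mean_x = sum(xi * fi for xi, fi in zip(x, f)) / sum(f)
print("M.D.(mean), grouped =", mean_deviation(x, mean_x, f))
```

For the weights, the mean is 54.7 kg and the median 54 kg; the absolute deviations sum to 42 about either average, so both mean deviations equal 4.2 kg.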
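STANDARD DEVIATION (S.D.)
• The standard deviation of a statistical data is defined as the positive square root of the arithmetic mean of the squares of the deviations of the items from the arithmetic mean of the series under consideration. It is generally denoted by $\sigma$ (sigma) and is also known as the root mean square deviation.
• For an individual series: $\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2}$
• For a grouped frequency distribution: $\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{n} f_i (x_i - \bar{x})^2}$, where $N = \sum_{i=1}^{n} f_i$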
• VARIANCE: The square of the standard deviation is called the variance and is denoted by $\sigma^2$.
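The definition translates directly into code. This sketch divides by $N$ (the population form used in the definition above), not by $N - 1$; the helper name `std_dev` is hypothetical, and the result for the weights is cross-checked against `statistics.pstdev`.

```python
import math
import statistics

def std_dev(values, freqs=None):
    """Root mean square deviation from the arithmetic mean (divides by N, not N - 1)."""
    if freqs is None:
        freqs = [1] * len(values)   # individual series: every frequency is 1
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    variance = sum(f * (x - mean) ** 2 for x, f in zip(values, freqs)) / n
    return math.sqrt(variance)

weights = [52, 51.5, 49, 53, 55, 57.5, 47, 63, 59, 60]
sigma = std_dev(weights)
print(f"S.D. = {sigma:.4f} kg, variance = {sigma ** 2:.4f} kg^2")

# Cross-check against the standard library's population standard deviation.
assert math.isclose(sigma, statistics.pstdev(weights))
```

For the weights, the squared deviations from the mean 54.7 sum to 235.6, giving a variance of 23.56 kg² and an S.D. of about 4.85 kg.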
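Methods to Calculate the S.D.
• Direct method: $\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{n} f_i x_i^2 - \left(\frac{1}{N}\sum_{i=1}^{n} f_i x_i\right)^2}$
• For an individual series, the above formula takes the form $\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n} x_i^2 - \left(\frac{1}{n}\sum_{i=1}^{n} x_i\right)^2}$
• Change of origin (short-cut method): taking $d_i = x_i - A$, where $A$ is an assumed mean, $\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{n} f_i d_i^2 - \left(\frac{1}{N}\sum_{i=1}^{n} f_i d_i\right)^2}$
• Change of origin and change of scale (step-deviation method): taking $u_i = \frac{x_i - A}{h}$, where $h$ is the class width, $\sigma = h\sqrt{\frac{1}{N}\sum_{i=1}^{n} f_i u_i^2 - \left(\frac{1}{N}\sum_{i=1}^{n} f_i u_i\right)^2}$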
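The three methods are algebraically equivalent. The sketch below checks this numerically for a hypothetical grouped frequency distribution (class marks $x_i$, frequencies $f_i$). The assumed mean $A = 30$, the class width $h = 10$ and the helper name `sd_from_moments` are illustrative choices, not values from the text.

```python
import math

def sd_from_moments(values, freqs):
    """Return sqrt(sum(f*v^2)/N - (sum(f*v)/N)^2) for values v with frequencies f."""
    n = sum(freqs)
    mean_v = sum(f * v for v, f in zip(values, freqs)) / n
    mean_v2 = sum(f * v * v for v, f in zip(values, freqs)) / n
    return math.sqrt(mean_v2 - mean_v ** 2)

# Hypothetical grouped data: class marks x_i and frequencies f_i.
x = [10, 20, 30, 40, 50]
freq = [2, 3, 5, 3, 2]

A, h = 30, 10  # assumed mean and class width (illustrative choices)

direct = sd_from_moments(x, freq)          # direct method on x_i
d = [xi - A for xi in x]                   # change of origin: d_i = x_i - A
shortcut = sd_from_moments(d, freq)
u = [(xi - A) / h for xi in x]             # change of origin and scale: u_i = (x_i - A)/h
step = h * sd_from_moments(u, freq)        # multiply back by h

print(f"direct   = {direct:.4f}")
print(f"shortcut = {shortcut:.4f}")
print(f"step     = {step:.4f}")
```

Shifting the origin leaves $\sigma$ unchanged, and dividing by $h$ scales it by $1/h$, which is why the step-deviation result is multiplied by $h$. All three give $\sigma = \sqrt{2200/15} \approx 12.11$ here.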
Analysis of Frequency Distribution
• Whenever the variability of two series is to be compared, we do not merely calculate the measures of dispersion; we require measures which are independent of the units.
• This measure of variability, which is independent of units, is called the coefficient of variation (denoted by C.V.). The coefficient of variation is defined as
$$\text{C.V.} = \frac{\sigma}{\bar{x}} \times 100$$
where $\sigma$ and $\bar{x}$ are the standard deviation and the mean of the data.
• On comparing two series, the series having the greater C.V. is said to be more variable than the other, and the series having the lesser C.V. is said to be more consistent than the other.
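A minimal sketch of the C.V. comparison, assuming positive means (the C.V. is not meaningful when the mean is zero or negative). The two series are hypothetical and deliberately have the same S.D. but different means, to show why the unit-free C.V. and not the S.D. decides which series is more variable.

```python
import statistics

def coefficient_of_variation(values):
    """C.V. = (S.D. / mean) * 100, using the population S.D."""
    return statistics.pstdev(values) / statistics.mean(values) * 100

# Hypothetical series measured in different units.
heights_cm = [150, 155, 160, 165, 170]
weights_kg = [45, 50, 55, 60, 65]

cv_h = coefficient_of_variation(heights_cm)
cv_w = coefficient_of_variation(weights_kg)
print(f"C.V. heights = {cv_h:.2f}%, C.V. weights = {cv_w:.2f}%")
print("More variable:", "heights" if cv_h > cv_w else "weights")
```

Both series have S.D. $\sqrt{50} \approx 7.07$, but the C.V. of the weights (about 12.86%) exceeds that of the heights (about 4.42%), so the weights are the more variable series.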
Comparison of two frequency distributions with the same mean
• Let $\bar{x}_1$ and $\sigma_1$ be the mean and S.D. of the first distribution, and $\bar{x}_2$ and $\sigma_2$ be the mean and S.D. of the second distribution. Let $\bar{x}_1 = \bar{x}_2 = \bar{x}$. We have
• C.V. (1st distribution) $= \frac{\sigma_1}{\bar{x}} \times 100$ ... (1)
• C.V. (2nd distribution) $= \frac{\sigma_2}{\bar{x}} \times 100$ ... (2)
• It is obvious from (1) and (2) that in this case the coefficients of variation can be compared on the basis of their respective standard deviations only. Thus, for two series with equal means, the series with the greater S.D. (or variance) is more variable than the other, and the series with the lesser S.D. (or variance) is more consistent than the other.
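As a numerical check of this result, the sketch below takes two hypothetical series that both have mean 50 and confirms that ranking them by S.D. gives the same verdict as ranking them by C.V.

```python
import statistics

# Two hypothetical series with the same mean (50).
series_1 = [40, 45, 50, 55, 60]
series_2 = [30, 40, 50, 60, 70]

assert statistics.mean(series_1) == statistics.mean(series_2)

for name, s in (("Series 1", series_1), ("Series 2", series_2)):
    sd = statistics.pstdev(s)
    cv = sd / statistics.mean(s) * 100
    print(f"{name}: S.D. = {sd:.3f}, C.V. = {cv:.2f}%")

sd_1, sd_2 = statistics.pstdev(series_1), statistics.pstdev(series_2)
more_consistent = "Series 1" if sd_1 < sd_2 else "Series 2"
print("More consistent (smaller S.D., hence smaller C.V.):", more_consistent)
```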
