
CATEGORICAL DATA

Useful pictures for summarising data are pie charts, bar graphs, histograms and scatter plots. Pie charts and bar graphs suit categorical data, while histograms and scatter plots are used for quantitative data.
Let us illustrate pie charts and bar graphs using an example.
Degree     Male   Female   Total
Diploma       5       10      15
BA           10       17      27
BSc          12       13      25
BE            8       13      21
Total        35       53      88

Pie Chart
A pie chart is a type of graph in which a circle is divided into sectors that each
represent a proportion of the whole. The size of each sector shows the relative
size of the value it represents.
Country      Production of sugar (in quintals)
UK           6200
Australia    4700
India        3500
Japan        1600

[Figure: pie chart of sugar production by country]
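As a rough illustration of how the sectors of such a pie chart could be sized, the sketch below converts each country's production into a share of the total and then into an angle out of 360 degrees. The function name sector_angles is made up for this example; only the production figures come from the table.

```python
# Sugar production figures from the table above (in quintals).
production = {"UK": 6200, "Australia": 4700, "India": 3500, "Japan": 1600}


def sector_angles(values):
    """Return each category's share of the whole as a sector angle in degrees."""
    total = sum(values.values())
    return {name: value / total * 360 for name, value in values.items()}


angles = sector_angles(production)
for country, angle in angles.items():
    share = production[country] / sum(production.values())
    print(f"{country:<10} {share:6.1%}  {angle:6.2f} degrees")
```

With a total of 16000 quintals, the UK sector works out to 139.5 degrees and the Japan sector to 36 degrees, and the four angles add up to the full circle.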

Bar Diagram
A bar diagram is a graph drawn using rectangular bars to show how large each
value is. It uses the bars to show comparisons between categories of data.
The bars can be either horizontal or vertical.
A diagram is a visual form for the presentation of statistical data that highlights their basic facts and
relationships.
Represent the following data by a bar diagram.
Year    Marks/students/weight/height
1       45
2       40
3       42
4       55
5       50

[Figure: bar diagram of the data above]
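One possible way to sketch this bar diagram without any plotting library is to print a horizontal bar for each year. The helper text_bar_chart and the scale of one '#' per 5 units are assumptions chosen for this illustration, not part of the original exercise.

```python
# Values from the table above, keyed by year.
marks = {1: 45, 2: 40, 3: 42, 4: 55, 5: 50}


def text_bar_chart(data, unit=5):
    """Draw one horizontal bar per category, one '#' per `unit` of value (rounded down)."""
    lines = []
    for category, value in data.items():
        bar = "#" * (value // unit)
        lines.append(f"{category} | {bar} {value}")
    return lines


print("\n".join(text_bar_chart(marks)))
```

Because the bar lengths are rounded down to whole symbols, years 2 and 3 (40 and 42) get bars of the same length; a finer scale such as unit=1 would separate them.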

Histogram:
A histogram is a graphical display in which the data are grouped into ranges (such as
"40 to 49", "50 to 59", etc.) and then plotted as bars. It uses
rectangles to show the frequency of data items in successive
numerical intervals of equal size.
In a histogram, data are plotted as a series of rectangles. Class intervals
are shown on the X-axis and the frequencies on the Y-axis.
The height of each rectangle represents the frequency of the class interval. Each rectangle
adjoins the next so as to give a continuous picture. Such a graph is also called a staircase or
block diagram.
However, we cannot construct a histogram for a distribution with open-end classes. A histogram is also quite
misleading if the distribution has unequal intervals and suitable adjustments in frequencies are
not made.
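The grouping step, and one common adjustment for unequal intervals, can be sketched as follows. The marks and the class boundaries here are hypothetical, invented purely for illustration. The adjustment shown divides each frequency by its class width (often called frequency density), and the width formula assumes whole-number marks with inclusive class limits.

```python
# Hypothetical marks, invented for illustration (not taken from the text).
marks = [42, 47, 51, 53, 55, 58, 61, 64, 66, 72, 75, 88]

# Class intervals as (lower, upper) limits, both inclusive. The last class is
# deliberately wider than the others to show the adjustment.
classes = [(40, 49), (50, 59), (60, 69), (70, 89)]


def frequency_table(data, intervals):
    """Count observations per class and divide each count by the class width."""
    table = []
    for lower, upper in intervals:
        frequency = sum(lower <= x <= upper for x in data)
        width = upper - lower + 1  # number of whole marks covered by the class
        table.append((lower, upper, frequency, frequency / width))
    return table


for lower, upper, frequency, density in frequency_table(marks, classes):
    print(f"{lower}-{upper}: frequency={frequency}, density={density:.2f}")
```

In this made-up data the classes 60-69 and 70-89 both contain three marks, but the wider class has half the density, so its rectangle would be drawn half as tall.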
Example
Draw a histogram for the following data.
Marks      Number of students
0-20
21-40
41-60
61-80
81-100
Scatter plot
A scatter plot is a graph in which the values of two variables are plotted along two axes, the
pattern of the resulting points revealing any correlation present.
A scatter plot is a graphic tool used to display the relationship between
two quantitative variables. It consists of an X axis (the horizontal
axis), a Y axis (the vertical axis), and a series of dots. Each dot on the scatter plot
represents one observation from a data set. The position of the dot on the
scatter plot represents its X and Y values.

The most effective way to display the relation between two quantitative variables is a scatter
plot. A scatter plot shows the relationship between two quantitative variables measured on the
same individuals. The values of one variable appear on the horizontal axis, and the values of the
other variable appear on the vertical axis. Each individual in the data appears as the point in the
plot fixed by the values of both variables for that individual.
Always plot the explanatory variable, if there is one, on the horizontal axis (the x axis) of a
scatter plot. As a reminder, we usually call the explanatory variable x and the response variable y.
If there is no explanatory-response distinction, either variable can go on the horizontal axis.
Example
Time (Y) vs Experience (X)
Age (Y) vs Experience (X)
Height (Y) vs Weight (X)

Height (Y)   Weight (X)
158          48
162          57
163          57
170          60
154          45

[Figure: scatter plot of the height and weight data]
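To give a feel for how each observation becomes one point, the sketch below places the five height and weight pairs from the example on a small character grid, with weight on the horizontal axis and height on the vertical axis as labelled in the example. The function text_scatter and the grid size are assumptions made for this illustration; a real plot would normally be drawn with a plotting library.

```python
# Data from the example above: weight is X, height is Y.
weights = [48, 57, 57, 60, 45]
heights = [158, 162, 163, 170, 154]


def text_scatter(xs, ys, width=16, height=17):
    """Place one '*' per observation on a character grid.

    Assumes xs and ys each contain at least two distinct values.
    """
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        col = round((x - x_min) * (width - 1) / (x_max - x_min))
        row = round((y - y_min) * (height - 1) / (y_max - y_min))
        grid[height - 1 - row][col] = "*"  # the top line is printed first, so flip Y
    return ["".join(line) for line in grid]


for line in text_scatter(weights, heights):
    print("|" + line)
print("+" + "-" * 16)
```

The points run from the bottom left (45, 154) to the top right (60, 170), which is the pattern one would expect when taller individuals also tend to be heavier.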

a) Symmetrical distribution:

Mean = Median = Mode


It is clear from the above diagram that in a symmetrical distribution the values of mean, median
and mode coincide. The spread of the frequencies is the same on both sides of the center point of
the curve.
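A quick numerical check of this property, using a small hypothetical data set whose frequencies are the same on both sides of the centre, might look like this. The data are invented for illustration; the functions come from Python's standard statistics module.

```python
from statistics import mean, median, mode

# Hypothetical symmetric data set, invented for illustration:
# the frequencies 1, 2, 3, 2, 1 mirror each other around the value 3.
symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]

print(mean(symmetric), median(symmetric), mode(symmetric))
```

For this data set all three measures equal 3.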
Measures of location
Sample median
The sample median is the middle observation when the data are ranked in increasing order. We
will denote the ranked observations x(1), x(2), ..., x(n). If there is an even number of
observations, there is no middle number, and so the median is defined to be the sample mean of
the middle two observations.
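The definition can be written out directly as a short function. This is a minimal sketch: the name sample_median is chosen for this example, and the two data sets are reused from the scatter plot and correlation examples elsewhere in these notes.

```python
def sample_median(data):
    """Middle ranked observation, or the mean of the middle two when n is even."""
    ranked = sorted(data)
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:
        return ranked[mid]
    return (ranked[mid - 1] + ranked[mid]) / 2


heights = [158, 162, 163, 170, 154]  # n = 5, from the scatter plot example
accidents = [1, 2, 4, 6, 9, 14]      # n = 6, from the correlation example

print(sample_median(heights))
print(sample_median(accidents))
```

The ranked heights are 154, 158, 162, 163, 170, so the median is the third value, 162. The accident counts have an even number of observations, so the median is the mean of 4 and 6, which is 5.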
Summary of location measures
When the distribution of the data is roughly symmetric, the three measures (mean, median and mode) will be very close to
each other anyway. However, if the distribution is very skewed, there may be a considerable
difference between them. The sample median is a much more robust location estimator: it is much less sensitive than
the sample mean to asymmetries and unusual values in the data.
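The robustness of the median can be seen with a small experiment. The sketch below takes the five heights from the scatter plot example and then introduces a hypothetical recording error (170 entered as 1700); the erroneous value is invented for illustration.

```python
from statistics import mean, median

heights = [154, 158, 162, 163, 170]      # heights from the scatter plot example
with_error = [154, 158, 162, 163, 1700]  # hypothetical typo: 170 entered as 1700

print(mean(heights), median(heights))
print(mean(with_error), median(with_error))
```

The single unusual value moves the mean from 161.4 to 467.4, while the median stays at 162 in both cases.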
Measures of spread

Knowing the typical value of the data alone is not enough. We also need to know how
concentrated or spread out it is. That is, we need to know something about the variability
of the data. Measures of spread are a way of quantifying this idea numerically.
Quartiles and the interquartile range
Whereas the median has half of the data less than it, the lower quartile has a quarter of the data
less than it, and the upper quartile has a quarter of the data above it. So the lower quartile is
calculated as the (n+1)/4th smallest observation, and the upper quartile is calculated as the
3(n+1)/4th smallest observation.
The interquartile range is the difference between the upper and lower quartiles, that is, IQR =
UQ - LQ.
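The positions (n+1)/4 and 3(n+1)/4 are not always whole numbers. The sketch below assumes that, in such cases, the quartile is found by linear interpolation between the two neighbouring ranked observations; that rule is an assumption of this example, since the text does not say how fractional positions are handled. The function name quartile is also made up here, and the data are the heights from the scatter plot example.

```python
def quartile(data, k):
    """Return the k(n+1)/4th smallest observation (k = 1 lower, k = 3 upper).

    Assumption: fractional positions are resolved by linear interpolation
    between the neighbouring ranked observations.
    """
    ranked = sorted(data)
    n = len(ranked)
    position = k * (n + 1) / 4  # 1-based position in the ranked data
    lower = int(position)       # rank of the observation at or below the position
    fraction = position - lower
    if lower < 1:
        return ranked[0]
    if lower >= n:
        return ranked[-1]
    return ranked[lower - 1] + fraction * (ranked[lower] - ranked[lower - 1])


heights = [158, 162, 163, 170, 154]
lq = quartile(heights, 1)
uq = quartile(heights, 3)
iqr = uq - lq
print(lq, uq, iqr)
```

With n = 5 the positions are 1.5 and 4.5, so the lower quartile lies halfway between 154 and 158 (156) and the upper quartile halfway between 163 and 170 (166.5), giving IQR = 10.5.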

Correlation
i.

For the data given, correlation is the ideal tool to determine the relationship
between these variables.
Correlation refers to the relationship between two or more variables, e.g. the relation between
speed of driving and number of road accidents.
The word relationship is important. It indicates that there is some connection between
the variables. Correlation measures the closeness of the relationship. It does not
indicate a cause and effect relationship. Price and supply, and income and expenditure, are
examples of correlated variables.

ii.

Quantitative data are used to measure the relationship between two variables.

Speed of driving (in km/h)   No. of accidents
Up to 40                     1
41 - 60                      2
61 - 80                      4
81 - 100                     6
101 - 120                    9
Above 121                    14

iii.
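There is a strong positive correlation between these two variables: the number of road accidents rises steadily
as speed increases. A perfect positive correlation is denoted r = +1. Here r is close to +1 but not exactly +1,
because the points do not lie exactly on a straight line.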
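A rough numerical check is possible if we assume that r denotes Pearson's correlation coefficient. Because the first and last speed classes are open-ended and have no midpoint, the sketch below replaces each class by its rank (1 for the slowest class, 6 for the fastest); that substitution is an assumption made for this illustration, and other choices of representative speeds would give slightly different values.

```python
from math import sqrt

# Assumption: the speed classes are replaced by their rank (1 = slowest class,
# 6 = fastest), because the open-ended classes have no midpoint.
speed_rank = [1, 2, 3, 4, 5, 6]
accidents = [1, 2, 4, 6, 9, 14]


def pearson_r(xs, ys):
    """Pearson's correlation coefficient for paired observations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


r = pearson_r(speed_rank, accidents)
print(f"r = {r:.3f}")
```

Under this assumption r comes out at about 0.97, which indicates a strong positive relationship. A value of exactly +1 would require the accident counts to increase by the same amount from each class to the next.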
