
Variable : http://www.emathzone.com/tutorials/basic-statistics/some-basic-definitions-in-statistics.html

Classification of data : http://www.emathzone.com/tutorials/basic-statistics/classification-of-data.html

Topics of Statistics : http://www.statisticslectures.com/topics/statistics/


Data Condensation and Graphical Methods

Variables
There are different ways variables can be described according to the ways they can be
studied, measured, and presented.
Numeric variables have values that describe a measurable quantity as a number, like 'how
many' or 'how much'. Therefore numeric variables are quantitative variables.
Numeric variables may be further described as either continuous or discrete:
A continuous variable is a numeric variable whose observations can take any value
within a given range of real numbers. The value recorded for an observation of a continuous
variable can be as fine as the instrument of measurement allows. Examples of
continuous variables include height, time, age, and temperature.
A discrete variable is a numeric variable whose observations take values based on a
count from a set of distinct whole values. A discrete variable cannot take a fractional
value between one value and the next closest value. Examples of discrete variables
include the number of registered cars, the number of business locations, and the number of
children in a family, all of which are measured in whole units (e.g. 1, 2, 3 cars).
The data collected for a numeric variable are quantitative data.
Categorical variables have values that describe a 'quality' or 'characteristic' of a data unit,
like 'what type' or 'which category'. Categorical variables fall into mutually exclusive (in one
category or in another) and exhaustive (include all possible options) categories. Therefore,
categorical variables are qualitative variables and tend to be represented by a non-numeric
value.
Categorical variables may be further described as ordinal or nominal:
An ordinal variable is a categorical variable whose observations take values that can
be logically ordered or ranked. The categories of an ordinal variable can be
ranked higher or lower than one another, but the ranking does not necessarily establish a
numeric difference between categories. Examples of ordinal categorical variables include
academic grades (e.g. A, B, C), clothing size (e.g. small, medium, large, extra large) and
attitudes (e.g. strongly agree, agree, disagree, strongly disagree).
A nominal variable is a categorical variable whose observations take values that
cannot be organised in a logical sequence. Examples of nominal categorical variables
include sex, business type, eye colour, religion and brand.
The data collected for a categorical variable are qualitative data.

Frequency Distribution

Statistical data may consist of a list of numbers collected in a study. Some of those
numbers may occur more than once. The number of times a value occurs in a data set is
called the frequency of that value. The frequencies of the values in a data set are listed
in a table. This table is known as a frequency distribution table, and the list itself is
referred to as a frequency distribution.
There are many types of frequency distribution:
• Grouped frequency distribution
• Ungrouped frequency distribution
• Cumulative frequency distribution
• Relative frequency distribution
• Relative cumulative frequency distribution
Uses of Frequency Distribution
• Frequency distribution helps us to analyze the data.
• Frequency distribution helps us to estimate the frequencies of the population on the
basis of the sample.
• Frequency distribution helps us to facilitate the computation of various statistical
measures.
The frequency distribution table lists all the marks and also shows how many times
(frequency) each of them occurred.

The number that tells us how many times a particular value appears is called its
frequency. For example, if a mark of 2 has been scored by five students, the mark 2
occurs five times, so the frequency of the score 2 is five. Similarly, the frequency of
the mark 5 is three because three students scored five marks.
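
A frequency table of this kind can be built by counting how often each mark occurs. The sketch below is a minimal Python illustration. The list `marks` is a hypothetical data set, chosen only so that the mark 2 occurs five times and the mark 5 occurs three times as in the example above.

```python
from collections import Counter

# Hypothetical marks of 12 students (an assumption, not the original data).
marks = [2, 5, 2, 3, 2, 4, 5, 2, 3, 2, 5, 4]

# Counter maps each distinct mark to the number of times it occurs.
frequency = Counter(marks)

# Print the frequency distribution table in increasing order of marks.
print("Marks  Frequency")
for mark in sorted(frequency):
    print(f"{mark:>5}  {frequency[mark]:>9}")
```

The frequencies always add up to the number of observations, which is a quick check that no value was missed.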


Cumulative Frequency Distribution

Generally, the word Cumulative means "how much so far". In statistics, it is the running total
of all frequencies. Cumulative frequency corresponding to a particular value is the sum of
all the frequencies up to and including that value.
For example, the cumulative frequency table below shows the number of volcanic eruptions
from 1991 to 2000.
Years          Frequency   Cumulative Frequency
1991 - 92      10          10
1992 - 94      15          10 + 15 = 25
1994 - 96      9           25 + 9 = 34
1996 - 98      13          34 + 13 = 47
1998 - 2000    7           47 + 7 = 54
From the table, the cumulative frequency up to the 1996 - 98 class is 34 + 13 = 47, which
means that 47 volcanic eruptions took place from 1991 to 1998. The cumulative frequency is
mostly used when analyzing data, where the cumulative frequency of a value gives the
number of observations that lie at or below that value. It is also useful when
displaying the data using histograms.
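
The running total in the table above can be reproduced with a few lines of Python. This is only a sketch: the variable names are illustrative, and the class labels and frequencies are copied from the volcanic eruption table.

```python
from itertools import accumulate

# Class labels and frequencies copied from the table above.
years = ["1991 - 92", "1992 - 94", "1994 - 96", "1996 - 98", "1998 - 2000"]
frequencies = [10, 15, 9, 13, 7]

# accumulate() yields the running total of the frequencies.
cumulative = list(accumulate(frequencies))

for label, f, cf in zip(years, frequencies, cumulative):
    print(f"{label:<12} {f:>3} {cf:>4}")
```

The last cumulative frequency equals the total number of observations, here 54.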
The cumulative frequency is most easily understood when displayed in a table. A table
displaying the cumulative frequencies is called a cumulative frequency distribution, and
it is one of the important types of frequency distribution.
There are two main types of cumulative frequency distribution:
Less than cumulative frequency distribution:
Here, the cumulative totals are obtained by adding the frequencies from the lowest class
up to the highest class.
For example:
Marks of students    Less than cumulative frequency
Less than 20         7
Less than 30         8
Less than 40         12
Less than 50         16
Less than 60         24
Less than 70         37
From the table, we can see that 16 students scored less than 50.
More than cumulative frequency distribution:
Here, the cumulative totals are obtained by adding the frequencies from the highest class
down to the lowest class.

Marks of students    More than cumulative frequency
More than 10         88
More than 20         74
More than 30         65
More than 40         60
More than 50         58
More than 60         50
From the table, we can say that the number of students scoring more than 40 but not more
than 50 is 60 - 58 = 2.
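
Both kinds of cumulative total can be computed from the same list of class frequencies. In the sketch below the function names are illustrative, and `class_freqs` is not given in the original: it is recovered by differencing the "more than" table (88 - 74 = 14, 74 - 65 = 9, and so on), with the last entry, 50, being the number of students scoring more than 60.

```python
from itertools import accumulate

def less_than_cumulative(freqs):
    """Running total from the lowest class to the highest."""
    return list(accumulate(freqs))

def more_than_cumulative(freqs):
    """Running total from the highest class down to the lowest."""
    return list(accumulate(reversed(freqs)))[::-1]

# Frequencies for 10-20, 20-30, 30-40, 40-50, 50-60 and above 60,
# recovered from the "more than" table (an inferred, not quoted, list).
class_freqs = [14, 9, 5, 2, 8, 50]

less_than = less_than_cumulative(class_freqs)
more_than = more_than_cumulative(class_freqs)

# Students scoring more than 40 but not more than 50.
between_40_and_50 = more_than[3] - more_than[4]
print(more_than, between_40_and_50)
```

Subtracting two neighbouring "more than" totals always gives back the frequency of the class between them, which is the calculation used in the text.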

Relative Cumulative Frequency


The quotient of the cumulative frequency of a particular value and the total number of
data points is called the relative cumulative frequency. It can be expressed as a
percentage.

Relative cumulative frequency = f / n

Here, f = cumulative frequency
n = total frequency (the total number of data points).
Example:
Upper Limit   Frequency   Less than cumulative frequency   Relative cumulative frequency
20            11          11                               11/56 = 0.196 = 19.6 %
30            8           11 + 8 = 19                      19/56 = 0.339 = 33.9 %
40            10          19 + 10 = 29                     29/56 = 0.518 = 51.8 %
50            3           29 + 3 = 32                      32/56 = 0.571 = 57.1 %
60            7           32 + 7 = 39                      39/56 = 0.696 = 69.6 %
70            12          39 + 12 = 51                     51/56 = 0.911 = 91.1 %
80            5           51 + 5 = 56                      56/56 = 1 = 100 %
Total:        56
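
The relative cumulative frequencies in the example can be checked with a short script. This is a sketch using the upper limits and frequencies from the table; the variable names are illustrative. Values are rounded to three decimals when printed, so 29/56 prints as 0.518 and 51/56 as 0.911.

```python
from itertools import accumulate

# Upper class limits and frequencies copied from the example table.
upper_limits = [20, 30, 40, 50, 60, 70, 80]
frequencies = [11, 8, 10, 3, 7, 12, 5]

n = sum(frequencies)                        # total number of data points
cumulative = list(accumulate(frequencies))  # less than cumulative frequency
relative = [cf / n for cf in cumulative]    # relative cumulative frequency

for limit, cf, r in zip(upper_limits, cumulative, relative):
    print(f"{limit:>3} {cf:>3} {r:.3f} {r:.1%}")
```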

Graphical Representation of Frequency Distribution :


Histogram

A histogram is a plot that lets you discover, and show, the underlying frequency distribution
(shape) of a set of continuous data. This allows the inspection of the data for its underlying
distribution (e.g., normal distribution), outliers, skewness, etc. An example of a histogram,
and the raw data it was constructed from, is shown below:
36 25 38 46 55 68 72 55 36 38
67 45 22 48 91 46 52 61 58 55

To construct a histogram from a continuous variable you first need to split the data into
intervals, called bins or classes. In the example above, age has been split into classes,
with each class representing a 10-year period starting at 20 years. Each class contains the
number of scores in the data set that fall within that class. For
the above data set, the frequencies in each class have been tabulated along with the
scores that contributed to the frequency in each class (see below):

Bin       Frequency   Scores Included in Bin
20-30     2           25, 22
30-40     4           36, 38, 36, 38
40-50     4           46, 45, 48, 46
50-60     5           55, 55, 52, 58, 55
60-70     3           68, 67, 61
70-80     1           72
80-90     0           -
90-100    1           91

Notice that, unlike a bar chart, there are no "gaps" between the bars (although some bars
might be "absent", reflecting no frequencies). This is because a histogram represents a
continuous data set, and as such, there are no gaps in the data.
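
The binning step can be sketched in Python as follows. The raw data and the 10-year classes starting at 20 are taken from the example. The rule that a value lying exactly on a boundary goes into the upper class is an assumption, since no value in this data set falls on a boundary.

```python
# Raw ages copied from the example data set.
ages = [36, 25, 38, 46, 55, 68, 72, 55, 36, 38,
        67, 45, 22, 48, 91, 46, 52, 61, 58, 55]

start, width, n_bins = 20, 10, 8  # classes 20-30, 30-40, ..., 90-100

# Collect the scores falling into each class (lower bound inclusive).
bins = [[] for _ in range(n_bins)]
for age in ages:
    index = (age - start) // width
    bins[index].append(age)

frequencies = [len(scores) for scores in bins]

for i, scores in enumerate(bins):
    low = start + i * width
    print(f"{low}-{low + width}  {len(scores)}  {scores}")
```

The list `frequencies` gives the bar heights of the histogram, including the empty 80-90 class.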
Frequency Polygon
A frequency polygon is obtained by joining the midpoints of the tops of the rectangles in
a histogram with straight lines. The result is a polygon, i.e. a figure with many angles.
It is used when two or more sets of data are to be illustrated on the same diagram, such as
death rates in smokers and non-smokers, or birth and death rates of a population.
Sometimes it is beneficial to show the histogram and the frequency polygon together. In
some cases the frequency polygon is easier to read than the histogram, because the low
point and the high point of the distribution stand out more clearly.
Unlike histograms, frequency polygons can be superimposed so as to compare several
frequency distributions.

Frequency Distribution of the marks obtained by 50 students in the pre-test examination

Class Boundaries   Frequency   Cumulative frequency (Less than type)
30.5-40.5          1           1
40.5-50.5          19          20
50.5-60.5          20          40
60.5-70.5          7           47
70.5-80.5          3           50
Total              50

The labels of the X-axis are the midpoints of the class intervals. So the first label on the X-
axis will be 35.5, next 45.5, followed by 55.5, 65.5 and lastly 75.5. The corresponding
frequencies are then considered to create the frequency polygon.
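
The points of the frequency polygon can be computed as in the sketch below. The class boundaries and the cumulative column are taken from the table. The class frequencies are obtained here as differences of the cumulative column, so that they add up to the 50 students stated in the title. Closing the polygon on the base line with one extra midpoint on each side is a common convention and is an assumption here.

```python
# Class boundaries and less than cumulative frequencies from the table.
boundaries = [(30.5, 40.5), (40.5, 50.5), (50.5, 60.5), (60.5, 70.5), (70.5, 80.5)]
cumulative = [1, 20, 40, 47, 50]

# X-axis labels: the midpoint of each class interval.
midpoints = [(low + high) / 2 for low, high in boundaries]

# Class frequencies recovered as differences of the cumulative column.
frequencies = [cumulative[0]] + [b - a for a, b in zip(cumulative, cumulative[1:])]

# Close the polygon on the base line (assumed convention).
width = boundaries[0][1] - boundaries[0][0]
points = ([(midpoints[0] - width, 0)]
          + list(zip(midpoints, frequencies))
          + [(midpoints[-1] + width, 0)])

print(points)
```

Joining consecutive entries of `points` with straight lines gives the frequency polygon.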

Frequency Curve
A frequency curve is obtained by joining the points of a frequency polygon with a freehand
smoothed curve. Unlike the frequency polygon, where the points are joined by straight
lines, the points are joined freehand in order to get a smooth frequency curve. It
is used to remove the ruggedness of the polygon and to present it in a good form or shape.
We smooth only the angularities of the polygon, without making any basic change in the
shape of the curve. In this case also the curve begins and ends at the base line, as in the
case of the polygon. The area under the curve must remain almost the same as that under
the polygon.

Ogive or Cumulative Frequency Curve

Statistics is the branch of mathematics that deals with large amounts of numerical data.
It is the science of gathering, tabulating, organizing, calculating and interpreting
data. Statistical methods are commonly used in research and surveys where large data sets
are involved. Different types of graphs are used to understand the data more clearly. The
most commonly used graphs are the bar chart, histogram, frequency polygon, line graph and
cumulative frequency curve.

The cumulative frequency curve is also known as an Ogive. Recall that the cumulative
frequency of a value is the sum of its own frequency and the frequencies of all values
before it. The frequency distribution of a given data set can be converted into a
cumulative frequency distribution by adding each frequency to the total of its
predecessors. An Ogive is the graph of this cumulative frequency distribution.
The word Ogive comes from architecture, where it describes curves or curved shapes.
Steps for constructing a less than Ogive chart (less than cumulative frequency curve):

1. Draw and label the horizontal and vertical axes.
2. Take the cumulative frequencies along the y axis (vertical axis) and the upper class
limits on the x axis (horizontal axis).
3. Plot the cumulative frequencies against each upper class limit.
4. Join the points with a smooth curve.

Steps for constructing a greater than or more than Ogive chart (more than cumulative
frequency curve):

1. Draw and label the horizontal and vertical axes.
2. Take the cumulative frequencies along the y axis (vertical axis) and the lower class
limits on the x axis (horizontal axis).
3. Plot the cumulative frequencies against each lower class limit.
4. Join the points with a smooth curve.
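
The points to plot in step 3 of each procedure can be computed as in the following sketch. It reuses the upper limits and frequencies from the relative cumulative frequency example. The lower class limits are an assumption: the original table lists only upper limits, so a class width of 10 is assumed.

```python
from itertools import accumulate

# Upper limits and frequencies from the earlier example table.
upper_limits = [20, 30, 40, 50, 60, 70, 80]
frequencies = [11, 8, 10, 3, 7, 12, 5]

# Assumed lower limits (class width of 10 is not stated in the original).
lower_limits = [limit - 10 for limit in upper_limits]

# Less than ogive: running total against the upper class limits.
less_than = list(accumulate(frequencies))
less_than_points = list(zip(upper_limits, less_than))

# More than ogive: running total from the top against the lower class limits.
more_than = list(accumulate(reversed(frequencies)))[::-1]
more_than_points = list(zip(lower_limits, more_than))

print(less_than_points)
print(more_than_points)
```

The less than ogive rises from 11 to the total of 56, while the more than ogive falls from 56 to 5. Either list of points can then be joined with a smooth curve as in step 4.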
Simple Bar Diagram
As the name suggests, a simple bar diagram uses a single bar to represent one data value
as a whole, without further classification of its various characteristics. The length of
the bar is proportional to the magnitude of the value, while the width is chosen
arbitrarily, keeping in view the number of bars to be accommodated on the given piece of
paper. Such a diagram gives a clear visual impression and facilitates comparison.
Sub-Divided Bar Diagram
A sub-divided or component bar chart is used to represent data in which the total magnitude
is divided into different parts or components. In this diagram, we first draw a simple bar
for each class, with length equal to the total magnitude of that class, and then divide
each bar into parts in the ratio of its components. This type of diagram shows the
variation of the different components within each class as well as between different
classes. The sub-divided bar diagram is also known as a component bar chart or stacked
chart.

Years   Wheat   Barley   Oats
1991    34      18       27
1992    43      14       24
1993    43      16       27
1994    45      13       34

To make the component bar chart, first of all we have to take year wise total production.

Years   Wheat   Barley   Oats   Total
1991    34      18       27     79
1992    43      14       24     81
1993    43      16       27     86
1994    45      13       34     92
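
The totals in the table, and the position of each component within its bar, can be computed as in the sketch below. The figures are copied from the table. The dictionary layout and the names `totals` and `segments` are illustrative choices, not part of the original.

```python
# Production figures copied from the table above.
production = {
    1991: {"Wheat": 34, "Barley": 18, "Oats": 27},
    1992: {"Wheat": 43, "Barley": 14, "Oats": 24},
    1993: {"Wheat": 43, "Barley": 16, "Oats": 27},
    1994: {"Wheat": 45, "Barley": 13, "Oats": 34},
}

totals = {}    # full length of each bar
segments = {}  # (component, start, end) of each part of the bar

for year, crops in production.items():
    totals[year] = sum(crops.values())
    bottom = 0
    parts = []
    for crop, value in crops.items():
        # Each component starts where the previous one ended.
        parts.append((crop, bottom, bottom + value))
        bottom += value
    segments[year] = parts

print(totals)
print(segments[1991])
```

Each bar is drawn to the length given in `totals` and divided at the boundaries listed in `segments`.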

The required diagram is given below:


Pie Diagram

A pie chart displays data, information, and statistics in an easy-to-read 'pie-slice' format
with varying slice sizes telling you how much of one data element exists. The bigger the
slice, the more of that particular data was gathered.
Constructing Circle Graphs or Pie Charts

A pie chart (also called a Pie Graph or Circle Graph) makes use of sectors in a circle. The
angle of a sector is proportional to the frequency of the data.

The formula to determine the angle of a sector in a circle graph is:

Angle of sector = (frequency of the data / total frequency) × 360°

Study the following steps of constructing a circle graph or pie chart:

Step 1: Calculate the angle of each sector, using the formula.
Step 2: Draw a circle using a pair of compasses.
Step 3: Use a protractor to draw the angle for each sector.
Step 4: Label the circle graph and all its sectors.

Example:

In a school, there are 750 students in Year 1, 420 students in Year 2 and 630 students in
Year 3. Draw a circle graph to represent the numbers of students in these groups.

Solution:

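Total number of students = 750 + 420 + 630 = 1,800.

Angle for Year 1 = 750/1,800 × 360° = 150°
Angle for Year 2 = 420/1,800 × 360° = 84°
Angle for Year 3 = 630/1,800 × 360° = 126°

Draw the circle and measure out each sector. Label each sector and the pie chart.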
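
The sector angles for this example can be worked out with a short script. It is a sketch that applies the rule that each sector's angle is the group's share of the total multiplied by 360 degrees. The variable names are illustrative.

```python
# Number of students in each year group, from the example.
students = {"Year 1": 750, "Year 2": 420, "Year 3": 630}

total = sum(students.values())

# Angle of each sector in degrees: share of the total times 360.
angles = {group: count * 360 / total for group, count in students.items()}

for group, angle in angles.items():
    print(f"{group}: {angle:.0f} degrees")
```

The angles should always add up to 360 degrees, which is a useful check before drawing the sectors with a protractor.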

Measure of Central Tendency
