
ADMAS UNIVERSITY

FACULTY of INFORMATICS
DEPARTMENT of COMPUTER SCIENCE

Basic Statistics (Stat101) Lecture Note for Second Year Computer
Science Students

BY:
Ayele Gebeyehu
Ass. Professor, Department of Statistics, Wolkite University
Email: gebeyehu29@gmail.com or
ayele.gebeyehu@wku.edu.et

March 20, 2020


Addis Ababa, Ethiopia
Chapter one: Introduction
Introduction
Statistical thinking has nowadays become essential in many fields of study. Its
usefulness has spread to such diverse fields as agriculture, business, accounting,
marketing, economics, management, medicine, political science, psychology, sociology,
engineering, metrology, tourism, etc.

In every research, meaningful conclusions can only be drawn based on data collected from a
valid scientific design using appropriate statistical methods. Therefore, the selection of an
appropriate study design is important to provide an unbiased and scientific evaluation of the
research questions. Each design is based on a certain rationale and is applicable in certain
experimental situations.

1.1 Definition and classification of Statistics


Definition: We can define statistics in two ways.
1. Plural sense (layman's definition).
It is an aggregate or collection of numerical facts.
2. Singular sense (formal definition).
Statistics is defined as the science of collecting, organizing, presenting, analyzing and
interpreting numerical data for the purpose of assisting in making a more effective
decision.
Classification of Statistics:
Depending on how data are used, statistics is divided into two main areas or branches.
These are descriptive statistics and inferential statistics.
Descriptive Statistics: A statistical method that deals with organizing and summarizing a
given set of data into a meaningful form. Most of the statistical information in
newspapers, magazines, reports and other publications comes from data that have been
summarized and presented in a form that is easy for the reader to understand.
• Here there is no generalization or conclusion about the population.
• It consists of organization and presentation of data.
E.g. Frequency distribution, measures of central tendency (such as mean, median),
measures of dispersion (such as range, variance, standard deviation), etc.
• Descriptive statistics does not go beyond describing the data themselves.
Inferential statistics: It is the process of drawing conclusions (inferences) about a
population based on the information obtained from a sample. Because of time, cost and
other constraints, data are collected from only a small portion of the group (a sample). The
major contribution of statistics is that it enables us to use data from the sample to make
estimates and test claims about the characteristics of a population. This process is referred
to as statistical inference, which:

- Involves formulating and testing hypotheses, determining relationships among
variables, and making predictions.
- Is used to describe, infer, estimate or approximate the characteristics of the target
population.
- Requires statistical techniques based on probability theory.

1.2 Stages in Statistical Investigation


There are six stages or steps in any statistical investigation.
1. Formulating the problem: Research must emanate from a problem. The
investigator must be sure to understand the problem and formulate it in statistical terms.
2. Collection of data: It is the process of gathering information or data about the variable
of interest for our specific purpose. Great care must be exercised in collecting data
because they form the foundation of statistical analysis. If the data are faulty, the
conclusions drawn can never be reliable. The data may be available from existing
published or unpublished sources, or else may be collected by the investigator, i.e.
data may be obtained from either primary or secondary sources.
3. Organization of data: It is the process of editing, classification and tabulation of data.
• Editing: is the process of checking and correcting the collected data for omissions,
inconsistencies, irrelevant answers and wrong computations.
• Classification: is the task of grouping the collected and edited data into
different similar categories based on some criteria.
• Tabulation: is putting the classified data in the form of a table.
Arranging or classifying data in a suitable order makes the information easier to
present.

4. Presentation of the data: The organized data can now be presented in the form of
tables and diagrams. At this stage, large data sets are presented in tables in a very
summarized and condensed manner. The main purpose of data presentation is to
facilitate statistical analysis. Graphs and diagrams may also be used to give the data a
clearer meaning and make the presentation attractive.

5. Analysis of data: It is the process of extracting relevant information from the
summarized data, mainly through the use of mathematical operations. This is the stage
where we critically study the data to draw conclusions about the population parameter.
The purpose of data analysis is to dig out information useful for decision making.
Analysis usually involves highly complex and sophisticated mathematical techniques.
However, in this material only the most commonly used methods of statistical analysis
are included, such as the calculation of averages, the computation of measures of
dispersion, and regression and correlation analysis.

6. Inference (interpretation) of data: This is the stage where we draw valid conclusions from the results
obtained through data analysis. Interpretation means drawing conclusions from the data
which form the basis for decision making. The interpretation of data is a difficult task and
necessitates a high degree of skill and experience. If data that have been analyzed are not
properly interpreted, the whole purpose of the investigation may be defeated and
fallacious conclusions may be drawn. Great care is therefore needed when making interpretations.

1.3 Definition of some Statistical terms


a. Population: It is the collection of all possible observations of a specified characteristic
that is being studied in a specified time and place.

b. Sample: It is a subset of the population, selected using some sampling technique in
such a way that it represents the population.

c. Sampling: The process or method of selecting a sample from the population.

d. Sample size: The number of elements or observations to be included in the sample.

e. Census: Complete enumeration or observation of the elements of the population, i.e.
the collection of data from every element in a population.

f. Parameter: Characteristic or measure obtained from a population.

g. Statistic: Characteristic or measure obtained from a sample.

h. Variable: It is an item of interest that can take on many different values.
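
The difference between a parameter and a statistic can be seen in a small sketch. The population below (1,000 ages) is invented purely for illustration, and simple random sampling is assumed as the sampling technique; neither comes from the notes.

```python
import random
import statistics

# Hypothetical population: ages of 1,000 people (invented for illustration).
random.seed(1)
population = [random.randint(18, 60) for _ in range(1000)]

# Parameter: a measure computed from every element of the population (a census).
population_mean = statistics.mean(population)

# Sampling: select a sample of the chosen size by simple random sampling.
sample_size = 50
sample = random.sample(population, sample_size)

# Statistic: the same measure computed from the sample only.
sample_mean = statistics.mean(sample)

print(f"Parameter (population mean): {population_mean:.2f}")
print(f"Statistic (sample mean, n = {sample_size}): {sample_mean:.2f}")
```

The two printed means are usually close but not equal; inferential statistics deals with how far a statistic can be trusted as an estimate of the parameter.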
1.4 Applications, Uses and Limitations of Statistics
Applications of statistics:
Statistics can be applied in any field of study which seeks quantitative evidence, for
instance engineering, economics, natural science, etc.
Engineering: Statistics has wide application in engineering.
• To compare the strength of two types of materials.
• To estimate the probability that a product is reliable.
• To control the quality of products in a given production process.
• To compare the improvement of yield due to certain additives such as fertilizer,
herbicides, etc.
Uses of statistics:
The main function of statistics is to enlarge our knowledge of complex phenomena. The
following are some uses of statistics:
1. It presents facts in a definite and precise form.

2. Data reduction.

3. Measuring the magnitude of variations in data.

4. Furnishing a technique of comparison.

5. Estimating unknown population characteristics.

6. Formulating and testing hypotheses.

7. Studying the relationship between two or more variables.

8. Forecasting future events.


Limitations of statistics
As a science, statistics has its own limitations. The following are some of the limitations:
• It deals only with quantitative information.
• It deals only with aggregates of facts and not with individual data items.
• Statistical results are only approximately, and not mathematically, correct.
• Statistics can be easily misused and therefore should be used by experts.
1.5 Type of variables and measurement scale
Variables can be classified into two groups, i.e.
1. Quantitative variable
2. Qualitative variable
Quantitative variable: a variable that is naturally measured as a number and for which
arithmetic operations make sense. Quantitative variables are classified into two types, i.e.
1. Discrete variable
2. Continuous variable
Discrete variable: can assume only certain numerical values. That is, there are gaps between
the possible values, such as 0, 1, 2, 3, …, and the values are said to be countable.
Example: - Number of students in the class room
- Number of children in the family
Continuous variable: can take any value within a specified interval, given a fine enough
measuring device. Its values are obtained by measuring.
Example: height, weight, age, etc.
Qualitative variable: any variable that is not quantitative. As naturally measured,
qualitative variables have no numerical meaning. Example: hair color, political
affiliation, gender, etc.
Scale of Measurement
In statistics, scale of measurement refers to the ways in which variables are classified and
categorized. There are four scales of measurement.
1. Nominal Scales
• A level of measurement which classifies data into mutually exclusive categories in
which no order or ranking can be imposed on the data.
• No arithmetic or relational operations can be applied.
Examples:
• Political party preference (Republican, Democrat, or Other)
• Sex (Male or Female)
• Marital status (married, single, widowed, divorced)
• Country code
• Regional differentiation of Ethiopia

2. Ordinal Scales
• Ordinal scales are measurement systems that possess the property of order.
• A level of measurement which classifies data into categories that can be ranked.
• Differences between the ranks are not meaningful.
• Arithmetic operations are not applicable, but relational operations are applicable.
• Note: Ordering is the sole property of the ordinal scale.
Examples:
• Letter grades (A, B, C, D, F)
• Rating scales (Excellent, Very good, Good, Fair, Poor)
• Military status
3. Interval Scales
Interval scales are measurement systems that possess the properties of order and
distance, but not the property of a fixed zero (absolute zero).
A level of measurement which classifies data that can be ranked and for which differences are
meaningful. However, there is no meaningful zero, so ratios are meaningless.
All arithmetic operations except division are applicable.
Relational operations are also possible.
Examples: IQ, temperature in °F.
4. Ratio Scales
Ratio scales are measurement systems that possess all three properties: order, distance,
and fixed zero. The added power of a fixed zero allows ratios of numbers to be
meaningfully interpreted; e.g. the ratio of Bekele's height to Martha's height is 1.32,
whereas this is not possible with interval scales. A level of measurement which classifies
data that can be ranked, for which differences are meaningful, and which has a true zero. True ratios
exist between the different units of measure.
All arithmetic and relational operations are applicable.
Examples:
• Weight
• Height
• Age
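
The statement that ratios are meaningless on an interval scale but meaningful on a ratio scale can be checked numerically. The sketch below uses made-up temperatures and heights (not data from the notes): a ratio of two temperatures changes when the unit changes from °F to °C, whereas a ratio of two heights stays the same whether measured in centimetres or inches.

```python
import math

def fahrenheit_to_celsius(f):
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (f - 32) * 5 / 9

# Interval scale: temperature (illustrative values).
t1_f, t2_f = 80.0, 40.0
ratio_f = t1_f / t2_f
ratio_c = fahrenheit_to_celsius(t1_f) / fahrenheit_to_celsius(t2_f)

# Differences are meaningful: they agree up to the unit conversion factor 5/9.
diff_f = t1_f - t2_f
diff_c = fahrenheit_to_celsius(t1_f) - fahrenheit_to_celsius(t2_f)

# Ratio scale: height (illustrative values), in centimetres and in inches.
h1_cm, h2_cm = 180.0, 150.0
ratio_cm = h1_cm / h2_cm
ratio_in = (h1_cm / 2.54) / (h2_cm / 2.54)

print(f"Temperature ratio in F: {ratio_f:.2f}, in C: {ratio_c:.2f}")
print(f"Height ratio in cm: {ratio_cm:.2f}, in inches: {ratio_in:.2f}")
print(f"Temperature difference: {diff_f:.1f} F = {diff_c:.2f} C")
```

Because the zero point of the Fahrenheit and Celsius scales is arbitrary, "twice as hot" depends on the unit chosen, while "1.2 times as tall" does not.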
1.6 Methods of data collection
Statistical data may be obtained from either a primary or a secondary source. A primary
source is a source from which first-hand information is gathered. A secondary
source, on the other hand, is one that makes available data that were collected by some
other agency beforehand; such data may be published or unpublished. Published sources include
publications of research institutions, publications of financial and commercial institutions,
different reports, etc. Unpublished sources include records maintained by private firms
and business houses that may not wish to release their data to outsiders. When our source is
secondary, check that the type and objective of the original situation and the purpose for which
the data were collected are compatible with the present problem, and that the nature and
classification of the data are appropriate to our problem.
Collection of data implies a systematic and meaningful assembly of information for the
accomplishment of the objective of a statistical investigation. It refers to the methods
used to gather the required information from the units under investigation. The quality
of data greatly affects the final output of an investigation. Hence, utmost care should be
given to the data collection process and every possible precaution should be taken to
ensure accuracy while collecting data. Otherwise, with inaccurate and inadequate data,
the whole analysis is likely to be faulty and the decisions taken will be
misleading.

i) Direct Observation

In this approach, an investigator stays at the place of the survey and notes down first-hand
information. Direct observation can be used to discover a variety of information
including consumer behavior, working methods and other aspects of social and economic
behavior. Direct observation is more experimental and is usually applied in scientific
studies. It is time consuming and costly. The method is also highly subjective.

ii) Interview Method

It is a conversation between two parties, initiated by the interviewer in order to obtain
the required information. The interviewer prepares in advance a series of questions selected for
his/her work and conducts the interview. Interviewing is a technique that is
primarily used to gain an understanding of the underlying reasons and motivations for
people's attitudes, preferences or behavior. Interviews can be undertaken on a personal
one-to-one basis or in a group. They can be conducted at work, at home, in the street, in
a shopping centre, or at some other agreed location.

The interview may be face to face or by telephone.

• A face-to-face interview is advantageous for questioning a person's motives and attitudes
about some characteristic or behavior.
• A telephone interview is relatively less time consuming.

Limitations:

• Respondents are sometimes unwilling or reluctant to supply the information.

• Respondents differ in ability and motivation in clearly supplying the
information.

• It requires a highly experienced and skilled interviewer.

• The personal bias and prejudice of the interviewer may affect the result.

• A telephone interview excludes those who do not have a telephone.

iii) Questionnaire Method


Under this method, a list of questions related to the survey is prepared and sent to the
various respondents by hand, post, website, email, etc. However, this method cannot be
used if the respondent is illiterate.

Types of Questions
1. Open-ended Questions: permit free responses that should be recorded in the
respondent’s own words. The respondent is not given any possible answers to
choose from. Such questions are useful to obtain information on:
- Facts with which the researcher is not very familiar,
- Opinions, attitudes, and suggestions of informants, or
- Sensitive issues
2. Closed Questions: offer a list of possible options or answers from which the
respondents must choose. When designing closed questions one should try to:
- Offer a list of options that are exhaustive and mutually exclusive, and
- Keep the number of options as few as possible.

The following are the major points that we need to take into account while preparing the
questionnaire. The number of questions should be small, because respondents are naturally
uncomfortable with, and usually bored by, lengthy questionnaires.
If a lengthy questionnaire is unavoidable, it should preferably be divided
into two or more parts.

The questions should be short, clear, simple, and unambiguous. Moreover, the questions
must be arranged in a logical order so that a natural and spontaneous reply to each is
induced. For instance, it is not appropriate to ask a person how many packets of cigarettes
he/she smokes before asking whether he/she smokes or not.

Questions of a sensitive nature should be avoided. Sensitive questions are those
that are too personal or pecuniary, such as source of income, drinking habits, etc. The logic
here is that respondents do not willingly answer sensitive questions. Such information, if
necessary, may be gathered through interviews or through other indirect questions.

Mail questionnaires should be accompanied by a covering letter, which should state the
purpose of the questionnaire, promise confidentiality of responses, etc.

Summary Exercise
1. What is the difference between descriptive and inferential statistics?
2. Clearly state the stages in statistical investigation.
3. Write the uses and limitations of statistics.
4. Classify each of the following as a nominal, ordinal, interval or ratio scale of
measurement.
a. Pages in the 25 best-selling mystery novels.
b. Rankings of golfers in a tournament.
c. Temperatures inside 10 pizza ovens.
d. Weights of selected cell phones.
e. Times required to complete a chess game.
f. Ratings of textbooks (poor, fair, good, excellent).
g. Number of amps delivered by battery chargers.

Chapter 2: Methods of Data Presentation


Introduction
Having collected and edited the data, the next important step is to organize them, that is, to
present them in a readily comprehensible, condensed form that helps in drawing
inferences from them. It is also necessary that like items be separated from unlike ones. The
presentation of data is broadly classified into the following two categories:

• Tabular presentation
• Diagrammatic and graphic presentation
The process of arranging data into classes or categories according to similarities
is technically called classification. Classification is a preliminary step and it prepares the
ground for proper presentation of data.

2.1 Frequency distributions


A frequency distribution is the organization of raw data in table form using classes and
frequencies. There are three basic types of frequency distributions.
• Categorical frequency distribution
• Ungrouped frequency distribution
• Grouped frequency distribution
1) Categorical frequency Distribution:
Used for data that can be placed in specific categories, such as nominal or ordinal data, e.g. marital
status.
Example: A social worker collected the following data on marital status for 25
persons. (M=married, S=single, W=widowed, D=divorced)

M S D W D
S S M M M
W D S M M
W D D S S
S W W D D

Solution:

Since the data are categorical, discrete classes can be used. There are four types of marital
status: M, S, D, and W. These types will be used as the classes for the distribution. We follow
the procedure below to construct the frequency distribution.
Step 1: Make a table as shown.

Class    Tally    Frequency    Percent
(1)      (2)      (3)          (4)
M
S
D
W

Step 2: Tally the data and place the result in column (2).
Step 3: Count the tally and place the result in column (3).
Step 4: Find the percentage of values in each class by using
$\% = \frac{f}{n} \times 100$, where f = frequency of the class and n = total number of values.
Percentages are not normally a part of a frequency distribution, but they can be added since
they are used in certain types of diagrams such as pie charts.
Step 5: Find the totals for columns (3) and (4).
Combining all the steps, one can construct the following frequency distribution.
Class    Tally       Frequency    Percent
(1)      (2)         (3)          (4)
M        ///// /     6            24
S        ///// //    7            28
D        ///// //    7            28
W        /////       5            20
Total                25           100
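
As a cross-check, the tally, frequency and percentage steps can be sketched in a few lines. This is only an illustration using `collections.Counter` on the 25 responses listed in the example; the variable names are arbitrary.

```python
from collections import Counter

# The 25 marital-status responses from the example, row by row.
responses = "M S D W D S S M M M W D S M M W D D S S S W W D D".split()

n = len(responses)             # total number of values
counts = Counter(responses)    # tallying: frequency of each class

# Percentage of each class: % = (f / n) * 100
table = {c: (counts[c], 100 * counts[c] / n) for c in ["M", "S", "D", "W"]}

print(f"{'Class':<6}{'Frequency':>10}{'Percent':>9}")
for c, (f, pct) in table.items():
    print(f"{c:<6}{f:>10}{pct:>9.0f}")
print(f"{'Total':<6}{n:>10}{sum(p for _, p in table.values()):>9.0f}")
```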

2) Ungrouped frequency Distribution is a table of all the potential raw score values that
could possibly occur in the data, along with the number of times each actually occurred. It
is often constructed for a small data set or for data on a discrete variable.

Constructing ungrouped frequency distribution:

• First find the smallest and largest raw scores in the collected data.
• Arrange the data in order of magnitude and count the frequency.
• To facilitate counting one may include a column of tallies.

Example: The following data represent the marks of 20 students.

80 76 90 85 80

70 60 62 70 85

65 60 63 74 75

76 70 70 80 85

Construct a frequency distribution, which is ungrouped.

Solution:

Step 1: Find the range, Range=Max-Min=90-60=30.


Step 2: Make a table as shown

Step 3: Tally the data.

Step 4: Compute the frequency.

Mark    Tally    Frequency
60      //       2
62      /        1
63      /        1
65      /        1
70      ////     4
74      /        1
75      /        1
76      //       2
80      ///      3
85      ///      3
90      /        1

Each individual value is presented separately; that is why it is named an ungrouped
frequency distribution.
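
The same counting can be sketched in code. The snippet below is an illustration only: it applies `collections.Counter` to the 20 marks given in the example and prints each distinct mark with a simple tally and its frequency.

```python
from collections import Counter

# Marks of the 20 students from the example.
marks = [80, 76, 90, 85, 80,
         70, 60, 62, 70, 85,
         65, 60, 63, 74, 75,
         76, 70, 70, 80, 85]

# Step 1: range of the data.
data_range = max(marks) - min(marks)

# Steps 3 and 4: tally and count each distinct value.
frequency = Counter(marks)

print(f"Range = {data_range}")
print(f"{'Mark':<6}{'Tally':<8}{'Frequency':>9}")
for mark in sorted(frequency):
    f = frequency[mark]
    print(f"{mark:<6}{'/' * f:<8}{f:>9}")
```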

3) Grouped frequency Distribution:

When the range of the data is large, the data must be grouped into classes that are more than
one unit in width.
Definitions:

• Grouped Frequency Distribution: a frequency distribution in which several numbers
are grouped in one class.
• Class limits: separate one class in a grouped frequency distribution from another.
The limits could actually appear in the data, and there are gaps between the upper limit of
one class and the lower limit of the next.
• Unit of measurement (U): the distance between two possible consecutive measures.
It is usually taken as 1, 0.1, 0.01, 0.001, ….
• Class boundaries: separate one class in a grouped frequency distribution from
another. The boundaries have one more decimal place than the raw data and
therefore do not appear in the data. There is no gap between the upper boundary of
one class and the lower boundary of the next class. The lower class boundary is found by
subtracting U/2 from the corresponding lower class limit and the upper class
boundary is found by adding U/2 to the corresponding upper class limit.
• Class width: the difference between the upper and lower class boundaries of any
class. It is also the difference between the lower limits of any two consecutive classes
or the difference between any two consecutive class marks.
• Class mark (mid point): the average of the lower and upper class limits or the
average of the upper and lower class boundaries.
• Cumulative frequency: the number of observations less than (or greater than) or equal
to a specific value.
• Less than type cumulative frequency: the total frequency of all values less
than or equal to the upper class boundary of a given class.
• Greater than type cumulative frequency: the total frequency of all values greater
than or equal to the lower class boundary of a given class.
• Cumulative Frequency Distribution (CFD): the tabular arrangement of class
intervals together with their corresponding cumulative frequencies. It can be of the more than
or less than type, depending on the type of cumulative frequency used.
• Relative frequency (rf): the frequency divided by the total frequency.
• Relative cumulative frequency (rcf): the cumulative frequency divided by the
total frequency.

Guidelines for classes

1. There should be between 5 and 20 classes.


2. The classes must be mutually exclusive. This means that no data value can fall
into two different classes
3. The classes must be all inclusive or exhaustive. This means that all data values
must be included.
4. The classes must be continuous. There are no gaps in a frequency distribution.
5. The classes must be equal in width. The exception here is the first or last class. It
is possible to have a "below ..." or "... and above" class.

Steps for constructing Grouped frequency Distribution

1. Find the largest and smallest values


2. Compute the Range(R) = Maximum - Minimum
3. Select the number of classes desired, usually between 5 and 20, or use Sturges' rule
$k = 1 + 3.32 \log_{10} n$, where k is the number of classes desired and n is the total number of
observations.
4. Find the class width by dividing the range by the number of classes and rounding
up, not off: $w = \frac{R}{k}$.
5. Pick a suitable starting point less than or equal to the minimum value. The starting
point is called the lower limit of the first class. Continue to add the class width to
this lower limit to get the rest of the lower limits.
6. To find the upper limit of the first class, subtract U from the lower limit of the
second class. Then continue to add the class width to this upper limit to find the
rest of the upper limits.
7. Find the boundaries by subtracting U/2 units from the lower limits and adding U/2
units to the upper limits. The boundaries are also half-way between the upper
limit of one class and the lower limit of the next class. It may not be necessary to
find the boundaries.
8. Tally the data.
9. Find the frequencies.
10. Find the cumulative frequencies. Depending on what you're trying to accomplish,
it may not be necessary to find the cumulative frequencies.
11. If necessary, find the relative frequencies and/or relative cumulative frequencies
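
Steps 1 to 7 can be sketched as a small routine. The sketch assumes whole-number data with U = 1; the function names are illustrative, and the extra check that widens the classes when the last class would miss the maximum is an added safeguard, not part of the steps above.

```python
import math

def sturges_classes(n):
    """Number of classes k = 1 + 3.32 log10(n), rounded up to a whole number."""
    return math.ceil(1 + 3.32 * math.log10(n))

def build_classes(data, unit=1):
    """Return (k, width, class limits, class boundaries) for whole-number data."""
    low, high = min(data), max(data)            # step 1
    data_range = high - low                     # step 2
    k = sturges_classes(len(data))              # step 3
    width = math.ceil(data_range / k)           # step 4: round up, not off
    # Assumption (not in the notes): widen by one unit if the last class
    # would otherwise stop short of the largest value.
    if low + k * width - unit < high:
        width += unit
    # Steps 5 and 6: lower limits start at the minimum; upper = next lower - U.
    limits = [(low + i * width, low + (i + 1) * width - unit) for i in range(k)]
    # Step 7: boundaries lie U/2 below the lower and U/2 above the upper limits.
    boundaries = [(lo - unit / 2, hi + unit / 2) for lo, hi in limits]
    return k, width, limits, boundaries

# Illustrative whole-number data set of 20 observations.
data = [11, 29, 6, 33, 14, 31, 22, 27, 19, 20,
        18, 17, 22, 38, 23, 21, 26, 34, 39, 27]

k, width, limits, boundaries = build_classes(data)
print(f"k = {k}, w = {width}")
for (lo, hi), (lb, ub) in zip(limits, boundaries):
    print(f"{lo:>2} - {hi:<2}   {lb:>4} - {ub:<4}")
```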

Example*:

Construct a frequency distribution for the following data.

11 29 6 33 14 31 22 27 19 20
18 17 22 38 23 21 26 34 39 27

Solutions:

Step 1: Find the highest and the lowest value H=39, L=6

Step 2: Find the range; R=H-L=39-6=33

Step 3: Select the number of classes desired using Sturges' formula;

$k = 1 + 3.32 \log_{10} n = 1 + 3.32 \log_{10}(20) = 5.32 \approx 6$ (rounding up)

Step 4: Find the class width; $w = R/k = 33/6 = 5.5 \approx 6$ (rounding up)

Step 5: Select the starting point; let it be the minimum observation.

• 6, 12, 18, 24, 30, 36 are the lower class limits.

Step 6: Find the upper class limits; e.g. the first upper class limit = 12 - U = 12 - 1 = 11

• 11, 17, 23, 29, 35, 41 are the upper class limits.

So combining step 5 and step 6, one can construct the following classes.
Class limits
6 – 11
12 – 17
18 – 23
24 – 29
30 – 35
36 – 41

Step 7: Find the class boundaries;

E.g. for class 1: Lower class boundary = 6 - U/2 = 5.5

Upper class boundary = 11 + U/2 = 11.5

• Then continue adding w to both boundaries to obtain the rest of the boundaries. By
doing so one can obtain the following classes.

Class boundary
5.5 – 11.5
11.5 – 17.5
17.5 – 23.5
23.5 – 29.5
29.5 – 35.5
35.5 – 41.5

Step 8: Tally the data.

Step 9: Write the numeric values for the tallies in the frequency column.

Step 10: Find cumulative frequency.

Step 11: Find relative frequency or/and relative cumulative frequency.

The complete frequency distribution follows:

Class limit   Class boundary   Class mark   Freq.   <Cf   >Cf   rf
6 – 11        5.5 – 11.5       8.5          2       2     20    0.10
12 – 17       11.5 – 17.5      14.5         2       4     18    0.10
18 – 23       17.5 – 23.5      20.5         7       11    16    0.35
24 – 29       23.5 – 29.5      26.5         4       15    9     0.20
30 – 35       29.5 – 35.5      32.5         3       18    5     0.15
36 – 41       35.5 – 41.5      38.5         2       20    2     0.10
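
Steps 8 to 11 can be verified with a short sketch. It takes the class limits found in steps 5 and 6 as given and recomputes the frequency, cumulative frequency and relative frequency columns; the variable names are illustrative.

```python
from itertools import accumulate

data = [11, 29, 6, 33, 14, 31, 22, 27, 19, 20,
        18, 17, 22, 38, 23, 21, 26, 34, 39, 27]
limits = [(6, 11), (12, 17), (18, 23), (24, 29), (30, 35), (36, 41)]
n = len(data)

# Steps 8 and 9: count how many observations fall within each pair of limits.
freq = [sum(lo <= x <= hi for x in data) for lo, hi in limits]

# Step 10: less than and more than type cumulative frequencies.
less_cf = list(accumulate(freq))
more_cf = list(accumulate(freq[::-1]))[::-1]

# Step 11: relative frequency; class marks are the mid points of the limits.
rf = [f / n for f in freq]
marks = [(lo + hi) / 2 for lo, hi in limits]

print(f"{'Limits':<10}{'Mark':>6}{'Freq':>6}{'<Cf':>5}{'>Cf':>5}{'rf':>6}")
for (lo, hi), m, f, lc, mc, r in zip(limits, marks, freq, less_cf, more_cf, rf):
    print(f"{lo:>2} - {hi:<5}{m:>6}{f:>6}{lc:>5}{mc:>5}{r:>6.2f}")
```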

2.2. Diagrammatic and graphical presentation of data


The most convenient and popular way of describing data is graphical presentation.
It is easier to understand and interpret data when they are presented graphically than
when they are presented in words or in a frequency table. A graph can present data in a simple and clear way.
It can also illustrate the important aspects of the data. This leads to better analysis and
presentation of the data. In this section, we discuss the most commonly
used diagrammatic and graphical methods, such as the bar chart, pie chart, histogram,
frequency polygon and cumulative frequency polygon.

2.2.1. Diagrammatic Presentation of Data

The three most commonly used diagrammatic presentations for discrete as well as
qualitative data are:

• Pie chart
• Bar chart
• Pictogram
A) Pie chart
A pie chart is a circle that is divided into sections or wedges according to the percentage of
frequencies in each category of the distribution. The angle of each sector is obtained using:
$\text{Angle of sector} = \frac{\text{value of the part}}{\text{total}} \times 360^\circ$

Example 2.4: Draw a suitable diagram to represent the following population in a town.

Men     Women    Girls    Boys
2500    2000     4000     1500

Solutions:

Step 1: Find the percentage.

Step 2: Find the number of degrees for each class.

Step 3: Using a protractor and compass, graph each section and write its name with
corresponding percentage.

Class    Frequency    Percent    Degree
Men      2500         25         90
Women    2000         20         72
Girls    4000         40         144
Boys     1500         15         54
Total    10000        100        360


[Figure: Pie chart with sectors Men 25%, Women 20%, Girls 40%, Boys 15%]
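
The percentage and degree columns of Example 2.4 can be reproduced with a short sketch. Each sector angle is the category's share of the total multiplied by 360°; the dictionary and variable names are illustrative.

```python
# Population of the town by category, from Example 2.4.
population = {"Men": 2500, "Women": 2000, "Girls": 4000, "Boys": 1500}
total = sum(population.values())

# Step 1: percentage of each category.
percent = {name: 100 * value / total for name, value in population.items()}

# Step 2: sector angle = (value of the part / total) * 360 degrees.
degrees = {name: 360 * value / total for name, value in population.items()}

print(f"{'Class':<8}{'Frequency':>10}{'Percent':>9}{'Degree':>8}")
for name, value in population.items():
    print(f"{name:<8}{value:>10}{percent[name]:>9.0f}{degrees[name]:>8.0f}")
print(f"{'Total':<8}{total:>10}{sum(percent.values()):>9.0f}{sum(degrees.values()):>8.0f}")
```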

B) Bar Charts
• Used to represent and compare the frequency distributions of discrete variables and
attributes or categorical series.
• Bars can be drawn either vertically or horizontally.
In presenting data using a bar diagram,

• All bars must have equal width and the distance between bars must be equal.
• The height or length of each bar indicates the size (frequency) of the figure represented.
There are different types of bar charts. The most common are:

• Simple bar chart
• Component bar chart
• Multiple bar chart
I. Simple bar chart
• Is used to display data on one variable.
• The bars are thick lines (narrow rectangles) having the same breadth. The magnitude of a
quantity is represented by the height/length of the bar.
Example 2.5: The number of students in the four departments of a Science College is given as follows:

Department            Physics   Maths   Chemistry   Biology
Number of students    200       400     450         600
Male                  170       350     250         200
Female                30        50      200         400

Draw a simple bar chart of the number of students by department.

Solution:
[Figure: Simple bar chart of the number of students by department. X axis: Department (Phys, Maths, Chem, Bio); Y axis: Frequency; bar heights 200, 400, 450, 600]

II. Component Bar chart

• When there is a desire to show how a total (or aggregate) is divided into its component
parts, we use a component bar chart.
• The bars represent the total value of a variable, with each total broken into its component
parts; different colors or designs are used for identification.

Example 2.6: Draw a component (sub-divided) bar chart of the number of students by
department given in Example 2.5.

Solution:

[Figure: Sub-divided bar chart. X axis: Department (Phys, Maths, Chem, Bio); Y axis: Frequency; each bar is split into Male and Female components]
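
How the components stack can be sketched without any plotting library. In the illustration below, each bar for Example 2.5 is described by where its male and female segments start and end, and a rough text rendering is printed; the scale of one character per 25 students is an arbitrary choice.

```python
departments = ["Physics", "Maths", "Chemistry", "Biology"]
male = [170, 350, 250, 200]
female = [30, 50, 200, 400]

# In a component bar chart the second component starts where the first ends,
# so the top of the stacked bar equals the total.
segments = []
for dept, m, f in zip(departments, male, female):
    total = m + f
    segments.append({
        "department": dept,
        "male": (0, m),          # male segment runs from 0 up to m
        "female": (m, total),    # female segment is stacked on top of it
        "total": total,
    })

# Rough text rendering: one character per 25 students (arbitrary scale).
scale = 25
for s in segments:
    m_len = (s["male"][1] - s["male"][0]) // scale
    f_len = (s["female"][1] - s["female"][0]) // scale
    print(f"{s['department']:<10} {'M' * m_len}{'F' * f_len}  {s['total']}")
```

Dropping the split and drawing only the totals gives the simple bar chart of Example 2.5.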

III. Multiple Bar charts

• These are used to display data on more than one variable.
• They are used for comparing different variables at the same time.

Example 2.7: The following data represent sales by product, 1957-1959, of a given company for
three products A, B and C.

Product    Sales ($)
           1957    1958    1959
A          12      14      18
B          24      21      18
C          24      35      54

Draw a multiple bar chart to represent the sales by product from 1957 to 1959.

Solution:

C) Pictograph

In this diagram, we represent data by means of some picture symbols. We decide about a suitable
picture to represent a definite number of units in which the variable is measured.

2.2.2. Graphical Presentation of data


The histogram, frequency polygon and cumulative frequency graph (ogive) are the most commonly
applied graphical representations for continuous data.

Procedures for constructing statistical graphs:

• Draw and label the X and Y axes.
• Choose a suitable scale for the frequencies or cumulative frequencies and label it on the Y
axis.
• Represent the class boundaries for the histogram or ogive, or the mid points for the frequency
polygon, on the X axis.
• Plot the points.
• Draw the bars or lines to connect the points.
Histogram: A graph which displays the data by using vertical connected bars of various heights
to represent frequencies. Class boundaries are placed along the horizontal axis. Class marks and
class limits are sometimes used as the quantity on the X axis.
Example 2.8: Construct a histogram to represent the following data.
Class limits   15-24   25-34   35-44   45-54   55-64   65-74   75-84
Frequency      3       4       10      15      12      4       2

[Figure: Histogram. X axis: Class boundaries; Y axis: Frequency; bar heights 3, 4, 10, 15, 12, 4, 2]

Frequency polygon: If we join the mid-points of the tops of the adjacent rectangles of the
histogram with line segments, a frequency polygon is obtained. When the polygon is continued to
the x-axis just outside the range of the classes, the total area under the polygon will be equal to
the total area under the histogram.

Example 2.9: Construct a frequency polygon for the previous data in example 2.8.
Solution:
Class     Frequency   Class   Class         R.F.   % R.F.      Less than   More than
limits                marks   boundaries           (percent)   C.F.        C.F.
15 - 24   3           19.5    14.5 - 24.5   0.06   6%          3           50
25 - 34   4           29.5    24.5 - 34.5   0.08   8%          7           47
35 - 44   10          39.5    34.5 - 44.5   0.20   20%         17          43
45 - 54   15          49.5    44.5 - 54.5   0.30   30%         32          33
55 - 64   12          59.5    54.5 - 64.5   0.24   24%         44          18
65 - 74   4           69.5    64.5 - 74.5   0.08   8%          48          6
75 - 84   2           79.5    74.5 - 84.5   0.04   4%          50          2
Total     50                                1.00   100%

Adding two class marks with $f_i = 0$, namely 9.5 at the beginning and 89.5 at the end, the
following frequency polygon is plotted:

[Figure: Frequency polygon. X axis: Class mark (9.5, 19.5, 29.5, 39.5, 49.5, 59.5, 69.5, 79.5, 89.5); Y axis: Frequency]
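
The columns of the table for Example 2.9 and the points of the frequency polygon can be recomputed with a short sketch. It starts from the class limits and frequencies of Example 2.8; the variable names are illustrative.

```python
from itertools import accumulate

limits = [(15, 24), (25, 34), (35, 44), (45, 54), (55, 64), (65, 74), (75, 84)]
freq = [3, 4, 10, 15, 12, 4, 2]
n = sum(freq)

# Class marks (mid points) and class width (difference of consecutive lower limits).
marks = [(lo + hi) / 2 for lo, hi in limits]
width = limits[1][0] - limits[0][0]

# Relative and cumulative frequencies.
rf = [f / n for f in freq]
less_cf = list(accumulate(freq))
more_cf = list(accumulate(freq[::-1]))[::-1]

# Frequency polygon: (class mark, frequency) points, closed on the x-axis by
# adding one class mark with zero frequency at each end.
polygon = [(marks[0] - width, 0)] + list(zip(marks, freq)) + [(marks[-1] + width, 0)]

for x, y in polygon:
    print(f"{x:>5}  {'*' * y}")
```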

Summary exercises

1. Construct a frequency distribution for the following data.

11 29 6 33 14 31 22 27 19 20
18 17 22 38 23 21 26 34 39 27
2. The following table is a grouped frequency distribution of money spent per visit by a random
sample of 100 customers at a department store.
Amount spent       3-7   8-12   13-17   18-22   23-27   Total
No. of customers   10    30     35      20      5       100
i) For each of the above classes, state
a) the class boundaries
b) the class width
c) the class mark
d) Draw the histogram and ogive curve.
