
TABLE OF CONTENTS

TITLE PAGE
INTRODUCTION TO MODULE 1
UNIT 1 INTRODUCTION
1.1. DEFINITIONS AND CLASSIFICATIONS
1.2. METHODS OF DATA COLLECTION
1.3. METHODS OF DATA REPRESENTATION
UNIT SUMMARY
UNIT 2 MEASURES OF CENTRAL TENDENCY
2.1. MATHEMATICAL MEASURES
2.2. POSITIONAL MEASURES
2.3. THE MODE
UNIT SUMMARY
UNIT 3 MEASURES OF VARIATION
3.1. POSITIONAL MEASURES
3.2. MATHEMATICAL (CALCULATED) MEASURES
3.3. RELATIVE MEASURES OF VARIATION
3.4. MEASURES OF SKEWNESS AND KURTOSIS
UNIT SUMMARY
FEEDBACK TO ACTIVITIES
ANSWERS TO EXERCISES
REFERENCES

INTRODUCTION TO MODULE 1

Dear Student, this module gives you a general introduction to Statistics and shows you how to
collect, present, and summarize statistical data with the help of measures of central
tendency and measures of variation.

The module consists of three units. The first unit deals with definitions of some basic terms
in Statistics, classifications of statistics and data, and the different methods employed in data
collection and data representation.
A distribution can be characterized by four measures: central tendency, dispersion,
skewness, and kurtosis.
The second unit deals with the different measures of central tendency, which are collectively
called averages. These averages are broadly classified into mathematical averages (based
on all the data) and positional averages (not based on all the data).
The last unit presents the different measures of variation, which are also classified into
mathematical and positional measures. A measure of dispersion may be either absolute, that is,
expressed in the same unit as the variable itself, or relative, that is, free of units of
measurement and therefore useful in comparing the variability or consistency of two or more
sets of data. Finally, the last sub-unit presents two further measures, skewness and kurtosis,
which describe a given set of data in relation to its shape.

Objectives:
At the end of this module, you will be able to:
• Define some basic terms in statistics and identify some important classifications.
• Know and apply the methods of collection and representation of data.
• Define, explain the characteristics of, and calculate measures of central tendency and
variation.
• Compare the variability among different sets of data.
• Explain the distribution of data using measures of skewness and kurtosis.

UNIT 1
Methods of Data Collection and Presentation
CONTENTS
1.1 INTRODUCTION
1.2 METHODS OF DATA COLLECTION
1.3 METHODS OF DATA REPRESENTATION

UNIT INTRODUCTION
What is Statistics? Why do we need to study it? How is it employed? These are the
basic questions one has to raise about the field of Statistics. This unit will provide only partial
answers to these questions. Even though some of you might have been introduced to
Statistics in secondary school, this unit assumes that you are meeting the subject for the first
time.

This unit has three sub-units. The first sub-unit defines some important terms, starting with
the word “Statistics” itself (treated as both singular and plural), and gives some classifications.
The second is about the different methods of data collection, while the third shows how
to present a set of collected data by means of tables, graphs and diagrams.

Objectives:
At the end of this unit, you will be able to:
• Give the meanings of some basic terms in statistics.
• Classify statistics and statistical data.
• Identify different classifications of data.
• Organize data into a frequency distribution.
• Represent data by means of graphs and diagrams.

1.1 INTRODUCTION

This sub-unit defines Statistics, classifies it broadly, and defines some other basic
terms.

1.1.1 Definitions and Classifications of Statistics

A. Definitions of Statistics
There have been many definitions of the term “Statistics”, depending on whether it is used in
its plural or in its singular sense.
Definition 1.1: In the plural sense, statistics means a collection of numerical facts, figures or
“statistical data”.

Example 1.1: Vital statistics (numerical data on marriages, births, divorces, etc), social
statistics (numerical data on health, education, crimes, etc) are some examples of statistical
data under this definition.
Note that not all numerical data are statistical data. Some of the characteristics which data
must possess to be called statistics are given below:
i. Statistics are numerically expressed; qualitative statements are not statistics.
ii. Statistics are aggregates of facts. Single and isolated figures are not statistics, as they
cannot be compared and are unrelated.
iii. Statistics are collected in a systematic manner for a pre-determined purpose.

Example 1.2: The statement “Student X scored 85 in Stat 241” is not statistics, although it
is a numerical statement of fact. On the other hand, if the instructor says that the average
mark in Stat 241 this year is 70%, then this would be considered statistics, since the
average has been computed from many related figures.

Definition 1.2: In the plural sense again, statistics refers to numerical measures obtained
from a sample (“statistic” being the singular term).

These measures may include the sample mean ($\bar{x}$), the sample variance ($s^2$), the
sample standard deviation ($s$), and so on.

Definition 1.3: In its singular sense, the word Statistics refers to the subject area of applied
mathematics which is concerned with development and application of methods and
techniques for collecting, organizing, presenting, analyzing and interpreting statistical data.
This meaning of Statistics is the one which refers to the field of Statistics.
This last definition of Statistics points out the following five stages in any statistical study:
collection, organization, presentation, analysis, and interpretation of data.
1. Collection of data: This is the process of obtaining measurements or counts, and it
constitutes the first step in any statistical investigation.
2. Organization of data: This includes editing, classifying, and tabulating the data collected,
and helps to give a clear understanding of the information gathered.
3. Presentation of data: The purpose of data presentation is to give an overall view of what
the data actually look like, and to facilitate further statistical analysis. Data can be
presented in the form of tables, graphs, or diagrams. However, this step can be skipped
with no loss of continuity in the remaining steps.
4. Analysis of data: The purpose of analysis is to extract useful information for decision
making. It may require just critical observation of the data collected, or sophisticated
mathematical techniques.
5. Interpretation of data: This is concerned with drawing conclusions from the data
collected and analyzed. It is a difficult task and requires a high degree of skill and
experience.

B. Classification of Statistics
Statistics is classified into two broad categories: Descriptive Statistics and Inferential
Statistics.
1. Descriptive Statistics: This part deals with methods for describing the given set of
data without going any further; that is, without attempting to infer (conclude)
anything that goes beyond the data themselves. These methods may involve frequency
distributions, averages, measures of dispersion, skewness, kurtosis, relationships, etc.

Example 1.3: Suppose that the grades of a sample of 6 students were 45, 60, 72, 80, 85 and 93.
If the instructor reports that half of them scored below 75, this is descriptive statistics. Also,
“The average score of the six students is 72.5” and “The range of the scores of the six
students is 48” are descriptive statistics.
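
The figures quoted in Example 1.3 can be checked with a short Python sketch. It is only an illustration: the variable names are our own, and it assumes nothing beyond the six grades given in the example.

```python
from statistics import mean, median

# Grades of the six students in Example 1.3
grades = [45, 60, 72, 80, 85, 93]

average = mean(grades)                       # arithmetic mean of the six grades
middle = median(grades)                      # half of the grades lie below this value
score_range = max(grades) - min(grades)      # largest grade minus smallest grade
below_75 = sum(1 for g in grades if g < 75)  # number of students scoring below 75

print("average:", average)
print("median:", middle)
print("range:", score_range)
print("students below 75:", below_75)
```

Each of these numbers only describes the six observed grades; none of them says anything about students outside the sample.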
Example 1.4: The tables and graphs in a dean’s office comparing the number of students in
his faculty, by sex, age, etc., for different years are descriptive statistics.

2. Inferential (or Inductive) Statistics: This is concerned with drawing conclusions or
generalizations (that is, making inferences) about the characteristics of a population based
on information obtained from a sample taken from that population.

Example 1.5: With reference to Example 1.3, if the instructor declares (based on this
sample result) that the average score of the whole class is 72, this is inferential statistics.

Activity 1.1
The three lemons which a student bought at the gate weighed 7, 8, and 12 ounces. Which
of the following statements are purely descriptive and which are generalizations/inferential?
a) The average weight of the three lemons is 9 ounces.
b) The average weight of lemons sold at the gate is 9 ounces.

1.1.2 Definitions of Some Basic Terms

1. Population or Universe
Definition 1.4: A population is the complete collection of individuals, objects, or
measurements under investigation for a given objective.
A population may be finite or infinite, depending on whether the number of elements is
limited or unlimited. If the population is finite, we represent the population size (that is, the
number of objects or elements in the population) by N.

Example 1.6: The following are a few examples of finite populations: first year students in
BDU, light bulbs manufactured by a factory per day, houses in Bahir Dar town, etc.
Note that, in Statistics, the word “population”
i. doesn’t necessarily refer to people; and
ii. is synonymous with the term “universal set” in set theory.

Definition 1.5: A statistical (descriptive) measure obtained from a population is called a
parameter.
Example 1.7: Summary measures such as the mean (denoted by $\mu$), the variance (denoted by
$\sigma^2$), the standard deviation (denoted by $\sigma$) and the like, which characterize the
population, are parameters. Parameters are usually unknown.
Definition 1.6: A sample is a representative part of a population under study.
Example 1.8: If one wants to make a study on the first year BDU students, say, to compare
their ESLCE results and their freshman results, then ESLCE results of all the first year
students of BDU can be taken as a population. But if some students are selected, then their
ESLCE results would be a sample.
A sample is selected in such a way that it reflects the characteristics of the whole population.
If a study is carried out on all the elements of a finite population, we have what is known as
a census (or complete enumeration). In other words, a census is a collection of information
from each and every member of a statistical population. In Ethiopia, for instance, a census is
carried out every ten years to count the total number of houses and people in the country.

Definition 1.7: A measure obtained from a sample is called a statistic.

A statistic is the sample analogue of a population parameter. Such
measures may include the sample mean ($\bar{x}$), the sample variance ($s^2$), the sample
standard deviation ($s$), or any other summary measure based upon sample data.
Note that a statistic is used to estimate an unknown parameter of the population from which
the sample is drawn, since it is often impossible to carry out a census and
the parameter therefore cannot be obtained directly.
Sample size: the sample size (denoted by n) is the number of elements included in the
sample for investigation ($n \geq 2$).
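
The distinction between a parameter and a statistic can be sketched in Python as follows. The population below is hypothetical (ten made-up scores chosen only for illustration), and the names `population`, `mu` and `x_bar` are our own; in practice the population is usually too large to measure in full, which is why the parameter is unknown.

```python
import random
from statistics import mean

# Hypothetical finite population of N = 10 scores (illustrative values only)
population = [45, 52, 60, 64, 72, 75, 80, 85, 90, 93]
N = len(population)

mu = mean(population)    # parameter: computed from every element of the population

random.seed(1)           # fixed seed so that the sketch is repeatable
n = 4                    # sample size, chosen so that 2 <= n <= N
sample = random.sample(population, n)   # simple random sample without replacement
x_bar = mean(sample)     # statistic: computed from the sample only

print("population size N =", N, " parameter mu =", mu)
print("sample of size n =", n, ":", sample, " statistic x_bar =", x_bar)
```

Changing the seed gives a different sample and hence a different value of the statistic, while the parameter stays fixed.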
1.1.3 Importance and Applications of Statistics
Statistical methods can be applied to any situation that involves uncertainty. The
following are some of the uses of statistics.
1. It condenses and summarizes a mass of data into a few presentable, understandable
and precise figures.
2. It facilitates comparisons. Statistical devices such as averages, percentages, ratios,
etc., are the tools used for this purpose.
3. It helps in formulating and testing hypotheses. For instance, hypotheses like whether
a new medicine is effective in curing a disease, whether a given set of data follows
an assumed distribution, whether there is an association between two variables, etc.,
can be tested by appropriate statistical tools.
4. It helps in prediction. Statistical methods are very useful in analyzing past
data and predicting future trends.
Nowadays, there is hardly any field of science that does not make use of statistics. The fact that
many textbooks have been written on Business Statistics, Educational Statistics, Medical
Statistics, Psychological Statistics, Agricultural Statistics, Industrial Statistics, etc. shows that
the field has diverse applications. Nevertheless, Statistics has its own limitations and it can
also be misused. Statistical data can
a) be used for the wrong purpose; that is, for a purpose different from the
one for which they were collected.
b) be collected incorrectly so that they are biased.
c) be analyzed carelessly so that the results obtained are misleading.

Check-list for 1.1

Put a tick mark (√) for each of the following questions if you can solve the problems,
and an X otherwise.
1. Can you give the definition of statistics in your own words?
2. Can you list the characteristics of statistical data?
3. Can you list down the five steps involved in a statistical investigation?
4. Can you explain the two classifications of statistics?
5. Can you mention the important features of statistics?
6. Can you define a population in your own words?
7. Can you define a sample?
8. Can you give an example of a population and a sample?
9. Can you define a parameter & a statistic?
10. Can you tell some of the importance (uses) of statistics?
11. Can you tell some of the limitations of statistics?

Exercise 1.1
1. What is the basic difference between descriptive and inferential statistics?
2. On three consecutive days, a traffic police officer issued 9, 14, and 10 speeding tickets,
and 5, 10 and 12 tickets for passing through red lights. Which of the following
conclusions can be obtained from these data by purely descriptive methods and
which are generalizations (inferential)?
a) On the three days, the officer issued more speeding tickets than tickets for
passing through red lights.
b) On two of the three days, the officer issued more speeding tickets than tickets for
passing through red lights.
c) The officer will seldom write more than 15 tickets on any one day.

3. “There are three kinds of lies: lies, damned lies, and Statistics”. Comment.
4. Give two examples of an infinite population.
5. A sample is always a proper subset of a population (Say True or False).

1.2 METHODS OF DATA COLLECTION


Since data are the raw material of Statistics, utmost care must be taken while collecting
them. If the data are inaccurate or inadequate, the whole analysis may be faulty and the
conclusions drawn will be misleading.
In this sub-unit, the definition of data will be given; data will be classified (according to the
nature of the data, the scale of measurement, the source of data, and the role of time); and the
methods by which data are collected will be discussed.
1.2.1 Definition and Classification of Data
Definition 1.8: Data can be defined as any collection of facts or figures collected as part of
everyday life and expressed in numbers; or, simply, as the numerical result of any count or
measurement.

The word “data” is the plural form of the Latin word “datum”, meaning fact. If the data are
written down as they are collected, we call them “raw data”; if they are written in
ascending or descending order, they are called “arrayed data”.
The type of data obtained depends upon the nature of the study and the population of
interest. The following are some classifications of data.
1. Data classified according to their nature: data may be classified as qualitative or
quantitative.
Qualitative data (also called attribute or categorical data) are described only in words
(i.e., explained qualitatively) and are not directly quantifiable. Numbers are obtained
only secondarily, by counting the items in each category.

Example 1.9: Marital status (married, single, divorced, widowed), literacy status, gender,
religion, etc., are all qualitatively expressed.

Quantitative data (also called numerical data) assume numerical values, obtained
either by counting or by measuring.

Example 1.10: Speed, number of children per family, age, etc., are examples of quantitative
data, since they are expressed in numbers.
Quantitative data may further be classified as discrete or continuous.
Discrete data are obtained by counting and can take only whole
numbers.
Example 1.11: The number of absentee students in a statistics class, the number of students in
BDU, and the number of car accidents in Bahir Dar per year are all discrete.
Continuous data can take any value within a specific range. These are data obtained by
measurement.

Example 1.12: Data obtained on heights, weights, rainfall records, income, length of chalk,
etc., are examples of continuous data.

2. Data classified by scale of measurement. According to the scale of measurement, data
can be classified as nominal, ordinal, interval, or ratio data.

Nominal data: These are categorical or qualitative data which are names or labels only.
They can be converted into numerical data by coding the various categories. However,
such data are numerical in appearance only, because ordinary arithmetic operations on the
codes have no meaning. We cannot infer quantitative differences from nominal
data.
Example 1.13: The sex of a person, male or female, could be coded as 0 and 1, respectively;
but it is meaningless to say that 1 is greater than 0. Similarly, blood type
(A, B, AB, O) and religion (Christian, Muslim, Catholic, etc.) are nominal data.

Ordinal data: These are defined as nominal data that can be ordered or ranked. That is, ordinal
data can have meaningful inequalities. In ordinal data, “>” does not necessarily mean
“greater than”, but may be used to denote “happier than”, “preferred to”, or “more difficult
than”.
Note that even though these values can be ranked, the interval between two successive values
is not known or cannot be measured.

Example 1.14: The hardness of iron, wood and glass of the same size can be given the codes
11, 6 and 1, respectively. Then 11 > 6 and 1 < 6 are meaningful, but we cannot say that
11 − 6 = 6 − 1 in terms of hardness, even though both differences equal 5 arithmetically.

Interval data: These are more refined (or advanced) than ordinal data. Interval data are defined
as ordinal data which have equality of units; that is, an increase from one level to the next
always reflects the same increase in quantity, i.e., the interval between successive values is
known.
Example 1.15: Suppose that we are given the following temperature readings in degrees
Fahrenheit: 63°, 68°, 91°, 107°, 126°, and 131°.
Here, we can write 107° > 68° or 91° < 131°, meaning warmer than or colder than.
We can also write 68° − 63° = 131° − 126°, since equal temperature differences are equal in
the sense that the same amount of heat is required to raise the temperature of an object from
63° to 68° as from 126° to 131°. On the other hand, it would not mean much to say that
126° is twice as hot as 63°, even though 126 = 2 × 63. To see why, convert both to the
Celsius scale: 126°F = 52.2°C and 63°F = 17.2°C, and the first figure is now more than
three times the second. This difficulty arises because both scales have artificial origins. So,
interval data are those which can have meaningful inequalities as well as meaningful
differences, but not meaningful quotients.
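
The arithmetic in Example 1.15 can be reproduced with a minimal Python sketch. The function name `fahrenheit_to_celsius` is our own choice; the conversion used is the standard formula C = (F − 32) × 5/9.

```python
def fahrenheit_to_celsius(f):
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (f - 32) * 5 / 9

low_f, high_f = 63, 126
low_c = fahrenheit_to_celsius(low_f)
high_c = fahrenheit_to_celsius(high_f)

# Quotients are not preserved when the scale changes
ratio_f = high_f / low_f
ratio_c = high_c / low_c

# Equal differences on one scale remain equal on the other
gap_1 = fahrenheit_to_celsius(68) - fahrenheit_to_celsius(63)
gap_2 = fahrenheit_to_celsius(131) - fahrenheit_to_celsius(126)

print("ratio in Fahrenheit:", ratio_f)
print("ratio in Celsius:", round(ratio_c, 2))
print("gaps in Celsius:", round(gap_1, 2), round(gap_2, 2))
```

The ratio changes from one scale to the other, while the two gaps agree, which is exactly the behaviour that separates interval data from ratio data.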

Ratio data: These can have meaningful inequalities and differences as well as meaningful
quotients. Measurement data such as height, weight, volume, and area are ratio data. With
ratio data, one can apply any arithmetic operation, since zero on the scale implies the absolute
absence of the characteristic under consideration.
Example 1.16: The length of an object can be measured in cm or in inches. Since both scales
have the same origin of zero and 1 in = 2.54 cm, the ratio of 1 in to 2.54 cm is the same as that
of 100 in to 254 cm. That is, the ratio between any two lengths is the same regardless of which
ratio scale is used.
Activity 1.2
Suppose you are collecting data from the students in this class. Think of at least one example
each of nominal, ordinal, interval, and ratio data.

3. Data classified by source: data may be primary or secondary. Primary data are those
collected by the investigator himself, whereas secondary data are obtained from
data already collected by some other agency for the same or a different purpose.

4. Data classified by the role of time: data may be either cross-section or time series.
Cross-section data: a set of observations taken at one point in time.
Time series data: a set of observations collected for a sequence of periods, usually at equal
intervals, for example on a weekly, monthly, quarterly, or yearly basis.
1.2.2 Methods of Data Collection
The methods used in collecting data vary according to the nature of the study and the
availability of resources like time, money and skilled human power. According to the
source of data, there are two methods of data collection: the primary method and the
secondary method.
1. Primary method: this consists of obtaining data or information in any one of the following
ways: direct personal interviews, indirect oral interviews, information from correspondents,
mailed questionnaires, or questionnaires to be filled by enumerators.
a) Direct personal interviews: there is face-to-face contact with the persons from whom
information is to be obtained (known as informants).

Some merits (advantages) of this method are a high response rate, the possibility of explaining
questions (if necessary), and the fact that the language of communication can be adjusted to
the informant's level.
Some of its limitations are that it may be very costly, it needs trained interviewers, and it
requires more time than other methods.

b) Indirect oral interviews: the investigator contacts third parties, called witnesses,
capable of supplying information. This method is adopted in those cases where the
informants are not inclined to respond if approached directly, as in an inquiry regarding
addiction to drugs, alcohol, etc.
c) Information from correspondents: the investigator appoints local agents or
correspondents in different places to collect information. These correspondents collect and
transmit information to the central office where the data are processed. Newspaper agencies
generally adopt this method. It is also adopted by some government departments
where regular information is to be collected from a wide area.
d) Mailed questionnaire method: a list of questions pertaining to the survey (known as a
questionnaire) is prepared and sent to the various informants by post, and each respondent is
expected to fill in the questionnaire and send it back to the investigator.
Some merits of this method are: it is easily adopted where the study covers a vast area; it is
relatively cheap; it ensures anonymity; respondents are reached without appointment; and it
gives time for personal questions or those requiring reaction by the family.
Some limitations are: the method works only for literate informants; the response
rate is lower; and the information supplied may be incorrect and its accuracy difficult to verify.
e) Questionnaires to be filled by enumerators: the enumerators contact the informants,
get replies to their questions and fill them in on the questionnaire form themselves. (The form
which contains the set of questions to be asked by an enumerator is known as a schedule.)
Some merits of this method are: it can be adopted if informants are illiterate; the non-response
rate is very low; and the information is more reliable, as its accuracy can be checked by
supplementary questions (if necessary).
Some limitations of this method are: it is quite costly; its success largely depends upon
the training given to enumerators; and skilled interviewing requires experience and training
(which is usually neglected).

Principles of Questionnaire Construction


Before framing the questionnaire, it is necessary to set out in detail the data we desire from
the answers to it. There are no hard-and-fast rules, but the following general
principles may be helpful in framing a questionnaire:
1. Covering letter. In as few words as possible, the person conducting the survey must
introduce himself, state the purpose of the survey, and assure the respondent of the
confidentiality of the answers. This letter is enclosed with the questionnaire.
2. The number of questions should be small. There is an inverse relationship between the
length of a questionnaire and the rate of response.
3. The questionnaire should provide the necessary definitions of terms and instructions
on how to fill in the form.
4. Arrangement of questions. The questions should be arranged in a logical order so
that they do not skip back and forth from one topic to another.
5. Questions should be short and simple to understand; unless the respondent is
technically trained, technical words should not be used.

6. Avoid certain types of questions:
a) Ambiguous questions (having different meanings to different people).
b) Irrelevant questions. The rule is: “if you are not sure that the question is
important for the purpose of the investigation, then avoid the question”.
c) Too personal questions, like income, tax paid, etc.
d) Specific information questions. For instance, instead of asking “What is your age?”
or “How many children do you have?”, intervals may be offered.
e) Open-ended questions, like “What should be done about?” or “Why do you
use?”, which are difficult to tabulate and analyze.
f) Leading questions, like “Do you agree that all wise students use?”
g) Questions requiring calculation, like annual net income, ratios or
percentages, which take time, so that the questionnaire may not be sent back. Try
to get the primary data and do the calculations yourself.
7. Questions should be capable of objective answers. For instance, when asking a
worker how he travels to his office, multiple-choice options can yield objective answers.
8. Use “Yes” or “No” type questions as far as possible, where there is a clear-cut alternative,
as in “Did you vote in the last election?” Sometimes it is possible to add other
alternatives such as “do not know”, “no opinion”, “indifferent”, etc.
9. Tick-type questions. The best kinds of questions are those which allow printed
answers to be ticked. For instance, instead of asking the sex of a person and leaving a
blank space, it is better to give the possible options and let him tick a box.
10. The questionnaire should look attractive. Attention should be paid to the printing and the
paper used, and plenty of space should be left for answers, depending upon the type of questions.
11. Indicate the units in terms of which information is to be given.

Example 1.17: To collect data on distance, one must decide on one unit
of length, say, cm, meter, inch, km or mile, and use it consistently.
12. Cross-checks. If possible, add one or more cross-checks to determine whether or
not the respondent is answering at least the important questions correctly.

13. Pre-testing the questionnaire (or pilot survey). The questionnaire should be pre-tested
with a group before mailing it out. This helps to discover the shortcomings of
the questionnaire (if any) and to revise it before the main survey.

2. Secondary Method
In many studies, the investigator finds it impractical to collect first-hand information and
makes use of data collected by others. Data collected with the help of primary methods
are usually published in journals, newspapers, magazines, reports or books. They may also
remain unpublished, held by the investigator or some other office. Such data, whether published
or unpublished, can be used by another investigator for a similar or completely different
purpose. The method of collecting such secondary data is known as the library method.
An investigator has to be extra cautious while using secondary data; for instance, he should
consider whether or not the data are suitable for the current investigation with regard to
• Coverage (whether the data are adequate for the purpose)
• Reliability (whether the data can be used with full confidence)
• Timeliness.
Definition 1.9: A characteristic which shows variability or takes on different values is called
a variable.
Variables are divided into two categories: quantitative and qualitative.
• Quantitative variable: one which leads to quantitative data. Hence we can talk about a
discrete variable (yielding discrete data) and a continuous variable (yielding continuous data).
• Qualitative variable: one which, similarly, leads to qualitative data.
Check-list for 1.2
Put a tick mark (√) for each of the following questions if you can solve the problems,
and an X otherwise.
1. Can you classify data according to their nature, the scale of measurement, the
source, and the role of time?
2. Can you give your own examples of nominal, ordinal, interval and ratio data?
3. Can you name and describe the methods of data collection?
4. Can you list the principles of questionnaire construction?

Exercise 1.2
1. Classify the following as discrete or continuous variables:
a) temperature; b) number of courses offered in BDU; c) rainfall; d) age.
2. Classify the following as nominal, ordinal, interval or ratio data:
a) hair color; b) area in m²; c) ID card numbers;
d) number of students in a class; e) hair length; f) types of cars.
3. Distinguish between primary and secondary data. What precautions should be taken
before using secondary data?
4. What is a questionnaire? Discuss the main points to be considered while designing
a questionnaire.

1.3 METHODS OF DATA REPRESENTATION


In this sub-unit, techniques of organizing and summarizing raw data will be discussed; first,
using a tabular form known as a frequency distribution, and then using graphs and diagrams.
Frequency Distribution (F.D.)
A frequency distribution is a summarized presentation of the values of a variable arranged in
order of magnitude, either individually (for a discrete variable), or in classes (for a
continuous variable), or in categories (in the case of qualitative data), along with their
frequencies.
A frequency distribution has two parts; namely,
i. the values of the variable (if quantitative) or the categories (if qualitative), and
ii. the number of observations (frequency) corresponding to each value or
category.
There are two types of frequency distributions: categorical (or qualitative) and
numerical (or quantitative).
i. Categorical Frequency Distribution
Here data are classified according to non-numerical categories. A categorical FD is
constructed in such a way that the categories are mutually exclusive and exhaustive,
i.e., each element is counted in one and only one category.
The categorical FD is used to present nominal and ordinal data.

a) Nominal data: Here the construction is straightforward: count the occurrences in
each category and find the total.

Example 1.18: The marital status of 60 adults, classified as single, married, divorced and
widowed, is given below:
Marital status   Single   Married   Divorced   Widowed   Total
Frequency        25       20        8          7         60

b) Ordinal data: The construction is identical to the nominal case. However, the
categories should be listed in order.
Example 1.19: The level of satisfaction with the teaching method in a class of 60 students is a
good example.
Satisfaction   Very satisfied   Satisfied   Dissatisfied   Very dissatisfied   Total
Frequency      15               36          6              3                   60
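
Counting the occurrences in each category can be sketched in Python with `collections.Counter`. The example only gives the frequencies of Example 1.18, not the 60 individual responses, so the raw list below is rebuilt from those frequencies and its order is an assumption made purely for illustration.

```python
from collections import Counter

# Raw responses rebuilt from the frequencies of Example 1.18
# (the original order of the 60 responses is not given, so this order is assumed)
responses = (["Single"] * 25 + ["Married"] * 20 +
             ["Divorced"] * 8 + ["Widowed"] * 7)

# Each response falls in one and only one category, so the counts are exhaustive
frequency = Counter(responses)
total = sum(frequency.values())

for category in ["Single", "Married", "Divorced", "Widowed"]:
    print(f"{category:<10}{frequency[category]:>4}")
print(f"{'Total':<10}{total:>4}")
```

For ordinal data the same counting applies; only the order in which the categories are printed has to follow their natural ranking.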

Activity 1.3
A survey taken at a hotel recorded that 40 tourists arrived by the following means of
transportation. Construct a frequency distribution for these data:
car car bus plane plane car plane plane bus car plane car car car
plane bus car bus car plane car car car bus car bus bus
plane plane plane car plane plane plane bus bus car car plane car

ii. Numerical Frequency Distribution


In such frequency distributions, the data are classified according to numerical size. This is
used to summarize interval and ratio data. Numerical frequency distributions may be
discrete or continuous, depending on whether the variable is discrete or continuous.
a) Discrete (Ungrouped) Frequency Distribution
Preparing this type of distribution is very simple: count the number of times each possible
value is repeated. To facilitate counting, prepare a column of “tallies”, with one mark per
observation grouped in blocks of five. Finally, count the tally marks to get the frequencies.

Example 1.20: In a survey of 30 families, the number of children per family was recorded,
and the following data were obtained:
4 2 4 3 2 8 3 4 4 2 2 8 5 3 4 5 4 5 4 3 5 2 7 3 3 6 7 3 8 4.
Since the variable “number of children in a family” can assume only the values
0, 1, 2, 3, 4, ..., it is a discrete variable, and so its frequency distribution is a discrete one.

These individual observations can be arranged in ascending order of magnitude to form an
array: 2 2 2 2 2 3 3 3 3 3 3 3 4 4 4 4 4 4 4 4 5 5 5 5 6 7 7 8 8 8.
The distribution of children in 30 families would be:
No. of children 2 3 4 5 6 7 8 Total
No. of families 5 7 8 4 1 2 3 30
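
The counting in Example 1.20 can be checked with a short script. This is only a sketch using Python's standard `collections.Counter`; the variable names `children`, `counts` and `distribution` are illustrative and not part of the module.

```python
from collections import Counter

# Number of children per family for the 30 families in Example 1.20.
children = [4, 2, 4, 3, 2, 8, 3, 4, 4, 2, 2, 8, 5, 3, 4,
            5, 4, 5, 4, 3, 5, 2, 7, 3, 3, 6, 7, 3, 8, 4]

# Count how many times each distinct value occurs.
counts = Counter(children)

# List the values in ascending order together with their frequencies.
distribution = sorted(counts.items())

print("No. of children  No. of families")
for value, frequency in distribution:
    print(f"{value:>15}  {frequency:>15}")
print(f"{'Total':>15}  {sum(counts.values()):>15}")
```

The printed frequencies should agree with the table above, and the total should equal the number of families surveyed.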

Activity 1.4
On a certain busy road there were the following numbers of traffic accidents during 50
morning rush hours. Construct a distribution showing how many mornings there were 0,
1, 2, 3, 4, 5, 6, or 7 accidents.

1 0 3 2 3 5 3 0 2 7 3 6 1
0 4 3 1 3 1 5 3 4 4 3 1 6
2 1 3 1 2 2 1 0 2 0 5 2
0 1 1 2 4 5 3 4 3 4 2 4

Suggestions in constructing a discrete frequency distribution:


1. Make sure that the variable is discrete and determine the possible values.
2. Prepare 3 columns: the first for the possible values, the second for tally marks, and
the third for the frequency of each value as:
Values Tally marks Frequency

3. Write the possible values in the first column in ascending order.


4. For each observation, put one tally mark in column 2 beside its value.
5. Count the tallies of each value and write the corresponding frequencies in column 3.
Example 1.21: Construct a discrete frequency distribution for the following data of the
scores of 50 students in a test corrected out of 25.
20 24 22 19 20 10 18 24 10 15
21 20 15 19 20 10 14 22 10 18
18 15 15 18 20 15 14 22 14 20
15 14 14 20 21 10 20 20 15 24
10 10 20 22 14 21 20 14 15 10

Solution: The variable “scores of a student”, as given here is discrete and the possible
values are 10, 14, 15, 18, 19, 20, 21, 22 and 24, hence the following FD is constructed:
Distribution of scores of 50 students in a Statistics test:

Score    Tally marks        Number of students
10       ///// ///          8
14       ///// //           7
15       ///// ///          8
18       ////               4
19       //                 2
20       ///// ///// /      11
21       ///                3
22       ////               4
24       ///                3
Total                       50

Remark: If the number of possible values of a discrete variable is so large that the
discrete frequency distribution is no longer a condensed presentation, the data have to be
handled as continuous and grouped into classes.
b) Continuous (Grouped) Frequency Distribution
Continuous FD’s arise from continuous variables. Unlike a discrete FD, where one class
is used for each value of the variable, a class cannot be allocated to each value of a
continuous variable; otherwise, the purpose of classification (i.e., condensation of the data)
would be lost.
The categories into which the observations are distributed are called classes or class
intervals. The classes should be set so that they contain all the items (i.e., are exhaustive)
and no two classes share the same item (i.e., are mutually exclusive, or do not
overlap). This is the basic principle in the construction of such frequency distributions.
Example 1.22: Consider the following FD on wages of 100 workers in a factory.

Wage    40-44      45-49      50-54      55-59      60-64      65-69      70-74      75-79
Freq.   6          9          15         17         20         13         12         8
CB’s    39.5-44.5  44.5-49.5  49.5-54.5  54.5-59.5  59.5-64.5  64.5-69.5  69.5-74.5  74.5-79.5

In this example, all workers earning between 40 and 44 birr (inclusive) are grouped into the
first class, all workers earning between 45 and 49 birr (inclusive) are grouped into the
second class, and so on.
Basic Terms in a continuous frequency distribution
i. Class Frequency (or simply frequency)
It refers to the number of items belonging to a class. For instance, in Example 1.22, 15 is
the frequency of the third class, meaning there are 15 workers earning wage between 50 and
54 (inclusive).

ii. Class limits (C.L.)


The lowest and highest values that can be included in a class, stated so that there is a gap
between successive classes, are called class limits. The lower class limit (L.C.L.) of a class is
a value such that no lower value can fall into that class, whereas the upper class limit
(U.C.L.) of a class is a value such that no higher value can fall into that class.
For the above example, if we consider the first class, 40 is the L.C.L. and 44 is the U.C.L.,
meaning no value less than 40 and no value more than 44 can be included in this class.
Notation: L.C.L1 = lower class limit and U.C.L1 = upper class limit of the first class
(in this case, L.C.L1 = 40 and U.C.L1 = 44).
In general, L.C.Li = lower class limit and U.C.Li = upper class limit of the ith class.

iii. Class Boundary (C.B) or Real class limits


Taking into account the fact that a value recorded as 40 actually means a value lying
between 39.5 and 40.5, and a value recorded as 44 actually means a value lying between
43.5 and 44.5, the true limits for the class 40-44 are 39.5 and 44.5. These true limits for each
class are known as class boundaries. Rephrasing, class boundaries are the lowest and the
highest values in each class when there is no gap between successive classes. To work with
the distribution of a variable as if it were continuous, we make use of these real class limits.

Finding class boundaries


Let d = (LCL of a class) – (UCL of the preceding class). Add half of this difference to every
upper class limit to get the upper class boundary (UCB), and subtract it from every lower
class limit to get the lower class boundary (LCB). That is,

$$UCB_i = UCL_i + \tfrac{1}{2}d \quad\text{and}\quad LCB_i = LCL_i - \tfrac{1}{2}d.$$

In Example 1.22, $d = 45 - 44 = 1$ and $\tfrac{1}{2}d = 0.5$. For the first class,
UCB = UCL + 0.5 = 44 + 0.5 = 44.5 and LCB = LCL – 0.5 = 40 – 0.5 = 39.5; continuing in this
way, we get the class boundaries listed in the CB’s row of the example.
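
As a rough illustration of this rule, the sketch below computes the class boundaries of Example 1.22 from its class limits. The function name `class_boundaries` is hypothetical, and the code assumes that the gap d is the same between every pair of successive classes.

```python
def class_boundaries(limits):
    """Return (LCB, UCB) pairs for a list of (LCL, UCL) class limits.

    Assumes at least two classes and the same gap d between every pair
    of successive classes.
    """
    # d = LCL of the second class - UCL of the first class
    d = limits[1][0] - limits[0][1]
    half = d / 2
    return [(lcl - half, ucl + half) for lcl, ucl in limits]


# Class limits of the wage distribution in Example 1.22.
wage_limits = [(40, 44), (45, 49), (50, 54), (55, 59),
               (60, 64), (65, 69), (70, 74), (75, 79)]
wage_boundaries = class_boundaries(wage_limits)

for (lcl, ucl), (lcb, ucb) in zip(wage_limits, wage_boundaries):
    print(f"{lcl}-{ucl}: {lcb}-{ucb}")
```

Each upper class boundary coincides with the lower class boundary of the next class, which is what removes the gap between successive classes.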

iv) Class width (w)


The size or width or magnitude of a class interval is the difference between the upper and
lower class boundaries. The class width is the range of values each class should cover.
The class width for all classes of Example 1.22 is 5; that is,
$w_1 = 44.5 - 39.5 = 5$, $w_2 = 49.5 - 44.5 = 5$, and so on, up to $w_8 = 79.5 - 74.5 = 5$.

Since $w_1 = w_2 = \dots = w_8 = 5 = w$, we say that all classes have equal class width, or that
the class width is constant or uniform, denoted by w.

Note: When all the classes are of the same size, the class width can also be obtained as the
difference between any two successive lower limits or upper limits. For instance,
$w_1 = 45 - 40 = 5$ and $w_2 = 50 - 45 = 5$.
v) Class Mark (C.M.)
The class mark is the mid-point of the class interval or is a value which lies mid way
between the lower and upper limits of the class. It is obtained as:
$$CM = \frac{LCL + UCL}{2}, \quad\text{or}\quad CM = \frac{LCB + UCB}{2}.$$

Note: In further analysis of the data, a CM is used to represent all the items in that class.
With reference to Example 1.22, the first CM is $CM_1 = \frac{40 + 44}{2} = 42$.
The second CM is $CM_2 = \frac{45 + 49}{2} = 47$, and so on, up to $CM_8 = \frac{75 + 79}{2} = 77$.
When the class width is uniform (or constant) in a distribution, we can use the relation,
CM2 = CM1+w, CM3= CM2+w; and in general,
CMi+1 = CMi + w; CMi being the CM of the ith class.
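
The class marks of Example 1.22, and the relation between successive class marks, can be verified with the following sketch. The names `class_marks`, `w` and `rebuilt` are illustrative only, and a uniform class width is assumed.

```python
# Class limits of the wage distribution in Example 1.22.
wage_limits = [(40, 44), (45, 49), (50, 54), (55, 59),
               (60, 64), (65, 69), (70, 74), (75, 79)]

# Class mark: midpoint of the class limits.
class_marks = [(lcl + ucl) / 2 for lcl, ucl in wage_limits]

# With a uniform width, w is the difference between successive lower limits.
w = wage_limits[1][0] - wage_limits[0][0]

# Rebuild the class marks from the first one by repeatedly adding w.
rebuilt = [class_marks[0] + i * w for i in range(len(wage_limits))]

print(class_marks)
print(w, rebuilt == class_marks)
```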

vi) Open-ended and closed-ended distributions
If the lower class limit/boundary of the first class, or the upper class limit/boundary
of the last class, or both are not specified, then the distribution is said to be an open-ended
distribution; otherwise it is a closed-ended distribution.
Activity 1.5
Given the following class intervals: 15 – 29, 30 – 44, 45 – 59, 60 – 74, and 75 -89.
Find a) the LCL and UCL of each class, b) the LCB and UCB of each class.
c) the class marks , and d) the class width
1.3.1.2 Constructing a continuous frequency distribution
In spite of the great importance of classification in statistical analysis, no hard-and-fast rule
can be laid down for it. However, the following points may be kept in mind:
i) The classes should be clearly defined and should not lead to any ambiguity.
ii) The classes should be exhaustive, i.e., each of the given values should be included in
one of the classes.
iii) The classes should be mutually exclusive and non-overlapping.
iv) The classes should be equal in width as much as possible. Whenever possible, all classes
should be of the same size (i.e., constant or uniform). This facilitates further analysis and
simplifies comparison between different classes. A frequency distribution with constant
class width can be presented graphically with greater ease. However, in some cases,
constant class width may be either impossible or unnecessary.
v) Indeterminate classes, i.e. open ended classes should be avoided as far as possible, since
they create difficulty in analysis and interpretation.
vi) The number of classes should be neither too large nor too small.

Practical steps in constructing continuous frequency distribution;


1. Determine the number of classes (k)
The number of classes depends on the total number of observations, the nature of the data
and the ease of computation of the various descriptive measures of the frequency
distribution. In general, the number of classes (k) largely depends on the number of
measurements and the variability among the data; it can be decided with the help of
Sturges’ rule-of-thumb: k = 1 + 3.322 log n, rounded up or down to the nearest integer,
where n is the number of observations and log is the common logarithm.

Example 1.23: If n = 10, k = 4.322 ≈ 4; if n = 100, k = 7.644 ≈ 8; if n = 1000, k = 10.966 ≈ 11.
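
Sturges' rule-of-thumb is easy to evaluate by machine. The sketch below reproduces the values of Example 1.23; the function name `sturges_classes` is hypothetical, and rounding to the nearest integer is done with Python's built-in `round`.

```python
import math


def sturges_classes(n):
    """Number of classes from Sturges' rule, k = 1 + 3.322 log10(n),
    rounded to the nearest integer."""
    return round(1 + 3.322 * math.log10(n))


for n in (10, 100, 1000):
    print(n, sturges_classes(n))
```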

2. Determine the Class Width (w)

If the number of classes is known and it is decided to use a uniform class width, we use

$$w = \frac{\text{Range}}{k},$$

rounded up or down to the nearest integer, where Range is the difference
between the highest and the smallest value of the variable.

Note: As far as possible, a class width of 5 or a multiple of 5 is convenient and facilitates
computations.
3. Determine the Class Limits
Class limits should be chosen in such a way that the mid value of each class interval and the
actual average of the observations in that class interval are as near to each other as possible.
The lower class limit of the first class should be less than or equal to the smallest value of
the observations collected from the field. The lower class limit of each class should
preferably be a multiple of five (that is, end in zero or five). After determining the lower class
limit of the first class, we add the class width to it to obtain the lower class limit of the next
higher class, and so on.

4) Determine the frequency of each class


Frequency of each class can be determined simply by counting the number of observations
belonging to each class.
5) Sum up the frequency of each class to check whether it is equal to the total number of
data collected from the field or not.

Example 1.24: Construct a continuous FD for the following raw data on marks (out of 100)
obtained by 50 students in Statistics.
57, 53, 65, 55, 50, 45, 64, 52, 16, 46, 42, 63, 33, 64, 53, 25, 54, 35, 48, 55, 70, 47, 39, 58,
52, 36, 65, 75, 26, 20, 55, 60, 83, 61, 45, 63, 49, 42, 35, 18, 51, 45, 42, 65, 39, 59, 45, 41,
30, 40.
Solution:
i. Since n = 50, using Sturges’ rule-of-thumb, the number of classes is
$k = 1 + 3.322 \log 50 = 6.64 \approx 7$.
ii. Range = highest value – lowest value = 83 – 16 = 67.
iii. Class width: $w = \frac{\text{Range}}{k} = \frac{67}{7} = 9.57 \approx 10$.
iv. Since the smallest value is 16, LCL1 can be 15 and UCL1 should then be 24; the
frequency distribution would look like:

Marks      Frequency
15 – 24    3
25 – 34    4
35 – 44    10
45 – 54    15
55 – 64    12
65 – 74    4
75 – 84    2
Total      50

Note: 1. For the class boundaries, see the nature of the data.
   - If there is no decimal point, then d = 1.
   - If there is one digit after the decimal point, then d = 0.1.
   - If there are two digits after the decimal point, then d = 0.01.
2. If arrays of the data are formed, then a tally sheet may be unnecessary.
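
The steps of Example 1.24 can be sketched in code as follows. This is an illustration, not a general-purpose routine: the first lower class limit (15) is chosen by hand as in the example, the data are assumed to be whole numbers (so d = 1), and the names `marks`, `classes` and `frequencies` are hypothetical.

```python
import math

# Marks of the 50 students in Example 1.24.
marks = [57, 53, 65, 55, 50, 45, 64, 52, 16, 46, 42, 63, 33, 64, 53, 25, 54,
         35, 48, 55, 70, 47, 39, 58, 52, 36, 65, 75, 26, 20, 55, 60, 83, 61,
         45, 63, 49, 42, 35, 18, 51, 45, 42, 65, 39, 59, 45, 41, 30, 40]

k = round(1 + 3.322 * math.log10(len(marks)))   # number of classes (Sturges)
w = round((max(marks) - min(marks)) / k)        # class width = Range / k
first_lcl = 15                                  # chosen by hand, as in the example

# Inclusive class limits for whole-number data: (15, 24), (25, 34), ...
classes = [(first_lcl + i * w, first_lcl + (i + 1) * w - 1) for i in range(k)]

# Count the observations falling in each class.
frequencies = [sum(lcl <= x <= ucl for x in marks) for lcl, ucl in classes]

for (lcl, ucl), f in zip(classes, frequencies):
    print(f"{lcl} - {ucl}: {f}")
print("Total:", sum(frequencies))
```

Checking that the frequencies add up to the number of observations corresponds to step 5 of the practical steps.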

Activity 1.6
The following are the grades which 40 students obtained in a Statistics examination.
75 89 66 52 90 68 83 94 77 60
38 47 87 65 97 49 65 72 73 81
63 77 91 88 74 37 85 76 74 63
69 72 31 87 76 58 63 70 72 65
Group these grades into classes
30 – 39, 40 – 49, 50 – 59, 60 – 69, 70 -79, 80-89 and 90 – 99.

Example 1.25. Construct a FD for the following records of a test of 12 students out of 50:
25.4, 25.5, 29.8, 30.0, 35.2, 36.4, 40.0, 42.1, 45.3, 47.0, 48.6, 49.2.
Solution:
Since n = 12, $k = 1 + 3.322 \log 12 = 4.59 \approx 5$; Range = 49.2 – 25.4 = 23.8;
the class width is $w = \frac{23.8}{5} = 4.76 \approx 5$; LCL1 = 25.0 ≤ 25.4; and d = 0.1.

Thus, the F.D. is given by:

Class Interval 25.0 – 29.9 30.0 – 34.9 35.0 – 39.9 40.0– 44.9 45.0 – 49.9 Total
Frequency 3 1 2 2 4 12
CB’s 24.95–29.95 29.95-34.95 34.95-39.95 39.95-44.95 44.95-49.95

1.3.1.3. Relative and Percentage frequency distribution


The relative frequency (RF) of a class shows the relative concentration of items in that class
in relation to the total frequency. Dividing the frequency in each class by the total frequency
gives the R.F. It can be converted in to a percentage F.D by multiplying each R.F by 100.
Thus, the relative frequency gives the proportion or the percentage of cases in each group.
That is, $\text{R.F.} = \frac{\text{Class frequency}}{\text{Total frequency}} = \frac{f_i}{n} = p_i$.
Example 1.26: The class marks, class boundaries and the R.F of Example 1.24 are:

Class limits   Frequency   Class mark   Class boundaries   p_i    % R.F.
15 – 24        3           19.5         14.5 – 24.5        0.06   6%
25 – 34        4           29.5         24.5 – 34.5        0.08   8%
35 – 44        10          39.5         34.5 – 44.5        0.20   20%
45 – 54        15          49.5         44.5 – 54.5        0.30   30%
55 – 64        12          59.5         54.5 – 64.5        0.24   24%
65 – 74        4           69.5         64.5 – 74.5        0.08   8%
75 – 84        2           79.5         74.5 – 84.5        0.04   4%
Total          50                                          1.00   100%
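
The relative and percentage frequencies in Example 1.26 follow directly from the class frequencies, as the sketch below shows. The variable names are illustrative only.

```python
frequencies = [3, 4, 10, 15, 12, 4, 2]   # class frequencies from Example 1.24
n = sum(frequencies)                     # total frequency

relative = [f / n for f in frequencies]          # p_i = f_i / n
percentage = [100 * f / n for f in frequencies]  # relative frequency times 100

for f, p, pct in zip(frequencies, relative, percentage):
    print(f"{f:>3}  {p:.2f}  {pct:.0f}%")
print(f"{n:>3}  {sum(relative):.2f}  {sum(percentage):.0f}%")
```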

1.3.1.4 .Cumulative frequency distributions


The cumulative frequency of a class tells us how often the values fall below or above that
class. There are two types of cumulative frequency distributions: the “less than” and the “or
more than” cumulative frequency distributions.

i) The “less than” cumulative F.D. is obtained by adding to the frequency of each class
the frequencies of all the preceding classes.
ii) The “or more than” cumulative F.D. is obtained by adding to the frequency of each class
the frequencies of all the succeeding classes.
Note: To keep to the literal meanings of the phrases “less than” and “or more
than”, and also to give sense to the presentation of cumulative frequencies, class boundaries
are used instead of class limits. Moreover, the meaning of the cumulative frequency of a
class is better understood when class boundaries are used.
Example 1.27: For the data in Example 1.24, both cumulative frequency distributions are
given below:
The less than cumulative frequency distribution is:
Marks             Cum. Freq.
Less than 14.5    0
Less than 24.5    3
Less than 34.5    7
Less than 44.5    17
Less than 54.5    32
Less than 64.5    44
Less than 74.5    48
Less than 84.5    50

The or more than cumulative frequency distribution is:
Marks             Cum. Freq.
14.5 or more      50
24.5 or more      47
34.5 or more      43
44.5 or more      33
54.5 or more      18
64.5 or more      6
74.5 or more      2
84.5 or more      0
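
Both cumulative distributions of Example 1.27 can be generated from the class frequencies, as in this sketch. It uses `itertools.accumulate` from the standard library; the names `less_than` and `or_more` are hypothetical.

```python
from itertools import accumulate

# Class boundaries and class frequencies from Example 1.24.
boundaries = [14.5, 24.5, 34.5, 44.5, 54.5, 64.5, 74.5, 84.5]
frequencies = [3, 4, 10, 15, 12, 4, 2]
n = sum(frequencies)

# "Less than" cumulative frequencies, one per boundary, starting from 0.
less_than = [0] + list(accumulate(frequencies))

# "Or more than" cumulative frequencies: everything not below the boundary.
or_more = [n - c for c in less_than]

for b, lt, om in zip(boundaries, less_than, or_more):
    print(f"less than {b}: {lt:>2}    {b} or more: {om:>2}")
```

At every boundary the two cumulative frequencies add up to the total frequency, which gives a quick check on the tables.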

Note: 1. Both cumulative F.D’s can be converted in to percentage cumulative F.D’s.


2. Relative frequencies can also be converted in to cumulative R.F’s.
3. Classifications such as 5 – 10, 10 – 15, 15 – 20 are called exclusive classifications,
since the upper limit of each class is excluded from that class. Such limits are interpreted
in the same way as class boundaries. All our previous examples are inclusive
classifications, meaning that both the LCL and UCL of a class are included (or counted) in that class.

Activity 1.7

The numbers of empty seats in a class room are grouped in to a table with classes 0 – 4,
5 – 9, 10 – 14, 15 – 19, 20– 24, and 25 or more. Will it be possible to determine exactly
from this table the number of empty seats on which there were
a)  10 ; b) > 10 ; c) > 14 ; d)  14 ; e) exactly 9 empty seats?

Diagrammatic and Graphical Method of Data Presentation


A F.D. can be presented graphically or diagrammatically, which helps
- to understand the information easily;
- to make the data attractive to the eye;
- to make comparisons of items easy; and
- to draw the attention of the observer.
The purpose of graphs and diagrams is not to provide exact and detailed information. Any
further information can be obtained from the original data.
Under this section, the most commonly used diagrams and graphs will be discussed.
1.3.2.1 Diagrammatic Presentation of Data
Diagrammatic presentation of data is usually used to present categorical data.
The two most commonly used charts (usually for qualitative data) are the bar
diagram (bar chart) and the pie diagram (pie chart).
a) Bar chart
A bar graph or a bar chart is the simplest and most commonly used pictorial representation
of data. It uses a series of equally spaced bars of uniform width. The bases of the bars are
the categories on the horizontal axis while the height of each bar represents the absolute
frequency of the particular category.

Example 1.28: Number of students in the four departments of the Science Faculty (BDU):
Department number of students
Physics 200
Maths 400
Chemistry 450
Biology 600
A simple bar chart of the number of students by department is given below:

[Figure: Simple bar chart. Horizontal axis: Department (Phys, Maths, Chem, Bio); vertical axis: Frequency, from 0 to 800; bar heights 200, 400, 450 and 600.]

This bar chart can further be sub divided in to components (say, by sex) and it will be called
sub-divided bar chart or component bar chart.
Example 1.29: Number of students in the four departments by sex is:

Department    Number of students
              Male    Female    Total
Physics       170     30        200
Maths         350     50        400
Chemistry     250     200       450
Biology       200     400       600

The sub-divided bar chart would look like:

[Figure: Sub-divided bar chart. Horizontal axis: Department (Phys, Maths, Chem, Bio); vertical axis: Frequency, from 0 to 800; each bar is split into Male and Female components.]

Note: A percentage bar chart is also possible for making comparisons; it is prepared by
converting the components into percentages.

[Figure: Percentage bar chart. Horizontal axis: Department (Phys, Maths, Chem, Bio); vertical axis: percent, from 0% to 100%; each bar is split into Male and Female components.]

b) Pie-chart (diagram)
Pie-charts are very popular in practice for showing percentage breakdowns of
categorical data. A pie-chart is a circle representing a set of data by dividing the circle into
sectors proportional to the number of items in the categories.
To construct a pie-chart, we need to find the R.F’s, compute the central angles

$$\theta_i = \frac{f_i}{n} \times 360^\circ,$$

and finally draw a circle partitioned according to the central angles.
Example 1.30: The number of students of BDU by faculty is represented by a pie-chart as:
Faculty        Number of students   %       Size of central angle
Education      5250                 35%     126°
BEF            3750                 25%     90°
Engineering    3000                 20%     72°
Science        1950                 13%     47°
Law            1050                 7%      25°
Total          15,000               100%    360°

[Figure: Pie chart for number of students. Sectors: Education 35%, BEF 25%, Eng. 20%, Science 13%, Law 7%.]

Pie-charts are useful to make comparisons as long as the number of components is not large
(usually, five or six components are used).
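
The central angles of Example 1.30 can be recomputed from the frequencies, as in the sketch below. The dictionary `students` simply restates the table of the example; the names are illustrative. The unrounded angles for Science and Law are 46.8° and 25.2°, which the table rounds to 47° and 25°.

```python
# Number of students by faculty, from Example 1.30.
students = {"Education": 5250, "BEF": 3750, "Engineering": 3000,
            "Science": 1950, "Law": 1050}
n = sum(students.values())

# Central angle of each sector: (f_i / n) * 360 degrees.
angles = {faculty: f * 360 / n for faculty, f in students.items()}

for faculty, angle in angles.items():
    share = 100 * students[faculty] / n
    print(f"{faculty:<12} {share:5.1f}%  {angle:6.1f} degrees")
```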

1.3.2.2 Graphical presentation of Data


Graphical method of data presentation is usually used to present continuous data.

A F.D. can be presented graphically in any one of the following ways:
Histogram, Frequency polygon or curve, Cumulative frequency curves (or ogives).

a) Histogram
A histogram is a graph consisting of a series of adjacent rectangles whose bases are equal to
the class width of the corresponding classes and whose heights are proportional to the
corresponding class frequencies with no gap between rectangles.
To construct a histogram, the class boundaries, the class marks or the class limits are plotted
on the horizontal axis and the class frequencies are plotted on the vertical axis. The absolute
(actual) frequencies may also be indicated on the top of each rectangle.

Example 1.31: The histogram for the frequency distribution of Example 1.24 is drawn as
follows using class boundaries:

[Figure: Histogram. Horizontal axis: Class boundaries; vertical axis: Frequency, from 0 to 20; rectangle heights 3, 4, 10, 15, 12, 4 and 2.]

b) Frequency polygon
A frequency polygon is a line graph where class frequencies are plotted against the class
marks and the successive points are connected by straight lines. As in the histogram, the
class marks are plotted along the X-axis and frequencies along the Y-axis. Two classes with
zero frequencies at both ends must be added to tie down the graph to the X-axis.

Example 1.32: Draw a frequency polygon for the F.D. of Example 1.24.
Solution: Adding two class marks with $f_i = 0$, namely 9.5 at the beginning and 89.5 at the
end, the following frequency polygon is plotted:

[Figure: Frequency polygon. Horizontal axis: ticks at 9.5, 19.5, 29.5, 39.5, 49.5, 59.5, 69.5, 79.5 and 89.5 (the class marks); vertical axis: Frequency, from 0 to 16.]

Note: If the successive points are joined by a smooth curve rather than a straight line, the
resulting graph is known as a frequency curve.
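
The points joined in the frequency polygon of Example 1.32 can be listed with a few lines of code. This sketch only prepares the coordinates (it does not draw the graph), assumes the uniform class width w = 10 of Example 1.24, and uses the hypothetical name `points`.

```python
# Class marks and class frequencies from Examples 1.24 and 1.26.
class_marks = [19.5, 29.5, 39.5, 49.5, 59.5, 69.5, 79.5]
frequencies = [3, 4, 10, 15, 12, 4, 2]
w = 10  # uniform class width

# Add one zero-frequency class mark on each side to tie the polygon to the X-axis.
points = ([(class_marks[0] - w, 0)]
          + list(zip(class_marks, frequencies))
          + [(class_marks[-1] + w, 0)])

for x, f in points:
    print(f"{x:5.1f}  {f:>2}")
```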
c) Cumulative frequency curves (Ogives)
The cumulative frequency curve, also known as ogive, is a graphical representation of a
cumulative frequency distribution. Ogives are of two types: the “less than ogive” and the “or
more than ogive”.
i. The less than ogive – the less than cumulative frequencies are plotted against upper class
boundaries of their respective classes and they are joined by either straight lines or smooth curves.
ii. The or more than ogive- in this case, the “ or more than” cumulative frequencies are
plotted against the lower class boundaries of their respective classes, and the connections
may be by straight lines or smooth curves.

Example 1.33: Referring to Example 1.24 both types of ogives are drawn below.

[Figure: The less than ogive. Horizontal axis: Class boundaries 14.5, 24.5, 34.5, 44.5, 54.5, 64.5, 74.5 and 84.5; vertical axis: Cumulative frequency, from 0 to 60.]

[Figure: The more than ogive. Horizontal axis: Class boundaries; vertical axis: Cumulative frequency, from 0 to 60.]

Note: 1. For both ogives, one class with frequency zero is added, for the same reason
as with the frequency polygon.
2. The ogives are useful for finding the proportion of cases below or above
certain values and for determining certain values graphically (median, deciles, etc.).
Activity 1.8
For the frequency distribution prepared in Activity 1.6, draw
a) a histogram using the mid points.
b) a frequency polygon to represent the distribution.

Check list for 1.3


Put a tick mark (√) for each of the following questions if you can solve the problems,
and an X otherwise. Can you
1. Define and construct categorical, discrete, and continuous frequency distributions?
2. Define the basic terms in a frequency distribution like class limits, class boundaries,
class width, and class mark?
3. Define and construct relative and cumulative frequency distributions?
4. Explain diagrammatical and graphical presentations of data?
5. Explain, and construct the different types of diagrams?
6. Explain, and construct the common types of graphs?

Exercise 1.3:
1. The following are the numbers of false alarms which a security monitoring service
received in 30 days. Construct a frequency distribution.

3 6 2 4 5 4 2 5 6 3 4 7 4 6 5
5 5 4 3 7 4 4 6 3 5 5 7 4 4 6

2. The following are weekly salaries (in birr) of employees of a firm:


91 139 126 119 100 87 61 77 99 95
88 112 118 89 116 97 105 95 80 86
108 106 127 93 86 135 148 116 76 69

The data are to be presented in a frequency distribution.


a) How many classes can be used? c) What LCL would be used for the first class?
b) What class width should be used? d) Prepare the complete frequency distribution.
3. The number of persons attending an art exhibition daily are grouped as follows:
Days 0-39 40-79 80-119 120-159
Frequency 157 244 96 67
Find (if possible) the number of days a) at least 79; b) more than 79; c) 40 or more; d)
at most 79; and e) between 80 and 159 persons attended the exhibition.
4. Prepare a frequency distribution for the following raw data on the life length of a
spare part that regulates temperature (data arranged for simplicity)
0.05 0.11 0.59 0.76 0.92 1.12 1.22
1.35 1.42 1.49 1.53 1.69 1.92 1.97
2.07 2.15 2.22 2.24 2.36 2.39 2.41
2.49 2.53 2.90 3.21 3.25 3.49 3.61
3.80 4.37 4.58 4.62 5.29 5.48 5.68
5. The weights of 50 members of a university football team vary from 168 to 273
pounds. Indicate the limits into which these weights might be grouped.
6. Measurements of a temperature, measured to the nearest tenth of a degree Celsius,
vary from 148.2° to 160.6°. Indicate the class limits of 7 classes to group the data.
7. The monthly earnings of a factory vary from 227.82 to 396.05 Birr. Indicate the
limits of seven classes with a width of Birr 25 in to which these values might be
grouped.
8. The class marks of a distribution of the daily number of burglaries reported to a
police are 4, 13, 22, 31 and 40. If the class width is uniform, find
a) the class width; b) the class boundaries; and c) the class limits.
9. Measurements of the lengths of fish to the nearest tenth of an inch are grouped in
to a table whose classes have the boundaries: 5.95, 7.95, 9.95, 11.95, 13.95,
and 15.95. Find the lower and upper limits of each class.
10. Given the following frequency distribution:

Class limits Frequency
0–1 16
2–3 25
4–5 13
6–7 4
8-9 2
Find a) the class marks; b) the class boundaries; c) the relative frequencies

UNIT SUMMARY
1. Definitions of statistics
a) Statistics is a collection of numerical facts or figures.
b) Statistics refers to summary measures obtained from a sample.
c) Statistics is a method of collecting, organizing, presenting, analyzing and
interpreting data; and these constitute the five statistical stages in any study.
2. The two broad classifications of statistics are:
a) Descriptive statistics-deals with summarizing and describing the data collected; and
b) Inferential statistics – based on descriptive statistics obtained from samples,
serves to make generalizations about the population.
3. There are four basic classifications of data:
a) according to their nature , data may be
- qualitative – described only in words (or data that can not be expressed in numbers)
- quantitative – those having numerical values, which may be discrete
(obtained by counting) or continuous (obtained by measuring ).
b) according to the scale of measurement, data may be classified as:
- nominal – data which are names or labels only ,obtained by coding
qualitative data, where codes do not have any arithmetic properties;
- ordinal – data that can be ordered or ranked , coded qualitative data, and
codes can indicate ordinary arithmetic inequality;
- interval – numerical data having meaningful inequality and difference; and
- ratio – numerical data satisfying all the arithmetic operations; that is,
inequalities, difference and quotients.
c) according to source, data may be
- primary – collected for the first time by the investigator ; or
- secondary – already collected by some other agency for some other
purpose.

d) according to the role of time, data may be
* cross-sectional – taken at a point in time; or
* time – series –for a sequence of periods.
4. There are two methods of data collection :
a) Primary method: Collect primary data through
Direct personal interviews, Indirect oral interviews, Information from
correspondents, Mailed questionnaire, or Questionnaire filled by enumerators.
b) Secondary method: Collect secondary data through Library method.
5. Data can be represented in two ways:
a) By frequency distributions, may be categorical or numerical.
* Numerical F.D., which may be discrete or continuous.
* To construct a continuous frequency distribution, we need to fix
- the number of classes (k) using Sturges’ rule-of-thumb : k = 1+3.322 log n;
- the class width (w) : w = Range /k; and
- the class limits such that the smallest value falls in the first class and the
largest value falls in the last class.
* Relative F.D’s show the class frequencies expressed as fraction of the total.
* Cumulative frequencies show the number of values less than or more than
a specified value.
b) By graphs or diagrams.
i. Graphs: Common graphs include
- Histogram: adjacent bars where class boundaries, class marks or
class limits are marked on the X-axis and frequencies on the Y-axis.
- Frequency polygon: the class marks are plotted along the X-axis and
frequencies along the Y-axis.
- Ogives (or cumulative frequency curves), which are of two types:
   * the “less than ogive” plots the “less than” cumulative
   frequencies against class boundaries.
   * the “or more than ogive” plots the “or more than” cumulative
   frequencies against class boundaries.
ii. Diagrams
- Bar charts – equally spaced bars erected on the X-axis, whose heights show the
frequency.
- Pie-chart – a circular presentation of categorical data.
