Business Statistics Introduction To Statistics Handout 2018

OROMIA STATE UNIVERSITY
SCHOOL OF BUSINESS AND ECONOMICS
Department of Economics and Development Finance
Business Statistics Handout

for BME Students
UNIT ONE
USES OF DATA ANALYSIS AND COURSE OVERVIEW
1.1. Definition of Statistics

Statistics has been defined differently by different professionals from time to time so much so
that scholarly articles have collected together hundreds of definitions. Basically it is very difficult to
give one exact and complete definition to statistics, but the following are general and common:
 Statistics is the method of getting information from data to help decision makers.
 Statistics is a science which deals with the collection, organization, analysis and
interpretation of data and drawing valid conclusion from the analysis.
 Statistics is the science of learning from data
Alternatively, statistics can be defined in both a plural and a singular sense.
 In the plural sense, statistics means numerical data.
 In the singular sense, it means the statistical method, embodying the theory and techniques
used for collecting, analyzing and drawing inferences from numerical data.
1.2. Branches of Statistics

Generally, the subject matter statistics can be divided as descriptive and inferential statistics. The
division is based on how a given set of data are used.
i. Descriptive Statistics
Data are collected for some purpose, but collected data do not provide information unless they are
processed. Data need to be organized and summarized before they can be used to support decisions,
and this is the task of descriptive statistics.
Therefore, descriptive statistics is the part of statistics concerned with arranging, summarizing and
presenting a set of data in such a way that the meaningful essentials of the data can be extracted
and grasped easily.
The data can be presented using tools such as graphs, tables and averages (mean, median, mode), etc.
Example: - Unemployment rate of a country

- Total production of a nation
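As a minimal sketch of such descriptive summaries, the following Python snippet computes three common averages with the standard `statistics` module. The income figures are hypothetical and serve only as an illustration.

```python
import statistics

# Hypothetical monthly incomes (in birr) of nine employees; not real data.
incomes = [3000, 3500, 3500, 4000, 4200, 4800, 5000, 3500, 6300]

mean_income = statistics.mean(incomes)      # arithmetic average
median_income = statistics.median(incomes)  # middle value of the sorted data
mode_income = statistics.mode(incomes)      # most frequent value

print("Mean:", mean_income)
print("Median:", median_income)
print("Mode:", mode_income)
```

For these figures the mean is 4200, the median is 4000 and the mode is 3500, so three numbers summarize the whole list.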
ii. Inferential statistics
Statisticians are interested in obtaining information about the total collection of elements under
study, which is called the population. But the population is often too large to examine in full.
Therefore, inferential statistics is the part of statistics concerned with drawing conclusions about a
population from a sample taken from that population.
OSU July, 2018 Page 1

Example: If the Ethiopian Economic Association reports that the domestic product of Ethiopia this
year is 120 million tons, this is descriptive statistics. But if the association predicts that the domestic
product will double after 10 years based on the present information, this is inferential
statistics.
1.3. Types/Classification of Statistical Data
Fundamental Terms
 Statistical Variable
A variable is a characteristic under study that assumes different values for different elements.
For example, consider the profits of three international companies.
Company     2002 profits (billions)
Midrock     22
Sunshine    18
PSCO        10
Here, the 2002 profit is a variable. Other examples: income, height, weight, cars sales and so on.
 Data
The numerical values represented by any variable are called Data.
For example: 22, 18, 10 in the above table are data. The singular form is datum
 Quantitative and Qualitative Variables

A variable can be classified as quantitative or qualitative.
1) Variables that can be measured numerically are called quantitative variables. The corresponding
data are called quantitative data. For example: time, income, gross sales, price, height,
weight, number of accidents on a road per day.
2) Variables that cannot be measured numerically but can be divided into different categories
are called qualitative or categorical variables. The corresponding data are called qualitative data. For
example: the status of an undergraduate college student is a qualitative variable, since a student
can fall into any one of four categories:
1st         2nd          3rd       4th
Freshman    Sophomore    Junior    Senior
Other examples: gender of a person, hair color, etc.
 Discrete and Continuous Variables

Quantitative characteristics can be classified as discrete and continuous variables.
1) Discrete Variable: A variable whose values are countable (integral values only).
Example:
 The number of cars sold in any day as the number of cars sold must be 0, 1, 2, ….It
cannot be between 0 and 1 or 1 and 2
 Number of people visiting a bank on any day
 Number of cars in a parking lot
 Family size

2) Continuous Variable: A variable that can assume any numerical value (integral as well as
fractional) within an interval. Its values cannot be counted as 0, 1, 2, 3, and so on.
Example:
 The time taken by a bank teller to serve a customer is a continuous variable, because it
can assume any value in an interval, for example any value between 2 and 50 seconds, or a value such as 3.48 minutes.
 Total assets of banks, weight, height, age, etc.
 By convention, any variable that involves money is treated as a continuous variable.
Having understood these fundamental terms, we can now discuss the types and sources of
data. In general, statistical data are classified according to where they come from. On this basis, data are
classified as secondary and primary data.
i. Secondary Data
In statistical studies, one must first check availability of prior studies related to the topic of
interest and whether these are relevant for the present purpose. Data types which are already
collected and recorded by another body for another purpose are called secondary data.
 The common sources of secondary data are governmental publications, journals and reports,
publication of research organizations and different books.
 These types of data help in saving time and expenses of the study and unnecessary
duplication of efforts.
ii. Primary Data

When relevant data are not obtained in recorded form, original data should be collected by
conducting first hand investigation. Therefore, primary data are those which are collected by
conducting first hand investigation for the current purpose.
There are two ways that can be used in order to collect primary data.
a) Questionnaire and interview, and
b) Experiment and observation
a) Questionnaire and Interview

 Questionnaire
This method collects data by asking people who are considered to have the desired first-hand
information to fill in a written list of questions. A well-designed list of questions regarding the subject
under consideration, to be filled in or answered by selected respondents, is called a
questionnaire.
 The preparation of a questionnaire requires knowledgeable and skillful design. It should
be prepared in such a way that respondents can fill it in with ease and clarity.
 The questions in the questionnaire should be as few in number, clear and inoffensive as
possible, so that respondents can answer them without ambiguity, in a precise
manner, and without losing interest and leaving some questions unanswered.
 The investigator must be careful not to ask leading questions that can result in biased
replies. Usually questionnaires are distributed to respondents by mail.
 Interview

An interview is a one-to-one communication between the investigator (data collector) and the
respondent. If the respondents are not many and there are enough data collectors, a
personal interview is preferred to distributing questionnaires by mail, since in an interview
it is possible to clear up doubts and cross-check inconsistent answers right away.
 In some cases, where some or all the respondents are not literate (thus cannot read the
questionnaire), conducting interview is a must.
 Usually in conducting face-to-face interview, respondents feel obliged to reply and so most
questions will be answered.
 But if the respondents are many in number and particularly dispersed to far reaching areas, the
questionnaire method would be preferred.
b) Experiment and Observation
In this case, the investigator may not question anyone but instead conducts an experiment him/herself and
records the results, or observes a certain phenomenon and records what happens. For
instance, if the interest is to have information on the weights of a sample of 5-year-old children, then
a selected number of children can be weighed and the weights recorded by the investigator.
 Obtaining primary data using any of these methods is more difficult, time taking and costly as
compared to using secondary data.
 In getting primary data, the investigator has to do everything i.e. planning the methods of
collection, selecting the sample (if necessary), designing questionnaires, planning the
experiment and then conducting actual experiment.
 Then he/she has to compile the collected raw data in a systematic manner.
Using secondary data is simpler than collecting primary data. However, primary data
are in most cases more reliable and more suitable for the study at hand, as they are original data whose
collection is designed and conducted by the investigator to suit the present purpose.
 It must also be noted that in many studies both primary and secondary data can be used
together, especially when the available secondary data are accurate enough but incomplete, so that
the rest can be supplemented by collecting primary data.
 For instance, if the aim is to collect data on the monthly income of employees in a company, the
payroll record can supply the income from basic salary, while income from other sources can
be obtained by interviewing the employees themselves.
1.4. Levels of measurement

Measurement levels refer to the property of the values assigned to the data, based on the properties of
order, distance and fixed zero. In mathematical terms, measurement is a functional mapping
from the set of objects $O_i$ to the set of real numbers $M(O_i)$.
The goal of measurement systems is to structure the rule for assigning numbers to objects in such a
way that the relationship between the objects is preserved in the numbers assigned to the objects.
Measurement is the assignment of numbers to objects or events in a systematic fashion. Four levels
of measurement scales are commonly distinguished: Nominal, Ordinal, Interval and Ratio.
1) Nominal Scale
Nominal scales are measurement systems that possess none of the three properties (order, distance
and fixed zero).

 A level of measurement that classifies data into mutually exclusive, all-inclusive categories in
which no order or ranking can be imposed on the data.
 No arithmetic or relational operations can be applied.
Example:
 Political preference ( Republican, Democrat, or Other)
 Sex ( Male, or Female)
 Marital status ( Married, Single, Widowed, Divorced)
2) Ordinal Scale
Ordinal scales are measurement systems that possess the property of order, but not the property
of distance. The property of fixed zero is not important if property of distance is not satisfied.
 A level of measurement that classifies data into categories that can be ranked. Precise differences
between the ranks do not exist.
 Arithmetic operations are not applicable, but relational operations are applicable.
 Ordering is the sole property of the ordinal scale.
Example:
 Letter grades (A, B, C, D, F)
 Rating scales (Excellent, Very Good, Good, Fair, Poor)
3) Interval Scales
Interval scales are measurement systems that possess the properties of order and distance but not
the property of fixed zero.
 A level of measurement that classifies data that can be ranked and for which differences are meaningful.
However, there is no meaningful zero, so ratios are meaningless.
 All arithmetic operations except division are applicable.
 Relational operations are also possible.
Example:
 IQ
 Temperature in °F
4) Ratio Scales
Ratio scales are measurement systems that possess all three properties: order, distance, and fixed
zero. The added power of a fixed zero allows ratios of numbers to be meaningfully interpreted; e.g.
the ratio of Ananya's height to Eyosia's height is 1:32, whereas this is not possible with interval
scales.
 A level of measurement that classifies data that can be ranked, for which differences are meaningful, and
for which there is a true zero. True ratios exist between the different units of measure.
 All arithmetic and relational operations are applicable.
Example:
 Weight, Height, Number of Students, Age etc.
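The difference between interval and ratio scales can be illustrated with a short Python sketch. It assumes two hypothetical temperature readings and two hypothetical weights, and checks whether their ratio survives a change of unit. The conversion factor `KG_TO_LB` is an approximate value introduced here for illustration.

```python
def fahrenheit_to_celsius(temp_f):
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (temp_f - 32) * 5 / 9

# Interval scale: temperature (no true zero). Two hypothetical readings.
t1_f, t2_f = 40.0, 80.0
t1_c, t2_c = fahrenheit_to_celsius(t1_f), fahrenheit_to_celsius(t2_f)

ratio_f = t2_f / t1_f  # ratio of the readings in Fahrenheit
ratio_c = t2_c / t1_c  # ratio of the same readings in Celsius

# Ratio scale: weight (true zero). Two hypothetical weights.
w1_kg, w2_kg = 30.0, 60.0
KG_TO_LB = 2.20462  # approximate conversion factor (assumption)
ratio_kg = w2_kg / w1_kg
ratio_lb = (w2_kg * KG_TO_LB) / (w1_kg * KG_TO_LB)

print("Temperature ratio in F:", ratio_f, "and in C:", round(ratio_c, 2))
print("Weight ratio in kg:", ratio_kg, "and in lb:", round(ratio_lb, 2))
```

The temperature ratio changes with the unit (2 in Fahrenheit, about 6 in Celsius), so "twice as hot" has no fixed meaning, while the weight ratio stays at 2 in both units.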
1.5. Uses and Misuses of Statistics

 Uses of statistics
Normally, knowingly or unknowingly we use statistics almost in the day to day activities of our
lives. When you want to compare yourselves with your classmates, for example, you use statistics.
We study statistics, however, mainly because we are involved in decision making.
Statistics aids our decision making because it:
 Provides the models that are needed to study situations involving uncertainties,
 Eases identification and determination of functional relationship among variables,
 Enables us to predict the likelihood of something happening,
 Serves as a source of sufficient information for effective decisions,
 Presents facts in definite and precise form,
 Studies the relationship between two or more variables,
 Condenses and summarizes a mass of data into a few presentable, understandable and precise figures.
 Misuses of statistics
Some of the possible ways where statistics can be misused are:
 They can be used for the wrong purpose, that is, for purposes that are different from the
purpose for which they were collected.
 They can be collected incorrectly if there is bias.
 They can be analyzed carelessly so that the results obtained are misleading.
The improper use of statistical tools by unscrupulous people with an improper statistical bent of
mind has led to public distrust in statistics. By this we mean that the public loses its belief, faith
and confidence in the science of statistics and starts condemning it. Such irresponsible,
inexperienced and dishonest persons, who use statistical data and statistical techniques to fulfill their
selfish motives, have discredited the science of statistics and given rise to some very interesting comments:
♠ An ounce of truth will produce tons of statistics.
♠ Statistics can prove anything.
♠ Figures do not lie. Liars figure.
♠ Statistics is an unreliable science.
♠ There are three types of lies: lies, damned lies and statistics, wicked in the order of their
naming; and so on.
Some of the reasons for the above remarks may be enumerated as follows:
a. Figures are innocent and believable, and the facts based on them are psychologically more
convincing. But it is a pity that figures do not have the label of quality on their face.
b. Arguments are put forward to establish certain results which are not true by making use of
inaccurate figures or by using incomplete data, thus distorting the truth.
c. Though accurate, the figures might be molded and manipulated by dishonest persons to
conceal the truth and present a wrong and distorted picture of the facts to the public for
personal and selfish motives.
Hence, if statistics and its tools are misused, the fault does not lie with the science of statistics.
Rather, it is the people who misuse it who are to be blamed. Utmost care and precaution should be
taken in the interpretation of statistical data in all its manifestations. “Statistics should not be used
as a blind man uses a lamp post, for support instead of illumination”.

1.6. Limitation of Statistics

Although statistics is indispensable to almost all sciences (social, physical, and natural) and is very
widely used in almost all spheres of human activity, it is not without limitations which restrict its
scope and utility.
1) Statistics does not study qualitative phenomena directly

 Statistics are numerical statements in any department of enquiry placed in relation to each other.
Since statistics is a science dealing with a set of numerical data it can be applied to the study of
only those phenomena, which can be measured quantitatively.
 Thus statements like 'The population of Oromia has increased considerably during the last few
years' or 'The standard of living of the people in Oromia has gone up compared with last year'
do not constitute statistics.
 As such, statistics cannot be used directly for the study of qualitative characteristics like health,
beauty, honesty, welfare, poverty, etc., which cannot be measured quantitatively directly.
2) Statistics does not study individuals
 According to Prof. Horace Secrist, “By statistics we mean aggregate of facts affected to a
marked extent by multiplicity of factors --- and placed in relation to each other.”
 Thus a single or isolated figure cannot be regarded as statistics unless it is a part of the
aggregate of facts relating to any particular field of enquiry.
 Thus, statistical methods do not give any recognition to an object or a person or an event in
isolation.
 For instance, the price of a single commodity, the profit of a particular concern or the
production of a particular business house do not constitute statistics since these figures are
unrelated and incomparable.
 However, the aggregate of figures relating to prices and consumption of various commodities,
the sales and profits of a business, the income, expenditure, production, etc., over different
periods of time, places, etc., will be statistics.
 Thus, from a statistical point of view, the figure for the population of a particular country in some
given year is useless unless we are also given the figures for the population of that country for different
years, or of different countries for the same year, for comparative studies.
 Hence, statistics is confined only to those problems where group characteristics are to be
studied.
3) Statistical laws are not exact

 Since the statistical laws are probabilistic in nature, inferences based on them are only
approximate and not exact like the inferences based on mathematical or scientific (Physical
and Natural Sciences) laws.
 Statistical laws are true only on the average. If the probability of getting a head in a single
throw of a coin is ½, it does not imply that if we toss a coin 10 times, we shall get 5 heads and 5
tails. In 10 throws of a coin, we may get 8 heads and 2 tails, or 6 heads and 4 tails, or
all 10 heads, or we may not get even a single head.
 By this we mean that if the experiment of throwing the coin is carried on indefinitely (very large
number of times), then we should expect on the average 50% heads and 50% tails.
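The coin example can be checked with a short Python sketch. It assumes a fair coin and independent tosses, and uses the binomial formula (which the handout does not state at this point) to compute the probability of a few of the outcomes mentioned. The function name `prob_heads` is hypothetical.

```python
from math import comb

def prob_heads(k, n=10, p=0.5):
    """Probability of exactly k heads in n independent tosses with P(head) = p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

p_five = prob_heads(5)   # exactly 5 heads and 5 tails
p_eight = prob_heads(8)  # 8 heads and 2 tails
p_none = prob_heads(0)   # not even a single head

print(f"P(5 heads) = {p_five:.4f}")
print(f"P(8 heads) = {p_eight:.4f}")
print(f"P(0 heads) = {p_none:.4f}")
```

Under these assumptions an exact split of 5 heads and 5 tails has probability 252/1024, roughly one in four, so most sets of 10 tosses do not show it even though the coin is fair.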
4) Statistics is liable to be misused.

 The most significant limitation of statistics is that it must be used by experts. According to
Bowley, “statistics only furnishes a tool, though imperfect, which is dangerous in the hands of
those who do not know its use and deficiencies”.
 The greatest limitation of statistics is that it deals with figures which are innocent in themselves
and do not bear on their face the label of their quality, and which can be easily distorted, manipulated
or molded by politicians.
 Statistics neither proves nor disproves anything. It is merely a tool which, if rightly used, may
prove extremely useful, but if misused by inexperienced, unskilled and dishonest statisticians
might lead to very fallacious conclusions and even prove to be disastrous.
 In the words of W.I. King, “statistics are like clay of which you can make a God or a Devil as
you please.” At another place he remarks that the science of statistics is a useful servant, but only of
great value to those who understand its proper use.
Thus the requirement that statistics be used by experts who are well experienced and skilled in the analysis and
interpretation of statistical data, in order to draw correct and valid inferences, very much reduces the
chance of mass popularity of this important science.

UNIT TWO
METHODS OF DATA COLLECTION AND SAMPLING TECHNIQUE
2.1. Methods of Data Collection
 Before we study the methods of data collection, it is important to define two important terms in
statistics – Population and Sample.
 In statistical language, a population is the totality of elements or items under investigation, whereas a sample
is a part or subset of this population.
 For instance, if a researcher is interested in studying the performance of male and female students in
PSCO, all students of the college constitute the population. If some female and male students are
selected from among them, this collection, which is a subset of the population, is a sample.
 In statistics, the sample taken from the population must approximately represent the characteristics
of the population.
In general, we have two methods of data collection: Sample survey and Census survey
1) Census survey
A survey that includes every member of the population is called a Census. In the process of data
collection, data are gathered from all elements that we are interested to study.
2) Sample survey
The method of collecting data from a portion of the population is called a sample survey. The purpose of
conducting a sample survey is to make decisions about the corresponding population.
It is important that the results obtained from a sample survey closely match the results that we would
obtain by conducting a census. Otherwise, decisions derived from a sample survey will not apply to the
corresponding population. That is, such a sample is not a representative sample.
 Advantages and Disadvantages of Census

Advantages
 Highest accuracy is obtained as all members (elements) are covered by the enquiry
 All the characteristics of the universe are maintained in original.
 This method is free from sampling errors.
 When the Population is a small one, complete and accurate results can be obtained
Demerits (disadvantages)
 It requires a great many enumerators and a great deal of time and money. It is often practically beyond the reach of researchers.
 The census method is useless in case results are urgently required.
 An element of bias will get larger and larger as the number of observations increases.
 In practice, sometimes it is not possible to examine every item in the population, for example in
the destructive testing of explosives and in medical testing (drug effectiveness).

 Merits and Demerits of Sample Method

Advantages
 Sampling is best when results are required urgently.
 Sometimes the census method is impossible to employ. For example, in testing explosives, a sample
can be tested to find out the strength of the explosive. Another case is testing the effectiveness of a drug: we
can only take a sample of people to find out the effectiveness of the drug.
 The sample method requires less time, money, and manpower as compared to the census method.
 With sampling, we are often in a position to get more accurate information than with the census method, because:
 Detailed information can be obtained from a small group of respondents.
 Qualified persons or investigators can be appointed and intensive training given.
 Relatively limited data can be handled much more easily.
 Follow-up is easy in case of poor response.
 Non-sampling errors will be minimized, as the amount of data collected and processed is relatively small.
Disadvantages
 The results obtained may be false, inaccurate and misleading if the sample has not been drawn
properly.
 There is a greater chance of sampling errors, that is, errors that arise in taking samples from a population.
 When the population is small, sampling is not useful.
2.2. Sampling and Sampling Techniques

Principles of Sampling
The possibility of reaching valid conclusions concerning a population on the basis of a sample is based
on two principles. These principles of sampling can be called the laws of sampling. These are:
 The law of statistical regularity
 The law of inertia of large numbers
They are not laws in the strict sense of the term; rather, they are only tendencies which operate universally.
 The Law of Statistical Regularity
 This law may be stated as follows: “On an average the sample chosen at random from the universe
will have the same composition and characteristics as the universe (population.)”.
 For example, if one intends to make a study of average weight of students of a college, it is not
necessary to take weight of all students. A few students may be selected at random from all the
classes, their weights taken and average weight of the college students in general may be inferred.
 But before the results of the sample can be applied to population; two conditions must be met:
 Firstly, the sample should be random, that is, every item in the population has an equal
chance of being included in the sample.
 Secondly, the sample should be sufficiently representative.
In statistics, there is a basic principle that the larger the number of items, the more reliable are the
results obtained from them, because it is then possible to avoid the influence of abnormal items on the
average. The larger the size of the sample, the more reliable is the result, because the sampling error is
inversely proportional to the square root of the number of items in the sample, i.e.

$$E_s \propto \frac{1}{\sqrt{n}}$$

where $E_s$ is the sampling error and $n$ is the size of the sample, i.e., the number of items included in the sample.
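To make the inverse square-root relationship concrete, the sketch below uses one common form of it, the standard error of a sample mean, sigma / sqrt(n). This specific formula and the value of `sigma` are assumptions added for illustration; the handout states only the proportionality.

```python
import math

def standard_error(sigma, n):
    """Standard error of a sample mean: sigma / sqrt(n) (assumed form of the sampling error)."""
    return sigma / math.sqrt(n)

sigma = 12.0  # hypothetical population standard deviation
errors = {n: standard_error(sigma, n) for n in (25, 100, 400)}

for n, err in errors.items():
    print(f"n = {n:>3}: sampling error = {err:.2f}")
```

Each fourfold increase in the sample size (25 to 100 to 400) only halves the error (2.4 to 1.2 to 0.6), which is why very large samples bring diminishing gains in reliability.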
 Once it is ensured that the sample selected is representative of the population, it is possible for one to
depict fairly and accurately the characteristics and composition of the population by studying only a
sample of it. Thus, this law is of much importance as it saves time, energy and money by studying
only a part of the population and then applying the results obtained from the sample to the population.
 Inertia of Large Numbers
 This law is an extension of the law of statistical regularity. It states that “other things being
equal, the greater the size of the sample, the more accurate the results are likely to be”. This is because large
groups of data have a higher degree of stability than small ones.
 For example, if a coin is tossed 50 times, heads may appear 30 times and tails 20 times. But if the
coin is tossed 1,000 times, we may get close to 500 heads and 500 tails. This is so because when the number is
very large, some items move in one direction and others move in the opposite direction, thus
canceling each other out.
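The stabilizing effect of large numbers can be observed with a small simulation. This is only a sketch: it assumes a fair coin simulated with Python's `random` module, and the seed value and the helper name `proportion_of_heads` are arbitrary choices made here.

```python
import random

def proportion_of_heads(n_tosses, rng):
    """Toss a simulated fair coin n_tosses times and return the share of heads."""
    heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses

rng = random.Random(2018)  # fixed seed so that the run can be repeated
results = {n: proportion_of_heads(n, rng) for n in (50, 1_000, 100_000)}

for n, share in results.items():
    print(f"{n:>7} tosses: proportion of heads = {share:.3f}")
```

With few tosses the proportion can be far from 0.5, while with many tosses it tends to settle close to 0.5, in line with the law described above.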
 Sampling Techniques
Sampling techniques are the different techniques of collecting data (information) from a portion of a
population. The major sampling techniques may be grouped into:
Sampling Techniques
I. Probability Sampling
   - Simple random sampling
   - Systematic random sampling
   - Stratified random sampling
   - Cluster random sampling
II. Non-Probability Sampling
   - Convenience sampling
   - Purposive sampling
   - Quota sampling
I. Probability Sampling
All probability samples are based on a chance selection procedure, i.e., every element of the population has a
known non-zero probability of selection. This eliminates the bias inherent in non-probability sampling
procedures, because the probability sampling process is random. Random refers to the procedure for selecting
the sample: a random procedure is one whose outcome cannot be predicted because it is
dependent on chance. The selection of the sample based on the theory of probability is also known as
random selection, and probability sampling is sometimes also known as random sampling.

1. Simple Random Sampling
It is a probability sampling in which each element in the population has an equal chance of being included
in the sample. The sampling process is simple because it requires only one stage of sample selection.
Example: Drawing names from a hat or selecting the winning raffle ticket from a large drum.
Merits:
 Minimal advance knowledge of the population is needed, and it is easy to analyze data and compute error.
Demerits:
 It does not use knowledge of the population that researchers may have. It gives larger errors for the same sample size
than stratified sampling.
 In simple random sampling, respondents may be widely dispersed, hence higher cost.
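Simple random sampling, as described above, can be sketched in Python with `random.sample`, which draws distinct elements with equal probability. The list of students and the function name `simple_random_sample` are hypothetical.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct elements so that every element is equally likely to be chosen."""
    rng = random.Random(seed)
    return rng.sample(population, n)

# Hypothetical sampling frame of 200 students.
students = [f"Student-{i}" for i in range(1, 201)]
sample = simple_random_sample(students, 10, seed=1)
print(sample)
```

The only input needed is a complete list of the population, which matches the merit noted above that little advance knowledge is required.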
2. Systematic Sampling
A sampling procedure in which an initial starting point is selected by a random process and then every kth
item on the list is selected, where k is the sampling interval.
Let us suppose that N units in the population are arranged in some systematic order and serially
numbered from 1 to N and we want to draw a sample of size n from it such that:
 N = nk ⇒ k = N/n, where k is usually called the sampling interval.
Systematic sampling consists in selecting any unit at random from the first k units numbered from 1 to k
and then selecting every kth unit in succession subsequently. Thus, if the first unit selected at random is the ith
unit, then the systematic sample of size n will consist of the units numbered
 i, i + k, i + 2k, …, i + (n-1)k.
The random number ‘i’ is called the random start and its value, in fact, determines the whole sample.
As an example, let us suppose that we want to select 50 voters from a list of voters containing 1,000
names arranged systematically. Here
 n = 50; N = 1,000; k = N/n = 1,000/50 = 20
We select any number from 1 to 20 at random and the corresponding voter in the list is selected. Suppose
the selected number is 6. Then the systematic sample will consist of the 50 voters in the list at serial numbers:
6, 26, 46, 66, …, 966, 986.
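The voter example can be reproduced with a short Python sketch. It assumes that N is an exact multiple of n, so that k = N/n is a whole number; the function name `systematic_sample` and its parameters are hypothetical.

```python
import random

def systematic_sample(N, n, start=None, seed=None):
    """Return the serial numbers i, i + k, ..., i + (n - 1)k, with k = N // n."""
    k = N // n  # sampling interval (assumes N is a multiple of n)
    if start is None:
        start = random.Random(seed).randint(1, k)  # random start between 1 and k
    return [start + j * k for j in range(n)]

# Random start fixed at 6 to match the worked example.
voters = systematic_sample(N=1000, n=50, start=6)
print(voters[:4], "...", voters[-2:])  # prints [6, 26, 46, 66] ... [966, 986]
```

Once the random start is known, every other selected serial number follows from it, so a single random draw determines the whole sample.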
Merits
 Simple to draw sample and easy to check. It has moderate cost.
Demerits
 If sampling interval is related to a periodic ordering of the population, it may introduce increased
variability.
3. Stratified Sampling
A probability sampling procedure in which sub-samples are drawn from within different strata,
each of whose members are more or less alike on some characteristic.
The first step, choosing strata on the basis of existing information, is the same for both stratified and
quota sampling. However, the processes of selecting sampling units (elements) within the strata differ
substantially. In stratified sampling, a sub-sample is drawn using simple random sampling within each
stratum. This is not true of quota sampling.
 The reason for taking a stratified sample is to have a more efficient sample than could be taken on the
basis of simple random sampling.
 Another reason for taking a stratified sample is the assurance that the sample will accurately reflect the
population on the basis of the criterion or criteria used for stratification.
Merits
 It assures representation of all groups in a sample.
 Characteristics of each stratum can be estimated and comparisons made.
 Further it reduces variability for same sample size.
Demerits
 It requires accurate information on proportion in each stratum.
 If stratified lists are not already available they can be costly to prepare.
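A minimal sketch of stratified sampling is given below. It assumes proportional allocation, meaning each stratum contributes to the sample in proportion to its size; this is one common choice and is not prescribed by the text above. The strata, their sizes and the function name `stratified_sample` are hypothetical.

```python
import random

def stratified_sample(strata, total_n, seed=None):
    """Draw a simple random sample within each stratum, proportional to its size."""
    rng = random.Random(seed)
    population_size = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # Proportional allocation; rounding may shift the total slightly in general.
        n_h = round(total_n * len(members) / population_size)
        sample[name] = rng.sample(members, n_h)
    return sample

strata = {
    "female": [f"F{i}" for i in range(1, 401)],  # 400 hypothetical students
    "male": [f"M{i}" for i in range(1, 601)],    # 600 hypothetical students
}
sample = stratified_sample(strata, total_n=50, seed=7)
print({name: len(chosen) for name, chosen in sample.items()})
```

Here the sample of 50 contains 20 female and 30 male students, mirroring the 40:60 split in the population. The code needs the size of each stratum in advance, which is the demerit noted above.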
4. Cluster Sampling
An economically efficient sampling technique in which the primary sampling unit is not the
individual element in the population but a larger cluster of elements, and the clusters are selected randomly.
 The area sample is the most popular type of cluster sample.
 A grocery researcher, for example, may randomly choose several geographic areas as the primary
sampling units and then interview all, or a sample, of the grocery stores within the selected geographic
clusters. Interviews are confined to these clusters; no interviews occur in other clusters.
 Cluster samples are frequently utilized when no lists of the sample population are available.
Merits
 If clusters are geographically defined, it yields the lowest field cost.
 It requires a listing of all clusters, but of individuals only within the selected clusters.
 It can estimate characteristics of clusters as well as of the population.
Demerits
 It introduces larger error for a comparable sample size than other probability samples.
 The researcher must be able to assign each population member to a unique cluster; otherwise duplication or
omission of individuals results.
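The grocery store example can be sketched as one-stage cluster sampling, in which whole clusters are drawn at random and every element inside a chosen cluster is surveyed. The areas, store labels and the function name `cluster_sample` are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly select whole clusters and return every element inside them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # clusters are the sampling units
    return {name: clusters[name] for name in chosen}

# Hypothetical geographic areas, each listing the grocery stores located there.
areas = {
    "Area A": ["A1", "A2", "A3"],
    "Area B": ["B1", "B2"],
    "Area C": ["C1", "C2", "C3", "C4"],
    "Area D": ["D1"],
    "Area E": ["E1", "E2"],
}
selected = cluster_sample(areas, n_clusters=2, seed=3)
print(selected)
```

Only the list of areas is needed before sampling, and stores have to be listed only for the areas that are drawn. Each store belongs to exactly one area, so none is duplicated or omitted.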
II. Non- Probability Sampling

 In non- probability sampling, the probability of any particular member of the population being
chosen is unknown.
 The selection of elements in non-probability sampling is quite arbitrary, as researchers rely heavily on
personal judgment. Some of the non-probability sampling techniques are:
1. Convenience Sampling
 Convenience sampling (also called haphazard or accidental sampling) refers to the procedure of
obtaining sample elements or people who are most conveniently available.
 For example, it is convenient and economical to sample employees in a nearby area. Researchers use
convenience samples to obtain a large number of completed questionnaires quickly and economically.

Merits
 Very low cost, extensively used.
 No need for list of population, i.e., it is independent of the size of the population.
Demerits:
 Variability and bias of estimates cannot be measured or controlled
2. Judgment or Purposive Sampling

 It is non-probability sampling technique in which an experienced individual (expert) selects the sample
based upon his or her judgment about some appropriate characteristic required of the sample members.
 It is often used in attempts to forecast election results.
Merits
 It is useful for certain types of forecasting, since the sample is guaranteed to meet a specific objective.
 Moreover, it has moderate cost and average use.
Demerits
 It introduces bias due to the expert's beliefs, and it may make the sample unrepresentative.
 This is because elements in the population do not have the same chance of being included in the sample.
3. Quota Sampling.
 It is non-probability sampling in which the researcher classifies population by pertinent properties,
determines desired proportion of sample from each class & quotas for each interviewer.
 Suppose a firm wishes to investigate consumers who currently own videotape recorders. The
researcher wishes to ensure that each brand of recorder is proportionately included in the sample.
 The purpose of quota sampling is to ensure that the various subgroups in a population are represented
on pertinent sample characteristics to the exact extent that the investigators desire.
 Stratified sampling, a probability sampling procedure, also has this objective, and it should not be
confused with quota sampling. In quota sampling, the interviewer has a quota to achieve.
Merits
 It introduces some stratification of population and requires no list of population.
 It has moderate cost and it is used very extensively.
 One can finish data collection in a very short period of time.
Demerits
 It introduces bias in the researcher's classification of subjects.
 Further, non-random selection within classes means that the error relative to the population cannot be estimated.
 Sampling and Non-Sampling Errors

The following two terms are important to understand the concepts of sampling and non-sampling errors.
Statistic (Sample statistics)

All measured characteristics (numerical values) associated with a sample are called statistics. We use the
term sample statistics to designate variables in the sample or measures computed from the sample data.
For example, sample mean, sample standard deviation, sample proportion and so on
Parameter (population parameters)

The term population parameters is used to designate the variables or measured characteristics of the
population. Sample statistics are used to make inferences about population parameters. For example,
population mean, population standard deviation, and population proportion and so on
Sampling Error
It is the difference between the value of a sample statistic obtained from a sample and the value of the
corresponding population parameter obtained from the population. It is important to remember that a
sampling error occurs because of chance.
 In the case of the mean: Sampling error = $\bar{X} - \mu$
Non- Sampling Error

The errors that occur for other reasons, such as errors made during collection, recording, and tabulation of
data, are called non-sampling errors. Such errors occur because of human mistakes and not chance.
Example: Consider the population of five employees' salaries: 1000, 2000, 3500, 3500 and 4000 birr.
Now suppose we take a random sample of three salaries from this population: 2000, 3500 and 3500. What
is the sampling error (due to mean)?
Solution: The population mean is $\mu = \frac{1000 + 2000 + 3500 + 3500 + 4000}{5} = 2800$ and
the sample mean is $\bar{X} = \frac{2000 + 3500 + 3500}{3} = 3000$.
The sampling error = $\bar{X} - \mu$ = 3000 – 2800 = 200
 The mean salary estimated from the sample is 200 birr higher than the mean salary of the population.
 This difference occurred due to chance, that is, because we used a sample instead of the population.
Now suppose that, when we select the above-mentioned sample, we mistakenly record the second salary as
2900 instead of 2000. As a result, we calculate the sample mean as: $\bar{X} = \frac{2900 + 3500 + 3500}{3} = 3300$
Consequently, the difference between sample mean & population mean is: $\bar{X} - \mu$ = 3300 – 2800 = 500
 This difference does not represent the sampling error. As we calculated earlier, only 200 of this
difference is due to sampling error. The remaining portion: 500-200= 300 birr represents non-
sampling error because it occurred due to the error we made in recording second salary in the sample.
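The split between sampling and non-sampling error in this example can be checked with a short Python sketch. The variable names are illustrative only; the figures are the salaries given in the example.

```python
from statistics import mean

population = [1000, 2000, 3500, 3500, 4000]   # salaries in birr
sample = [2000, 3500, 3500]                   # the random sample that was drawn
recorded = [2900, 3500, 3500]                 # same sample with the recording mistake

mu = mean(population)                          # population mean
sampling_error = mean(sample) - mu             # error due to chance alone
total_difference = mean(recorded) - mu         # what we observe after the mistake
non_sampling_error = total_difference - sampling_error

print(sampling_error, total_difference, non_sampling_error)
```

The non-sampling part is obtained by subtraction, exactly as in the text: whatever remains of the total difference after removing the sampling error.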

2.3. Methods of Data Presentation

Classification of Data
The data collected in any statistical investigation, known as raw data, are so voluminous and huge that
they are unwieldy and incomprehensible. So, having collected and edited the data, the next important step
is to organize it, i.e. to present it in a readily comprehensible condensed form which will highlight the
important characteristics of the data, facilitate comparisons and render it suitable for further processing
(statistical analysis) and interpretations.
A statistical table is an orderly and logical arrangement of data into rows and columns and it attempts to
present the voluminous and heterogeneous data in a condensed and homogeneous form. But before
tabulating the data, generally, systematic arrangement of the raw data into different homogeneous classes
is necessary to sort out the relevant and significant features from the irrelevant and insignificant ones.
This process of arranging the data into groups or classes according to resemblances and similarities is
technically called classification. Thus, classification means the arrangement of the data into
different classes, which are determined by the nature, objectives & scope of the enquiry.
For instance, the number of students registered at Public Service College of Oromiya during academic
year 2005 E.C may be classified on the basis of any of the following criteria.
i. Faculty: Agribusiness, Human Resource, Accounting, Law
ii. Sex
iii. Age
iv. The zone to which they belong
v. Religion
vi. Height or weight
Thus the same set of data can be classified into different groups or classes in a number of ways based on
any recognizable physical, social or mental characteristic which exhibits variation among the different
elements of the given data.

Functions of Classification
The functions of classification are summarized as follows:
 It condenses the data
 It facilitates comparisons
 It helps to study the relationships
 It facilitates the statistical treatment of the data
Rules for Classification

 No hard and fast rules can be laid down for data classification.
 However, consistent with the nature and objectives of the enquiry, the following general guiding
principles may be observed for good classification:
(i) It should be un-ambiguous:

 The classes should be rigidly defined so that they should not lead to any ambiguity.
 For example, if we classify a group of individuals as 'employed' and 'unemployed', it is imperative
to define in clear-cut terms what we mean by an employed person and an unemployed person.
(ii) It should be exhaustive and mutually exclusive:

 The classification must be exhaustive in the sense that each and every item in the data must belong to
one of the classes.
 Further, the various classes should be mutually disjoint or non-overlapping so that an observed value
belongs to one and only one of the classes.
(iii) It should be stable:

 In order to have meaningful comparisons of the results, an ideal classification must be stable i.e. the
same pattern of classification should be adopted throughout the analysis and also for further enquiries
on the same subject.
(iv) It should be suitable for the purpose:
 The classification must be in keeping with the objectives of the enquiry.
 For instance, if we want to study the relationship between the university education and sex, it will be
futile to classify the students with respect to age and religion.
Bases of Classification
The bases or the criteria with respect to which the data are classified primarily depend on the objectives
and the purpose of the inquiry. Generally, the data can be classified on the following four bases:
 Geographical classification  Qualitative classification
 Chronological classification  Quantitative classification

i. Geographical Classification
 As the name suggests, in this classification the basis of classification is the geographical or location
differences between the various items in the data like; States, Cities, Regions, Zones, Areas etc.
 For example, the yield of agricultural output per hectare for different countries in some given period or
the density of the population (per square km.) in different countries of the world etc.
ii. Chronological Classification

 Chronological classification is one in which the data are classified on the basis of differences in time.
 For instance, the production of an industrial concern for different periods; the profits of a big business
house over different years; the population of any country for different years.
 The time series data, which are quite frequent in Economic and Business Statistics, are generally
classified chronologically, usually starting with the first period of occurrence.
iii. Qualitative Classification

 In qualitative classification the data are classified according to the presence or absence of the attributes
in the given units.
 If the data are classified into only two classes with respect to an attribute, like its presence or absence,
the classification is termed simple or dichotomous. Examples: classifying a given population as
honest or dishonest; male or female; employed or unemployed; beautiful or not beautiful etc.
 However, if the given population is classified into more than two classes with respect to a given
attribute, it is said to be manifold classification. For example, for the attribute intelligence the
various classes may be, say, genius, very intelligent, average intelligent, below average and dull.
 Moreover, if the given population is divided into classes on the basis of simultaneous study of more
than one attribute at a time, the classification is again termed as manifold classification.
iv. Quantitative Classification

 This is the classification of data on the basis of a phenomenon that is capable of quantitative measurement, such as
age, height, weight, prices, production, income, expenditure, sales, profits, etc.
 The quantitative phenomenon under study is known as variable and hence this classification is also
sometimes called classification by variables.
 For example, the earnings of different stores may be classified as under:
Daily Earnings (Birr) of 50 Departmental Stores
Daily Earnings Number of Stores
Up to 100 6
101-200 14
201-300 8
301-400 10
401-500 12

 In the above classification, the daily earnings of the stores are termed as variable and the number of
stores in each class as the frequency. The above classification is termed as grouped frequency
distribution.
Frequency Distribution
Definitions:
 Raw data: Is recorded information in its original collected form, whether it is counts or
measurements.
 Frequency: is the number of values in a specific class of the distribution.
 Frequency distribution: is a summarized presentation of the values of a variable arranged in order of
magnitude either individually (in case of a discrete variable), or into classes (in case of a continuous
variable), or into categories (in case of qualitative data).
There are three basic types of frequency distributions:
1) Categorical frequency distribution
 This is used for data that can be placed in specific categories such as nominal or ordinal data.
 Example: marital status of 60 adults classified as single, married, divorced and widowed is given as:
Marital Status Single Married Divorced Widowed Total
Number of adults 25 20 8 7 60
2) Ungrouped Frequency distribution

 It is a table of all the potential raw score values that could possibly occur in the data along with the
number of times each actually occurred.
 It is often constructed for a small data set or for data on a discrete variable.
Constructing ungrouped frequency distribution

 First, find the smallest and largest raw score in the collected data
 Arrange the data in order of magnitude and count the frequency.
 To facilitate counting one may include a column of tallies.
Example: the following data represents the marks of 20 students

 80, 70, 65, 76, 76, 60, 60, 70, 90, 62, 63, 70, 85, 70, 74, 80, 80, 85, 75, 85
Marks Tally Frequency Marks Tally Frequency
60 // 2 75 / 1
62 / 1 76 // 2
63 / 1 80 /// 3
65 / 1 85 /// 3
70 //// 4 90 / 1
74 / 1
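As a minimal sketch, the same tally can be produced in Python by counting the 20 listed marks directly. `Counter` comes from the standard library; the variable names and the printing layout are only illustrative.

```python
from collections import Counter

marks = [80, 70, 65, 76, 76, 60, 60, 70, 90, 62,
         63, 70, 85, 70, 74, 80, 80, 85, 75, 85]

frequency = Counter(marks)            # maps each mark to its number of occurrences

for value in sorted(frequency):       # arrange the values in order of magnitude
    count = frequency[value]
    print(f"{value:>5}  {'/' * count:<6}  {count}")
```

Counting in this way also serves as a check on a hand-made tally: the frequencies must add up to the number of observations.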

3) Grouped frequency distribution
 This is a frequency distribution when several numbers are grouped in one class.
 We use a grouped frequency distribution when we are interested not in the individual values of the
variable but in the values of groups.
Basic terminologies
 Class- each group of data set
 Class limits- separate one class in a grouped frequency distribution from another. The limits could
actually appear in the data & have gaps between the upper limit of one class and the lower limit of the next.
 Units of measurement (d) - the gap between the upper limit of one class and the lower limit of the next class.
 Class boundaries- separate one class in a grouped frequency distribution from another. The
boundaries have one more decimal place than the raw data and therefore do not appear in the data.
There is no gap between the upper boundary of one class and the lower boundary of the next class.
The lower class boundary is obtained by subtracting d/2 from the corresponding lower class limit
and the upper class boundary is obtained by adding d/2 to the corresponding upper class limit.
 Class width (W) – the difference between the lower class limits (or the upper class limits, or the boundaries) of any two
consecutive classes, or the difference between two consecutive class marks.
 Class mark or mid point – it is the average of the upper and lower class limits of any class.
Example:
Consider the following distribution of marks of 200 students in an examination, arranged serially in
order of their roll numbers.
Table 1: Marks of 200 Students

70 45 33 64 50 25 65 75 30 20
55 60 65 58 52 36 45 42 35 40
51 47 39 61 53 59 49 41 15 53
42 63 78 65 45 63 54 52 48 46
57 53 55 42 45 39 64 35 26 18
41 53 48 21 28 49 42 36 41 29
30 33 37 35 29 37 38 40 32 49
43 32 24 38 38 22 41 50 17 46
46 50 26 15 23 42 25 52 38 46
41 38 40 37 40 48 45 30 28 31
46 40 32 34 44 54 35 39 31 48
48 50 43 55 43 39 41 48 53 34
32 31 42 34 34 32 33 24 43 39
40 50 27 47 34 44 34 33 47 42
17 42 57 35 38 17 33 46 36 23
42 21 51 37 42 37 38 42 49 52
38 53 57 47 59 61 33 17 71 39
44 42 39 16 17 27 19 54 51 39
43 42 16 37 67 62 39 51 53 41
53 59 37 27 29 33 34 42 22 31

 Data in the above form are called raw or disorganized data. In the raw form the data are
so unwieldy and scattered that the various details contained in them remain obscure and
incomprehensible.
 The above presentation of the data in its raw form does not give us any useful information and is
rather confusing to the mind.
 Our objective will be to express the huge mass of data in a suitable condensed form which will
highlight the significant facts, facilitate comparisons and furnish more useful information.
Step -1
A better presentation of the above raw data would be to arrange them in an ascending or descending
order of magnitude, which is called the 'arraying' of the data. However, this presentation (arraying),
though better than the raw data, does not reduce the volume of the data.
Step-2
A much better way of the representation of the data is to express it in the form of a discrete or
ungrouped frequency distribution where we count the number of times each value of the variable
(marks in the above illustration) occurs in the above data. This is facilitated through the technique of
Tally-Marks or Tally-Bars as explained below.
Table 2: Marks of 200 Students
Marks Tally Bars Frequency Marks Tally Bars Frequency
15 || 2 42 ||||| ||||| |||| 14
16 || 2 43 ||||| 5
17 ||||| 5 44 ||| 3
18 | 1 45 ||||| 5
19 | 1 46 ||||| | 6
20 | 1 47 |||| 4
21 || 2 48 ||||| | 6
22 || 2 49 |||| 4
23 || 2 50 ||||| 5
24 || 2 51 |||| 4
25 || 2 52 |||| 4
26 || 2 53 ||||| ||| 8
27 ||| 3 54 ||| 3
28 || 2 55 ||| 3
29 ||| 3 57 ||| 3
30 ||| 3 58 | 1
31 |||| 4 59 ||| 3
32 ||||| 5 60 | 1
33 ||||| || 7 61 || 2
34 ||||| || 7 62 | 1
35 ||||| 5 63 || 2
36 ||| 3 64 || 2
37 ||||| || 7 65 ||| 3
38 ||||| ||| 8 67 | 1
39 ||||| |||| 9 70 || 2
40 ||||| | 6 75 | 1
41 ||||| || 7 78 | 1
Step -3: Arranging the data into groups
If the identity of the units about whom a particular information is collected is not relevant nor is the
order in which the observations occur, then the first real step of condensation consists in classifying
the data into different classes by dividing the entire range of the values of the variable into a suitable
number of groups called classes and then recording the number of observation in each group (class).
To construct groups or classes for the data, follow these steps:
 Find the largest and smallest values: in our case 78 and 15
 Compute the range (the difference between the two values) : in our case 78 – 15 = 63
 Determine the number of classes or groups, usually between 5 and 20. As a guide one may use
'Sturges' rule', i.e. $k = 1 + 3.322 \log_{10} N$, where k = no. of classes and N = the total
number of observations. For N = 200 the rule gives k ≈ 8.6, i.e. about 9 classes; in this illustration
13 classes are used, which leads to the convenient class width of 5 found below.
 Find the class width (W): it is calculated by dividing the range by the number of classes and
rounding up, not off. In our case $W = \frac{R}{k} = \frac{63}{13} \approx 4.85$, which is rounded up to 5.
 Take the minimum value as the lower class limit of the first class and then add the width to find the
rest of the lower limits. To find the upper limit of the first class, count 5 values in the first class
and take the last one as the upper limit, and then add the width to find the rest of the upper limits.
 Find frequencies for each class
Marks of 200 Students
Marks Frequency Marks Frequency
(X) (f) (X) (f)
15-19 11 50-54 24
20-24 9 55-59 10
25-29 12 60-64 8
30-34 26 65-69 4
35-39 32 70-74 2
40-44 35 75-79 2
45-49 25
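The class-construction steps above can be sketched in Python as follows. The function names are hypothetical helpers, not part of any library, and the sketch assumes integer data with inclusive class limits. Evaluated literally for N = 200, Sturges' formula gives about 8.6 (9 after rounding up); the table above uses 13 classes of width 5, which the sketch also reproduces.

```python
import math

def sturges_classes(n):
    """Number of classes suggested by Sturges' rule, rounded up."""
    return math.ceil(1 + 3.322 * math.log10(n))

def class_width(smallest, largest, k):
    """Range divided by the number of classes, rounded up (not off)."""
    return math.ceil((largest - smallest) / k)

def class_limits(smallest, largest, width):
    """Inclusive (lower, upper) limits for integer data, starting at the minimum."""
    limits = []
    lower = smallest
    while lower <= largest:
        limits.append((lower, lower + width - 1))
        lower += width
    return limits

def grouped_frequencies(data, limits):
    """Count how many observations fall inside each inclusive class."""
    return [sum(lo <= x <= hi for x in data) for lo, hi in limits]

k_sturges = sturges_classes(200)              # rule-of-thumb class count for N = 200
width_13 = class_width(15, 78, 13)            # width when 13 classes are used
limits_13 = class_limits(15, 78, width_13)    # 15-19, 20-24, ..., 75-79
print(k_sturges, width_13, limits_13)
```

Passing the 200 raw marks to `grouped_frequencies` together with `limits_13` would give the class frequencies; the raw data are not repeated here to keep the sketch short.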
Types of Class Interval

(a) Inclusive Type Classes
 The classes of the type 30-39, 40-49, 50-59, 60-69, etc., in which both the upper and lower limits
are included, are called “inclusive classes”.
 For instance the class interval 40-49 includes all the values from 40 to 49, both inclusive. The next
value, 50 is included in the next class 50-59 and so on. However, the fractional values between 49
and 50 cannot be accounted for in such a classification.
 Hence the 'inclusive type' of classification may be used for a grouped frequency distribution of a
discrete variable where the variable takes only integral values, but not for continuous variables.

(b) Exclusive Type Classes
 The classes of the type 15-20, i.e. 15 ≤ X < 20; 20-25, i.e. 20 ≤ X < 25; 25-30, i.e. 25 ≤ X < 30, etc.
 Such classes, in which the upper limits are excluded from the respective classes and are included in the
immediate next class, are termed 'exclusive classes'.
Open End Classes

 The classification is termed 'open end classification' if the lower limit of the first class or the
upper limit of the last class or both are not specified, and such classes in which one of the limits is
missing are called 'open end classes'.
 For example, classes like marks less than 20, age above 60 years, salary not exceeding Birr
100 or salaries over Birr 200, etc., are 'open end classes' since one of the class limits (lower or
upper) is not specified in them.
 As far as possible, open end classes should be avoided since in such classes the mid-value or class
mark cannot be accurately obtained and this poses problems in the computation of various statistical
measures for further processing of the data.
 Moreover, open end classes present problems in graphic presentation of the data also.
Remark
 In case of open end classes, it is customary to estimate the class mark or mid-value for the first class
with reference to the succeeding class (i.e.2nd class). In other words, we assume that the magnitude
of the first class is same as that of second class.
 Similarly the mid-value of the last class is determined with reference to the preceding class i.e., last
but one class. This assumption will, of course, introduce some error in the calculation of further
statistical measures (averages, dispersion, etc.).
Cumulative Frequency Distribution
 A frequency distribution simply tells us how frequently a particular value of the variable is
occurring. However, if we want to know the total number of observations with a value 'less than' or
'more than' a particular value of the variable, this frequency table fails to furnish the information.
 This information can be obtained very conveniently from the 'cumulative frequency distribution',
which is obtained by successively adding the frequencies of the values of the variable (classes)
according to a certain law.
 The laws used are of the 'less than' and 'more than' type, giving rise to the 'less than cumulative frequency
distribution' and the 'more than cumulative frequency distribution'.
Let us consider the following distribution of marks of 70 students in a test:
Marks No of Students
30 – 35 5
35 – 40 10
40 – 45 15
45 – 50 30
50 – 55 5
55 - 60 5
Total 70

a) Less Than Cumulative Frequency
Less than cumulative frequency for any value of the variable (class) is obtained on adding successively
the frequencies of all the previous values (or classes), including the frequency of variable (class) against
which the totals are written, provided the values (classes) are arranged in ascending order of magnitude.
‘Less Than’ Cumulative Frequency Distribution of Marks of 70 Students
Marks Frequency ‘Less than’ C.F
(f) (L.C.F.)
30 – 35 5 5
35 – 40 10 5+ 10 = 15
40 – 45 15 15+15 = 30
45 – 50 30 30+30 = 60
50 – 55 5 60+ 5 = 65
55 - 60 5 65+ 5 = 70
The above 'less than' cumulative frequency distribution can also be written as follows:
Marks Frequency
Less than 30 0
“ “ 35 5
“ “ 40 15
“ “ 45 30
“ “ 50 60
“ “ 55 65
“ “ 60 70
Total 70
b) More Than Cumulative Frequency
The 'more than cumulative frequency' is obtained similarly by finding the cumulative totals of
frequencies starting from the highest value of the variable (class) to the lowest value (class).
‘More Than’ Cumulative Frequency Distribution of Marks of 70 Students
Marks Frequency ‘More than’ c.f
(f) (M.C.F.)
30 – 35 5 65+ 5 = 70
35 – 40 10 55+ 10= 65
40 – 45 15 40+15 = 55
45 – 50 30 10+30 = 40
50 – 55 5 5+ 5 = 10
55 - 60 5 5
The above 'more than' c.f. distribution can also be expressed in the following form:
Marks No of student
More than 30 70
“ “ 35 65
“ “ 40 55
“ “ 45 40
“ “ 50 10
“ “ 55 5
“ “ 60 0
Total 70
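Both cumulative columns are running totals, so they can be reproduced with a short sketch. It uses `itertools.accumulate` from the Python standard library; the variable names are illustrative.

```python
from itertools import accumulate

classes = ["30-35", "35-40", "40-45", "45-50", "50-55", "55-60"]
freq = [5, 10, 15, 30, 5, 5]

# 'Less than' c.f.: running total starting from the first (lowest) class.
less_than = list(accumulate(freq))

# 'More than' c.f.: running total starting from the last (highest) class,
# then put back into the original class order.
more_than = list(accumulate(reversed(freq)))[::-1]

for label, f, lcf, mcf in zip(classes, freq, less_than, more_than):
    print(f"{label:>7}  {f:>3}  {lcf:>3}  {mcf:>3}")
```

The last 'less than' value and the first 'more than' value both equal the total frequency, which is a quick check on the arithmetic.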

Remarks:
 Cumulative frequency distribution is of particular importance in the computation of median,
quartiles and other partition values of a given frequency distribution.
 In the 'less than' cumulative frequency distribution, the c.f. refers to the upper limit of the
corresponding class, whereas in the 'more than' distribution it refers to the lower limit.
2.3. Methods of Data Presentation

Graphic presentation
 A graphic presentation includes a variety of graphs like bar graphs, line graphs, polygon graphs that
are useful for describing data sets having distinct values.
1) Bar Graphs
 A bar graph is a graphical presentation which plots the successive values with their frequencies
using bars. All bars in the bar graph have equal width.
Example: Consider the frequency table of the final results of 30 students

Value Frequency Value Frequency
40 3 65 2
45 2 70 2
50 4 75 6
60 3
2) Frequency Polygon
 A frequency polygon is a graphical presentation which plots the successive values of a data set with
their frequencies and connects the plotted points with a straight line.
 Example: The above data could be represented by a frequency polygon as:

 A polygon is a closed figure. To close the polygon in figure 2, we have to add two values,
one below the lowest value and one above the highest value, each with zero frequency; for example, in the above
graph we add the lower value 35 and the upper value 80 with zero frequencies.
3) Pie chart
 It is used to plot relative frequencies: a circle is sliced up into distinct sectors, typically when the data
are non-numerical. The area of each sector represents the relative frequency of the value of the item.
 If the relative frequency of a data value is $\frac{f}{n}$, then its sector takes up the fraction $\frac{f}{n}$ of the
angle of the circle; i.e. the angle of the sector is $\frac{f}{n} \times 360^\circ$. The angle at the center of the circle is $360^\circ$.
Example:
Items Expenditure
Food 160 birr
Cloths 80 birr
House rent 120 birr
Education 40 birr
Total 400 birr
Solution:
 To express this data in a pie chart, 1st determine the proportion of each sector in the total area.
Items Expenditure Angle of each sector in degrees
Food 160 birr (160/400) x 360° = 144°
Cloths 80 birr (80/400) x 360° = 72°
House rent 120 birr (120/400) x 360° = 108°
Education 40 birr (40/400) x 360° = 36°
Total 400 birr (400/400) x 360° = 360°
 Using this table, it could be possible to represent the data in a pie chart as follows:
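The sector angles can be verified with a few lines of Python. This is only a sketch of the arithmetic (each item's share of the total multiplied by 360 degrees); it does not draw the chart, and the dictionary name is illustrative.

```python
expenditure = {"Food": 160, "Cloths": 80, "House rent": 120, "Education": 40}  # birr

total = sum(expenditure.values())

# Angle of each sector = (item expenditure / total expenditure) x 360 degrees.
angles = {item: amount * 360 / total for item, amount in expenditure.items()}

for item, angle in angles.items():
    print(f"{item:<11} {angle:6.1f} degrees")
```

The angles must add up to 360 degrees, since the sectors together fill the whole circle.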
4) Histograms
 Histogram is a bar graph with the bars placed adjacent to each other. The vertical axis of a
histogram can represent either the class frequency in a frequency histogram or relative class
frequency in a relative frequency histogram.
Example: The following table shows the distribution of the life time of 485 radio tubes.
Life time (in hours) No of tubes Life time (in hours) No of tubes
300 – 400 60 700 – 800 60
400 – 500 40 800 – 900 80
500 – 600 80 900 - 1000 45
600 – 700 120
 Represent the above frequency distribution by Histogram

UNIT THREE
DESCRIPTIVE STATISTICS AND STATISTICAL SUMMARIZATION
3.1 Measures of Central Tendency

 They are those measures which enable us to represent a large set of raw data by a single value so
that its meaningful essentials can be extracted.
 Numerical descriptive measures provide precise, objectively determined values that are easy to
manipulate, interpret and compare with one another.
3.1.1 The Mean
The Arithmetic Mean and its Properties
Sample Mean ($\bar{X}$)
 Suppose we have a sample of n data whose values are given by x1, x2, ..., xn. The sample mean,
which we designate by $\bar{X}$, is defined by:
$\bar{X} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \dots + x_n}{n}$
Example: What is the average monthly income of the 10 students in a class given below?
300 1200 800 500 750 2000 1500 1800 350 600
Solution: $\bar{X} = \frac{300 + 1200 + 800 + 500 + 750 + 2000 + 1500 + 1800 + 350 + 600}{10} = \frac{9800}{10} = 980$
Properties of a sample mean

Suppose we have a sample of n data whose values are given by x1, x2, …, xn, and the sample mean is $\bar{X}$.
 If each data value is increased or decreased by a constant amount c, then this causes the sample
mean to be increased or decreased by c.
 Mathematically, we can express this as: let $Y_i = X_i + c$ (i = 1, 2, ..., n); then $\bar{Y} = \bar{X} + c$.
 Similarly, let $Y_i = X_i - c$ (for i = 1, 2, ..., n); then $\bar{Y} = \bar{X} - c$, where $\bar{Y}$ is the sample mean of the
new data set after increasing or decreasing each data value by a constant c.
 If each data value is multiplied by a constant amount c, then the new sample mean ($\bar{X}_n$) will be
given by: $\bar{X}_n = c\,\bar{X}$
 Similarly, if each data value is divided by a constant c (c ≠ 0), then the new sample mean ($\bar{X}_n$) will
be given by: $\bar{X}_n = \frac{1}{c}\,\bar{X}$
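These properties can be illustrated numerically with the ten monthly incomes from the earlier example. The constant c = 100 is an arbitrary value chosen only for this sketch.

```python
from statistics import mean

incomes = [300, 1200, 800, 500, 750, 2000, 1500, 1800, 350, 600]
c = 100  # arbitrary constant, chosen only for illustration

base = mean(incomes)                        # original sample mean
shifted = mean([x + c for x in incomes])    # every value increased by c
reduced = mean([x - c for x in incomes])    # every value decreased by c
scaled = mean([x * c for x in incomes])     # every value multiplied by c
divided = mean([x / c for x in incomes])    # every value divided by c

print(base, shifted, reduced, scaled, divided)
```

In each case the new mean follows from the old one by applying the same operation to it, without recomputing the sum.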
When the data are arranged in a frequency table, the sample mean can be expressed as the sum of the
products of the data values and their frequencies, divided by the size of the data set (i.e. the sum of
frequencies). Suppose we have a sample of n data whose values are given by x1, x2, …, xn with
frequencies f1, f2, …, fn; the sample mean is given by:
$\bar{X} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i} f_i} = \frac{f_1 x_1 + f_2 x_2 + \dots + f_n x_n}{f_1 + f_2 + \dots + f_n}$, where: $n = \sum f_i$
Example:
Suppose the following data represent the number of patients served in a clinic per day
Number of patients Number of days
1 1
5 3
10 5
12 2
15 4
 Find the average number of patients served in the given clinic per day.
Solution: The sample mean for data arranged in a frequency table is computed as
 $\bar{X} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum f_i}$, where: fi represents the number of days and xi the number of patients
$\bar{X} = \frac{1 \times 1 + 5 \times 3 + 10 \times 5 + 12 \times 2 + 15 \times 4}{1 + 3 + 5 + 2 + 4} = \frac{1 + 15 + 50 + 24 + 60}{15} = \frac{150}{15} = 10$
Sometimes the data values in the data set may have different importance, & as a result we may attach
different weights (wi) to them. In this case, the sample mean is said to be a weighted average ($\bar{x}_w$) & is given as:
 $\bar{x}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i} = \frac{w_1 x_1 + w_2 x_2 + \dots + w_n x_n}{w_1 + w_2 + \dots + w_n}$, where: wi is the weight of xi
Example: Suppose a student takes four courses: Statistics, Economics, Mathematics and Basic English.
Course Title Credit hours Grade obtained Scale
Statistics 4 B 3
Economics 3 C 2
Mathematics 3 A 4
Basic English 3 C 2
 Find the Grade Point Average (GPA)
Solution:
 GPA = $\frac{4 \times 3 + 3 \times 2 + 3 \times 4 + 3 \times 2}{4 + 3 + 3 + 3} = \frac{12 + 6 + 12 + 6}{13} = \frac{36}{13} \approx 2.77$
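As a check on the weighted average, the GPA can be computed in Python with the credit hours as weights. With the credit hours as listed in the table (4, 3, 3, 3), the weights sum to 13. The dictionary names are illustrative.

```python
credit_hours = {"Statistics": 4, "Economics": 3, "Mathematics": 3, "Basic English": 3}
grade_points = {"Statistics": 3, "Economics": 2, "Mathematics": 4, "Basic English": 2}

# Weighted sum of grade points, using credit hours as the weights w_i.
weighted_sum = sum(credit_hours[c] * grade_points[c] for c in credit_hours)
total_credits = sum(credit_hours.values())
gpa = weighted_sum / total_credits

print(weighted_sum, total_credits, round(gpa, 2))
```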
Grand mean ($\bar{X}_G$)
 Suppose we have two distinct samples of sizes n1 and n2. If the sample mean of the first sample is
$\bar{X}_1$ and that of the second is $\bar{X}_2$, then the sample mean of the combined sample of size n1 + n2 is
called the Grand Mean.
 The Grand Mean is denoted by $\bar{X}_G$ and is given by: $\bar{X}_G = \frac{n_1 \bar{X}_1 + n_2 \bar{X}_2}{n_1 + n_2}$

Example:
A company runs two manufacturing plants. A sample of 30 employees at plant 1 gets a mean salary of
1000 birr. A sample of 20 employees at plant 2 gets a mean salary of 1200 birr. What is the average
salary for all 50 employees?
Solution: The average salary for all 50 employees is given by:
$\bar{X}_G = \frac{n_1 \bar{X}_1 + n_2 \bar{X}_2}{n_1 + n_2} = \frac{30(1000) + 20(1200)}{30 + 20} = \frac{30,000 + 24,000}{50} = \frac{54,000}{50} = 1080$ birr
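A minimal sketch of the grand mean follows; `grand_mean` is a hypothetical helper written for this illustration. It accepts any number of samples, although the handout defines the formula for two.

```python
def grand_mean(sizes, means):
    """Mean of the combined sample: each sample mean weighted by its sample size."""
    return sum(n * m for n, m in zip(sizes, means)) / sum(sizes)

# Plant 1: 30 employees with mean 1000 birr; plant 2: 20 employees with mean 1200 birr.
combined = grand_mean([30, 20], [1000, 1200])
print(combined)
```

Note that the result is closer to 1000 than to 1200 because the first plant contributes more employees.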
Harmonic mean ($\bar{X}_{H.M}$)

 Suppose we have a sample of n data whose values are given by x1, x2, ..., xn. The harmonic mean,
denoted by $\bar{X}_{H.M}$, is given by: $\bar{X}_{H.M} = \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \dots + \frac{1}{x_n}} = \frac{n}{\sum \frac{1}{x_i}}$
Example: Suppose the values of a data set are 3, 5 and 4. Find their harmonic mean.
Solution
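$\bar{X}_{H.M} = \frac{3}{\frac{1}{3} + \frac{1}{5} + \frac{1}{4}} = \frac{3}{\frac{20 + 12 + 15}{60}} = \frac{3}{\frac{47}{60}} = \frac{3(60)}{47} = \frac{180}{47} = 3.83$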
When the data are arranged in a frequency table, we can also compute the harmonic mean. Suppose we
have a sample of n data whose values are given by X1, X2, …….,Xn with frequencies f1, f2 ………., fn.
The harmonic mean of these observations is given by:
 $\bar{X}_{H.M} = \frac{n}{\sum \frac{f_i}{X_i}} = \frac{n}{\frac{f_1}{X_1} + \frac{f_2}{X_2} + \dots + \frac{f_n}{X_n}}$, where: $n = \sum f_i$
Example: The following is a frequency table of the ages of a sample of students in a certain college.
Age value Frequency
20 5
22 10
24 12
26 18
Solution:
 The harmonic mean is given by:
$\bar{X}_{H.M} = \frac{n}{\sum \frac{f_i}{X_i}} = \frac{45}{\frac{5}{20} + \frac{10}{22} + \frac{12}{24} + \frac{18}{26}} = \frac{45}{0.250 + 0.455 + 0.500 + 0.692} = \frac{45}{1.897} = 23.7$
 However, if you compute the sample mean, you will get
$\bar{X} = \frac{\sum f_i X_i}{\sum f_i} = \frac{20 \times 5 + 22 \times 10 + 24 \times 12 + 26 \times 18}{5 + 10 + 12 + 18} = \frac{1076}{45} = 23.9$
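Both harmonic-mean examples can be checked with a small Python sketch. `harmonic_mean` here is a hypothetical helper written out by hand so that the formula stays visible; it assumes all values are positive.

```python
def harmonic_mean(values, freqs=None):
    """n divided by the sum of f_i / x_i; every frequency is 1 when none are given."""
    if freqs is None:
        freqs = [1] * len(values)
    return sum(freqs) / sum(f / x for f, x in zip(freqs, values))

simple = harmonic_mean([3, 5, 4])                        # ungrouped example
ages = harmonic_mean([20, 22, 24, 26], [5, 10, 12, 18])  # frequency-table example

print(round(simple, 2), round(ages, 1))
```

Carrying full precision through the division avoids the small drift that comes from rounding each term f_i / x_i to two decimals before adding.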

 So far, we discussed the methods useful to compute the sample mean for ungrouped data. However,
data may be arranged in a class interval. The method useful in computing sample mean for grouped
data is discussed as follows:
Geometric Mean ($\bar{X}_{G.M}$)
 Suppose we have a sample of n data whose values are given by X1, X2, ..., Xn. The Geometric
Mean of these observations is given by:
$\bar{X}_{G.M} = \sqrt[n]{X_1 \cdot X_2 \cdot X_3 \cdots X_n}$, where n is the number of observations.
Example: Suppose the values of a data set are 2, 3 and 36. Find the geometric mean.
Solution: $\bar{X}_{G.M} = \sqrt[3]{2 \times 3 \times 36} = \sqrt[3]{216} = 6$
When the data are arranged in a frequency table, we can compute the geometric mean as follows.
Suppose we have a sample of n data whose values are given by x1, x2, ..., xn with frequencies f1, f2, ...,
fn. The geometric mean of these observations is given by:
 $\bar{X}_{G.M} = \sqrt[n]{X_1^{f_1} \cdot X_2^{f_2} \cdots X_n^{f_n}}$, where: $n = \sum f_i$ is the total number of observations.
Example: Suppose the values of a data set are given in a frequency table as follows.
Value frequency
1 2
2 3
3 1
4 4
 Solution: The geometric mean $\bar{X}_{G.M}$ is given by: $\bar{X}_{G.M} = \sqrt[10]{1^2 \cdot 2^3 \cdot 3^1 \cdot 4^4} = \sqrt[10]{6144} = 2.39$
Relationship among Arithmetic Mean, Harmonic Mean and Geometric Mean
 Suppose a sample of data has two values X1 and X2.
 The arithmetic mean ($\bar{X}_{A.M}$) of these two observations is given by: $\bar{X}_{A.M} = \frac{X_1 + X_2}{2}$
 The harmonic mean ($\bar{X}_{H.M}$) of these two observations is given by: $\bar{X}_{H.M} = \frac{2}{\frac{1}{X_1} + \frac{1}{X_2}} = \frac{2 X_1 X_2}{X_1 + X_2}$
 The geometric mean ($\bar{X}_{G.M}$) of these two observations is given by: $\bar{X}_{G.M} = \sqrt{X_1 X_2}$
 From the harmonic mean, we obtain
$$2 X_1 X_2 = (X_1 + X_2)\,\bar{X}_{H.M}$$
Dividing both sides by 2, we obtain
$$X_1 X_2 = \left(\frac{X_1 + X_2}{2}\right) \bar{X}_{H.M}$$
Substituting $\bar{X}_{A.M} = \frac{X_1 + X_2}{2}$ into the expression, we obtain
$$X_1 X_2 = \bar{X}_{A.M} \cdot \bar{X}_{H.M}$$
Taking the square root of both sides, we obtain
$$\sqrt{X_1 X_2} = \sqrt{\bar{X}_{A.M} \cdot \bar{X}_{H.M}}$$
 Thus, for data having only two observations, the geometric mean is the square root of the product of the arithmetic and harmonic means.
Remark:
 If the two observations X1 and X2 are equal and positive, then $\bar{X}_{G.M} = \bar{X}_{H.M} = \bar{X}_{A.M}$.
 In general, for positive observations, the relationship between $\bar{X}_{A.M}$, $\bar{X}_{G.M}$ and $\bar{X}_{H.M}$ is $\bar{X}_{H.M} \le \bar{X}_{G.M} \le \bar{X}_{A.M}$.
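The two-observation relationship can be checked numerically. The sketch below uses the values 4 and 9, which are an assumption chosen for illustration (they do not appear in the handout), and confirms that the geometric mean is the square root of the product of the arithmetic and harmonic means, and that the harmonic mean is the smallest of the three.

```python
import math


def three_means(x1, x2):
    """Return (arithmetic, geometric, harmonic) means of two positive values."""
    am = (x1 + x2) / 2
    gm = math.sqrt(x1 * x2)
    hm = 2 * x1 * x2 / (x1 + x2)
    return am, gm, hm


am, gm, hm = three_means(4, 9)       # illustrative values
print(am, gm, round(hm, 4))          # 6.5 6.0 5.5385
print(math.isclose(gm ** 2, am * hm))  # True
print(hm <= gm <= am)                  # True

# When both observations are equal, the three means coincide.
print(three_means(5, 5))             # (5.0, 5.0, 5.0)
```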
1) Sample Arithmetic Mean for grouped data

 If the data set is given in class interval form, then the sample mean is given by:
$$\bar{X} = \frac{\sum f_i X_i}{\sum f_i}$$
Where: Xi represents the class mark of the ith class interval
fi represents the corresponding frequency of the ith class interval.
 The class mark (wi) is the average of the lower and upper class limits of a given class, or equivalently the average of the lower and upper class boundaries of that class, i.e.
 Class mark of the ith class: $w_i = \frac{LCL_i + UCL_i}{2}$
Where: LCLi represents the lower class limit of the ith class
UCLi represents the upper class limit of the ith class
Or
 Class mark of the ith class: $w_i = \frac{LCB_i + UCB_i}{2}$
Where: LCBi represents the lower class boundary of the ith class
UCBi represents the upper class boundary of the ith class
Example: Consider the class interval given for final results in statistics. Compute the sample mean
Class interval Frequency
40 – 49 5
50 – 59 4
60 – 69 5
70 – 79 8
80 – 89 7
90 – 99 1
Solution
 To compute the sample mean, first we have to find the class mark for each class. To find the class
mark, as indicated above, take the average of the lower and upper class limits. Using this formula the
class mark for each class is given in the table below.
Class interval Class mark (wi) Frequency (fi) (Xi )( fi)
40 – 49 44.5 5 222.5
50 – 59 54.5 4 218
60 – 69 64.5 5 322.5
70 – 79 74.5 8 596
80 – 89 84.5 7 591.5
90 – 99 94.5 1 94.5
Total 30 2045
 Then the sample mean is given by:
$$\bar{X} = \frac{\sum X_i f_i}{\sum f_i} = \frac{5(44.5) + 4(54.5) + 5(64.5) + 8(74.5) + 7(84.5) + 1(94.5)}{5 + 4 + 5 + 8 + 7 + 1} = \frac{222.5 + 218 + 322.5 + 596 + 591.5 + 94.5}{30} = \frac{2045}{30} = 68.17$$
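The grouped-data calculation above can be sketched in Python as follows. Each class is stored as a (lower limit, upper limit, frequency) tuple; this layout and the function name are assumptions made for the illustration.

```python
# (lower class limit, upper class limit, frequency) for the final results example
classes = [(40, 49, 5), (50, 59, 4), (60, 69, 5),
           (70, 79, 8), (80, 89, 7), (90, 99, 1)]


def grouped_mean(classes):
    """Sample mean for grouped data: sum(f_i * class mark_i) / sum(f_i)."""
    total = sum(f for _, _, f in classes)
    weighted = sum(f * (low + high) / 2 for low, high, f in classes)
    return weighted / total


print(round(grouped_mean(classes), 2))  # 68.17
```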
3.1.2 The Median
Sample median ($\tilde{X}$)
A statistic which is used to indicate the center of a data set but which is not affected by extreme values is the sample median. It is the middle value when the data are ranked or arranged from the smallest to the largest.
 If the number of data values is odd, then the sample median is the middle value, i.e. the median is the value corresponding to the $\left(\frac{n + 1}{2}\right)^{th}$ item, where n is the total number of observations.
 If the number of data values is even, then the sample median is the average of the two middle values, i.e. the median is the average of the values corresponding to the $\left(\frac{n}{2}\right)^{th}$ and $\left(\frac{n + 2}{2}\right)^{th}$ items.
Example: Find the sample median for the data representing number of items sold by a grocery in 5 days.
10, 28, 5, 12, 30
Solution:
 To find the median, first arrange the data in increasing order as follows: 5, 10, 12, 28, 30
Since the sample size is 5 (which is odd), the sample median is the 3rd smallest value. That is, the median number of items sold in the five days is 12.
 This median can also be obtained by taking the $\left(\frac{n + 1}{2}\right)^{th}$ item, which is the 3rd item (12).
Example: The following data represent the number of patients served in a certain clinic over 10 days. Find the sample median.
5, 13, 20, 2, 6, 18, 9, 15, 7, 18
Solution:
 To find the median, first arrange the data in increasing order
2, 5, 6, 7, 9, 13, 15, 18, 18, 20
Since the sample size is 10 (which is even), the median is the average of the two middle values. Thus, the median is the average of 9 and 13, which is $\frac{9 + 13}{2} = 11$.
 This median can also be obtained by taking the average of the $\left(\frac{n}{2}\right)^{th}$ and $\left(\frac{n + 2}{2}\right)^{th}$ items, i.e. the median is the average of the $\frac{10}{2}$ = 5th item, which is 9, and the $\frac{10 + 2}{2}$ = 6th item, which is 13.
 Thus, Median = $\frac{9 + 13}{2} = 11$
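The odd and even cases can be combined in one small function. This is a minimal sketch with an illustrative function name; Python lists are indexed from 0, so the kth item of the ordered data is ordered[k - 1].

```python
def sample_median(data):
    """Middle value for odd n, average of the two middle values for even n."""
    ordered = sorted(data)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # the ((n + 1) / 2)th item
    # average of the (n / 2)th and ((n + 2) / 2)th items
    return (ordered[mid - 1] + ordered[mid]) / 2


print(sample_median([10, 28, 5, 12, 30]))                   # 12
print(sample_median([5, 13, 20, 2, 6, 18, 9, 15, 7, 18]))   # 11.0
```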
Median for grouped data

 If the data set is given in class interval form, the median is given by:
$$\tilde{X} = LCB + \frac{\left(\frac{n}{2} - \sum f\right) w}{f_i}$$
Where: LCB is the lower class boundary of the median class
: fi is the frequency of the median class
: w is the class width
: $\sum f$ is the sum of frequencies of all classes before the median class
: n is the sample size, which is equal to the sum of all frequencies, i.e. $n = \sum f_i$
Example: Compute the median for the class interval given for final results in statistics.
40 – 49 5
50 – 59 4
60 – 69 5
70 – 79 8
80 – 89 7
90 - 99 1
Solution:
 To find the median, first we have to convert the class limits in the above table in to class boundaries
which are indicated in the table below.
Class interval Frequency Cumulative frequencies

39.5 – 49.5 5 5
49.5 – 59.5 4 9
59.5 – 69.5 5 14
69.5 – 79.5 8 22
79.5 – 89.5 7 29
89.5 – 99.5 1 30
 The median class is the class which contains the median value, i.e. the class which contains the value corresponding to the $\left(\frac{n + 1}{2}\right)^{th}$ item. Hence, in the above table, the median class is the class which contains the $\left(\frac{30 + 1}{2}\right)^{th} = \left(\frac{31}{2}\right)^{th}$ = 15.5th value.
 Therefore, the median class is 69.5 – 79.5, since the cumulative frequency first reaches 15.5 in this class (it rises from 14 to 22).
The lower class boundary of the median class is 69.5, the class width is 10, the frequency of the median class is 8, and the sum of the frequencies before the median class is 14. Thus, the median is given by:
$$\tilde{X} = LCB + \frac{\left(\frac{n}{2} - \sum f\right) w}{f_i} = 69.5 + \frac{\left(\frac{30}{2} - 14\right) 10}{8} = 69.5 + \frac{10}{8} = 69.5 + 1.25 = 70.75$$
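The grouped median can be sketched as a walk through the cumulative frequencies. One assumption to note: this sketch locates the median class as the first class whose cumulative frequency reaches n/2, whereas the text locates it through the ((n + 1)/2)th item; both rules pick the same class for this table, but they could differ in borderline cases. The function name and data layout are illustrative.

```python
boundaries = [(39.5, 49.5), (49.5, 59.5), (59.5, 69.5),
              (69.5, 79.5), (79.5, 89.5), (89.5, 99.5)]
freqs = [5, 4, 5, 8, 7, 1]


def grouped_median(boundaries, freqs):
    """LCB + (n/2 - sum of earlier frequencies) * width / frequency of median class."""
    n = sum(freqs)
    target = n / 2
    cumulative = 0  # sum of frequencies of the classes before the current one
    for (lcb, ucb), f in zip(boundaries, freqs):
        if f > 0 and cumulative + f >= target:
            return lcb + (target - cumulative) * (ucb - lcb) / f
        cumulative += f
    return None  # only reached when there are no observations


print(grouped_median(boundaries, freqs))  # 70.75
```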
Properties of median
 Unlike the sample mean, which uses all the data values, the median uses only one or two middle values.
 The median is not affected by extreme values. That is, even if there are a few extreme values (i.e. very small or very large values), the median will not be affected.
3.1.3 The mode

 It indicates the data value that occurs most frequently in the data set. Thus, to find the mode from a
frequency distribution table, we can take the value which corresponds to the highest frequency.
Remark:
 It is possible to have no mode if all observations occur an equal number of times. For example, in the following frequency distribution table, there is no mode, since all values occur an equal number of times.
Value 1 2 3 4
Frequency 5 5 5 5
 It is possible to have one mode if one observation occurs most frequently in the data set. For example, in the following frequency distribution table there is only one mode, which is the value 3, because it occurs most frequently in the data set. In this case, the data set is called unimodal.
Value 1 2 3 4 5
Frequency 3 2 6 5 1
 It is possible to have more than one mode if two or more observations occur most frequently in the data set. If exactly two values occur most frequently in the data set, then the data set is called bimodal, while if more than two values occur most frequently, then the data set is called multimodal.
Mode for grouped data

 Suppose the data set is given in class interval form. The mode is given by:
$$\hat{X} = LCB_i + \left(\frac{d_1}{d_1 + d_2}\right) w$$
Where: LCBi is the lower class boundary of the modal class
: d1 is the difference between the frequency of the modal class and the frequency of the preceding class
: d2 is the difference between the frequency of the modal class and the frequency of the succeeding class
: w is the class width

Example: Find the mode for the class interval given for final results in statistics.
40 – 49 5
50 – 59 4
60 – 69 5
70 – 79 8
80 – 89 7
90 – 99 1
Solution
 To find the mode, convert the class limits into class boundaries to obtain the following classes.
39.5 – 49.5 5
49.5 – 59.5 4
59.5 – 69.5 5
69.5 – 79.5 8
79.5 – 89.5 7
89.5 – 99.5 1
 From this table, the modal class is 69.5 – 79.5, which is the class with the highest frequency.
 LCBi = 69.5  d1 = 8 – 5 = 3  d2 = 8 – 7 = 1  w = 10
 Using this information, the mode is given by:
$$\hat{X} = 69.5 + \left(\frac{3}{3 + 1}\right) 10 = 69.5 + \frac{30}{4} = 69.5 + 7.5 = 77$$
3.2 Measures of Variation/Dispersion
Types of measures of dispersion
1) Range
 It is a measure of dispersion which can be computed by taking the difference between the largest and
the smallest observed values. Range = Maximum value – minimum value
 As you observe, the range uses only two extreme values, hence, it is not the best measure of
dispersion.
2) Mean Deviation
 The difference between a number in a data set and the mean of the data is called a deviation. It shows how much a value varies from the mean: Deviation = X – A
 Mean deviation is the average of the absolute deviations taken from a measure of central tendency, usually the mean or the median.
 Let X1, X2, X3, ..., Xn be n observed values. Then
$$M.D = \frac{\sum |X_i - A|}{n}$$
where A is a measure of central tendency.
Example
 Find the mean deviation from the mean for the following frequency distribution table, which represents the temperature of a certain city for 10 days.
Temperature 20°C 22°C 24°C 26°C
Number of days 3 4 1 2
Solution:
 To find the mean deviation from the mean, first let's compute the sample mean as follows:
$$\bar{X} = \frac{\sum f_i X_i}{n} = \frac{f_1 x_1 + f_2 x_2 + f_3 x_3 + f_4 x_4}{f_1 + f_2 + f_3 + f_4} = \frac{20(3) + 22(4) + 24(1) + 26(2)}{3 + 4 + 1 + 2} = \frac{60 + 88 + 24 + 52}{10} = \frac{224}{10} = 22.4$$
 Now, the mean deviation from the mean is given by
$$M.D = \frac{\sum f_i |X_i - \bar{X}|}{n} = \frac{3|20 - 22.4| + 4|22 - 22.4| + 1|24 - 22.4| + 2|26 - 22.4|}{3 + 4 + 1 + 2} = \frac{3(2.4) + 4(0.4) + 1(1.6) + 2(3.6)}{10} = \frac{17.6}{10} = 1.76$$
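The mean deviation for a frequency table can be sketched as follows (the function name is illustrative). Using absolute deviations, as the definition requires, the temperature example gives 1.76; squaring the deviations instead would give 4.64, which is a variance-type quantity rather than a mean deviation.

```python
def mean_deviation(values, freqs):
    """Average absolute deviation from the mean for a frequency table."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    return sum(f * abs(x - mean) for x, f in zip(values, freqs)) / n


temperatures = [20, 22, 24, 26]
days = [3, 4, 1, 2]

print(round(mean_deviation(temperatures, days), 2))  # 1.76
```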
3) Standard Deviation
 Standard deviation is a measure of dispersion that considers all values in a data set. We can compute
standard deviation for population values or for sample values.
 The standard deviation for a population is called the population standard deviation (σ) and the standard deviation for a sample is called the sample standard deviation (s).
 The population standard deviation (σ) is given by:
$$\sigma = \sqrt{\frac{\sum (X_i - \mu)^2}{N}}$$
Where: Xi is the ith population value
: μ is the population mean, given by $\mu = \frac{\sum X_i}{N}$
: N is the total number of observations in the population (population size)
 The square of the population standard deviation is the population variance, denoted by σ²:
$$\sigma^2 = \frac{\sum (X_i - \mu)^2}{N}$$
 The sample standard deviation (s) is given by:
$$s = \sqrt{\frac{\sum (x_i - \bar{X})^2}{n - 1}}$$
Where: xi is the ith sample value
: $\bar{X}$ is the sample mean, given by $\bar{X} = \frac{\sum x_i}{n}$
: n is the sample size
 The square of the sample standard deviation is the sample variance (s²):
$$s^2 = \frac{\sum (x_i - \bar{X})^2}{n - 1}$$

3.3. Measures of Location
 This is a continuation of the concept of the median. These measures describe the position of a datum at different locations in the ordered data set.
 The concept can be understood through the examination of quartiles, deciles and percentiles.
1) Quartiles /Q/
 Quartiles are measures of location which divide the data set into four equal parts. Depending on the value of i (i = 1, 2, 3), we have
 Q1 – 1st quartile  Q2 – 2nd quartile  Q3 – 3rd quartile
 Quartile (Qi), for i = 1, 2, 3, is the value corresponding to the $\left(\frac{in + 2}{4}\right)^{th}$ item, where n is the total number of observations.
 First Quartile (Q1) is a point which divides the data set so that 25% of the observations lie below it and 75% of the observations lie above it.
Q1 (first quartile) = the value corresponding to the $\left(\frac{n + 2}{4}\right)^{th}$ item
 Second Quartile (Q2) is a point which divides the data set so that 50% of the observations lie below it and 50% of the observations lie above it.
Q2 (second quartile) = the value corresponding to the $\left(\frac{n + 1}{2}\right)^{th}$ item
 Third Quartile (Q3) is a point which divides the data set so that 75% of the observations lie below it and 25% of the observations lie above it.
Q3 (third quartile) = the value corresponding to the $\left(\frac{3n + 2}{4}\right)^{th}$ item
Example:
 Find the 2nd quartile for the following observations, which show the number of items sold over 10 days in a shop:
12, 18, 15, 10, 8, 16, 4, 9, 18, 5
Solution
 First arrange the data set in increasing order:
4, 5, 8, 9, 10, 12, 15, 16, 18, 18
 The total number of observations is 10.
 Q2 = the value corresponding to the $\left(\frac{2n + 2}{4}\right)^{th}$ item
= the value corresponding to the $\left(\frac{11}{2}\right)^{th}$ item = the value corresponding to the 5.5th item
= the 5th item + ½ (the 6th item – the 5th item)
= 10 + ½ (12 – 10) = 10 + 1 = 11
2) Deciles /Di/
 A decile is also a measure of location. The deciles divide the data set into ten equal parts. Depending on the value of i (i = 1, 2, 3, 4, 5, 6, 7, 8, 9), we have
 D1 – 1st decile  D4 – 4th decile  D7 – 7th decile
 D2 – 2nd decile  D5 – 5th decile  D8 – 8th decile
 D3 – 3rd decile  D6 – 6th decile  D9 – 9th decile
 Decile (Di, i = 1, 2, 3, ..., 9) = the value corresponding to the $\left(\frac{in + 5}{10}\right)^{th}$ item
 First decile (D1) is a point which divides the data set so that 1/10 of the observations lie below it and 9/10 of the observations lie above it.
 Second decile (D2) is a point which divides the data set so that 2/10 of the observations lie below it and 8/10 of the observations lie above it.
 Third decile (D3) is a point which divides the data set so that 3/10 of the observations lie below it and 7/10 of the observations lie above it.
 In general, decile Di is a point which divides the data set so that $\frac{i}{10}$ of the observations lie below it and $\frac{10 - i}{10}$ of the observations lie above it.
Example: Find D3 for the observations which show the number of items sold over 10 days in a shop:
12, 18, 15, 10, 8, 16, 4, 9, 18, 5
Solution
 First arrange the data set in increasing order: 4, 5, 8, 9, 10, 12, 15, 16, 18, 18
 The total number of observations is 10.
 D3 = the value corresponding to the $\left(\frac{3(10) + 5}{10}\right)^{th}$ item
= the value corresponding to the $\left(\frac{35}{10}\right)^{th}$ item = the value corresponding to the 3.5th item
= the 3rd item + 0.5 (the 4th item – the 3rd item) = 8 + 0.5 (9 – 8) = 8.5
3) Percentile /Pi/
 A percentile is a measure of location. The percentiles divide the data set into 100 equal parts. Depending on the value of i (i = 1, 2, 3, ..., 99), we have
 P1 – 1st percentile  P2 – 2nd percentile  P3 – 3rd percentile  P4 – 4th percentile  ...  P99 – 99th percentile
 25th percentile (P25) is a point which divides the data set so that 25/100 = 1/4 of the observations lie below it and 75/100 = 3/4 of the observations lie above it. You will observe that P25 is equal to Q1, where 1/4 of the observations lie below it and 3/4 of the observations lie above it.
 50th percentile (P50) is a point which divides the data set so that 50/100 (half) of the observations lie below it and 50/100 (half) of the observations lie above it.
 Note that P50 is equal to Q2, which is also equal to D5, because all of them indicate a point which divides the data set so that half of the observations lie below it and half of the observations lie above it.
 In general, percentile Pi is a point which divides the data set so that $\frac{i}{100}$ of the observations lie below it and $\frac{100 - i}{100}$ of the observations lie above it.
 Percentile (Pi, i = 1, 2, 3, ..., 99) = the value corresponding to the $\left(\frac{in + 50}{100}\right)^{th}$ item.
Example
 Find P50 and P75 for the observations which show the number of items sold over 10 days in a shop:
12, 18, 15, 10, 8, 16, 4, 9, 18, 5
Solution
 First arrange the data set in increasing order: 4, 5, 8, 9, 10, 12, 15, 16, 18, 18
 The total number of observations is 10.
 P50 = the value corresponding to the $\left(\frac{50(10) + 50}{100}\right)^{th}$ item = the value corresponding to the $\left(\frac{550}{100}\right)^{th}$ item
= the value corresponding to the 5.5th item
= the 5th item + 0.5 (the 6th item – the 5th item)
= 10 + 0.5 (12 – 10) = 10 + 1 = 11, which is equal to D5 and Q2
 P75 = the value corresponding to the $\left(\frac{75(10) + 50}{100}\right)^{th}$ item = the value corresponding to the $\left(\frac{800}{100}\right)^{th}$ item
= the value corresponding to the 8th item = 16
Remark:
 From the above discussion, you can observe the following relationships:
 Q1 = P25  Q2 = D5 = P50  Q3 = P75
 D1 = P10  D2 = P20  D9 = P90
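The three position rules, (in + 2)/4, (in + 5)/10 and (in + 50)/100, share one pattern: the position is (i·n + parts/2)/parts, where parts is 4, 10 or 100. The sketch below uses that observation; the function names are illustrative, and clamping the position to the range 1..n is an added safeguard that the text does not discuss.

```python
def value_at_position(ordered, pos):
    """Value at a 1-based (possibly fractional) position, interpolating linearly."""
    n = len(ordered)
    pos = min(max(pos, 1), n)  # keep the position inside 1..n (added safeguard)
    whole = int(pos)
    frac = pos - whole
    if frac == 0:
        return ordered[whole - 1]
    return ordered[whole - 1] + frac * (ordered[whole] - ordered[whole - 1])


def located_value(data, i, parts):
    """Quartile (parts=4), decile (parts=10) or percentile (parts=100) number i."""
    n = len(data)
    pos = (i * n + parts / 2) / parts
    return value_at_position(sorted(data), pos)


sales = [12, 18, 15, 10, 8, 16, 4, 9, 18, 5]

print(located_value(sales, 2, 4))     # Q2  -> 11.0
print(located_value(sales, 3, 10))    # D3  -> 8.5
print(located_value(sales, 50, 100))  # P50 -> 11.0
print(located_value(sales, 75, 100))  # P75 -> 16
```

Because Q2, D5 and P50 all map to the same position, the function returns the same value for each of them.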
Quartile in a grouped data

 If the data set is given in class interval form, then the quartile is given by
$$Q_i = LCB_i + \frac{\left(\frac{in}{4} - \sum f\right) w}{f_i}, \quad i = 1, 2, 3$$
Where: LCBi is the lower class boundary of the quartile class
: $\sum f$ is the sum of frequencies of all classes below the quartile class
: fi is the frequency of the quartile class
: w is the class width
: n is the total number of observations

Example: Find Q1 and Q2 for the class interval given for final results in statistics:
40 – 49 5
50 – 59 4
60 – 69 5
70 – 79 8
80 – 89 7
90 – 99 1
Solution
 To find the quartiles, first we have to convert the class limits in the above table in to class boundaries
as indicated in the table below.
Class interval Frequency Cumulative frequencies

39.5 - 49.5 5 5
49.5 - 59.5 4 9
59.5 – 69.5 5 14
69.5 – 79.5 8 22
79.5 – 89.5 7 29
89.5 – 99.5 1 30
 The quartile class is the class which contains the respective quartile, i.e. the class containing the $i\left(\frac{n + 1}{4}\right)^{th}$ value.
 The first quartile class is the class which contains the value corresponding to the $\left(\frac{n + 1}{4}\right)^{th}$ item = the value corresponding to the $\left(\frac{31}{4}\right)^{th}$ item = the value corresponding to the 7.75th item, which is 49.5 – 59.5.
 The lower class boundary of the first quartile class is 49.5, the frequency of the first quartile class is 4, the class width is 10, the sum of frequencies for classes below the first quartile class is 5 and the total number of observations is 30.
 Using this information, the first quartile is given by:
$$Q_1 = 49.5 + \frac{\left(\frac{30}{4} - 5\right) 10}{4} = 49.5 + \frac{(7.5 - 5) 10}{4} = 49.5 + \frac{2.5(10)}{4} = 49.5 + 6.25 = 55.75$$
 The second quartile class is the class which contains the value corresponding to the $2\left(\frac{n + 1}{4}\right)^{th}$ item = the value corresponding to the $\left(\frac{31}{2}\right)^{th}$ item = the value corresponding to the 15.5th item, which is 69.5 – 79.5.
 The lower class boundary of the second quartile class is 69.5, the frequency of the second quartile class is 8, and the sum of frequencies for classes below the second quartile class is 14.
 Using this information, the second quartile is given by:
$$Q_2 = 69.5 + \frac{\left(\frac{30}{2} - 14\right) 10}{8} = 69.5 + \frac{10}{8} = 69.5 + 1.25 = 70.75$$
which is equal to the median value.
Deciles in a grouped data
 If the data set is given in class interval form, then the decile is given by
$$D_i = LCB_i + \frac{\left(\frac{in}{10} - \sum f\right) w}{f_i}, \quad i = 1, 2, 3, \ldots, 9$$
Where: LCBi is the lower class boundary of the decile class
: $\sum f$ is the sum of frequencies of all classes below the decile class
: fi is the frequency of the decile class
: w is the class width
Example: Find D2, and D5 for the class interval given for final results in statistics,
Class interval frequency Cumulative frequency
39.5 - 49.5 5 5
49.5 – 59.5 4 9
59.5 - 69.5 5 14
69.5 - 79.5 8 22
79.5 – 89.5 7 29
89.5 – 99.5 1 30
Solution
 The decile class is the class which contains the respective decile, i.e. it is the class which contains the value corresponding to the $i\left(\frac{n + 1}{10}\right)^{th}$ item.
 The second decile (D2) class is the class which contains the value corresponding to the $2\left(\frac{n + 1}{10}\right)^{th}$ item = the value corresponding to the $\left(\frac{n + 1}{5}\right)^{th}$ item = the value corresponding to the $\left(\frac{31}{5}\right)^{th}$ item = the value corresponding to the 6.2th item, which is 49.5 – 59.5.
 The lower class boundary of the second decile class is 49.5, the frequency of the second decile class is 4 and the sum of the frequencies of all classes below the second decile class is 5.
 Using this information, the second decile is given by:
$$D_2 = 49.5 + \frac{\left(\frac{2(30)}{10} - 5\right) 10}{4} = 49.5 + \frac{(6 - 5) 10}{4} = 49.5 + \frac{10}{4} = 49.5 + 2.5 = 52$$
 Similarly, the value of D5 lies in the class interval which contains the value corresponding to the $5\left(\frac{30 + 1}{10}\right)^{th}$ item, which is the value corresponding to the 15.5th item.
 This value lies in the class 69.5 – 79.5. The lower class boundary is 69.5, the frequency of the given class is 8 and the sum of frequencies below the given class is 14.
 Using this information, the fifth decile (D5) is given by:
$$D_5 = 69.5 + \frac{\left(\frac{5(30)}{10} - 14\right) 10}{8} = 69.5 + \frac{(15 - 14) 10}{8} = 69.5 + \frac{10}{8} = 70.75$$
Percentile in a grouped data

 If the data set is given in class interval form, then the percentile (Pi) is given by:
$$P_i = LCB_i + \frac{\left(\frac{in}{100} - \sum f\right) w}{f_i}, \quad i = 1, 2, 3, \ldots, 99$$
Where: LCBi is the lower class boundary of the percentile class
: $\sum f$ is the sum of frequencies of all classes below the percentile class
: fi is the frequency of the percentile class
: w is the class width
Example:
 Considering the class interval given for final results in statistics in the above example, find P25.
Solution:
 The percentile class is the class which contains the respective percentile, i.e. it is the class which contains the value corresponding to the $i\left(\frac{n + 1}{100}\right)^{th}$ item.
 The 25th percentile (P25) class is the class which contains the value corresponding to the $25\left(\frac{n + 1}{100}\right)^{th}$ item, which is the value corresponding to the 7.75th item. The class which contains the value corresponding to the 7.75th item is 49.5 – 59.5 and its lower class boundary is 49.5. The frequency of the given class is 4 and the sum of frequencies of all classes below the given class is 5.
 Using this information, the 25th percentile (P25) is given by
$$P_{25} = 49.5 + \frac{\left(\frac{25(30)}{100} - 5\right) 10}{4} = 49.5 + \frac{(7.5 - 5) 10}{4} = 49.5 + \frac{2.5(10)}{4} = 49.5 + 6.25 = 55.75$$
which is equal to the first quartile.
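The grouped quartile, decile and percentile formulas differ only in the divisor (4, 10 or 100), so one function can cover all three. This is a sketch under a stated assumption: the class is located as the first one whose cumulative frequency reaches i·n/parts, rather than through the i(n + 1)/parts position used in the text. Both rules select the same classes for this table. The function name and data layout are illustrative.

```python
boundaries = [(39.5, 49.5), (49.5, 59.5), (59.5, 69.5),
              (69.5, 79.5), (79.5, 89.5), (89.5, 99.5)]
freqs = [5, 4, 5, 8, 7, 1]


def grouped_quantile(boundaries, freqs, i, parts):
    """LCB + (i*n/parts - sum of earlier frequencies) * width / class frequency."""
    n = sum(freqs)
    target = i * n / parts
    cumulative = 0  # sum of frequencies of the classes below the current one
    for (lcb, ucb), f in zip(boundaries, freqs):
        if f > 0 and cumulative + f >= target:
            return lcb + (target - cumulative) * (ucb - lcb) / f
        cumulative += f
    return None  # only reached when there are no observations


print(grouped_quantile(boundaries, freqs, 1, 4))     # Q1  -> 55.75
print(grouped_quantile(boundaries, freqs, 2, 4))     # Q2  -> 70.75
print(grouped_quantile(boundaries, freqs, 2, 10))    # D2  -> 52.0
print(grouped_quantile(boundaries, freqs, 5, 10))    # D5  -> 70.75
print(grouped_quantile(boundaries, freqs, 25, 100))  # P25 -> 55.75
```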

AREAS UNDER THE STANDARD NORMAL CURVE
(The entries are the probabilities that a random variable having the standard normal distribution will
take on a value between 0 and z.)
z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
0.0 0.0000 0.0040 0.0080 0.0120 0.0160 0.0199 0.0239 0.0279 0.0319 0.0359
0.1 0.0398 0.0438 0.0478 0.0517 0.0557 0.0596 0.0636 0.0675 0.0714 0.0753
0.2 0.0793 0.0832 0.0871 0.0910 0.0948 0.0987 0.1026 0.1064 0.1103 0.1141
0.3 0.1179 0.1217 0.1255 0.1293 0.1331 0.1368 0.1406 0.1443 0.1480 0.1517
0.4 0.1554 0.1591 0.1628 0.1664 0.1700 0.1736 0.1772 0.1808 0.1844 0.1879
0.5 0.1915 0.1950 0.1985 0.2019 0.2054 0.2088 0.2123 0.2157 0.2190 0.2224
0.6 0.2257 0.2291 0.2324 0.2357 0.2389 0.2422 0.2454 0.2486 0.2517 0.2549
0.7 0.2580 0.2611 0.2642 0.2673 0.2704 0.2734 0.2764 0.2794 0.2823 0.2852
0.8 0.2881 0.2910 0.2939 0.2967 0.2995 0.3023 0.3051 0.3078 0.3106 0.3133
0.9 0.3159 0.3186 0.3212 0.3238 0.3264 0.3289 0.3315 0.3340 0.3365 0.3389
1.0 0.3413 0.3438 0.3461 0.3485 0.3508 0.3531 0.3554 0.3577 0.3599 0.3621
1.1 0.3643 0.3665 0.3686 0.3708 0.3729 0.3749 0.3770 0.3790 0.3810 0.3830
1.2 0.3849 0.3869 0.3888 0.3907 0.3925 0.3944 0.3962 0.3980 0.3997 0.4015
1.3 0.4032 0.4049 0.4066 0.4082 0.4099 0.4115 0.4131 0.4147 0.4162 0.4177
1.4 0.4192 0.4207 0.4222 0.4236 0.4251 0.4265 0.4279 0.4292 0.4306 0.4319
1.5 0.4332 0.4345 0.4357 0.4370 0.4382 0.4394 0.4406 0.4418 0.4429 0.4441
1.6 0.4452 0.4463 0.4474 0.4484 0.4495 0.4505 0.4515 0.4525 0.4535 0.4545
1.7 0.4554 0.4564 0.4573 0.4582 0.4591 0.4599 0.4608 0.4616 0.4625 0.4633
1.8 0.4641 0.4649 0.4656 0.4664 0.4671 0.4678 0.4686 0.4693 0.4699 0.4706
1.9 0.4713 0.4719 0.4726 0.4732 0.4738 0.4744 0.4750 0.4756 0.4761 0.4767
2.0 0.4772 0.4778 0.4783 0.4788 0.4793 0.4798 0.4803 0.4808 0.4812 0.4817
2.1 0.4821 0.4826 0.4830 0.4834 0.4838 0.4842 0.4846 0.4850 0.4854 0.4857
2.2 0.4861 0.4864 0.4868 0.4871 0.4875 0.4878 0.4881 0.4884 0.4887 0.4890
2.3 0.4893 0.4896 0.4898 0.4901 0.4904 0.4906 0.4909 0.4911 0.4913 0.4916
2.4 0.4918 0.4920 0.4922 0.4925 0.4927 0.4929 0.4931 0.4932 0.4934 0.4936
2.5 0.4938 0.4940 0.4941 0.4943 0.4945 0.4946 0.4948 0.4949 0.4951 0.4952
2.6 0.4953 0.4955 0.4956 0.4957 0.4959 0.4960 0.4961 0.4962 0.4963 0.4964
2.7 0.4965 0.4966 0.4967 0.4968 0.4969 0.4970 0.4971 0.4972 0.4973 0.4974
2.8 0.4974 0.4975 0.4976 0.4977 0.4977 0.4978 0.4979 0.4979 0.4980 0.4981
2.9 0.4981 0.4982 0.4982 0.4983 0.4984 0.4984 0.4985 0.4985 0.4986 0.4986
3.0 0.4987 0.4987 0.4987 0.4988 0.4988 0.4989 0.4989 0.4989 0.4990 0.4990
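Entries of this table can be regenerated with the error function from Python's standard math module, since the area under the standard normal curve between 0 and z equals 0.5·erf(z/√2). The function name below is illustrative.

```python
import math


def area_0_to_z(z):
    """Probability that a standard normal variable lies between 0 and z."""
    return 0.5 * math.erf(z / math.sqrt(2))


print(round(area_0_to_z(1.00), 4))  # 0.3413 (row 1.0, column 0.00)
print(round(area_0_to_z(1.96), 4))  # 0.475  (row 1.9, column 0.06)
```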

UNIT FOUR
PROBABILITY AND PROBABILITY DISTRIBUTION
Introduction
There are few things in our lives that are absolutely certain. This uncertainty makes life challenging and interesting. How boring it would be to know in advance everything that was going to happen to us! Life would be so much less fun if the mystery were taken out. We wouldn't need elections, because the winners and losers would be known beforehand. We wouldn't need much of a stock market, because everyone would know what stock prices would be tomorrow and every day after that. We wouldn't need sports events, because we would already know the outcomes.
The idea behind probabilities is to try to quantify these uncertainties. Uncertainty means that a variety of outcomes are possible. We can better understand this uncertainty and be more prepared for the possibilities if we use probabilities to describe which outcomes are likely and which are unlikely.
4.1. Elementary Probability Concepts and Theory

Concepts of Probability and Fundamental Terms
Definition:
 Probability as a general concept can be defined as a measure of the chance that something will occur.
 Probability theory is an integral part of all statistics, and in particular is essential to the theory of
statistical inference. It also helps decision makers to make decisions under uncertainty.
 In order to obtain a deeper understanding of probability, there is a need to define some terms in
probability discussion.
1) Random experiment
 A random experiment is a kind of experiment in which the outcome varies from one performance of the experiment to the next even though the conditions are the same.
Example:
 Toss a coin; the result of the experiment is either tail (T) or head (H).
 Toss a die; the result of the experiment is one of the numbers in the set {1, 2, 3, 4, 5, 6}.
2) Sample Space (S)

 A set that consists of all possible outcomes of a random experiment is called a sample space (S), and
 Each element in the sample space is called a sample point.
Example:
 In a random experiment of tossing a coin, the possible outcomes are head or tail: S = {H, T}
 In a random experiment of tossing a die, the possible outcomes are: S = {1, 2, 3, 4, 5, 6}
 Note: If a sample space has a finite number of sample points, then it is called a finite sample space.
Otherwise, it is called an infinite sample space.
3) Event /E/
 An event is a subset of the sample space.

 An event consisting of a single sample point is called a simple or elementary event.
 An event which consists of all sample points in the sample space is called a sure or certain event.
Example:
 Toss a coin once; the possible outcome is either H or T. From this experiment we can define different events.
 The sample space consists of all possible outcomes of the given experiment, i.e. S = {H, T}
 E1: The event that head occurs in tossing a coin once, i.e. E1 = {H}
 E2: The event that tail occurs in tossing a coin once, i.e. E2 = {T}
 E3: The event that head or tail occurs in tossing a coin once, i.e. E3 = {H, T} (certain event)
 E4: The event that the number 2 occurs in tossing a coin once. Since 2 is not in S, E4 = ∅ (impossible event)
 If there are two or more simple or elementary events, then by combining them using set operations we can form compound events.
For example, let's take the two events E1 and E2 mentioned in the above example, which are simple or elementary events. By combining them, it is possible to form different compound events. For instance,
 E1 ∪ E2: The union of the two events. It is the event that either E1 or E2 or both occur.
 E1 ∩ E2: The intersection of the two events. It is the event that both E1 and E2 occur together.
 E1′ (complement of E1): It is the event that consists of all elements in the sample space that are not in E1.
 If the set corresponding to the intersection of two events is the empty set, i.e. if the two events are disjoint (E1 ∩ E2 = ∅), then the events are mutually exclusive. In other words, if two or more events are mutually exclusive, then they can't occur together.
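The set operations on events map directly onto Python's built-in set type. The sketch below uses the coin-tossing sample space from the example; the variable names are illustrative.

```python
S = {"H", "T"}   # sample space for one toss of a coin
E1 = {"H"}       # event: head occurs
E2 = {"T"}       # event: tail occurs

union = E1 | E2          # E1 or E2 (or both) occur
intersection = E1 & E2   # E1 and E2 occur together
complement_E1 = S - E1   # outcomes in S that are not in E1

print(union == S)             # True: the union is the certain event
print(intersection == set())  # True: E1 and E2 are mutually exclusive
print(complement_E1 == E2)    # True: the complement of "head" is "tail"
print(E1 <= S)                # True: every event is a subset of S
```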
4) Die
 A die is a homogeneous cube with six faces marked with the numbers from 1 to 6.
5) Coin
 A coin is a homogeneous circular flat object with two faces marked with head and tail or 0 and 1.
6) Deck / Pack / of 52 - cards
 A pack (deck) of cards is a set of playing cards. A deck of 52 cards consists of 26 black cards and 26 red cards. There are 4 suits in a pack of 52 cards:
Club (flower) Spade (spear) Heart Diamond
n = 13 n = 13 n = 13 n = 13
 Club and spade cards are black cards, and heart and diamond cards are red cards.
 There are 13 ranks, each of which appears once in each of the 4 suits: ace, two, three, four, five, six, seven, eight, nine, ten, jack (knave), queen, king.
7) Equally likely cases
 Two or more outcomes are said to be equally likely or equally probable if none of them is expected to occur in preference to the others.
 For example, in tossing a coin, the outcomes H and T are equally likely if the coin is unbiased.
8) Independent Events
 Events are said to be independent of each other if the happening of any one of them is not affected by the happening of any of the others. For example:
 In tossing a die repeatedly, the event of getting '5' in the 1st throw is independent of getting '5' in the second, third or subsequent throws.
 Similarly, drawing balls from an urn gives independent events if the draws are made with replacement, i.e. if the ball drawn in the 1st draw is replaced, then the resulting draws will be independent.
4.2. Counting Techniques
 In the study of what is possible there is a problem of determining the number of ways in which things
can happen. For this purpose we need counting techniques that include the multiplication of choices,
permutations and combinations
4.2.1. Multiplication of choices

 The problem of determining the number of classifications can be handled systematically by the principle of counting discussed below.
Fundamental Principle of Counting

 If one thing can be accomplished in n1 different ways, and after this a second thing can be performed in n2 different ways, and so on, until finally a kth thing can be performed in nk different ways, then all k things can be performed in the specified order in n1 · n2 · … · nk different ways.
Example: If a man has 2 shirts and 4 ties, in how many ways can he choose a shirt and then a tie?
 Solution: By the principle of counting, the first choice can be made in 2 different ways and the second can be made in 4 different ways; together they can be made in 2 x 4 = 8 different ways.
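The multiplication principle can be made concrete by listing every shirt-and-tie pair. This is a small sketch using itertools.product from the standard library; the shirt and tie labels are placeholders.

```python
from itertools import product

shirts = ["shirt 1", "shirt 2"]
ties = ["tie 1", "tie 2", "tie 3", "tie 4"]

# Every ordered pair (shirt, tie): one choice of shirt, then one choice of tie.
outfits = list(product(shirts, ties))

print(len(outfits))                     # 8
print(len(shirts) * len(ties))          # 8, the n1 x n2 rule
print(outfits[0])                       # ('shirt 1', 'tie 1')
```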
4.2.2. Factorial
 It refers to the product of all positive integers from 1 up to and including a given integer.
 It is represented by n! Where n is the given integer. n! = nx(n-1)x(n-2)x…x1. Note: 0! = 1
Example: In how many different ways can 4 different Instructors be introduced to the student?
 Solution: There are 4! = 4x3x2x1 = 24 ways in which they can be introduced
4.2.3. Permutations
 Suppose there are n distinct objects and we want to arrange r of these objects in a line. There are n ways of choosing the 1st object, n – 1 ways of choosing the 2nd object, and so on, until finally there are n – (r – 1) ways of choosing the rth object.
 Applying the fundamental principle of counting, a permutation of n different objects taken r at a time, denoted by nPr, is an ordered arrangement of only r of the n objects.
 The number of different permutations of n different objects taken r at a time (without repetition) is given by:
$${}_nP_r = n(n - 1)(n - 2) \cdots (n - r + 1) = \frac{n!}{(n - r)!}$$
Example: Find the number of permutations of four letters a, b, c, d, if we take only two of the four letters

 Solution: Here n = 4 (4 distinct objects) and r = 2 (we want to arrange two letters at a time). The
number of permutations of 4 objects taken 2 at a time is:
$$ {}_{4}P_{2} = \frac{4!}{(4-2)!} = \frac{4!}{2!} = \frac{4 \times 3 \times 2 \times 1}{2 \times 1} = 12 $$
Example: Among 10 members of a certain company, three names are drawn for the offices of president,
vice president and secretary. In how many different ways can this be done?
 Solution: ${}_{10}P_{3} = \frac{10!}{7!} = 10 \times 9 \times 8 = 720$
 The number of permutations of n distinct objects taken n at a time n Pn = n!
Example: Find the number of permutations of the four letters a, b, c, d, if we take four letters at a time
Solution: ${}_{4}P_{4} = \frac{4!}{0!} = 4! = 4 \times 3 \times 2 \times 1 = 24$
 The number of permutations of n distinct objects arranged in a circle is (n-1)!
Example: In how many ways can 6 people be seated at a round table if they can sit anywhere?
Solution: There are 5! = 5x4x3x2x1 = 120 ways of arranging the 6 people in a circle.
 The number of permutations of n objects of which n1 are of one kind, n2 are of a second kind, …,
nk are of a kth kind, where n1 + n2 + … + nk = n, is given by:
$$ \frac{n!}{n_1!\, n_2! \cdots n_k!} $$
Example: 4 red marbles, 2 white marbles and 3 blue marbles are arranged in a row. If all the marbles of
the same color are not distinguishable from each other, how many different arrangements are possible?
Solution: The number of ways of arranging the 9 marbles is given by: $\frac{9!}{4!\,2!\,3!} = 1260$
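The permutation results in this subsection can be reproduced with Python's standard `math` module. This is only a sketch for checking the arithmetic; the helper name `arrangements_with_repeats` is made up for this illustration.

```python
from math import factorial, perm

# Ordered arrangements of r objects chosen from n distinct objects: n!/(n-r)!
two_of_four_letters = perm(4, 2)
officers = perm(10, 3)
all_four_letters = perm(4, 4)

# Arrangements of n distinct objects in a circle: (n-1)!
round_table = factorial(6 - 1)

def arrangements_with_repeats(counts):
    """n!/(n1! n2! ... nk!), where counts lists the size of each kind."""
    total = factorial(sum(counts))
    for c in counts:
        total //= factorial(c)
    return total

# 4 red, 2 white and 3 blue marbles in a row
marbles = arrangements_with_repeats([4, 2, 3])

print(two_of_four_letters, officers, all_four_letters, round_table, marbles)
# 12 720 24 120 1260
```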
4.2.4. Combination
 A combination of n different objects taken r at a time, denoted by nCr, is a selection of only r of
the n objects without any regard to the order of arrangement. In a permutation, by contrast, we are
interested in the order of arrangement of the objects.
 For example, abc and bca are the same combination; however, they are different permutations.
 The number of different combinations of r objects selected from n distinct objects (without
repetition) is
$$ {}_{n}C_{r} = \frac{n!}{r!\,(n-r)!} = \frac{{}_{n}P_{r}}{r!}, \quad r = 0, 1, 2, \ldots, n $$
Example: In how many ways can a committee of 3 people be chosen out of 7 people?
Solution: There are ${}_{7}C_{3} = \frac{7!}{3!\,4!} = 35$ ways of forming a committee of 3 people chosen out of 7 people.
 The number of different combinations of n objects selected from n distinct objects is
$$ {}_{n}C_{n} = \frac{n!}{n!\,(n-n)!} = 1 $$

 The number of different combinations of n distinct objects taken r at a time (with repetition) is
$$ {}_{n+r-1}C_{r} = \frac{(n+r-1)!}{r!\,(n+r-1-r)!} = \frac{(n+r-1)!}{r!\,(n-1)!} $$
Example: In how many ways can a person select three shirts out of 7 different types?
Solution: The person can select 3 shirts out of 7 in ${}_{7}C_{3} = \frac{7!}{3!\,(7-3)!} = \frac{7!}{3!\,4!} = 35$ ways.
Example: From 4 vowels and 6 consonants, how many words can be formed consisting of 2 vowels and 3
consonants?
Solution: The 2 vowels can be selected in 4C2 ways and the 3 consonants in 6C3 ways, and the resulting
5 different letters (2 vowels, 3 consonants) can be arranged among themselves in 5P5 = 5! ways. Then
 Number of words = 4C2 x 6C3 x 5! = 6 x 20 x 120 = 14,400
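The combination counts above can be verified with `math.comb`. The sketch below is illustrative only; `combinations_with_repetition` is a helper name introduced here, and the with-repetition figure shows what the count would be if the same type could be chosen more than once.

```python
from math import comb, factorial

# Committee of 3 chosen from 7 people (order does not matter).
committee = comb(7, 3)

def combinations_with_repetition(n, r):
    """Number of selections of r objects from n types when repeats are allowed."""
    return comb(n + r - 1, r)

# 3 items from 7 types, first without and then with repetition allowed.
shirts_no_repeat = comb(7, 3)
shirts_with_repeat = combinations_with_repetition(7, 3)

# Words made of 2 of 4 vowels and 3 of 6 consonants, arranged in every order.
words = comb(4, 2) * comb(6, 3) * factorial(5)

print(committee, shirts_no_repeat, shirts_with_repeat, words)
# 35 35 84 14400
```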
4.3. Approaches to the study of probability
 As a measure of the chance or probability, with which we can expect the event to occur, it is
convenient to assign a number between 0 and 1. If we are sure or certain that the event will occur, we
say that its probability is 100% or 1, but if we are sure that the event will not occur, we say that its
probability is zero.
 There are three approaches to the study of probability.

4.3.1. Classical Approach
 Suppose an event A can happen in h different ways out of a total of n possible, equally likely ways.
The probability of occurrence of event A is defined as: $P(A) = \frac{h}{n}$
 Conversely, the probability of non-occurrence of event A is defined as: $P(A') = 1 - P(A) = 1 - \frac{h}{n}$
 This is called the classical (mathematical) definition of probability. It is based on the
assumption that the events (sample points) of an experiment are equally likely.
Example: Consider an experiment of rolling a die once. What is the probability of getting:
a) a '2' b) an odd number c) a '1' or a '3' and d) a '7'?
 Solution: In rolling a die once, the possible outcomes are 1, 2, 3, 4, 5, 6. That is, the sample space is
S = {1,2,3,4,5,6}, n(S) = 6
a) E = {2}; the number of cases favorable to event E is 1, so P(E) = 1/6
b) E = {1, 3, 5}; the number of cases favorable to event E is 3, so P(E) = 3/6 = 1/2
c) E = {1, 3}; the number of cases favorable to event E is 2, so P(E) = 2/6 = 1/3
d) In rolling a die, it is impossible to get a '7'. Hence the event is E = { }, so P(E) = 0/6 = 0
Example: Two items are chosen at random from a box containing 4 defective and 6 non-defective items.
What is the probability that:
a) Both are defective b) Both are non-defective and c) One is defective and the other is non-defective
Solution: There are a total of 4 + 6 = 10 items. We can choose two items out of 10 in
${}_{10}C_{2} = \frac{10!}{2!\,(10-2)!} = \frac{10!}{2!\,8!} = 45$ ways.
There are a total of 45 possible, equally likely outcomes.

a) Both are defective only if:
 Out of the 4 defectives, 2 are chosen and
 Out of the 6 non-defectives, none is chosen.
 The number of ways of choosing 2 defectives and zero non-defectives is:
(4C2) x (6C0) = 6 x 1 = 6, by the fundamental principle of counting.
 Hence, the probability that both are defective is:
P = h/n = 6/45 = 0.1333
b) Both are non-defective only if:
 Out of the 4 defectives, none is chosen.
 Out of the 6 non-defectives, 2 are chosen.
 The number of ways of choosing 0 defectives and 2 non-defectives is:
(4C0) x (6C2) = 1 x 15 = 15, by the fundamental principle of counting.
 Hence, the probability that both are non-defective is:
P = h/n = 15/45 = 0.3333
c) One is defective and the other is non-defective only if:
 Out of the 4 defectives, 1 is chosen.
 Out of the 6 non-defectives, 1 is chosen.
 The number of ways of choosing 1 defective and 1 non-defective is: (4C1) x (6C1) = 4 x 6 = 24
 Hence, the probability that one is defective and one is non-defective is: P = h/n = 24/45 = 0.5333
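The three answers to the defective-items example follow one pattern: favorable selections divided by all equally likely selections. A short Python sketch of that pattern is given below; the function name `prob_defective` is an assumption made for this illustration.

```python
from fractions import Fraction
from math import comb

defective, non_defective, drawn = 4, 6, 2

# All equally likely ways of drawing 2 items out of 10.
total_ways = comb(defective + non_defective, drawn)

def prob_defective(k):
    """P(exactly k of the drawn items are defective), classical approach."""
    favorable = comb(defective, k) * comb(non_defective, drawn - k)
    return Fraction(favorable, total_ways)

both_defective = prob_defective(2)      # 6/45
both_non_defective = prob_defective(0)  # 15/45
one_of_each = prob_defective(1)         # 24/45

print(total_ways, float(both_defective), float(both_non_defective), float(one_of_each))
```

Because the three cases cover every possible draw, their probabilities add up to 1.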
Note:
 The classical definition of probability does not require actual experimentation, i.e. no
experimental data are required for its computation, nor is it based on previous experiments.
 It enables us to obtain probability by logical reasoning prior to making any actual trials and hence it is
also known as 'a priori' or theoretical or mathematical probability.
Limitations:
 If n approaches infinity, then P = h/n is undefined.
 If the various outcomes are not equally likely, then it is difficult to get the probability.
 If the actual value of n is not known, then it is difficult to get the probability.
4.3.2. Empirical/Relative Frequency/Approach

 If an experiment is repeated n times, where n is very large, and event E is observed to occur in h of
these trials, then the probability of the event is:
$P(E) = \frac{h}{n}$. This is called the empirical probability of the event.
Example: If you toss a coin 1000 times and find heads 532 times, the estimated probability of a head is
P = 532/1000 = 0.532.
 This is the relative frequency. Since in relative frequency approach, probability is obtained objectively
by repetitive empirical observations, empirical probability provides validity to the classical approach.
 If an unbiased coin is tossed at random, the classical approach gives the probability of a head as
½. Thus if we toss a coin 20 times, classical probability suggests we should get 10 heads. In
practice, this will not generally be true.
 In 20 throws of a coin, we may get no head at all, or only 1 or 2 heads. However, the empirical
approach suggests that if a coin is tossed a large number of times, say 500 times, we should on
average expect about 50% heads and 50% tails.
 Thus empirical probability approaches classical probability as the number of trials becomes large, i.e.
$$ P(A) = \lim_{n \to \infty} \frac{h}{n} $$
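The way the relative frequency h/n settles near the classical value as n grows can be seen in a small simulation. The sketch below uses Python's pseudo-random generator as a stand-in for a fair coin; the function name and the seed are arbitrary choices for this illustration, and the exact figures depend on the seed.

```python
import random

def relative_frequency_of_heads(n_tosses, seed=0):
    """Simulate n_tosses of a fair coin and return h/n, the share of heads."""
    rng = random.Random(seed)  # fixed seed so the run can be repeated
    heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses

# Small samples can stray far from 0.5; large samples stay close to it.
for n in (20, 500, 100_000):
    print(n, relative_frequency_of_heads(n))
```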
 Both the classical and the frequency approaches have serious drawbacks: first, the words "equally
likely" are vague, and second, the "large number" involved is vague. Because of these
difficulties, mathematicians have been led to an axiomatic approach to probability.
4.3.3. Axiomatic Approach

 Let S be the sample space. The probability P(A) of occurrence of any event A satisfies the following axioms:
 Axiom 1: For every event A, P(A) ≥ 0. This is called the axiom of positiveness.
 Axiom 2: For the sure or certain event S, P(S) = 1. This is called the axiom of certainty.
 Axiom 3: For any number of mutually exclusive events A1, A2, ..., An,
P(A1 ∪ A2 ∪ ... ∪ An) = P(A1) + P(A2) + ... + P(An)
 The axiomatic definition of probability includes both the classical and empirical definitions of
probability and at the same time is free from their drawbacks.
 If A and B are any two events, then:
1. A ∪ B: The event that at least one of the events A and B happens.
2. A ∩ B: The event that both A and B happen simultaneously.
3. A': The event that A does not happen.
4. A' ∩ B': Neither A nor B happens, i.e. none of A and B happens.
5. A' ∩ B: The event that A doesn't happen but B happens.
6. (A ∩ B') ∪ (A' ∩ B): Exactly one of the two events A and B happens.
SOME IMPORTANT THEOREMS ON PROBABILITY

Theorem 1 (addition theorem of probability): The probability of occurrence of at least one of the two
events A and B is given by:
P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
Theorem 2: If A and B are mutually exclusive events, i.e. if A ∩ B is the empty set, then
P(A ∪ B) = P(A) + P(B)
 That is, the probability of happening of any one of the two mutually exclusive events is equal to the
sum of their individual probabilities.
Theorem 3: For 3 events A, B and C, the probability of the occurrence of at least one of them is given by:
P(A ∪ B ∪ C) = P(A) + P(B) + P(C) - P(A ∩ B) - P(B ∩ C) - P(A ∩ C) + P(A ∩ B ∩ C)
 If A, B and C are mutually exclusive (disjoint) events, then
P(A ∪ B ∪ C) = P(A) + P(B) + P(C)
Example: A card is drawn from a well-shuffled pack of 52 cards. Find the probability that the card is
a) A diamond or a king, b) A queen or a red card, c) A queen or a king
Solution:
a) Out of the 52 cards, there are 4 kings and 13 diamonds.
 A = the event of drawing a diamond
 B = the event of drawing a king
 Then P(A) = 13/52 = 1/4 and P(B) = 4/52 = 1/13. We want to find P(A ∪ B).
 Since A ∩ B = {king of diamonds}, P(A ∩ B) = n(A ∩ B)/n(S) = 1/52.
 Then: P(A ∪ B) = P(A) + P(B) - P(A ∩ B) = 1/4 + 1/13 - 1/52 = 16/52 = 4/13
b) Out of the 52 cards, 26 are red cards and 4 are queens.
 A = the event of drawing a queen.
 B = the event of drawing a red card.
 Then P(A) = 4/52 and P(B) = 26/52 = 1/2.
Out of the 4 queens, 2 are red cards and the other 2 are black cards. So
 A ∩ B = {queen of hearts, queen of diamonds}, i.e. A and B are not mutually exclusive events.
 P(A ∩ B) = n(A ∩ B)/n(S) = 2/52 = 1/26, so
P(A ∪ B) = P(A) + P(B) - P(A ∩ B) = 4/52 + 26/52 - 2/52 = 28/52 = 7/13
c) Out of the 52 cards, there are 4 queens and 4 kings.
 A = the event of drawing a king.
 B = the event of drawing a queen.
 Since a card cannot be a king and a queen at the same time, A and B cannot occur together.
Thus A and B are mutually exclusive events, with P(A) = 4/52 and P(B) = 4/52.
 Then P(A ∪ B) = P(a queen or a king) = P(A) + P(B) = 4/52 + 4/52 = 8/52 = 2/13
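The card example can also be checked by building the deck and counting, which shows the addition theorem agreeing with a direct count. This is a minimal sketch; the rank and suit labels and the helper names are assumptions chosen for illustration.

```python
from fractions import Fraction
from itertools import product

# A standard 52-card deck as (rank, suit) pairs.
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(ranks, suits))

def prob(event):
    """Classical probability: favorable cards divided by all cards."""
    favorable = sum(1 for card in deck if event(card))
    return Fraction(favorable, len(deck))

def is_diamond(card):
    return card[1] == "diamonds"

def is_king(card):
    return card[0] == "K"

def is_queen(card):
    return card[0] == "Q"

def is_red(card):
    return card[1] in ("hearts", "diamonds")

# a) A diamond or a king: direct count versus the addition theorem.
direct = prob(lambda c: is_diamond(c) or is_king(c))
by_theorem = (prob(is_diamond) + prob(is_king)
              - prob(lambda c: is_diamond(c) and is_king(c)))

# b) and c), counted directly.
queen_or_red = prob(lambda c: is_queen(c) or is_red(c))
queen_or_king = prob(lambda c: is_queen(c) or is_king(c))

print(direct, by_theorem, queen_or_red, queen_or_king)  # 4/13 4/13 7/13 2/13
```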
Example: A chartered accountant applies for a job in two firms X and Y. The chance of his being selected
in firm X is 30% and being rejected in firm Y is 60%.
a) Suppose it is known that an applicant can be accepted in one and only one of the two firms X and Y.
What is the probability that he will be accepted in one of the firms?
b) Suppose that the probability of at least one of the applications being rejected is 0.5. What is the
probability that he will be selected in one of the firms?
Solution:
a) Define the following events:
 A: The event that the applicant is accepted in firm X. P(A) = 30% = 0.3
 B: The event that the applicant is accepted in firm Y. Since the chance of being rejected in firm Y
is 60%, P(B) = 100% - 60% = 40% = 0.4
 Since the applicant can be accepted in one and only one of the two firms, events A and B
cannot both occur at the same time, i.e. A and B are mutually exclusive. Therefore:
P(A ∪ B) = P(A) + P(B) = 0.3 + 0.4 = 0.7, i.e. a 70% chance of being accepted in one of the firms.
b) P(A) = 0.3, P(B) = 0.4 and P(A' ∪ B') = 0.5

By De Morgan's law, (A ∩ B)' = A' ∪ B', so P(A ∩ B) = 1 - P((A ∩ B)') = 1 - P(A' ∪ B') = 1 - 0.5 = 0.5
 Since P(A ∩ B) = 0.5 is not zero, the two events are not mutually exclusive: the applicant can be
accepted in both firms X and Y.
 Therefore: P(A ∪ B) = P(A) + P(B) - P(A ∩ B) = 0.3 + 0.4 - 0.5 = 0.7 - 0.5 = 0.2. That is, he has a
20% chance of being accepted in at least one of the two firms X or Y.
 Note: these figures serve only to illustrate the method. Because A ∩ B is contained in A, P(A ∩ B)
cannot actually exceed P(A) = 0.3, so the value 0.5 is not consistent with the other data.
4.4. Conditional Probability

 Before we define this probability, let's first define two terms: dependent and independent events.
Dependent and independent events

 Two events are said to be independent if the occurrence or non-occurrence of one event has no effect
on the probability of occurrence of the other event; otherwise they are said to be dependent.
Example: Toss a coin twice. Define the following events:
A. The first toss results in a head B. The second toss results in a head
 Are the two events dependent or independent?
Solution: Every toss stands alone and is in no way connected with any other toss, so
the result of the first toss doesn't affect the probability of a head on the second toss, i.e.
whether the first toss results in a head or a tail, the probability of getting a head on the second is 0.5.
Therefore A and B are independent.
Conditional Probability
 Let E1 and E2 be two events with P(E1) > 0. The probability that event E2 will occur given that
event E1 has already occurred, denoted by P(E2/E1), is given by:
$$ P(E_2 / E_1) = \frac{P(E_1 \cap E_2)}{P(E_1)} $$
 Similarly, $P(E_1 / E_2) = \frac{P(E_1 \cap E_2)}{P(E_2)}$, if P(E2) > 0
Example: Suppose a single die is tossed once.
A. Find the probability that the toss will result in a number less than 4.
B. Find the probability that the toss will result in a number less than 4, given that the toss
resulted in an odd number.
Solution
a. Let the event E denote a number less than 4, i.e. E = {1,2,3}, and let the sample space be
S = {1,2,3,4,5,6}. Hence, the probability that event E will occur is given by:
$P(E) = \frac{n(E)}{n(S)} = \frac{3}{6} = \frac{1}{2}$
b. Here we have two events: event E1 denotes a number less than 4, i.e. E1 = {1,2,3}, and event E2 denotes
an odd number, i.e. E2 = {1,3,5}. It is required to find the probability that E1 will occur given that
E2 has already occurred, i.e. P(E1/E2).
 Using the formula for conditional probability: $P(E_1/E_2) = \frac{P(E_1 \cap E_2)}{P(E_2)}$
 But E1 ∩ E2 = {1,3}, so $P(E_1 \cap E_2) = \frac{n(E_1 \cap E_2)}{n(S)} = \frac{2}{6} = \frac{1}{3}$ and $P(E_2) = \frac{n(E_2)}{n(S)} = \frac{3}{6} = \frac{1}{2}$
 Thus $P(E_1/E_2) = \frac{P(E_1 \cap E_2)}{P(E_2)} = \frac{1/3}{1/2} = \frac{1}{3} \times \frac{2}{1} = \frac{2}{3}$
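The die example translates directly into set operations. The sketch below is one possible way to code the conditional probability formula; the function names `prob` and `conditional` are illustrative choices.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}
less_than_4 = {1, 2, 3}   # event E1
odd = {1, 3, 5}           # event E2

def prob(event):
    """Classical probability of an event within the sample space."""
    return Fraction(len(event & sample_space), len(sample_space))

def conditional(event, given):
    """P(event/given) = P(event and given) / P(given); requires P(given) > 0."""
    return prob(event & given) / prob(given)

print(prob(less_than_4))              # 1/2
print(conditional(less_than_4, odd))  # 2/3
```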
Multiplication rule for two dependent events
 The probability of the simultaneous happening of two events A and B is given by
 P(A ∩ B) = P(A) . P(B/A), P(A) > 0, or equivalently P(A ∩ B) = P(B) . P(A/B), P(B) > 0
Multiplication Law for Independent Events

 If A and B are independent events, then P(A ∩ B) = P(A) . P(B)
Example: Suppose 100 employees of a firm were asked whether they are in favor of or against paying high
salaries to the chief executive officers of the firm. The responses were classified as follows:
          In favor   Against   Total
Male         15         45       60
Female        4         36       40
Total        19         81      100
a. Compute the conditional probability P(in favor /male)

b. Compute the conditional probability P(female /in favor)
c. What is the probability that the employee is in favor or male?
Solution: Define the following events:
 A = the event that the employee is in favor, P(A) = 19/100 = 0.19
 B = the event that the employee is male, P(B) = 60/100 = 0.60
 C = the event that the employee is female, P(C) = 40/100 = 0.40
a) P(in favor/male) = $P(A/B) = \frac{P(A \cap B)}{P(B)} = \frac{15/100}{0.60} = \frac{0.15}{0.60} = 0.25$
b) P(female/in favor) = $P(C/A) = \frac{P(A \cap C)}{P(A)} = \frac{4/100}{0.19} = \frac{0.04}{0.19} = 0.2105$
c) P(A ∪ B) = P(A) + P(B) - P(A ∩ B) = 0.19 + 0.60 - 0.15 = 0.64
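The same answers can be read off the table of counts programmatically. The following sketch stores the four cells of the survey table in a dictionary; the key labels and the helper `prob` are naming assumptions for this illustration.

```python
from fractions import Fraction

# Cell counts from the survey table: (sex, opinion) -> number of employees.
counts = {
    ("male", "in favor"): 15, ("male", "against"): 45,
    ("female", "in favor"): 4, ("female", "against"): 36,
}
total = sum(counts.values())

def prob(condition):
    """Probability that a randomly chosen employee satisfies condition(sex, opinion)."""
    favorable = sum(n for (sex, opinion), n in counts.items() if condition(sex, opinion))
    return Fraction(favorable, total)

p_favor = prob(lambda s, o: o == "in favor")
p_male = prob(lambda s, o: s == "male")
p_favor_and_male = prob(lambda s, o: s == "male" and o == "in favor")
p_favor_and_female = prob(lambda s, o: s == "female" and o == "in favor")

p_favor_given_male = p_favor_and_male / p_male          # a)
p_female_given_favor = p_favor_and_female / p_favor     # b)
p_favor_or_male = p_favor + p_male - p_favor_and_male   # c)

print(float(p_favor_given_male), round(float(p_female_given_favor), 4), float(p_favor_or_male))
```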
4.5. Probability Distribution
 In the population, the values of a variable may be distributed according to some definite probability
law which can be expressed mathematically; the corresponding probability distribution is known as a
theoretical probability distribution.
 Such probability laws may be based on a priori considerations or a posteriori inferences. These
distributions are based on expectations formed from previous experience. Theoretical distributions
also enable us to fit a mathematical model or a function of the form p(x) to the given data.
 These probability distributions are:

i. Binomial distribution
ii. Poisson Distribution
iii. Normal Distribution.
 The first two distributions are discrete probability distributions and the third is a continuous
probability distribution.
4.5.1. Binomial/Bernoulli/ Distribution
 It is the result of an experiment or process in which each trial has only two possible outcomes.

 In order to standardize the statistical vocabulary, it is conventional to label one of the possible
outcomes as 'success' and the other as 'failure'. The purpose of the labels is only to distinguish
between the two outcomes: 'success' doesn't mean good and 'failure' doesn't mean bad.
The Binomial distribution can be used under the following conditions:

 The random experiment is performed repeatedly a finite and fixed number of times.
 The outcome of each trial of the random experiment is dichotomous.
In other words, the outcome of each trial may be classified into two mutually disjoint categories,
called success (occurrence of the event) and failure (non-occurrence of the event).
 All the trials are independent.
 The probability of success in any trial is p and is constant for each trial; q = 1 – p is then termed the
probability of failure and is also constant for each trial.
 For example, suppose we toss a fair coin n times, where n is fixed and finite. The outcome of any trial
is one of the mutually exclusive events head (success) and tail (failure). Furthermore, all the trials are
independent, because the result of any throw of a coin does not affect and is not affected by the result
of other throws. Moreover, the probability of success (head) in any trial is ½ and the probability of
failure (tail) in any trial is also ½, both of which are constant for each trial.
Binomial Formula
 Consider an experiment in which p = the probability of success and q = 1 – p is the probability of failure.
 We make the following assumptions:
 The number of trials (n) is fixed.
 The probability of success p is the same for each trial.
 The trials are independent.
 The sum of the probabilities is unity, i.e. 1.
 Under these assumptions, the probability that the event will occur r times in n trials, where n ≥ r, is given by:
$$ P(X = r) = {}_{n}C_{r}\, p^{r} (1-p)^{n-r} = {}_{n}C_{r}\, p^{r} q^{n-r}, \quad r = 0, 1, 2, \ldots, n $$
 This discrete probability distribution is called the Binomial distribution. X denotes a random variable for
the number of successes in n trials, which can take the values 0, 1, 2, ..., n, since in n trials we may
get no success (all failures), one success, two successes, ..., or n successes. We are
interested in finding the corresponding probabilities of 0, 1, 2, ..., n successes.
Example 1: Toss a coin four times. What is the probability of getting:

a) Exactly 2 heads, b) Exactly 4 heads, c) At most 3 heads, d) No head, e) At least one head?
Solution: Define the random variable X as the number of heads obtained. Then:
o p = probability of getting a head in a single toss of a coin = 0.5
o n = the number of trials (number of times the experiment is done) = 4
 The probability distribution of X is:

$$ P(X = r) = {}_{4}C_{r} (0.5)^{r} (1-0.5)^{4-r} = {}_{4}C_{r} (0.5)^{r} (0.5)^{4-r}, \quad r = 0, 1, 2, 3, 4 $$

a) Exactly two heads: $P(X = 2) = {}_{4}C_{2} (0.5)^{2} (0.5)^{2} = \frac{4!}{2!\,2!} (0.0625) = (6)(0.0625) = 0.375$
b) Exactly 4 heads: $P(X = 4) = {}_{4}C_{4} (0.5)^{4} (0.5)^{0} = \frac{4!}{4!\,0!} (0.5)^{4} (0.5)^{0} = 0.0625$
c) At most 3 heads means X ≤ 3. Hence the required probability is:
$P(X \le 3) = P(X=0) + P(X=1) + P(X=2) + P(X=3) = {}_{4}C_{0}(0.5)^{0}(0.5)^{4} + {}_{4}C_{1}(0.5)^{1}(0.5)^{3} + {}_{4}C_{2}(0.5)^{2}(0.5)^{2} + {}_{4}C_{3}(0.5)^{3}(0.5)^{1}$
= 0.0625 + 0.2500 + 0.3750 + 0.2500 = 0.9375
d) No head means X = 0. Hence the required probability is $P(X = 0) = {}_{4}C_{0} (0.5)^{0} (0.5)^{4} = 0.0625$
e) At least one head means X ≥ 1; the required probability is:
P(X ≥ 1) = P(X = 1) + P(X = 2) + P(X = 3) + P(X = 4) = 0.2500 + 0.3750 + 0.2500 + 0.0625 = 0.9375
Or
P(X ≥ 1) = P(at least one head) = 1 – P(no head) = 1 – P(X = 0) = 1 – 0.0625 = 0.9375
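The binomial formula is short enough to code directly, which makes it easy to re-check each part of Example 1. The function name `binomial_pmf` below is an illustrative choice, not a library routine.

```python
from math import comb

def binomial_pmf(r, n, p):
    """P(X = r): probability of exactly r successes in n independent trials."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

n, p = 4, 0.5  # four tosses of a fair coin

exactly_two = binomial_pmf(2, n, p)
exactly_four = binomial_pmf(4, n, p)
at_most_three = sum(binomial_pmf(r, n, p) for r in range(0, 4))
no_head = binomial_pmf(0, n, p)
at_least_one = 1 - no_head  # complement of "no head"

print(exactly_two, exactly_four, at_most_three, no_head, at_least_one)
# 0.375 0.0625 0.9375 0.0625 0.9375
```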
Example 2: A national advertising agency estimates that only 40% of all new products introduced in a
certain country succeed. Out of 8 new products that were recently introduced, what is the probability that:
a) At most 5 succeed and b) At least 7 succeed?
Solution: Define the random variable X as the number of new products that succeed.
o p = the probability that a new product succeeds in a single release = 40% = 0.4
o n = number of products released = 8
a) P(X ≤ 5) = 0.9502. This is from the binomial table with n = 8, p = 0.4 and r = 5.
Thus we can say that it is highly likely that no more than 5 will succeed.
b) P(X ≥ 7) = 1 – P(X < 7) = 1 – P(X ≤ 6) = 1 – 0.9915 = 0.0085. This is from the binomial table with n = 8,
p = 0.4 and r = 6. Thus, there is almost no chance that 7 or more will succeed out of 8.
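Cumulative binomial table values can be cross-checked by summing the binomial formula term by term. The sketch below does this for n = 8 and p = 0.4; `binomial_cdf` is a helper name introduced only for this illustration.

```python
from math import comb

def binomial_cdf(k, n, p):
    """P(X <= k): sum of the binomial probabilities for 0, 1, ..., k successes."""
    return sum(comb(n, r) * p**r * (1 - p)**(n - r) for r in range(k + 1))

n, p = 8, 0.4

at_most_5 = binomial_cdf(5, n, p)
at_least_7 = 1 - binomial_cdf(6, n, p)  # complement of "at most 6"

print(round(at_most_5, 4), round(at_least_7, 4))  # 0.9502 0.0085
```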
4.5.2. The Normal Distribution
In this section, we will examine a very important continuous probability distribution known as the normal
probability distribution. The normal probability distribution has the following characteristics:
 The graph of the normal probability distribution has a single peak at the center of the distribution. The
mean, median and mode, which are equal in a normal distribution, are all located at the peak.
Therefore, exactly one-half, or 50%, of the area is to the left of the center of the distribution and exactly
one-half of the area is to the right of it.
 A normal probability distribution is symmetrical about its mean. If you were to "fold" the
probability distribution along its central value, the two halves would be identical.
 The normal curve tails off smoothly in a "bell shape" and the two tails of the probability distribution
extend indefinitely in either direction. In theory, the curve never actually touches the X-axis, as
indicated below.

 In normal probability distributions, we have the following relationships:

 About 68% of the distribution is within one standard deviation of the mean.
 About 95% of the observations are within two standard deviations of the mean.
 Virtually all (99.73%) of the area is within three standard deviations of the mean.
 For example, if a normal probability distribution has a mean of 20 and a standard deviation of 4, then
about 68% of the values are between 16 and 24, found by 20 ± 1(4); about 95% of the values are
between 12 and 28, found by 20 ± 2(4); and virtually all the values are between 8 and 32, found by 20 ± 3(4).
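These percentages can be reproduced numerically. The sketch below relies on the standard identity that the area of a normal curve within k standard deviations of its mean equals erf(k/√2), using the error function from Python's `math` module; the function name `area_within` is an illustrative choice.

```python
from math import erf, sqrt

def area_within(k):
    """Area under a normal curve within k standard deviations of the mean."""
    return erf(k / sqrt(2))

mean, sd = 20, 4  # the distribution used in the example above
for k in (1, 2, 3):
    low, high = mean - k * sd, mean + k * sd
    print(f"{low} to {high}: {area_within(k):.4f}")
# 16 to 24: 0.6827
# 12 to 28: 0.9545
# 8 to 32: 0.9973
```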
The Standard Normal Distribution
 There are many normal probability distributions, one for each pair of values of the mean and standard
deviation. This makes the normal probability distribution very versatile in describing many different
real-world situations, but it would be very difficult to provide tables for each such distribution.
 An efficient method for overcoming this difficulty is to transform the variable into a standard normal
variable. This method is called standardizing the distribution.
$$ Z = \frac{X - \mu}{\sigma} $$
Where: Z = the standardized value, or Z-value
X = any observation of interest
μ = the mean of the normal distribution
σ = the standard deviation of the normal distribution
 The value of Z follows a normal probability distribution with a mean of zero and a standard
deviation of one unit.
 This probability distribution is called the Standard Normal Probability Distribution. Thus, we can
convert any normal distribution to the standard normal distribution by using the above formula.
Example: The ages of patients admitted to H hospital are normally distributed with a mean of 60 years and a
standard deviation of 12 years. Find the Z-value (standardized value) for a patient (a) aged 78 (b) aged 45.
Solution: The Z-values are computed as follows:

(a) X = 78, μ = 60, σ = 12. Computing Z: $Z = \frac{X - \mu}{\sigma} = \frac{78 - 60}{12} = 1.5$
(b) X = 45. Computing Z: $Z = \frac{X - \mu}{\sigma} = \frac{45 - 60}{12} = -1.25$

Age and Z- values for patients Aged 78 and 45 are shown graphically below:
Properties of the Standard Normal Distribution
 The total area under the standard normal curve is equal to one: the area to the left of Z = 0 is equal to 0.5
and the area to the right of Z = 0 is also equal to 0.5.
 The standard normal distribution is symmetric about its mean; that is, the area bounded by Z = -a
and Z = 0 is equal to the area bounded by Z = 0 and Z = a, where a is any real number.
Example: Find the area under the standard normal curve bounded by Z = 0 and:
(a) Z = 0.45 (b) Z = 2.83 (c) Z = -0.60 (d) Z = -1.76
 To find the area bounded by Z = 0 and Z = 0.45, look up the value opposite 0.4 and under 0.05 in the
standard normal distribution table. From the table, the area is 0.1736.
 If we look up the value opposite 2.8 and under 0.03, we obtain the area bounded by Z = 0 and Z = 2.83.
This value is 0.4977.

 The area bounded by Z = -0.60 and Z = 0 is equal to the area bounded by Z = 0 and Z = 0.60 because of
the symmetry of the normal distribution. Thus, we look up the value opposite 0.6 and under 0.00, which is
0.2257.
 Similarly, the area bounded by Z = -1.76 and Z = 0 equals the area bounded by Z = 0 and Z = 1.76.
Looking up the value opposite 1.7 and under 0.06 gives 0.4608.
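Table look-ups of this kind can be checked without a printed table. The sketch below assumes the standard relation between the normal curve and the error function, namely that the area between 0 and z is 0.5·erf(|z|/√2); the function name `area_from_zero` is chosen for illustration.

```python
from math import erf, sqrt

def area_from_zero(z):
    """Area under the standard normal curve between 0 and z.

    The sign of z is ignored because the curve is symmetric about 0.
    """
    return 0.5 * erf(abs(z) / sqrt(2))

for z in (0.45, 2.83, -0.60, -1.76):
    print(z, round(area_from_zero(z), 4))
# 0.45 0.1736
# 2.83 0.4977
# -0.6 0.2257
# -1.76 0.4608
```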

UNIT FIVE
ESTIMATION AND HYPOTHESIS TESTING
Introduction
Statistical inference is based on estimation and hypothesis testing. In both estimation and hypothesis
testing, we shall be making inferences about characteristics of populations from information contained in
samples. To calculate the exact proportion or the exact mean would be an impossible goal. Even so, we
will be able to make an estimate, make a statement about the error that will probably accompany this
estimate, and implement some controls to avoid as much of the error as possible.
5.1. Point and Confidence Interval Estimation

Types of Estimates
1) Point estimate:
It is a single number that is used to estimate an unknown population parameter. For instance, a
department head would make a point estimate if she said, "Our current data indicate that this course will
have 350 students in the fall."
 A point estimate is often insufficient, because it is either right or wrong. If you are told only that her
point estimate of enrollment is wrong, you do not know how wrong it is, and you cannot be certain of the
estimate's reliability.
 If you learn that it is off by only 10 students, you would accept 350 students as a good estimate of
future enrollment.
 Therefore, a point estimate is much more useful if it is accompanied by an estimate of the error that
might be involved.
2) Interval Estimate:
 It is a range of values to estimate population parameters. It indicates the error in two ways: by the
extent of its range and by the probability of the true population parameter lying within the range.
 In this case, the department head would say something like; “I estimate that the true enrollment in this
course in the fall will be between 330 and 380 and that it is very likely that the exact enrollment will
fall within this interval.” She has a better idea of the reliability of her estimate.
 If the course is taught in sections of about 100 students each, and if she had tentatively scheduled five
sections, then on the basis of her estimate, she can now cancel one of those sections and offer an
elective instead.
Estimator & Estimates

 Any sample statistic that is used to estimate a population parameter is called an estimator.
 The sample mean can be an estimator of the population mean μ, and the sample proportion can
be used as an estimator of the population proportion.
 When we have observed a specific numerical value of an estimator, we call that value an
estimate. In other words, an estimate is a specific observed value of a statistic. We form an estimate
by taking a sample and computing the value taken by our estimator in that sample.

 Suppose that we calculate the mean odometer reading (mileage) from a sample of used taxis and find
it to be 98,000 miles. If we use this specific value to estimate the mileage for a whole fleet of used
taxis, the value 98,000 miles would be an estimate.
Criteria of a Good Estimator
1) Unbiasedness:
 It refers to the fact that the sample mean is an unbiased estimator of the population mean, because the
mean of the sampling distribution of sample means taken from the same population is equal to
the population mean itself.
2) Efficiency:
 It refers to the size of the standard error of the statistic. If we compare two statistics from a sample
of the same size and try to decide which one is the more efficient estimator, we would pick the statistic
that has the smaller standard error, or standard deviation of sampling distribution.
 If we calculate the standard error of the sample mean and find it to be 1.05 and then calculate the
standard error of the sample median and find it to be 1.6 , we would say that the sample mean is a
more efficient estimator of the population mean because its standard error is smaller.
 It makes sense that an estimator with a smaller standard error will have more chance of producing an
estimate nearer to the population parameter under consideration
3) Consistency:
 A statistic is a consistent estimator of a population parameter if, as the sample size increases, it becomes
almost certain that the value of the statistic comes very close to the value of the population parameter.
 If an estimator is consistent, it becomes more reliable with large samples. Thus, if you are wondering
whether to increase the sample size to get more information about a population parameter, find out
first whether your statistic is a consistent estimator.
4) Sufficiency:
 An estimator is sufficient if it makes so much use of the information in the sample that no other estimator
could extract from the sample additional information about the population parameter being estimated.
A) Point Estimates
 When a parameter is being estimated, the estimate can be either a single number, in which case the
estimate is called a "point estimate", or a range of values, in which case the estimate is called an
interval estimate. Confidence intervals are used for interval estimates.
 Point estimates are used as parts of other statistical calculations. For example, a point estimate of the
standard deviation is used in the calculation of a confidence interval for μ. Point estimates of
parameters are often used in the formulas for significance testing.
1) The sample mean is the best estimator of the population mean µ. It is unbiased, consistent, the most
efficient estimator, and, as long as the sample is sufficiently large, its sampling distribution can be
approximated by the normal distribution.
Let us look at a medical supplies company that produces disposable hypodermic syringes. Each syringe is
wrapped in a sterile package and then jumble packed in a large corrugated carton. Jumble packing causes
the cartons to contain differing numbers of syringes. Because the syringes are sold on a per unit basis, the
company needs an estimate of the number of syringes per carton for billing purposes. Suppose we have taken
a sample of 35 cartons at random and recorded the number of syringes in each carton. The sample mean is:
$\bar{x} = \frac{\sum x}{n} = \frac{3{,}570}{35} = 102$ syringes

Thus, using the sample mean as our estimator, the point estimate of the population mean µ is 102 syringes
per carton. The manufacturing price of a disposable hypodermic syringe is quite small (about 25 ), so both
the buyer and the seller would accept the use of this point estimate as the basis for billing, and the
manufacturer can save the time and expenses of counting each syringe that goes in to a carton.
Table 1: Result of a sample of 35 cartons of hypodermic syringes (syringes per carton)
101  103  112   98   97   93
105  100  100   93   94   97
 97  100   97  110  103   99
 93   98  106  112  105  100
114   97  110   98  112   99
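Table 2: Calculation of sample variance and standard deviation for syringes per carton
Value of x               x²         Sample mean    x - x̄     (x - x̄)²
(needles per carton)                x̄
101                      10,201     102             -1          1
105                      11,025     102              3          9
 97                       9,409     102             -5         25
 93                       8,649     102             -9         81
114                      12,996     102             12        144
103                      10,609     102              1          1
100                      10,000     102             -2          4
100                      10,000     102             -2          4
 98                       9,604     102             -4         16
 97                       9,409     102             -5         25
112                      12,544     102             10        100
110                      12,100     102              8         64
 97                       9,409     102             -5         25
106                      11,236     102              4         16
110                      12,100     102              8         64
 98                       9,604     102             -4         16
 93                       8,649     102             -9         81
110                      12,100     102              8         64
112                      12,544     102             10        100
 98                       9,604     102             -4         16
 97                       9,409     102             -5         25
 94                       8,836     102             -8         64
103                      10,609     102              1          1
105                      11,025     102              3          9
112                      12,544     102             10        100
 93                       8,649     102             -9         81
 97                       9,409     102             -5         25
 99                       9,801     102             -3          9
100                      10,000     102             -2          4
 99                       9,801     102             -3          9
Totals (n = 35):  Σx = 3,570   Σx² = 365,368   Σ(x - x̄)² = 1,228

Sample variance:
$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{1{,}228}{34} = 36.12$
or, equivalently,
$s^2 = \frac{\sum x^2}{n - 1} - \frac{n\bar{x}^2}{n - 1} = \frac{365{,}368}{34} - \frac{35(102)^2}{34} = 36.12$
Sample standard deviation:
$s = \sqrt{s^2} = \sqrt{36.12} = 6.01$ syringes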
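As a quick check of the arithmetic, the following Python sketch recomputes the sample mean, variance and standard deviation from the totals quoted in Table 2 (Σx = 3,570, Σx² = 365,368, n = 35). The variable names are illustrative only and are not part of the handout.

```python
import math

# Totals quoted in Table 2 for the sample of n = 35 cartons
n = 35
sum_x = 3570
sum_x2 = 365368

mean = sum_x / n                       # point estimate of the population mean
ss = sum_x2 - n * mean ** 2            # sum of squared deviations from the mean
variance = ss / (n - 1)                # sample variance with divisor n - 1
std_dev = math.sqrt(variance)          # sample standard deviation
variance_div_n = ss / n                # divisor n gives a smaller value

print(f"mean = {mean:.0f}, s^2 = {variance:.2f}, s = {std_dev:.2f}")
print(f"variance with divisor n = {variance_div_n:.2f}")
```

The last line shows why the divisor matters: dividing by n always yields a smaller figure than dividing by n - 1.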
2) Point Estimates of the Population Variance and Standard Deviation
 Suppose the management of the medical supplies company wants to estimate the variance and/or
standard deviation of the distribution of the number of packaged syringes per carton.
 The most frequently used estimator of the population standard deviation σ is the sample
standard deviation s, as in Table 2, where the sample variance is
$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
 If, instead, we had taken as our sample variance
$s^2 = \frac{\sum (x - \bar{x})^2}{n}$
 the result would have some bias as an estimator of the population variance; specifically, it would tend
to be too low. Using a divisor of n-1 gives us an unbiased estimator of $\sigma^2$. Thus, we will use $s^2$
(with the divisor n-1) and s to estimate $\sigma^2$ and σ.
3) Point Estimate of the Population Proportion

 The proportion of units that have a particular characteristic in a given population is symbolized P. If
we know the proportion of units in a sample that have the same characteristic (symbolized $\bar{p}$), we can use
this $\bar{p}$ as an estimator of P with all the desirable properties of a good estimator.
Continuing our example of the manufacturer of medical supplies, we shall try to estimate the population
proportion from the sample proportion. Suppose management wishes to estimate the proportion of cartons
that will arrive damaged, owing to poor handling in shipment after the cartons leave the factory. We can
check a sample of 50 cartons from their shipping point to the arrival at their destination and then record
the presence or absence of damage. If, in this case, we find that the proportion of damaged cartons in the
sample is 0.08, we would say that
 $\bar{p} = 0.08$ (sample proportion damaged)
Because the sample proportion $\bar{p}$ is a convenient estimator of the population proportion P, we can
estimate that the proportion of damaged cartons in the population will also be 0.08.
B) Interval Estimates
 An interval estimate describes a range of values within which a population parameter is likely to lie.
 Suppose that the marketing research director needs an estimate of the average life, in months, of the car
batteries his company manufactures. We select a random sample of 200 batteries, record the car owners' names and
addresses as listed in store records, and interview them about the battery life they have experienced.
 Our sample of 200 users has a mean battery life of 36 months. If we use the point estimate of the
sample mean $\bar{x}$ as the best estimator of the population mean µ, we would report that the mean life
of the company's batteries is 36 months.
 But the director also asks for a statement about the uncertainty that is likely to accompany this
estimate, that is, a statement about the range within which the unknown population mean is likely to
lie. To provide such a statement we need to find the standard error of the mean.
 Standard error of the mean:
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$, where σ = standard deviation of the population and n = sample size

 Suppose we have already estimated the standard deviation of the population of the batteries and
reported that it is 10 months. Using this standard deviation and the first equation from previous
chapters, we can calculate the standard error of the mean.
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{10}{\sqrt{200}} = \frac{10}{14.14} = 0.707$ month (one standard error of the mean)
 We could now report to the director that our estimate of the life of the batteries is 36 months, and the
standard error that accompanies this estimate is 0.707. In other words, the actual mean life for all
the batteries may lie somewhere in the interval estimate of 35.293 to 36.707 months.
 Next, we need to calculate the chance that the actual life will lie in this interval or in other intervals
of different width that we might choose: $\pm 2\sigma_{\bar{x}}$ (2 × 0.707), $\pm 3\sigma_{\bar{x}}$ (3 × 0.707), and so on.
 Note: we have not used the finite population multiplier to calculate the standard error of the mean
because the population of batteries is large enough to be considered infinite.
Probability of the true population parameter falling within the interval estimate:
To begin to solve this problem, we should review the relevant parts of the normal probability distribution.
Fortunately, we can apply these properties to the standard error of the mean and make the following statement
about the range of values used to make an interval estimate for our battery problem.
The probability is 0.955 that the mean of a sample of size 200 will be within ±2 standard errors of the
population mean µ, and hence µ is within ±2 standard errors of 95.5 percent of all the sample means. Theoretically, if we
select 1000 samples at random from a given population and then construct an interval of ±2 standard
errors around the mean of each of these samples, about 955 of these intervals will include the population
mean. Similarly, the probability is 0.683 that the mean of the sample will be within ±1 standard error of the
population mean, and so forth. This theoretical concept is basic to our study of interval construction and
statistical inference.
 Now we can report to the director that our best estimate of the life of the company's battery is 36
months, and we are 68.3 percent confident that the life lies in the interval from 35.293 to 36.707
months ($36 \pm 1\sigma_{\bar{x}}$).
 Similarly, we are 95.5 percent confident that the life falls within the interval of 34.586 to 37.414
months ($36 \pm 2\sigma_{\bar{x}}$) and we are 99.7 percent confident that battery life falls within the interval of
33.879 to 38.121 months ($36 \pm 3\sigma_{\bar{x}}$).
Example: For a population with a known variance of 185, a sample of 64 individuals leads to 217
as an estimate of the mean.
A) Find the standard error of the mean.
B) Establish an interval estimate that should include the population mean 68.3 percent of the time.
 $\sigma^2 = 185$, so $\sigma = \sqrt{185} = 13.60$; n = 64 and $\bar{x} = 217$

a) $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{13.60}{\sqrt{64}} = \frac{13.60}{8} = 1.70$
b) $\bar{x} \pm 1\sigma_{\bar{x}} = 217 \pm 1.70 = (215.3, 218.7)$
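The interval estimates in this section all follow the same pattern: sample mean ± k standard errors. A minimal Python sketch of that pattern is given below; the function name `interval_estimate` and its arguments are hypothetical and chosen only for illustration.

```python
import math

def interval_estimate(sample_mean, sigma, n, k):
    """Return (standard error, lower limit, upper limit) for sample_mean +/- k standard errors."""
    se = sigma / math.sqrt(n)
    return se, sample_mean - k * se, sample_mean + k * se

# Battery problem: sample mean 36 months, sigma = 10 months, n = 200
for k in (1, 2, 3):
    se, lower, upper = interval_estimate(36, 10, 200, k)
    print(f"+/-{k} standard errors: {lower:.3f} to {upper:.3f} months (S.E. = {se:.3f})")

# Worked example: variance 185, n = 64, sample mean 217
se, lower, upper = interval_estimate(217, math.sqrt(185), 64, 1)
print(f"S.E. = {se:.2f}, interval = ({lower:.1f}, {upper:.1f})")
```

Changing k widens or narrows the interval, which is exactly the trade-off between confidence and precision discussed in the next section.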
5.2. Interval Estimates and Confidence Intervals

In statistics, the probability that we associate with an interval estimate is called the confidence level.
This probability indicates how confident we are that the interval estimate will include the population parameter.
 A higher probability means more confidence. In estimation, the most commonly used confidence
levels are 90%, 95% and 99%, but we are free to apply any confidence level.

 The confidence interval is the range of the estimate we are making. If we report that we are 90%
confident that the mean income of the people in a certain community lies between Birr 800 and Birr 24,000, then
the range 800-24,000 is our confidence interval.
 Often, however, we will express the confidence interval in standard errors rather than in numerical
values. Thus we will often express confidence intervals like this: $\bar{x} \pm 1.64\sigma_{\bar{x}}$, where
 $\bar{x} + 1.64\sigma_{\bar{x}}$ = upper limit of the confidence interval.
 $\bar{x} - 1.64\sigma_{\bar{x}}$ = lower limit of the confidence interval.
 Thus, confidence limits are the upper and lower limits of the confidence interval. In this case,
 $\bar{x} + 1.64\sigma_{\bar{x}}$ is the upper confidence limit and
 $\bar{x} - 1.64\sigma_{\bar{x}}$ is the lower confidence limit.
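The multiplier 1.64 used here corresponds to a 90% confidence level under the normal distribution. As a hedged illustration, the sketch below uses `statistics.NormalDist` from the Python standard library to recover the multipliers for the three commonly used confidence levels; the helper name `z_for_confidence` is an assumption made for this example.

```python
from statistics import NormalDist

def z_for_confidence(level):
    """Two-sided multiplier z such that mean +/- z standard errors covers `level` of the normal curve."""
    return NormalDist().inv_cdf(0.5 + level / 2)

for level in (0.90, 0.95, 0.99):
    z = z_for_confidence(level)
    print(f"{level:.0%} confidence level: sample mean +/- {z:.2f} standard errors")
```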
Relation between Confidence Level and Confidence Interval
You may think that we should use a high confidence level, such as 99%, in all estimation problems. After
all, a high confidence level seems to signify a high degree of accuracy in the estimate. In practice, however,
high confidence levels will produce large confidence intervals, and such large intervals are not precise;
they give very fuzzy estimates.
 Consider an appliance store customer who inquires about delivery of a new washing machine. The table
below lists several questions the customer might ask and the likely responses. This table indicates the
direct relationship that exists between confidence level and confidence interval for any estimate: as the customer
sets a tighter and tighter confidence interval, the store manager agrees to a lower and lower confidence level.
 Notice: when the confidence interval is too wide, as is the case with a 1-year delivery, the estimate
may have very little real value, even though the store manager attaches a 99% confidence level to it.
 Similarly, if the confidence interval is too narrow ("Will my washing machine get home before I
do?"), the estimate is associated with such a low confidence level (1%) that we question its value.
Customer's question                 Store manager's response               Implied            Implied
                                                                           confidence level   confidence interval
Will I get my washing machine       I am absolutely certain of that.       Better than 99%    1 year
within 1 year?
Will you deliver the washing        I am almost positive it will be        At least 95%       1 month
machine within 1 month?             delivered this month.
Will you deliver the washing        I am pretty certain it will go out     About 80%          1 week
machine within a week?              within this week.
Will I get my washing machine       I am not certain we can get it to      About 40%          1 day
tomorrow?                           you then.
Will my washing machine get         There is little chance.                Near 1%            1 hour
home before I do?
5.3. What is a hypothesis?

A hypothesis (plural "hypotheses") is a statement which may or may not be true. It is a statement made
about the result of an experiment which we then test.
 For instance, we may want to test how effective a new drug is at curing illness, so we produce a
hypothesis ("The new drug reduces the level of illness by at least 20%") and then carry out an
experiment to see if it is true.

In formal hypothesis testing we actually produce two hypotheses, called H0 (known as the "null
hypothesis") and H1 (known as the "alternative hypothesis"). In fact, these two are always given as
opposites of each other. Using the drug example above, the two hypotheses might be stated as:
 H0: "The new drug doesn't reduce illness by at least 20%."
 H1: "The new drug reduces illness by at least 20%."
 Since these exactly contradict each other, one of them must be true, whatever the result of the
experiment. After we have carried out the experiment, we will either accept H1 (and reject H0) or accept
H0 (and reject H1).
In general, we usually arrange the hypotheses so that H0 states that the accepted status quo is correct, and
H1 states that the situation is really different from what we would expect.
 For example, if we are told that a machine usually packs 500 nails on average into each box, and we
wanted to test whether this were true, we would probably write the hypotheses as follows:
 H0: "The mean number of nails per box is 500."
 H1: "The mean number of nails per box is not 500."
A word of warning
 The two hypotheses must say the exact opposite of each other to cover all possibilities. Suppose
we had the following:
 H0: The people in the sample are significantly shorter than the population in general.
 H1: The people in the sample are significantly taller than the population in general.
In this case we have left a loophole. What happens if the experimental results indicate that the people in
the sample are generally the same height as the population (i.e. not significantly taller or shorter)? In this
case we would rewrite the hypotheses as:
 H0: The people in the sample are not significantly taller than the population in general.
 H1: The people in the sample are significantly taller than the population in general.
In this way we know that one of the two hypotheses must be true.
 Here are some questions for you. In each case, state whether the pair of hypotheses given is suitable,
i.e. whether they cover all possibilities or not.
 H0: The machine packs more than 500 nails on average into each box.
 H1: The machine packs 500 nails or fewer on average into each box.
 H0: The machine packs 500 nails or fewer on average into each box.
 H1: The machine packs 501 nails or more on average into each box.
 H0: The machine packs fewer than 500 nails on average into each box.
 H1: The machine packs more than 500 nails on average into each box.
B) Significance Level
When we test the hypotheses, we can never be 100% certain of our conclusions. We can only be confident
to a certain level - hopefully a high one. Typically we construct our test so that we will be 95% certain that
the conclusion we draw is a correct one. This is called a 95% confidence level, or a 5% significance level.
 Other figures which are quite common are the 99% confidence level (1% significance level) or 90%
confidence level (10% significance level). In each case, the percentage indicates how confident we are
that our conclusion is correct.
 The higher the confidence level (99% is higher than 95%), the more certain we are, but the less
likely it is that our test data will pass the test!

 It may turn out later that we have made a mistake (as the years roll past, and more data comes pouring
in). Such is the nature of statistics!
 If it turns out that we wrongly accepted H1, when we should have accepted H0, then we call this a
Type I error. On the other hand, if it turns out that we wrongly accepted H0, when in fact H1 was the
correct statement, then we have made a Type II error.
C) Sampling
The art of sampling means taking a small number of a population of items and testing them, and then
drawing a conclusion about the population as a whole.
 For instance, if you wanted to estimate how many hours of television people in Ethiopia watched on
average, you couldn't possibly ask them all, so you would ask a sample of people and then draw a
conclusion based on what they said. Clearly the larger the sample, the more representative the results.
 Let's illustrate this with an example: the figures below were obtained for 20 people by spying on
them through their letter boxes.
4 7 1 0 1 2 5 2 4 3
1 6 4 1 6 2 3 2 3 0
 The mean value of all those figures is found, as you would expect, by adding them all together and
dividing by 20. It comes to 2.85.
 Would the accuracy be improved if we chose 5 items out of the 20 for each sample rather than 3
items? Yes, as 5 items is a larger percentage of the population (25%) than 3 items (15%).
 We can expect that if we chose a larger sample size (6, 7 or 8 numbers per sample) then the
sample means would get closer to the true mean and their standard deviation would go down.
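The claim that larger samples give sample means closer to the true mean can be checked directly on the 20 figures listed above. The sketch below treats those 20 figures as the whole population and enumerates every possible sample of a given size; the function name `spread_of_sample_means` is hypothetical, and the approach (exhaustive enumeration with `itertools.combinations`) is one illustrative way to do it, not the handout's own method.

```python
import math
from itertools import combinations

# The 20 figures listed above, treated here as the whole population
population = [4, 7, 1, 0, 1, 2, 5, 2, 4, 3,
              1, 6, 4, 1, 6, 2, 3, 2, 3, 0]

true_mean = sum(population) / len(population)

def spread_of_sample_means(size):
    """Standard deviation of the means of every possible sample of `size` items."""
    means = [sum(sample) / size for sample in combinations(population, size)]
    centre = sum(means) / len(means)
    return math.sqrt(sum((m - centre) ** 2 for m in means) / len(means))

print(f"true mean = {true_mean:.2f}")
for size in (3, 5, 8):
    print(f"sample size {size}: spread of sample means = {spread_of_sample_means(size):.3f}")
```

The spread shrinks as the sample size grows, which is the behaviour the standard error formula in the next subsection captures.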
D) Standard Error
 When carrying out hypothesis testing on samples we use a measure called the Standard Error. This is
based on the standard deviation of a population, but takes into account the size of the sample
which we draw from the population and on which we base any conclusions.
 To get the standard error (S.E.) we divide the standard deviation by the square root of the number of
items in the sample, n:
Standard Error (S.E.) = s / √n
E) Critical Region
 The sort of hypotheses that we are going to test will involve comparing the mean of a sample of items
against a true mean for a population. This true mean applies to a whole population (too many to
count), although it may be only a claim (i.e. someone may tell us what the mean of the population is,
and we may want to test it).
 Either way, the symbol that we use for the true mean is μ (the Greek letter "mu", equivalent to our
letter "m" - "m" for mean) and the mean of the sample of items will be called $\bar{x}$. We define a
critical region around the true mean, and then we see if the sample mean lies within that region.
Firstly, decide on a significance level. We normally choose a 5% significance level (a 95% confidence
level), which means that we will be 95% certain of drawing the correct conclusion, although there will be
a 5% chance that we will have made the wrong decision (even if we do the mathematics correctly).
 Look at the hypotheses carefully. Do they imply that something will be different to the mean value,
or do they imply that it will be higher or lower?
 If the crucial word is "different" (or a word that means the same thing) then we call the test a
"two-tailed test", i.e. any item which is substantially different from the mean in either direction counts as
"different".
 However, if the hypotheses use words like "taller", "longer", "better" (or "shorter", "worse", "less
efficient" for that matter) then it is a "one tailed test". For instance, if we want to know whether the
machine packs significantly more than 500 nails into each box or not, then a box containing 497
certainly wouldn't provide any evidence to support the hypothesis!
 If the test is a two-tailed test, then the critical region has an upper limit and a lower limit, with the
true mean exactly in the middle. The distance from the mean to each limit is the standard error (not
the standard deviation in this case) multiplied by a certain number which will depend on what
significance level we are using.
 In the case of a 5% significance level (95% confidence level), the critical number is 1.96. This is the
same as for the 95% confidence interval that is part of the theory of the Normal Distribution, although
it is wrong to think of the critical region as a 95% confidence interval.
 How could it contain 95% of the items in the population when it is based on the standard error, which
in turn depends on the size of our sample? If we altered the number of items in the sample, then the
size of the critical region would also change!
 For instance, if the mean were 100 and the standard error was 8, then we would multiply 8 by
1.96 (to give 15.68). The lower limit of the critical region would then be 100 - 15.68 = 84.32,
and the upper limit would be 100 + 15.68 = 115.68.
 It's a different matter if the test is a one tailed test. In this case, the critical region only has one limit:
 If the test is a right-tailed test (we are testing whether the sample mean is significantly higher,
better, heavier etc.) then there is no lower limit, and the upper limit is the true mean plus the
standard error multiplied by a special number (1.64 for 5% significance level).
 If the test is a left-tailed test (we are testing whether the sample mean is significantly lower,
worse, lighter etc.) then there is no upper limit, and the lower limit is the mean minus the
standard error multiplied by the same special number.
 The critical regions for a one-tailed test (both right-tailed and left-tailed versions) for a 95%
confidence level are as follows (diagrams omitted):
 Right-tailed test: critical region = up to μ + 1.64 S.E.
 Left-tailed test: critical region = μ - 1.64 S.E. upwards
 Therefore, the critical region marks the range of values in which we can be fairly certain that the true
mean, from which our sample was taken, lies.
 For instance, if we have calculated that the critical region at a 95% confidence level is between 10
and 20, then we can be 95% confident that the true mean lies within that region.
 Similarly, if the critical region is one-tailed at the 1% significance level, with a lower limit at 25
and no upper limit, then we can be 99% confident that the true mean is greater than 25.
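The rules above for two-tailed, right-tailed and left-tailed regions can be collected into one small function. This is only a sketch under the handout's own usage of "critical region" (centre ± critical number × S.E.); the function name `critical_region` and its parameters are assumptions for illustration, and the critical numbers are taken from the standard normal distribution via `statistics.NormalDist`.

```python
from statistics import NormalDist

def critical_region(centre, std_error, significance=0.05, tail="two"):
    """Return (lower, upper) limits; None means there is no limit on that side."""
    if tail == "two":
        z = NormalDist().inv_cdf(1 - significance / 2)   # about 1.96 at the 5% level
        return centre - z * std_error, centre + z * std_error
    z = NormalDist().inv_cdf(1 - significance)            # about 1.64 at the 5% level
    if tail == "right":
        return None, centre + z * std_error
    if tail == "left":
        return centre - z * std_error, None
    raise ValueError("tail must be 'two', 'right' or 'left'")

# Example from the text: mean 100, standard error 8, 5% significance level
lower, upper = critical_region(100, 8)
print(f"two-tailed: {lower:.2f} to {upper:.2f}")
print("right-tailed:", critical_region(100, 8, tail="right"))
print("left-tailed:", critical_region(100, 8, tail="left"))
```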
Hypothesis testing itself

Steps in Hypothesis Testing
1. The first step is to specify the null hypothesis (H0) and the alternative hypothesis (H1).
 If the research concerns whether one method of presenting pictorial stimuli leads to better
recognition than another, the null hypothesis would most likely be that there is no difference
between methods (H0: μ1 - μ2 = 0). The alternative hypothesis would be H1: μ1 ≠ μ2.

 If the research concerned the correlation between grades and SAT scores, the null hypothesis would
most likely be that there is no correlation (H0: ρ= 0). The alternative hypothesis would be H1: ρ ≠ 0.
2. The next step is to select a significance level. Typically the 0.05 or the 0.01 level is used.
3. The third step is to calculate a statistic analogous to the parameter specified by the null hypothesis.
 If the null hypothesis were defined by the parameter μ1- μ2, then the statistic M1 - M2 would be
computed.
4. The fourth step is to calculate the probability value (often called the p value).
 The p value is the probability of obtaining a statistic as different or more different from the
parameter specified in the null hypothesis as the statistic computed from the data. The calculations
are made assuming that the null hypothesis is true.
5. The probability value computed in Step 4 is compared with the significance level chosen in Step 2.
 If the probability is less than or equal to the significance level, then the null hypothesis is rejected;
 If the probability is greater than the significance level then the null hypothesis is not rejected.
 When the null hypothesis is rejected, the outcome is said to be "statistically significant".
 When the null hypothesis is not rejected, the outcome is said to be "not statistically significant".
6. If the outcome is statistically significant, then the null hypothesis is rejected in favor of the alternative
hypothesis.
 If the rejected null hypothesis were that μ1 - μ2 = 0, then the alternative hypothesis would be that μ1 ≠ μ2.
 If M1 were greater than M2, then the researcher would naturally conclude that μ1 > μ2.
7. The final step is to describe the result and the statistical conclusion in an understandable way.
 Be sure to present the descriptive statistics as well as whether the effect was significant or not.
 For example, a significant difference between a group that received a drug and a control group
might be described as follows:
 Subjects in the drug group scored significantly higher (M = 23) than did subjects in the control
group (M = 17), t(18) = 2.4, p = 0.027.
 The statement that "t(18) = 2.4" has to do with how the probability value (p) was calculated. A
small minority of researchers might object to two aspects of this wording.
 First, some believe that the significance level rather than the probability value should be
reported. The argument for reporting the probability value is presented in another section.
 Second, since the alternative hypothesis was stated as µ1 ≠ µ2, some might argue that it can only
be concluded that the population means differ and not that the population mean for the drug
group is higher than the population mean for the control group.
 This argument is misguided. Intuitively, there are strong reasons for inferring that the direction of
the difference in the population is the same as the difference in the sample. There is also a more
formal argument.
 A non-significant effect might be described as follows:
 Although subjects in the drug group scored higher (M=23) than did subjects in the control group,
(M = 20), the difference between means was not significant, t(18) = 1.4, p = 0.179.
 It would not have been correct to say that there was no difference between the performance of the
two groups. There was a difference. It is just that the difference was not large enough to rule out
chance as an explanation of the difference. It would also have been incorrect to imply that there is no
difference in the population. Be sure not to accept the null hypothesis.
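Steps 5 and 6 amount to a single comparison between the p value and the chosen significance level. The short Python sketch below illustrates that comparison using the two p values quoted above; the function name `decide` is hypothetical. Note that the wording follows the advice in the text: a non-significant result means the null hypothesis is not rejected, not that it is accepted.

```python
def decide(p_value, significance_level=0.05):
    """Compare the p value with the significance level chosen in Step 2."""
    if p_value <= significance_level:
        return "reject H0: statistically significant"
    return "do not reject H0: not statistically significant"

print(decide(0.027))          # drug vs. control, t(18) = 2.4
print(decide(0.179))          # drug vs. control, t(18) = 1.4
print(decide(0.027, 0.01))    # same data judged at the stricter 0.01 level
```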

Illustration 1:
A company sells cotton reels and claims that the average (mean) length of the cotton on each reel is
250m with a standard deviation of 14m. To test this claim, suppose you buy a sample of 30 cotton reels
and measure the length of the cotton on them. The mean length of the cotton in your sample is 243m per
reel. Can you accept the company's claim?
Solution
1. What are the hypotheses?
The company claims that the mean length of the cotton is 250m. We think it may be something different
from that. The two hypotheses should therefore be:
 H0: "The mean length of the cotton is 250m per reel."
 H1: "The mean length of the cotton is something other than 250m per reel."
2. Decide what type of test it is, and what significance level is required
 Clearly we want to see whether the mean length is 250m or different from 250m, so it is a two-tailed test.
 We follow convention and choose a 5% significance level (95% confidence level).
3. Calculate the standard error.
 This is fairly straightforward: simply divide the standard deviation by the square root of the number
of items in the sample, i.e.
 Standard error = 14 / √30 = 14 / 5.48 = 2.56
4. Calculate the critical region.
 Since the test is a two-tailed test, we will need a symmetrical critical region:
 Lower limit = 243 - 1.96 x 2.56 = 237.98 meters
 Upper limit = 243 + 1.96 x 2.56 = 248.02 meters
5. Compare the company's claim for the true mean (μ) with the critical region.
 If it is inside the critical region, then accept H0 and reject H1.
 If it is outside the critical region, accept H1 and reject H0.
In this case, the claimed true mean is 250m, which is outside the critical region (just!) This means that we
can be 95% certain that H0 is wrong, and that H1 is correct. We accept H1 at the 5% significance level.
This means that we can conclude that the company's claim (250 meters on average on every reel) is
probably wrong, although there probably isn't enough evidence to go to court!
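The calculation in Illustration 1 can be reproduced with a few lines of Python. This is a sketch that follows the handout's procedure (an interval of ±1.96 standard errors around the sample mean) and, as an additional cross-check that is not part of the handout's solution, also computes the z statistic and its two-sided p value using `statistics.NormalDist`. Variable names are illustrative.

```python
import math
from statistics import NormalDist

claimed_mean, sigma = 250, 14      # company's claim: mean 250m, standard deviation 14m
n, sample_mean = 30, 243           # our sample of 30 reels

se = sigma / math.sqrt(n)
z_crit = NormalDist().inv_cdf(0.975)           # two-tailed test at the 5% level
lower = sample_mean - z_crit * se
upper = sample_mean + z_crit * se
claim_inside = lower <= claimed_mean <= upper  # False means we accept H1

# Cross-check: test statistic and two-sided p value
z = (sample_mean - claimed_mean) / se
p_value = 2 * NormalDist().cdf(-abs(z))

print(f"S.E. = {se:.2f}, region = {lower:.2f} to {upper:.2f}, claim inside: {claim_inside}")
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

The limits differ from the hand calculation only in the second decimal place, because the code does not round the standard error to 2.56 before multiplying.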
Illustration 2:
A company has a machine that manufactures light bulbs with a mean lifetime of 5000 hours and a
standard deviation of 160 hours. The company is considering buying a new machine which promises to
make light bulbs which last significantly longer than those produced by the old machine. A sample of
200 bulbs from the new machine is tested and found to have a mean lifetime of 5020 hours. Does the
new machine produce longer-lasting bulbs?
 Here we are not being asked to test whether the true mean has a certain value - we are told the value of
the true mean (5000 hours) and just have to accept that. Instead, we are being asked whether the
sample is compatible with that true mean, or whether it is substantially larger.
 However, the same method can be applied. In both this and the previous question, we are asked
whether the sample is compatible with the true mean. In the previous question, it was the "true" mean
that was slightly suspect. In this question, it is the sample itself which is suspect. Is it better than the
true mean, or is it just a fluke?
1. What are the hypotheses?
 H0: The mean lifetime of the bulbs from the new machine is not greater than 5000 hours.
 H1: The mean lifetime of the bulbs from the new machine is greater than 5000 hours.
2. What sort of test is it?
 Here we want to know whether the lifetime is greater than 5000 hours, so it is a one-tailed test,
more specifically a right-tailed test.
 What significance level do we want? Normally we would pick 95% confidence level, but let's be
daring for once! Let's choose a 99% confidence level, i.e. 1% significance level.
 For this test, a one-tailed test requires the special number 2.33 at 1% significance level.
3. Calculate the standard error.
Standard error = s / √n = 160 / √200 = 160 / 14.14 = 11.31
4. Calculate the critical region.
We are basing our results on a sample of light bulbs that come from some unknown population, whose
true mean may be the same as for the current process or may be bigger (certainly we have no reason to
believe that it is smaller based on our sample!)
 In this case, we calculate a critical region for the mean from our sample. There is no upper limit.
 The lower limit = $\bar{x}$ - 2.33 S.E. = 5020 - 2.33 x 11.31 = 4993.65
What does this critical region mean exactly? Well, we can be 99% certain that the mean of the lifetimes of
the bulbs produced by the new machine will lie within this region. There is only a 1% chance that the
mean will be less than 4993.65.
 If the established mean lies within this region, then it is compatible with the mean of the bulbs from
the new machine - i.e. the mean lifetimes produced by both machines could well be the same (no
significant difference)
5. Compare the established value of the true mean with the critical region.
 5000 is not smaller than 4993.65, so the established mean is within the critical region. This means that
we can accept H0 and reject H1.
 The mean lifetime of the bulbs in the sample is not significantly higher than 5000 at the 1%
significance level, and so the machine does not produce significantly longer-lasting bulbs.
 Our advice to the company would be to stick with the machine that they've already got!
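Illustration 2 can be checked the same way. The sketch below follows the handout's one-tailed procedure at the 1% significance level; the variable names are illustrative, and the critical number is obtained from `statistics.NormalDist` rather than rounded to 2.33, so the lower limit differs slightly from the hand calculation.

```python
import math
from statistics import NormalDist

established_mean, sigma = 5000, 160   # old machine
n, sample_mean = 200, 5020            # sample of bulbs from the new machine

se = sigma / math.sqrt(n)
z_crit = NormalDist().inv_cdf(0.99)   # one-tailed test at the 1% level, about 2.33
lower_limit = sample_mean - z_crit * se

# H0 is rejected only if the established mean falls below the lower limit
reject_h0 = established_mean < lower_limit

print(f"S.E. = {se:.2f}, lower limit = {lower_limit:.2f}, reject H0: {reject_h0}")
```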
Postscript on Confidence Intervals

Students often get confused between critical regions for 95% confidence levels and 95% confidence
intervals, so it would be a good idea if we summarized them. Here is a brief summary of the differences:
 If you have a true population mean μ, and a true standard deviation σ, then the 95% confidence
interval is calculated as follows: μ ± 1.96σ. This region will contain 95% of all the items in a
normally distributed population.

 If, on the other hand, you only have a sample of the items in the population, with a sample mean of
$\bar{x}$ and a standard deviation of s, then the 95% critical region is calculated as: $\bar{x}$ ± 1.96 S.E., where
the standard error, S.E., is s / √n.
 In this case, we can be 95% certain that the true mean of the population from which this sample
was taken lies within this region.
 The same is true for other degrees of certainty (e.g. 99% or 90%) except that the critical number is not
1.96 (it is 2.58 for 99%, 1.64 for 90% etc.). Note that these numbers change again when you are
considering a one-tailed test instead of a two-tailed test.
5.4. Type I and Type II errors

1) Type I error
 A Type I error is the error made by rejecting the null hypothesis when it is in fact true.
2) Type II error

 First definition: a Type II error is the error made by failing to reject the null hypothesis when it is in fact false.
 Second definition: a Type II error is the error made by failing to reject the null hypothesis when the alternative
hypothesis is true.

UNIT SIX
SIMPLE CORRELATION AND REGRESSION
6.1. Simple Correlation
 Suppose we have two variables X = (X1, X2, X3, …, Xn) and Y = (Y1, Y2, Y3, …, Yn). When higher values
of X are associated with higher values of Y and lower values of X are associated with lower values of
Y, then the correlation is said to be positive or direct.
Example:
 Income and expenditure
 Height and weight
 Number of hours spent in studying and score obtained
 Distance covered and fuel consumed by car
 When higher values of X are associated with lower values of Y and lower values of X are associated
with higher values of Y, then the correlation is said to be negative or inverse.
Example:
 Demand and supply
 Income and proportion of income spent on food
 The correlation between X and Y may be one of the following:
 Perfect positive (slope = 1)
 Positive (slope between 0 and 1)
 No correlation (slope = 0)
 Negative (slope between -1 and 0)
 Perfect negative (slope = -1)
 The presence of correlation between two variables may be due to three reasons:
1) One variable being the cause of the other. The cause is called subject or independent variable,
while the effect is called dependent variable.
2) Both variables being the result of a common cause. That is the correlation that exists between
two variables is due to their being related to some third force.
Example: Let X1 be the ESLCE result,
Y1 be the rate of surviving in the university, and
Y2 be the rate of getting a scholarship.
 Both X1 & Y1 and X1 & Y2 have high positive correlation. Likewise, Y1 & Y2 have positive correlation,
not because they are directly related, but because they are related to each other via X1.
3) Chance:
 The correlation that arises by chance is called spurious correlation.
Example: - Price of teff in Addis Ababa and grade of students in USA
- Weight of individuals in Addis Ababa and income of individuals in Harar.
 Therefore, while interpreting correlation coefficient, it is necessary to see if there is any likelihood of
any relationship existing between variables under study.
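 The correlation coefficient between X and Y, denoted by ‘r’, is given by:

$$
r = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum (X_i - \bar{X})^2 \, \sum (Y_i - \bar{Y})^2}}
  = \frac{\sum XY - n\bar{X}\bar{Y}}{\sqrt{\left[\sum X^2 - n\bar{X}^2\right]\left[\sum Y^2 - n\bar{Y}^2\right]}}
  = \frac{n\sum XY - (\sum X)(\sum Y)}{\sqrt{\left[n\sum X^2 - (\sum X)^2\right]\left[n\sum Y^2 - (\sum Y)^2\right]}}
$$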

Remark:
r always lies between -1 and 1 inclusive, and it is symmetric: the correlation between X and Y equals the
correlation between Y and X. Interpretation of ‘r’:
 Perfect positive linear relationship if r = 1
 Some positive linear relationship if r is between 0 and 1
 No linear relationship if r = 0
 Some negative linear relationship if r is between -1 and 0
 Perfect negative linear relationship if r = -1
Example:
 Calculate the simple correlation between the mid-semester and final exam scores of 10 students (both out of 50).
Student 1 2 3 4 5 6 7 8 9 10
Mid exam (X) 31 23 41 32 29 33 28 31 31 33
Final exam (Y) 31 29 34 35 25 35 33 42 31 34
Solution: $n = 10$, $\bar{X} = 31.2$, $\bar{Y} = 32.9$, $\bar{X}^2 = 973.44$, $\bar{Y}^2 = 1082.41$,
$\sum XY = 10331$, $\sum X^2 = 9920$, $\sum Y^2 = 11003$

$$
r = \frac{\sum XY - n\bar{X}\bar{Y}}{\sqrt{\left[\sum X^2 - n\bar{X}^2\right]\left[\sum Y^2 - n\bar{Y}^2\right]}}
  = \frac{10331 - 10(31.2)(32.9)}{\sqrt{\left[9920 - 10(973.44)\right]\left[11003 - 10(1082.41)\right]}}
  = \frac{66.2}{\sqrt{(185.6)(178.9)}} = 0.363
$$

 This means that the mid-semester and final exam scores have a weak positive correlation.
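The arithmetic in this example can be checked with a short script. The following is a minimal sketch in Python (not part of the original handout); the function name `pearson_r` is chosen here for illustration, and it applies the computational form of the formula for r to the exam scores above.

```python
from math import sqrt


def pearson_r(x, y):
    """Correlation coefficient r via the computational formula
    r = (n*SUM(XY) - SUM(X)*SUM(Y)) / sqrt([n*SUM(X^2) - SUM(X)^2] * [n*SUM(Y^2) - SUM(Y)^2])."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Mid-semester (X) and final (Y) exam scores of the 10 students
mid = [31, 23, 41, 32, 29, 33, 28, 31, 31, 33]
final = [31, 29, 34, 35, 25, 35, 33, 42, 31, 34]

r = pearson_r(mid, final)
print(round(r, 3))  # 0.363
```

Swapping the two arguments gives the same value, which illustrates the symmetry of r noted in the remark. The sketch does not handle the case where one variable is constant, because the denominator is then zero and r is undefined.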
Exercise 1: A researcher who is concerned about the consumption rate of individuals took a sample of 10
individuals & observed their consumption & income (both in tens of Birr) for one month as shown below.
Individual Income (x) Consumption (y)
1 15 15
2 35 30
3 42 30
4 60 50
5 72 48
6 128 100
7 98 93
8 35 33
9 15 14
10 50 50
(a) Compute the coefficient of correlation and interpret.
(b) Find the least squares line of consumption on income.
(c) Estimate the consumption of an individual whose income is 200 Birr
 The above formula and procedure apply only to quantitative data. When we have qualitative
data such as efficiency, honesty or intelligence, we calculate what is called Spearman’s rank
correlation coefficient as follows:
 Step i) Rank the different items in X and Y.
 Step ii) Find the difference of the ranks in each pair and denote it by $D_i$.
 Step iii) Use the following formula:

$$
r_s = 1 - \frac{6\sum D_i^2}{n(n^2 - 1)}
$$

Where $r_s$ = coefficient of rank correlation
$D_i$ = the difference between paired ranks
$n$ = the number of pairs
Example: Aster and Chaltu were asked to rank 7 different types of lipstick. Check whether there is a
correlation between the tastes of the two ladies.
Lipsticks A B C D E F G
Aster 2 1 4 3 5 7 6
Chaltu 1 3 2 4 5 6 7
Solution
Lipsticks A B C D E F G Total
RX 2 1 4 3 5 7 6
RY 1 3 2 4 5 6 7
D = RX - RY 1 -2 2 -1 0 1 -1
D^2 1 4 4 1 0 1 1 12

$$
r_s = 1 - \frac{6\sum D_i^2}{n(n^2 - 1)} = 1 - \frac{6(12)}{7(48)} = 1 - \frac{72}{336} = 0.786
$$

 Yes, there is a positive correlation between the two rankings.
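The three steps above can be sketched in Python as follows. This is an illustrative sketch rather than part of the original handout: the function name `spearman_rs` is hypothetical, the inputs are assumed to be ranks already (step i done), and it assumes there are no tied ranks, since the formula is exact only in that case.

```python
def spearman_rs(rank_x, rank_y):
    """Spearman's rank correlation r_s = 1 - 6*SUM(D^2) / (n*(n^2 - 1)).

    Assumes both inputs already contain ranks and that no ranks are tied.
    """
    n = len(rank_x)
    if n != len(rank_y) or n < 2:
        raise ValueError("both rankings must have the same length (at least 2)")
    # Step ii: differences of paired ranks, squared and summed
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    # Step iii: apply the formula
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Ranks given by Aster and Chaltu to lipsticks A..G
aster = [2, 1, 4, 3, 5, 7, 6]
chaltu = [1, 3, 2, 4, 5, 6, 7]

rs = spearman_rs(aster, chaltu)
print(round(rs, 3))  # 0.786
```

Two identical rankings give 1, and two exactly reversed rankings give -1, which matches the interpretation of r given earlier.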
6.2. Simple linear Regression

Imagine that you are putting your house up for sale, and you must put a selling price on it. To arrive at a
fair asking price, you must consider what you paid for the house, what has been happening to the
price of houses since you bought yours, the actual selling price of other houses in your
neighborhood, the sizes of the houses in your neighborhood, the size of your house, and many other
variables. Such information is useful in predicting how buyers are likely to
react to your house.
This illustration involves a complex relationship between one unknown variable (the selling price of a
house) and a collection of other variables. Here, what we need is to find a suitable mathematical
relationship between the one unknown variable, called the dependent variable, and the group of other
known quantities, called independent variables. One methodology for handling this type of problem is
called Regression Analysis.
Definition: Linear Regression

A quantitative expression of the basic nature of the relationship between a dependent variable and the
values of one or more independent variables on which it depends is called regression.
In regression, we can have only one dependent variable, but we can have more than one independent
variable. The situation where we have only one independent variable is called simple regression. If
the mathematical model relating the dependent variable to the independent variable is linear, the
relationship is referred to as simple linear regression.
In regression analysis, we shall develop an estimating equation, i.e., a mathematical formula that
relates the dependent variable to the independent variable. Then, after we have learned the pattern of this
relationship, we can apply correlation analysis to determine the strength of that relationship, i.e. how
well the estimating equation actually describes the relationship.
The Least squares Method

 How do we determine an equation relating two variables? The first step is collection of data for the
variables under consideration. Suppose X & Y denote the height & weight of individuals,
respectively, & that our interest is to formulate a mathematical model relating these two variables.
 Take a sample of, say, n individuals and measure the heights X1, X2, X3, . . ., Xn and weights Y1, Y2,
Y3, . . ., Yn. The next step is plotting the points (X1, Y1), (X2, Y2), . . ., (Xn, Yn) on the XY-plane.
 The resulting set of points on the XY-plane is called a scatter diagram. A scatter diagram can
give us two types of information. Visually, we can look for patterns which indicate that the variables
are related. Then, if the variables are related, we can see what kind of line, or estimating equation,
describes this relationship.
 The relationship between two variables can be direct or inverse. If the dependent variable increases
as the independent variable increases, then the relationship is said to be direct/positive.
 For instance, we expect the sales of a company to increase as the advertising budget increases.
Hence, the relationship b/n these two variables (sales & advertising expense) is expected to be direct.
Relationships can also be inverse/negative. In such cases the dependent variable decreases as the
independent variable increases.
 If the relationship between two variables X and Y is linear, we express this as:

$$
Y = \alpha + \beta X
$$

Here, Y represents the individual values of the actual observed points. But as can be seen from Figure 1, not all
points lie on the fitted line. So we use $\hat{Y}$ to symbolize the individual values of
the estimated points, i.e. those points that lie on the estimating line.
 Accordingly, we shall write the equation for the estimating line as:

$$
\hat{Y} = a + bX
$$
 The estimating line will have a good fit if it minimizes the error between the estimated points on the
line and the actual observed points that were used to draw it.
 This error is given by:

$$
e = Y - \hat{Y} = Y - (a + bX)
$$

 The sum of squares of the errors (SSE) is:

$$
SSE = \sum (Y_i - a - bX_i)^2
$$
 The best fitting line is the line for which the SSE is the minimum. By applying differential calculus
to the SSE, the slope of the best fitting line becomes:

$$
b = \frac{n\sum X_iY_i - (\sum X_i)(\sum Y_i)}{n\sum X_i^2 - (\sum X_i)^2} \qquad (1)
$$

 And the Y-intercept becomes:

$$
a = \bar{Y} - b\bar{X} \qquad (2)
$$

Where: $\bar{X}$ = the mean value of the independent variable
$\bar{Y}$ = the mean value of the dependent variable
 The line $\hat{Y} = a + bX$, where b and a are computed using equations (1) and (2), is called the
least squares line of Y on X.
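The calculus step behind equations (1) and (2) is not spelled out in the handout. A sketch of the standard derivation is given below: the partial derivatives of the SSE with respect to a and b are set to zero, which gives two equations (commonly called the normal equations) that are then solved for a and b.

```latex
\begin{align*}
\frac{\partial\, SSE}{\partial a} &= -2\sum (Y_i - a - bX_i) = 0
  &&\Rightarrow\quad \sum Y_i = na + b\sum X_i \\
\frac{\partial\, SSE}{\partial b} &= -2\sum X_i (Y_i - a - bX_i) = 0
  &&\Rightarrow\quad \sum X_iY_i = a\sum X_i + b\sum X_i^2
\end{align*}
% Dividing the first equation by n gives the intercept:
%   a = \bar{Y} - b\bar{X}
% Substituting this a into the second equation and solving for b gives the slope:
\begin{align*}
b = \frac{n\sum X_iY_i - (\sum X_i)(\sum Y_i)}{n\sum X_i^2 - (\sum X_i)^2}
\end{align*}
```

Because the first equation is equivalent to $\sum (Y_i - \hat{Y}_i) = 0$, the errors of a least squares line always add up to zero.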
Example: The following table shows the number of items produced (X) and the cost incurred in producing
them (Y) (in Birr).
Number of items produced (X) 4 5 6 8 9
Cost (Y) 15 18 18 20 22
(a) Find the equation of the least squares line treating cost as the dependent variable.
(b) Identify the slope and the Y-intercept and interpret them
(c) Estimate the cost of producing 7 items
Solution: The calculations involved are displayed in the following table.
X Y XY X^2
4 15 60 16
5 18 90 25
6 18 108 36
8 20 160 64
9 22 198 81
$\sum X = 32$, $\sum Y = 93$, $\sum XY = 616$, $\sum X^2 = 222$
(a) Since we have 5 pairs of observations, n = 5. The slope b is computed as:

$$
b = \frac{n\sum XY - (\sum X)(\sum Y)}{n\sum X^2 - (\sum X)^2}
  = \frac{5(616) - (32)(93)}{5(222) - (32)^2} = \frac{104}{86} = 1.21
$$

 To compute the Y-intercept a, first we need to find the mean values of X and Y:

$$
\bar{X} = \frac{\sum X}{n} = \frac{32}{5} = 6.4 \quad \text{and} \quad \bar{Y} = \frac{\sum Y}{n} = \frac{93}{5} = 18.6
$$

Thus, the Y-intercept is computed as: $a = \bar{Y} - b\bar{X} = 18.6 - 1.21(6.4) = 10.86$
 Therefore, the equation of the least squares line is:
$\hat{Y} = a + bX$, that is, $\hat{Y} = 10.86 + 1.21X$
(b) The Y-intercept is a = 10.86. It can be obtained by substituting X = 0 in the equation. This value
tells us that, even if no item is produced, there will be a fixed cost of 10.86 Birr (such as insurance
cost, maintenance cost, etc.). The slope is b = 1.21. This figure indicates that for a unit change in the
number of items produced, the cost changes by 1.21 Birr. It is the marginal cost.
(c) The cost of producing 7 items is estimated as: $\hat{Y} = 10.86 + 1.21(7) = 19.33$ Birr
 One of the mathematical properties of a line fitted by the method of least squares is that the individual
positive and negative errors add up to zero. For the above problem this property is displayed below.
X Y $\hat{Y} = 10.856 + 1.21X$ $e = Y - \hat{Y}$
4 15 10.856 + 1.21(4) = 15.696 -0.696
5 18 10.856 + 1.21(5) = 16.906 1.094
6 18 10.856 + 1.21(6) = 18.116 -0.116
8 20 10.856 + 1.21(8) = 20.536 -0.536
9 22 10.856 + 1.21(9) = 21.746 0.254
$\sum e = \sum (Y - \hat{Y}) = 0$
Coefficient of Determination
Another measure of the goodness-of-fit of the regression line is the coefficient of determination, which is
the square of the correlation coefficient, r^2. It lies between 0 and 1, inclusive.
Coefficient of Determination = r^2
 An r^2 close to 1 indicates a strong correlation between X and Y, while an r^2 close to 0 means there is
little correlation between these two variables.
The total variation in the dependent variable (Y) can be divided into two: explained variation and
unexplained variation.
 Explained variation is the change in the dependent variable (Y) explained by changes in the
independent variable (X). The proportion of explained variation is:
r^2 x 100 %
 Unexplained variation is the variation in the dependent variable (Y) due to chance, excluded
variables, etc. The proportion of unexplained variation is:
(1 - r^2) x 100 %
Example: The following data show the monthly amount of money spent on advertising (X) (in thousands
of Birr) and the number of passengers (Y) of a certain airline in five randomly selected months.
Advertising expense (X) 10 12 8 17 10
Number of passengers (Y) 15 17 13 23 17
(a) Compute the coefficient of determination
(b) Find the proportion of explained variation and interpret.
(c) Find the proportion of unexplained variation and interpret.
Solution: From the data, we have the following summary results.

 $n = 5$, $\sum X_i = 57$, $\sum Y_i = 85$, $\sum X_iY_i = 1019$, $\sum X_i^2 = 697$, $\sum Y_i^2 = 1501$
(a) To obtain the coefficient of determination, first we need to calculate the coefficient of correlation:

$$
r = \frac{n\sum X_iY_i - (\sum X_i)(\sum Y_i)}{\sqrt{\left[n\sum X_i^2 - (\sum X_i)^2\right]\left[n\sum Y_i^2 - (\sum Y_i)^2\right]}}
  = \frac{5(1019) - (57)(85)}{\sqrt{\left[5(697) - (57)^2\right]\left[5(1501) - (85)^2\right]}}
  = \frac{250}{\left[(236)(280)\right]^{1/2}} = 0.9725
$$
 Hence, the coefficient of determination is: r^2 = (0.9725)^2 = 0.9458. This figure indicates that there is
a strong correlation between advertising expense and number of passengers.
(b) The proportion of explained variation is: r^2 x 100% = 0.9458 x 100% = 94.58%. Thus, we can
conclude that 94.58% of the variation in the number of passengers is explained by changes in the
amount of money spent on advertising.
(c) The proportion of unexplained variation is: (1 - r^2) x 100% = (1 - 0.9458) x 100% = (0.0542)
x 100% = 5.42%. Thus, 5.42% of the variation in the number of passengers is due to chance or to
variables other than advertising expense (such as ticket price, plane safety, etc.).
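The split into explained and unexplained variation can also be computed directly from the fitted line rather than by squaring r. The sketch below is an illustration, not part of the original handout; the function name and the abbreviations SST (total variation of Y around its mean) and SSE (variation left around the fitted line) are introduced here. It relies on the fact that, for a least squares line, 1 - SSE/SST equals r^2.

```python
def coefficient_of_determination(x, y):
    """r^2 computed as 1 - SSE/SST for the least squares line of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    b = s_xy / s_xx            # slope of the least squares line
    a = mean_y - b * mean_x    # intercept
    sst = sum((yi - mean_y) ** 2 for yi in y)                     # total variation
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))   # unexplained variation
    return 1 - sse / sst


# Advertising expense (X) and number of passengers (Y)
advertising = [10, 12, 8, 17, 10]
passengers = [15, 17, 13, 23, 17]

r2 = coefficient_of_determination(advertising, passengers)
explained_pct = 100 * r2
unexplained_pct = 100 * (1 - r2)

print(round(r2, 4))               # 0.9458
print(round(explained_pct, 2))    # 94.58
print(round(unexplained_pct, 2))  # 5.42
```

The result agrees with the value obtained above from r = 0.9725, which is a useful cross-check on the hand calculation.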

AREAS UNDER THE STANDARD NORMAL CURVE
(The entries are the probabilities that a random variable having the
standard normal distribution will take on a value between 0 and z.)
z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
0.0 0.0000 0.0040 0.0080 0.0120 0.0160 0.0199 0.0239 0.0279 0.0319 0.0359
0.1 0.0398 0.0438 0.0478 0.0517 0.0557 0.0596 0.0636 0.0675 0.0714 0.0753
0.2 0.0793 0.0832 0.0871 0.0910 0.0948 0.0987 0.1026 0.1064 0.1103 0.1141
0.3 0.1179 0.1217 0.1255 0.1293 0.1331 0.1368 0.1406 0.1443 0.1480 0.1517
0.4 0.1554 0.1591 0.1628 0.1664 0.1700 0.1736 0.1772 0.1808 0.1844 0.1879
0.5 0.1915 0.1950 0.1985 0.2019 0.2054 0.2088 0.2123 0.2157 0.2190 0.2224
0.6 0.2257 0.2291 0.2324 0.2357 0.2389 0.2422 0.2454 0.2486 0.2517 0.2549
0.7 0.2580 0.2611 0.2642 0.2673 0.2704 0.2734 0.2764 0.2794 0.2823 0.2852
0.8 0.2881 0.2910 0.2939 0.2967 0.2995 0.3023 0.3051 0.3078 0.3106 0.3133
0.9 0.3159 0.3186 0.3212 0.3238 0.3264 0.3289 0.3315 0.3340 0.3365 0.3389
1.0 0.3413 0.3438 0.3461 0.3485 0.3508 0.3531 0.3554 0.3577 0.3599 0.3621
1.1 0.3643 0.3665 0.3686 0.3708 0.3729 0.3749 0.3770 0.3790 0.3810 0.3830
1.2 0.3849 0.3869 0.3888 0.3907 0.3925 0.3944 0.3962 0.3980 0.3997 0.4015
1.3 0.4032 0.4049 0.4066 0.4082 0.4099 0.4115 0.4131 0.4147 0.4162 0.4177
1.4 0.4192 0.4207 0.4222 0.4236 0.4251 0.4265 0.4279 0.4292 0.4306 0.4319
1.5 0.4332 0.4345 0.4357 0.4370 0.4382 0.4394 0.4406 0.4418 0.4429 0.4441
1.6 0.4452 0.4463 0.4474 0.4484 0.4495 0.4505 0.4515 0.4525 0.4535 0.4545
1.7 0.4554 0.4564 0.4573 0.4582 0.4591 0.4599 0.4608 0.4616 0.4625 0.4633
1.8 0.4641 0.4649 0.4656 0.4664 0.4671 0.4678 0.4686 0.4693 0.4699 0.4706
1.9 0.4713 0.4719 0.4726 0.4732 0.4738 0.4744 0.4750 0.4756 0.4761 0.4767
2.0 0.4772 0.4778 0.4783 0.4788 0.4793 0.4798 0.4803 0.4808 0.4812 0.4817
2.1 0.4821 0.4826 0.4830 0.4834 0.4838 0.4842 0.4846 0.4850 0.4854 0.4857
2.2 0.4861 0.4864 0.4868 0.4871 0.4875 0.4878 0.4881 0.4884 0.4887 0.4890
2.3 0.4893 0.4896 0.4898 0.4901 0.4904 0.4906 0.4909 0.4911 0.4913 0.4916
2.4 0.4918 0.4920 0.4922 0.4925 0.4927 0.4929 0.4931 0.4932 0.4934 0.4936
2.5 0.4938 0.4940 0.4941 0.4943 0.4945 0.4946 0.4948 0.4949 0.4951 0.4952
2.6 0.4953 0.4955 0.4956 0.4957 0.4959 0.4960 0.4961 0.4962 0.4963 0.4964
2.7 0.4965 0.4966 0.4967 0.4968 0.4969 0.4970 0.4971 0.4972 0.4973 0.4974
2.8 0.4974 0.4975 0.4976 0.4977 0.4977 0.4978 0.4979 0.4979 0.4980 0.4981
2.9 0.4981 0.4982 0.4982 0.4983 0.4984 0.4984 0.4985 0.4985 0.4986 0.4986
3.0 0.4987 0.4987 0.4987 0.4988 0.4988 0.4989 0.4989 0.4989 0.4990 0.4990
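Individual entries of this table can be cross-checked numerically. The following Python sketch is not part of the original handout; the function name `area_zero_to_z` is hypothetical. It uses the identity that the area under the standard normal curve between 0 and z equals 0.5 * erf(z / sqrt(2)), where erf is the error function from Python's standard `math` module.

```python
from math import erf, sqrt


def area_zero_to_z(z):
    """Area under the standard normal curve between 0 and z (for z >= 0)."""
    return 0.5 * erf(z / sqrt(2))


# Row 1.9, column 0.06 of the table
print(f"{area_zero_to_z(1.96):.4f}")  # 0.4750
# Row 1.0, column 0.00 of the table
print(f"{area_zero_to_z(1.00):.4f}")  # 0.3413
```

To look up z = 1.96 in the table by hand, read the row labelled 1.9 and the column labelled 0.06.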

