
AMBO UNIVERSITY WALISO CAMPUS

Statistics for MGT I:


Normal Distributions (Continuous Probability
Distributions)
Normal Distributions
 A probability distribution
 is formed from all possible outcomes of a random process
(for a random variable X) and the probability associated with
each outcome.
 may either be discrete (distinct/separate outcomes, such as
number of children) or continuous (a continuum of outcomes,
such as height).
 A probability density function is defined such that the
likelihood of a value of X between a and b equals the integral
(area under the curve) between a and b. This probability is
always positive.
 Further, we know that the area under the curve from negative
infinity to positive infinity is one. The whole area under the
curve is 1.
Normal Distributions… cont
 Normal distribution,
 is also known as the
Gaussian distribution,
 is a probability distribution
that is symmetric about the
mean, showing that data near
the mean are more frequent
in occurrence than data far
from the mean.
 is a continuous probability
distribution that has a bell-
shaped probability density
function.
Normal Distributions… cont
 The top of the curve shows the mean, mode, and
median of the data collected.
 In the standard normal distribution, the mean is 0 and the
standard deviation is 1.
 Normal distributions are symmetrical, but not all
symmetrical distributions are normal.
 Its standard deviation depicts the bell curve's relative
width around the mean.
Normal Distributions… cont
 The normal distribution is symmetric and centered on
the mean (same as the median and mode).
 While the x-axis ranges from negative infinity to
positive infinity, nearly all of the X values fall within +/-
3 standard deviations of the mean.
 Oftentimes the x values of the standard normal
distribution are called z-scores.
 We can calculate probabilities using a normal
distribution table (z-table).
Normal Distributions… cont
 If you need to find the area to the right of a
z-score (Z greater than some value), you
need to subtract the value in the table from
one.

 The normal distribution has two
parameters: the mean and the standard
deviation (for the standard normal
distribution these are 0 and 1, respectively).
 For a normal distribution, the three sigma
rules include:
• 68% of the observations are within +/- one
standard deviation of the mean,
• 95% are within +/- two standard
deviations, and
• 99.7% are within +/- three standard
deviations.
The 3 sigma rules
Normal Distributions… cont

 From the above figure, we can see that:


I. about 68% of the z scores lie within 1 standard deviation of the
mean, that is, between −1 and +1.
II. about 95% of the z scores lie within 2 standard deviations of the
mean, that is between −2 and +2.
III. almost all the z scores lie between −3 and +3 standard deviations
from the mean. (Our graph shows 100% of the observations lie
between −3 and +3 but more accurately this is 99.74%)
Normal Distributions… cont
How to read the z-table?
Eg: Find P(Z < -1.22). Look up -1.22 in the table.
 0.1112 is the area to the left of Z = -1.22, so
P(Z < -1.22) = 0.1112
Normal Distributions… cont
 By using z-table calculate P(-1 < Z < 1)
Solution
Find P(Z < -1) in the z-table:
P(Z < -1) = 0.1587.
 Because the normal distribution is symmetric, we therefore know that the
probability that Z is greater than +1 also equals 0.1587:
P(Z > +1) = 0.1587
 To calculate the probability that Z falls between -1 and +1,
we take 1 – 2(0.1587) = 0.6826.
 The green area in the figure below roughly equals to 68% of the area
under the curve.
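As a quick check of the z-table lookups above, the standard normal CDF reproduces the same areas. A minimal sketch, assuming scipy is available:

```python
# Sketch: reproduce the z-table lookups with the standard normal CDF (scipy assumed available).
from scipy.stats import norm

print(round(norm.cdf(-1.22), 4))        # area to the left of z = -1.22 -> 0.1112
p_left_of_minus1 = norm.cdf(-1)         # 0.1587
p_between = 1 - 2 * p_left_of_minus1    # P(-1 < Z < 1)
print(round(p_between, 4))              # 0.6827 (the rounded table values give 0.6826)
```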
Normal Distributions… cont
 We can convert any and all normal distributions to the standard
normal distribution using the equation Z = (X – μ)/σ.
 A Z-score is the number of standard deviations that a value, X,
is away from the mean.
 If the value of X is less than the mean, the Z-score is negative; if
the value of X is greater than the mean, the Z-score is positive.
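For illustration, the conversion Z = (X – μ)/σ can be written directly in code. A minimal sketch (the numeric values reuse the GMAT example that follows):

```python
# Sketch: convert a raw score X to a z-score with Z = (X - mu) / sigma.
def z_score(x, mu, sigma):
    return (x - mu) / sigma

print(z_score(600, 485, 105))   # about +1.10 -> 1.10 standard deviations above the mean
print(z_score(300, 485, 105))   # about -1.76 -> below the mean, so the z-score is negative
```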
Normal Distributions… cont
 The probability
calculations in normal
distribution are made
by computing areas
under the graph.
 Thus, to find the
probability that a
random variable lies
within any specific
interval we must
compute the area under
the normal curve over
that interval.
Normal Distributions… cont
 A z-score for an individual value can be interpreted as
follows:
 Positive z-score: The individual value is greater than the mean.
 Negative z-score: The individual value is less than the mean.
 A z-score of 0: The individual value is equal to the mean.
 The larger the absolute value of the z-score, the further
away an individual value lies from the mean.
 The value of the z-score tells you how many standard
deviations you are away from the mean.
 If a z-score is equal to 0, it is on the mean.
 A positive z-score indicates the raw score is higher than
the mean average. For example, if a z-score is equal to
+1, it is 1 standard deviation above the mean.
Normal Distributions… cont
Example:
 The Graduate Management Admission Test (GMAT) is
widely used by graduate schools of business as an
entrance requirement. In one particular year, the mean
score for the GMAT was 485, with a standard deviation
of 105. Assuming that GMAT scores are normally
distributed, answer the following questions about a
randomly selected score from this administration of the GMAT:
1. What is the probability that the score falls b/n the mean and 600,
inclusive? i.e. p(485 <= X <= 600)
Normal Distributions… cont
Solution
Z485 = (485 – 485)/105 = 0/105 = 0
Z600 = (600 – 485)/105 = 115/105 = +1.095 ≈ +1.10
So, p(485 <= X <= 600) = p(0 <= Z <= +1.10)
From the z-table, the area up to Z = +1.10 is 0.8643, so
p(0 <= Z <= +1.10) = 0.8643 – 0.5000 = 0.3643
 Z = +1.10 means 600 is 1.10 standard deviations above the mean.
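The same probability can be checked with the normal CDF. A sketch, assuming scipy, with the values from the example above:

```python
# Sketch: P(485 <= X <= 600) for X ~ N(485, 105), matching the z-table work above.
from scipy.stats import norm

mu, sigma = 485, 105
p = norm.cdf(600, mu, sigma) - norm.cdf(485, mu, sigma)
print(round(p, 4))   # about 0.3633 (the table, with z rounded to +1.10, gives 0.3643)
```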
Normal Distributions… cont
2. What is the probability that a randomly selected score is
greater than 650? i.e. p(X > 650) = ?
a)Calculate the appropriate z values
b)Find the area (probability) in the table
c)Interpret your result.
Solution:
a. Z650 = (650 – 485)/105 = 165/105 = +1.57
b. The area up to Z = +1.57 in the table is 0.9418, so
p(X > 650) = p(Z > +1.57) = 1 – 0.9418 = 0.0582
c. Z = +1.57 means 650 is 1.57 standard deviations away from the mean (above
the mean).
Normal Distributions… cont
Further explanation on the above example
Identify the area below/up to the z-value, the area beyond the z-value and the
area from μ to the z-value for Z = +1.57 (area up to Z = 0.9418):
A. Area beyond z-value = 1 – 0.9418 = 0.0582
B. Area below/up to z-value = 0.9418
C. Area b/n μ & z-value (μ to Z) = 0.5 – 0.0582 = 0.4418
A + B = 0.9418 + 0.0582 = 1

Z-value   μ to Z   Area beyond Z   Area up to Z
+1.57     0.4418   0.0582          0.9418
Normal Distributions… cont
3. What is the probability that a randomly selected score is
less than 300?
Solution:
a. Z300 = (300 – 485)/105 = -185/105 = -1.76
b. The area up to Z = -1.76 in the table is 0.0392, so
p(X < 300) = p(Z < -1.76) = 0.0392
(equivalently, 0.5 – p(0 to -1.76) = 0.5 – 0.4608 = 0.0392)
c. Z = -1.76 means the area to the left of Z, i.e. p(X < 300), is 0.0392.

Z-value   μ to Z   Area beyond Z   Area up to Z
-1.76     0.4608   0.0392          0.0392
Normal Distributions… cont
4. What is the probability that a randomly selected score falls
between 350 and 550, inclusive? That is p(350<=X<=550)?
Solution:
a. Z350 = (350 – 485)/105 = -135/105 = -1.29
Z550 = (550 – 485)/105 = 65/105 = +0.62
b. From the table, the area up to Z = -1.29 is 0.0985 and the
area up to Z = +0.62 is 0.7324.
So, p(350<=X<=550) = p(-1.29<=Z<= +0.62)
= p(Z <= +0.62) – p(Z <= -1.29)
= 0.7324 – 0.0985 = 0.6339
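A quick check of this interval probability (a sketch, assuming scipy):

```python
# Sketch: P(350 <= X <= 550) for X ~ N(485, 105).
from scipy.stats import norm

p = norm.cdf(550, 485, 105) - norm.cdf(350, 485, 105)
print(round(p, 4))   # about 0.633 (with table-rounded z values: 0.7324 - 0.0985 = 0.6339)
```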
Normal Distributions… cont
Exercise: If z = 2.15,
 What is the area beyond z score (smaller probability), find the
area up to z score (larger probability), what is the sum of area
beyond z score and area up to z score? And what is the area b/n μ
& 2.15σ ?
 Area beyond z-value = 1 – 0.9842 = 0.0158
This means the probability that a z-score exceeds 2.15 is
0.0158.
 Area up to z-value = 0.9842
 Sum of area beyond z-score and area up to z-score
= 0.0158 + 0.9842 = 1
 Area b/n μ & 2.15σ = 0.5 – 0.0158 = 0.4842

Z-value   μ to Z   Area beyond Z   Area up to Z
2.15      0.4842   0.0158          0.9842
Class Work
1. What proportion of the z scores are less than a z
score of 1.58?
2. What is the area between z scores of 0.33 and
1.33?
Statistics for Management II

Course Code: MGMT 2073


Statistics for Management II: Contents of the Course

• Chapter 1: Sampling and Sampling Distributions


• Chapter 2: Statistical Estimations
• Chapter 3: Hypothesis Testing
• Chapter 4 : Chi-square Distributions
• Chapter 5 : Analysis of Variance
• Chapter 6: Regression and Correlation
Chapter 1: Sampling and Sampling Distributions

Sampling Theory: Basic Definitions


 Population refers to
• a set of all elements of interest under study or collection of
all elements about which some inference is to be made.
• all items that have been chosen for study.
• the totality of elements being studied, denoted by N.
Example:
• HIV carriers in south west shewa zone
• Students in Waliso town,
• MSEs in Waliso town,
Basic Definitions … cont
 Sample (n) refers to
• a part or subset of the population.
• small portion of population believed to
represent the entire population.
Examples: -
• HIV carriers in Waliso town, selected from the HIV
carriers in south west shewa zone.
• Gerasu Duki Comprehensive Senior Secondary
School students selected from Senior Secondary
School students found in south west shewa zone.
Basic Definitions … cont
 Sampling: process of selecting elements from
the population or gathering information from
parts of the population.
 Element: the basic unit (each unit) of the
population.
 Census: gathering information from all
elements in the population or refers to complete
enumeration or the measurement of every
individual or item in the population.
Basic Definitions … cont
 Population parameter: refers a measurement or
the characteristics taken from the population.

Examples:-
• Population mean (μ),
• Population variance (σ²),
• Population standard deviation (σ),
• Population range.
Basic Definitions … cont
 Sample statistic: refers to a measurement or
characteristic taken from the sample.
• Examples:-
• Sample mean (X̄)
• Sample variance (S²)
• Sample standard deviation (S)
• Sample proportion (p̄)
Basic Definitions … cont
 Sampling unit: may be a geographical one such as state,
district, village, etc., or a construction unit like house,
etc., or it may be a social unit such as family, club,
school, etc., or it may be an individual.
The needs for sampling

• Studying a whole population is often cumbersome
and resource intensive (expensive!).
• It may not be possible to identify and measure
each individual in the population, or it may be
destructive to do so.
• Contacting the entire (whole) population would
often take more time (time consuming).
The needs for sampling… cont..
• Practical infeasibility of complete enumeration,
• Enough reliability of inference based on
sampling,
• The adequacy of sample results,
• The quality of data collected,
• The physical impossibility of checking all items
in the population
Example:
– Population of birds
– population of snakes
– population of fish
Sampling Design & Sample Frame

 Sampling Design: refers to


• a definite plan for obtaining a sample from a
given population.
• technique or the procedure the researcher would
adopt in selecting items for the sample.
• It may as well lay down the number of items to
be included in the sample i.e., the size of the
sample.
• It is determined before data are collected.
Characteristics of Good sample Design

• Goal orientation,
• Measurability,
• Practicality,
• Economy,
• Independence,
• Homogeneity, and
• Adequacy.
Sample Frame

• is also known as ‘source list’ from which sample is to be


drawn.
• contains the names of all items of a universe (in case of
finite universe only).
• should be comprehensive, correct, reliable and appropriate.
• is extremely important for the source list to be as
representative of the population as possible.
• can be perfect or defective.
 Perfect frame: - identifies each element once and only once,
but it is seldom available in real life.
 Defective frame: - can be classified as incomplete frame,
inaccurate frame, inadequate frame and outdated frame.
Bias and errors in sampling, non-sampling errors
 Sampling errors are the random variations in the
sample estimates around the true population parameters.
 Since they occur randomly and are equally likely to be in
either direction, their nature happens to be of
compensatory type and the expected value of such
errors happens to be equal to zero.
 Sampling error decreases with the increase in the size
of the sample, and it happens to be of a smaller
magnitude in case of homogeneous population.
 Sampling error can be measured for a given sample
design and size. The measurement of sampling error is
usually called the ‘precision of the sampling plan’.
Bias and errors in sampling, non-sampling errors… cont
 If we increase the sample size, the precision
can be improved. But, increasing the size of the
sample has its own limitations like a large sized
sample increases the cost of collecting data and
also enhances the systematic bias.
 Thus, the effective way to increase precision is
usually to select a better sampling design which
has a smaller sampling error for a given sample
size at a given cost.
Bias and errors in sampling, non-sampling errors… cont

 In practice, however, people prefer a less precise


design because it is easier to adopt the same and also
because of the fact that systematic bias can be
controlled in a better way in such a design.
 In brief, while selecting a sampling procedure,
researcher must ensure that the procedure causes a
relatively small sampling error and helps to control
the systematic bias in a better way.
 Researcher must keep in view of the two causes of
incorrect inferences in sampling:
• Systematic Bias
• Sampling Error
Bias and errors in sampling, non-sampling errors… cont
 Systematic Bias:-
• results from errors in sampling procedure and it cannot be
reduced or eliminated by increasing the sample size.
• refers to a faulty procedure for selecting a representative
sample and a tendency to overlook elements that have peculiar
characteristics.
• is a result of one or more of the following factors:
 Inappropriate sampling frame,
 Defective Measuring devices,
 Non respondents,
 Indeterminacy Principle/the quality of being vague and
poorly defined principle,
 Natural Bias in reporting of data
Bias and errors in sampling, non-sampling errors… cont
 Sampling Error:
• refers to the difference between a sample statistic (a characteristic
taken from the sample) and its corresponding population
parameter (the characteristic taken from the population).
• Sample surveys do imply the study of a small portion of the
population and as such there would naturally be a certain
amount of inaccuracy in the information collected. This
inaccuracy may be termed as sampling error or error
variance.
 Example: suppose a population of five production employees had
efficiency ratings of 97, 103, 96, 99 and 105; further suppose that a
sample of two ratings, 97 and 105, is selected to estimate the
population mean rating. The population mean is 500/5 = 100 and the
sample mean is (97 + 105)/2 = 101, so the sampling error is
101 – 100 = 1.
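A minimal numeric sketch of this sampling error, using only the values from the example:

```python
# Sketch: sampling error = sample statistic - population parameter (values from the example).
ratings = [97, 103, 96, 99, 105]   # efficiency ratings of the five employees
sample  = [97, 105]                # the two ratings drawn

population_mean = sum(ratings) / len(ratings)   # 100.0
sample_mean     = sum(sample) / len(sample)     # 101.0
print(sample_mean - population_mean)            # sampling error = 1.0
```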
Bias and errors in sampling, non-sampling errors… cont
 Non-sampling Error: errors that arise from administrative
procedures or from inconsistent reporting of data
(e.g., collecting data from faulty providers or sources).
Type of samples (sampling Techniques)

• Probability (Random) samples


• Non- probability (Non-random) samples
 Probability sampling:
• is also known as ‘random sampling’ or ‘chance
sampling’.
• Under this sampling design, every item of the universe
has an equal chance of inclusion in the sample.
• is, so to say, a lottery method in which individual units
are picked up from the whole group not deliberately but
by some mechanical process.
Type of samples (sampling Techniques)….
 Probability sampling: ….
 The implications of random sampling (or simple random
sampling) are:
• It gives each element in the population an equal
probability of getting into the sample; and all choices
are independent of one another.
• It gives each possible sample combination an equal
probability of being chosen.
• Keeping this in view, we can define a simple random
sample (or simply a random sample) from a finite
population as a sample which is chosen in such a way
that each of the possible samples has the same
probability, 1/(the number of possible samples), of being selected.
Type of samples (sampling Techniques)….
 Probability sampling: ….
• To make it more clear, we take a certain finite population
consisting of six elements (say a, b, c, d, e, f ) i.e., N = 6.
Suppose that we want to take a sample of size n = 3
from it.
• Then, there are 6C3 = 20 possible distinct samples of
the required size, and they consist of the elements abc,
abd, abe, abf, acd, ace, acf, ade, adf, aef, bcd, bce, bcf,
bde, bdf, bef, cde, cdf, cef, and def.
• If we choose one of these samples in such a way that
each has the probability 1/20 of being chosen, we will
then call this a random sample.
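The 20 possible samples can be listed directly. A minimal sketch using itertools:

```python
# Sketch: list all 6C3 = 20 possible samples of size 3 from {a, b, c, d, e, f}.
from itertools import combinations

population = ['a', 'b', 'c', 'd', 'e', 'f']
samples = list(combinations(population, 3))
print(len(samples))        # 20
print(1 / len(samples))    # each sample has probability 1/20 = 0.05 of being chosen
```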
Type of samples (sampling Techniques)….
 Advantages of probability sampling
• It is the only sampling method that provides
essentially unbiased estimates.
• It permits the researcher to evaluate the reliability of the
estimates in quantitative terms.
 Disadvantages of Random sampling
• Relatively more expensive than non-probability
sampling.
• The method of selection can be complex and
time consuming.
Types of probability sampling

1. Simple Random sampling


2. Systematic Random sampling
3. Stratified sampling
4. Cluster sampling
5. Multi-stage sampling
Types of probability sampling… cont…

A. Simple Random sampling


• is random sampling in which each unit of the population
has an equal chance of being included in the sample
(also known as the lottery method)
• is the simplest and most popular technique of sampling
• In simple random sampling, all choices are independent
of each other
 Example 1: If we want to take a sample of 25 persons out
of a population of 150, the procedure is to write the names
of all the 150 persons on separate slips of papers, fold these
slips, mix them thoroughly and then make a blindfold
selection of 25 slips.
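The blindfold slip-drawing in Example 1 corresponds to the following sketch (the person names are hypothetical placeholders):

```python
# Sketch: simple random sample of 25 persons out of a population of 150 (lottery method).
import random

population = [f"person_{i}" for i in range(1, 151)]   # hypothetical list of 150 names
sample = random.sample(population, 25)                # every person has an equal chance
print(len(sample))                                    # 25
```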
Types of probability sampling… cont…
B. Systematic Random sampling
 In some instances, the most practical way of sampling is
to select every ith item on a list.
 In systematic sampling only the first unit is selected
randomly and the remaining units of the sample are
selected at fixed intervals.
 For instance, if a 4 percent sample is desired, the first
item would be selected randomly from the first twenty-
five and thereafter every 25th item would automatically
be included in the sample.
Types of probability sampling… cont…

B. Systematic Random sampling…


 Suppose the population consists of N ordered units
(numbered 1 to N) and a sample of size n is selected from
the population in such a way that N/n = k (rounded to the
nearest integer). Here, k is called the sampling interval.
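A sketch of systematic selection with sampling interval k = N/n; only the first unit is chosen at random (the population and sample sizes below are hypothetical):

```python
# Sketch: systematic random sampling - random start, then every k-th unit.
import random

N, n = 500, 20                      # hypothetical population and sample sizes
k = round(N / n)                    # sampling interval, here 25
start = random.randint(1, k)        # first unit chosen at random from the first k units
sample_ids = list(range(start, N + 1, k))
print(k, sample_ids[:5], len(sample_ids))   # interval, first few selected units, sample size
```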
Types of probability sampling… cont…

C. Stratified sampling
 If a population from which a sample is to be drawn does
not constitute a homogeneous group, stratified
sampling technique is generally applied in order to
obtain a representative sample.
 Under stratified sampling, the population is divided into
several sub-populations that are individually more
homogeneous than the total population (the different sub-
populations are called ‘strata’) and then we select items
from each stratum to constitute a sample.
 Stratified sampling results in more reliable and detailed
information.
Types of probability sampling… cont…
C. Stratified sampling… cont…
• How to form strata?: The strata should be formed on the basis
of common characteristic(s) of the items to be put in
each stratum.
• How should items be selected from each stratum?: the
usual method, for selection of items for the sample from
each stratum, resorted to is that of simple random
sampling. Systematic sampling can be used if it is
considered more appropriate in certain situations.
• How to allocate the sample size of each stratum?: we
usually follow the method of proportional allocation
under which the sizes of the samples from the different
strata are kept proportional to the sizes of the strata.
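Proportional allocation can be sketched as follows (the stratum names and sizes are hypothetical):

```python
# Sketch: proportional allocation - stratum sample sizes kept proportional to stratum sizes.
strata_sizes = {"stratum_A": 600, "stratum_B": 300, "stratum_C": 100}   # hypothetical strata
N = sum(strata_sizes.values())                                          # 1000
n = 100                                                                 # total sample size

allocation = {name: round(n * size / N) for name, size in strata_sizes.items()}
print(allocation)   # {'stratum_A': 60, 'stratum_B': 30, 'stratum_C': 10}
```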
Types of probability sampling… cont…
C. Stratified sampling… cont…
• Factors to be considered in Stratified Sampling
– Base of stratification
– Number of strata
– Sample size with in strata
Types of probability sampling… cont…
D. Cluster sampling
 If the total area of interest happens to be a big one, a
convenient way in which a sample can be taken is to divide the
area into a number of smaller non-overlapping areas and
then to randomly select a number of these smaller areas
(usually called clusters).
 In this method, the universe (population) is divided into some
recognizable subgroups, called clusters. After this, a simple
random sample of these clusters is drawn and all elements
belonging to the selected clusters constitute the sample.
 Cluster sampling, no doubt, reduces cost by concentrating
surveys in selected clusters. But, certainly it is less precise than
random sampling. It reduces the cost of sampling for a
population scattered over large geographic areas.
Types of probability sampling… cont…
E. Multi-stage sampling
• is a further development of the principle of cluster
sampling.
 When different sampling techniques are combined at the
various stages to select the representative from the Universe
it is known as multi-stage sampling.
Eg: Suppose we want to investigate the working efficiency of
nationalised banks in Ethiopia and we want to take a sample
of few banks for this purpose. The first stage is to select
large primary sampling unit such as states in a country.
Then we may select certain districts and interview all
banks in the chosen districts. This would represent a two-
stage sampling design with the ultimate sampling units
being clusters of districts.
Non-probability (Non-Random) Sampling

It can be classified as:


– Convenience (Accidental) sampling
– Purposive sampling/ judgmental sampling
– Quota sampling
– Snow ball sampling
A. Accidental or Haphazard or convenience sampling:
select anyone who is convenient. It can produce
ineffective, highly unrepresentative samples and is not
recommended.
Non-probability (Non-Random) Sampling….

B. Quota sampling: is an improvement over
haphazard/convenience sampling, but it too is a weak type of
sampling.
 In quota sampling, a researcher first identifies categories
of people (e.g. male and female), then decides how many
to get in each category.
 Thus, the number of people in various categories of the
sample is fixed. The researcher interviews the first 5
males that he/she encounters/accidentally be in the
same place and interact with/. Similarly, the first 5
females encountered are interviewed.
Non-probability (Non-Random) Sampling….
C. Purposive or judgmental sampling: - is an acceptable
kind of sampling for special situations.
 It uses the judgment of an expert in selecting cases or it
selects cases with a specific purpose in mind.
 The researcher never knows whether the case selected
represents the population.
 It is used in exploratory research or in field research.
 Purposive sampling can be appropriate in some
situations.
• E.g., in a study of labor problems, you may want to talk
only with those who have experienced on-the job
discrimination. Companies often try out new product
ideas on their employees.
Non-probability (Non-Random) Sampling….
D. Snowball sampling (also called network, chain
referral, or reputation) sampling: is a method for
identifying and sampling (or selecting) the cases in a
network.
 Each person or unit is connected with another through a
direct or indirect linkage. This does not mean that each
person directly knows, interacts with, or is influenced by
every other person in the network.
Quiz
 Describe the difference/s b/n simple random
sampling and stratified sampling in detail by
stating how the researchers apply them.
Central limit Theorem (CLT)

• describes the relationship between the shape of the population
distribution and the shape of the sampling distribution of the
sample mean.
• tells us that for a population with any distribution, the
distribution of the sample means approaches a normal
distribution as the sample size increases.
• states that:-
1. If the population is normally distributed, the distribution of
sample mean is normal regardless of the sample size.
2. If the population from which samples are taken is not
normal, the distribution of sample means will be
approximately normal if the sample size (n) is sufficiently
large (n ≥ 30); the larger the sample size used, the closer the
sampling distribution is to the normal curve.
Central limit Theorem (CLT)…..
 The profundity/quality of the Central Limit
Theorem: As sample size gets larger, even if you
start with a non-normal distribution, the sampling
distribution approaches a normal distribution.
 The essence of the Central Limit Theorem: As the
sample size increases, the sampling distribution
of the sample mean concentrates more and more
around µ (the population mean). The shape of the
distribution also gets closer and closer to the normal
distribution as sample size n increases. Or the mean
of the sample means is the population mean.
Central limit Theorem (CLT)…..
 The significance of CLT:
• It permits us to use sample statistics to make inferences
about population parameters without knowing anything
about the shape of the frequency distribution of that
population other than what we can learn from the sample.
• It permits us to use the normal distribution curve to analyze
distributions whose shape is unknown.
• It creates potential for applying the normal distribution to
many problems when the sample is sufficiently large .
Central limit Theorem (CLT)…..
 Example: The distribution of annual earnings of
all bank tellers with five years of experience
is skewed negatively. This distribution has a
mean of birr 15,000 and a standard deviation
of birr 2,000. If we draw a random sample of 30
tellers, what is the probability that their earnings
will average more than birr 15,750 annually?
Central limit Theorem (CLT)…..
Solution
1. Compute the standard error of the mean: σx̄ = σ/√n = 2,000/√30 ≈ 365.15
2. Compute the z value: Z = (15,750 – 15,000)/365.15 ≈ +2.05
3. Find the area covered by the interval: p(Z > +2.05) = 1 – 0.9798 = 0.0202
• Interpretation: There is a 2.02% chance that the average earning
of the bank tellers will be more than birr 15,750 annually in a
group of 30 tellers.
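The arithmetic for this example can be reproduced as a sketch (assumes scipy for the tail area):

```python
# Sketch: P(sample mean > 15,750) for mu = 15,000, sigma = 2,000, n = 30.
import math
from scipy.stats import norm

mu, sigma, n = 15000, 2000, 30
se = sigma / math.sqrt(n)            # standard error of the mean, about 365.15
z = (15750 - mu) / se                # about 2.05
print(round(1 - norm.cdf(z), 4))     # about 0.0200 (the table, with z rounded to 2.05, gives 0.0202)
```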
Sampling Distributions
 Sampling distribution:
 refers to the probability distribution of a given statistic
based on a random sample of a certain size n.
 describes the way in which a statistic or a function of
statistics, which is/are function(s) of the random
variables x1, x2, …, xn, will vary from one sample to
another sample of the same size.
 If we take certain number of samples and for each
sample compute various statistical measures such as
mean, standard deviation, etc., then we can find that
each sample may give its own value for the statistic
under consideration.
Sampling Distributions… cont
 All such values of a particular statistic, say mean,
together with their relative frequencies will constitute
the sampling distribution of the particular statistic.
 Accordingly, we can have sampling distribution of mean,
or the sampling distribution of standard deviation or the
sampling distribution of any other statistical measure.
 The sampling distribution tends to be closer to the
normal distribution when the sample size is large.
 The significance of the sampling distribution follows from
the fact that the mean of the sampling distribution of the
mean is the same as the mean of the universe; that is,
μx̄ = μ.
Sampling Distributions… cont
 Some important sampling distributions, which are
commonly used are:
(1) sampling distribution of mean;
(2) sampling distribution of proportion;
(3) Student’s ‘t’ distribution
(4) F distribution; and
(5) chi-square distribution.
 Some of these sampling distributions are described in
brief below.
Sampling Distribution of Sample Mean

 refers to the probability distribution of all the possible means of
random samples of a given size (n) taken from a
population of size N. In other words, the sampling distribution of the
mean shows how far sample means could be from a known
population mean, i.e., how far x̄ is from μ.
 The sampling distribution of the mean is described by two
parameters:
 The expected value E(x̄) = μx̄, the mean of the sampling distribution of the
mean, and
 the standard deviation of the mean, σx̄, the standard error of the mean.
Sampling Distribution of Sample Mean….

Properties of the sampling distribution of the Sample Mean

 The mean of the sampling distribution of the means is equal to the population mean;
algebraically, μx̄ = μ.
 The standard deviation of the sampling distribution of the mean (standard error) is
equal to the population standard deviation divided by the square root of the
sample size: σx̄ = σ/√n.
Sampling Distribution of Sample Mean….

• With simple random sampling, the value of the standard deviation of
the mean depends on whether the population is finite or infinite: for
an infinite population σx̄ = σ/√n, while for a finite population the
finite population correction factor is applied,
σx̄ = (σ/√n)·√((N – n)/(N – 1)).
• A population is said to be infinite when it is not possible to list or
count all the elements in the population (i.e., when the elements
are unlimited).
Sampling Distribution of Sample Mean….
Properties of the sampling distribution of the Mean…
 As we increase the size of the sample, the spread of the
distribution of the sample mean become smaller.
 The sampling distribution of the mean is approximately
normal for a sufficiently large sample size (n ≥ 30).
 Example: Assume there is a population which contains
only five numbers (elements): 0, 3, 6, 3 and 18, which are
designated by the letters A, B, C, D and E respectively. If a
sample of size 3 is selected:
Sampling Distribution of Sample Mean….
 Required:
• Compute the number of possible samples of size
three that can be drawn from the given
population.
• What is the population mean of the given
data?
• What is the sampling distribution of the sample
means for the given sample size of 3?
• What is the mean of the sampling distribution?
Sampling Distribution of Sample Mean….
• Compute the number of possible samples of size three that can
be drawn from the given population: 5C3 = 10 possible samples.
Sampling Distribution of Sample Mean….
• What is the population mean of the given data?
μ = (0 + 3 + 6 + 3 + 18)/5 = 30/5 = 6
Sampling Distribution of Sample Mean….
• What is the sampling distribution of the sample
means for the given sample size of 3? (See the sketch
below, which lists all ten samples and their means.)
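The required computations can be sketched directly; the five population values are taken from the example above:

```python
# Sketch: all samples of size 3 from the population {A:0, B:3, C:6, D:3, E:18}.
from itertools import combinations
from statistics import mean

population = {"A": 0, "B": 3, "C": 6, "D": 3, "E": 18}

samples = list(combinations(population.values(), 3))
sample_means = [mean(s) for s in samples]

print(len(samples))               # 5C3 = 10 possible samples
print(mean(population.values()))  # population mean = 6
print(mean(sample_means))         # the mean of the sample means also equals 6
```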
Sampling Distribution of proportions
 In some circumstances in statistics, it is important to know the
proportion of a certain characteristic in a population.
 For example:
• A quality control engineer might want to know what
proportion of products on an assembly line is defective
(here a "defective" counts as a success).
• A labor economist might want to know what proportion of the
labor force is unemployed.
• The sample proportion is computed by dividing the frequency
with which a given characteristic occurs in a sample by the
number of items in the sample. It is the ratio of the
number of successes in a sample to the size of that sample.
Sampling Distribution of proportions

 Like other probability distributions, the sampling distribution
of the proportion can be described by two parameters:
• The mean of the sample proportions, μp̄ = p
• The standard deviation of the proportions (standard error), σp̄ = √(pq/n)
Sampling Distribution of proportions

Properties of the sampling Distribution of the Proportion

1. As with the sampling distribution of the mean, the
population proportion of successes (p) is always
equal to the mean of the sample proportions:
μp̄ = p
2. The standard deviation of the proportions is equal to √(pq/n)
Sampling Distribution of proportions
 If ‘p’ represents the proportion of defectives, i.e., of successes, and
‘q’ represents the proportion of non-defectives, i.e., of failures (or q
= 1-p), and if p̄ is treated as a random variable, then the sampling
distribution of the proportion of successes has mean = p with
standard deviation = √(pq/n)
 Note: the finite population correction factor is not
necessary if n < 0.05 N
 CLT: states that the normal distribution approximates the shape of the
distribution of sample proportions if np and nq are greater than
five (5). Here under we solve problems involving the sample
proportion by using a normal distribution whose mean and
standard deviation are μp̄ = p and σp̄ = √(pq/n).
Sampling Distribution of proportions
• NB: The sampling distribution of the proportion can be
approximated by a normal distribution whenever the
sample size is large (i.e., np and nq > 5).
Example: Suppose that 60% of electrical contractors in a
region use a particular brand of wire. What is the probability of
taking a random sample of size 120 from those electrical
contractors and finding that 0.5 or less use that brand of
wire?
Given: p = 0.6, n = 120, q = 1 – 0.6 = 0.4; required: P(p̄ ≤ 0.5)?
Solution:
Step 1: Check that np and nq > 5
np = 120(0.6) = 72 and nq = 120(0.4) = 48; both are greater than 5
Sampling Distribution of proportions
Step 2: Calculate the standard error and the z value
σp̄ = √(pq/n) = √((0.6)(0.4)/120) = 0.0447
Z = (0.5 – 0.6)/0.0447 = -2.24
Step 3: Find the area from the z-table: the area up to Z = -2.24 is 0.0125
Step 4: Interpretation – the probability of finding 50% or less of the
contractors using this particular brand of wire is 0.0125 (1.25%) if we
take a random sample of 120 contractors.
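A check of this calculation (a sketch, assuming scipy):

```python
# Sketch: P(sample proportion <= 0.5) when p = 0.6 and n = 120.
import math
from scipy.stats import norm

p, n = 0.6, 120
se = math.sqrt(p * (1 - p) / n)     # about 0.0447
z = (0.5 - p) / se                  # about -2.24
print(round(norm.cdf(z), 4))        # about 0.0127 (the table, with z = -2.24, gives 0.0125)
```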
Exercises
1. If 10% of a population of parts is defective, what is
the probability of randomly selecting 80 parts and
finding that 12 or more are defective?
2. Suppose that a population proportion is .40 and that
80% of the time you draw a random sample from
this population, you get a sample proportion of 0.35
or more. How large a sample were you taking?
3. If a population proportion is 0.28 and if the sample
size is 140, 30% of the time the sample proportion
will be less than what value if you are taking
random samples?
Unit 2: Statistical Estimation

Concepts of Statistical Estimation


 Sampling distribution of the mean shows how far
sample means could be from a known population mean.
 Sampling distribution of the proportion shows how far
sample proportions could be from a known
population proportion.
 In estimation, our aim is to determine how far an
unknown population mean could be from the mean of
a simple random sample selected from that
population; or how far an unknown population
proportion could be from a sample proportion.
Concepts of Statistical Estimation…

Basic Concepts
 Estimation
• is the process of using statistics as estimates of parameters.
• is any procedure where sample information is used to estimate/
predict the numerical value of some population measure (called a
parameter).
 Estimator: any sample statistic that is used to estimate a
population parameter (sample mean for the population mean, sample
variance for the population variance, sample proportion for the
population proportion).
 Estimate: is a specific numerical value of our estimator, e.g., a sample
mean of x̄ = 30 computed from a particular sample used as the estimate of μ.
Important Properties of Good Estimators
• indicates standards that can be used to evaluate estimators:
A. Unbiasedness: an estimator is said to be unbiased when the mean
of its sampling distribution is equal to the population parameter.
 The sample mean is an unbiased estimator of the population
mean.
 The sample variance is an unbiased point estimator of the
population variance because the mean of the sampling distribution
of the sample variance is equal to the population variance.
 The sample proportion is an unbiased estimator of the population
proportion.
 However, because standard deviation is a nonlinear function of
variance, the sample standard deviation is not an unbiased
estimator of population standard deviation.
B. Efficiency: refers to the size of standard error of statistics. The most
efficient estimator is the one with smallest variance.
Important Properties of Good Estimators…
C. Consistency: relates to the behavior of an estimator as the
sample size gets large, i.e., to the relationship between
sample size and the estimator's variance.
 A statistic is a consistent estimator of population
parameter if sample size is large/increases, it becomes
almost more certain, the value of the statistic comes
very close to the value of the population parameter. An
unbiased estimator is a consistent estimator if the
variance approaches to 0/zero/ as n increases.
D. Sufficiency (or sufficient statistic): is an estimator that
utilizes all the information a sample contains about the
parameter to be estimated. For example, the sample mean
is a sufficient estimator of the population mean.
Concepts of Statistical Estimation…

Estimation procedure
1. Select a sample.
2. Collect the required information from the
members of the sample.
3. Calculate the value of sample statistics.
4. Assign the value(s) to the corresponding
population parameter.
 An estimator may be a point estimator or interval
estimator.
Concepts of Statistical Estimation…
 Statistical Inference:
• the branch of statistics concerned with using
probability concepts to deal with uncertainty in
decision making.
• is based on estimation and hypothesis testing; any
inference drawn about the population is based on a
sample statistic (a function of the sample
information).
Type of Estimates

 We can make two types of estimates about a population


(population parameter):
• A point Estimate and
• Interval Estimate .
Type of Estimates: Point Estimator of the Mean and Proportion

 A point estimate:
• is a single number that is used to estimate an unknown population
parameter.
• It is a single value that is measured from a sample and used as an
estimate of the corresponding population parameter.
 The most important point estimates (given that they are
single values) are the sample mean x̄ (for μ), the sample variance
s² (for σ²), the sample standard deviation s (for σ) and the sample
proportion p̄ (for p).
Type of Estimates
 An interval estimate
• is a range of values used to estimate a population
parameter.
• describes the range of values with in which a parameter
might lie.
• Stated differently, an interval estimate is a range of
values with in which the analyst can declare with some
confidence that the population parameter will fall.
Example
• Suppose we have the sample 10,20,30,40 and 50
selected randomly from a population whose mean μ is
unknown.
• The sample mean, x̄ = ∑xi/n = (10 + 20 + 30 + 40 + 50)/5 = 30,
is a point estimate of μ (the population mean).
• On the other hand, if we state that the mean, µ, is
between x ± 10 , the range of values from 20 (30-10) to
40 (30+10) is an interval estimate.
Interval Estimators of the Mean and the Proportion
 Point estimators of population parameters, while useful, do not
convey as much information as interval estimators. It produces
a single value as an estimate of the unknown population
parameter. The estimate may or may not be close to the
parameter value; in other words, the estimate may be incorrect.
 An interval estimate, on the other hand, is a range of values
that conveys the fact that estimation is an uncertain process.
 The standard error of the point estimator is used in creating
a range of values; thus a measure of variability is incorporated
into interval estimation.
 Further, a measure of confidence in the interval estimator is
provided; consequently, interval estimates are also called
confidence intervals. For these reasons, interval estimators
are considered more desirable than point estimators.
Interval estimation for population mean (µ)
 As a result of the CLT, the following Z formula for
sample means can be used when sample sizes are large,
regardless of the shape of the population distribution, or
for smaller sizes if the population is normally distributed:
Z = (x̄ – μ)/(σ/√n)
Interval estimation for population mean (µ) …
 The confidence interval for population mean is affected
by:
• The population distribution, i.e., whether the population
is normally distributed or not
• The standard deviation, i.e., whether σ is known or not.
• The sample size, i.e., whether the sample size, n, is large
or not.
Condition1: Confidence interval estimate of (µ) when Distribution is Normal and σ is known

 A confidence Interval Estimate for µ is an interval


estimate together with a statement of how confident we
are that the interval estimate is correct.
 When the population distribution is normal and at the
same time σ is known, we can estimate µ (regardless of
the sample size) using the following formula:
μ = x̄ ± Zα/2 · σ/√n
Where: x̄ = sample mean, Zα/2 = the z value reflecting the
confidence level, σ = population standard deviation, and n = sample size.
Condition1: Confidence interval estimate of (µ) when
Distribution is Normal and σ is known…
 From the above formula we can learn that an interval
estimate is constructed by adding and subtracting the
error term to and from the point estimate. That is, the
point estimate is found at the center of the confidence
interval.
 To find the interval estimate of population mean, μ we
have the following steps:
Condition1: Confidence interval estimate of (µ) when Distribution is
Normal and σ is known…
 Example: The Vice President of Operations for
Ethiopian Tele Communication Corporation (ETC) is
in the process of developing a strategic plan. He
believes that the ability to estimate the average length of
a phone call on the system would be useful. He takes a random
sample of 60 calls from the company records and
finds that the mean sample length for a call is 4.26
minutes; past history for these types of calls has
shown that the population standard deviation for call
length is 1.1 minutes. Assuming that the population is
normally distributed, use a 95% confidence level to
estimate the population mean.
Condition1: Confidence interval estimate of (µ) when
Distribution is Normal and σ is known…
Given: n= 60 calls = 4.26 minutes σ = 1.1 minutes C= 0.95
Solution:
Step 1: Compute the standard error of the mean: σx̄ = σ/√n = 1.1/√60 = 0.142
Step 2: Compute α/2 from the confidence coefficient: α/2 = (1 – 0.95)/2 = 0.025
Step 3: Find the Z value for α/2 from the table:
Zα/2 = Z0.025 = 1.96
Condition1: Confidence interval estimate of (µ) when Distribution is Normal and σ
is known…

Step 4: Construct the confidence interval
μ = x̄ ± Zα/2 · σx̄ = 4.26 ± 1.96(0.142) = 4.26 ± 0.2783
= 4.26 – 0.2783 to 4.26 + 0.2783
= 3.9817 to 4.5383
3.98 <= μ <= 4.54
Step 5: Interpret the result
 The vice president of ETC can be 95% confident that the
average length of a call for the population is between
3.98 and 4.54 minutes
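The same interval can be computed as a sketch (assumes scipy for the z value):

```python
# Sketch: 95% confidence interval for mu when sigma is known (ETC example above).
import math
from scipy.stats import norm

x_bar, sigma, n, conf = 4.26, 1.1, 60, 0.95
se = sigma / math.sqrt(n)                 # about 0.142
z = norm.ppf(1 - (1 - conf) / 2)          # 1.96
print(x_bar - z * se, x_bar + z * se)     # about 3.98 to 4.54 minutes
```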
Condition 2: Confidence interval estimate of µ for normal distribution (when σ unknown and n is large)

 If the population standard deviation is unknown, it has
to be estimated from the sample (i.e., when σ is unknown
we use the sample standard deviation s in its place).
 Then the standard error of the mean (σx̄) is estimated by the
standard error of the sample mean: sx̄ = s/√n.
• Therefore the confidence interval to estimate μ when the population
standard deviation is unknown, the population is normal and
n is large is:
μ = x̄ ± Zα/2 · s/√n
Example
• Suppose that the car renting firm in Hawassa wants to estimate the
average number of miles traveled by each of its car rented. A
random sample of 110 rented cars revealed that the sample mean
travel distance per day is 85.5 miles, with sample standard
deviation of 19.3 miles. Assuming normal distributions construct
99% confidence interval to estimate population mean.
Given
n = 110 rented cars s = 19.3 miles
= 85.5 miles c = 99% or 0.99
Solution
Step 1: Compute the standard error of the mean: sx̄ = s/√n = 19.3/√110 = 1.84
Example…
Step 2: Compute α/2 (the proportion of incorrect statements): α/2 = (1 – 0.99)/2 = 0.005
Step 3: Find the Z value reflecting the confidence level from the standard normal table:
Zα/2 = Z0.005 = 2.576
Step 4: Construct the confidence interval for the population mean (μ):
μ = x̄ ± Zα/2 · sx̄ = 85.5 ± 2.576(1.84) = 85.5 ± 4.74
μ = 80.76 to 90.24
Step 5: Interpretation: we state with 99% confidence that the average
distance traveled by rented cars lies between 80.76 and 90.24 miles.
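A sketch of the same computation (assumes scipy):

```python
# Sketch: 99% confidence interval for mu with sigma unknown and large n (car-rental example).
import math
from scipy.stats import norm

x_bar, s, n, conf = 85.5, 19.3, 110, 0.99
se = s / math.sqrt(n)                     # about 1.84
z = norm.ppf(1 - (1 - conf) / 2)          # about 2.576
print(x_bar - z * se, x_bar + z * se)     # about 80.76 to 90.24 miles
```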
Condition 3: Confidence interval for μ, when σ unknown, n small and population is normal

 If the sample size is small (n<30), we can develop an interval


estimate of a population mean only if the population has a normal
probability distribution.
 If the sample standard deviation (s) is used as an estimator of the
population standard deviation σ and if the population has a normal
distribution, interval estimation of the population mean can be
based up on a probability distribution known as t-distribution.
Characteristics of t-distribution
1. The t-distribution is symmetric about its mean (0) and ranges
from - ∞ to ∞.
2. The t-distribution is bell-shaped (unimodal) and has
approximately the same appearance as the standard normal
distribution (Z- distribution).
Condition 3: Confidence interval for μ, when σ unknown, n small and
population is normal…
Characteristics of t-distribution….

3. The t-distribution depends on a parameter v, called the degrees of
freedom of the distribution, v = n – 1, where n is the sample size. The degrees
of freedom (v) refers to the number of values we can choose freely.
4. The variance of the t-distribution is v/(v – 2) for v > 2.
5. The variance of the t-distribution always exceeds 1.
6. As v increases, the variance of the t-distribution approaches 1 and the
shape approaches the standard normal distribution.
Condition 3: Confidence interval for μ, when σ unknown, n small
and population is normal…
Characteristics of t-distribution….
7. Because the variance of the t-distribution exceeds 1.0 while the
variance of the Z-distribution equals 1, the t-distribution is slightly
flatter in the middle than the Z-distribution and has thicker tails.
8. The t-distribution is a family of distributions with a different
density function corresponding to each different value of the
parameter ν. That is, there is a separate t-distribution for each
sample size. In proper statistical language, we would say, “There
is a different t-distribution for each of the possible degrees of
freedom”.
9. The t-formula for a sample mean when σ is unknown, the sample size is
small and the population is normally distributed is
t = (x̄ – μ)/(s/√n)
This formula is essentially the same as the z-formula, but the
table values used are t-distribution values rather than z values.
Condition 3: Confidence interval for μ, when σ unknown, n small and
population is normal…
Characteristics of t-distribution….
• The confidence interval to estimate μ becomes:
μ = x̄ ± tα/2, v · s/√n
Steps
1. Calculate the degrees of freedom (v = n-1)
2. Calculate the sample standard error of the mean, i.e. sx̄ = s/√n
3. Compute α/2 (the incorrect proportion) and look up tα/2, v
Condition 3: Confidence interval for μ, when σ unknown, n small
and population is normal…
Example: If a random sample of 27 items produces x =
128.4 and s = 20.6. What is the 98% confidence interval
for μ? Assume that x is normally distributed for the
population. What is the point estimate?
Given
• n = 27 s = 20.6
• x = 128.4 c = 98% = 0.98
Solution
 The point estimate of the population mean is the sample mean, in
this case 128.4 is the point estimate.
1. Compute for degrees of freedom: V= n – 1= 27 – 1 = 26
2. Calculate sample standard error of the mean: = 20.6/√27
= 20.6/5.196 = 3.965
Condition 3: Confidence interval for μ, when σ unknown, n
small and population is normal…
3: a/2 = (1-C)/2 = (1-0.98)/2 = 0.02/2 = 0.01
4. Look up the tα/2, v value from the t-distribution table:
tα/2, v = t0.01, 26 = 2.479
5. Determine the confidence interval:
μ = 128.4 ± 2.479 × 3.965
128.4 – 9.829 ≤ μ ≤ 128.4 + 9.829
118.571 to 138.229
6. Interpret results: There is 98% confidence that the
population mean lies between 118.57 and 138.23
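A sketch of the same t-interval (assumes scipy):

```python
# Sketch: 98% confidence interval for mu using the t-distribution (n = 27 example above).
import math
from scipy.stats import t

x_bar, s, n, conf = 128.4, 20.6, 27, 0.98
se = s / math.sqrt(n)                          # about 3.965
t_val = t.ppf(1 - (1 - conf) / 2, df=n - 1)    # about 2.479
print(x_bar - t_val * se, x_bar + t_val * se)  # about 118.57 to 138.23
```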
Interval Estimation of the Population Proportion
• We know that a sample proportion, p̄, is an unbiased
estimator of a population proportion P, and if the sample
size is large, the sampling distribution of p̄ is normal with
mean P and standard deviation σp̄ = √(PQ/n).
 However, here P is unknown and we want to estimate P by
p̄, and hence Z becomes Z = (p̄ – P)/√(p̄q̄/n);
that is, σp̄ is substituted by √(p̄q̄/n).
• Solving for P results in P = p̄ ± Z√(p̄q̄/n), since Z can take
both +ve and –ve values.
Interval Estimation of the Population Proportion…
• Since Z represents the confidence level, we write it as:
P = p̄ ± Zα/2 √(p̄q̄/n)
• Example
• Suppose that 60% of electrical contractors in a region use
a particular brand of wire. What is the probability of taking a
random sample of size 120 from those electrical contractors
and finding that 0.5 or less use that brand of wire?