
STATISTICAL

INFERENCE
CHAPTER 01
RANDOM SAMPLING METHODS
DESCRIPTIVE STATISTICS
Descriptive statistics consists of methods for organizing and
summarizing information. It includes the construction of graphs,
charts, and tables and the calculation of various descriptive measures
such as averages, measures of variation, and percentiles.

INFERENTIAL STATISTICS
Inferential statistics consists of methods for drawing and measuring
the reliability of conclusions about a population based on information
obtained from a sample of the population.
Population and Sample

Population: The collection of all individuals or items under
consideration in a statistical study.

Sample: That part of the population from which information
is collected.
CENSUS
The procedure of gathering data from the entire
population is called a census. Sometimes it is preferable
to conduct a census of the entire population rather
than taking a sample.

A business researcher may opt to take a census rather
than a sample, provided there is adequate time and
money available to conduct such a census, to eliminate
the possibility that by chance a randomly selected
sample may not be representative of the population.
Sampling
Sampling is a procedure for collecting sample data.
Sampling is widely used in business as a means of gathering
useful information about a population. Data are gathered from
samples and conclusions are drawn about the population as a
part of the inferential statistics process.
Reasons for Sampling
Taking a sample instead of conducting a census offers several
advantages.
1. The sample can save money.
2. The sample can save time.
3. For given resources, the sample can broaden the scope of the
study.
4. Because the research process is sometimes destructive, the
sample can save product.
5. If accessing the entire population is impossible, the sample is the
only option.
SOME COMMON SAMPLING TECHNIQUES
SAMPLING IS THE METHOD THROUGH WHICH A SAMPLE IS COLLECTED FROM A
POPULATION.

The following are some common sampling techniques:


• Simple random sampling
• Stratified random sampling
• Systematic random sampling
• Cluster sampling
• Multi-stage random sampling

Let’s start exploring each of these sampling methods!


SIMPLE RANDOM SAMPLING
A sampling procedure for which each possible sample of a given size is
equally likely to be the one obtained.
STEPS TO GENERATE A RANDOM SAMPLE:
• CODING: Each unit of the population is given a numeric code. Start
coding with the first element coded as 00, so that the codes run from
00 to N - 1 (where N is the size of the population).
• Use a random number generator to generate codes until you reach the
desired size of the sample.
• Skip any duplicates that you might get while using the random number
generator.
• Write down the elements against the codes that you have generated
randomly.
A quick example will make it sound easy!
EXAMPLE
Consider the following population. Select a simple random sample of 5 elements.
POPULATION:
A B C D E F G H I J K L M N
N= 14 (Total number of elements in the population)
n= 5 (Sample size)
1. Start coding the elements starting from 00.

Element  Code     Element  Code
A        00       H        07
B        01       I        08
C        02       J        09
D        03       K        10
E        04       L        11
F        05       M        12
G        06       N        13

2. Now use a calculator to generate random numbers.
Rand# X maximum code: Rand# X 13 (Shift . X 13 +)
or RanInt(0,13): Alpha . SHIFT , 13 ) = RanInt(0,13)
Codes generated = 07, 00, 10, 09, and 01

RANDOM SAMPLE OF SIZE 5:
Code  Element
07    H
00    A
10    K
09    J
01    B
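
The coding steps above can be sketched in Python. This is only an illustrative sketch: the function name simple_random_sample and the seed argument are assumptions made here, and rng.randint(0, max_code) stands in for the calculator's RanInt(0,13).

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct elements by generating random codes (hypothetical helper)."""
    if n > len(population):
        raise ValueError("sample size cannot exceed population size")
    rng = random.Random(seed)
    max_code = len(population) - 1          # codes run from 0 to N - 1
    chosen_codes = []
    while len(chosen_codes) < n:
        code = rng.randint(0, max_code)     # plays the role of RanInt(0, 13)
        if code not in chosen_codes:        # skip any duplicates
            chosen_codes.append(code)
    # Write down the element against each generated code
    return [(code, population[code]) for code in chosen_codes]

population = list("ABCDEFGHIJKLMN")         # N = 14, coded 00 to 13
for code, element in simple_random_sample(population, 5, seed=1):
    print(f"{code:02d} {element}")
```

The codes printed depend on the seed, so they will generally differ from the calculator codes in the example.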
PRACTICE QUESTION
From the following population draw a simple random sample of size 6.
ALI ASIM JUNAID VARISHA
SARA HADIA MAHAM SHAZIA
AMNA HAYA MARIA JAFFAR
ASMA HARIS MAAZ ALIZA
ANAS HAMNA SAROSH SHOAIB
ZUBIA FAIZAN DANIAL TANZIL
HAIDER FOUZIA EISHA HASSAN
STRATIFIED RANDOM SAMPLING

A second type of random sampling is stratified
random sampling, in which the population is
divided into non-overlapping subpopulations called
strata. The researcher then extracts a random
sample from each of the subpopulations.
The main reason for using stratified random
sampling is that it has the potential for reducing
sampling error. Sampling error occurs when, by
chance, the sample does not represent the
population. With stratified random sampling, the
potential to match the sample closely to the
population is greater than it is with simple random
sampling because portions of the total sample are
taken from different population subgroups.
METHODS OF ALLOCATION USED TO DIVIDE THE
SAMPLE AMONG THE STRATA

1. EQUAL ALLOCATION
2. PROPORTIONAL ALLOCATION
3. OPTIMUM / NEYMAN ALLOCATION
EQUAL ALLOCATION METHOD
A sample of equal size is drawn from each stratum of the population.

Q1. Calculate the sample size for each stratum using the equal allocation method if
a sample of size 40 is to be drawn from the population.

STRATUM A   STRATUM B   STRATUM C   STRATUM D   TOTAL
70          30          40          60          200

N = size of the total population = 200
n = size of the sample = 40
Number of strata = 4
Sample size per stratum = n / number of strata = 40/4 = 10
ANSWER:
10 elements are to be drawn from each stratum so that the sample follows
an equal allocation method.
Proportional allocation method
The percentage of the sample taken from each stratum is proportionate to the
percentage that each stratum is within the whole population.

Q. Calculate the sample size for each stratum using the proportional allocation
method if a sample of size 40 is to be drawn from the population.

STRATUM 1   STRATUM 2   STRATUM 3   STRATUM 4   TOTAL
(N1)        (N2)        (N3)        (N4)        (N)
70          30          40          60          200

FORMULA: nh = Nh* (n/N)


N= population size n= sample size Nh= number of elements in a stratum nh= sample
size of each stratum
Sample size of each stratum= Number of elements in each stratum* (Sample Size /
Population Size)
Sample size from Stratum 1 n1 = N1* (n/N) n1= 70* (40/200)= 14
Sample Size from Stratum 2 n2= N2* (n/N) n2= 30* (40/200)= 6
Sample Size from Stratum 3 n3 =N3* (n/N) n3= 40* (40/200)= 8
Sample Size from Stratum 4 n4 =N4* (n/N) n4= 60* (40/200)= 12
TOTAL= 14+6+8+12= 40
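
As a quick check of the arithmetic above, the formula nh = Nh * (n/N) can be written as a small Python function. The name proportional_allocation is a label chosen here for illustration.

```python
def proportional_allocation(stratum_sizes, n):
    """Return nh = Nh * (n / N) for every stratum (illustrative helper)."""
    N = sum(stratum_sizes)                  # population size
    return [Nh * n / N for Nh in stratum_sizes]

stratum_sizes = [70, 30, 40, 60]            # N1..N4 from the question
allocation = proportional_allocation(stratum_sizes, 40)
print(allocation)                           # sample size for each stratum
print(sum(allocation))                      # should add back up to n
```

When Nh * (n/N) is not a whole number, the results need rounding, and the rounded values may not add up to n exactly.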

OPTIMUM / NEYMAN ALLOCATION METHOD

Neyman allocation is a method used to allocate the sample to strata based on the
strata variances. This allocation considers the size of the strata as well as their variability.
FORMULA: nh= n* { (Nh*Sh)/(∑ Nh*Sh)}
nh = sample size of hth stratum
Nh= stratum size of the hth stratum
Sh= standard deviation of the hth stratum
n = total sample to be drawn from the population
EXAMPLE:
Calculate the sample size of each stratum by using optimum allocation if you
have to draw a sample of 30 units.
         STRATUM 1 (n1)   STRATUM 2 (n2)   STRATUM 3 (n3)   TOTAL
Nh       70 (N1)          30 (N2)          100 (N3)         200
Sh       3 (S1)           9 (S2)           4 (S3)
Nh*Sh    210              270              400              880 (∑ Nh*Sh)

SOLUTION
n1 = 30 X (210/880) = 7.16 ≈ 7
n2 = 30 X (270/880) = 9.20 ≈ 9
n3 = 30 X (400/880) = 13.64 ≈ 14
7 + 9 + 14 = SAMPLE OF 30 UNITS
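
The optimum allocation formula nh = n * (Nh*Sh) / (∑ Nh*Sh) can be checked with a short Python sketch. The function name neyman_allocation is an assumption made here for illustration. Rounding each nh separately is also an assumption: it adds up to n in this example but is not guaranteed to in general.

```python
def neyman_allocation(stratum_sizes, stratum_sds, n):
    """Return nh = n * (Nh*Sh) / sum(Nh*Sh) for every stratum (illustrative helper)."""
    weights = [Nh * Sh for Nh, Sh in zip(stratum_sizes, stratum_sds)]
    total = sum(weights)                    # sum of Nh*Sh over all strata
    return [n * w / total for w in weights]

raw = neyman_allocation([70, 30, 100], [3, 9, 4], 30)
rounded = [round(x) for x in raw]           # nearest whole unit per stratum
print([f"{x:.2f}" for x in raw])
print(rounded, sum(rounded))
```

Stratum 3 receives the largest share because its product Nh*Sh = 400 is the largest of the three.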
EXAMPLE 2:
STRATUM 1 STRATUM 2
23 (00) 1330 (00)
39 (01) 2394 (01)
47 (02) 3197 (02)
64 (03) 4280 (03)
28 (04) 5261 (04)
39 (05) 3194 (05)
48 (06)
67 (07)
29 (08)
38 (09)
42 (10)

Using the population given above, find a sample of size 6
units by using the optimum allocation method.
SOLUTION
• Find the standard deviation for both strata.
• The standard deviation can be calculated using the calculator or by using the following
formula (the values below correspond to dividing by one less than the number of elements):
S = √( ∑(x − x̄)² / (Nh − 1) ), where x̄ is the mean of the stratum
• Count the total number of elements in each stratum: N1 = 11 and N2 = 6.

Standard deviation for Stratum 1 = 13.89
Standard deviation for Stratum 2 = 1380.3
n1 = 6 X { (11*13.89) / ((11*13.89) + (6*1380.3)) } = 0.109, rounded up to 1
n2 = 6 X { (6*1380.3) / ((11*13.89) + (6*1380.3)) } = 5.89, rounded down to 5 so that n1 + n2 = 6
• Code the population for each stratum.
• Using a random number generator, find a sample of size 1 from Stratum 1 and a
sample of size 5 from Stratum 2.
• SAMPLE OF SIZE 6= 47, 3194, 4280, 2394, 1330, and 1330
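
The standard deviations and the allocation for Example 2 can be reproduced with Python's statistics module. This is a sketch under one assumption: statistics.stdev (which divides by one less than the number of values) is used because it reproduces the figures 13.89 and 1380.3 quoted in the solution.

```python
import statistics

stratum_1 = [23, 39, 47, 64, 28, 39, 48, 67, 29, 38, 42]
stratum_2 = [1330, 2394, 3197, 4280, 5261, 3194]

# Standard deviation of each stratum (divisor: number of values - 1)
s1 = statistics.stdev(stratum_1)
s2 = statistics.stdev(stratum_2)

n = 6                                       # total sample size
w1 = len(stratum_1) * s1                    # N1 * S1
w2 = len(stratum_2) * s2                    # N2 * S2
n1 = n * w1 / (w1 + w2)                     # unrounded allocation, Stratum 1
n2 = n * w2 / (w1 + w2)                     # unrounded allocation, Stratum 2

print(f"S1 = {s1:.2f}, S2 = {s2:.1f}")
print(f"n1 = {n1:.3f}, n2 = {n2:.3f}")
```

Almost the whole sample goes to Stratum 2 because its values vary far more than those of Stratum 1.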
SYSTEMATIC RANDOM SAMPLING

Systematic sampling is used because
of its convenience and relative ease of
administration. With systematic
sampling, every k-th item is selected
to produce a sample of size n from a
population of size N. The value of k,
sometimes called the sampling cycle,
can be determined by the following
formula. If k is not an integer value,
it should be rounded down to a
whole number.
DETERMINING THE VALUE OF k
k = N/n
where
n = sample size
N = population size
k = size of interval for selection
EXAMPLE
As an example of systematic sampling, a management information
systems researcher wanted to sample the manufacturers in Texas. He
had enough financial support to sample 1,000 companies (n). The
Directory of Texas Manufacturers listed approximately 17,000 total
manufacturers in Texas (N) in alphabetical order.
N = 17,000, n = 1,000
k = 17,000/1,000 = 17
Use a random number to select a value between 1 and 17 inclusive as
a starting point.
Suppose the selected number is 5. He would start with the 5th
company, then select the 22nd (5 + 17), then the 39th, and so on.
EXAMPLE 1.9 Systematic Random Sampling
Sampling Student Opinions Recall Example 1.8, in which Professor
Hassett wanted a sample of 15 of the 728 students enrolled in college
algebra at his school. Use systematic random sampling to obtain the
sample.
Solution We apply Procedure 1.1.
Step 1 Divide the population size by the sample size and round the result down to the nearest whole
number, k.
The population size is the number of students in the class, which is 728, and the sample size is 15.
Dividing the population size by the sample size and rounding
down to the nearest whole number, we get 728/15 = 48 (rounded down). Thus,
k = 48.

Step 2 Use a random-number table or a similar device to obtain a number, m, between 1
and k.
Referring to Step 1, we see that we need to randomly select a number between 1 and 48.
Using a random-number table, we obtained the number 22 (but we could
have conceivably gotten any number between 1 and 48, inclusive). Thus, m = 22.

Step 3 Select for the sample those members of the population that are numbered m, m + k, m +
2k, . . . .
From Steps 1 and 2, we see that m = 22 and k = 48. Hence, we need to list
every 48th number, starting at 22, until we have 15 numbers. Doing so, we get the 15 numbers
displayed in Table 1.6.
TABLE 1.6
Numbers obtained by systematic
random sampling
22 166 310 454 598
70 214 358 502 646
118 262 406 550 694
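
The three steps of systematic random sampling can be sketched in Python as follows. The function name systematic_sample and its start and seed arguments are assumptions made for illustration. Passing start=22 reproduces the starting point that the example obtained from a random-number table.

```python
import random

def systematic_sample(N, n, start=None, seed=None):
    """Return the n selected positions m, m + k, m + 2k, ... (illustrative helper)."""
    k = N // n                              # Step 1: round N/n down to a whole number
    if k < 1:
        raise ValueError("sample size cannot exceed population size")
    if start is None:                       # Step 2: random start m between 1 and k
        start = random.Random(seed).randint(1, k)
    return [start + i * k for i in range(n)]  # Step 3: every k-th member

print(systematic_sample(728, 15, start=22))     # students example, k = 48
print(systematic_sample(17000, 1000, start=5)[:3])  # manufacturers example, k = 17
```

Because the start is at most k, the last selected position never exceeds n * k, which is at most N.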
Sampling Distribution of the
Sample Mean

For a variable x and a given sample size, the
distribution of the variable x̄ is called the
sampling distribution of the sample mean.
EXAMPLE
Population: 2 4 6 8
Q. Draw all possible samples of size 2 from the above population
1. Without replacement
2. With replacement
3. Calculate the sample mean of each sample and make a sampling distribution
of x̄
4. Show that the mean of all sample means is equal to the population mean, E(x̄) = μ
SOLUTION:
1. Samples without replacement:
(2,4) (2,6) (2,8) (4,6) (4,8) (6,8)
2. Samples with replacement:
(2,2) (2,4) (2,6) (2,8) (4,2) (4,4) (4,6) (4,8) (6,2) (6,4)
(6,6) (6,8) (8,2) (8,4) (8,6) (8,8)
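Samples without replacement   Calculation   Sample mean (x̄)
(2,4)                         (2+4)/2       3
(2,6)                         (2+6)/2       4
(2,8)                         (2+8)/2       5
(4,6)                         (4+6)/2       5
(4,8)                         (4+8)/2       6
(6,8)                         (6+8)/2       7
Total                                       30

Mean of all sample means = 30/6 = 5, which equals the population mean
μ = (2+4+6+8)/4 = 5.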
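
The same enumeration can be done in Python for the population 2, 4, 6, 8. In this sketch, itertools.combinations lists the samples without replacement and itertools.product lists the ordered samples with replacement. The variable names are chosen here for illustration.

```python
from itertools import combinations, product
from statistics import mean

population = [2, 4, 6, 8]

# All possible samples of size 2
without_replacement = list(combinations(population, 2))    # 6 samples
with_replacement = list(product(population, repeat=2))     # 16 ordered samples

# Sample mean of every sample
means_without = [mean(s) for s in without_replacement]
means_with = [mean(s) for s in with_replacement]

print("population mean:", mean(population))
print("mean of sample means (without replacement):", mean(means_without))
print("mean of sample means (with replacement):", mean(means_with))
```

In both cases the mean of all sample means matches the population mean, which is what part 4 of the question asks to show.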
EXAMPLE

Sampling Distribution of the Sample Mean

Heights of Starting Players Suppose that the
population of interest consists of the five starting players
on a men’s basketball team, whom we will call A, B, C, D,
and E. Further suppose that the variable of interest is
height, in inches. Table 7.1 lists the players and their
heights.
a. Obtain the sampling distribution of the sample
mean for samples of size 2.
b. Make some observations about sampling error
when the mean height of a random sample of two
players is used to estimate the population mean height.
c. Find the probability that, for a random sample of size 2, the
sampling error made in estimating the population mean by the
sample mean will be 1 inch or less; that is, determine the probability
that x̄ will be within 1 inch of μ.
Solution For future reference we first
compute the population mean height: μ = 80 inches.

a. The population is so small that we can list the
possible samples of size 2. The first column of
Table 7.2 gives the 10 possible samples, the
second column the corresponding heights
(values of the variable “height”), and the third
column the sample means.
EXAMPLE 7.3 Sampling
Distribution of the Sample Mean
Heights of Starting Players Refer to Table 7.1, which
gives the heights of the five starting players on a
men’s basketball team.

a. Obtain the sampling distribution of the sample
mean for samples of size 4.
b. Make some observations about sampling error
when the mean height of a random sample of four
players is used to estimate the population mean
height.
c. Find the probability that, for a random sample of
size 4, the sampling error made in estimating the
population mean by the sample mean will be 1 inch or
less; that is, determine the probability that x̄ will be
within 1 inch of μ.
Solution
a. There are five possible samples of size 4. The first
column of Table 7.3 gives the possible samples, the
second column the corresponding heights (values of
the variable “height”), and the third column the sample
means. Figure 7.2 on the following page is a dotplot for
the distribution of the sample means.
b. From Table 7.3 or Fig. 7.2, we see that none of the samples of
size 4 has a mean equal to the population mean of 80 inches.
Thus, some sampling error is certain.
The Mean of the Sample Mean

There is a simple relationship between the mean of the
variable x̄ and the mean of the variable under
consideration: They are equal, or
μ_x̄ = μ.
In other words, for any particular sample size, the mean
of all possible sample means equals the population
mean. This equality holds regardless of the size of the
sample.
EXAMPLE 7.4 Mean of the Sample Mean

Heights of Starting Players The heights, in
inches, of the five starting players on a
men’s basketball team are repeated in Table
7.5. Here the population is the five players and
the variable is height.

a. Determine the population mean, μ.
b. Obtain the mean, μ_x̄, of the variable x̄ for
samples of size 2. Verify that the relation μ_x̄ = μ
holds.
c. Repeat part (b) for samples of size 4.
Solution:
a. To determine the population mean (the mean
of the variable “height”), we apply Definition
3.11 on page 128 to the heights in Table 7.1:

μ = ∑x / N = (sum of the five heights) / 5 = 80 inches

Thus the mean height of the five players is 80 inches.
b. To obtain the mean of the variable x̄ for
samples of size 2, we again apply Definition 3.11,
but this time to x̄. Referring to the third column of
Table 7.2
on page 299, we get

μ_x̄ = (sum of the 10 sample means) / 10 = 80 inches

By part (a), μ = 80 inches. So, for samples of size 2, μ_x̄ = μ.

Interpretation For samples of size 2, the mean of all
possible sample means equals the population mean.
The Standard Deviation of the Sample Mean

EXAMPLE 7.5 Standard Deviation of the Sample Mean

Heights of Starting Players Refer back to Table 7.5.
a. Determine the population standard deviation, σ.
b. Obtain the standard deviation, σ_x̄, of the variable
x̄ for samples of size 2. Indicate
any apparent relationship between σ_x̄ and σ.
c. Repeat part (b) for samples of sizes 1, 3, 4, and 5.
d. Summarize and discuss the results obtained in
parts (a)–(c).
Solution
a. To determine the population standard deviation (the
standard deviation of the
variable “height”), we apply Definition 3.12 on page 130 to
the heights in Table 7.5. Recalling that μ = 80 inches, we
have

σ = √( ∑(x − μ)² / N ) = 3.41 inches

Thus the standard deviation of the heights of the
five players is 3.41 inches.
b. To obtain the standard deviation of the variable x̄
for samples of size 2, we again apply Definition 3.12,
but this time to x̄. Referring to the third column of
Table 7.2 on page 299 and recalling that μ_x̄ = μ = 80
inches, we compute

σ_x̄ = √( ∑(x̄ − μ_x̄)² / 10 )

over the 10 sample means, rounding the result to two decimal places. Note that
this result is not the same as the population standard deviation, which is σ
= 3.41 inches. Also note that σ_x̄ is smaller than σ.
c. Using the same procedure as in part (b), we
compute σ_x̄ for samples of sizes 1, 3, 4, and 5 and
summarize the results in Table 7.6.

d. Table 7.6 suggests that the standard deviation of x̄ gets
smaller as the sample size gets larger. We could have predicted
this result from the dotplots shown in Fig. 7.3 on page 300 and the
fact that the standard deviation of a variable measures the variation
of its possible values.
Sampling Distribution of x̄
Suppose a small finite population consists of only N = 8 numbers:
54 55 59 63 64 68 69 70
We can see the shape of the distribution of this population of data.
Suppose we take all possible samples of size n = 2 from
this population with replacement.
Observe the shape of the distributions.
Notice that even for small sample sizes, the distributions of sample means
for samples taken from the uniformly distributed population begin to “pile
up” in the middle. As sample sizes become much larger, the sample mean
distributions begin to approach a normal distribution and the variation
among the means decreases.
So far, we examined three populations with different distributions. However,
the sample means for samples taken from these populations appear to be
approximately normally distributed, especially as the sample sizes become
larger. What would happen to the distribution of sample means if we studied
populations with differently shaped distributions?
The answer to that question is given in the central limit theorem.
CENTRAL LIMIT THEOREM
If samples of size n are drawn randomly from a population that has a
mean of μ and a standard deviation of σ, the sample means, x̄, are
approximately normally distributed for sufficiently large sample sizes (n
≥ 30) regardless of the shape of the population distribution. If the
population is normally distributed, the sample means are normally
distributed for any size sample. From mathematical expectation, it can
be shown that the mean of the sample means is the population mean,
μ_x̄ = μ
and the standard deviation of the sample means (called the standard
error of the mean) is the standard deviation of the population divided by
the square root of the sample size,
σ_x̄ = σ / √n
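
The two relationships stated in the theorem (the mean of the sample means equals the population mean, and the standard error equals the population standard deviation divided by √n) can be checked exactly on the small population of N = 8 numbers introduced earlier. The sketch below lists every ordered sample of size 2 drawn with replacement. It checks only the mean and the standard error, not the shape of the distribution, and it assumes the population standard deviation (statistics.pstdev) is the one intended.

```python
from itertools import product
from math import sqrt, isclose
from statistics import mean, pstdev

population = [54, 55, 59, 63, 64, 68, 69, 70]
n = 2                                       # sample size

# Every possible ordered sample of size n, drawn with replacement
sample_means = [mean(s) for s in product(population, repeat=n)]

mu = mean(population)                       # population mean
sigma = pstdev(population)                  # population standard deviation
mu_xbar = mean(sample_means)                # mean of the sample means
sigma_xbar = pstdev(sample_means)           # standard error of the mean

print(f"mu = {mu:.3f}, mean of sample means = {mu_xbar:.3f}")
print(f"sigma/sqrt(n) = {sigma / sqrt(n):.3f}, sd of sample means = {sigma_xbar:.3f}")
print(isclose(mu_xbar, mu), isclose(sigma_xbar, sigma / sqrt(n)))
```

The equality for the standard error holds exactly here because the samples are drawn with replacement. Sampling without replacement from a small population gives a smaller value.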
EXAMPLE
In Table 7.4, the means and standard deviations of the means
are displayed for random samples of various sizes (n = 2
through n = 30) drawn from the uniform distribution of a = 10
and b = 30 shown in Figure 7.5. The population mean is 20,
and the standard deviation of the population is 5.774. Note
that the mean of the sample means for each sample size is
approximately 20 and that the standard deviation of the
sample means for each set of 90 samples is approximately
equal to σ / √n.
We determine z scores for individual values from a normal distribution:

z = (x − μ) / σ

If sample means are normally distributed, the z score formula applied to
sample means would be

z = (x̄ − μ) / (σ / √n)
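
The z score for a sample mean divides the distance between x̄ and μ by the standard error σ/√n. The sketch below uses the population mean 20 and standard deviation 5.774 quoted above for the uniform distribution. The sample mean of 21, the sample size of 30, and the function name z_for_sample_mean are hypothetical values chosen only for illustration.

```python
from math import sqrt

def z_for_sample_mean(x_bar, mu, sigma, n):
    """z score of a sample mean: (x_bar - mu) / (sigma / sqrt(n))."""
    standard_error = sigma / sqrt(n)        # standard error of the mean
    return (x_bar - mu) / standard_error

# Hypothetical sample: mean 21 from n = 30 observations
z = z_for_sample_mean(x_bar=21, mu=20, sigma=5.774, n=30)
print(f"z = {z:.3f}")
```

For a fixed gap between x̄ and μ, a larger n gives a smaller standard error and therefore a larger z score.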
EXAMPLE
Sometimes in analyzing a sample, a researcher will choose to use the sample
proportion, denoted p̂. If research produces measurable data such as weight,
distance, time, and income, the sample mean is often the statistic of choice.
However, if research results in countable items such as how many people in a
sample choose Dr. Pepper as their soft drink or how many people in a sample
have a flexible work schedule, the sample proportion is often the statistic of
choice. Whereas the mean is computed by averaging a set of values, the
sample proportion is computed by dividing the frequency with which a
given characteristic occurs in a sample by the number of items in the sample.
For example, in a sample of 100 factory workers, 30 workers might belong
to a union. The value of p̂ for this characteristic, union membership, is
30/100 = .30.
In a sample of 500 businesses in suburban malls, if 10 are shoe stores, then
the sample proportion of shoe stores is 10/500 = .02.
The sample proportion is a widely used statistic and is usually computed
on questions involving Yes or No answers. For example, do you have at
least a high school education? Are you predominantly right-handed? Are
you female? Do you belong to the student accounting association?
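
The sample proportion is a single division, which the following Python sketch applies to the two examples above. The function name sample_proportion is chosen here for illustration.

```python
def sample_proportion(count_with_characteristic, sample_size):
    """Frequency of the characteristic divided by the number of items sampled."""
    if sample_size <= 0:
        raise ValueError("sample size must be positive")
    return count_with_characteristic / sample_size

union = sample_proportion(30, 100)          # union members among factory workers
shoe_stores = sample_proportion(10, 500)    # shoe stores among mall businesses
print(f"{union:.2f} {shoe_stores:.2f}")
```

For Yes or No questions, the count is simply the number of Yes answers in the sample.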
EXAMPLE
A point estimate is a statistic taken from a sample that is used to estimate a
population parameter. A point estimate is only as good as the
representativeness of its sample.
If other random samples are taken from the population, the point estimates
derived from those samples are likely to vary. Because of variation in sample
statistics, estimating a population parameter with an interval estimate is often
preferable to using a point estimate.
An interval estimate (confidence interval) is a range of values within which
the analyst can declare, with some confidence, the population parameter
lies. Confidence intervals can be two sided or one sided. This text presents
only two-sided confidence intervals. How are confidence intervals
constructed?
