Applied Statistics Lab using R

Dr. Nikunja Bihari Barik


Ph.D (IIT BBS)
Assistant Professor
Vellore Institute of Technology

Andhra Pradesh

May 6, 2021

STAT (MAT2001) Dr. Nikunja 2/4


(Module-3) Inference for Decision Making-I

A data-based decision procedure can be formed to produce a conclusion about some scientific system.
A medical researcher may decide on the basis of experimental evidence
whether alcohol drinking increases the risk of cancer in humans.
A sociologist might wish to collect appropriate data to enable him or her
to decide whether a person's blood type and eye color are independent
variables.
Population: The complete collection of all the elements to be studied.
Sample: A subcollection of elements drawn from a population.
Hypothesis: A statistical hypothesis is an assertion or conjecture
(statement) about the parameters of one or more populations. The
decision-making procedure about the hypothesis is called hypothesis
testing.

Hypothesis
Suppose we are interested in deciding whether or not the mean burning
rate of a solid propellant used to power aircrew escape systems is 50
centimeters per second.
H0 : µ = 50 centimeters per second.
H1 : µ ≠ 50 centimeters per second.
The null hypothesis, H0, is a working model that we adopt temporarily or for
the sake of argument.
The alternative hypothesis, which we denote H1 or Ha, contains all the
values of the parameter that we will consider plausible if we reject the null
hypothesis.
Because the alternative hypothesis here specifies values of µ that could be
either greater or less than 50 centimeters per second, it is called a
two-sided alternative hypothesis. In some situations, we may wish to
formulate a one-sided alternative hypothesis.
Significance level

Test the validity of H0 against H1 at a chosen level of significance, e.g. 5%
or 1%.
Suppose we will not reject the null hypothesis H0 : µ = 50 if
48.5 ≤ x̄ ≤ 51.5; otherwise, we will reject the null hypothesis in favor of the
alternative hypothesis H1 : µ ≠ 50.
The values of x̄ that are less than 48.5 or greater than 51.5 constitute
the critical region for the test, while all values in the interval
48.5 ≤ x̄ ≤ 51.5 form the region for which we will fail to reject the null
hypothesis, called the acceptance region. The boundaries between the
critical regions and the acceptance region are called the critical values.
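To connect the critical region with the level of significance, the following R sketch computes the probability that x̄ falls in the critical region when H0 : µ = 50 is true. The population standard deviation `sigma = 2.5` and the sample size `n = 10` are assumptions made only for illustration; they are not given in these notes.

```r
# Assumed values (NOT given in the notes): sigma = 2.5, n = 10
mu0   <- 50
sigma <- 2.5
n     <- 10
se    <- sigma / sqrt(n)   # standard error of the sample mean

lower <- 48.5              # critical values taken from the text
upper <- 51.5

# P(reject H0 | H0 true) = P(xbar < 48.5) + P(xbar > 51.5) when mu = 50
alpha <- pnorm(lower, mean = mu0, sd = se) +
  pnorm(upper, mean = mu0, sd = se, lower.tail = FALSE)
print(alpha)
```

Under these assumed values the probability is a little under 6%; a wider acceptance region or a larger n would make it smaller.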
Errors of Sampling

Type I error or α error: If the null hypothesis H0 is true but it is
rejected by the test procedure, the error made is called a Type I error.
Type II error or β error: If the null hypothesis H0 is false but it is
accepted (not rejected) by the test, the error committed is called a Type II error.

Example: A certain type of Covid-19 vaccine is known to be only 25%
effective after a period of 2 years. To determine whether a new and somewhat
more expensive vaccine is superior in providing protection against the same
virus for a longer period of time, 20 people are chosen at random from
the population.
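A minimal R sketch of how α and β might be computed for this example with n = 20. The hypotheses H0 : p = 0.25 against H1 : p > 0.25, the decision rule (reject H0 when more than 8 of the 20 people remain protected), and the specific alternative p = 0.5 used for β are all assumptions made here for illustration; the notes do not state them.

```r
n      <- 20
p0     <- 0.25   # effectiveness under H0
p1     <- 0.50   # assumed specific alternative used to compute beta
cutoff <- 8      # assumed decision rule: reject H0 if X > 8

# Type I error: reject H0 although p = p0, i.e. P(X > 8 | p = 0.25)
alpha <- pbinom(cutoff, size = n, prob = p0, lower.tail = FALSE)

# Type II error: fail to reject H0 although p = p1, i.e. P(X <= 8 | p = 0.5)
beta <- pbinom(cutoff, size = n, prob = p1)

cat("alpha =", round(alpha, 4), " beta =", round(beta, 4), "\n")
```

Moving the assumed cutoff up lowers α but raises β, which is the trade-off described in the properties of a test.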

Example: A certain type of Covid-19 vaccine is known to be only 25%
effective after a period of 2 years. To determine whether a new and somewhat
more expensive vaccine is superior in providing protection against the same
virus for a longer period of time, 100 people are chosen at random from
the population.
Answer


Properties of Test of Hypothesis

1. The type I error and type II error are related. A decrease in the
probability of one generally results in an increase in the probability of the
other.
2. The size of the critical region, and therefore the probability of
committing a type I error, can always be reduced by adjusting the critical
value(s).
3. An increase in the sample size n will reduce α and β simultaneously.


One and Two tailed tests


Choice of Null and Alternative hypotheses
Example: A manufacturer of a certain brand of rice cereal claims that the
average saturated fat content does not exceed 1.5 grams per serving.
State the null and alternative hypotheses to be used in testing this claim
and determine where the critical region is located.
Answer: H0 : µ = 1.5
H1 : µ > 1.5
The claim is rejected only if µ is greater than 1.5 grams, so the critical
region lies entirely in the right tail.
Example: A real estate agent claims that 60% of all private residences
being built today are 3-bedroom homes. To test this claim, a large
sample of new residences is inspected; the proportion of these homes with 3
bedrooms is recorded and used as the test statistic. State the null and
alternative hypotheses to be used in this test and determine the location
of the critical region.
Answer: H0 : p = 0.6
H1 : p ≠ 0.6
The critical region is split equally between the two tails.
Use of P-values for decision making in Testing Hypotheses

For a 1% level of significance, the rejection region has total area α = 0.01.
For a two-tailed test,
P(z1 < z < z2) = 1 − α = 1 − 0.01 = 0.99.
Since the acceptance region is symmetric about the mean,
P(0 < z < z2) = 0.99/2 = 0.495.
The z value for which the area under the normal curve between 0 and z is 0.495 is z = 2.58.
The rejection region in each tail has area 0.5 − 0.495 = 0.005.

Critical values zα:

Level of significance   Two tailed   Left tailed   Right tailed
1%                      2.58         −2.33         2.33
5%                      1.96         −1.645        1.645
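The critical values can be checked in R with `qnorm()`, the standard normal quantile function. This is only a sketch; the data frame and its column names are chosen here for illustration.

```r
alpha <- c(0.01, 0.05)

critical <- data.frame(
  level        = c("1%", "5%"),
  two_tailed   = round(qnorm(1 - alpha / 2), 3),  # area alpha/2 in each tail
  left_tailed  = round(qnorm(alpha), 3),          # area alpha in the left tail
  right_tailed = round(qnorm(1 - alpha), 3)       # area alpha in the right tail
)
print(critical)
```

The tabulated values used in hand calculations are rounded versions of these quantiles.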


Test of hypothesis for large samples

How to make a decision?

Compare the test statistic z with the critical value zα at the given level of
significance (α).
1. If |z| < zα, the difference is not significant and we accept (fail to reject) the null
hypothesis.
2. If |z| > zα, the difference is significant and hence we reject the
null hypothesis.
For large samples, the following are the important z-tests of significance:
1. Testing the significance of a single mean
2. Testing the significance of a difference of means
3. Testing the significance of a single proportion
4. Testing the significance of a difference of proportions


Procedure for z-test for single mean

Aim: To test whether the difference between the sample mean and the population
mean is significant or not.
Null hypothesis: µ = µ0
Alternative hypothesis: µ ≠ µ0 or µ > µ0 or µ < µ0
Level of significance: choose either 1% or 5%
Test statistic: $z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$, evaluated at the hypothesized value µ = µ0
Conclusion: Compare the test statistic z with the critical value zα at the given
level of significance (α).
If |z| < zα, the difference is not significant and we accept (fail to reject) the null
hypothesis.
If |z| > zα, the difference is significant and hence we reject the null
hypothesis.


z-test

Example: A manufacturer of sports equipment has developed a new
synthetic fishing line that the company claims has a mean breaking
strength of 8 kilograms with a standard deviation of 0.5 kilogram. Test
the hypothesis that µ = 8 kilograms against the alternative µ ≠ 8
kilograms if a random sample of 50 lines is tested and found to have a
mean breaking strength of 7.8 kilograms. Use a 0.01 level of significance.
Example: A random sample of 100 recorded deaths in the United States
during the past year showed an average life span of 71.8 years. Assuming a
population standard deviation of 8.9 years, does this seem to indicate that
the mean life span today is greater than 70 years? Use a 0.05 level of
significance.
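The two examples above can be worked in R along the following lines. This is a sketch using the critical value approach; `z_stat` is a helper name introduced here for illustration, not a built-in R function.

```r
# z statistic for a single mean with known population standard deviation
z_stat <- function(xbar, mu0, sigma, n) {
  (xbar - mu0) / (sigma / sqrt(n))
}

# Example 1: fishing line, two-tailed test at alpha = 0.01
z1      <- z_stat(xbar = 7.8, mu0 = 8, sigma = 0.5, n = 50)
crit1   <- qnorm(1 - 0.01 / 2)   # two-tailed critical value
reject1 <- abs(z1) > crit1

# Example 2: life span, right-tailed test at alpha = 0.05
z2      <- z_stat(xbar = 71.8, mu0 = 70, sigma = 8.9, n = 100)
crit2   <- qnorm(1 - 0.05)       # right-tailed critical value
reject2 <- z2 > crit2

cat("Fishing line: z =", round(z1, 3), " reject H0:", reject1, "\n")
cat("Life span:    z =", round(z2, 3), " reject H0:", reject2, "\n")
```

In both cases the statistic falls in the critical region, so H0 is rejected at the stated level.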


P-value approach

P-value: The probability of obtaining a sample result at least as extreme as
the one observed in your data, assuming the null hypothesis is true.
Conclusion of the P-value approach:
If P ≤ α, reject H0.
If P > α, fail to reject H0.
Example: A random sample of 100 recorded deaths in the United States
during the past year showed an average life span of 71.8 years. Assuming a
population standard deviation of 8.9 years, does this seem to indicate that
the mean life span today is greater than 70 years? Use a 0.05 level of
significance by the P-value approach.
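A short R sketch of the P-value approach for this example, assuming the right-tailed alternative H1 : µ > 70 implied by the question.

```r
xbar  <- 71.8
mu0   <- 70
sigma <- 8.9
n     <- 100
alpha <- 0.05

z <- (xbar - mu0) / (sigma / sqrt(n))

# Right-tailed test: P-value = P(Z > z) under H0
p_value <- pnorm(z, lower.tail = FALSE)
reject  <- p_value <= alpha

cat("z =", round(z, 3), " P-value =", round(p_value, 4),
    " reject H0:", reject, "\n")
```

The P-value is about 0.02, which is below α = 0.05, so H0 is rejected; this agrees with the critical value approach.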


Student t-test

$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$, with ν = n − 1 degrees of freedom.
$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$ is the sample standard deviation.
Conclusion: Compare the test statistic t with the tabulated t value at the
given level of significance (α) with ν degrees of freedom.
If |t| < tα, the difference is not significant and we accept (fail to reject) the null
hypothesis.
If |t| > tα, the difference is significant and hence we reject the null
hypothesis.

Where to use the z-test or the t-test

1. If the sample size n is large (n ≥ 30): apply the z-test.
2. If the sample size n is small (n ≤ 29): apply the t-test.
3. If the population standard deviation σ is known: apply the z-test.
4. If the population standard deviation σ is not known: apply the t-test.
Example: The Edison Electric Institute has published figures on the
number of kilowatt hours used annually by various home appliances. It is
claimed that a vacuum cleaner uses an average of 46 kilowatt hours per
year. If a random sample of 12 homes included in a planned study
indicates that vacuum cleaners use an average of 42 kilowatt hours per
year with a standard deviation of 11.9 kilowatt hours, does this suggest at
the 0.05 level of significance that vacuum cleaners use, on average, less
than 46 kilowatt hours annually? Assume the population of kilowatt hours
to be normal.
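Because only summary statistics are given, the t statistic for this example can be computed directly rather than with `t.test()`, which needs the raw data. A sketch, assuming the left-tailed alternative H1 : µ < 46 implied by the question:

```r
xbar  <- 42
mu0   <- 46
s     <- 11.9    # sample standard deviation
n     <- 12
alpha <- 0.05

df     <- n - 1
t_stat <- (xbar - mu0) / (s / sqrt(n))

t_crit  <- qt(alpha, df)    # left-tailed critical value (negative)
p_value <- pt(t_stat, df)   # P(T < t_stat) under H0
reject  <- t_stat < t_crit

cat("t =", round(t_stat, 3), " critical value =", round(t_crit, 3),
    " P-value =", round(p_value, 3), " reject H0:", reject, "\n")
```

The statistic does not fall below the critical value, so H0 is not rejected at the 0.05 level.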

Two sample test

Typical uses: to compare customer satisfaction with two different products, or
to compare the weights of students in two different universities.
Tests concerning two means represent a set of very important analytical
tools for the scientist or engineer. Two independent random samples of
sizes n1 and n2, respectively, are drawn from two populations with means
µ1 and µ2 and variances σ1² and σ2². We know that the random variable

$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$

has a standard normal distribution.

Example: To compare customer satisfaction levels of two competing cable
television companies, 174 customers of company 1 and 355 customers of
company 2 were randomly selected and were asked to rate their cable
companies on a five-point scale, with 1 being least satisfied and 5 most
satisfied. The survey results are summarized in the following table.

Company 1      Company 2
n1 = 174       n2 = 355
x̄1 = 3.51      x̄2 = 3.24
σ1 = 0.51      σ2 = 0.52

Test at the 1% level of significance whether the data provide sufficient
evidence to conclude that Company 1 has a higher mean satisfaction
rating than does Company 2. Use the critical value approach.
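A sketch of the critical value approach for this example in R, taking H0 : µ1 − µ2 = 0 against the right-tailed alternative H1 : µ1 − µ2 > 0 and treating the tabulated standard deviations as known, as the table's σ notation suggests.

```r
n1 <- 174; xbar1 <- 3.51; sigma1 <- 0.51
n2 <- 355; xbar2 <- 3.24; sigma2 <- 0.52
alpha <- 0.01

# Standard error of the difference of the two sample means
se <- sqrt(sigma1^2 / n1 + sigma2^2 / n2)

# Under H0 the hypothesized difference mu1 - mu2 is 0
z      <- (xbar1 - xbar2) / se
z_crit <- qnorm(1 - alpha)   # right-tailed critical value
reject <- z > z_crit

cat("z =", round(z, 3), " critical value =", round(z_crit, 3),
    " reject H0:", reject, "\n")
```

The statistic is far beyond the critical value, so the data support a higher mean rating for Company 1.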

Example: An experiment was performed to compare the abrasive wear of
two different laminated materials. Twelve pieces of material 1 were tested
by exposing each piece to a machine measuring wear. Ten pieces of
material 2 were similarly tested. In each case, the depth of wear was
observed. The samples of material 1 gave an average (coded) wear of 85
units with a sample standard deviation of 4, while the samples of material
2 gave an average of 81 with a sample standard deviation of 5. Can we
conclude at the 0.05 level of significance that the abrasive wear of material
1 exceeds that of material 2 by more than 2 units? Assume the
populations to be approximately normal with equal variances.
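For small samples with equal but unknown variances, the usual tool is the pooled-variance two-sample t-test. The pooled formula is not written out in these notes, so the following R sketch should be read as the standard textbook form rather than the lecturer's own derivation. It tests H0 : µ1 − µ2 = 2 against H1 : µ1 − µ2 > 2.

```r
n1 <- 12; xbar1 <- 85; s1 <- 4
n2 <- 10; xbar2 <- 81; s2 <- 5
d0    <- 2      # hypothesized difference mu1 - mu2 under H0
alpha <- 0.05

df <- n1 + n2 - 2

# Pooled standard deviation (assumes equal population variances)
sp <- sqrt(((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / df)

t_stat <- ((xbar1 - xbar2) - d0) / (sp * sqrt(1 / n1 + 1 / n2))
t_crit <- qt(1 - alpha, df)   # right-tailed critical value
reject <- t_stat > t_crit

cat("t =", round(t_stat, 3), " critical value =", round(t_crit, 3),
    " reject H0:", reject, "\n")
```

The statistic is below the critical value, so there is not enough evidence that the wear of material 1 exceeds that of material 2 by more than 2 units.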

Example: Who earns more, married or unmarried people? Use the data in the
following table.

      Married    Unmarried
X̄     $639.60    $658.20
σ     $60        $90
n     40         60

Test at the 5% level of significance by using the critical value approach.
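A sketch of this test in R. The example does not say which direction is being tested, so a two-tailed alternative H1 : µ1 ≠ µ2 is assumed here.

```r
xbar1 <- 639.60; sigma1 <- 60; n1 <- 40   # married
xbar2 <- 658.20; sigma2 <- 90; n2 <- 60   # unmarried
alpha <- 0.05

se <- sqrt(sigma1^2 / n1 + sigma2^2 / n2)
z  <- (xbar1 - xbar2) / se

z_crit <- qnorm(1 - alpha / 2)   # two-tailed critical value (assumed alternative)
reject <- abs(z) > z_crit

cat("z =", round(z, 3), " critical value =", round(z_crit, 3),
    " reject H0:", reject, "\n")
```

With |z| below the critical value, the observed difference in mean earnings is not significant at the 5% level.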


Test on a single Proportion

Politicians are certainly interested in knowing what fraction of the voters
will favor them in the next election. All manufacturing firms are concerned
about the proportion of defective items when a shipment is made.
H0 : p = p0
H1 : p < p0. Use the binomial distribution to compute the P-value
P = P(X ≤ x when p = p0).
H0 : p = p0
H1 : p > p0. Use the binomial distribution to compute the P-value
P = P(X ≥ x when p = p0).
H0 : p = p0
H1 : p ≠ p0. Use the binomial distribution to compute the P-value
P = 2P(X ≤ x when p = p0) if x < np0,
P = 2P(X ≥ x when p = p0) if x > np0.

Decision: Reject H0 in favor of H1 if the computed P-value is less than or
equal to α.
Procedure:
1. State the null hypothesis H0 : p = p0.
2. Choose one of the alternatives H1 : p < p0, p > p0, or p ≠ p0.
3. Choose a level of significance equal to α.
4. Test statistic: binomial variable X with p = p0.
5. Computations: Find x, the number of successes, and compute the
appropriate P-value.
6. Decision: Draw appropriate conclusions based on the P-value.

Example: A builder claims that heat pumps are installed in 70% of all
homes being constructed today in the city of Vijayawada, AP. Would you
agree with this claim if a random survey of new homes in this city showed
that 8 out of 15 had heat pumps installed? Use a 0.10 level of significance.
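Since n = 15 is small, the exact binomial P-value applies. The R sketch below uses the two-sided doubling rule for H1 : p ≠ p0; it calls `pbinom()` directly because `binom.test()` defines its two-sided P-value differently.

```r
n     <- 15
x     <- 8
p0    <- 0.70
alpha <- 0.10

# Two-sided P-value by doubling the tail on the side where x lies
if (x < n * p0) {
  p_value <- 2 * pbinom(x, size = n, prob = p0)          # 2 * P(X <= x)
} else {
  p_value <- 2 * pbinom(x - 1, size = n, prob = p0,
                        lower.tail = FALSE)              # 2 * P(X >= x)
}
p_value <- min(1, p_value)   # a probability cannot exceed 1
reject  <- p_value <= alpha

cat("P-value =", round(p_value, 4), " reject H0:", reject, "\n")
```

The P-value is well above 0.10, so there is no strong reason to doubt the builder's claim.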
Note:
1. If n is small, use the binomial distribution.
2. If n is very large and p approaches 0, use the Poisson distribution
(µ = np0).
3. If n is large, use the normal approximation
$z = \frac{x - np_0}{\sqrt{np_0 q_0}} = \frac{\hat{p} - p_0}{\sqrt{p_0 q_0/n}}$, where $\hat{p} = x/n$ and $q_0 = 1 - p_0$.

Example: A commonly prescribed drug for relieving nervous tension is
believed to be only 60% effective. Experimental results with a new drug
administered to a random sample of 100 adults who were suffering from
nervous tension show that 70 received relief. Is this sufficient evidence to
conclude that the new drug is superior to the one commonly prescribed?
Use a 0.05 level of significance.
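With n = 100 the normal approximation is reasonable. A sketch in R, assuming the right-tailed alternative H1 : p > 0.6 implied by "superior":

```r
n     <- 100
x     <- 70
p0    <- 0.60
q0    <- 1 - p0
alpha <- 0.05

# Normal approximation to the binomial under H0
z <- (x - n * p0) / sqrt(n * p0 * q0)

p_value <- pnorm(z, lower.tail = FALSE)   # P(Z > z)
reject  <- p_value <= alpha

cat("z =", round(z, 3), " P-value =", round(p_value, 4),
    " reject H0:", reject, "\n")
```

The P-value is about 0.02, so at the 0.05 level the data support the new drug being superior.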


Test on two proportions

This test concerns the difference between two proportions, for example the
difference between the proportion of smokers with lung cancer and the
proportion of nonsmokers with lung cancer.
Let p1 and p2 be the two population proportions, n1 and n2 the sample sizes, and
x1 and x2 the numbers of affected people in the two samples, respectively.
$\hat{p}_1 = \frac{x_1}{n_1}$ and $\hat{p}_2 = \frac{x_2}{n_2}$
The common (pooled) proportion is $P = \frac{\hat{p}_1 n_1 + \hat{p}_2 n_2}{n_1 + n_2} = \frac{x_1 + x_2}{n_1 + n_2}$
Test statistic: $z = \frac{\hat{p}_1 - \hat{p}_2 - (p_1 - p_2)}{\sqrt{P(1 - P)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$
Example: Dell company manufactures laptops. For quality control, two
sets of laptops were tested. In the first group, 32 out of 800 were found to
contain some sort of defect. In the second group, 30 out of 500 were
found to have a defect. Is the difference between the two groups
significant at the 0.05 level of significance?
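A sketch of the laptop example in R, using the pooled proportion and a two-tailed alternative H1 : p1 ≠ p2 (the question asks only whether the groups differ).

```r
x1 <- 32; n1 <- 800   # first group: defects, sample size
x2 <- 30; n2 <- 500   # second group: defects, sample size
alpha <- 0.05

p1_hat <- x1 / n1
p2_hat <- x2 / n2
P      <- (x1 + x2) / (n1 + n2)   # common (pooled) proportion

# Under H0 the hypothesized difference p1 - p2 is 0
z <- (p1_hat - p2_hat) / sqrt(P * (1 - P) * (1 / n1 + 1 / n2))

z_crit <- qnorm(1 - alpha / 2)    # two-tailed critical value
reject <- abs(z) > z_crit

cat("z =", round(z, 3), " critical value =", round(z_crit, 3),
    " reject H0:", reject, "\n")
```

Here |z| is below 1.96, so the difference in defect rates is not significant at the 0.05 level.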

Example: You are testing two flu drugs A and B. Drug A works on 41
people out of a sample of 195. Drug B works on 351 people in a sample of
605. Are the two drugs comparable? Use a 5% level of significance.
Example: A candidate for election made a speech in city A but not in city B.
A sample of 500 voters from city A showed that 59.6% favored him,
whereas a sample of 300 voters from city B showed that 50% of the voters
favored him. Discuss whether the speech has produced any effect on voters
in city A. Use a 5% level of significance.
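Both examples above can be worked with one small helper. `two_prop_z` is a name introduced here for illustration. The choice of alternatives is also an assumption: two-tailed for the flu drugs, and right-tailed for the election example (support in city A higher than in city B).

```r
# Pooled two-proportion z statistic for H0: p1 = p2
two_prop_z <- function(x1, n1, x2, n2) {
  P <- (x1 + x2) / (n1 + n2)   # common (pooled) proportion
  (x1 / n1 - x2 / n2) / sqrt(P * (1 - P) * (1 / n1 + 1 / n2))
}

alpha <- 0.05

# Flu drugs: two-tailed test
z_flu      <- two_prop_z(x1 = 41, n1 = 195, x2 = 351, n2 = 605)
reject_flu <- abs(z_flu) > qnorm(1 - alpha / 2)

# Election: 59.6% of 500 is 298 voters and 50% of 300 is 150 voters;
# right-tailed test (assumed alternative: p_A > p_B)
z_vote      <- two_prop_z(x1 = 298, n1 = 500, x2 = 150, n2 = 300)
reject_vote <- z_vote > qnorm(1 - alpha)

cat("Flu drugs: z =", round(z_flu, 3), " reject H0:", reject_flu, "\n")
cat("Election:  z =", round(z_vote, 3), " reject H0:", reject_vote, "\n")
```

In both cases H0 is rejected at the 5% level: the two drugs differ in effectiveness, and support in city A is significantly higher than in city B.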
