A statistical population is the aggregate of all the units
pertaining to a study, i.e. it is the set of all elements
about which we wish to make inferences.
A sample is a subset of a population.
The process of drawing a sample from a large
population is called sampling.
STATISTIC: Characteristic or measure obtained from a
sample.
PARAMETER: Characteristic or measure obtained
from a population.
A sampling distribution is the probability distribution,
under repeated sampling of the population, of a given
statistic.
• Consider a very large population.
• Assume we repeatedly take samples of a given size from the
  population and calculate the sample mean for each sample.
• Different samples will lead to different sample means.
• The distribution of these means is the "sampling
  distribution of the sample mean".
When all of the possible sample means are computed, the
following properties hold:
• The mean of the sample means equals the mean of the
  population (μ).
• The variance of the sample means equals the variance of the
  population divided by the sample size (σ²/n).
• The standard deviation of the distribution of a sample
  statistic is known as the standard error.
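The properties above can be checked by brute force on a tiny population. The sketch below is only an illustration: the four-value population and the sample size n = 2 are made-up assumptions, and samples are drawn with replacement so that every ordered sample is equally likely.

```python
from itertools import product
from math import isclose, sqrt
from statistics import mean, pvariance

# Hypothetical tiny population (an assumption for illustration only).
population = [2, 4, 6, 8]
n = 2  # sample size

# Enumerate every ordered sample of size n drawn with replacement
# and record the mean of each one.
sample_means = [mean(sample) for sample in product(population, repeat=n)]

mu = mean(population)            # population mean
sigma2 = pvariance(population)   # population variance

mean_of_means = mean(sample_means)      # should equal mu
var_of_means = pvariance(sample_means)  # should equal sigma2 / n
standard_error = sqrt(var_of_means)     # std. deviation of the sample mean

print(mean_of_means, var_of_means, standard_error)
```

With a real, very large population the full enumeration is not feasible, so the same check would be done by repeated random sampling instead.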

The nature of the sampling distribution depends on the
distribution of the population and/or the statistic being
considered and the sample size used.
Testing of Hypothesis
A hypothesis is an assumption about a population.
A few examples are as follows:
1. Mean purchases made by females (μ1) are more than or
equal to the mean purchases made by males (μ2) in a
textile store (μ1 ≥ μ2).
2. Mean age of female shoppers (μ1) is less than or equal
to that of male shoppers (μ2) in a book exhibition
(μ1 ≤ μ2).
3. Mean monthly income of buyers (μ) in a shop is more
than or equal to Rs 10,000/- (μ ≥ 10000).
4. The mean stay-over time of customers (μ) in a shop is
at most 45 minutes (μ ≤ 45).
Definitions
Parameter: It is a function of population values.
Statistic: It is a function of sample values.
Null hypothesis: It is an assumption about the population
parameter which is a statement of no change. It is denoted
by H0.
Alternative hypothesis: It is the statement of assumption
which can be considered to be the alternative to the null
hypothesis. It is denoted by H1.
As long as there is no apparent contradiction to
the null hypothesis, we retain this belief. But,
when we find observations contradicting it, there
is a reason to suspect the validity of this null
hypothesis and the problem of testing the null
hypothesis arises.
When we proceed to test H0, we must be aware of the
assumption that is expected to be valid if the null
hypothesis turns out to be invalid. This assumption is
known as the alternative hypothesis.
H0: The mean I.Q. of all persons in a city is 105
• H1: The mean I.Q. of all persons in the city is 100
  (if it is known that the mean I.Q. is 105 or 100 and
  nothing else)
OR
• H1: The mean I.Q. of all the persons in the city is less
  than 105
  (if it is known that the mean I.Q. is not more than 105)
OR
• H1: The mean I.Q. of all the persons in the city is more
  than 105
  (if it is known that the mean I.Q. is not less than 105)
OR
• H1: The mean I.Q. of all the persons is not equal to 105
  (if no such information is available)
The first thing to do when given a claim is to
write the claim mathematically (if possible), and
decide whether the given claim is the null or
alternative hypothesis.

If the given claim contains equality, or a statement of no
change from the given or accepted condition, then it is the
null hypothesis; otherwise, if it represents change, it is
the alternative hypothesis.
Example
A crew member is dead, said Dr. X to Captain K.
Mr. S, as the science officer, is put in charge of
statistically determining the correctness of X's
statement and deciding the fate of the crew member
(to vaporize or try to revive).
His first step is to arrive at the hypothesis to be
tested.
Does the statement represent a change from the previous
condition?
Yes, there is change, thus it is the alternative
hypothesis, H1
No, there is no change, therefore it is the null
hypothesis, H0
The correct answer is that there is change.
Dead represents a change from the accepted state of alive.
The null hypothesis always represents no change.
Therefore, the hypotheses are:
• H0: Patient is alive.
• H1: Patient is not alive (dead).

Steps in Testing of Hypothesis
1. Set up the hypotheses: Set up a null hypothesis based
on the belief and an appropriate alternate hypothesis.

2. Set up a suitable level of significance: The confidence
with which a null hypothesis is rejected or accepted depends
upon the significance level used for the purpose.
A level of significance of, say, 5% means that the risk of
wrongly rejecting a true null hypothesis is 5 out of 100
cases. The levels of significance most widely used are 5%
and 1%. A 1% level of significance provides greater
confidence in the decision than a 5% level, as the risk of
making this wrong decision is only 1 out of 100 cases. The
level of significance is denoted by the Greek letter alpha
(α), where (1 − α) is the CONFIDENCE LEVEL.
3. Select the test criterion: The test criterion is selected
on the basis of sample size. If the sample is large (n ≥ 30),
the z-test, implying the normal distribution, is used;
whereas if the sample size is small (n < 30), the t-test
is more suitable. The most commonly used tests are z,
t, F and χ².
A corresponding TEST STATISTIC is calculated.
4. Make the decision: The test statistic calculated in
the previous step is now classified as falling within the
acceptance region or the rejection region at the given
level of significance. Accordingly, the null hypothesis
is accepted or rejected.
5. State the conclusion: On the basis of the decision, the
conclusion is stated.
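The five steps above can be sketched for one simple case: an upper-tailed z-test of H0: μ ≤ 45 against H1: μ > 45, in the spirit of the stay-over time example. The function name `one_sample_z_test` and all the numbers (sample mean 47, known σ = 6, n = 36) are hypothetical values chosen for illustration, and the population standard deviation is assumed to be known.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Upper-tailed z-test of H0: mu <= mu0 against H1: mu > mu0.

    Assumes a large sample (n >= 30) and a known population
    standard deviation sigma.
    """
    # Step 1 (hypotheses) and step 2 (alpha) are fixed by the arguments.
    # Step 3: compute the test statistic.
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    # Step 4: find the critical value and classify z into the
    # rejection region (z > z_crit) or the acceptance region.
    z_crit = NormalDist().inv_cdf(1 - alpha)
    reject = z > z_crit
    # Step 5: state the conclusion.
    conclusion = "reject H0" if reject else "fail to reject H0"
    return z, z_crit, conclusion


# Hypothetical data: 36 customers with a mean stay-over of 47 minutes.
z, z_crit, conclusion = one_sample_z_test(
    sample_mean=47.0, mu0=45.0, sigma=6.0, n=36, alpha=0.05
)
print(round(z, 3), round(z_crit, 3), conclusion)
```

For a small sample with unknown σ, the same structure would apply with a t statistic and a t critical value in place of the normal ones.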
Errors in Testing of Hypothesis
• The problem of testing a hypothesis is actually a problem
  of deciding whether to accept or to reject the null
  hypothesis H0, in favor of the alternate hypothesis H1.
• The decision to reject or accept the null hypothesis is
  taken on the basis of observations made only on a sample
  of units selected from the population. This decision
  cannot always be correct. When this decision is not
  correct, an error is said to occur.
States of nature are something that you, as a decision
maker, have no control over.
Either it is, or it isn't. This represents the true
nature of things.

Possible states of nature:
• Crew member is alive (H0 true / H1 false)
• Crew member is dead (H0 false / H1 true)
Decisions are something that you have control
over.
You may make a correct decision or an incorrect
decision.
It depends on the state of nature as to whether
your decision is correct or incorrect.
Possible decisions / conclusions:
• Reject H0 if sufficient evidence to say the patient is
  dead is available
• Fail to reject H0 if sufficient evidence to say the
  patient is dead is not available
Statistically speaking:

                       State of nature
Decision               H0 true             H0 false
Reject H0              Type I error        Correct decision
Fail to reject H0      Correct decision    Type II error
The following table gives the possibilities that exist in
reality.

                       Null Hypothesis H0 is
Decision               True                False
Reject H0              Type I Error        No Error
Accept H0              No Error            Type II Error
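The table above can be read as a lookup from (state of nature, decision) to outcome. A minimal sketch follows; the function name `classify_outcome` is a hypothetical helper, not part of any library.

```python
def classify_outcome(h0_is_true: bool, reject_h0: bool) -> str:
    """Return the cell of the error table for one test outcome."""
    if reject_h0 and h0_is_true:
        # A true null hypothesis was rejected.
        return "Type I Error"
    if not reject_h0 and not h0_is_true:
        # A false null hypothesis was accepted.
        return "Type II Error"
    # Either a true H0 was accepted or a false H0 was rejected.
    return "No Error"


# In the crew member example: H0 is "patient is alive".
print(classify_outcome(h0_is_true=True, reject_h0=True))    # alive, declared dead
print(classify_outcome(h0_is_true=False, reject_h0=False))  # dead, not declared dead
```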


To design a good test we would like to arrive at a
decision criterion such that neither of the two errors
(Type I Error and Type II Error) occurs.

But, for a fixed sample size, when P(Type I Error) → 0,
P(Type II Error) → 1, and when P(Type II Error) → 0,
P(Type I Error) → 1.

Hence, no test can be perfect. We therefore design a
test such that one of the two probabilities is restricted
to a small value α (0 < α < 1 and α is close to 0) and
then minimize the probability of the other error.
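The trade-off between the two error probabilities can be made concrete for an upper-tailed z-test. The sketch below assumes a known σ and a specific true mean under H1; the function name `type_ii_error` and the values μ0 = 45, μ1 = 47, σ = 6, n = 36 are hypothetical choices for illustration.

```python
from math import sqrt
from statistics import NormalDist


def type_ii_error(alpha, mu0, mu1, sigma, n):
    """P(fail to reject H0 | true mean is mu1) for an upper-tailed
    z-test of H0: mu <= mu0 with known sigma."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)
    # How far the true mean lies from mu0, in standard-error units.
    shift = (mu1 - mu0) / (sigma / sqrt(n))
    # H0 is not rejected when the z statistic stays below z_crit.
    return std_normal.cdf(z_crit - shift)


alphas = [0.10, 0.05, 0.01, 0.001]
betas = [type_ii_error(a, mu0=45.0, mu1=47.0, sigma=6.0, n=36) for a in alphas]

for a, b in zip(alphas, betas):
    print(f"alpha = {a:<6} beta = {b:.3f}")
```

As α is pushed toward 0 with the sample size held fixed, the computed P(Type II Error) rises, which is why only one of the two probabilities is capped in practice.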
Rejecting H0 when it is true (Type I Error) is regarded
as a more serious error than Type II Error. Therefore, an
upper limit is put on P(Type I Error), and P(Type II
Error) is minimized subject to that limit. This upper
limit is known as the level of significance.
Thus, a test is so designed that
P(Type I Error) ≤ α;
then α is called the level of significance.
Hence, α = Max. P(Type I Error).
If the p-value of the test statistic is less than the
level of significance α, reject H0.
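A minimal sketch of this decision rule, assuming an upper-tailed z-test so that the p-value is the standard normal tail area beyond the observed statistic. The helper names `upper_tail_p_value` and `decide`, and the observed value z = 2.0, are hypothetical.

```python
from statistics import NormalDist


def upper_tail_p_value(z):
    """P(Z >= z) under the standard normal distribution."""
    return 1 - NormalDist().cdf(z)


def decide(p_value, alpha=0.05):
    """Apply the rule: reject H0 when the p-value is below alpha."""
    return "reject H0" if p_value < alpha else "fail to reject H0"


p = upper_tail_p_value(2.0)  # hypothetical observed z statistic
print(round(p, 4), decide(p, alpha=0.05), decide(p, alpha=0.01))
```

The same observed statistic can lead to rejection at the 5% level but not at the 1% level, so the level of significance has to be fixed before looking at the data.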

  



In order to test different parameters, for different
sample sizes, and to compare such parameters across
multiple populations, different statistical distributions
are used.
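For testing association between two variables:
χ² test of independence

\[ \chi^2 = \sum \frac{(O - E)^2}{E} \]

Expected frequencies are calculated using the
following formula:

\[ E = \frac{\text{Row total} \times \text{Column total}}{\text{Grand total}} \]

O = Observed frequencies
E = Expected frequencies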
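A short sketch of the calculation for a 2 × 2 table of observed frequencies. The observed counts are made up for illustration, and the function names are hypothetical helpers written in plain Python rather than calls into a statistics library.

```python
def expected_frequencies(observed):
    """E = (row total * column total) / grand total for every cell."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all cells of the table."""
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Hypothetical observed counts (rows and columns are two categorical variables).
observed = [[30, 20],
            [20, 30]]
expected = expected_frequencies(observed)
chi2 = chi_square_statistic(observed, expected)
degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)

# 3.841 is the tabulated 5% critical value of chi-square with 1 d.f.
critical_value = 3.841
print(chi2, degrees_of_freedom, chi2 > critical_value)
```

When the statistic exceeds the critical value, the hypothesis of no association between the two variables is rejected at that level of significance.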
χ² test of goodness of fit
Expected frequencies are calculated depending upon the
distribution.
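As an illustration of how the expected frequencies follow from the assumed distribution, the sketch below tests observed die rolls against a uniform distribution, where every face is expected equally often. The counts and the function names are hypothetical.

```python
def expected_uniform(observed):
    """Expected frequencies when every category is equally likely."""
    total = sum(observed)
    k = len(observed)
    return [total / k] * k


def chi_square_gof(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts of faces 1..6 in 60 rolls of a die.
observed = [8, 12, 9, 11, 6, 14]
expected = expected_uniform(observed)
chi2 = chi_square_gof(observed, expected)
degrees_of_freedom = len(observed) - 1

# 11.070 is the tabulated 5% critical value of chi-square with 5 d.f.
critical_value = 11.070
print(round(chi2, 3), degrees_of_freedom, chi2 > critical_value)
```

For another hypothesized distribution (binomial, Poisson, normal), only `expected_uniform` would change; the statistic is computed the same way, and the degrees of freedom are reduced further by one for each parameter estimated from the data.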