
Inferential Statistics: Sampling Distributions & Hypothesis Testing
● Population ≡ all possible values
● Sample ≡ a portion of the population
● Statistical inference ≡ generalizing from a sample to a population with a calculated degree of certainty
● Two forms of statistical inference
– Hypothesis testing
– Estimation
● Parameter ≡ a characteristic of the population, e.g., the population mean μ
● Statistic ≡ a value calculated from data in the sample, e.g., the sample mean x̄
Hypothesis Testing
● Hypothesis testing is an essential procedure in statistics.
● A hypothesis test evaluates two mutually exclusive statements
about a population to determine which statement is best
supported by the sample data.
Hypothesis Testing
• Also called significance testing
• Tests a claim about a parameter using evidence (data in a
sample)
• The procedure is broken into four steps
• Each element of the procedure must be understood
Hypothesis Testing Steps
A. Null and alternative hypotheses
B. Test statistic
C. P-value and interpretation
D. Significance level (optional)
Illustrative Example: “Body Weight”

The problem: In the 1970s, 20–29 year old men in the U.S. had a mean body weight μ of 170 pounds; the standard deviation σ was 40 pounds. We test whether mean body weight in the population now differs.

Null hypothesis H0: μ = 170 (“no difference”)

The alternative hypothesis can be either Ha: μ > 170 (one-sided test) or Ha: μ ≠ 170 (two-sided test).
Test Statistic
This is an example of a one-sample test of a
mean when σ is known. Use this statistic to test
the problem:
zstat = (x̄ − μ0) / SE(x̄)

where μ0 ≡ the population mean assuming H0 is true,

and SE(x̄) = σ / √n
Illustrative Example: z statistic

For the illustrative example, μ0 = 170

We know σ = 40

Take an SRS of n = 64. Therefore
SE(x̄) = σ / √n = 40 / √64 = 5

If we found a sample mean of 173, then

zstat = (x̄ − μ0) / SE(x̄) = (173 − 170) / 5 = 0.60
Illustrative Example: z statistic
If we found a sample mean of 185, then

zstat = (x̄ − μ0) / SE(x̄) = (185 − 170) / 5 = 3.00

[Figure: sampling distribution of x̄ under H0: μ = 170 for n = 64; x̄ ~ N(170, 5)]
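These calculations are easy to reproduce in software. A minimal Python sketch, using only the values given above:

from math import sqrt

mu0, sigma, n = 170, 40, 64      # null mean, known sigma, sample size
se = sigma / sqrt(n)             # SE(x-bar) = 40 / 8 = 5

for xbar in (173, 185):
    z = (xbar - mu0) / se
    print(f"x-bar = {xbar}: zstat = {z:.2f}")   # 0.60 and 3.00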
P-value

The P-value answers the question: What is the probability of the observed test statistic, or one more extreme, when H0 is true?

This corresponds to the area under the curve (AUC) in the tail of the Standard Normal distribution beyond the zstat.

Converting z statistics to P-values:
For Ha: μ > μ0 → P = Pr(Z > zstat) = right tail beyond zstat
For Ha: μ < μ0 → P = Pr(Z < zstat) = left tail beyond zstat
For Ha: μ ≠ μ0 → P = 2 × one-tailed P-value

Use Table B or software to find these probabilities (illustrated below).
[Figure: one-sided P-value for zstat of 0.60 (right-tail area = 0.2743)]
[Figure: one-sided P-value for zstat of 3.00 (right-tail area = 0.0013)]
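Software can stand in for Table B here. A minimal sketch using scipy.stats, where norm.sf() gives the right-tail area beyond z:

from scipy.stats import norm

for z in (0.60, 3.00):
    p_one = norm.sf(z)      # right-tail area, Pr(Z > z)
    p_two = 2 * p_one       # doubled for a two-sided Ha
    print(f"z = {z:.2f}: one-sided P = {p_one:.4f}, two-sided P = {p_two:.4f}")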
Two-Sided P-Value

One-sided Ha → AUC in tail beyond zstat

Two-sided Ha → consider potential deviations in both directions → double the one-sided P-value

Examples: If one-sided P = 0.0010, then two-sided P = 2 × 0.0010 = 0.0020. If one-sided P = 0.2743, then two-sided P = 2 × 0.2743 = 0.5486.
Interpretation

The P-value answers the question: What is the probability of the observed test statistic … when H0 is true?

Thus, smaller and smaller P-values provide stronger and stronger evidence against H0.

Small P-value → strong evidence.
Interpretation
Conventions*
P > 0.10 → non-significant evidence against H0
0.05 < P ≤ 0.10 → marginally significant evidence against H0
0.01 < P ≤ 0.05 → significant evidence against H0
P ≤ 0.01 → highly significant evidence against H0

Examples
P = 0.27 → non-significant evidence against H0
P = 0.01 → highly significant evidence against H0

* It is unwise to draw firm borders for “significance”
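The conventions are easy to encode for quick reference. The helper below is purely illustrative (the function name is invented), and its borders are as soft as the note above warns:

def evidence_against_h0(p):
    """Verbal label for a P-value, per the conventions above."""
    if p > 0.10:
        return "non-significant"
    if p > 0.05:
        return "marginally significant"
    if p > 0.01:
        return "significant"
    return "highly significant"

print(evidence_against_h0(0.27))   # non-significant
print(evidence_against_h0(0.01))   # highly significant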


Example

● You have a coin and you don't know whether it is fair or tricky, so let's set up the null and alternative hypotheses.
● H0: the coin is a fair coin.
● H1: the coin is a tricky coin. Set α = 5%, or 0.05.
● Now let's toss the coin and calculate the P-value (probability value).
● Toss the coin a 1st time; the result is tails. P-value = 50% (heads and tails have equal probability under H0).
● Toss the coin a 2nd time; the result is tails again. Now P-value = 50/2 = 25%.
● Similarly, we toss 6 consecutive times and get tails every time, giving P-value = (1/2)^6 ≈ 1.5%. Our significance level is 5% (the error rate we allow), and 1.5% falls below it, so the null hypothesis does not hold: we reject H0 and propose that the coin is a tricky coin.
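The same P-value can be obtained from an exact binomial test. A sketch using scipy.stats.binomtest (available in SciPy 1.7+), counting tails as successes:

from scipy.stats import binomtest

# 6 tails in 6 tosses; under H0 the coin is fair (p = 0.5)
result = binomtest(k=6, n=6, p=0.5, alternative="greater")
print(result.pvalue)           # 0.5**6 = 0.015625, about 1.5%
print(result.pvalue < 0.05)    # True -> reject H0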
Two types of decision errors
Type I error = erroneous rejection of a true H0
Type II error = erroneous retention of a false H0
α ≡ probability of a Type I error
β ≡ probability of a Type II error

                  Truth
Decision      H0 true              H0 false
Retain H0     Correct retention    Type II error
Reject H0     Type I error         Correct rejection
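α can also be checked by simulation: draw many samples from a population where H0 is true and count how often a two-sided test rejects at the 0.05 level. A sketch reusing the body-weight setup (μ0 = 170, σ = 40, n = 64):

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(42)
mu0, sigma, n, trials = 170, 40, 64, 10_000

# sample means drawn from a population where H0 is true
xbar = rng.normal(mu0, sigma, size=(trials, n)).mean(axis=1)
z = (xbar - mu0) / (sigma / np.sqrt(n))
p_two = 2 * norm.sf(np.abs(z))   # two-sided P-values

print((p_two < 0.05).mean())     # close to 0.05, i.e., alpha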
Maximizing and Minimizing Algebraic Equations in Python
Linear Programming
● Linear programming is a set of techniques used in mathematical
programming, sometimes called mathematical optimization, to
solve systems of linear equations and inequalities while
maximizing or minimizing some linear function.
● It’s important in fields like scientific computing, economics,
technical sciences, manufacturing, transportation, military,
management, energy, and so on.
Linear Programming Problem
● As a concrete example (the one solved with SciPy below): maximize z = x + 2y subject to 2x + y ≤ 20, −4x + 5y ≤ 10, −x + 2y ≥ −2, with x ≥ 0 and y ≥ 0.
● The independent variables you need to find—in this case x and y—are called the decision variables.
● The function of the decision variables to be maximized or minimized—in this case z—is called the objective function, the cost function, or just the goal.
● The inequalities you need to satisfy are called the inequality constraints. You can also have equations among the constraints, called equality constraints.
Solving optimization problems with SciPy
● from scipy.optimize import linprog
● linprog() solves only minimization (not maximization) problems and doesn't allow inequality constraints with the greater-than-or-equal-to sign (≥).
● To work around these issues, you need to modify your problem before starting optimization:
– Instead of maximizing z = x + 2y, you can minimize its negative (−z = −x − 2y).
– Instead of having the greater-than-or-equal-to sign, you can multiply the third inequality (−x + 2y ≥ −2) by −1 and get the opposite less-than-or-equal-to sign (≤): x − 2y ≤ 2.
Solving optimization problems with SciPy

from scipy.optimize import linprog

obj = [-1, -2]
lhs_ineq = [[2, 1], [-4, 5], [1, -2]]          # constraint left side
rhs_ineq = [20, 10, 2]                         # constraint right side
bnd = [(0, float("inf")), (0, float("inf"))]   # bounds of x and y

opt = linprog(c=obj, A_ub=lhs_ineq, b_ub=rhs_ineq,
              bounds=bnd, method="revised simplex")
opt

Output:
     con: array([], dtype=float64)
     fun: -20.714285714285715
 message: 'Optimization terminated successfully.'
     nit: 2
   slack: array([0. , 0. , 9.85714286])
  status: 0
 success: True
       x: array([6.42857143, 7.14285714])

.con is the equality constraints residuals.
.fun is the objective function value at the optimum (if found).
.message is the status of the solution.
.nit is the number of iterations needed to finish the calculation.
.slack is the values of the slack variables, or the differences between the values of the left and right sides of the constraints.
.status is an integer between 0 and 4 that shows the status of the solution, such as 0 for when the optimal solution has been found.
.success is a Boolean that shows whether the optimal solution has been found.
.x is a NumPy array holding the optimal values of the decision variables.
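As a sanity check, .fun is the minimized −z, so the maximized objective is its negative:

x, y = opt.x
print(x + 2 * y)   # 20.714..., equal to -opt.fun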
