Sampling Distributions
& Hypothesis Testing
● Population: all possible values
● Sample: a portion of the population
● Statistical inference: generalizing from a sample to a population with a calculated degree of certainty
● Two forms of statistical inference
– Hypothesis testing
– Estimation
● Parameter: a characteristic of the population, e.g., the population mean µ
● Statistic: a value calculated from the data in the sample, e.g., the sample mean x̄
Hypothesis Testing
● Hypothesis testing is an essential procedure in statistics.
● A hypothesis test evaluates two mutually exclusive statements
about a population to determine which statement is best
supported by the sample data.
Hypothesis Testing
• Is also called significance testing
• Tests a claim about a parameter using evidence (data in a
sample)
• The procedure is broken into four steps
• Each element of the procedure must be understood
Hypothesis Testing Steps
A. Null and alternative hypotheses
B. Test statistic
C. P-value and interpretation
D. Significance level (optional)
Illustrative Example: “Body Weight”
● The problem: In the 1970s, 20–29-year-old men in the U.S. had a mean body weight μ of 170 pounds, with standard deviation σ of 40 pounds. We test whether mean body weight in the population now differs.
● Null hypothesis H0: μ = 170 (“no difference”)
● The alternative hypothesis can be either Ha: μ > 170 (one-sided test) or Ha: μ ≠ 170 (two-sided test)
Test Statistic
This is an example of a one-sample test of a mean when σ is known. Use this statistic to test the problem:

z_stat = (x̄ − μ0) / SE_x̄

where μ0 = the population mean assuming H0 is true, and SE_x̄ = σ / √n
Illustrative Example: z statistic
● For the illustrative example, μ0 = 170
● We know σ = 40
● Take an SRS of n = 64. Therefore SE_x̄ = σ / √n = 40 / √64 = 5
● If we found a sample mean of 173, then
z_stat = (x̄ − μ0) / SE_x̄ = (173 − 170) / 5 = 0.60
Illustrative Example: z statistic
If we found a sample mean of 185, then
z_stat = (x̄ − μ0) / SE_x̄ = (185 − 170) / 5 = 3.00

Under H0: µ = 170 with n = 64, the sampling distribution of x̄ is x̄ ~ N(170, 5).
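The z-statistic computations above can be reproduced in a few lines of Python (a minimal sketch; the function name `z_stat` is just for illustration):

```python
from math import sqrt

def z_stat(xbar, mu0, sigma, n):
    """One-sample z statistic: (xbar - mu0) / (sigma / sqrt(n))."""
    se = sigma / sqrt(n)   # standard error of the sample mean
    return (xbar - mu0) / se

# Illustrative example: mu0 = 170, sigma = 40, n = 64, so SE = 5
print(z_stat(173, 170, 40, 64))  # 0.6
print(z_stat(185, 170, 40, 64))  # 3.0
```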
P-value
● The P-value answers the question: What is the probability of the observed test statistic, or one more extreme, when H0 is true?
● This corresponds to the area under the curve (AUC) in the tail of the standard Normal distribution beyond z_stat.
● Converting z statistics to P-values:
For Ha: μ > μ0, P = Pr(Z > z_stat) = right tail beyond z_stat
For Ha: μ < μ0, P = Pr(Z < z_stat) = left tail beyond z_stat
For Ha: μ ≠ μ0, P = 2 × one-tailed P-value
● Use Table B or software to find these probabilities (next two slides).
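The three conversion rules above can be sketched in Python using only the standard library (the normal right-tail area is 0.5·erfc(z/√2); the function names here are illustrative):

```python
from math import erfc, sqrt

def norm_sf(z):
    """Right-tail area of the standard normal: Pr(Z > z)."""
    return 0.5 * erfc(z / sqrt(2))

def p_value(zstat, alternative="two-sided"):
    if alternative == "greater":        # Ha: mu > mu0, right tail
        return norm_sf(zstat)
    if alternative == "less":           # Ha: mu < mu0, left tail
        return norm_sf(-zstat)
    return 2 * norm_sf(abs(zstat))     # two-sided: double the one-tailed P

print(round(p_value(0.60, "greater"), 4))  # 0.2743
print(round(p_value(3.00, "greater"), 4))  # 0.0013
```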
One-sided P-value for z_stat of 0.6 (P = 0.2743)
One-sided P-value for z_stat of 3.0 (P ≈ 0.0013)
Two-Sided P-Value
● One-sided Ha: AUC in the tail beyond z_stat
● Two-sided Ha: consider potential deviations in both directions; double the one-sided P-value
● Examples: If one-sided P = 0.0010, then two-sided P = 2 × 0.0010 = 0.0020. If one-sided P = 0.2743, then two-sided P = 2 × 0.2743 = 0.5486.
Interpretation
● The P-value answers the question: What is the probability of the observed test statistic, or one more extreme, when H0 is true?
● Thus, smaller and smaller P-values provide stronger and stronger evidence against H0
● Small P-value ⇒ strong evidence against H0
Interpretation
Conventions*
P > 0.10: non-significant evidence against H0
0.05 < P ≤ 0.10: marginally significant evidence against H0
0.01 < P ≤ 0.05: significant evidence against H0
P ≤ 0.01: highly significant evidence against H0
Examples
P = 0.27: non-significant evidence against H0
P = 0.01: highly significant evidence against H0
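The conventions above can be encoded directly (a small sketch; the function name `evidence_label` is illustrative):

```python
def evidence_label(p):
    """Map a P-value to the conventional verbal label of evidence against H0."""
    if p > 0.10:
        return "non-significant"
    if p > 0.05:
        return "marginally significant"
    if p > 0.01:
        return "significant"
    return "highly significant"

print(evidence_label(0.27))  # non-significant
print(evidence_label(0.01))  # highly significant
```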
● Suppose you have a coin and don’t know whether it is fair or tricky, so you set up null and alternative hypotheses:
● H0: the coin is a fair coin.
● H1: the coin is a tricky (biased) coin, with α = 5% (0.05).
● Now toss the coin repeatedly and calculate the P-value (probability value).
● Toss 1 lands tails: P-value = 50% (heads and tails are equally likely under H0)
● Toss 2 also lands tails: P-value = 50% / 2 = 25%
● Similarly, after 6 consecutive tails the P-value is about 1.5%. We set our significance level at α = 5%, and the P-value has fallen below it, so the null hypothesis does not hold up: we reject H0 and conclude the coin is tricky, which it actually is.
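The repeated halving in this example is just the probability of k tails in a row under a fair coin, 0.5^k. A minimal sketch:

```python
alpha = 0.05                  # significance level from the example
p = 1.0
for toss in range(1, 7):      # six consecutive tails
    p *= 0.5                  # each additional tail halves the P-value
    print(f"after toss {toss}: P = {p:.4%}")

# After six tails, P = 0.5**6 = 1.5625%, below alpha = 5%: reject H0.
print(p < alpha)  # True
```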
Two types of decision errors
Type I error = erroneous rejection of a true H0
Type II error = erroneous retention of a false H0
α ≡ probability of a Type I error
β ≡ probability of a Type II error

                       Truth
Decision      H0 true              H0 false
Retain H0     Correct retention    Type II error
Reject H0     Type I error         Correct rejection
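Since α is the probability of rejecting a true H0, a test run at α = 0.05 should reject a true null about 5% of the time. A quick simulation sketch under the illustrative model (μ0 = 170, σ = 40, n = 64, two-sided test rejecting when |z| > 1.96), with the sample mean drawn from its H0 sampling distribution N(170, 5):

```python
import random

random.seed(1)
mu0, sigma, n = 170, 40, 64
se = sigma / n ** 0.5          # standard error = 5
trials = 20000
rejections = 0
for _ in range(trials):
    xbar = random.gauss(mu0, se)      # sample mean drawn under H0
    z = (xbar - mu0) / se
    if abs(z) > 1.96:                 # two-sided test at alpha = 0.05
        rejections += 1

print(rejections / trials)            # Type I error rate, close to 0.05
```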
Maximizing and
Minimizing algebraic
equations in python
Linear Programming
● Linear programming is a set of techniques used in mathematical
programming, sometimes called mathematical optimization, to
solve systems of linear equations and inequalities while
maximizing or minimizing some linear function.
● It’s important in fields like scientific computing, economics,
technical sciences, manufacturing, transportation, military,
management, energy, and so on.
Linear Programming Problem
● The independent variables you need to
find—in this case x and y—are called the
decision variables.
● The function of the decision variables to
be maximized or minimized—in this case
z—is called the objective function, the
cost function, or just the goal.
● The inequalities you need to satisfy are
called the inequality constraints. You
can also have equations among the
constraints called equality constraints.
Solving optimization problems with SciPy
● from scipy.optimize import linprog
● linprog() solves only minimization (not maximization) problems
and doesn’t allow inequality constraints with the greater than or
equal to sign (≥).
● To work around these issues, you need to modify your problem
before starting optimization:
– Instead of maximizing z = x + 2y, you can minimize its negative (−z = −x − 2y).
– Instead of using a greater-than-or-equal-to constraint, you can multiply that inequality by −1 to get the opposite, less-than-or-equal-to, sign (≤).
Solving optimization problems with SciPy

from scipy.optimize import linprog

obj = [-1, -2]                          # minimize -z = -x - 2y

lhs_ineq = [[2, 1], [-4, 5], [1, -2]]   # constraint left sides
rhs_ineq = [20, 10, 2]                  # constraint right sides

bnd = [(0, float("inf")), (0, float("inf"))]  # bounds of x and y

opt = linprog(c=obj, A_ub=lhs_ineq, b_ub=rhs_ineq,
              bounds=bnd, method="revised simplex")
opt

Output:
con: array([], dtype=float64)
fun: -20.714285714285715
message: 'Optimization terminated successfully.'
nit: 2
slack: array([0. , 0. , 9.85714286])
status: 0
success: True
x: array([6.42857143, 7.14285714])

.con is the equality constraints residuals.
.fun is the objective function value at the optimum (if found).
.message is the status of the solution.
.nit is the number of iterations needed to finish the calculation.
.slack is the values of the slack variables, i.e., the differences between the values of the left and right sides of the constraints.
.status is an integer between 0 and 4 that shows the status of the solution, such as 0 for when the optimal solution has been found.
.success is a Boolean that shows whether the optimal solution has been found.
.x is a NumPy array holding the optimal values of the decision variables.
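As a cross-check, the same optimum can be recovered without SciPy by enumerating the vertices of the feasible region, since the optimum of a linear program lies at a vertex (brute force is practical only for this two-variable sketch; note also that method="revised simplex" is deprecated in recent SciPy releases, where the default "highs" method gives the same result):

```python
from itertools import combinations

# Maximize z = x + 2y subject to:
#   2x +  y <= 20
#  -4x + 5y <= 10
#    x - 2y <=  2,   x >= 0, y >= 0
# Each constraint is stored as (a, b, c) meaning a*x + b*y <= c.
A = [(2, 1, 20), (-4, 5, 10), (1, -2, 2), (-1, 0, 0), (0, -1, 0)]

def feasible(x, y, tol=1e-9):
    return all(a * x + b * y <= c + tol for a, b, c in A)

best = None
for (a1, b1, c1), (a2, b2, c2) in combinations(A, 2):
    det = a1 * b2 - a2 * b1
    if abs(det) < 1e-12:
        continue                       # parallel constraints: no vertex
    # Solve the 2x2 system by Cramer's rule to find the intersection point
    x = (c1 * b2 - c2 * b1) / det
    y = (a1 * c2 - a2 * c1) / det
    if feasible(x, y):
        z = x + 2 * y                  # objective value at this vertex
        if best is None or z > best[0]:
            best = (z, x, y)

print(best)  # optimum matches linprog: z ≈ 20.714 at x ≈ 6.429, y ≈ 7.143
```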