STATISTIK-BAG 2
Paulina
STIE IBS
INTRODUCTION
Sampling Distribution
Post Test
Introduction
• The reason we select a sample is to collect data to answer a research question about a population.
• The sample results provide only estimates of the values of the population characteristics.
• The reason is simply that the sample contains only a portion of the population.
• With proper sampling methods, the sample results can provide “good” estimates of the population characteristics.
Sampling from a Finite Population
• Example: St. Andrew’s College
Step 1: Assign a random number to each of the 900 applicants.
The random numbers generated by Excel’s RAND function
follow a uniform probability distribution between 0 and 1.
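The random-number step can be sketched in Python, with `random.random` standing in for Excel's RAND function. The applicant IDs and the sample size of 30 are assumptions for illustration only:

```python
import random

def simple_random_sample(population, n, seed=None):
    """Assign a uniform(0, 1) random number to each element
    (the Python analogue of Excel's RAND function) and keep
    the n elements with the smallest random numbers."""
    rng = random.Random(seed)
    tagged = [(rng.random(), element) for element in population]
    tagged.sort()  # order by the random numbers
    return [element for _, element in tagged[:n]]

# 900 applicants, identified here only by hypothetical ID numbers
applicants = list(range(1, 901))
sample = simple_random_sample(applicants, n=30, seed=1)
```

Sorting by the attached random numbers gives every applicant the same chance of landing in the sample, which is exactly the property a simple random sample needs.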
Sampling from an Infinite Population
• Sometimes we want to select a sample, but find that it is not possible to obtain a
list of all elements in the population.
• As a result, we cannot construct a frame for the population.
Sampling from an Infinite Population
• Populations are often generated by an ongoing process where there is no upper
limit on the number of units that can be generated.
• Some examples of ongoing processes with infinite populations are:
• parts being manufactured on a production line
• transactions occurring at a bank
• telephone calls arriving at a technical help desk
• customers entering a store
Sampling from an Infinite Population
• In the case of an infinite population, we must select a random sample in order
to make valid statistical inferences about the population from which the sample
is taken.
• A random sample from an infinite population is a sample selected such that the
following conditions are satisfied.
• Each element selected comes from the population of interest.
• Each element is selected independently.
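One well-known way to satisfy both conditions when sampling a stream of unknown length is reservoir sampling. The technique is not named in the slides; this is an illustrative sketch, and the transaction stream is simulated:

```python
import random

def reservoir_sample(stream, n, seed=None):
    """Maintain a running sample of size n from a stream whose total
    length is unknown; after processing i elements, each of them is
    equally likely to be in the sample, and each arriving element is
    handled independently of the others."""
    rng = random.Random(seed)
    reservoir = []
    for i, element in enumerate(stream):
        if i < n:
            reservoir.append(element)          # fill the reservoir first
        else:
            j = rng.randrange(i + 1)           # uniform over 0..i
            if j < n:
                reservoir[j] = element         # replace with prob n/(i+1)
    return reservoir

# e.g. transactions arriving at a bank, simulated here as a generator
transactions = (f"txn-{i}" for i in range(10_000))
sample = reservoir_sample(transactions, n=25, seed=7)
```

Because each element is examined exactly once as it arrives, no frame listing the whole population is ever needed.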
HYPOTHESIS

Type I and Type II Errors (example with H0: μ ≤ 12, Ha: μ > 12)

                                  Population Condition
Conclusion                     H0 True (μ ≤ 12)    H0 False (μ > 12)
Accept H0 (conclude μ ≤ 12)    Correct decision    Type II error
Reject H0 (conclude μ > 12)    Type I error        Correct decision
p-Value Approach to
One-Tailed Hypothesis Testing
A p-value is a probability that provides a measure
of the evidence against the null hypothesis
provided by the sample.
The p-value is used to determine if the null
hypothesis should be rejected.
The smaller the p-value, the more evidence there
is against H0.
A small p-value indicates the value of the test
statistic is unusual given the assumption that H0
is true.
Lower-Tailed Test About a Population Mean: σ Known
p-Value Approach
[Figure: sampling distribution of z under H0. With α = .10, the test statistic z = −1.46 gives p-value = .0721; the critical value is −zα = −1.28. Since p-value < α, reject H0.]
Upper-Tailed Test About a Population Mean: σ Known
p-Value Approach
[Figure: sampling distribution of z under H0. With α = .04, the test statistic z = 2.29 gives p-value = .0110; the critical value is zα = 1.75. Since p-value < α, reject H0.]
Critical Value Approach to
One-Tailed Hypothesis Testing
The test statistic z has a standard normal probability
distribution.
We can use the standard normal probability
distribution table to find the z-value with an area
of a in the lower (or upper) tail of the distribution.
The value of the test statistic that establishes the boundary of the rejection region is called the critical value for the test.
The rejection rule is:
• Lower tail: Reject H0 if z ≤ −zα
• Upper tail: Reject H0 if z ≥ zα
Lower-Tailed Test About a Population Mean: σ Known
Critical Value Approach
[Figure: sampling distribution of z, with a rejection region of area α in the lower tail. Reject H0 if z ≤ −zα = −1.28; otherwise do not reject H0.]
Upper-Tailed Test About a Population Mean: σ Known
Critical Value Approach
[Figure: sampling distribution of z, with a rejection region of area α in the upper tail. Reject H0 if z ≥ zα = 1.645; otherwise do not reject H0.]
Steps of Hypothesis Testing
p-Value Approach
Step 1. Develop the null and alternative hypotheses.
Step 2. Specify the level of significance α.
Step 3. Collect the sample data and compute the value of the test statistic.
Step 4. Use the value of the test statistic to compute the p-value.
Step 5. Reject H0 if p-value < α.
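The p-value steps can be sketched with the lower-tailed example used earlier in the slides (z = −1.46, α = .10); `normal_cdf` is a small helper built on the standard library's error function:

```python
from math import erf, sqrt

def normal_cdf(z):
    """Standard normal CDF via the error function (stdlib only)."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def one_tailed_p_value(z, tail):
    """p-value for a one-tailed z test; tail is 'lower' or 'upper'."""
    return normal_cdf(z) if tail == "lower" else 1.0 - normal_cdf(z)

# Lower-tailed example from the slides: z = -1.46, alpha = .10
alpha = 0.10
p = one_tailed_p_value(-1.46, "lower")   # about .0721
reject = p < alpha                        # p-value < alpha, so reject H0
```

The same helper handles the upper-tailed case by taking the area in the other tail.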
Steps of Hypothesis Testing
Test Statistic (population mean, σ known)
z = (x̄ − μ0) / (σ/√n)
where:
μ0 = hypothesized value of the population mean,
σ = population standard deviation, n = sample size
Interval Estimation of μ1 − μ2
Hypothesis Tests About μ1 − μ2
Estimating the Difference Between Two Population Means
• Let μ1 equal the mean of population 1 and μ2 equal the mean of population 2.
• The difference between the two population means is μ1 − μ2.
• To estimate μ1 − μ2, we will select a simple random sample of size n1 from population 1 and a simple random sample of size n2 from population 2.
• Let x̄1 equal the mean of sample 1 and x̄2 equal the mean of sample 2.
• The point estimator of the difference between the means of populations 1 and 2 is x̄1 − x̄2.
Hypothesis Tests About μ1 − μ2: σ1 and σ2 Known
Hypotheses
H0: μ1 − μ2 ≥ D0    H0: μ1 − μ2 ≤ D0    H0: μ1 − μ2 = D0
Ha: μ1 − μ2 < D0    Ha: μ1 − μ2 > D0    Ha: μ1 − μ2 ≠ D0
(left-tailed, right-tailed, and two-tailed forms; D0 is the hypothesized difference)
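As a sketch of the test statistic for μ1 − μ2 with σ1 and σ2 known; all numeric inputs below are hypothetical, for illustration only:

```python
from math import sqrt

def two_sample_z(xbar1, xbar2, sigma1, sigma2, n1, n2, d0=0.0):
    """Test statistic for hypotheses about mu1 - mu2 when sigma1 and
    sigma2 are known:
        z = ((xbar1 - xbar2) - D0) / sqrt(sigma1^2/n1 + sigma2^2/n2)."""
    se = sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    return ((xbar1 - xbar2) - d0) / se

# Hypothetical sample results: two sample means, known sigmas, sizes
z = two_sample_z(xbar1=29.8, xbar2=27.3, sigma1=4.0, sigma2=3.5,
                 n1=40, n2=50)
```

The resulting z is compared with −zα, zα, or ±zα/2 depending on which of the three hypothesis forms is being tested.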
Interval Estimation of μ1 − μ2:
σ1 and σ2 Unknown
• When σ1 and σ2 are unknown, we will use the sample standard deviations s1 and s2 as estimates of σ1 and σ2, and replace zα/2 with tα/2.
Interval Estimation of p1 - p2
Hypothesis Tests About p1 - p2
Hypothesis Tests about p1 - p2
Hypotheses
We focus on tests involving no difference between the two population proportions (i.e. p1 = p2).
H0: p1 - p2 ≤ 0
Ha: p1 - p2 > 0
Test Statistic
z = (p̄1 − p̄2) / √( p̄(1 − p̄)(1/n1 + 1/n2) )
where p̄ = (x1 + x2)/(n1 + n2) is the pooled estimate of the proportion, valid under H0: p1 = p2.
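A minimal sketch of the pooled test statistic; the success counts x1 and x2 below are hypothetical:

```python
from math import sqrt

def pooled_two_prop_z(x1, n1, x2, n2):
    """z = (pbar1 - pbar2) / sqrt(pbar(1 - pbar)(1/n1 + 1/n2)),
    where pbar pools both samples (appropriate under H0: p1 = p2)."""
    p1, p2 = x1 / n1, x2 / n2                 # sample proportions
    pbar = (x1 + x2) / (n1 + n2)              # pooled estimate
    se = sqrt(pbar * (1 - pbar) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# Hypothetical counts: 110 of 200 successes vs 90 of 200 successes
z = pooled_two_prop_z(110, 200, 90, 200)
```

With these numbers the pooled proportion is .50, the standard error is .05, and z = 2.0, which would be compared with zα for the upper-tailed test above.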
Simple Linear Regression
y = β0 + β1x + ε
where:
β0 and β1 are called parameters of the model,
ε is a random variable called the error term.
Simple Linear Regression Equation
Positive Linear Relationship
[Figure: E(y) versus x; the regression line rises from intercept β0 with positive slope β1.]
Simple Linear Regression Equation
Negative Linear Relationship
[Figure: E(y) versus x; the regression line falls with negative slope β1.]
Simple Linear Regression Equation
No Relationship
[Figure: E(y) versus x; the regression line is horizontal at intercept β0, with slope β1 = 0.]
Estimated Simple Linear Regression Equation
ŷ = b0 + b1x
The sample statistics b0 and b1 provide estimates of the parameters β0 and β1.
Least Squares Method
Least Squares Criterion
min Σ(yi − ŷi)²
where:
yi = observed value of the dependent variable for the ith observation
ŷi = estimated value of the dependent variable for the ith observation
Least Squares Method
Slope for the Estimated Regression Equation
b1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)²
y-Intercept for the Estimated Regression Equation
b0 = ȳ − b1x̄
where:
xi = value of independent variable for the ith observation
yi = value of dependent variable for the ith observation
x̄ = mean value for independent variable
ȳ = mean value for dependent variable
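The slope and intercept formulas can be computed directly as a sketch; the data are hypothetical, built as exactly y = 2x + 1 so the fit recovers b1 = 2 and b0 = 1:

```python
def least_squares(xs, ys):
    """b1 = sum((xi - xbar)(yi - ybar)) / sum((xi - xbar)^2),
    b0 = ybar - b1 * xbar."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    b1 = sxy / sxx               # slope
    b0 = ybar - b1 * xbar        # y-intercept
    return b0, b1

# Hypothetical data: y is exactly 2x + 1, so the fit recovers it
b0, b1 = least_squares([1, 2, 3, 4, 5], [3, 5, 7, 9, 11])
```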
Coefficient of Determination
r² = SSR/SST
where:
SSR = sum of squares due to regression
SST = total sum of squares
Sample Correlation Coefficient
r_xy = (sign of b1) √r²
where:
b1 = the slope of the estimated regression equation
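A sketch of r² and the sample correlation coefficient; the observed and fitted values below are hypothetical:

```python
from math import sqrt, copysign

def coefficient_of_determination(ys, yhats):
    """r^2 = SSR/SST, where SST = sum((yi - ybar)^2),
    SSE = sum((yi - yhati)^2), and SSR = SST - SSE."""
    ybar = sum(ys) / len(ys)
    sst = sum((y - ybar) ** 2 for y in ys)
    sse = sum((y - yh) ** 2 for y, yh in zip(ys, yhats))
    return (sst - sse) / sst

def sample_correlation(r2, b1):
    """r_xy carries the sign of the estimated slope b1."""
    return copysign(sqrt(r2), b1)

# Hypothetical observed values and fitted values from some estimated line
ys    = [3.0, 5.0, 7.0, 9.0]
yhats = [3.5, 4.5, 7.5, 8.5]
r2 = coefficient_of_determination(ys, yhats)
r  = sample_correlation(r2, b1=2.0)
```

Here SST = 20 and SSE = 1, so r² = .95, and the positive slope makes r positive.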
Testing for Significance
To test for a significant regression relationship, we must conduct a hypothesis test to determine whether the value of β1 is zero.
An Estimate of σ²
The mean square error (MSE) provides the estimate of σ².
s² = MSE = SSE/(n − 2)
where:
SSE = Σ(yi − ŷi)² is the sum of squares due to error
Testing for Significance
An Estimate of σ
• To estimate σ we take the square root of s².
• The resulting s is called the standard error of the estimate.
Testing for Significance: t Test
Hypotheses
H0: β1 = 0
Ha: β1 ≠ 0
Test Statistic
t = b1 / s_b1
where:
s_b1 = s / √Σ(xi − x̄)²
Testing for Significance: t Test
Rejection Rule
Reject H0 if p-value < α, or if t ≤ −tα/2 or t ≥ tα/2
where:
tα/2 is based on a t distribution
with n − 2 degrees of freedom
Testing for Significance: F Test
Hypotheses
H0: β1 = 0
Ha: β1 ≠ 0
Test Statistic
F = MSR/MSE
Testing for Significance: F Test
Rejection Rule
Reject H0 if
p-value < α
or F ≥ Fα
where:
Fα is based on an F distribution with
1 degree of freedom in the numerator and
n − 2 degrees of freedom in the denominator
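A sketch combining the t and F statistics for the slope, on hypothetical data; in simple linear regression F = t², which the example confirms:

```python
from math import sqrt

def slope_significance(xs, ys):
    """Fit the least squares line, then form the statistics for
    H0: beta1 = 0.  t = b1 / s_b1 with s_b1 = s / sqrt(sum((xi - xbar)^2))
    and s^2 = MSE = SSE/(n - 2); F = MSR/MSE with MSR = SSR/1."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    b0 = ybar - b1 * xbar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - ybar) ** 2 for y in ys)
    mse = sse / (n - 2)          # s^2, the estimate of sigma^2
    s_b1 = sqrt(mse / sxx)       # estimated standard deviation of b1
    t = b1 / s_b1
    f = (sst - sse) / mse        # MSR = SSR / 1 (one independent variable)
    return t, f

# Hypothetical data, roughly y = 2x + 1 plus small errors
t, f = slope_significance([1, 2, 3, 4, 5], [3.1, 4.9, 7.2, 8.8, 11.0])
```

Both statistics would then be compared against tα/2 (n − 2 df) and Fα (1 and n − 2 df) respectively.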
Multiple Regression
Multiple Regression Model
Least Squares Method
Multiple Coefficient of Determination
Model Assumptions
Testing for Significance
Using the Estimated Regression Equation
for Estimation and Prediction
Qualitative Independent Variables
Residual Analysis
Logistic Regression
Multiple Regression Model
Multiple Regression Model
The equation that describes how the dependent
variable y is related to the independent variables
x1, x2, . . . , xp and an error term is:
y = β0 + β1x1 + β2x2 + . . . + βpxp + ε
where:
β0, β1, β2, . . . , βp are the parameters, and
ε is a random variable called the error term
Multiple Regression Equation
ŷ = b0 + b1x1 + b2x2 + . . . + bpxp
The sample statistics b0, b1, b2, . . . , bp provide estimates of the parameters β0, β1, β2, . . . , βp.
Least Squares Method
Least Squares Criterion