STATISTIK-BAG 2
Paulina
STIE IBS
INTRODUCTION
Sampling Distribution
Post Test
Introduction
• The reason we select a sample is to collect data to answer a research question about a population.
• The sample results provide only estimates of the values of the population characteristics.
• The reason is simply that the sample contains only a portion of the population.
• With proper sampling methods, the sample results can provide “good” estimates of the population characteristics.
Sampling from a Finite Population
• Example: St. Andrew’s College
Step 1: Assign a random number to each of the 900 applicants.
The random numbers generated by Excel’s RAND function
follow a uniform probability distribution between 0 and 1.
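The random-number step can be sketched in Python, with `random.random` standing in for Excel's RAND function. The applicant IDs and the sample size of 30 are assumptions for illustration only:

```python
import random

def simple_random_sample(population, n, seed=None):
    """Assign a uniform(0, 1) random number to each element
    (the Python analogue of Excel's RAND function) and keep
    the n elements with the smallest random numbers."""
    rng = random.Random(seed)
    tagged = [(rng.random(), element) for element in population]
    tagged.sort()  # order by the random numbers
    return [element for _, element in tagged[:n]]

# 900 applicants, identified here only by hypothetical ID numbers
applicants = list(range(1, 901))
sample = simple_random_sample(applicants, n=30, seed=1)
```

Sorting by the attached random numbers gives every applicant the same chance of landing in the sample, which is exactly the property a simple random sample needs.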
Sampling from an Infinite Population
• Sometimes we want to select a sample, but find that it is not possible to obtain a
list of all elements in the population.
• As a result, we cannot construct a frame for the population.
Sampling from an Infinite Population
• Populations are often generated by an ongoing process where there is no upper
limit on the number of units that can be generated.
• Some examples of ongoing processes with infinite populations are:
• parts being manufactured on a production line
• transactions occurring at a bank
• telephone calls arriving at a technical help desk
• customers entering a store
Sampling from an Infinite Population
• In the case of an infinite population, we must select a random sample in order
to make valid statistical inferences about the population from which the sample
is taken.
• A random sample from an infinite population is a sample selected such that the
following conditions are satisfied.
• Each element selected comes from the population of interest.
• Each element is selected independently.
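One well-known way to satisfy both conditions when sampling a stream of unknown length is reservoir sampling. The technique is not named in the slides; this is an illustrative sketch, and the transaction stream is simulated:

```python
import random

def reservoir_sample(stream, n, seed=None):
    """Maintain a running sample of size n from a stream whose total
    length is unknown; after processing i elements, each of them is
    equally likely to be in the sample, and each arriving element is
    handled independently of the others."""
    rng = random.Random(seed)
    reservoir = []
    for i, element in enumerate(stream):
        if i < n:
            reservoir.append(element)          # fill the reservoir first
        else:
            j = rng.randrange(i + 1)           # uniform over 0..i
            if j < n:
                reservoir[j] = element         # replace with prob n/(i+1)
    return reservoir

# e.g. transactions arriving at a bank, simulated here as a generator
transactions = (f"txn-{i}" for i in range(10_000))
sample = reservoir_sample(transactions, n=25, seed=7)
```

Because each element is examined exactly once as it arrives, no frame listing the whole population is ever needed.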
HYPOTHESIS

Type I and Type II Errors (example with H0: μ ≤ 12, Ha: μ > 12)

                                  Population Condition
Conclusion                     H0 True (μ ≤ 12)    H0 False (μ > 12)
Accept H0 (conclude μ ≤ 12)    Correct decision    Type II error
Reject H0 (conclude μ > 12)    Type I error        Correct decision
p-Value Approach to
One-Tailed Hypothesis Testing
A p-value is a probability that provides a measure
of the evidence against the null hypothesis
provided by the sample.
The p-value is used to determine if the null
hypothesis should be rejected.
The smaller the p-value, the more evidence there
is against H0.
A small p-value indicates the value of the test
statistic is unusual given the assumption that H0
is true.
Lower-Tailed Test About a Population Mean: σ Known
p-Value Approach
[Figure: sampling distribution of z under H0. With α = .10, the test statistic z = −1.46 gives p-value = .0721; the critical value is −zα = −1.28. Since p-value < α, reject H0.]
Upper-Tailed Test About a Population Mean: σ Known
p-Value Approach
[Figure: sampling distribution of z under H0. With α = .04, the test statistic z = 2.29 gives p-value = .0110; the critical value is zα = 1.75. Since p-value < α, reject H0.]
Critical Value Approach to
One-Tailed Hypothesis Testing
The test statistic z has a standard normal probability
distribution.
We can use the standard normal probability
distribution table to find the z-value with an area
of a in the lower (or upper) tail of the distribution.
The value of the test statistic that establishes the boundary of the rejection region is called the critical value for the test.
The rejection rule is:
• Lower tail: Reject H0 if z ≤ −zα
• Upper tail: Reject H0 if z ≥ zα
Lower-Tailed Test About a Population Mean: σ Known
Critical Value Approach
[Figure: sampling distribution of z, with a rejection region of area α in the lower tail. Reject H0 if z ≤ −zα = −1.28; otherwise do not reject H0.]
Upper-Tailed Test About a Population Mean: σ Known
Critical Value Approach
[Figure: sampling distribution of z, with a rejection region of area α in the upper tail. Reject H0 if z ≥ zα = 1.645; otherwise do not reject H0.]
Steps of Hypothesis Testing
p-Value Approach
Step 1. Develop the null and alternative hypotheses.
Step 2. Specify the level of significance α.
Step 3. Collect the sample data and compute the value of the test statistic.
Step 4. Use the value of the test statistic to compute the p-value.
Step 5. Reject H0 if p-value < α.
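The p-value steps can be sketched with the lower-tailed example used earlier in the slides (z = −1.46, α = .10); `normal_cdf` is a small helper built on the standard library's error function:

```python
from math import erf, sqrt

def normal_cdf(z):
    """Standard normal CDF via the error function (stdlib only)."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def one_tailed_p_value(z, tail):
    """p-value for a one-tailed z test; tail is 'lower' or 'upper'."""
    return normal_cdf(z) if tail == "lower" else 1.0 - normal_cdf(z)

# Lower-tailed example from the slides: z = -1.46, alpha = .10
alpha = 0.10
p = one_tailed_p_value(-1.46, "lower")   # about .0721
reject = p < alpha                        # p-value < alpha, so reject H0
```

The same helper handles the upper-tailed case by taking the area in the other tail.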
Steps of Hypothesis Testing
Test Statistic (population mean, σ known)
z = (x̄ − μ0) / (σ/√n)
where:
μ0 = hypothesized value of the population mean,
σ = population standard deviation, n = sample size
Interval Estimation of μ1 − μ2
Hypothesis Tests About μ1 − μ2
Estimating the Difference Between Two Population Means
• Let μ1 equal the mean of population 1 and μ2 equal the mean of population 2.
• The difference between the two population means is μ1 − μ2.
• To estimate μ1 − μ2, we will select a simple random sample of size n1 from population 1 and a simple random sample of size n2 from population 2.
• Let x̄1 equal the mean of sample 1 and x̄2 equal the mean of sample 2.
• The point estimator of the difference between the means of populations 1 and 2 is x̄1 − x̄2.
Hypothesis Tests About μ1 − μ2: σ1 and σ2 Known
Hypotheses
H0: μ1 − μ2 ≥ D0    H0: μ1 − μ2 ≤ D0    H0: μ1 − μ2 = D0
Ha: μ1 − μ2 < D0    Ha: μ1 − μ2 > D0    Ha: μ1 − μ2 ≠ D0
(left-tailed, right-tailed, and two-tailed forms; D0 is the hypothesized difference)
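As a sketch of the test statistic for μ1 − μ2 with σ1 and σ2 known; all numeric inputs below are hypothetical, for illustration only:

```python
from math import sqrt

def two_sample_z(xbar1, xbar2, sigma1, sigma2, n1, n2, d0=0.0):
    """Test statistic for hypotheses about mu1 - mu2 when sigma1 and
    sigma2 are known:
        z = ((xbar1 - xbar2) - D0) / sqrt(sigma1^2/n1 + sigma2^2/n2)."""
    se = sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    return ((xbar1 - xbar2) - d0) / se

# Hypothetical sample results: two sample means, known sigmas, sizes
z = two_sample_z(xbar1=29.8, xbar2=27.3, sigma1=4.0, sigma2=3.5,
                 n1=40, n2=50)
```

The resulting z is compared with −zα, zα, or ±zα/2 depending on which of the three hypothesis forms is being tested.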
Interval Estimation of μ1 − μ2:
σ1 and σ2 Unknown
• When σ1 and σ2 are unknown, we will use the sample standard deviations s1 and s2 as estimates of σ1 and σ2, and replace zα/2 with tα/2.
Interval Estimation of p1 - p2
Hypothesis Tests About p1 - p2
Hypothesis Tests about p1 - p2
Hypotheses
We focus on tests involving no difference between the two population proportions (i.e. p1 = p2).
H0: p1 - p2 ≤ 0
Ha: p1 - p2 > 0
Test Statistic
z = (p̄1 − p̄2) / √( p̄(1 − p̄)(1/n1 + 1/n2) )
where p̄ = (x1 + x2)/(n1 + n2) is the pooled estimate of the proportion, valid under H0: p1 = p2.
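A minimal sketch of the pooled test statistic; the success counts x1 and x2 below are hypothetical:

```python
from math import sqrt

def pooled_two_prop_z(x1, n1, x2, n2):
    """z = (pbar1 - pbar2) / sqrt(pbar(1 - pbar)(1/n1 + 1/n2)),
    where pbar pools both samples (appropriate under H0: p1 = p2)."""
    p1, p2 = x1 / n1, x2 / n2                 # sample proportions
    pbar = (x1 + x2) / (n1 + n2)              # pooled estimate
    se = sqrt(pbar * (1 - pbar) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# Hypothetical counts: 110 of 200 successes vs 90 of 200 successes
z = pooled_two_prop_z(110, 200, 90, 200)
```

With these numbers the pooled proportion is .50, the standard error is .05, and z = 2.0, which would be compared with zα for the upper-tailed test above.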
Simple Linear Regression
y = β0 + β1x + ε
where:
β0 and β1 are called parameters of the model,
ε is a random variable called the error term.
Simple Linear Regression Equation
Positive Linear Relationship
[Figure: E(y) versus x; the regression line rises from intercept β0 with positive slope β1.]
Simple Linear Regression Equation
Negative Linear Relationship
[Figure: E(y) versus x; the regression line falls with negative slope β1.]
Simple Linear Regression Equation
No Relationship
[Figure: E(y) versus x; the regression line is horizontal at intercept β0, with slope β1 = 0.]
Estimated Simple Linear Regression Equation
ŷ = b0 + b1x
The sample statistics b0 and b1 provide estimates of the parameters β0 and β1.
Least Squares Method
Least Squares Criterion
min Σ(yi − ŷi)²
where:
yi = observed value of the dependent variable for the ith observation
ŷi = estimated value of the dependent variable for the ith observation
Least Squares Method
Slope for the Estimated Regression Equation
b1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)²
y-Intercept for the Estimated Regression Equation
b0 = ȳ − b1x̄
where:
xi = value of independent variable for the ith observation
yi = value of dependent variable for the ith observation
x̄ = mean value for independent variable
ȳ = mean value for dependent variable
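The slope and intercept formulas can be computed directly as a sketch; the data are hypothetical, built as exactly y = 2x + 1 so the fit recovers b1 = 2 and b0 = 1:

```python
def least_squares(xs, ys):
    """b1 = sum((xi - xbar)(yi - ybar)) / sum((xi - xbar)^2),
    b0 = ybar - b1 * xbar."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    b1 = sxy / sxx               # slope
    b0 = ybar - b1 * xbar        # y-intercept
    return b0, b1

# Hypothetical data: y is exactly 2x + 1, so the fit recovers it
b0, b1 = least_squares([1, 2, 3, 4, 5], [3, 5, 7, 9, 11])
```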
Coefficient of Determination
r² = SSR/SST
where:
SSR = sum of squares due to regression
SST = total sum of squares
Sample Correlation Coefficient
r_xy = (sign of b1) √r²
where:
b1 = the slope of the estimated regression equation
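A sketch of r² and the sample correlation coefficient; the observed and fitted values below are hypothetical:

```python
from math import sqrt, copysign

def coefficient_of_determination(ys, yhats):
    """r^2 = SSR/SST, where SST = sum((yi - ybar)^2),
    SSE = sum((yi - yhati)^2), and SSR = SST - SSE."""
    ybar = sum(ys) / len(ys)
    sst = sum((y - ybar) ** 2 for y in ys)
    sse = sum((y - yh) ** 2 for y, yh in zip(ys, yhats))
    return (sst - sse) / sst

def sample_correlation(r2, b1):
    """r_xy carries the sign of the estimated slope b1."""
    return copysign(sqrt(r2), b1)

# Hypothetical observed values and fitted values from some estimated line
ys    = [3.0, 5.0, 7.0, 9.0]
yhats = [3.5, 4.5, 7.5, 8.5]
r2 = coefficient_of_determination(ys, yhats)
r  = sample_correlation(r2, b1=2.0)
```

Here SST = 20 and SSE = 1, so r² = .95, and the positive slope makes r positive.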
Testing for Significance
To test for a significant regression relationship, we must conduct a hypothesis test to determine whether the value of β1 is zero.
An Estimate of σ²
The mean square error (MSE) provides the estimate of σ².
s² = MSE = SSE/(n − 2)
where:
SSE = Σ(yi − ŷi)² is the sum of squares due to error
Testing for Significance
An Estimate of σ
• To estimate σ we take the square root of s².
• The resulting s is called the standard error of the estimate.
Testing for Significance: t Test
Hypotheses
H0: β1 = 0
Ha: β1 ≠ 0
Test Statistic
t = b1 / s_b1
where:
s_b1 = s / √Σ(xi − x̄)²
Testing for Significance: t Test
Rejection Rule
Reject H0 if p-value < α, or if t ≤ −tα/2 or t ≥ tα/2
where:
tα/2 is based on a t distribution
with n − 2 degrees of freedom
Testing for Significance: F Test
Hypotheses
H0: β1 = 0
Ha: β1 ≠ 0
Test Statistic
F = MSR/MSE
Testing for Significance: F Test
Rejection Rule
Reject H0 if
p-value < α
or F ≥ Fα
where:
Fα is based on an F distribution with
1 degree of freedom in the numerator and
n − 2 degrees of freedom in the denominator
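A sketch combining the t and F statistics for the slope, on hypothetical data; in simple linear regression F = t², which the example confirms:

```python
from math import sqrt

def slope_significance(xs, ys):
    """Fit the least squares line, then form the statistics for
    H0: beta1 = 0.  t = b1 / s_b1 with s_b1 = s / sqrt(sum((xi - xbar)^2))
    and s^2 = MSE = SSE/(n - 2); F = MSR/MSE with MSR = SSR/1."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    b0 = ybar - b1 * xbar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - ybar) ** 2 for y in ys)
    mse = sse / (n - 2)          # s^2, the estimate of sigma^2
    s_b1 = sqrt(mse / sxx)       # estimated standard deviation of b1
    t = b1 / s_b1
    f = (sst - sse) / mse        # MSR = SSR / 1 (one independent variable)
    return t, f

# Hypothetical data, roughly y = 2x + 1 plus small errors
t, f = slope_significance([1, 2, 3, 4, 5], [3.1, 4.9, 7.2, 8.8, 11.0])
```

Both statistics would then be compared against tα/2 (n − 2 df) and Fα (1 and n − 2 df) respectively.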
Multiple Regression
Multiple Regression Model
Least Squares Method
Multiple Coefficient of Determination
Model Assumptions
Testing for Significance
Using the Estimated Regression Equation
for Estimation and Prediction
Qualitative Independent Variables
Residual Analysis
Logistic Regression
Multiple Regression Model
Multiple Regression Model
The equation that describes how the dependent
variable y is related to the independent variables
x1, x2, . . . , xp and an error term is:
y = β0 + β1x1 + β2x2 + . . . + βpxp + ε
where:
β0, β1, β2, . . . , βp are the parameters, and
ε is a random variable called the error term
Multiple Regression Equation
ŷ = b0 + b1x1 + b2x2 + . . . + bpxp
The sample statistics b0, b1, b2, . . . , bp provide estimates of the parameters β0, β1, β2, . . . , βp.
Least Squares Method
Least Squares Criterion