Department of Economics
Faculty of Economics and Business
Universitas Hasanuddin
Outline
1. Basic statistical analysis: tests for one or two means
2. Basic statistical analysis: bivariate correlation
3. Univariate OLS Regression
4. Empirical Analysis using STATA
Goals of today’s workshop
• Understand the concepts behind some basic statistical analyses: the t-test and correlation
• Gain basic knowledge of how the univariate OLS model works and of the model's assumptions
• Gain insight into how to perform basic statistical analysis and univariate OLS regression in STATA
1. Basic Statistical Analysis
Types of Statistical Methods
• Econometric research relies on many statistical tests
• Key insight of statistics: one can learn about a characteristic of a population by selecting a random sample from that population
• Using statistical methods, we can use this random sample to draw statistical inferences about characteristics of the full population
• Three types of statistical methods are used throughout econometrics:
1. Estimation
2. Hypothesis testing
3. Confidence intervals
Types of Statistical Methods: Estimation
• Estimation is a statistical method that computes a “best guess” numerical value for an unknown characteristic of a population using sample data
• An estimator is a function of sample data drawn randomly from a population, used to infer an unknown parameter. Hence:
1. An estimator is a random variable because of the randomness in selecting the sample
2. An estimate is the numerical value of the estimator when it is actually computed using data from a specific sample
• Examples:
1. The sample mean is an estimator of the population mean
2. The sample variance is an estimator of the population variance
3. The sample covariance/correlation is an estimator of the population covariance/correlation
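To make the estimator/estimate distinction concrete, here is a minimal Python sketch (standard library only). The population of heights, its mean of 67 and standard deviation of 4, and the names `sample_mean` and `population` are all assumptions made for illustration; they do not come from the workshop data.

```python
import random

def sample_mean(sample):
    """The estimator: a rule that maps any sample to a number."""
    return sum(sample) / len(sample)

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical population (illustrative assumption, not real data)
population = [rng.gauss(67.0, 4.0) for _ in range(100_000)]

# Applying the same estimator to five different random samples
# gives five different estimates.
estimates = [sample_mean(rng.sample(population, 50)) for _ in range(5)]
print(estimates)
```

The function is the estimator; each number in `estimates` is one estimate. The numbers differ only because the samples differ, which is why the estimator is a random variable.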
Desirable Properties of the Estimator
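1. Unbiasedness
• $\hat{\mu}_Y$ is an unbiased estimator of $\mu_Y$ if $E(\hat{\mu}_Y) = \mu_Y$, where $E(\hat{\mu}_Y)$ is the mean of the sampling distribution of $\hat{\mu}_Y$
2. Consistency
• As the sample size increases, the sampling distribution of the estimator becomes increasingly concentrated at the true parameter value
• $\hat{\mu}_Y$ is a consistent estimator of $\mu_Y$ if $\hat{\mu}_Y \xrightarrow{p} \mu_Y$ as $n$ gets larger
3. Efficiency
• The efficient estimator has the smallest variance among the unbiased estimators
• $\hat{\mu}_Y$ is more efficient than $\tilde{\mu}_Y$ if $\mathrm{var}(\hat{\mu}_Y) < \mathrm{var}(\tilde{\mu}_Y)$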
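Unbiasedness and consistency can be illustrated with a small simulation. The sketch below is only an illustration under assumed values: the true mean `MU = 10`, standard deviation `SIGMA = 2`, the sample sizes, and the number of replications are all hypothetical choices.

```python
import random
import statistics

rng = random.Random(1)   # fixed seed for reproducibility
MU, SIGMA = 10.0, 2.0    # hypothetical true parameters (assumption)

def simulate_sample_means(n, reps=500):
    """Draw `reps` random samples of size n; return the sample mean of each."""
    return [
        statistics.fmean(rng.gauss(MU, SIGMA) for _ in range(n))
        for _ in range(reps)
    ]

results = {}
for n in (10, 100, 1000):
    means = simulate_sample_means(n)
    # (center of the sampling distribution, spread of the sampling distribution)
    results[n] = (statistics.fmean(means), statistics.stdev(means))
    print(n, results[n])
```

For every sample size the center of the simulated sampling distribution stays close to `MU` (unbiasedness), while its spread shrinks as `n` grows (consistency).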
Hypothesis testing : terminology (1)
• Hypothesis testing is the process of formulating a specific hypothesis about the population, then using the sample evidence to decide whether it should be rejected
• The null hypothesis is that the population mean takes on a specific value: $H_0: E(Y) = \mu_{Y,0}$
• The two-sided alternative hypothesis specifies what is true if $H_0$ is not: $H_1: E(Y) \neq \mu_{Y,0}$
• Type I error: $H_0$ is rejected when in fact it is true
• Type II error: $H_0$ is not rejected when in fact it is not true
• Significance level of the test: the probability of a Type I error, $\alpha$. Often prespecified at 5%
Hypothesis testing : terminology (2)
• The power of the test: the probability that the test correctly rejects $H_0$ when the alternative is true: $1 - \Pr(\text{Type II error})$
• The p-value: the smallest significance level at which you can reject $H_0$. If the significance level is prespecified at 5%, we reject $H_0$ if $p < 0.05$
• A test statistic is a statistic used to perform a hypothesis test, e.g. the t-statistic: $t = \frac{\bar{Y} - \mu_{Y,0}}{SE(\bar{Y})}$
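The t-statistic and the p-value can be computed by hand as a check on the definitions. This is a minimal sketch, not the procedure STATA uses: it assumes the large-sample normal approximation for the p-value (STATA's `ttest` uses the Student t distribution), and the `heights` data and the null value of 65 are made up for illustration.

```python
import math
from statistics import NormalDist, mean, stdev

def t_test_mean(sample, mu_0):
    """Two-sided test of H0: E(Y) = mu_0 (normal approximation for the p-value)."""
    n = len(sample)
    y_bar = mean(sample)
    se = stdev(sample) / math.sqrt(n)         # standard error of the sample mean
    t = (y_bar - mu_0) / se                   # t-statistic
    p_value = 2 * NormalDist().cdf(-abs(t))   # two-sided p-value
    return t, p_value

# Made-up data (assumption for illustration only)
heights = [64.0, 66.5, 67.0, 68.5, 70.0, 65.5, 69.0, 71.5, 66.0, 68.0]
t, p = t_test_mean(heights, mu_0=65.0)
print(t, p, "reject H0 at 5%" if p < 0.05 else "do not reject H0 at 5%")
```

With only ten observations the normal approximation is rough; it is used here only to keep the sketch within the standard library.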
Hypothesis testing & Confidence interval
• Critical value of the test statistic: the value of the test statistic for which the test just rejects $H_0$ at the prespecified significance level
• Rejection region: the set of values of the t-statistic for which the test rejects $H_0$
• Confidence interval: an interval constructed from a random sample that contains the true population mean with a prespecified probability
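A confidence interval for the population mean can be sketched in the same way. The example assumes the large-sample normal critical value (about 1.96 for a 95% interval) and uses made-up data; the function name `confidence_interval` is hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev

def confidence_interval(sample, level=0.95):
    """Large-sample confidence interval for the population mean."""
    n = len(sample)
    y_bar = mean(sample)
    se = stdev(sample) / math.sqrt(n)                    # standard error of the mean
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)        # critical value, about 1.96 for 95%
    return y_bar - z * se, y_bar + z * se

# Made-up data (assumption for illustration only)
heights = [64.0, 66.5, 67.0, 68.5, 70.0, 65.5, 69.0, 71.5, 66.0, 68.0]
lower, upper = confidence_interval(heights)
print(lower, upper)
```

A hypothesized mean that falls outside the 95% interval is exactly one that a two-sided test would reject at the 5% level, which links the confidence interval to the rejection region.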
t-test
• The t-test is one of the statistical tests used to perform hypothesis testing about the population mean
• The procedure is the same as described in the previous section
• First, formulate a specific hypothesis:
• Null hypothesis: $H_0: E(Y) = \mu_{Y,0}$
• Alternative hypothesis: $H_1: E(Y) \neq \mu_{Y,0}$
• Set the prespecified significance level, e.g. 5%
• Compute the t-statistic: $t = \frac{\bar{Y} - \mu_{Y,0}}{SE(\bar{Y})}$, with $SE(\bar{Y}) = s_Y / \sqrt{n}$
• To note:
Correlation ≠ Causality !
Correlation and covariance (2)
• The sample covariance and correlation are estimators of the
population covariance and correlation
• The sample covariance is denoted by $s_{XY}$: $s_{XY} = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})$
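A minimal sketch of the sample covariance and the sample correlation, written out by hand with the usual $n-1$ divisor. The function names and the toy data are hypothetical.

```python
import math

def sample_cov(x, y):
    """Sample covariance with the n - 1 divisor."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)

def sample_corr(x, y):
    """Sample correlation: covariance divided by the product of standard deviations."""
    return sample_cov(x, y) / math.sqrt(sample_cov(x, x) * sample_cov(y, y))

# Toy data (assumption): y is an exact linear function of x
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0, 8.0, 10.0]
print(sample_cov(x, y), sample_corr(x, y))
```

The covariance depends on the units of the variables, while the correlation is unit-free and always lies between -1 and 1.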
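• OLS minimizes the sum of squared residuals $\sum_{i=1}^{n} \hat{u}_i^2 = \sum_{i=1}^{n} (Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i)^2$ by choosing $\hat{\beta}_0$ and $\hat{\beta}_1$, so the first-order conditions (FOC) are:
$\sum_{i=1}^{n} (Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i) = 0$ and $\sum_{i=1}^{n} X_i (Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i) = 0$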
Estimating the coefficients of the linear
regression model (2)
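• Solving the minimization problem gives:
• $\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2} = \frac{s_{XY}}{s_X^2}$ and $\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$
• $\hat{\beta}_0$ and $\hat{\beta}_1$ are the estimators of $\beta_0$ and $\beta_1$, respectively
• $\hat{\beta}_0$ and $\hat{\beta}_1$ are functions of the sample data, and hence random variables
• Therefore, $\hat{\beta}_0$ and $\hat{\beta}_1$ have sampling distributions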
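The closed-form OLS solution for a single regressor can be sketched in a few lines. The function name `ols_fit` and the toy data are hypothetical; the slope is the sum of cross-deviations divided by the sum of squared deviations of X, and the intercept follows from the sample means.

```python
def ols_fit(x, y):
    """Closed-form OLS estimates (intercept, slope) for a single regressor."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    beta1_hat = s_xy / s_xx                 # slope
    beta0_hat = y_bar - beta1_hat * x_bar   # intercept
    return beta0_hat, beta1_hat

# Toy data (assumption for illustration only)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
beta0_hat, beta1_hat = ols_fit(x, y)

# The first-order conditions imply that the residuals sum to zero
# and are uncorrelated with the regressor.
residuals = [yi - beta0_hat - beta1_hat * xi for xi, yi in zip(x, y)]
print(beta0_hat, beta1_hat, sum(residuals), sum(xi * u for xi, u in zip(x, residuals)))
```

A different random sample would give different values of `beta0_hat` and `beta1_hat`, which is the sense in which the OLS estimators have sampling distributions.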
Estimating the coefficients of the linear
regression model (3)
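• The OLS predicted values, $\hat{Y}_i$, and residuals, $\hat{u}_i$, are: $\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$ and $\hat{u}_i = Y_i - \hat{Y}_i$
• The variance of the OLS estimators is often computed assuming homoscedastic error terms: $\mathrm{var}(\hat{\beta}_1 \mid X) = \frac{\sigma_u^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}$ and $\mathrm{var}(\hat{\beta}_0 \mid X) = \frac{\sigma_u^2 \sum_{i=1}^{n} X_i^2}{n \sum_{i=1}^{n} (X_i - \bar{X})^2}$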
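The predicted values, residuals, and homoscedasticity-only standard errors can be sketched as follows. This assumes the standard textbook formulas, with the error variance estimated by the sum of squared residuals divided by $n-2$; the function name `ols_with_se` and the toy data are hypothetical.

```python
import math

def ols_with_se(x, y):
    """OLS fit plus homoscedasticity-only standard errors for a single regressor."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    beta1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    beta0 = y_bar - beta1 * x_bar

    fitted = [beta0 + beta1 * xi for xi in x]            # predicted values
    residuals = [yi - fi for yi, fi in zip(y, fitted)]   # OLS residuals

    # Estimate of the error variance, with n - 2 degrees of freedom
    sigma2_hat = sum(u * u for u in residuals) / (n - 2)
    se_beta1 = math.sqrt(sigma2_hat / s_xx)
    se_beta0 = math.sqrt(sigma2_hat * sum(xi * xi for xi in x) / (n * s_xx))
    return {
        "beta0": beta0, "beta1": beta1,
        "fitted": fitted, "residuals": residuals,
        "se_beta0": se_beta0, "se_beta1": se_beta1,
    }

# Toy data (assumption for illustration only)
res = ols_with_se([1.0, 2.0, 3.0, 4.0, 5.0], [2.1, 3.9, 6.2, 7.8, 10.1])
print(res["beta1"], res["se_beta1"], res["se_beta0"])
```

If the errors are heteroscedastic these standard errors are not valid; in STATA the `robust` option of `regress` requests heteroscedasticity-robust standard errors instead.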
• In econometric research, we often want to show that a variable has a causal effect on the outcome (dependent) variable. Recall the model: $Y_i = \beta_0 + \beta_1 X_i + u_i$
• Using the sample data, we estimate $\beta_1$ with its estimator $\hat{\beta}_1$
• Given the sample evidence, we test whether the estimated effect is statistically significant. If it is not, we cannot conclude that the population coefficient differs from zero, so we test: $H_0: \beta_1 = 0$ vs. $H_1: \beta_1 \neq 0$
Hypothesis test in the regression with a single
regressor (3)
• Three steps for testing a hypothesis about the slope $\beta_1$:
1. Compute the standard error of $\hat{\beta}_1$, $SE(\hat{\beta}_1)$
2. Compute the t-statistic: $t = \frac{\hat{\beta}_1 - \beta_{1,0}}{SE(\hat{\beta}_1)}$
3. Compute the p-value. Reject the hypothesis at the 5% significance level if the p-value is less than 0.05 or, equivalently, if $|t| > 1.96$
• The standard error and (typically) the t-statistic and p-value for testing $\beta_1 = 0$ are computed automatically by the software
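The three steps can be sketched as a small function. The inputs below are hypothetical numbers, not estimates from the workshop dataset, and the p-value uses the large-sample normal approximation, so it can differ slightly from the t-distribution p-value that regression software reports.

```python
from statistics import NormalDist

def slope_t_test(beta1_hat, se_beta1, beta1_null=0.0, alpha=0.05):
    """Two-sided test of H0: beta1 = beta1_null (normal approximation)."""
    t = (beta1_hat - beta1_null) / se_beta1   # step 2: t-statistic
    p_value = 2 * NormalDist().cdf(-abs(t))   # step 3: two-sided p-value
    return t, p_value, p_value < alpha

# Hypothetical estimate and standard error (assumption for illustration only)
t, p, reject = slope_t_test(beta1_hat=0.70, se_beta1=0.05)
print(t, p, reject)
```

Rejecting when the p-value is below 0.05 is the same decision as rejecting when the absolute t-statistic exceeds about 1.96, because 1.96 is the 97.5th percentile of the standard normal distribution.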
3. STATA Practice: basic statistics & univariate OLS model
Introduction to the dataset
• In this session we use the Earnings_and_Height dataset
• This is the dataset used by Professors Anne Case (Princeton University) and Christina Paxson (Brown University) in their paper “Stature and Status: Height, Ability, and Labor Market Outcomes”, published in the Journal of Political Economy, 2008
• The dataset contains data on earnings, height, and other characteristics of a random sample of US workers
• The dataset is supplied by Pearson Education, the publisher of Introduction to Econometrics by Stock & Watson
The setting of the analysis
• Now we would like to regress Earnings on Height and estimate this model: $Earnings_i = \beta_0 + \beta_1 Height_i + u_i$