
Financial Econometrics

Unit 1
Course Instructor: Aparna Krishna
Winter 2023 – 2024
IIT (ISM) Dhanbad
About the Course
• Objectives:
• Provide knowledge of modern econometric techniques commonly
employed in the finance literature.

• Learning Outcomes:
• Understand the essential foundations of time series models.
• Construct and evaluate forecast models using financial time-series.
• Explain and apply models of volatility using financial time-series.
Course Outline
• Unit 1: Foundations of time series models – construct and forecast models

• Unit 2: Multivariate models, Modelling long-run relationships in finance

• Unit 3: Understanding and estimating volatility models

• Unit 4: Understanding, constructing and estimating panel data models


Reference Books
• Introductory Econometrics for Finance, 2nd Edition, Chris Brooks, Cambridge University Press (2014)

• Introduction to Econometrics, 4th Edition, Christopher Dougherty, Oxford University Press (2011)

• Basic Econometrics, Fifth Edition, Damodar N Gujarati, Dawn C Porter, Sangeetha Gunasekar
Evaluation Components
• 2 Assignments (in groups): 20% weightage

• Midsemester exam: 32% weightage

• End semester exam: 48% weightage


Econometrics: application of mathematical statistics to
economic data to lend empirical support to the models
Descriptive Statistics vs Inferential Statistics

• Descriptive Stats: Describes sample of a population


• Inferential Stats: Makes statement about a population through a sample
Example: Descriptive Stats

• Describes and evaluates properties of a sample
• Common ways:
• Location parameters (mean, median, modal value, sum)
• Dispersion parameters (standard deviation, variance, range)
• Frequency tables
• Graphics
Example: Inferential Stats
Tests statements about a population on the basis of sample characteristics

• Helps us draw conclusions that go beyond the available data
• Techniques:
• Simple test procedures (t-test, binomial test, chi-square test, ...)
• Regression analysis (simple linear, multiple, logistic regression, ...)
• Correlation analysis (Pearson, Spearman rank, ...)
Econometrics: application of mathematical statistics to
economic data to lend empirical support to the models
Economic Model Vs Econometrics Model
• Economic model: Theoretical construct which represents economic processes by a set
of variables and logical relationships between them

• Example: Economic model describing wage determinants for workers

Wage = f(education, experience, training)

• More generally,

Y = f(X1, X2, X3)

Y: explained variable; X: explanatory variables; f: unknown function connecting X and Y

• Qs: Can this model be tested?


Economic Model Vs Econometrics Model
• Econometric models: Specify relationship between variables + lend themselves to
empirical testing

• Example: Econometric model describing wage determinants for workers


Wage = β0 + β1 education + β2 experience + β3 training + u

• More generally,
Y = β0 + β1X1 + β2X2 + β3X3 + u
Y: dependent variable
X1, X2, X3: independent variables
β0, β1, β2, β3: coefficients to be estimated
u: error term representing the combined effect of omitted variables
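A minimal sketch of how such a model could be estimated, assuming Python with numpy and statsmodels; the data and coefficients below are entirely synthetic, invented for the illustration, not taken from the course or any study.

# Hypothetical simulation: none of these numbers come from the course
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
education = rng.integers(8, 21, n).astype(float)    # years of schooling (invented)
experience = rng.integers(0, 31, n).astype(float)   # years of experience (invented)
training = rng.integers(0, 2, n).astype(float)      # 1 if formally trained, else 0 (invented)
u = rng.normal(0, 5.0, n)                           # error term: omitted influences

# "True" coefficients, chosen only to generate data for the illustration
wage = 10 + 1.5 * education + 0.4 * experience + 3.0 * training + u

X = sm.add_constant(np.column_stack([education, experience, training]))
results = sm.OLS(wage, X).fit()
print(results.params)        # estimates of beta0, beta1, beta2, beta3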
Goals of Econometric Model
Wage = β0 + β1 education + β2 experience + β3 training + u

This model can be used to understand:

• What is the effect of education on wage (the sign of β1)?

• What is the magnitude of β1?

• Is β1 significantly different from zero?
Econometrics - Goals
• Econometrics may be used to:
• Estimate relationships between economic variables
• Test economic hypotheses and theories
• Evaluate effectiveness of a new policy
• Forecast economic variables
Types of Data and Notation
• 3 types of data used in econometrics:
1. Time series data
2. Cross-sectional data
3. Panel data, a combination of 1 & 2
Time Series Data
• Observations are over time on one or more variables
• Data collected with certain regularity.
• Example: GNP or unemployment (monthly, or quarterly); government budget deficit
(annually); money supply (weekly); value of a stock market index (as transactions
occur)
• Examples of problems that could be tackled using a Time Series Regression
• How a country’s stock index has varied with that country’s macroeconomic
fundamentals.
• How a company’s stock price has varied with announcement of dividend payment.
• The effect on a country’s currency of an increase in its interest rate
Time Series Regression Equation
• A typical specification: yt = α + βxt + ut
• Individual observations are indexed by t; the total number of observations available for analysis is T
Cross-Sectional Data
• Observations on one or more variables at a single point in time

• Examples:
- A poll of usage of internet stock broking services
- Cross-section of stock returns on the New York Stock Exchange
- A sample of bond credit ratings for UK banks

• Examples of problems that could be tackled using a Cross-Sectional Regression
- Relationship between age and internet usage
- Relationship between company size and stock returns
- Relationship between bank ownership and credit ratings
Cross-Section Regression Equation
• A typical specification: yi = α + βxi + ui
• Individual observations are indexed by i; the total number of observations available for analysis is I
Panel Data

• Has the dimensions of both time series and cross-sections

• Example: annual income and age of individuals over a nine-year period

• In the regression equation, observations are identified by both respondent id and time period
Continuous and Discrete Data

• Continuous data: can take any value and are not confined to specific numbers
• For example, the rental yield on a property could be 6.2%, 6.24%, or 6.238%.

• Discrete data: can take only certain values. Eg:
• Integers: the number of people on a particular train, or the number of shares traded during a day
• Non-integer discrete values: many financial asset prices were quoted to the nearest 1/16 or 1/32 of a dollar
Cardinal, Ordinal and Nominal Numbers
• Cardinal numbers: the actual numerical values that a variable takes have meaning; there is an equal distance between the numerical values.
• Example: the price of a share or of a building, and the number of houses in a street.

• Ordinal numbers: only provide a position or an ordering.
• Example: the position of a runner in a race.
• On an ordinal scale, a figure of 12 may be viewed as 'better' than a figure of 6, but could not be considered twice as good.

• Nominal numbers: no natural ordering of the values at all.
• Example: arise when numerical values are arbitrarily assigned, such as telephone numbers, or when codings are assigned to qualitative data (e.g. when describing the exchange that a US stock is traded on).

• Cardinal, ordinal and nominal variables may require different modelling approaches, or at least different treatments.
Overview of Classical Linear Regression Model (CLRM)
Regression analysis is concerned with the study of the dependence of one variable,
the dependent variable, on one or more other variables, the explanatory variables,
with a view to estimating and/or predicting the (population) mean or average value
of the former in terms of the known values of the latter.
Example
• Qs: Do people’s expenses increase as their income increases?
Data:
Income 80: Expenses 55, 60, 65, 70, 75
Income 100: Expenses 65, 70, 74, 80, 85, 88

• What is the dependent variable? The independent variable?
• What do the different income levels denote?
• Please compute:
a) The average value of expenses
b) The average value of expenses when income is 80 and 100 respectively
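A short sketch in Python of the computations asked for above, using the table data as given:

income  = [80, 80, 80, 80, 80, 100, 100, 100, 100, 100, 100]
expense = [55, 60, 65, 70, 75, 65, 70, 74, 80, 85, 88]

e_y = sum(expense) / len(expense)        # unconditional mean E(Y)
e_y_80  = sum(e for x, e in zip(income, expense) if x == 80) / income.count(80)
e_y_100 = sum(e for x, e in zip(income, expense) if x == 100) / income.count(100)

print(e_y)       # approx. 71.55
print(e_y_80)    # 65.0 -> E(Y | X = 80)
print(e_y_100)   # 77.0 -> E(Y | X = 100)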
Conditional vs Unconditional Mean
• Average value of the dependent variable: unconditional expected value, or E(Y)

• Average value of the dependent variable for various sub-populations: conditional expected value, or E(Y | X)

• The 'average value' in the definition of regression is the conditional expected value.
Population Regression Line (1)
Population Regression Line (2)
• Population Regression Line: indicates the expected value of the dependent variable conditional on one or more independent variables

• Also known as the population regression curve

• Purpose of regression: to find the PRL which best represents the conditional mean of Y given a certain value of X
Notations: PRF
E(Y | Xi ) = f (Xi )
Conditional Expectation Function or Population Regression Function

E(Y | Xi ) = β1 + β2Xi
Linear Regression Function or Linear regression equation

β1 and β2: Unknown but fixed parameters


Linearity in Variables
• X appears only to the first power

• Eg: E(Y | Xi ) = β1 + β2Xi

• Geometrically, the regression curve is a straight line
Linearity in Parameters
• Linear in β, not necessarily
in X

• ‘Linear’ in regression
always means linear in
parameters, and not
necessarily in explanatory
variables
Stochastic Specification of PRF (1)
Stochastic Specification of PRF (2)
• Stochastic: having a random probability distribution or pattern that may be analysed statistically but may not be predicted precisely

• Given an explanatory variable, values of the dependent variable are clustered around the average

• Need to take into account the deviation of individual Yi from the expected value
Stochastic Specification of PRF (3)
• Deviation of individual Yi around the expected value:

ui = Yi − E(Y | Xi )

• Rearranging:

Yi = E(Y | Xi ) + ui

ui: stochastic disturbance or stochastic error term

• It is an unobservable random variable

• Its conditional mean is 0, i.e. E(ui | Xi ) = 0

[Figure: observed yi vs fitted ŷi, with residual ûi]
Need for a Stochastic Disturbance Term
• Disturbance term ui is a surrogate for all those variables that are
omitted from the model but that collectively affect Y
• Why not include all those variables in regression equation:
• Vagueness of theory
• Unavailability of data
• Core variables vs peripheral variables
• Intrinsic randomness in human behaviour
• Poor proxy variable
• Principle of parsimony
• Wrong functional form
Example 1: Food and Total Expenditure
• As total expenditure increases, on average expenditure on food also increases

• However, there is greater variability once total expenditure exceeds 600

• Food expenditure does not increase linearly forever: once basic needs are satisfied, people spend relatively less on food

[Figure: scatter plot of food expenditure against total expenditure]
Example 2: Gender-wise scores
[Figures: average reading and maths scores for males and females, 1970-2010]

• Downward trend in reading scores

• Upward trend in maths scores
Example 2: Gender-wise scores
• Regress maths score on reading score

• Compute the coefficient (Layout 9 in quick layout)

[Figure: scatter of maths vs reading scores with fitted line y = 1.3567x − 181.57, R² = 0.2282]
Example 3: Income and Scores
[Figures: reading, maths and writing scores plotted against average income]

• Scores increase as average incomes increase

• Reasons?
Ordinary Least Squares
Sample Regression Function

• Population data: accurate but unavailable
• Sample data: readily available but inaccurate
Fluctuations due to Sampling
Sample Regression Function
Key Takeaway
• Objective of regression analysis: to estimate the PRF on the basis of the SRF

Making Best Use of SRF

• The SRF may over- or underestimate the PRF

• Need a rule or method to make the SRF approximate the PRF as closely as possible
Minimizing Deviation Between Actual and Computed

• The SRF may be made more accurate by reducing the difference between the actual and estimated Y values

• 2 problems with simply summing the residuals:
- Each residual gets equal weight
- Residuals may cancel each other out

• Solution: minimise the square of the difference between the actual and fitted values
Ordinary Least Squares
• A method for choosing the parameters in a linear regression equation
• The principle of least squares is followed: the sum of squares of the error terms is minimised
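A minimal sketch of the least-squares principle in Python, reusing the income-expense data from the earlier example; the closed-form slope and intercept below are the standard OLS formulas.

import numpy as np

x = np.array([80, 80, 80, 80, 80, 100, 100, 100, 100, 100, 100], dtype=float)
y = np.array([55, 60, 65, 70, 75, 65, 70, 74, 80, 85, 88], dtype=float)

# Closed-form OLS: the slope and intercept that minimise the sum of squared residuals
b2 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b1 = y.mean() - b2 * x.mean()

residuals = y - (b1 + b2 * x)
print(b1, b2)                    # intercept and slope estimates
print(np.sum(residuals ** 2))    # the minimised residual sum of squares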
Quick Recap
• Aim: to model a relationship, known as the Population Regression Function

• The equation is linear in the parameters

• Step 1: Create a scatter plot of the data, with the independent/explanatory variable on the X axis and the dependent/explained variable on the Y axis

• Step 2: Draw a line of best fit through the data using the principle of ordinary least squares

• Output of regression: Sample Regression Line
Assumptions underlying CLRM
• Assumptions regarding the X variable(s) and the error term are needed in order to make valid statistical inferences about the coefficients and the dependent variable

• Assumption 1: The regression model is linear in parameters; it may or may not be linear in variables

• Assumption 2: Fixed values of X, or X independent of the error term; cov(Xi , ui) = 0
• X either fixed or sampled along with the dependent variable
• X can also be considered nonrandom
• Made for the sake of simplicity; realistic in data collection settings

• Assumption 3: Zero mean value of the disturbance ui ; E(ui | Xi) = 0 or E(ui) = 0
• Positive ui cancel out negative ui
• No specification bias or specification error
• Assumption 4: Homoscedasticity, or constant variance of ui ; Var(ui) = σ²
• Such a variance follows homoscedasticity - equal (homo) spread (scedasticity)
• The variance of the errors is constant over all values of X
• Needed to ensure that the Y values corresponding to all Xs are equally reliable
• Also implies that the conditional variances of Yi are homoscedastic

[Figure: homoscedasticity vs heteroscedasticity]

• Assumptions 3 and 4, combined with an assumption of normality, are often written compactly as ui ∼ N(0, σ²)
• Assumption 5: No autocorrelation between the disturbances; Cov(ui , uj) = 0 for i ≠ j
• The errors for any two observations are statistically independent of each other
• Also called the assumption of no serial correlation, or no autocorrelation
• Helps with easier interpretation of Y in terms of X
• The assumption is easy to justify in cross-sections but difficult in time series
• Assumption 6: The number of observations (n) must be greater than the number of parameters to be estimated

• Assumption 7: Nature of the X variables: var(X) must be a positive number
• The X values in a given sample must not all be the same - otherwise it is not possible to compute the parameters
• No outliers (values very large in relation to other observations) in X - otherwise the regression results may be dominated by the outliers
'BLUE' Estimators
• If the assumptions hold, then the estimators determined by OLS are the Best Linear Unbiased Estimators (BLUE), as per the Gauss-Markov theorem
• "Best" - the OLS estimator has minimum variance among the class of linear unbiased estimators
• "Linear" - it is a linear function of the dependent variable Y in the regression model
• "Unbiased" - on average, the expected value of the estimator equals the true value
• "Estimator" - it is an estimator of the true population value of the parameter
Excel Exercise
Automobile Weight vs Gasoline Consumption Example

1) Check for variation between independent and dependent variables


2) Check for relationship between x and y
3) Conduct regression analysis
4) Check for variance and expected value of residuals
5) Check for normality of residuals
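A sketch of the same five checks in Python (the course carries them out in Excel); the weight and gallons figures here are made up for illustration and are not the course dataset.

import numpy as np
import statsmodels.api as sm
from scipy import stats

weight  = np.array([2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0])   # 1000 lb (hypothetical values)
gallons = np.array([3.8, 4.4, 5.5, 6.1, 7.0, 8.2, 9.1])   # per 100 miles (hypothetical values)

print(weight.var(ddof=1), gallons.var(ddof=1))    # 1) variation in x and y
print(np.corrcoef(weight, gallons)[0, 1])         # 2) relationship between x and y

res = sm.OLS(gallons, sm.add_constant(weight)).fit()   # 3) regression analysis
print(res.params)

e = res.resid
print(e.mean(), e.var(ddof=2))                    # 4) E(resid) close to 0, and its variance
print(stats.shapiro(e))                           # 5) normality test on the residuals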
[Figure: scatter plot of y against x]
[Figure: actual vs predicted values - Weight1000lb line fit plot of GallonsPer100Miles]
[Figure: residuals e plotted against x]
[Figure: normal probability plot of the residuals]


Precision and Goodness of Fit
Standard Errors of Least-Squares Estimates
• Measure the reliability and precision of the estimators
• Origins in the standard deviation - the distance of observations from the mean

Formula for variance: Var(X) = Σ(Xi − X̄)² / (n − 1)
Degrees of Freedom
• Def: the number of independent pieces of information available to the researcher for estimating a population value

• The higher the DoF, the better the chances of estimating the population value precisely

• Can also indicate the minimum number of values needed for estimation:
• DoF of the mean/average: n
• DoF of the variance: n − 1
• DoF of a (bivariate) regression: n − 2
Standard Errors of Least-Squares Estimates
• SE: absolute measure of the typical distance that data points fall from the regression line
• Based on the variance of the error term, estimated as s² = Σût² / (T − 2)

• SE(α̂) = s √( Σxt² / (T Σ(xt − x̄)²) )
• SE(β̂) = s / √( Σ(xt − x̄)² )

Note:
• Inverse relationship between the variance of an estimator and the variation in the independent variable. Hence the greater the variance in X, the greater the precision in computing the estimators
• The bigger the sample size, the more terms in the sums, and the more precise the estimator
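A sketch computing these standard errors directly from the formulas above, again reusing the income-expense data from the earlier example:

import numpy as np

x = np.array([80, 80, 80, 80, 80, 100, 100, 100, 100, 100, 100], dtype=float)
y = np.array([55, 60, 65, 70, 75, 65, 70, 74, 80, 85, 88], dtype=float)
T = len(y)

b2 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b1 = y.mean() - b2 * x.mean()
u_hat = y - (b1 + b2 * x)

s2 = np.sum(u_hat ** 2) / (T - 2)                    # estimated error variance
se_b2 = np.sqrt(s2 / np.sum((x - x.mean()) ** 2))    # SE of the slope
se_b1 = np.sqrt(s2 * np.sum(x ** 2) / (T * np.sum((x - x.mean()) ** 2)))  # SE of the intercept
print(se_b1, se_b2)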
Some Comments on the Standard Error Estimators

• Consider what happens if Σ(xt − x̄)² is small or large:

[Figures: scatter plots with x values tightly clustered around x̄ vs widely dispersed]
Goodness of Fit: r2
• Goodness of fit checks how well the sample regression line fits the data
• Measures the proportion of the total variation in Y explained by the regression model

[Figures: scatter plots illustrating r2 = 0 and r2 = 1]
Goodness of Fit: r2
• TSS: Total Sum of Squares
• ESS: Explained Sum of Squares
• RSS: Residual Sum of Squares
• TSS = ESS + RSS
• r2 = ESS/TSS
• Properties of r2
• Nonnegative
• Takes a value between 0 and 1
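A short sketch verifying the decomposition TSS = ESS + RSS and r² = ESS/TSS on the same fitted line as before:

import numpy as np

x = np.array([80, 80, 80, 80, 80, 100, 100, 100, 100, 100, 100], dtype=float)
y = np.array([55, 60, 65, 70, 75, 65, 70, 74, 80, 85, 88], dtype=float)

b2 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b1 = y.mean() - b2 * x.mean()
y_hat = b1 + b2 * x

tss = np.sum((y - y.mean()) ** 2)       # total sum of squares
ess = np.sum((y_hat - y.mean()) ** 2)   # explained sum of squares
rss = np.sum((y - y_hat) ** 2)          # residual sum of squares

print(tss, ess + rss)    # equal (up to rounding) when the model has an intercept
print(ess / tss)         # r-squared, between 0 and 1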
SE vs r2
• SE: absolute measure of the typical distance that the data points fall from the regression line. It is in the units of the dependent variable

• r2: relative measure of the percentage of the dependent variable's variance that the model explains
Regression Output: Total Expenditure as a Predictor of Food Expenditure
Regression Output: Cell Phone Subscribers and Per Capita Income
Introduction to Statistical Inference
• We want to make inferences about the likely population values from the regression parameters.

Example: Suppose we have the following regression results:

ŷt = 20.3 + 0.5091xt
     (14.38)  (0.2561)

• β̂ = 0.5091 is a single (point) estimate of the unknown population parameter β. How "reliable" is this estimate?

• The reliability of the point estimate is measured by the coefficient's standard error.
Basic Concepts
• Hypothesis Testing: starting point of an econometric study
• Normal distribution: basis for making inferences about population
Hypothesis
• Hypothesis: a testable statement about the relationship between two or more variables

• Always made in pairs - null hypothesis (H0) and alternative hypothesis (H1)

• Null hypothesis: the independent variable (X) has no impact on the dependent variable (Y)

• Alternative hypothesis: the independent variable (X) has an impact on the dependent variable (Y)
Hypothesis Testing
• Use of statistical tools to decide whether the data at hand sufficiently support a particular hypothesis

• The estimated value of a coefficient is compared with its value under the null hypothesis

• Allows the researcher to make probabilistic statements about population parameters

• 2 ways of hypothesis testing: a) test of significance approach b) confidence interval approach
Normal Distribution
• A type of continuous probability distribution for a real-valued random variable
• Mean or expectation: μ
• Standard deviation: σ
• Total area under the curve: represents probability, sums to 1
• Bell shaped curve: a) the curve is symmetrical on both sides of the mean b) more observations lie closer to the mean
Using Normal Distribution
• If the distribution of a variable follows the ND, and its mean and SD are known, then one can compute the probability of an event taking place
• Eg: Scores of an exam are normally distributed with a mean of 65 and SD of 9. Find the probability that a score is a) less than 54 and b) at least 80

Step 1: Compute the Z score (tells how many SDs from the mean a value lies)
Step 2: Look up the Z table

Answer: when x < 54
z score: (54 − 65)/9 = −1.22
Corresponding area as per z table: 0.1112
Interpretation: the area to the left of z score −1.22 is 0.1112.
The probability that x is less than 54 is 11.12%

Answer: when x ≥ 80
z score: (80 − 65)/9 = 1.67
Corresponding area as per z table: 0.9525
Interpretation: the area to the left of z score 1.67 is 0.9525.
The probability that x is greater than or equal to 80 is 1 − 0.9525 = 4.75%
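The same probabilities can be checked in Python with scipy, which uses the exact z rather than a table lookup, hence tiny rounding differences:

from scipy.stats import norm

p_below_54   = norm.cdf(54, loc=65, scale=9)       # P(X < 54)
p_atleast_80 = 1 - norm.cdf(80, loc=65, scale=9)   # P(X >= 80)
print(p_below_54)     # approx. 0.111 (slide gets 0.1112 after rounding z to -1.22)
print(p_atleast_80)   # approx. 0.048 (slide gets 0.0475 after rounding z to 1.67)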
Standard Normal Distribution
• When actual values are replaced by z scores, the ND becomes the SND

• Mean value (μ) = 0

• X axis = Z score = standard deviations from the mean

• The probability of occurrence of a value can be easily determined
Probability Distribution of the Least Squares Estimators
• We assume that ut ∼ N(0, σ²)
• The least squares estimators are linear combinations of the random variables, i.e. β̂ = Σ wt yt
• A weighted sum of normal random variables is also normally distributed, so

α̂ ∼ N(α, Var(α̂))
β̂ ∼ N(β, Var(β̂))

• If the errors are not normally distributed, will the parameter estimates still be normally distributed?
• Yes, approximately, if the other assumptions of the CLRM hold and the sample size is sufficiently large
• This follows from the central limit theorem
Probability Distribution of the Least Squares Estimators (cont'd)

• Standard normal variates can be constructed from α̂ and β̂:

(α̂ − α) / √var(α̂) ∼ N(0,1)  and  (β̂ − β) / √var(β̂) ∼ N(0,1)

• But var(α̂) and var(β̂) are unknown, so

(α̂ − α) / SE(α̂) ∼ t(T−2)  and  (β̂ − β) / SE(β̂) ∼ t(T−2)

Standard Error: estimate of the standard deviation

Estimates of Population Parameters follow t-distribution

• Smaller samples (less than 100) are more likely to underestimate the population standard deviation

• The t-distribution has heavier tails than the normal distribution

[Figure: normal distribution vs t-distribution]
Testing Hypotheses: The Test of Significance Approach

• Assume the regression equation is given by yt = α + βxt + ut for t = 1, 2, ..., T

• Steps for the test of significance:

1. Estimate α̂, β̂ and SE(α̂), SE(β̂)
2. Calculate the test statistic: test statistic = (β̂ − β*) / SE(β̂), where β* is the value of β under the null hypothesis
3. Decide the nature of the distribution (t-distribution) and degrees of freedom (T − 2)
4. Choose a level of statistical significance
Significance Level and Rejection Region
• Significance level: the probability that a result could have occurred by chance
• The lower the level of significance, the lower the probability of the result occurring by chance
• Commonly used significance levels: 5%, 1% and 10%
• If results fall within the significance level, they are called 'statistically significant'
Rejection Region (5% significance level)
[Figures: f(x) densities showing rejection regions]
• 2-sided test: 95% non-rejection region, with a 2.5% rejection region in each tail
• 1-sided test (upper tail): 95% non-rejection region, 5% rejection region in the upper tail
• 1-sided test (lower tail): 95% non-rejection region, 5% rejection region in the lower tail
The Test of Significance Approach: Drawing Conclusions

5. Use the t-tables to obtain a critical value (values with which to compare the test
statistic)

6. If the test statistic lies in the rejection region then reject the null hypothesis (H0), else
do not reject H0.
Tests of Significance: An Example

• Regression result: ŷt = 20.3 + 0.5091xt
                          (14.38)  (0.2561)
T = 22

• Use the test of significance approach to test whether β = 1.

Steps:
o Set up the null and alternative hypotheses
o Compute the test stat
o Decide the level of significance
o Look up the critical value in the t-distribution table
o Compare the test stat with the critical value
Solution

• The hypotheses are: H0 : β = 1 ; H1 : β ≠ 1

test stat = (β̂ − β*) / SE(β̂)
          = (0.5091 − 1) / 0.2561 = −1.917

• tcrit = t20;5% = ±2.086

• |test stat| < tcrit, so do not reject H0
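The same test in Python, with the critical value taken from scipy rather than t-tables:

from scipy.stats import t

beta_hat, beta_star, se, T = 0.5091, 1.0, 0.2561, 22
test_stat = (beta_hat - beta_star) / se       # approx. -1.917
t_crit = t.ppf(1 - 0.05 / 2, df=T - 2)        # approx. 2.086 for a 2-sided test at 5%
print(test_stat, t_crit)
print(abs(test_stat) < t_crit)                # True -> do not reject H0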
Iterations of Example

a) Test more hypotheses: H0 : β = 0 or H0 : β = 2?

b) Change the significance level to 10% and 1%
Significance Level vs Confidence Level

• Significance level: a measure of the evidence in the data for rejecting the null hypothesis; represents the probability of a Type I error, i.e. rejecting the null hypothesis when it is true (false positive)

• Confidence level: indicates the probability of obtaining the same results if the data were recollected; indicates the probability of drawing accurate conclusions based on the sample data
Confidence Interval Approach to Hypothesis Testing

• Confidence interval: a range of values that summarises an estimate using its mean and deviation
• Directly related to the confidence level
• A 95% CI gives a range of values within which the 'true value' will occur 95% of the time
• If the hypothesised value of β falls within the CI, then the null hypothesis is not rejected
Confidence Interval Approach to Hypothesis Testing (Steps)

1. Calculate α̂, β̂ and SE(α̂), SE(β̂)

2. Choose a significance level α (again, the convention is 5%). This is equivalent to choosing a (1 − α)×100% confidence interval: a 5% significance level = a 95% confidence interval

3. Use the t-tables to find the appropriate critical value, which will again have T − 2 degrees of freedom

4. The confidence interval is given by (β̂ − tcrit · SE(β̂), β̂ + tcrit · SE(β̂))

5. Perform the test: if the hypothesised value of β (β*) lies outside the confidence interval, then reject the null hypothesis that β = β*; otherwise do not reject the null
Confidence Interval Approach: An Example
• Regression result: ŷt = 20.3 + 0.5091xt
                          (14.38)  (0.2561)
T = 22

• Use the confidence interval approach to test whether β = 1.

Steps:
o Set up the null and alternative hypotheses
o Decide the level of significance
o Look up the critical value in the t-distribution table
o Construct the confidence interval
o Check whether the hypothesised value lies within the interval
Solution

• The hypotheses are: H0 : β = 1 ; H1 : β ≠ 1

β̂ ± tcrit · SE(β̂)
= 0.5091 ± 2.086 × 0.2561
= (−0.0251, 1.0433)

• Since 1 lies within the confidence interval, do not reject the null hypothesis
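The same interval constructed in Python:

from scipy.stats import t

beta_hat, se, T = 0.5091, 0.2561, 22
t_crit = t.ppf(0.975, df=T - 2)                          # 5% significance, 2-sided
ci = (beta_hat - t_crit * se, beta_hat + t_crit * se)
print(ci)                        # approx. (-0.0251, 1.0433)
print(ci[0] <= 1.0 <= ci[1])     # True -> do not reject H0: beta = 1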


Confidence Intervals Versus Tests of Significance

• Note that the test of significance and confidence interval approaches always give the same answer.

• Under the test of significance approach, we would not reject H0 that β = β* if the test statistic lies within the non-rejection region, i.e. if

−tcrit ≤ (β̂ − β*) / SE(β̂) ≤ +tcrit

• Rearranging, we would not reject if

−tcrit · SE(β̂) ≤ β̂ − β* ≤ +tcrit · SE(β̂)

β̂ − tcrit · SE(β̂) ≤ β* ≤ β̂ + tcrit · SE(β̂)

• This is the same rule as under the confidence interval approach.


Some More Terminology

• If we reject the null hypothesis at the 5% level, we say that the result of the test is statistically
significant.

• Note that a statistically significant result may be of no practical significance. E.g. if a shipment of
cans of beans is expected to weigh 450g per tin, but the actual mean weight of some tins is 449g, the
result may be highly statistically significant but presumably nobody would care about 1g of beans.
Example: Stata Regression Output
Research setting: Studies show that exercising can help prevent heart disease. Within reasonable limits, the
more you exercise, the less risk you have of suffering from heart disease. One way in which exercise reduces
your risk of suffering from heart disease is by reducing a fat in your blood, called cholesterol. The more you
exercise, the lower your cholesterol concentration. Furthermore, it has recently been shown that the amount of
time you spend watching TV – an indicator of a sedentary lifestyle – might be a good predictor of heart disease
(that is, the more TV you watch, the greater your risk of heart disease).

Research thought process: Therefore, a researcher decided to determine if cholesterol concentration was related
to time spent watching TV in otherwise healthy 45 to 65 year old men (an at-risk category of people). For
example, as people spent more time watching TV, did their cholesterol concentration also increase (a positive
relationship); or did the opposite happen? The researcher also wanted to know the proportion of cholesterol
concentration that time spent watching TV could explain, as well as being able to predict cholesterol
concentration. The researcher could then determine whether, for example, people that spent eight hours
watching TV per day had dangerously high levels of cholesterol concentration compared to people watching just
two hours of TV.

Research set-up: To carry out the analysis, the researcher recruited 100 healthy male participants between the
ages of 45 and 65 years old. The amount of time spent watching TV (i.e., the independent variable, time_tv) and
cholesterol concentration (i.e., the dependent variable, cholesterol) were recorded for all 100 participants.
Expressed in variable terms, the researcher wanted to regress cholesterol on time_tv.
Stata Output: the third table shows results from parameter estimation

Components:
• Beta estimates
• Standard errors of the estimated parameters
• t-statistics of the estimated parameters (Coef. / Std. Err.)
• P>|t| : the p-value/probability associated with the t-statistic
• [95% Conf. Interval]: upper and lower boundaries of the 95% confidence interval for the coefficient

Note: the default null hypothesis is '0'

Stata Output Units:
• time_tv: time in minutes
• cholesterol: mmol/L (millimoles per litre)

Interpret the results:

• The portion of cholesterol concentration explained by TV time
• The direction of the impact of TV time on cholesterol (+ve/-ve)
• The impact of TV time duration on cholesterol
• The statistical significance of the results
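A sketch of how the same regression might be run outside Stata, in Python; the file name cholesterol.csv is hypothetical - only the variable names time_tv and cholesterol come from the slides.

import pandas as pd
import statsmodels.formula.api as smf

# "cholesterol.csv" is a hypothetical file with columns time_tv and cholesterol
df = pd.read_csv("cholesterol.csv")
results = smf.ols("cholesterol ~ time_tv", data=df).fit()

print(results.summary())            # R-squared, Coef., Std. Err., t, P>|t|, 95% CI
print(results.params["time_tv"])    # sign tells the direction (+ve/-ve) of the effect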
