What is econometrics?

Simple, non-technical introduction to Linear Regression/OLS as a technique

– This document is not meant for presentation and is best viewed in slideshow or printed format. It is meant to be ‘read’, not ‘presented’

– This document covers only the very basics of Econometrics. Econometrics – as a subject – is theoretically complex. The goal of this document is to empower the reader with an understanding of econometrics so she/he can discuss the topic with some confidence


– This document assumes ‘zero-knowledge’ in econometrics and in linear regression
– It may appear to be long-winded at times, but it is designed to be so in order to impress upon the reader the concepts that are being discussed herein
– Some online references and books are at the end of the document for those who are interested in further learning about econometric and statistical modeling


About this document
– Readers who have either a formal background in, or a keen interest in, statistics would find this document helpful in ‘transitioning’ towards econometric modeling…
– A conceptual understanding of linear regression will also be helpful to appreciate econometrics, but this document will assume zero-knowledge in regression
– Econometrics as a science is founded on complex equations and assumptions based on the theories of probability and statistics – these are not covered in this document.

What is econometrics?

“Econometrics? Isn’t that difficult?”

It’s full of formulas… and it could be complex

But…

Things must be made as simple as possible – but never simpler

This is an attempt to present econometrics as simply as possible…

What’s required to learn a little bit of econometrics

… lots of curiosity

… a little bit of patience

… a little bit of brains

… confidence in dealing with numbers

… a belief that numbers can tell stories

Let’s start with a little bit of definition: What is econometrics?

What is econometrics?
– Econometrics is an application of statistics and mathematics … aimed at identifying and quantifying the relationships between two sets of variables – (1) the predicted variables and (2) the predictor variables.
– The goal of econometrics is to test a hypothesized causal relationship between the predicted and the predictor variables.

What is econometrics?
– Econometrics is an application of statistics and mathematics
– Econometrics is derived from statistics – largely regression and ‘trending’ techniques – and from mathematics
– There are differences between statistics and econometrics – but the differences are academic*…
* … but not necessarily moot and unimportant. For those interested in the differences, see future tutorials…

What is econometrics?
– … aimed at identifying and quantifying the relationships between two sets of variables – (1) the predicted variables and (2) the predictor variables.
– The basic goal of econometrics is to explain, using formulas and numbers, the relationship between a predictor variable – such as GRPs, adspends, competitive spends, temperature, and seasonality – and a predicted variable – such as awareness, sales, revenues, and profits


What is econometrics?
– This relationship is expressed in an equation – such as

$y = mx + b + u$

We know the values of y and x. Econometrics helps us identify the values of m and b, and from them u (the part of y that the line cannot explain)

y is the ‘predicted’ variable, x is the ‘predictor’ variable, and m, b and u are the values that econometrics wants to uncover

If we were interested in awareness and GRPs…
– We can rewrite the first equation taking our interest into consideration as follows

awareness = m • GRPs + b + u
What econometrics does is “estimate” the values of “m”, “b” and “u” based on the available data on Awareness and GRPs, such that we have an equation that relates Awareness and GRPs. Once m, b and u are identified and estimated, we can then use the equation to explain the movements in awareness with respect to GRPs – and predict how awareness is going to move in the future given different levels of GRPs

NB. This is simplifying the relationship between GRPs and awareness drastically. The relationship is far more complex, of course – but let’s assume that this equation is true for now.

There are many econometric techniques…

– But the most common technique is linear regression


A brief introduction to linear regression
– How to create regression lines?
– Regression in econometrics and marketing
– What is linear regression?

Introduction to linear regression
– Let’s assume that we track the number of users of a certain product (in ‘000) across months, represented by time t
– In the first month, for example, we see that there are 4’905 users of the product
– By the 5th month, that has increased to about 6’800 users – and by the 26th month, the number of users has increased to around 34’200
– Clearly, from looking at the data alone, there is an increase in the number of users – and it seems that, indeed, there is a significant uptrend

If we plotted the data, we would indeed see an upward trend…
[Chart: Product users (‘000) vs. time t, in months]
– In the 1st month, we see that there are about 5’000 product users (4.905 thousand)
– By the 30th month, the number of users has increased to about 40’000 users (39.915 thousand)

The question
If this trend held and continued into the next 12 months, how many more users will we have?

To answer this question…
… we need to understand first the past relationship between the two variables – time and number of users.
[Diagram: The Past → The Future]
We will then use this understanding of the past to predict what’s going to happen in the next 12 months

What bridges the gap between the past and the future…
Once we have identified the equation or the model, we will have a better grasp of (1) the past trends and (2) the potentials of the future
[Diagram: The Past → Linear regression equation → The Future]
Linear regression comes into the picture by bridging that gap between the past and the future

With that in mind, let’s look at the chart again

From mere observation, we see an uptrend in users across time…
[Chart: Product users (‘000) vs. time t, in months]

How do we quantify* that uptrend?
[Chart: Product users (‘000) vs. time t, in months]
* Remember: In order to project into the future, we need to create a model that quantifies the relationship between time and number of users

There are an infinite number of lines that we could use to characterize the uptrend…
[Chart: Product users (‘000) vs. time t, in months, with several candidate trend lines]
Different people have different views – even when viewing the same set of data: I can argue that the best line is the grey line, another can argue that the blue line is best, and still another can argue that the best line is the pink line

Linear regression insists that there is one (and only one) line that would best characterize the trend and the relationship between the two variables

Linear regression also insists that this equation be of the following form:

$y = mx + b + u$

… where
– y is the number of users per month (‘000)
– x is time (in months)
– m is the slope: the change in y for every one-unit change in x
– b is the constant
– u is the unexplained part of y (the error term)

This one line that best describes the relationship between the two variables is derived through OLS
– OLS – which stands for “ordinary least squares” – is a method that finds the values of m and b (and, with them, u) such that the sum of the squared distances between the actual values and the line is at its minimum
Huh?
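For a single predictor, the m and b that minimize the sum of squared distances have a closed-form answer: m = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and b = ȳ − m·x̄. The sketch below implements those two formulas in plain Python on a small, hypothetical data set (five months of user counts in ‘000, invented for illustration) and compares the result with an arbitrary alternative line.

```python
def ols_fit(xs, ys):
    """Return (m, b) minimizing the sum of squared residuals for y = m*x + b."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    m = sxy / sxx
    b = y_bar - m * x_bar
    return m, b


def sum_squared_residuals(xs, ys, m, b):
    """Sum of squared vertical distances between the data points and the line."""
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data: month t and product users ('000), invented for illustration
months = [1, 2, 3, 4, 5]
users = [5, 6, 8, 9, 12]

m, b = ols_fit(months, users)
print(f"OLS line: y = {m:.2f}x + {b:.2f}")
print("OLS sum of squares:", round(sum_squared_residuals(months, users, m, b), 2))

# Any other line, for example y = 2x + 2, leaves a larger sum of squares
print("Other line sum of squares:", round(sum_squared_residuals(months, users, 2, 2), 2))
```

Squaring the distances keeps points above and below the line from cancelling out, and it penalizes large misses more heavily than small ones.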

Let’s go back a few charts…
Remember: Given any data set, there are an infinite number of lines that can be used to describe the trend. One can choose the “pink” line to be the best and rationalize it, another person can argue that the yellow line is the best, and still a third person can defend the blue line. We can argue indefinitely about the merits of each of these lines.
What OLS does is objectively look across all of these possible lines and find the best-fitting one: the line for which the squared distances between the line and the original data points add up to the smallest total.
(Think of OLS as a search that compares every m-b combination and keeps the best-fitting line. In practice, OLS does not need trial-and-error: a direct formula gives the best m and b in a single step, and u is then whatever the line leaves unexplained.)
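The “search” picture can be made literal with a brute-force grid search: try many m-b combinations and keep the one with the smallest sum of squared distances. This sketch only illustrates the intuition; the grid ranges and the 0.01 step are arbitrary choices, and the data are hypothetical user counts invented for illustration. It lands on the same line that the direct OLS formula gives.

```python
# Hypothetical data: month t and product users ('000), invented for illustration
months = [1, 2, 3, 4, 5]
users = [5, 6, 8, 9, 12]


def sum_squared_residuals(m, b):
    """Sum of squared vertical distances between the data and the line y = m*x + b."""
    return sum((y - (m * x + b)) ** 2 for x, y in zip(months, users))


# Brute-force search: slopes 0.00..3.00 and constants 0.00..5.00, in steps of 0.01
best_m, best_b = min(
    ((i / 100, j / 100) for i in range(301) for j in range(501)),
    key=lambda mb: sum_squared_residuals(*mb),
)

# Direct OLS formula, for comparison
n = len(months)
x_bar, y_bar = sum(months) / n, sum(users) / n
m = sum((x - x_bar) * (y - y_bar) for x, y in zip(months, users)) / sum(
    (x - x_bar) ** 2 for x in months
)
b = y_bar - m * x_bar

print(f"grid search: m = {best_m:.2f}, b = {best_b:.2f}")
print(f"OLS formula: m = {m:.2f}, b = {b:.2f}")
```

The grid search needs about 150’000 tries to find what the formula computes directly, which is why OLS software uses the formula.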

Going back to the data – the best-fitting regression line, after applying OLS, is…
[Chart: Product users (‘000) vs. time t, in months, with fitted line y = 1.1416x + 3.6329, R² = 0.9391]

By applying OLS, the equation «y = 1.1416x + 3.6329» is found to be the best-fitting regression line
– It is objective and unbiased – By using OLS, we are assured that this is unbiased and objective (given the standard assumptions of linear regression, which are not covered here)
– It is linear – It conforms to the «y = mx + b + u» requirement of econometrics
– It is the best-fitting line – Because the OLS algorithm minimizes the squared distances between the line and the data points, we are assured that it is the best-fitting line

Now comes the interesting part…
So what exactly does the equation mean?

The story behind «y = 1.1416x + 3.6329»
This equation suggests the following –
– For every 1-unit change in x, there is a corresponding 1.1416-unit change in y
– Applying this to our data, we can say that every month, there are about 1’142 additional users of the product (roughly 1’000 new users every 3.8 weeks)
– 3.6329 is called the constant – it is the number of users (in ‘000, so about 3’633 users) when the product was rolled out into the marketplace (at time t = 0)
– These are perhaps the early adopters of the product or those who have been exposed to the product through free samples
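The reading of the slope and the constant can be checked with a few lines of arithmetic. This sketch simply plugs months into the fitted line shown on the chart (y = 1.1416x + 3.6329, with y in ‘000 users); the function name `users_thousands` is illustrative.

```python
def users_thousands(t):
    """Fitted line from the chart: product users ('000) at month t."""
    return 1.1416 * t + 3.6329


# The constant: users at launch (t = 0), converted from '000 to users
print(f"Users at launch: {users_thousands(0) * 1000:,.0f}")  # Users at launch: 3,633

# The slope: extra users gained between any two consecutive months
extra = (users_thousands(11) - users_thousands(10)) * 1000
print(f"Extra users per month: {extra:,.0f}")  # Extra users per month: 1,142
```

Any pair of consecutive months gives the same difference, because a straight line adds the same amount (the slope) for every step in x.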

OK, we have an equation – how do we know it’s the correct equation?
– First, we “eyeball” the line and the actual data – Are the data points within ‘reasonable’ distance of the line?
– If each of the data points seems to be near the trendline, then we can say initially that we have a good fit
– If there are data points that are significantly far from the line, then the equation may need to be revisited – or that outlying data point may be caused by something else apart from time

Let’s eyeball the model: There seem to be no data points that are significantly away from the line…
[Chart: Product users (‘000) vs. time t, in months, with fitted line y = 1.1416x + 3.6329, R² = 0.9391]

Eyeballing the data, however, brings back subjective interpretations
[Chart: Product users (‘000) vs. time t, in months, with fitted line y = 1.1416x + 3.6329, R² = 0.9391]
One can argue that the point at month 11 is significantly away from the line – and so is the data for month 24…
We therefore need a more accurate, more objective measurement of “fit”

How else do we know if the equation is valid or not?
– We look at the r-squared (r²) – 0.9391
– This suggests that the variable “time” is able to explain 93.91% of the variance or movements in the number of users
– The other 6.09% are unexplained by the variable “time” – and could be due to other factors that are beyond time
– The 6.09% unexplained variance could also be because of errors in measurements, or simply ‘random’ errors that we will never be able to uncover
– An r-squared of 0.75+ is considered to be acceptable as a ‘rule-of-thumb’
The r-squared is only one of several measures of goodness-of-fit (GOF). Other measures include adjusted R-squared, RMSE/root-mean-squared error, AIC/Akaike Information Criterion, and GLM-ANOVA. These will not be discussed here.

Will we ever have an r-squared of 1.00? Possible – but highly improbable
– The higher the r-squared, the better – and it is possible to have a 1.00 r-squared, but in the real world, it is highly improbable
– An r-squared of 1.00 will only happen in a perfect scenario where the model perfectly fits and explains the data
– Getting an r-squared of 0.75+ in and of itself will be a challenge

But there are deviations between the line and the data!
Why do we have deviations?
– Because there are other things that we probably are not taking into account in this model

Deviations are not entirely bad… Actually, the deviations are part of the story…
– Because these deviations are an indication that something else apart from time is at work, it is worth checking why these deviations exist
– This is where analytics and econometrics/statistics meet – uncovering why things are explainable and not explainable by a model.

Let’s go back to the original question:

What have we done so far…?
– We’ve modeled and derived an equation relating time t with the number of users for the first 30 months
– We’re fairly confident with the model because it explains about 94% of the variance in the number of users, as reflected by the r-squared
[Chart: Product users (‘000) vs. time t, in months, with fitted line y = 1.1416x + 3.6329, R² = 0.9391]

Let’s now project what’s going to happen in the next 12 months…
[Chart: Product users (‘000), actual and projected, vs. time t, in months 1–42]
At the end of the next 12 months [by month 42], we can expect to have about 54’300 users – if all things remain equal

Since we don’t really know what’s going to happen in the future – and we don’t have a perfect model…
We can report ranges instead of just a line…
[Chart: Product users (‘000), actual and projected, vs. time t, in months 1–42, with dashed range lines]
The dashed lines indicate the range of expectations for the next 12 months
We can expect that there will be about 47’000 to 61’600 users by month 42
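One common way to produce a range like this for a straight-line model is a prediction interval around the projected value; the chart’s own method is not stated, so this sketch assumes that approach. It uses hypothetical, invented data and hard-codes the two-sided 95% t critical value for 3 degrees of freedom (3.182, from a standard t-table); with a different number of data points, a different critical value is needed.

```python
import math

# Hypothetical data: month t and product users ('000), invented for illustration
months = [1, 2, 3, 4, 5]
users = [5, 6, 8, 9, 12]

T_CRIT_95_DF3 = 3.182  # two-sided 95% t critical value for n - 2 = 3 degrees of freedom


def prediction_interval(xs, ys, x_new, t_crit):
    """Projected value and (low, high) prediction range at x_new for an OLS line."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    m = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    b = y_bar - m * x_bar
    ss_residual = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(ss_residual / (n - 2))  # typical size of a residual
    half_width = t_crit * s * math.sqrt(1 + 1 / n + (x_new - x_bar) ** 2 / sxx)
    y_hat = m * x_new + b
    return y_hat, (y_hat - half_width, y_hat + half_width)


# Project one month beyond the (hypothetical) data
y_hat, (low, high) = prediction_interval(months, users, 6, T_CRIT_95_DF3)
print(f"month 6: about {y_hat:.1f} ('000), range {low:.1f} to {high:.1f}")
```

Projecting further ahead widens the range, because the (x_new − x̄)² term grows as x_new moves away from the months we actually observed.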

Are you still there?

Breathe a sigh of relief…

Linear regression through OLS is just one of the many techniques in econometrics…
For those interested…
– Wikipedia’s page on linear regression is here and the OLS technique is discussed here.
– Specifically on econometrics, Wikipedia’s entry is here.
– A more detailed introduction to econometrics can be found here.
– An international organization of econometricians – and some information on econometrics – can be found here.

Books on econometrics that we’ve found useful…
– Econometrics by Samuel Cameron (Amazon.com) is an approachable introduction to the concepts
– Introductory Econometrics by Humberto Barreto uses Microsoft Excel® and includes a CD-ROM with interactive files.
– A Guide to Econometrics by Peter Kennedy is considered by most teachers of beginning econometrics and by practitioners to be a good guide

Other books that might be helpful
– Probability plays a major role in econometrics. For those interested, E.T. Jaynes has an e-book (in PDF) here. This is heavy reading, but enlightening. An HTML version can be found here
– Since econometrics builds on statistical theory, try reading chapters on linear regression (bivariate/multivariate) in Stat101 books. Amazon has this list for you to choose from.

Credits for the images used
– Most of the images in the presentation are from Gettyimages.com.
– We acknowledge GettyImages’ ownership of copyright over their work in this presentation. The ownership of GettyImages over these photos is asserted, and no claims are made on these images by the presenter, the author, or the company.
– We also acknowledge and claim no ownership of the other images that have been used in this presentation/file.

This presentation
– Author: Philip Tiongson philtiongson@gmail.com
– Audiences: Staff interested in the basics of econometrics
