
Advanced Statistics for Business Analytics (BAX 442)
Winter 2020

Instructor: Prof. Prasad A. Naik


Publications: http://prasadnaik.faculty.ucdavis.edu/
UCD: https://gsm.ucdavis.edu/faculty/prasad-naik
LinkedIn: https://www.linkedin.com/in/prasad-a-naik-a868a459

Class Meetings: Tuesdays 1:10 to 4:00 (Mary Kay Kane Hall)
                Tuesdays 5:10 to 8:00 (Snodgrass Hall)

Office Hours: You can contact me on Slack at any time. For video conferencing, we can use Zoom (I will send the link) or Skype (my ID is pnaik007). For an in-person meeting, please use Slack or email me (panaik007@gmail.com) to arrange a mutually convenient date and time.

Course Description

This course introduces statistical methods to solve business problems. It covers both cross-sectional and time-series regression models. Cross-sectional regression models are presented in the first five classes, and time-series regression models in the subsequent classes.

We begin gently with the linear regression model that you studied in the Fall quarter. In class 1, we refresh our understanding of linear regression and then apply it to design new products. Specifically, how can managers learn consumers’ willingness to pay for a feature of a new product? What would its expected market share be with and without the new features? What should be the “right” price?
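
As a small preview of class 1, here is a minimal R sketch of conjoint analysis via dummy-coded linear regression. The attributes, levels, and ratings below are simulated purely for illustration; they are not the HW1 data.

    # Hypothetical conjoint data: each row is a product profile.
    # Attributes and levels are illustrative only.
    profiles <- expand.grid(
      screen = c("60in", "65in", "75in"),
      sound  = c("stereo", "surround"),
      price  = c(1500, 2000, 2500)
    )
    set.seed(1)
    # Simulated preference ratings; in practice these come from a survey.
    profiles$rating <- 5 +
      2 * (profiles$screen == "75in") +
      1 * (profiles$sound == "surround") -
      0.002 * profiles$price +
      rnorm(nrow(profiles), sd = 0.5)

    # Dummy-coded linear regression recovers the part-worth of each attribute level.
    fit <- lm(rating ~ screen + sound + price, data = profiles)
    summary(fit)

    # Willingness to pay for a feature = its part-worth divided by
    # the negative of the price coefficient.
    coefs <- coef(fit)
    wtp_surround <- coefs["soundsurround"] / (-coefs["price"])
    wtp_surround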

In linear regression, we require more observations than the number of variables. But what if the number of variables exceeds the sample size? Classes 2 and 3 tackle such “big data” estimation. The term “big” refers to either a large sample size (Big-N) or many variables (Big-p). The former involves no new statistical issues (only computational ones). Hence, we focus on the Big-p problem and learn Principal Components Regression (PCR). We apply PCR to create perceptual maps, which help us visualize competing brands and differentiate the focal brand.
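
For concreteness, here is a minimal R sketch of PCR using base R’s prcomp() followed by lm(). The simulated data and the choice of two components are illustrative assumptions, not the class dataset (the pls package’s pcr() function is an alternative route).

    # Simulated Big-p setting: more predictors (p = 50) than observations (n = 30).
    set.seed(2)
    n <- 30; p <- 50
    X <- matrix(rnorm(n * p), n, p)
    y <- X[, 1] - 2 * X[, 2] + rnorm(n)

    # Step 1: principal components of the (scaled) predictors.
    pc <- prcomp(X, scale. = TRUE)

    # Step 2: regress the response on the first few components (here, 2).
    k <- 2
    scores <- pc$x[, 1:k]
    pcr_fit <- lm(y ~ scores)
    summary(pcr_fit)

    # The first two scores can also be plotted as a rough perceptual map
    # when rows are brands and columns are perceptual attributes.
    plot(pc$x[, 1], pc$x[, 2], xlab = "Component 1", ylab = "Component 2")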

In PCR we combine all the original variables (i.e., eliminate none) to create a few new variables.
In contrast, in class 3, we learn how to eliminate many of the original variables and retain just a
few important ones. To this end, we apply “shrinkage” via Ridge regression, Lasso regression,
and Elastic Net regression. Thus, we shall learn two ways of tackling Big-p data: dimension
reduction (PCR) and shrinkage methods (Ridge, Lasso, EN).
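
As an illustrative sketch (not the course’s prescribed code), the glmnet package fits all three shrinkage estimators by varying a single mixing parameter alpha; the data below are simulated.

    # Ridge, Lasso, and Elastic Net via glmnet on simulated Big-p data.
    library(glmnet)

    set.seed(3)
    n <- 100; p <- 200                      # Big-p: more variables than observations
    X <- matrix(rnorm(n * p), n, p)
    y <- X[, 1] + 0.5 * X[, 2] + rnorm(n)

    # alpha = 0 gives Ridge, alpha = 1 gives Lasso, 0 < alpha < 1 gives Elastic Net.
    ridge <- cv.glmnet(X, y, alpha = 0)
    lasso <- cv.glmnet(X, y, alpha = 1)
    enet  <- cv.glmnet(X, y, alpha = 0.5)

    # Lasso and Elastic Net shrink many coefficients exactly to zero
    # (i.e., they eliminate variables); Ridge only shrinks them toward zero.
    sum(coef(lasso, s = "lambda.min") != 0)
    sum(coef(ridge, s = "lambda.min") != 0)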

The above regression models assume a linear relation between the response and predictor variables. But what if this relation is not linear? The nonlinearity need not be known a priori, and it can be more complex than what variable transformations, polynomial terms, or interaction terms can capture. To analyze such data, in class 4, we learn how to apply tree-based methods such as Bagging, Random Forest, and Boosting. They help us discover important variables without assuming specific nonlinear functions.
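
A minimal sketch of the three tree-based methods on simulated data appears below; the package choices (randomForest and gbm) are one common option, not a course requirement.

    # Bagging, random forest, and boosting on simulated nonlinear data.
    library(randomForest)
    library(gbm)

    set.seed(4)
    n <- 500
    dat <- data.frame(x1 = runif(n), x2 = runif(n), x3 = runif(n))
    dat$y <- sin(2 * pi * dat$x1) + dat$x2^2 + rnorm(n, sd = 0.1)   # unknown nonlinearity

    # Bagging is a random forest that considers all predictors at each split (mtry = p).
    bag <- randomForest(y ~ ., data = dat, mtry = 3)

    # Random forest: a random subset of predictors at each split (default mtry).
    rf <- randomForest(y ~ ., data = dat)

    # Boosting: many shallow trees fit sequentially to the residuals.
    boost <- gbm(y ~ ., data = dat, distribution = "gaussian",
                 n.trees = 1000, interaction.depth = 2, shrinkage = 0.01)

    # Variable importance reveals which predictors matter,
    # without specifying the nonlinear functional form.
    importance(rf)
    summary(boost, plotit = FALSE)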

The above 8 types of regression (LM, PCR, Ridge, Lasso, EN, Bagging, Random Forest, Boosting) belong to the set of cross-sectional methods and constitute the content for the Midterm Exams in class 5.

In class 6, we turn our attention to time-series analysis and forecasting. In time-series data, the past influences the present and the present influences the future. Such inter-temporal dependencies are the main difference between cross-sectional and time-series analysis. Given any single time-series variable (e.g., sales, consumer price index, GDP, temperature), how should we separate the signal from the noise, and how should we further decompose this signal into trend and seasonal components? To this end, we shall use the methods of moving averages, loess (nonparametric regression), and the Holt-Winters filter.
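
As a preview, the sketch below applies these three tools in base R to a built-in dataset (AirPassengers), chosen purely for illustration rather than as a course dataset.

    y <- AirPassengers                       # monthly airline passengers, 1949-1960

    # Moving average: a simple 12-month filter extracts the trend-cycle (signal).
    trend_ma <- filter(y, rep(1 / 12, 12), sides = 2)

    # STL: Seasonal and Trend decomposition using Loess.
    fit_stl <- stl(log(y), s.window = "periodic")
    plot(fit_stl)

    # Holt-Winters filter: level (alpha), trend (beta), and seasonal (gamma) smoothing.
    fit_hw <- HoltWinters(log(y))
    fit_hw$alpha; fit_hw$beta; fit_hw$gamma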

In class 7, we advance our understanding to a broad class of models known as ARIMAX. We shall apply it to address the multi-million-dollar problem of attribution, which seeks to assign credit for sales to the various marketing activities that generate them.
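
For a flavor of the estimation, here is a minimal R sketch of an ARIMA model with an external marketing regressor (an ARIMAX-style specification) using the forecast package, which is one common implementation; the simulated advertising variable and its adstock construction are illustrative assumptions, not the HW4 data.

    library(forecast)

    set.seed(6)
    n <- 120
    # Simulated advertising with carryover (adstock): y[t] = x[t] + 0.5*y[t-1].
    adstock <- as.numeric(stats::filter(rexp(n), 0.5, method = "recursive"))
    sales <- 100 + 3 * adstock + arima.sim(list(ar = 0.6), n = n)
    sales <- ts(sales, frequency = 12)

    # ARIMA orders chosen automatically; xreg carries the marketing variable.
    fit <- auto.arima(sales, xreg = cbind(ad = adstock))
    summary(fit)

    # The xreg coefficient attributes (credits) sales to the advertising activity.
    coef(fit)["ad"]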

Classes 8 and 9 demystify the celebrated Kalman filter (KF). For example, NASA used the KF to put a man on the Moon; the Department of Defense used it in anti-aircraft gunfire control; the aerospace industry uses it for navigational guidance of spacecraft, aircraft, ships, cars, and now people via GPS on smartphones; and the KF has found applications in statistics, economics, and business. I pioneered its application to advertising (Naik et al. 1998). We shall apply the KF to quantify and infer the time-varying effectiveness of marketing actions.
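
To demystify the recursions ahead of class 8, here is a minimal R sketch of the KF’s time update and measurement update for a local-level model; the noise variances are assumed known here purely for illustration (class 9 covers estimating them by maximum likelihood).

    set.seed(8)
    n <- 100
    level <- cumsum(rnorm(n, sd = 0.5))          # hidden state (random walk)
    y     <- level + rnorm(n, sd = 1)            # noisy observations

    Q <- 0.25; R <- 1                            # state and observation noise variances (assumed known)
    a <- 0; P <- 10                              # diffuse-ish initial state mean and variance
    filtered <- numeric(n)

    for (t in 1:n) {
      # Time update (prediction): random-walk state, so the mean carries over.
      a_pred <- a
      P_pred <- P + Q

      # Measurement update (correction): blend prediction and new observation.
      K <- P_pred / (P_pred + R)                 # Kalman gain
      a <- a_pred + K * (y[t] - a_pred)
      P <- (1 - K) * P_pred

      filtered[t] <- a
    }

    plot(y, col = "grey", ylab = "level")
    lines(level, lty = 2)                        # true hidden state
    lines(filtered, col = "blue")                # Kalman-filtered estimate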

The final class reserves time to catch up, review, and discuss the advancing frontiers of statistics
and business.

Grading
• Participation (10%)
• Homework (40%) – the lowest-scoring HW out of the 5 HWs will be dropped.
• Final Exam (50%)

Readings in the Course Packet (Required) [HBS Coursepack]


1. Practical Guide to Conjoint Analysis (UV0406)
2. A Method for Producing Perceptual Maps Using Data (UV0405)
3. Advertising Analytics 2.0 (Harvard Business Review, March 2013, pp. 62-68)

Readings not in the Course Packet


4. State Space Models will be provided after Class 7
5. Understanding the Kalman Filter can be downloaded from
http://people.math.umass.edu/~lavine/courses/797/kalman_filter.pdf


Textbooks

1. Book 1: An Introduction to Statistical Learning: With Applications in R by Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani.

• Buy the printed copy from Amazon: https://www.amazon.com/Introduction-Statistical-Learning-Applications-Statistics/dp/1461471370/ref=sr_1_1?s=books&ie=UTF8&qid=1543630544&sr=1-1&keywords=Gareth+James

• And/or download the pdf file from https://www-bcf.usc.edu/~gareth/ISL/ISLR%20Seventh%20Printing.pdf

• Access the datasets and R code used in this book at https://www-bcf.usc.edu/~gareth/ISL/

2. Book 2: Forecasting: Principles and Practice by Rob J. Hyndman and George Athanasopoulos.

• Buy the printed copy from Amazon: https://www.amazon.com/Forecasting-principles-practice-Rob-Hyndman/dp/0987507117/ref=sr_1_1?s=books&ie=UTF8&qid=1543631400&sr=1-1&keywords=2.%09Forecasting%3A+Principles+and+Practice

• And/or access the online book in HTML: https://otexts.org/fpp2/

3. Book 3 (Free): A Little Book of R for Time Series by Avril Coghlan.

• HTML version: https://a-little-book-of-r-for-time-series.readthedocs.io/en/latest/

• PDF version: https://media.readthedocs.org/pdf/a-little-book-of-r-for-time-series/latest/a-little-book-of-r-for-time-series.pdf


Tentative Plan

Week 1 (Jan 7)
  Course Overview
  • Review Linear Regression
  Readings: Book 1: Ch3; Book 2: Ch4, Ch5
  Conjoint Analysis for New Product Design
  Reading: Reading 1
  HW1: Build R code for conjoint analysis

Week 2 (Jan 14)
  Big Data (p > n): Dimension Reduction
  • Principal Components Regression
  Reading: Book 1: Ch6
  Perceptual Mapping
  Reading: Reading 2
  HW2: Build R code for perceptual mapping

Week 3 (Jan 21)
  Big Data (p > n): Shrinkage Methods
  • Ridge Regression
  • Lasso Regression
  • Elastic Net Regression
  Reading: Book 1: Ch6
  Breakout Session
  • Do HW2 with Ridge, Lasso, and EN
  • Explain insights from the various regressions and recommend managerial actions

Week 4 (Jan 28)
  Tree Regressions
  • Bagging
  • Random Forest
  • Boosting
  Reading: Book 1: Ch8
  HW3: Which variables drive sales?

Week 5 (Feb 4)
  Miscellaneous Topics
  • Multicollinearity
  • Heteroscedasticity
  • Model Selection (AIC, AICC, BIC)
  • Nonlinearity (Polynomials, Transformations, Interactions)
  Review of the Above Topics and Concepts
  • Your Qs and Gaps

Week 6 (Feb 11)
  Time Series Decomposition
  • Exploratory (Moving Average)
  • STL: Seasonality and Trend via Loess
  • Holt-Winters (α, β, γ) Filter
  Readings: Book 2: Ch6, Ch7; Book 3: Ch2.4, Ch2.5
  Breakout Session
  • Practice these methods on various SKU time series

Week 7 (Feb 18)
  ARIMAX Models
  • AR(p)
  • MA(q)
  • Integrated (d)
  • ACF, PACF
  • ARIMA(p,d,q)
  • ARIMA with X variables (ARIMAX)
  Readings: Book 2: Ch8; Book 3: Ch2.6
  Marketing Mix Modeling
  Reading: Reading 3
  HW4: Using the variables identified in HW3, estimate their impact on online and offline sales

Week 8 (Feb 25)
  State Space Models
  • Holt-Winters (α, β, γ) filter as a special case
  • ARIMAX as a special case
  • Regression with time-varying parameters
  Understanding the Kalman Filter (KF)
  • Time Update
  • Measurement Update
  • KF recursions
  Readings: Reading 4; Reading 5

Week 9 (Mar 3)
  KF Estimation
  • Coding the KF + Maximum Likelihood
  Marketing Mix with Time-varying Parameters
  HW5: Extend HW4 to time-varying parameters

Week 10 (Mar 10)
  Catch up, Review, Q&A

Week 11 (Mar 17)
  Final Exam

Remarks

• Class 1 is mandatory. Please ensure you attend; if you miss it, you will lose 10 points from your final score.

• Submit completed HWs before the next week’s class starts.

• No make-up for the Final Exam. Please plan accordingly.


• For travel and other reasons (to be approved case-by-case), you can miss any one lecture
other than the first one. If you miss additional lectures, 10 points (out of 100) per 3-hour
lecture will be deducted from your final score.

• The use of smartphones and texting in class is not allowed because it distracts from my teaching. Laptops may be used in class for note-taking, analysis, and coding, but not for surfing the Internet or checking/responding to email.

• Academic Code of Conduct: You are required to uphold the University’s Regulation 537 on
Exams, Plagiarism, Unauthorized collaboration, Lying, Disruption, and other issues. Read
the academic code of conduct at this link: http://sja.ucdavis.edu/files/cac.pdf
