Looking at Data
1.1 Our First Data Set
1.2 Summaries of a Single Variable
1.2.1 Categorical Data
1.2.2 Continuous Data
1.3 Relationships Between Variables
1.4 The Rest of the Course
2.2.2 Bayes’ Rule
2.2.3 A “Real World” Probability Model
2.3 Expected Value and Variance
2.3.1 Expected Value
2.3.2 Variance
2.4 The Normal Distribution
2.5 The Central Limit Theorem
Probability Applications
3.1 Market Segmentation and Decision Analysis
3.1.1 Decision Analysis
3.1.2 Building and Using Market Segmentation Models
3.2 Covariance, Correlation, and Portfolio Theory
3.2.1 Covariance
3.2.2 Measuring the Risk Penalty for Non-Diversified Investments
3.2.3 Correlation, Industry Clusters, and Time Series
3.3 Stock Market Volatility
4.1 Populations and Samples
4.2 Sampling Distributions
4.3 Confidence Intervals
4.3.1 Can we just replace σ with s?
4.3.2 Example
4.4 Hypothesis Testing: The General Idea
4.4.1 P-values
4.4.2 Hypothesis Testing Example
4.4.3 Statistical Significance
4.5 Some Famous Hypothesis Tests
4.5.1 The One Sample T Test
4.5.2 Methods for Proportions (Categorical Data)
Simple Linear Regression
5.1 The Simple Linear Regression Model
5.1.1 Example: The CAPM Model
5.2 Three Common Regression Questions
5.2.1 Is there a relationship?
5.2.2 How strong is the relationship?
5.2.3 What is my prediction for Y and how good is it?
5.3 Checking Regression Assumptions
5.3.1 Nonlinearity
5.3.2 Non-Constant Variance
5.3.3 Dependent Observations
5.3.4 Non-normal residuals
5.4 Outliers, Leverage Points and Influential Points
5.4.1 Outliers
5.4.2 Leverage Points
5.4.3 Influential Points
5.4.4 Strategies for Dealing with Unusual Points
5.5 Review
Multiple Linear Regression
6.1 The Basic Model
6.2 Several Regression Questions
6.2.2 How Strong is the Relationship? R²
6.2.3 Is an Individual Variable Important? The T Test
6.2.4 Is a Subset of Variables Important? The Partial F Test
6.2.5 Predictions
6.3 Regression Diagnostics: Detecting Problems
6.3.1 Leverage Plots
6.3.2 Whole Model Diagnostics
6.4 Collinearity
6.4.1 Detecting Collinearity
6.4.2 Ways of Removing Collinearity
6.5 Regression When X is Categorical
6.5.1 Dummy Variables
6.5.2 Factors with Several Levels
6.5.3 Testing Differences Between Factor Levels
6.6 Interactions Between Variables
6.6.1 Interactions Between Continuous and Categorical Variables
6.7 Model Selection/Data Mining
6.7.1 Model Selection Strategy
6.7.2 Multiple Comparisons and the Bonferroni Rule
6.7.3 Stepwise Regression
Not on the test 6.3 Where does the Bonferroni rule come from?
Further Topics
7.1 Logistic Regression
7.2 Time Series
7.3 More on Probability Distributions
7.3.1 Background
7.3.2 Exponential Waiting Times
7.3.3 Binomial and Poisson Counts
7.3.4 Review
7.4 Planning Studies
7.4.1 Different Types of Studies
7.4.2 Bias, Variance, and Randomization
7.4.3 Surveys
7.4.4 Experiments
7.4.5 Observational Studies
7.4.6 Summary
JMP Cheat Sheet
A.1 Get familiar with JMP
A.2 Generally Neat Tricks
A.2.1 Dynamic Graphics
A.2.2 Including and Excluding Points
A.2.3 Taking a Subset of the Data
A.2.4 Marking Points for Further Investigation
A.2.5 Changing Preferences
A.2.6 Shift Clicking and Control Clicking
A.3 The Distribution of Y
A.3.1 Continuous Data
A.3.2 Categorical Data
A.4 Fit Y by X
A.4.1 The Two Sample T-Test (or One Way ANOVA)
A.4.2 Contingency Tables/Mosaic Plots
A.4.3 Simple Regression
A.4.4 Logistic Regression
A.5 Multivariate
A.6 Fit Model (i.e. Multiple Regression)
A.6.1 Running a Regression
A.6.2 Once the Regression is Run
A.6.3 Including Interactions and Quadratic Terms
A.6.4 Contrasts
A.6.5 To Run a Stepwise Regression
Some Useful Excel Commands
The Greek Alphabet
D.1 Normal Table
D.2 Quick and Dirty Normal Table
D.3 Cook’s Distance
D.4 Chi-Square Table
