
Introduction to Data Science
Module 2
Week 2
Review of Descriptive and Inferential Statistics
Data Processing and Visualization with R
Module Objectives
At the end of this module, students must be able to:
1. Differentiate the two areas of statistics: descriptive and inferential;
2. Perform simple linear regression in Excel and in R along with pertinent visual output; and
3. Perform multiple linear regression in Excel and in R along with pertinent visual output.
Statistics Refresher

STATISTICS

DESCRIPTIVE
- Collection
- Organization
- Presentation

INFERENTIAL
- Draw conclusions for a larger group/data
- Determine relationships
- Make predictions
Statistics Refresher

STATISTICS
- DESCRIPTIVE
- Probability
- INFERENTIAL
  - Estimation: Point, Interval
  - Hypothesis Testing
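The three inferential tools named above can be seen side by side in a few lines of R. This is a minimal sketch: the ten measurements and the null value of 12 are made up for illustration and do not come from the course data.

```r
# Hypothetical sample: ten made-up measurements (illustration only)
x <- c(12.1, 11.8, 12.4, 12.0, 11.6, 12.3, 12.2, 11.9, 12.5, 12.0)

# Point estimation: a single best guess for the population mean
point_est <- mean(x)

# Interval estimation and hypothesis testing in one call:
# a 95% confidence interval and a t-test of H0: mu = 12 (assumed null value)
tt           <- t.test(x, mu = 12, conf.level = 0.95)
interval_est <- tt$conf.int
p_value      <- tt$p.value

print(point_est)
print(interval_est)
print(p_value)
```

The point estimate always lies inside the confidence interval that `t.test()` reports, because that interval is built symmetrically around the sample mean.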
The Process of Statistics
- Sampling Theory: POPULATION to SAMPLE
- Descriptive Statistics: SAMPLE to STATISTIC
- Inferential Statistics: STATISTIC to PARAMETER
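The relationship between population, sample, parameter, and statistic can be illustrated with a small simulation. This is only a sketch: the population is simulated, and its size, mean, and spread are assumptions chosen for the example.

```r
set.seed(1)  # make the simulated example reproducible

# Hypothetical population of 10,000 simulated values (assumed mean 50, sd 10)
population <- rnorm(10000, mean = 50, sd = 10)
parameter  <- mean(population)   # parameter: a number describing the population

# Sampling: draw a simple random sample of 100 units
smp       <- sample(population, size = 100)
statistic <- mean(smp)           # statistic: a number describing the sample

cat("parameter:", parameter, " statistic:", statistic, "\n")
```

In practice the parameter is unknown, and the statistic computed from the sample is used to infer it.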
Stat Refresher: Regression Analysis

Regression Analysis:
- A statistical technique frequently used to analyze the relationship between two or more variables.
- At least two variables need to be continuous.
- It deals with the way one variable tends to change as one or more other variables change.
Example

• Input the data


• Create a scatter plot
• Add trend line
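The three steps above can be sketched in base R as follows. The data values are made up for illustration, and the column names `x` and `y` are placeholders rather than names from the course files.

```r
# Step 1: input the data (hypothetical values, illustration only)
dat <- data.frame(
  x = c(1, 2, 3, 4, 5, 6, 7, 8),
  y = c(2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1)
)

# Step 2: create a scatter plot
plot(dat$x, dat$y, xlab = "X", ylab = "Y",
     main = "Scatter plot with trend line")

# Step 3: fit a least-squares line and draw it as the trend line
fit <- lm(y ~ x, data = dat)
abline(fit, col = "red")

print(coef(fit))  # intercept and slope of the trend line
```

In Excel, the equivalent is to insert a scatter chart and use the chart's Add Trendline option with a linear trend.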
When to use regression?
Regression analysis is used to describe the relationship between:
A single response variable Y; and
One or more predictor variables: X_1, X_2, …, X_p
p = 1 : Simple regression
p > 1 : Multiple regression
Examples:
how sales (Y) vary with advertising expenditures (X)
how quantity demanded (Y) varies with prices (X)
relationship between corporate profit (Y) and R&D spending (X)
The Variables
Response Variables
- The response variable Y must be a continuous variable.

Predictor Variables
- The predictors X_1, X_2, …, X_p can be continuous, discrete, or categorical variables.
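A short sketch of how R handles a categorical predictor next to a continuous one. The data and the variable names (`x1`, `region`) are hypothetical; the point is only that `lm()` converts a factor into 0/1 indicator columns on its own.

```r
# Hypothetical data, made up for illustration
dat <- data.frame(
  y      = c(10.2, 11.5, 9.8, 14.1, 15.3, 13.9),  # continuous response
  x1     = c(1.0, 2.0, 1.5, 3.0, 3.5, 2.5),       # continuous predictor
  region = factor(c("north", "north", "north",
                    "south", "south", "south"))   # categorical predictor
)

fit <- lm(y ~ x1 + region, data = dat)

# The design matrix shows the indicator column created for the factor
print(colnames(model.matrix(fit)))
```

With R's default coding, the first factor level ("north") becomes the baseline and the coefficient on `regionsouth` measures the shift relative to it.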
Initial EDA
Prior to any regression modelling, the data should always be
inspected for:
❑ Data-entry errors
❑ Missing values
❑ Outliers
❑ Unusual (e.g., asymmetric) distributions
❑ Changes in variability
❑ Clustering
❑ Non-linear bivariate relationships
❑ Unexpected patterns
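Several items on this checklist can be screened with base R before fitting anything. The sketch below uses a small made-up data frame with one missing value and one implausible entry planted on purpose; it is one possible set of checks, not a complete EDA.

```r
# Hypothetical data with a missing x and a suspicious y (illustration only)
dat <- data.frame(
  x = c(1.2, 2.4, 3.1, NA, 5.0, 6.3, 7.1, 8.4),
  y = c(2.0, 4.1, 6.3, 8.2, 9.9, 120, 14.2, 16.0)  # 120 mimics a data-entry error
)

# Data-entry errors: implausible minimum or maximum values stand out here
print(summary(dat))

# Missing values: count of NA per column
missing_counts <- colSums(is.na(dat))
print(missing_counts)

# Outliers: points beyond the boxplot whiskers
outliers_y <- boxplot.stats(dat$y)$out
print(outliers_y)

# Unusual distributions: a histogram shows asymmetry
hist(dat$y, main = "Distribution of y", xlab = "y")

# Non-linearity, clustering, changing variability: inspect the scatter plot
plot(dat$x, dat$y, xlab = "x", ylab = "y")
```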
Simple Linear Regression

The Variables
X : explanatory variable (horizontal axis)
Y : response variable (vertical axis)
After data collection, we have n pairs of observations:
(X_1, Y_1), …, (X_n, Y_n)
Sample Data 1
Variables: X (Height), Y (Weight)

We want to be able to describe the weight as a linear function of height.
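In symbols, this is usually written as the simple linear regression model below, where Y_i is the weight and X_i the height of observation i. The symbols \beta_0 (intercept), \beta_1 (slope), and \varepsilon_i (random error) follow the conventional notation and are an assumption here.

```latex
Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i, \qquad i = 1, \dots, n
```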
Estimation of Unknown Parameters I
Residuals
Estimation of Unknown Parameters II
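The usual way to estimate the unknown intercept and slope is ordinary least squares: pick the line that minimizes the sum of squared residuals, where a residual is an observed value minus its fitted value. The sketch below assumes that approach and uses made-up height and weight values (units of cm and kg are also an assumption). It computes the estimates from the closed-form formulas and cross-checks them against `lm()`.

```r
# Hypothetical height (cm) and weight (kg) values, made up for illustration
height <- c(150, 155, 160, 165, 170, 175, 180, 185)
weight <- c(52, 55, 59, 62, 66, 70, 75, 79)

# Closed-form least-squares estimates
sxy <- sum((height - mean(height)) * (weight - mean(weight)))
sxx <- sum((height - mean(height))^2)
b1  <- sxy / sxx                          # estimated slope
b0  <- mean(weight) - b1 * mean(height)   # estimated intercept

# Residuals: observed minus fitted values
fitted_w <- b0 + b1 * height
res      <- weight - fitted_w

# Cross-check against R's built-in linear model fit
fit <- lm(weight ~ height)
print(c(b0 = b0, b1 = b1))
print(coef(fit))
```

When the model includes an intercept, the least-squares residuals sum to zero up to rounding error, which is a quick sanity check on a hand computation.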
Multiple Regression
From SLR to MLR
A dependent variable is rarely explained by exactly one variable.
We use multiple regression to predict the dependent variable from
more than one independent variable.
Multiple regression can be linear or nonlinear. We use
Multiple Linear Regression for explanation, prediction, and
inference.
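Written out, the multiple linear regression model extends the one-predictor line by adding one term per predictor. The coefficient symbols \beta_0, \dots, \beta_p and the error term \varepsilon follow conventional notation and are an assumption here.

```latex
Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p + \varepsilon
```

Setting p = 1 recovers simple linear regression.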
Example: Advertising
1. Perform SLR on each predictor variable.
2. Interpret the results.
3. Perform MLR.
4. Interpret the results.
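The four steps above might look like this in R. This is a sketch under a loud assumption: the data frame `ads` is simulated as a stand-in with the column names TV, Radio, Newspaper, and Sales, so the numbers it prints are not results for the real Advertising data.

```r
set.seed(42)  # reproducible simulated data

# Hypothetical stand-in for the Advertising data (simulated, not real values)
n   <- 200
ads <- data.frame(
  TV        = runif(n, 0, 300),
  Radio     = runif(n, 0, 50),
  Newspaper = runif(n, 0, 100)
)
ads$Sales <- 3 + 0.045 * ads$TV + 0.19 * ads$Radio + rnorm(n, sd = 1.5)

# Step 1: one simple linear regression per predictor
slr_tv    <- lm(Sales ~ TV, data = ads)
slr_radio <- lm(Sales ~ Radio, data = ads)
slr_news  <- lm(Sales ~ Newspaper, data = ads)

# Step 2: interpret, e.g. by comparing R-squared across the three fits
slr_r2 <- sapply(list(TV = slr_tv, Radio = slr_radio, Newspaper = slr_news),
                 function(m) summary(m)$r.squared)
print(round(slr_r2, 3))

# Step 3: multiple linear regression on all three predictors
mlr <- lm(Sales ~ TV + Radio + Newspaper, data = ads)

# Step 4: interpret the coefficient table (estimates, std. errors, p-values)
print(summary(mlr)$coefficients)
```

Each MLR slope is read as the expected change in Sales for a one-unit increase in that predictor while the other two are held fixed, which is why a predictor's slope can differ between its SLR fit and the MLR fit.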
Predicting Values in MLR
1. What are the predicted sales when TV = 115, Radio = 45, Newspaper = 41?
2. What are the predicted sales when TV = 195, Radio = 62, Newspaper = 155?
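Both questions can be answered with `predict()` once a model is fitted. The sketch below fits the model on simulated stand-in data (an assumption, since the real Advertising values are not reproduced here), so the printed predictions only illustrate the mechanics and are not the answers for the real data.

```r
set.seed(42)  # reproducible simulated data

# Hypothetical stand-in for the Advertising data (simulated, not real values)
n   <- 200
ads <- data.frame(
  TV        = runif(n, 0, 300),
  Radio     = runif(n, 0, 50),
  Newspaper = runif(n, 0, 100)
)
ads$Sales <- 3 + 0.045 * ads$TV + 0.19 * ads$Radio + rnorm(n, sd = 1.5)

mlr <- lm(Sales ~ TV + Radio + Newspaper, data = ads)

# The two scenarios from the questions, one row each
new_obs <- data.frame(
  TV        = c(115, 195),
  Radio     = c(45, 62),
  Newspaper = c(41, 155)
)
pred <- predict(mlr, newdata = new_obs)

# The same predictions by hand: intercept plus slope times value, summed
b <- coef(mlr)
by_hand <- b[["(Intercept)"]] +
  b[["TV"]] * new_obs$TV +
  b[["Radio"]] * new_obs$Radio +
  b[["Newspaper"]] * new_obs$Newspaper

print(pred)
print(by_hand)
```

Before trusting a prediction, it is worth checking with `range()` whether each new value lies inside the observed range of its predictor; a prediction outside that range is an extrapolation.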
