
2.1 Descriptive Statistics Contd.

2.2 Probability

3.1 Probability Contd.

3.2 Random Variable


4.1

4.2 Probability Distribution

5.1 Normal distribution

5.2 Sampling and Estimation


6 Estimation

7 Hypothesis Testing
8.1 Hypothesis Testing Contd.

8.2 Regression Intro


9.1 Simple Linear Regression

9.2 MLR, NLR


10 Forecasting

11 Forecasting + ANOVA
12 Multivariate Analytics
13 PCA

Branches of statistics
Descriptive statistics
Inferential statistics
Terminologies
Variable
Types of variable
Data Sets
Data Measurement
Data Visualization
Pie charts and bar chart for a categorical data
Histograms for quantitative data
Shape of histograms
Center
Spread
Stem and Leaf Plot
Dotplot
Measures of Location/Central Tendency: Ungrouped Data
Mean
Median
Mode
Comparing the Mean and the Median
Quartiles
Percentile
Measures of variability: Ungrouped Data
Range
Inter-Quartile Range (IQR)
Using the IQR to Detect Outliers
Boxplot
Side by side Box plot
Variance
Standard Deviation
The Empirical Rule
Coefficient of variation
Measures of central tendency: Grouped Data
Mean
Median
Mode
Measures of variability: Grouped Data
Population Variance and Standard deviation
Sample Variance and Standard deviation
Relation between mean, median and mode

Probability
Probability as a Numerical Measure of the Likelihood of Occurrence
Sample Space
Event
Complement of an Event
Union of Two Events
Intersection of Two Events
Set Theory Example
Disjoint events
Non-disjoint events
Independent events
Dependent events
Basic properties of probability
Determining Probability
Classical method
Limitation of Classical definition
Random experiment
Relative frequency
Limitations of relative frequency
Subjective Probability
P(A or B)
P(A and B)
The mn Rule
Sampling from a Population with Replacement
Permutation
Combination: Sampling from a Population Without Replacement
Combination

Probability table
Conditional probability
Exercise: Marginal Probabilities
Marginal, Union, Joint, and Conditional Probabilities
Rule of total Probability
Bayes’ Theorem

Random variable
Types of random variable
Probability distribution
Probability distribution histogram
Valid Probability Model
Cumulative distribution function
Expected Value of X
Expected Value of a Function
Rules of Expected Value

Variance and standard deviation of X


Continuous Variables
PDF of continuous variable
Legitimate pdf
Cumulative distribution function
Using F(x) to Compute Probabilities
Expected Values
Variance and standard deviation

Binomial Random Variable


Binomial Probability Distribution
Theorem
Definitions: Bernoulli
Characteristics of Bernoulli distribution
Expected value and variance of Binomial Distribution
Variance Proof (optional!)
Poisson distribution
Poisson Mean and Variance
Poisson distribution as limit
The Mean and Variance of X

Normal distribution
Standard Deviation Rule for Normal Random Variables
The Normal Distribution
Standard Normal Distribution
Non-standard Normal Distributions
Probabilities using z value
z-score
Approximate Binomial Distribution Problems
Continuity correction factor

Sampling
Reasons for Sampling
Random Versus Non-random Sampling
Random Sampling
Simple random sampling
Stratified Random Sampling
Systematic Random Sampling
Non-random Sampling
Convenience Sampling
Quota Sampling
Snowball Sampling
Sampling Error
Sampling Variation
Sampling Variability
Sampling Distribution
Sampling Distribution of x bar
Conclusions
Shapes of the Distributions of Sample Means
Central Limit Theorem
Z score for sample means

Sampling from a Finite Population


Rules for finite population
Sampling Distribution Of Sample Proportion
How does a researcher use the sample proportion in analysis?
Z Formula For Sample Proportions
Forms Of Statistical Inference
Estimating The Population Mean Using The Z Statistic
Point Estimate
Interval Estimate
Central Limit Theorem
Confidence Interval to Estimate μ
Z Scores for Confidence Intervals in Relation to alpha
Distribution of Sample Means for 95% Confidence
Finite correction factor
Estimating The Population Proportion
Confidence Interval To Estimate P
Estimating The Population Variance
Chi-square Statistic
Confidence Interval To Estimate The Population Variance

Need for testing of hypothesis


Statistical Hypothesis
Hypothesis testing (Non-statistical)
Hypothesis - Formulation
Rejection and Non-rejection Regions
Critical values
Type I Errors
Alpha
Type II error
Beta
Relation between alpha and beta
Decision on α-error and β-error
Steps involved in Testing of Hypothesis
Testing hypotheses about a population mean using the Z statistic
Z Test For A Single Mean
Testing the Mean with a Finite Population
Using the p-Value to Test Hypotheses
Rejecting the Null Hypothesis Using p-Values
Critical Value Method to Test Hypotheses
Testing hypotheses about a population mean using the t Statistic
Introduction
Reading the t Distribution Table
The Difference In Two Means Using The Z Statistic
Central Limit Theorem
Hypothesis Testing

Testing Hypotheses About A Proportion


Solving For Type II Errors
Statistical Inferences For Two Related Populations
Hypothesis Testing
Confidence Intervals
Statistical Inferences About Two Population Proportions, P1 - P2
Z-test for difference between proportions
Z formula
Hypothesis Testing
Confidence Intervals

Linear Regression
Covariance
Variance Vs Covariance
Sampled variance
Problem with Covariance
Example of how covariance value relies on variance
Correlation
Questions a Pearson correlation answers
Assumptions
Pearson product-moment correlation coefficient
Correlation does not have units but Covariance always has units
Advantages of the Correlation Coefficient
Correlation Vs Covariance
Regression
Regression Analysis
Dependent and independent variable
Scatter Plot
Equation of regression line

Simple Linear Regression


Equation of regression line
Assumptions about the Error
Estimated Regression model
Least Squares regression line
Minimising sums of squares
Slope of the regression line
Y intercept of the regression line
Interpretation of slope and intercept
Coefficient of determination
Sum of Square of Totals
Significance of R-squared
Testing the slope of the regression line
Hypothesis testing
t test for airline example
Assessing the model fit

Moving from SLR to MLR


Multicollinearity
Identifying Multicollinearity
Variance Inflation Factor (VIF)
Fixing Multicollinearity
Equation for multiple linear regression
Multiple regression Steps
Some Problems with R-squared
Adjusted R-squared
Adjusted R2
Akaike Information Criterion
Non-linear Regression
What is bias?
What is variance?
Bias and variance using bullseye diagram
Overfitting
Overfitting vs. Underfitting
k-fold cross-validation
Bias Variance Tradeoff
Regularization
Ridge Regression
Lasso/L1 Regression
What does Regularization achieve?
Logistic Regression
Forecasting
Definition of Forecasting
Time Series
Why use time series data?
Types of Forecasting methods
Qualitative methods
Quantitative forecasting
The Measurement of Forecasting Error
Mean Absolute Deviation (MAD)
Mean Square Error (MSE)
Mean Absolute Percentage Error (MAPE)
Stationary time series
Analysis
SMOOTHING TECHNIQUES
Naïve Forecasting Models
Averaging Models
Moving Averages
Weighted Moving Averages
Exponential Smoothing
SEASONAL EFFECTS
Time series decomposition
Procedure for decomposition
Time series data raises new technical issues
Stationarity
Independent and identically distributed (iid) noise
White Noise
Autocorrelation
Partial correlation
The partial autocorrelation function
Autoregression in time series
The First Order Autoregressive (AR(1)) Model
Evaluating AR(1)
Residual Analysis
Pattern of ACF for AR(1) Model
The AR(p) model: using multiple lags for forecasting
Important conclusion for AR(p) Process
Autoregressive Model Benefits
Autoregressive Model Limitations

Forecasting + ANOVA
Python example: Minimum Daily Temperatures Dataset
Load the dataset as a Pandas Series
Quick Check for Autocorrelation
Pearson correlation coefficient
Autocorrelation Plots
Autoregression Model
Predictions from fixed AR model
Moving Averages model
Theoretical Properties of a Time Series with an MA(1) Model
Theoretical Properties of a Time Series with an MA(2) Model
ACF for General MA(q) Models
Important conclusion for MA(q) Process
Box-Jenkins models
Autoregressive Moving Average (ARMA)
Identifying the Model
Estimating & diagnosing a possible model
What if more than one model looks OK?
So how do we decide the values of p and q?
Autoregressive Integrated Moving Average (ARIMA)
ANOVA
Design Of Experiments
Terms
THE COMPLETELY RANDOMIZED DESIGN
Need for ANOVA
One Way ANOVA
Hypothesis in ANOVA
Testing hypotheses
Total Sum of Squares of Variation
Assumptions
Comparison of F and t Values

Multivariate Analytics
Introduction
Joint Distribution
Joint PMF
Joint probability table
Properties of joint probability mass function
Joint PDF
Properties of joint probability density function
Joint cumulative distribution function
Properties of the joint cdf
Independence
Expected Value
Covariance
Correlation
Multivariate normal distribution
Univariate Normal Distribution
Multivariate normal distribution
PDF of the Bivariate Normal Distribution
Probability for Bivariate
Conditional distribution
Statistical Independence
The standard multivariate normal distribution
Expected value
Covariance matrix
The multivariate normal distribution in general
Multivariate Normal probabilities
Multivariate conditional distribution
Conditional Normal Multivariate
Statistical Independence

Principal Component Analysis


Introduction
When should I use PCA?
Terms
Example of a problem where PCA is required
Features to Ignore Vs Features to Keep
What does Principal Component Analysis (PCA) do?
Steps in PCA
Goal of PCA
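Least-squares worked table (n = 7):

  x     y    x^2     xy
 10    10    100    100
 12    22    144    264
 13    24    169    312
 16    27    256    432
 17    29    289    493
 20    33    400    660
 25    37    625    925
113   182   1983   3186   (column totals)

(sum of x)(sum of y)/n = 2938, so Sxy = 3186 - 2938 = 248
(sum of x)^2/n = 1824.14286, so Sxx = 1983 - 1824.14286 = 158.857143
slope = 248 / 158.857143 = 1.56115108
mean of y = 26; 1.56 × mean of x = 25.1828571; intercept = 26 - 25.1828571 = 0.81714286

Bayes' theorem arithmetic:

0.95 × 0.01 = 0.0095
0.99 × 0.02 = 0.0198
0.0095 + 0.0198 = 0.0293
0.0095 / 0.0293 = 0.32423208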
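The figures above are consistent with two calculations from the outline: a least-squares fit on seven (x, y) pairs and a Bayes-style posterior. The following is a minimal Python sketch that reproduces them, assuming the first two columns of the numeric block are x and y. The function names are illustrative and do not come from the source. Which factor in each Bayes product is the prior and which is the likelihood cannot be recovered from the source; the product is the same either way.

```python
# Assumption: the first two columns of the numeric block are x and y.
xs = [10, 12, 13, 16, 17, 20, 25]
ys = [10, 22, 24, 27, 29, 33, 37]


def least_squares(xs, ys):
    """Return (slope, intercept) of the least-squares line y = b0 + b1 * x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    s_xy = sum_xy - sum_x * sum_y / n   # corrected sum of cross-products
    s_xx = sum_xx - sum_x ** 2 / n      # corrected sum of squares of x
    slope = s_xy / s_xx
    intercept = sum_y / n - slope * sum_x / n
    return slope, intercept


def posterior_share(first_pair, second_pair):
    """Share of the first product in the total of both products (Bayes' rule
    for two mutually exclusive, exhaustive hypotheses)."""
    first = first_pair[0] * first_pair[1]
    second = second_pair[0] * second_pair[1]
    return first / (first + second)


slope, intercept = least_squares(xs, ys)
share = posterior_share((0.95, 0.01), (0.99, 0.02))
print(round(slope, 8), round(intercept, 8), round(share, 8))
```

Note that the source's intercept of 0.81714286 corresponds to rounding the slope to 1.56 before subtracting; with the unrounded slope the intercept is about 0.7986.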
