Session 4 - Exploratory Data Analysis - Thien Nguyen
BUSINESS MANAGEMENT
Session 4
Exploratory Data Analysis
M.Sc. Thien Nguyen
Email: thien.nguyen@isb.edu.vn
Phone: 0949088908
Agenda
1. Introduction to EDA
2. Common Analyses
Part I
Introduction to Exploratory Data Analysis
1. What is EDA?
2. The Typical Cycle of EDA
I. Introduction
Review: What is Data Analytics?
Source: https://en.wikipedia.org/wiki/Data_analysis
What is EDA?
I. Introduction to EDA
Typical Cycle
Source: https://r4ds.had.co.nz/exploratory-data-analysis.html
Summary
EDA: Making Questions & Answering
With Data Fields (columns):
- Name, meaning, relationship?
- Data type?
- No. of missing values?
- Common errors? Duplicates?
Univariate analysis:
- Qualitative: What categories? Frequency of each category? Descriptive measures of each category
- Quantitative: Descriptive measures; Outliers? Abnormals?
Multivariate analysis (Qualitative & Quantitative):
- Covariance? Correlation?
Inferential statistics:
- Regression? Clustering?
Source: https://www.geeksforgeeks.org/what-is-exploratory-data-analysis/
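The data-field questions in the Summary (data type, number of missing values, duplicates) could be checked along these lines. This is only a minimal sketch assuming pandas; the DataFrame and its column names are hypothetical and not taken from the session's dataset.

```python
import pandas as pd

# Hypothetical toy dataset; the column names are illustrative only.
df = pd.DataFrame({
    "branch": ["Ha Noi", "Da Nang", "Ha Noi", "HCM City", "Ha Noi"],
    "gender": ["Female", "Male", "Female", None, "Female"],
    "total": [120.5, 80.0, 120.5, 45.2, 120.5],
})

dtypes = df.dtypes                          # data type of each column
missing = df.isna().sum()                   # number of missing values per column
n_duplicates = int(df.duplicated().sum())   # rows that repeat an earlier row exactly

print(dtypes)
print(missing)
print("duplicate rows:", n_duplicates)
```

Whether a duplicated row is an error or a legitimate repeat depends on the dataset, so the count is a prompt for a question rather than an answer.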
Note:
EDA does not follow
a fixed set of rules.
It depends on you!
Part III
Common Analyses
1. Univariate Analysis
2. Detecting Outliers
3. Multivariate Analysis
4. Regression Analysis
III Common Analyses
1. Univariate Analysis
Measures of frequency:
number of occurrences, percentage
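As an illustration of the two frequency measures, counts and percentages for a categorical variable might be computed as below. A minimal sketch assuming pandas; the sample values are made up for the example.

```python
import pandas as pd

# Made-up categorical observations (illustrative only).
branch = pd.Series(
    ["HCM City", "Ha Noi", "HCM City", "Da Nang", "HCM City", "Ha Noi"],
    name="branch",
)

counts = branch.value_counts()                       # number of occurrences
percent = branch.value_counts(normalize=True) * 100  # share of each category in %

freq = pd.DataFrame({"count": counts, "percent": percent.round(1)})
print(freq)
```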
Univariate Analysis: most important measures
- Qualitative variables (categorical & discrete): Measures of Frequency
- Quantitative variables (discrete & continuous): Measures of Central Tendency, Measures of Spread/Dispersion, Measures of Position, Measures of Shape
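For a quantitative variable, one possible way to compute a representative of each family of measures is sketched below. It assumes pandas and made-up values; the choice of mean/median/mode, range/standard deviation, quartiles, and skewness/kurtosis as representatives is an assumption, not a list given in the slides.

```python
import pandas as pd

# Made-up quantitative observations (illustrative only).
values = pd.Series([2, 4, 4, 5, 7, 9, 11])

summary = {
    # Central tendency
    "mean": values.mean(),
    "median": values.median(),
    "mode": values.mode().tolist(),
    # Spread / dispersion
    "range": values.max() - values.min(),
    "std": values.std(),            # sample standard deviation (ddof=1)
    # Position
    "q1": values.quantile(0.25),
    "q3": values.quantile(0.75),
    # Shape
    "skewness": values.skew(),
    "kurtosis": values.kurt(),      # excess kurtosis
}

for name, value in summary.items():
    print(f"{name}: {value}")
```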
III Common Analyses
2. Detecting outliers
Detecting outliers with a boxplot
Source: https://r-graph-gallery.com/boxplot.html
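A boxplot conventionally flags points beyond 1.5 × IQR from the quartiles as outliers. The sketch below applies that rule numerically; it assumes pandas, made-up values, and the 1.5 multiplier (a common convention, not something stated in the slides).

```python
import pandas as pd

# Made-up observations with one suspiciously large value (illustrative only).
values = pd.Series([12, 14, 15, 15, 16, 17, 18, 19, 45])

q1 = values.quantile(0.25)
q3 = values.quantile(0.75)
iqr = q3 - q1                      # interquartile range

lower = q1 - 1.5 * iqr             # lower fence
upper = q3 + 1.5 * iqr             # upper fence

outliers = values[(values < lower) | (values > upper)]
print("fences:", lower, upper)
print("outliers:", outliers.tolist())
```

A flagged point is only a candidate: it may be a data-entry error or a genuine extreme value, and that has to be judged from context.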
III.3 Basic Multivariate Analyses
Bi-/Multivariate Analysis
- Qualitative variables (categorical & discrete)
- Quantitative variables (discrete & continuous)
Contingency Table (also called a two-way table or cross-tabulation)

Branch \ Gender    Female    Male    Sub-Total
Da Nang               117      93          210
Ha Noi                143     130          273
HCM City              277     240          517
Total                 537     463         1000
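A count table of this kind can be produced from row-level data with a cross-tabulation. The sketch below assumes pandas and rebuilds hypothetical individual records from the cell counts above, purely so that the result can be compared with the table; the original records are not available.

```python
import pandas as pd

# Cell counts copied from the table; the row-level records are reconstructed
# here only for illustration.
cell_counts = {
    ("Da Nang", "Female"): 117, ("Da Nang", "Male"): 93,
    ("Ha Noi", "Female"): 143, ("Ha Noi", "Male"): 130,
    ("HCM City", "Female"): 277, ("HCM City", "Male"): 240,
}
rows = [(b, g) for (b, g), n in cell_counts.items() for _ in range(n)]
df = pd.DataFrame(rows, columns=["Branch", "Gender"])

# Count observations per Branch x Gender, with row and column totals.
table = pd.crosstab(df["Branch"], df["Gender"],
                    margins=True, margins_name="Total")
print(table)
```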
Contingency Table with a quantitative measure in each cell instead of a count

Branch \ Gender        Female          Male     Sub-Total
Da Nang             39,155.36     27,994.39     67,149.75
Ha Noi              47,664.66     37,759.64     85,424.30
HCM City            94,923.93     75,469.45    170,393.38
Total              181,743.95    141,223.48    322,967.43
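When the cells hold an aggregated value rather than a count, a pivot table with an aggregation function is one way to build the table. A minimal sketch assuming pandas; the `Amount` column, the sample rows, and the choice of `sum` are hypothetical, since the measure behind the figures above is not stated.

```python
import pandas as pd

# Hypothetical row-level data; "Amount" is an assumed quantitative column.
sales = pd.DataFrame({
    "Branch": ["Da Nang", "Da Nang", "Ha Noi", "Ha Noi",
               "HCM City", "HCM City", "HCM City"],
    "Gender": ["Female", "Male", "Female", "Male",
               "Female", "Male", "Female"],
    "Amount": [100.0, 50.0, 80.0, 70.0, 200.0, 150.0, 25.5],
})

# Sum Amount per Branch x Gender, with row and column totals.
table = pd.pivot_table(sales, values="Amount", index="Branch",
                       columns="Gender", aggfunc="sum",
                       margins=True, margins_name="Total")
print(table)
```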
Covariance: measures the relationship between two random variables, i.e. how they
change together (how they move relative to each other)
➤ When the Xs move away from the mean of the Xs, how do the Ys move away from the mean of the Ys?
➤ 2 types: positive covariance (the variables tend to move in the same direction) vs. negative covariance (they tend to move in opposite directions)
➤ Range of covariance values: -∞ < Cov(x,y) < +∞
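The points above can be illustrated numerically. The sketch assumes NumPy and made-up data, and uses the sample covariance (dividing by n - 1), which is also the default of `np.cov`.

```python
import numpy as np

def sample_covariance(x, y):
    """Average product of deviations from the means, with an n - 1 denominator."""
    return ((x - x.mean()) * (y - y.mean())).sum() / (len(x) - 1)

# Made-up data (illustrative only).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_up = np.array([2.0, 4.0, 5.0, 4.0, 5.0])   # tends to rise with x
y_down = -y_up                               # tends to fall as x rises

cov_up = sample_covariance(x, y_up)          # positive covariance
cov_down = sample_covariance(x, y_down)      # negative covariance
cov_scaled = sample_covariance(100 * x, y_up)  # rescaling x rescales the covariance

print(cov_up, cov_down, cov_scaled)
print(np.cov(x, y_up)[0, 1])                 # same value as cov_up
```

Because the value changes with the units of the variables, the size of a covariance on its own says little about how strong the relationship is.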
III.3 Basic Multivariate Analyses
(2) Covariance & Correlation
Correlation:
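Assuming the Pearson correlation coefficient is meant here, it can be written as r = Cov(x,y) / (s_x · s_y), the covariance divided by the product of the two standard deviations, which confines it to the range -1 ≤ r ≤ +1. A minimal sketch with NumPy and made-up data:

```python
import numpy as np

# Made-up data (illustrative only).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

# Pearson r from its definition: covariance over the product of sample std devs.
r = np.cov(x, y)[0, 1] / (x.std(ddof=1) * y.std(ddof=1))

r_numpy = np.corrcoef(x, y)[0, 1]           # NumPy's built-in gives the same value
r_scaled = np.corrcoef(100 * x, y)[0, 1]    # unchanged when x is rescaled

print(round(r, 4), round(r_numpy, 4), round(r_scaled, 4))
```

Unlike covariance, the coefficient does not depend on the units of measurement, which is what makes it usable for comparing the strength of relationships.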
Strength of relationship:
III Common Analyses
4. Regression
THANK YOU