
Data Correlation

Saturday, June 3, 2023 9:54 AM

Data
    Structured
        Numerical / Quantitative
            Discrete
            Continuous
        Categorical / Qualitative
            Nominal
            Ordinal
    Unstructured

Independent Columns / Variables / Features

Dependent Columns / Variables / Features

If any two columns in a dataset are related, they are said to be correlated.

Relation: when the value in one column increases, the value in the other column changes with it.

Positive Correlation

If an increase in one column's value goes with an increase in the other column's value, it is called Positive Correlation.

Types of Positive Correlation

Strong Positive Correlation

Data values in the distribution are closely packed, almost forming a straight line in the upward direction.

Weak Positive Correlation

Data values in the distribution are loosely packed but still trend in the upward direction.

Perfect linear Positive Correlation

Data values in the distribution lie exactly on a straight line in the upward direction.

Negative Correlation

If an increase in one column's value goes with a decrease in the other column's value, it is called Negative Correlation.

Inno_DS_11.30 Page 1
Types of Negative Correlation

Strong Negative Correlation

Data values in the distribution are closely packed, almost forming a straight line in the downward direction.

Weak Negative Correlation

Data values in the distribution are loosely packed but still trend in the downward direction.

Perfect Linear Negative Correlation

Data values in the distribution lie exactly on a straight line in the downward direction.

The correlation between any two variables is calculated / estimated using the correlation coefficient ( r ).
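As a minimal sketch of how r can be computed, assuming the Pearson coefficient is the one meant (the notes do not name it): r lies between -1 and +1, its sign gives the direction of the correlation, and its magnitude gives the strength. The helper name `pearson_r` and the experience / salary figures are made up for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up illustration data (not from the notes).
experience = [1, 2, 3, 4, 5]
salary_up = [30, 35, 41, 44, 50]    # rises with experience
salary_down = [50, 44, 41, 35, 30]  # falls as experience rises

r_pos = pearson_r(experience, salary_up)    # close to +1: strong positive
r_neg = pearson_r(experience, salary_down)  # close to -1: strong negative
print(round(r_pos, 3), round(r_neg, 3))
```

A value of exactly +1 or -1 corresponds to the perfect linear cases described above, and a value near 0 indicates little or no linear relation.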

Example dataset columns: EMPID, EMPNAME, EMPAGE, EMPEXP, EMPTECH, EMPSAL

Regression
The relationship between the independent and dependent variables is called regression.

Estimating this relationship by drawing a straight line is called linear regression.

[Figure: linear regression]

Best fit line : A line that passes through as many data points as possible while staying very close to the remaining data points is called the best fit line.

Best Fit Line Equation : y = mx + c, where m is the slope and c is the intercept.
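One common way to find the slope m and intercept c in y = mx + c is ordinary least squares. The notes do not say which method is used, so treat the following as a sketch under that assumption. The helper `fit_line` and the data values are hypothetical.

```python
def fit_line(xs, ys):
    """Return (m, c) of the least-squares line y = m*x + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx            # slope
    c = mean_y - m * mean_x  # intercept: the line passes through the means
    return m, c

# Made-up illustration data: years of experience vs. salary.
exp_years = [1, 2, 3, 4, 5]
salary = [30, 35, 41, 44, 50]

m, c = fit_line(exp_years, salary)
predicted = m * 6 + c  # estimate for 6 years of experience
print(m, c, predicted)
```

Once m and c are known, the dependent value for any new independent value is read off the line.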

Linearly separable Dataset :

A dataset through which a best fit line can be drawn is called linearly separable data. In classification, the same term usually describes classes that a straight line can separate.

Data Acquisition

Feature Selection


Error Detection

NaN --> isna()

Outliers:
Mean, Std
Z-score
Percentile
IQR
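The checks listed above can be sketched with pandas as follows. The age values are made up, and the cut-offs (|z| > 2 and 1.5 × IQR) are common conventions assumed here, not values given in the notes.

```python
import pandas as pd

# Made-up ages with one missing value and one suspicious entry.
ages = pd.Series([25, 27, 29, 31, None, 30, 28, 26, 95], name="EMPAGE")

# Missing values: isna() marks each NaN entry as True.
missing_mask = ages.isna()
clean = ages.dropna()

# Z-score: distance from the mean in units of standard deviation.
z_scores = (clean - clean.mean()) / clean.std()
z_outliers = clean[z_scores.abs() > 2]

# IQR: flag values more than 1.5 * IQR outside the 25th / 75th percentiles.
q1, q3 = clean.quantile(0.25), clean.quantile(0.75)
iqr = q3 - q1
iqr_outliers = clean[(clean < q1 - 1.5 * iqr) | (clean > q3 + 1.5 * iqr)]

print(int(missing_mask.sum()), list(z_outliers), list(iqr_outliers))
```

Both outlier rules are heuristics, so flagged rows are candidates to inspect rather than values to drop automatically.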

Encoding

Categorical --> Numerical

Nominal Categorical

One Hot Encoder

Get Dummies
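For a nominal column, one-hot encoding creates one indicator column per category. A minimal sketch with pandas `get_dummies`, using made-up `EMPTECH` values:

```python
import pandas as pd

# Made-up technology values for the EMPTECH column.
df = pd.DataFrame({"EMPTECH": ["Python", "Java", "Python", "SQL"]})

# One indicator column per distinct category, prefixed with the column name.
one_hot = pd.get_dummies(df, columns=["EMPTECH"])
print(list(one_hot.columns))
```

scikit-learn's `OneHotEncoder` produces the same kind of encoding and can be fitted on training data and reused on new data.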

Ordinal Categorical

Ordinal Encoder

Map - lambda
Apply - lambda

Label Encoder
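For an ordinal column the category order matters, so the mapping to numbers is usually written out explicitly. A minimal sketch of the map and apply approaches, using a hypothetical `LEVEL` column and an assumed order Junior < Mid < Senior:

```python
import pandas as pd

# Hypothetical ordinal column (not one of the columns in the notes).
df = pd.DataFrame({"LEVEL": ["Junior", "Senior", "Mid", "Junior"]})

# Explicit order: a smaller number means a lower level.
order = {"Junior": 0, "Mid": 1, "Senior": 2}

# map with a dictionary.
df["LEVEL_MAP"] = df["LEVEL"].map(order)

# apply with a lambda gives the same result.
df["LEVEL_APPLY"] = df["LEVEL"].apply(lambda value: order[value])

print(df)
```

Writing the dictionary by hand keeps the intended order under the author's control instead of depending on how an encoder sorts the labels.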

Imbalance

Data Separation

Data Splitting
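A minimal sketch of these two steps, assuming "data separation" means separating the independent feature(s) from the dependent target, and assuming an 80 / 20 train / test split. The notes specify neither, and the data values are made up.

```python
import pandas as pd

# Made-up illustration data.
df = pd.DataFrame({
    "EMPEXP": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "EMPSAL": [30, 35, 41, 44, 50, 56, 60, 66, 70, 75],
})

# Data separation: independent feature(s) X and dependent target y.
X = df[["EMPEXP"]]
y = df["EMPSAL"]

# Data splitting: hold out a random 20% of the rows for testing.
test_idx = df.sample(frac=0.2, random_state=0).index
X_train, X_test = X.drop(test_idx), X.loc[test_idx]
y_train, y_test = y.drop(test_idx), y.loc[test_idx]

print(len(X_train), len(X_test))
```

scikit-learn's `train_test_split` is the usual shortcut for the splitting step.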

Model Building

Model Training

Model Evaluation

Prediction
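The last steps can be sketched for a linear regression model with NumPy. This assumes a least-squares line as the model and mean absolute error as the evaluation metric, neither of which is stated in the notes. The data and the held-out row positions are made up.

```python
import numpy as np

# Made-up illustration data: years of experience vs. salary.
exp = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype=float)
sal = np.array([30, 35, 41, 44, 50, 56, 60, 66, 70, 75], dtype=float)

# Hold out two rows for evaluation (positions chosen arbitrarily).
train_mask = np.ones(len(exp), dtype=bool)
train_mask[[2, 7]] = False

# Training: fit a degree-1 polynomial, i.e. the line y = m*x + c.
m, c = np.polyfit(exp[train_mask], sal[train_mask], 1)

# Evaluation: mean absolute error on the held-out rows.
pred_test = m * exp[~train_mask] + c
mae = np.mean(np.abs(pred_test - sal[~train_mask]))

# Prediction: estimate the salary for an unseen experience value.
new_pred = m * 12 + c
print(m, c, mae, new_pred)
```

A low error on rows the model did not see during training is what justifies using it for prediction.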
