
2) Theoretical Background

This chapter defines the following parts of the project: EDA (Exploratory Data Analysis), Feature
Engineering, Feature Selection, and Model Building, and it forms the basis for the rest of the project.
2.1 EDA (Exploratory Data Analysis)
Exploratory Data Analysis refers to the critical process of performing initial investigations on data
so as to discover patterns, spot anomalies, test hypotheses, and check assumptions with the help
of summary statistics and graphical representations.

It is good practice to understand the data first and to gather as many insights from it as possible.
EDA is all about making sense of the data in hand before getting our hands dirty with it.

 In EDA we load the dataset
 Describe the data using some functions (df.describe(), df.info())
 Find any missing values (df.isna())
 Find the outliers (using a histogram, box plot)
 Visualize the data (using Matplotlib, Seaborn)
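A minimal sketch of the loading and inspection steps in pandas is shown below; the file name "data.csv" is a placeholder for the project's actual dataset, not a name taken from the project itself.

import pandas as pd

# Load the dataset ("data.csv" is a placeholder file name)
df = pd.read_csv("data.csv")

# Summary statistics for numeric columns
print(df.describe())

# Column names, types, and non-null counts (prints directly)
df.info()

# Count missing values per column
print(df.isna().sum())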
Figure 2.1: A histogram plot.
The histogram above (Figure 2.1) conveys various pieces of information about a variable
(feature or column), such as its distribution and skew.
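Plots like Figure 2.1 can be produced with Matplotlib and Seaborn. The sketch below assumes df is the DataFrame loaded in the previous snippet and that "price" is a hypothetical numeric column used only for illustration.

import matplotlib.pyplot as plt
import seaborn as sns

# Histogram: shows the distribution and skew of a numeric column
sns.histplot(df["price"], bins=30)
plt.title("Distribution of price")
plt.show()

# Box plot: highlights potential outliers beyond the whiskers
sns.boxplot(x=df["price"])
plt.show()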
2.2 Feature Engineering

Feature engineering is the next step in a data science or machine learning project
after EDA (Exploratory Data Analysis).

Feature engineering is the process of using domain knowledge to extract features
(characteristics, properties, attributes) from raw data. A feature is a property shared by
independent units on which analysis or prediction is to be done. Features are used by predictive
models and influence results.
In EDA we only get to know the data (missing values, outliers), but in feature engineering
we clean the data by handling missing values and removing outliers.

Handling Missing Values
 Drop missing values
 Fill missing values with the mean, median, or mode

Handling Outliers
 Using the standard deviation
 Normal distribution
 IQR (Inter-Quartile Range)
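A minimal pandas sketch of these cleaning steps follows; df and the column name "price" are the same hypothetical examples used above, not names from the project's dataset.

import pandas as pd

df = pd.read_csv("data.csv")  # placeholder file name, as before

# Option 1: drop rows that contain missing values
df_dropped = df.dropna()

# Option 2: fill missing values with a summary statistic
df["price"] = df["price"].fillna(df["price"].median())

# Handle outliers with the IQR (inter-quartile range) rule:
# values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] are treated as outliers
q1 = df["price"].quantile(0.25)
q3 = df["price"].quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
df = df[(df["price"] >= lower) & (df["price"] <= upper)]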

2.3 Feature Selection


Feature selection is the process of reducing the number of input variables when developing a
predictive model.

It is desirable to reduce the number of input variables to both reduce the computational cost of
modeling and, in some cases, to improve the performance of the model.
Statistical-based feature selection methods involve evaluating the relationship between each
input variable and the target variable using statistics and selecting those input variables that
have the strongest relationship with the target variable. These methods can be fast and
effective, although the choice of statistical measures depends on the data type of both the input
and output variables.

As such, it can be challenging for a machine learning practitioner to select an appropriate
statistical measure for a dataset when performing filter-based feature selection.

In this section, we discuss how to choose statistical measures for filter-based feature
selection with numerical and categorical data.

 There are two main types of feature selection techniques: supervised and unsupervised,
and supervised methods may be divided into wrapper, filter and intrinsic.

 Filter-based feature selection methods use statistical measures to score the correlation
or dependence between input variables that can be filtered to choose the most relevant
features.

 Statistical measures for feature selection must be carefully chosen based on the data
type of the input variable and the output or response variable.
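As a concrete illustration of a filter method, the sketch below uses scikit-learn's SelectKBest with the ANOVA F-test (f_classif), which suits numerical inputs and a categorical target. The iris dataset is used only as placeholder data; the project's actual features and target would take its place.

from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

# Placeholder data: numerical inputs, categorical target
X, y = load_iris(return_X_y=True)

# Score each feature against the target with the ANOVA F-test
# and keep the two highest-scoring features
selector = SelectKBest(score_func=f_classif, k=2)
X_selected = selector.fit_transform(X, y)

print(selector.scores_)   # per-feature F-statistics
print(X_selected.shape)   # (n_samples, 2)

For other data-type combinations, a different score function would be chosen (for example, chi-squared statistics for categorical inputs), in line with the point above about matching the measure to the input and output types.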
2.4 Model Building

A machine learning model is built by learning and generalizing from training data, then
applying that acquired knowledge to new data it has never seen before to make predictions and
fulfill its purpose. A lack of data will prevent you from building the model, and merely having
access to data isn't enough.
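A minimal sketch of this train-then-predict workflow with scikit-learn is given below; the logistic regression model and the iris data are placeholders for whatever model and dataset the project ultimately uses.

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Placeholder dataset; in the project this would be the cleaned,
# feature-selected data from the previous steps
X, y = load_iris(return_X_y=True)

# Hold out unseen data to test generalization
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)       # learn from training data

y_pred = model.predict(X_test)    # predict on unseen data
print(accuracy_score(y_test, y_pred))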
