Name: Pratik Vasant Bhosale

Class: MBA 1 (BA)

Roll No. 05
Subject: Data Exploration and Visualization

Article Review
On
Data Exploration and Analysis Using Python
Author: Raji Rai

Data exploration is a key aspect of data analysis and model
building. Without spending significant time on understanding the
data and its patterns, one cannot expect to build efficient
predictive models. Data exploration, which comprises data
cleaning and preprocessing, takes up a major chunk of the time in
a data science project.
In this article, the author explains the various steps involved
in data exploration through simple explanations and Python code
snippets.
Data sources can vary from databases to websites. Data as sourced
is known as raw data. Raw data cannot be used directly for model
building, as it is typically inconsistent and not suitable for
prediction. It has to be treated for anomalies and missing
values. Variables can be of different types, such as character,
numeric, categorical, and continuous.
Identifying the predictor and target variables is also a key
step in model building. The target is the dependent variable, and
the predictors are the independent variables on which the
prediction is based. Categorical or discrete variables are those
that cannot be mathematically manipulated; they are made up of
fixed values such as 0 and 1. Continuous variables, on the other
hand, can be interpreted using mathematical functions, such as
finding the average or sum of all values. A few lines of Python
code are enough to identify the types of variables in a dataset.
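
As a minimal sketch of how variable types might be inspected,
assuming the data is loaded into a pandas DataFrame: the toy
dataset, its column names, and the choice of "purchased" as the
target are all hypothetical and not taken from the article.

```python
import pandas as pd

# Hypothetical toy dataset; column names are illustrative only.
df = pd.DataFrame({
    "age": [23, 35, 41, 29],
    "income": [32000.0, 54000.5, 61000.0, 45000.0],
    "gender": ["F", "M", "M", "F"],
    "purchased": [0, 1, 1, 0],  # assumed target variable (0/1)
})

# Data type pandas inferred for each column
print(df.dtypes)

# Split the columns into numeric and non-numeric groups
numeric_cols = df.select_dtypes(include="number").columns.tolist()
other_cols = df.select_dtypes(exclude="number").columns.tolist()

# A numeric column with very few distinct values is often
# categorical in practice (for example a 0/1 flag)
low_cardinality = [c for c in numeric_cols if df[c].nunique() <= 2]

# Separate the predictors from the target
target = "purchased"
predictors = [c for c in df.columns if c != target]
```

The threshold of two distinct values is an arbitrary choice for
this example; a real dataset would need a judgment call per
column.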
Univariate analysis is used to highlight missing and outlier
values. Here each variable is analysed on its own for range and
distribution. The approach differs for categorical and
continuous variables. For categorical variables, a frequency
table shows the distribution of each category. For continuous
variables, you have to understand the central tendency and spread
of the variable. Central tendency can be measured using the mean,
median, and mode, and the distribution can be visualized using a
box plot or histogram.
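
The univariate checks described above could be sketched as
follows with pandas. The data and column names are made up for
illustration, and the article's own snippets may differ.

```python
import pandas as pd

# Hypothetical data: one categorical and one continuous variable
df = pd.DataFrame({
    "gender": ["F", "M", "M", "F", "M", None],
    "income": [30.0, 32.0, 35.0, 31.0, 33.0, 120.0],
})

# Categorical variable: frequency table, counting missing values too
freq = df["gender"].value_counts(dropna=False)
missing_gender = df["gender"].isna().sum()

# Continuous variable: central tendency and spread
mean_income = df["income"].mean()
median_income = df["income"].median()
std_income = df["income"].std()
income_range = df["income"].max() - df["income"].min()

# describe() collects count, mean, std, min, quartiles, and max
print(freq)
print(df["income"].describe())
```

A mean far above the median, as in this toy column, is one hint
that an extreme value is pulling the distribution to the right.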

Bivariate analysis is used to find the relationship between two
variables. It can be performed for any combination of
categorical and continuous variables. A scatter plot is suitable
for analyzing two continuous variables; it indicates whether the
relationship between them is linear or non-linear. Bar charts
help to understand the relationship between two categorical
variables. Certain statistical tests are also used to assess
bivariate relationships. The SciPy library has extensive modules
for performing these tests in Python.
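
The review does not say which tests the article uses. As one
possible sketch, assuming SciPy's stats module and a made-up
dataset, a Pearson correlation, a chi-square test, and a
two-sample t-test cover the three variable pairings:

```python
import pandas as pd
from scipy import stats

# Hypothetical data; column names are illustrative only
df = pd.DataFrame({
    "hours": [1, 2, 3, 4, 5, 6, 7, 8],
    "score": [52, 55, 61, 64, 70, 73, 79, 85],
    "gender": ["F", "M", "F", "M", "F", "M", "F", "M"],
    "passed": ["no", "no", "no", "yes", "yes", "yes", "yes", "yes"],
})

# Continuous vs continuous: Pearson correlation coefficient
r, p_corr = stats.pearsonr(df["hours"], df["score"])

# Categorical vs categorical: contingency table and chi-square test
# (counts this small are too few for a reliable test; shown for form)
table = pd.crosstab(df["gender"], df["passed"])
chi2, p_chi, dof, expected = stats.chi2_contingency(table)

# Categorical vs continuous: two-sample t-test on the group means
passed_scores = df.loc[df["passed"] == "yes", "score"]
failed_scores = df.loc[df["passed"] == "no", "score"]
t, p_t = stats.ttest_ind(passed_scores, failed_scores)

print(f"r={r:.3f}, chi2 dof={dof}, t={t:.2f}")
```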
The Matplotlib and Seaborn libraries can be used to plot
different relational graphs that help visualize bivariate
relationships between different types of variables.
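
A rough illustration of two such plots, using Matplotlib only
and invented data: a scatter plot for two continuous variables
and a grouped bar chart for two categorical ones. The
non-interactive "Agg" backend and the in-memory buffer are
assumptions made so the sketch runs without a display.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no window needed
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data; column names are illustrative only
df = pd.DataFrame({
    "hours": [1, 2, 3, 4, 5, 6, 7, 8],
    "score": [52, 55, 61, 64, 70, 73, 79, 85],
    "gender": ["F", "M", "F", "M", "F", "M", "F", "M"],
    "passed": ["no", "no", "no", "yes", "yes", "yes", "yes", "yes"],
})

fig, (ax_scatter, ax_bar) = plt.subplots(1, 2, figsize=(9, 3.5))

# Two continuous variables: scatter plot
ax_scatter.scatter(df["hours"], df["score"])
ax_scatter.set_xlabel("hours")
ax_scatter.set_ylabel("score")

# Two categorical variables: grouped bar chart of the counts
counts = pd.crosstab(df["gender"], df["passed"])
counts.plot(kind="bar", ax=ax_bar)
ax_bar.set_ylabel("count")

fig.tight_layout()

# Render to an in-memory PNG; use fig.savefig("plots.png") for a file
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
```

Seaborn offers higher-level functions for the same plots, such
as scatterplot and countplot.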
Missing values in the dataset can reduce model fit. They can
lead to a biased model, because the data cannot be analysed
completely and the behavior of a variable and its relationships
with other variables cannot be deduced correctly. This can lead
to wrong predictions or classifications. Missing values may occur
due to problems in data extraction or data collection, and can be
categorized as MCAR (missing completely at random), MAR (missing
at random), and NMAR (not missing at random).
Missing values can be treated by deletion, mean/mode/median
imputation, KNN imputation, or using prediction models.
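
The two simplest treatments, deletion and median/mode
imputation, might look like this in pandas. The data is
hypothetical, and KNN or model-based imputation is left out of
this sketch.

```python
import pandas as pd

# Hypothetical data with gaps in both columns
df = pd.DataFrame({
    "age": [25.0, None, 31.0, 40.0, None],
    "city": ["Pune", "Mumbai", None, "Pune", "Pune"],
})

# Count the missing values in each column
missing_per_column = df.isna().sum()

# Option 1: deletion, dropping every row that has a missing value
dropped = df.dropna()

# Option 2: imputation, median for the continuous column
# and mode (most frequent value) for the categorical one
imputed = df.copy()
imputed["age"] = imputed["age"].fillna(imputed["age"].median())
imputed["city"] = imputed["city"].fillna(imputed["city"].mode()[0])
```

Deletion keeps only two of the five rows here, which shows why
imputation is often preferred when data is scarce.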
Outliers can occur naturally in data or can be due to data entry
errors. They can drastically change the results of data analysis
and statistical modeling. Outliers are easily detected by
visualization methods such as box plots, histograms, and scatter
plots. Like missing values, outliers are handled by deleting
observations, transforming them, binning or grouping them,
treating them as a separate group, or imputing values.
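
One common way to flag outliers numerically is the 1.5 x IQR
rule that underlies a box plot's whiskers. The review does not
state which rule the article applies, so the sketch below is an
assumption, run on invented values, with three of the handling
options shown.

```python
import numpy as np
import pandas as pd

# Hypothetical values with one extreme observation
s = pd.Series([12, 14, 15, 13, 16, 14, 15, 95], name="value")

# Box-plot rule: anything beyond 1.5 * IQR from the quartiles
q1, q3 = s.quantile([0.25, 0.75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
is_outlier = (s < lower) | (s > upper)

# Option 1: delete the flagged observations
removed = s[~is_outlier]

# Option 2: cap values at the fences instead of deleting them
capped = s.clip(lower=lower, upper=upper)

# Option 3: transform to compress the scale (log of 1 + x)
logged = np.log1p(s)
```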
Author - Raji Rai
Source – towardsdatascience.com
