Exploratory Data Analysis (Python)

Exploratory Data Analysis (EDA) is a crucial step in any
data analysis project. It involves analyzing and visualizing
data to gain insights, understand patterns, and identify
potential relationships between variables.

Tools used: Python, R, Excel, Apache Spark


Summary Statistics: Calculate basic statistics such as
mean, median, mode, standard deviation, and range to
summarize the dataset's main characteristics. Typical checks:
• Information about the data
• Data types only
• Summary statistics of numerical data
• Summary statistics of categorical data
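The four checks above can be sketched with pandas, which is one common choice for EDA in Python (the notes do not name a library, so this is an assumption). The toy DataFrame and its column names are hypothetical and only serve as an illustration.

```python
import pandas as pd

# Hypothetical toy data; the column names are illustrative only.
df = pd.DataFrame({
    "age": [23, 35, 31, 35, 52],
    "income": [32000.0, 54000.0, 47000.0, 61000.0, 88000.0],
    "city": ["Pune", "Delhi", "Pune", "Mumbai", "Pune"],
})

# Information about the data: columns, non-null counts, dtypes, memory use.
df.info()

# Data types only.
print(df.dtypes)

# Summary statistics of numerical data (count, mean, std, min, quartiles, max).
numeric_summary = df.describe()
print(numeric_summary)

# Mode and range are not part of describe(), so compute them directly.
age_median = df["age"].median()
age_mode = df["age"].mode().tolist()
age_range = df["age"].max() - df["age"].min()

# Summary statistics of categorical data (count, unique, top, freq).
categorical_summary = df.select_dtypes(exclude="number").describe()
print(categorical_summary)
```

Selecting the non-numeric columns first and then calling describe() avoids depending on which string dtype a given pandas version uses for text columns.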


Missing Values: Identify missing values in the dataset
and decide how to handle them, for example by imputation
(filling in estimated values) or deletion (dropping the
affected rows or columns).

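A minimal sketch of both strategies, assuming pandas and a hypothetical two-column dataset. Median and most-frequent-value imputation are used here only as simple examples; the right strategy depends on the data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing value per column.
df = pd.DataFrame({
    "age": [23, np.nan, 31, 35, 52],
    "city": ["Pune", "Delhi", None, "Mumbai", "Pune"],
})

# Identify: number and share of missing values per column.
missing_counts = df.isna().sum()
missing_share = df.isna().mean()
print(missing_counts)
print(missing_share)

# Deletion: drop every row that contains at least one missing value.
dropped = df.dropna()

# Imputation: median for the numeric column, most frequent value for the
# categorical column (an illustrative choice, not the only option).
imputed = df.copy()
imputed["age"] = imputed["age"].fillna(imputed["age"].median())
imputed["city"] = imputed["city"].fillna(imputed["city"].mode().iloc[0])
print(imputed)
```

Deletion is simplest but discards whole rows; imputation keeps the rows at the cost of introducing estimated values.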
Data Visualization: Use graphs and plots such as
histograms, box plots, scatter plots, and bar plots to
visualize the data. Visualization helps in understanding
the distribution of variables, identifying outliers, and
spotting trends or patterns.

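The four plot types named above could be drawn with matplotlib roughly as follows. The library choice, the synthetic data, the column names, and the output filename eda_plots.png are all assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic, hypothetical data generated with a fixed seed.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "age": rng.integers(18, 65, size=200),
    "group": rng.choice(["A", "B", "C"], size=200),
})
df["income"] = 20000 + 900 * df["age"] + rng.normal(0, 5000, size=200)

fig, axes = plt.subplots(2, 2, figsize=(10, 8))

# Histogram: distribution of a single numeric variable.
axes[0, 0].hist(df["income"], bins=20)
axes[0, 0].set_title("Histogram of income")

# Box plot: median, quartiles, and potential outliers.
axes[0, 1].boxplot(df["income"])
axes[0, 1].set_title("Box plot of income")

# Scatter plot: relationship between two numeric variables.
axes[1, 0].scatter(df["age"], df["income"], s=10)
axes[1, 0].set_xlabel("age")
axes[1, 0].set_ylabel("income")
axes[1, 0].set_title("Income vs. age")

# Bar plot: counts per category.
counts = df["group"].value_counts().sort_index()
axes[1, 1].bar(list(counts.index), counts.values)
axes[1, 1].set_title("Rows per group")

fig.tight_layout()
fig.savefig("eda_plots.png")  # hypothetical output filename
```

In an interactive session or notebook, the backend line can be dropped and plt.show() used instead of saving to a file.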
Data Transformation: Transform variables where needed,
for example with log transformations, normalization, or
standardization, to make the data more suitable for
analysis or modeling.

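As a rough sketch, the three transformations can be written directly with NumPy and pandas. The income column is hypothetical, and "normalization" is taken here to mean min-max scaling to the range 0 to 1, which is one common reading of the term.

```python
import numpy as np
import pandas as pd

# Hypothetical right-skewed column with one very large value.
df = pd.DataFrame({"income": [20000.0, 35000.0, 50000.0, 80000.0, 400000.0]})
col = df["income"]

# Log transformation: compresses large values; log1p(x) = log(1 + x)
# is also defined when x is 0.
df["income_log"] = np.log1p(col)

# Normalization (min-max scaling): rescales values to the range [0, 1].
df["income_minmax"] = (col - col.min()) / (col.max() - col.min())

# Standardization (z-score): subtract the mean and divide by the standard
# deviation. ddof=0 uses the population standard deviation.
df["income_z"] = (col - col.mean()) / col.std(ddof=0)

print(df)
```

Min-max scaling is sensitive to outliers because a single extreme value stretches the range, whereas the log transformation reduces the influence of such values before any scaling is applied.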
Correlation Analysis: Examine the relationships between
variables using correlation coefficients or correlation
matrices, which show the strength and direction of each
relationship.
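A minimal sketch of a correlation matrix and a single pairwise coefficient, assuming pandas. The data is synthetic and the column names are hypothetical: score is constructed to depend on hours_studied, while noise is independent of both.

```python
import numpy as np
import pandas as pd

# Synthetic, hypothetical data generated with a fixed seed.
rng = np.random.default_rng(42)
hours = rng.uniform(0, 10, size=100)
df = pd.DataFrame({
    "hours_studied": hours,
    "score": 50 + 4 * hours + rng.normal(0, 5, size=100),
    "noise": rng.normal(0, 1, size=100),
})

# Correlation matrix: Pearson coefficient for every pair of numeric columns.
# Values range from -1 to 1; the sign gives the direction and the magnitude
# gives the strength of the linear relationship.
corr_matrix = df.corr(method="pearson")
print(corr_matrix.round(2))

# Single pairwise coefficient.
r = df["hours_studied"].corr(df["score"])
print(f"hours_studied vs. score: r = {r:.2f}")

# Spearman rank correlation is an alternative for monotonic but
# non-linear relationships.
spearman_matrix = df.corr(method="spearman")
```

A coefficient near zero only indicates the absence of a linear (or, for Spearman, monotonic) relationship, so it is worth checking a scatter plot alongside the matrix.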
