This document provides an introduction to and overview of exploratory data analysis (EDA). EDA can be performed using programming languages like R and Python or business intelligence tools like Tableau. Key Python libraries for EDA include NumPy for mathematical and statistical functions, Pandas for data manipulation and time series analysis, and visualization libraries like seaborn and matplotlib. Common plots used in EDA to visualize and understand datasets are histograms, scatter plots, pair plots, box plots, violin plots, and distribution plots.
● Roll No: 18EJCEE008

Intro to Exploratory Data Analysis
Machine Learning and AI

In statistics, exploratory data analysis is an approach to analyzing data sets to summarize their main characteristics, often using statistical graphics and other data visualization methods. A statistical model can be used or not, but primarily EDA is for seeing what the data can tell us beyond the formal modeling or hypothesis testing task.

How to perform EDA
We can perform EDA either with the programming languages most commonly used for statistics, such as R and Python, or with Business Intelligence (BI) tools such as Tableau, IBM Cognos, and Qlik Sense. BI tools provide interactive dashboards for understanding data. They are easy to use, and some of them also support building machine learning models without writing code.

Data Analysis
Data analysis is a process of inspecting, cleansing, transforming and modeling data with the goal of discovering useful information, informing conclusions and supporting decision-making.

NumPy
NumPy provides comprehensive mathematical functions, random number generators, linear algebra routines, Fourier transforms, and more.

Pandas
Pandas provides a fast and efficient data frame object for data manipulation with integrated indexing. It is in use in a wide variety of academic and commercial domains, including finance, neuroscience, economics, statistics, advertising, web analytics, and more.

Time series functionality: date range generation and frequency conversion, moving window statistics, date shifting and lagging. You can even create domain-specific time offsets and join time series without losing data.

Data Visualization
Data visualization libraries in Python, such as seaborn and matplotlib, aim to make visualization a central part of exploring and understanding data. Seaborn's dataset-oriented plotting functions operate on dataframes and arrays containing whole datasets and internally perform the necessary semantic mapping and statistical aggregation to produce informative plots.

Common plots used for visualization
● Histograms
● Scatter plots
● Pair plots
● Box plots
● Violin plots
● Distribution plots
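The NumPy and Pandas features listed above can be tried out on a small example. The following is a minimal sketch, not a prescribed workflow: the data is synthetic, and the column names `value` and `group`, the dates, and the window sizes are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic data (an assumption for illustration): 90 daily readings.
rng = np.random.default_rng(seed=0)  # NumPy random number generator
dates = pd.date_range("2024-01-01", periods=90, freq="D")  # date range generation
df = pd.DataFrame(
    {
        "value": rng.normal(loc=50.0, scale=5.0, size=90),
        "group": rng.choice(["a", "b", "c"], size=90),
    },
    index=dates,
)

# Summarize the main characteristics of the data.
summary = df["value"].describe()                     # count, mean, std, quartiles
group_means = df.groupby("group")["value"].mean()    # per-group average

# NumPy statistical functions work directly on a column.
q1, median, q3 = np.percentile(df["value"], [25, 50, 75])

# Pandas time series functionality.
weekly_mean = df["value"].resample("W").mean()       # frequency conversion
rolling_7d = df["value"].rolling(window=7).mean()    # moving window statistics
lag_1 = df["value"].shift(1)                         # shifting / lagging by one day
day_change = df["value"] - lag_1                     # day-over-day difference

print(summary)
print(group_means)
print(weekly_mean.head())
```

The first six entries of `rolling_7d` and the first entry of `lag_1` are missing (`NaN`) because there is not yet enough history to compute them, which is worth keeping in mind before plotting or modeling.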