
BASICS OF ANALYTICS

SUBMITTED TO- ANKITA MENDIRATTA


SUBMITTED BY- RUKHSAR KHATUN
UID NO- 2022-2105-0001-0014
SECTION- PGDM-1
TABLE OF CONTENTS
 INTRODUCTION
 SAMPLE DATA
 DUPLICATION & MISSING VALUES OF DATA
 CORRELATION, SKEWNESS & KURTOSIS OF DATA
 VARIABLE PRICE, STANDARD DEVIATION.
 DESCRIPTIVE STATISTICS
INTRODUCTION
Python is a dynamic, interpreted (bytecode-compiled) language.
There are no type declarations of variables, parameters, functions, or
methods in source code. This makes the code short and flexible, but
you lose the compile-time type checking of the source code.
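As a small illustration of the point above, the sketch below rebinds one name to values of different types and calls an undeclared-type function with different arguments. The names `value` and `double` are made up for this example. The type error only surfaces when the offending line actually runs.

```python
# The same name can be rebound to values of different types at runtime.
value = 42
print(type(value).__name__)
value = "forty-two"
print(type(value).__name__)


def double(x):
    # No parameter type is declared: any object that supports `* 2` works.
    return x * 2


print(double(3))     # works with an int
print(double("ab"))  # works with a str (repeats it)

# With no compile-time check, a bad call fails only when it is executed.
try:
    double(None)
except TypeError as exc:
    error_message = str(exc)
    print("Runtime error:", error_message)
```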
What can Python do?

 Python can be used on a server to create web applications.
 Python can be used alongside software to create workflows.
 Python can connect to database systems. It can also read and
modify files.
 Python can be used to handle big data and perform complex
mathematics.
 Python can be used for rapid prototyping, or for production-
ready software development.

Why Python?

 Python works on different platforms (Windows, Mac, Linux,
Raspberry Pi, etc.).
 Python has a simple syntax similar to the English language.
 Python has syntax that allows developers to write programs
with fewer lines than some other programming languages.
 Python runs on an interpreter system, meaning that code can
be executed as soon as it is written. This means that
prototyping can be very quick.
 Python can be treated in a procedural way, an object-oriented
way or a functional way.

Good to know

 The most recent major version of Python is Python 3, which we
shall be using in this tutorial. Python 2 reached end of life in
January 2020 and no longer receives updates, including security
updates, although it still appears in older code.
 In this tutorial Python will be written in a text editor. It is
also possible to write Python in an Integrated Development
Environment, such as Thonny, PyCharm, NetBeans or Eclipse,
which are particularly useful when managing larger collections
of Python files.

DATA
After importing the FMCG goods data, we examined its dimensions and
data types: the number of rows and columns, the header names, and the
data type of each column. For the sample data, we displayed the top
10 rows and the bottom 10 rows.
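The inspection steps described above can be sketched with pandas as follows. The report does not show the actual file or its exact column names, so the small DataFrame here is a hypothetical stand-in. With the real data, `df` would come from something like `pd.read_csv(...)` instead.

```python
import pandas as pd

# Hypothetical stand-in for the imported FMCG data; the real file and its
# exact column names are not shown in this report.
df = pd.DataFrame({
    "Order ID": range(1, 13),
    "Order Quantity": [5, 3, 8, 2, 7, 1, 9, 4, 6, 10, 2, 5],
    "Price": [120.0, 80.5, 300.0, 45.0, 210.0, 15.0,
              330.0, 99.0, 150.0, 410.0, 60.0, 125.0],
})

n_rows, n_cols = df.shape            # dimensions as (rows, columns)
header_names = df.columns.tolist()   # header names
column_types = df.dtypes             # data type of each column

top_10 = df.head(10)                 # first 10 rows
bottom_10 = df.tail(10)              # last 10 rows

print(n_rows, n_cols)
print(header_names)
print(column_types)
print(top_10)
print(bottom_10)
```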

DUPLICATE AND COUNT


We checked the data for duplicate records and missing values and
found none.
Missing values = 0
Duplicate values = 0
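One common way to run these two checks in pandas is sketched below. The toy DataFrame is an assumption made for illustration and deliberately contains one duplicate row and one missing value so that the counts are non-zero. On the report's data both counts were 0.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical): the third row repeats the second, and the
# last Price is missing.
df = pd.DataFrame({
    "Order ID": [1, 2, 2, 3],
    "Price": [10.0, 20.0, 20.0, np.nan],
})

# duplicated() flags every row that exactly repeats an earlier row.
duplicate_count = int(df.duplicated().sum())

# isnull() flags missing cells; summing gives counts per column.
missing_per_column = df.isnull().sum()
total_missing = int(missing_per_column.sum())

print("Duplicate rows:", duplicate_count)
print("Missing values per column:")
print(missing_per_column)
print("Total missing values:", total_missing)
```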

CORRELATION, SKEWNESS AND KURTOSIS OF DATA
For the above data we calculated the correlation between Discount,
Order ID, Order Quantity, Profit, Sales and Price, and obtained
correlations above 0.6 and below 1.
We also calculated skewness and kurtosis for this data. Skewness was
positive for Discount, Order ID, Profit, Sales and Price, and
negative for Order Quantity. Likewise, kurtosis was positive for
Profit, Sales and Price, and negative for Discount, Order ID and
Order Quantity.
Kurtosis = 17.8386060
Skewness = 3.463269
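The three measures discussed above can be computed with pandas roughly as follows. The column values are made up for illustration, and the report does not say which functions were actually used. Note that `DataFrame.kurt()` returns excess kurtosis, so a normal distribution scores about 0.

```python
import pandas as pd

# Hypothetical numeric columns; the last Price is an outlier so that the
# column has a long right tail.
df = pd.DataFrame({
    "Order Quantity": [1, 2, 3, 4, 5, 6],
    "Price": [10.0, 12.0, 11.0, 15.0, 14.0, 90.0],
})
df["Sales"] = df["Order Quantity"] * df["Price"]

corr_matrix = df.corr()   # pairwise Pearson correlation
skewness = df.skew()      # > 0: long right tail, < 0: long left tail
kurt = df.kurt()          # excess kurtosis, > 0: heavier tails than normal

print(corr_matrix)
print(skewness)
print(kurt)
```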

VARIABLE PRICE
After importing the data, we performed several operations on the
PRICE variable: we calculated its minimum value, sum, mean, mode and
standard deviation.

Mean = 3060.916
Standard deviation = 5167.5817
Mode = 275.0363
Shape of Boston data set = (7811, 9)
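A minimal sketch of these operations on a single column is shown below. The five prices are invented for the example, so the results differ from the figures reported above. `Series.std()` uses the sample formula (dividing by n - 1) by default, and `Series.mode()` returns a Series because several values can tie.

```python
import pandas as pd

# Hypothetical PRICE values for illustration only.
price = pd.Series([100.0, 250.0, 250.0, 400.0, 1000.0], name="Price")

price_min = price.min()
price_sum = price.sum()
price_mean = price.mean()
price_mode = price.mode().iloc[0]   # first (smallest) of the most frequent values
price_std = price.std()             # sample standard deviation (ddof=1)

print("min:", price_min)
print("sum:", price_sum)
print("mean:", price_mean)
print("mode:", price_mode)
print("std:", price_std)
```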
SCATTER PLOT
 This graph represents the correlation between QTY and price in
the data.
 A scatter plot is a diagram where each value in the data set is
represented by a dot.
 The Matplotlib module has a method for drawing scatter plots.
It needs two arrays of the same length: one for the values of
the x-axis, and one for the values of the y-axis.
 import matplotlib.pyplot as plt
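A scatter plot of quantity against price could be drawn along these lines. The two lists are hypothetical values rather than the report's data, and the `Agg` backend is selected only so the sketch runs without a display. Remove that line and call `plt.show()` to open a window.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, assumed for this sketch
import matplotlib.pyplot as plt

# Two arrays of the same length: x-axis values and y-axis values.
qty = [1, 2, 3, 4, 5, 6, 7, 8]
price = [95.0, 180.0, 310.0, 390.0, 520.0, 580.0, 730.0, 790.0]

fig, ax = plt.subplots()
points = ax.scatter(qty, price)   # one dot per (qty, price) pair
ax.set_xlabel("QTY")
ax.set_ylabel("Price")
ax.set_title("QTY vs Price")
```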
HISTOGRAM PLOT
• A histogram shows the frequency distribution of each column of
the data.
• It is a graph showing the number of observations within each
given interval.
• Using the data, we plotted histograms for Discount, Order ID,
Order Quantity, Profit, Sales and Price.
• import seaborn as sns
DISTRIBUTION PLOT
Seaborn is a Python data visualization library based on
Matplotlib. It provides a high-level interface for drawing
attractive and informative statistical graphics. This section deals
with the distribution plots in Seaborn, which are used for
examining univariate and bivariate distributions.

HEATMAP
A heatmap is a two-dimensional graphical representation of
data where the individual values that are contained in a matrix
are represented as colors. The Seaborn package allows the
creation of annotated heatmaps, which can be tweaked using
Matplotlib tools as per the creator's requirement.
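An annotated heatmap of a correlation matrix might be produced as follows. The DataFrame is a hypothetical example, and the colour map and the fixed -1 to 1 colour range are choices made for this sketch, not settings taken from the report.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, assumed for this sketch
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical numeric columns.
df = pd.DataFrame({
    "Order Quantity": [1, 2, 3, 4, 5, 6],
    "Price": [10.0, 12.0, 11.0, 15.0, 14.0, 90.0],
    "Profit": [2.0, 3.5, 1.0, 4.0, 3.0, 20.0],
})

corr_matrix = df.corr()   # the matrix whose values become colours

fig, ax = plt.subplots()
# annot=True writes each value inside its cell.
sns.heatmap(corr_matrix, annot=True, cmap="coolwarm", vmin=-1, vmax=1, ax=ax)
ax.set_title("Correlation heatmap")   # tweaked with a plain Matplotlib call
```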
THANK YOU
