Analysis
Sudhanva Saralaya
Introduction and Purpose
This presentation analyses the given data and answers the questions below.
Plot graphically which food categories have the highest and lowest varieties.
Which variables have the highest correlation? Plot them and find the value.
Packages
Application and
▪import numpy as np for NumPy-related calculations
▪warnings.filterwarnings("ignore") to suppress warning messages
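As a minimal sketch, the two entries above could be written as follows. The small mean calculation is only an illustrative check and is not part of the original analysis.

```python
import warnings

import numpy as np

# Suppress warning messages so they do not clutter the output.
warnings.filterwarnings("ignore")

# Illustrative check that NumPy is available: the mean of three numbers.
sample_mean = np.mean([1.0, 2.0, 3.0])
print(sample_mean)
```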
Plot graphically which food categories have the highest and lowest varieties.
Answering this question is straightforward. First, we count the items in each category, and then we use a graph to make sense of the counts.
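The count-then-plot approach can be sketched as below. The data frame, the "Category" column name, and the helper plot_counts are assumptions for illustration, since the slides do not give the file or column names. The plotting helper assumes matplotlib is installed.

```python
import pandas as pd

# Hypothetical stand-in for the menu data; the real data set is not shown in the slides.
menu = pd.DataFrame({
    "Category": ["Breakfast", "Breakfast", "Breakfast",
                 "Beverages", "Beverages", "Desserts"],
    "Item": ["A", "B", "C", "D", "E", "F"],
})

# Number of items (varieties) in each category, largest first.
counts = menu["Category"].value_counts()
most_varied = counts.idxmax()
least_varied = counts.idxmin()


def plot_counts(counts):
    """Draw a bar chart of items per category (hypothetical helper)."""
    import matplotlib.pyplot as plt

    ax = counts.plot(kind="bar")
    ax.set_xlabel("Category")
    ax.set_ylabel("Number of items")
    plt.tight_layout()
    return ax


print(most_varied, least_varied)
```

Calling plot_counts(counts) would draw the chart; the tallest and shortest bars correspond to most_varied and least_varied.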
outlier?
Before we look at which variables have outliers, we need to understand what an outlier is.
▪An outlier is an extreme value in a data set. It can be extremely high or extremely low compared with the rest of the data. Outliers can only be checked for numerical values.
For example: 0 100 200 300 10000. Here 10000 is an outlier, as it lies far from the other values.
▪The following slides show the graphs for each variable, with a short inference on outliers.
Variables:
Saturated Fat
Saturated Fat (% Daily Value)
Trans Fat
Cholesterol
Cholesterol (% Daily Value)
Sodium
Sodium (% Daily Value)
Carbohydrates
Carbohydrates (% Daily Value)
Dietary Fiber
Dietary Fiber (% Daily Value)
Sugars
Protein
Vitamin A (% Daily Value)
Vitamin C (% Daily Value)
Calcium (% Daily Value)
Iron (% Daily Value)
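One common way to make "extreme" precise is the 1.5 × IQR rule that a standard boxplot uses for its whiskers. The sketch below assumes that rule. The slides do not state which rule was applied, and iqr_outliers is a hypothetical helper name. Under this rule only 10000 in the example list is flagged, while 0 stays inside the fences.

```python
import numpy as np


def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    data = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(data, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return data[(data < lower) | (data > upper)]


example = [0, 100, 200, 300, 10000]
flagged = iqr_outliers(example)
print(flagged)
```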
Here we can see outliers in Calories, so we can assume that outliers are also present in Calories from Fat, Total Fat and Total Fat (% Daily Value), as they are all directly proportional to each other. The boxplots below confirm that this assumption is correct.
An interesting inference is that even though Total Fat and Total Fat (% Daily Value) have outliers, Saturated Fat and Saturated Fat (% Daily Value) do not have any.
When we come to Trans Fat, we see some outliers. As most of the values are 0, the boxplot shows no proper box, only many outliers.
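Why a mostly-zero column produces no visible box can be sketched with the 1.5 × IQR boxplot rule. The values below are hypothetical, not taken from the data set. When both quartiles are 0, the interquartile range is 0 and every non-zero value falls outside the fences.

```python
import numpy as np

# Hypothetical trans fat values: most items have 0 g, a few do not.
trans_fat = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0.5, 1.0])

q1, q3 = np.percentile(trans_fat, [25, 75])
iqr = q3 - q1                      # 0 here, so the box has no height
upper_fence = q3 + 1.5 * iqr       # also 0

# Every value above the upper fence is drawn as an outlier point.
flagged = trans_fat[trans_fat > upper_fence]
print(iqr, flagged)
```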
As we see in the two plots, Carbohydrates and Carbohydrates (% Daily Value) both have outliers.
Outliers are present in Cholesterol and Cholesterol (% Daily Value).
Outliers are present in Sodium and Sodium (% Daily Value).
Dietary Fiber and Dietary Fiber (% Daily Value) are related to each other, yet outliers are present only in Dietary Fiber (% Daily Value); Dietary Fiber itself does not have any.
Sugars also has outliers.
We see outliers in Protein as well.
Here we see outliers in Vitamins A and C, as well as in Iron and Calcium.
Which variables have the highest correlation? Plot them and find the value.
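A minimal sketch of one way to answer this question, assuming the Pearson correlation that pandas computes by default. The data frame below is hypothetical and plot_pair is an illustrative helper that assumes matplotlib is installed; the real values come from the data set used in the presentation.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric columns standing in for the real data set.
data = pd.DataFrame({
    "Calories": [250, 300, 520, 430, 150],
    "Total Fat": [9, 12, 28, 21, 4],
    "Sugars": [6, 40, 9, 3, 30],
})

corr = data.corr()  # Pearson correlation matrix

# Keep only the upper triangle so each pair appears once and the
# diagonal (every variable with itself, always 1) is excluded.
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(mask).stack().dropna()

best_pair = pairs.abs().idxmax()   # pair with the strongest correlation
best_value = pairs.loc[best_pair]


def plot_pair(frame, x, y):
    """Scatter plot of two columns (hypothetical helper)."""
    import matplotlib.pyplot as plt

    ax = frame.plot(kind="scatter", x=x, y=y)
    plt.tight_layout()
    return ax


print(best_pair, round(best_value, 3))
```

Calling plot_pair(data, *best_pair) would draw the scatter plot for the most strongly correlated pair.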