
Lab FAT

Data Visualization - CSE3020


Slot: L27+L28

Name: S.S. Bhaskar Reddy
REG NO: 18BCE0808

Submitted to: Prof. Ramani S

School of Computer Science & Engineering


Question 1

1. Using the built-in data set airquality, create a scatter plot comparing the
Temp and Ozone variables. Does there appear to be a relationship?
2. Create a histogram of the Temp variable. Can you adjust the binning so that
there are (approximately) 25 bins? Does this look to be approximately normally
distributed?
3. Plot the frequency of observations in each Month. Are the months equally
represented?
4. Plot a graph between the Ozone and Wind values
5. Create a boxplot to view the distribution of Ozone for each month. Do the
distributions differ across the months?

Code:-
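1.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Load the airquality data set from a local CSV export
df=pd.read_csv("C:\\Users\\admin\\Downloads\\airquality.csv")
x=df["Temp"]
y=df["Ozone"]
# Scatter plot of Ozone against Temp (rows with missing Ozone are not drawn)
plt.scatter(x,y)
plt.xlabel("Temp")
plt.ylabel("Ozone")
plt.show()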

Explanation:
Yes, there appears to be a positive relationship between the Temp and Ozone
variables: Ozone levels tend to rise as the temperature rises.
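
The visual impression can be checked numerically with a correlation coefficient. The following is a minimal sketch, assuming a data frame with the same Temp and Ozone column names. The helper `temp_ozone_corr` and the small `sample` frame are made up for illustration and are not the airquality values.

```python
import numpy as np
import pandas as pd

def temp_ozone_corr(df):
    """Pearson correlation between Temp and Ozone, ignoring rows with missing values."""
    pairs = df[["Temp", "Ozone"]].dropna()
    return pairs["Temp"].corr(pairs["Ozone"])

# Hypothetical illustrative values, NOT the real airquality data
sample = pd.DataFrame({
    "Temp":  [60, 65, 70, 75, 80, 85, 90],
    "Ozone": [10, np.nan, 20, 35, 50, 70, 95],
})
r = temp_ozone_corr(sample)
print(f"Pearson r = {r:.2f}")
```

A value of r close to +1 would support a positive relationship; calling `temp_ozone_corr(df)` on the loaded data frame would give the figure for the real data.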

2.
x=df["Temp"]
# Histogram of Temp with 25 equal-width bins
plt.hist(x, bins=25)
plt.show()

Explanation:
A histogram is created for the variable Temp and the number of bins is set to 25 (changed from
Matplotlib's default of 10).
Yes, the distribution seen in the histogram looks approximately normal.
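
The binning and the shape of the distribution can also be inspected without a plot. This is a sketch under stated assumptions: the `temp` series below is synthetic stand-in data (not the real Temp column), and sample skewness is used here only as one rough indicator of symmetry, not as a formal normality test.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df["Temp"]: 153 synthetic values, NOT the real data
rng = np.random.default_rng(0)
temp = pd.Series(rng.normal(loc=78, scale=9, size=153), name="Temp")

# np.histogram applies the same equal-width binning as plt.hist(temp, bins=25)
counts, edges = np.histogram(temp, bins=25)
print(len(counts), "bins, width", round(edges[1] - edges[0], 2))

# Sample skewness: values near 0 are consistent with a symmetric distribution
print("skewness:", round(temp.skew(), 2))
```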

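3.
# Count the observations in each month
g={}
for i in df["Month"]:
    if i in g:
        g[i]+=1
    else:
        g[i]=1
# Split the counts into bar positions (months) and bar heights (frequencies)
x=[]
y=[]
for i in g:
    x.append(i)
    y.append(g[i])
plt.bar(x,y)
plt.show()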
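Explanation:
Yes, the months are represented almost equally: the data has one row per day from May to
September, so each month has 30 or 31 observations.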
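
The same frequency table can be obtained more compactly with pandas. A minimal sketch, assuming a Month column as in the data set; the `sample` frame is hypothetical and much smaller than the real data.

```python
import pandas as pd

# Hypothetical Month column for illustration, NOT the real data
sample = pd.DataFrame({"Month": [5, 5, 5, 6, 6, 7, 7, 7]})

# value_counts tallies each distinct value; sort_index puts the months in calendar order
month_counts = sample["Month"].value_counts().sort_index()
print(month_counts.to_dict())
```

The result can be passed straight to a bar chart, for example `plt.bar(month_counts.index, month_counts.values)`.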

4.
x=df["Ozone"]
y=df["Wind"]
# Scatter plot of Wind against Ozone
plt.scatter(x,y)
plt.xlabel("Ozone")
plt.ylabel("Wind")
plt.show()

Explanation:
The scatter plot shows the relationship between Ozone and Wind: Ozone levels tend to be lower
at higher wind speeds.

5.
import seaborn as sns
# One box per month showing the distribution of Ozone
ax = sns.boxplot(x="Month", y="Ozone", data=df)
plt.show()

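Explanation:
Yes, the distributions differ across the months: the boxes differ in both median and spread,
with noticeably higher Ozone levels in July and August.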
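
A per-month summary table gives the numbers behind the boxplot. This is a sketch assuming the same Month and Ozone column names; the `sample` values are invented for illustration and are not taken from airquality.

```python
import numpy as np
import pandas as pd

# Hypothetical illustrative values, NOT the real airquality data
sample = pd.DataFrame({
    "Month": [5, 5, 5, 7, 7, 7],
    "Ozone": [10, 20, np.nan, 40, 60, 80],
})

# Per-month summary; missing Ozone readings are skipped automatically
summary = sample.groupby("Month")["Ozone"].agg(["count", "median", "min", "max"])
print(summary)
```

Months whose medians or ranges differ clearly in such a table will also show visibly different boxes.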

Question 2

Create your own (Student Record) with 5 attributes and 20 observations. Show the key and value
attributes in the dataset using different colors by displaying in a table form.

Code:-
import seaborn as sns
from sklearn.model_selection import train_test_split as tr
from sklearn.linear_model import LinearRegression
df3=pd.read_csv("C:\\Users\\admin\\Desktop\\archive\\StudentsPerformance.csv")
# Regress writing score on math score (70% training, 30% test split)
X,X_test,y,y_test=tr(df3["math score"].values.reshape(-1,1),df3["writing score"],test_size=0.3)
model2=LinearRegression()
model2.fit(X,y)
# regplot expects 1-D inputs, so flatten the feature column
sns.regplot(x=X.ravel(), y=y)
plt.show()
# Regress writing score on reading score with the same split ratio
X,X_test,y,y_test=tr(df3["reading score"].values.reshape(-1,1),df3["writing score"],test_size=0.3)
model3=LinearRegression()
model3.fit(X,y)
sns.regplot(x=X.ravel(), y=y)
plt.show()

Explanation: A linear regression is fitted and plotted for writing score against math score, and
then for writing score against reading score.
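
The split above holds back 30% of the rows as a test set, which can be used to check the fitted line. A minimal sketch under stated assumptions: the `reading` and `writing` arrays are synthetic (not the StudentsPerformance data), and `random_state=0` is added only to make the example repeatable.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hypothetical synthetic scores, NOT the StudentsPerformance data
rng = np.random.default_rng(0)
reading = rng.uniform(30, 100, size=200)
writing = 0.95 * reading + rng.normal(0, 4, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    reading.reshape(-1, 1), writing, test_size=0.3, random_state=0
)
model = LinearRegression().fit(X_train, y_train)

# R^2 on the held-out 30% indicates how well the fitted line generalises
r2 = model.score(X_test, y_test)
print(f"slope = {model.coef_[0]:.2f}, test R^2 = {r2:.2f}")
```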

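# Shade the numeric columns with a blue gradient (darker cells hold higher values)
df3.style.background_gradient(cmap='Blues')
Explanation: The data set used is displayed as a table, with the numeric cells shaded by value.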
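
One possible way to build a small student record and colour the key and value attributes differently is sketched below. Everything in it is an assumption made for illustration: the column names, the choice of RegNo as the key attribute, the generated marks, the colours, and the helper `colour_key_and_values`. Rendering the styled table needs a Jupyter notebook (pandas styling also depends on the jinja2 package).

```python
import numpy as np
import pandas as pd

# Hypothetical student record: 5 attributes, 20 observations (generated, not real students)
rng = np.random.default_rng(0)
students = pd.DataFrame({
    "RegNo": [f"S{i:03d}" for i in range(1, 21)],   # key attribute
    "Name": [f"Student {i}" for i in range(1, 21)],
    "Math": rng.integers(40, 101, size=20),
    "Reading": rng.integers(40, 101, size=20),
    "Writing": rng.integers(40, 101, size=20),
})

def colour_key_and_values(df, key="RegNo"):
    """Return a Styler with the key column and the value columns in different colours."""
    values = [c for c in df.columns if c != key]
    return (
        df.style
        .set_properties(subset=[key], **{"background-color": "khaki"})
        .set_properties(subset=values, **{"background-color": "lightblue"})
    )

# In a Jupyter notebook, evaluate colour_key_and_values(students) as the last
# line of a cell to render the coloured table.
```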
