
Artificial Intelligence Laboratory
ESIEA 3A 2019-2020

Mihir Sarkar
mihir@media.mit.edu
Course objectives

A general introductory module covering the history of AI, its main concepts, its applications, and its challenges.
Introduction to AI & Data Science: Applications & Techniques
Install Anaconda
Download input.zip
data_description.txt
sample_submission.csv
test.csv
train.csv
How to improve your business using data science
Do you want to improve your home sales business by offering your clients an easy-to-use house price estimate? Here you will see how to do it using machine learning models.

This notebook is an introduction to Python programming and data science. It contains three parts:

1. Explore your data
2. Build your first model
3. Improve it
Import packages
Programming features are available through packages, which must be imported before they can be used in a program. In this notebook you will use the most popular Python packages for data manipulation, data visualization and machine learning.

# data manipulation
import pandas

# data visualization
import matplotlib.pyplot as plt
import seaborn

# notebook configuration, you can ignore it
# (%matplotlib is a Jupyter magic command, it only works inside a notebook)
%matplotlib inline
plt.rcParams["figure.figsize"] = (20, 10)

import warnings
warnings.filterwarnings('ignore')
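Import your data
data = pandas.read_csv('input/train.csv')
data.head()

target = data['SalePrice']
target.describe()

# distribution of the sale price
# (distplot is deprecated in recent seaborn releases; seaborn.histplot(target, kde=True) is its replacement)
seaborn.distplot(target)

# scatter plot of GrLivArea against SalePrice
data.plot.scatter(x='GrLivArea', y='SalePrice', ylim=(0, 800000))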
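The numbers behind these plots can also be read directly. The sketch below is only an illustration: it uses a small invented table (the `toy` DataFrame and its values are assumptions, not rows from train.csv) to show how `describe()`, `skew()` and `corr()` summarize a target and its relationship with one feature.

```python
import pandas

# Hypothetical toy data standing in for train.csv (values invented for illustration).
toy = pandas.DataFrame({
    'GrLivArea': [800, 1200, 1500, 1800, 2400, 3000],
    'SalePrice': [90000, 130000, 160000, 185000, 260000, 400000],
})

# count, mean, std, min, quartiles and max of the target
summary = toy['SalePrice'].describe()

# a positive skewness means the distribution has a longer right tail
skewness = toy['SalePrice'].skew()

# Pearson correlation between living area and price (between -1 and 1)
correlation = toy['GrLivArea'].corr(toy['SalePrice'])

print(summary)
print(f'skewness: {skewness:.2f}')
print(f'correlation: {correlation:.2f}')
```

On the real data, the same three calls can be applied to `data['SalePrice']` and `data['GrLivArea']`.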
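Bivariate analysis
The art of data visualization
# one box per distinct construction year
seaborn.boxplot(x='YearBuilt', y="SalePrice", data=data)

# group the years into 10 intervals of equal width
data['year_grouped'] = pandas.cut(data['YearBuilt'], 10)
seaborn.boxplot(x='year_grouped', y="SalePrice", data=data)

# group the years into 10 quantile-based intervals (roughly the same number of houses in each)
data['year_grouped'] = pandas.qcut(data['YearBuilt'], 10)
seaborn.boxplot(x='year_grouped', y="SalePrice", data=data)

# distribution of the construction years
seaborn.distplot(data['YearBuilt'])

cols = ['SalePrice', 'OverallQual', 'GrLivArea', 'GarageCars',
        'TotalBsmtSF', 'FullBath', 'YearBuilt']
# `height` is the current name of the argument formerly called `size` (renamed in seaborn 0.9)
seaborn.pairplot(data[cols], height=2.5, plot_kws=dict(s=20))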
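The difference between `pandas.cut` and `pandas.qcut` is easier to see on a handful of values. This is a minimal sketch on an invented series of years (the `years` values are assumptions, chosen to be skewed toward recent dates): `cut` splits the range into equal-width intervals, while `qcut` chooses the interval edges so that each one holds about the same number of observations.

```python
import pandas

# Hypothetical construction years, deliberately skewed toward recent dates.
years = pandas.Series([1900, 1950, 1990, 1995, 2000, 2003, 2005, 2008],
                      name='YearBuilt')

# 4 intervals of equal width over the range 1900-2008
equal_width = pandas.cut(years, 4)

# 4 intervals whose edges are the quartiles of the data
equal_count = pandas.qcut(years, 4)

# number of observations falling in each interval
width_counts = equal_width.value_counts(sort=False)
quantile_counts = equal_count.value_counts(sort=False)

print(width_counts)
print(quantile_counts)
```

With equal-width intervals most of the values pile up in the last bin, whereas the quantile-based intervals each receive two values. That is why the two grouped box plots look so different on a skewed variable such as YearBuilt.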
Building your first model
from sklearn.linear_model import LinearRegression

features = ['OverallQual', 'GrLivArea', 'GarageCars',
            'TotalBsmtSF', 'FullBath', 'YearBuilt']
target = 'SalePrice'

features_data = data[features]
target_data = data[target]

my_first_model = LinearRegression()
my_first_model.fit(features_data, target_data)
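To get a feel for what `fit` does, it can help to train the same kind of model on data whose rule is known in advance. The sketch below is an illustration under stated assumptions: the `toy` table, its column names and the pricing rule (50000 + 100 per unit of area + 10000 per garage place) are all invented, and the data contains no noise, so the fitted model should recover the rule almost exactly.

```python
import pandas
from sklearn.linear_model import LinearRegression

# Hypothetical toy data following a known, noise-free rule:
# price = 50000 + 100 * area + 10000 * garage
toy = pandas.DataFrame({
    'area':   [1000, 1500, 2000, 2500, 3000, 1200],
    'garage': [1, 2, 1, 2, 3, 0],
})
toy['price'] = 50000 + 100 * toy['area'] + 10000 * toy['garage']

toy_model = LinearRegression()
toy_model.fit(toy[['area', 'garage']], toy['price'])

# The learned parameters should match the rule used to build the data.
print(f'intercept: {toy_model.intercept_:.2f}')
for name, coef in zip(['area', 'garage'], toy_model.coef_):
    print(f'{name}: {coef:.2f}')
```

Real house prices do not follow an exact linear rule, so on train.csv the fitted coefficients are only the best linear approximation of the data.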
Congratulations! You just trained your first machine learning model! Let's use it!

You now have a perfectly valid house price estimation service. Let's try it!
from collections import OrderedDict

# Fill with your house data (areas are in square feet, see data_description.txt)
my_house = pandas.DataFrame([OrderedDict({'OverallQual': 7,
                                          'GrLivArea': 150,
                                          'GarageCars': 2,
                                          'TotalBsmtSF': 200,
                                          'FullBath': 1,
                                          'YearBuilt': 1880})])

# compute your price estimation using your first model
my_price_estimation = my_first_model.predict(my_house)
print(f'The price estimation of your house is: {my_price_estimation[0]:.2f} $')

# show the coefficient learned for each feature
for feature, coef in zip(features, my_first_model.coef_):
    print(f'{feature}: {coef: .2f}')
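The coefficients printed above are all a linear regression needs to produce an estimate: the prediction is the intercept plus each coefficient multiplied by the matching feature value. The sketch below checks this on invented data (the `toy` table, its column names and the `new_house` values are assumptions, not taken from train.csv).

```python
import pandas
from sklearn.linear_model import LinearRegression

# Hypothetical toy data (invented values, not taken from train.csv).
toy = pandas.DataFrame({
    'area':   [1000, 1500, 2000, 2500, 3000, 1200],
    'garage': [1, 2, 1, 2, 3, 0],
    'price':  [160000, 225000, 255000, 330000, 375000, 165000],
})
toy_features = ['area', 'garage']

toy_model = LinearRegression()
toy_model.fit(toy[toy_features], toy['price'])

# A hypothetical house to estimate, with columns in the same order as at fit time.
new_house = pandas.DataFrame([{'area': 1800, 'garage': 2}])
predicted = toy_model.predict(new_house)[0]

# Rebuild the same number by hand from the fitted parameters.
manual = toy_model.intercept_
for feature, coef in zip(toy_features, toy_model.coef_):
    manual += coef * new_house[feature].iloc[0]

print(f'predict(): {predicted:.2f}')
print(f'by hand:   {manual:.2f}')
```

Read this way, each coefficient is the change in the estimated price when its feature increases by one unit and the other features stay fixed.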
