
ensemble d'apprentissage (ensemble learning) - Jupyter Notebook, 18/11/2023 09:50

In [1]: import pandas as pd
        import numpy as np
        import seaborn as sns
        from sklearn.model_selection import train_test_split
        from sklearn.svm import SVC                      # imported but never used below
        from sklearn.preprocessing import LabelEncoder   # imported but never used below
        import matplotlib.pyplot as plt
        from IPython.display import display, Markdown

In [2]: df = pd.read_csv("C:/Users/PC/Downloads/winequality-white.csv", sep=";")
        df.head()

Out[2]:
   fixed acidity  volatile acidity  citric acid  residual sugar  chlorides  \
0            7.0              0.27         0.36            20.7      0.045
1            6.3              0.30         0.34             1.6      0.049
2            8.1              0.28         0.40             6.9      0.050
3            7.2              0.23         0.32             8.5      0.058
4            7.2              0.23         0.32             8.5      0.058

   free sulfur dioxide  total sulfur dioxide  density    pH  sulphates  \
0                 45.0                 170.0   1.0010  3.00       0.45
1                 14.0                 132.0   0.9940  3.30       0.49
2                 30.0                  97.0   0.9951  3.26       0.44
3                 47.0                 186.0   0.9956  3.19       0.40
4                 47.0                 186.0   0.9956  3.19       0.40

   alcohol  quality
0      8.8        6
1      9.5        6
2     10.1        6
3      9.9        6
4      9.9        6

In [3]: df.shape

Out[3]: (4898, 12)


In [4]: df.tail()

Out[4]:
      fixed acidity  volatile acidity  citric acid  residual sugar  chlorides  \
4893            6.2              0.21         0.29             1.6      0.039
4894            6.6              0.32         0.36             8.0      0.047
4895            6.5              0.24         0.19             1.2      0.041
4896            5.5              0.29         0.30             1.1      0.022
4897            6.0              0.21         0.38             0.8      0.020

      free sulfur dioxide  total sulfur dioxide  density    pH  sulphates  \
4893                 24.0                  92.0  0.99114  3.27       0.50
4894                 57.0                 168.0  0.99490  3.15       0.46
4895                 30.0                 111.0  0.99254  2.99       0.46
4896                 20.0                 110.0  0.98869  3.34       0.38
4897                 22.0                  98.0  0.98941  3.26       0.32

      alcohol  quality
4893     11.2        6
4894      9.6        5
4895      9.4        6
4896     12.8        7
4897     11.8        6

In [5]: df.describe()

Out[5]:
       fixed acidity  volatile acidity  citric acid  residual sugar    chlorides  \
count    4898.000000       4898.000000  4898.000000     4898.000000  4898.000000
mean        6.854788          0.278241     0.334192        6.391415     0.045772
std         0.843868          0.100795     0.121020        5.072058     0.021848
min         3.800000          0.080000     0.000000        0.600000     0.009000
25%         6.300000          0.210000     0.270000        1.700000     0.036000
50%         6.800000          0.260000     0.320000        5.200000     0.043000
75%         7.300000          0.320000     0.390000        9.900000     0.050000
max        14.200000          1.100000     1.660000       65.800000     0.346000

       free sulfur dioxide  total sulfur dioxide      density           pH  sulphates
count          4898.000000           4898.000000  4898.000000  4898.000000  4898.0000
mean             35.308085            138.360657     0.994027     3.188267     0.4898
std              17.007137             42.498065     0.002991     0.151001     0.1141
min               2.000000              9.000000     0.987110     2.720000     0.2200
25%              23.000000            108.000000     0.991723     3.090000     0.4100
50%              34.000000            134.000000     0.993740     3.180000     0.4700
75%              46.000000            167.000000     0.996100     3.280000     0.5500
max             289.000000            440.000000     1.038980     3.820000     1.0800

(the alcohol and quality columns are cut off in the printout)


In [6]: df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4898 entries, 0 to 4897
Data columns (total 12 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 fixed acidity 4898 non-null float64
1 volatile acidity 4898 non-null float64
2 citric acid 4898 non-null float64
3 residual sugar 4898 non-null float64
4 chlorides 4898 non-null float64
5 free sulfur dioxide 4898 non-null float64
6 total sulfur dioxide 4898 non-null float64
7 density 4898 non-null float64
8 pH 4898 non-null float64
9 sulphates 4898 non-null float64
10 alcohol 4898 non-null float64
11 quality 4898 non-null int64
dtypes: float64(11), int64(1)
memory usage: 459.3 KB

In [7]: df_features = df.drop(columns='quality')
        df_label = df['quality']
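Before splitting, it is worth looking at how the quality label is distributed; in this dataset the bulk of wines score 5 to 7, so plain accuracy can hide poor performance on the rare extreme classes. A quick check (an aside, not part of the original run):

    # Count how many wines fall in each quality class.
    print(df_label.value_counts().sort_index())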

In [8]: from sklearn.model_selection import train_test_split

        X_train, X_test, y_train, y_test = train_test_split(df_features, df_label, test_size=0.20)
        print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)

        title = "bagging"
        display(Markdown(f"# {title}"))

(3918, 11) (3918,) (980, 11) (980,)
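No random_state is passed to train_test_split, so each re-run draws a different split and the accuracies below will vary slightly. A reproducible, stratified alternative (a sketch; random_state=42 is an arbitrary illustrative choice):

    # Fix the split and preserve the class proportions of the imbalanced label.
    X_train, X_test, y_train, y_test = train_test_split(
        df_features, df_label, test_size=0.20, random_state=42, stratify=df_label)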

bagging


In [9]: from sklearn.ensemble import BaggingClassifier

        baggingClassifier = BaggingClassifier(n_estimators=100)
        baggingClassifier.fit(X_train, y_train)

Out[9]: BaggingClassifier(n_estimators=100)

In [10]: baggingClassifier.score(X_test, y_test)

Out[10]: 0.6622448979591836
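Because each tree in the bag is trained on a bootstrap sample, roughly a third of the training rows are left out of every tree and can serve as a free validation set. A minimal sketch using scikit-learn's oob_score option (not run in the original notebook):

    # oob_score=True scores each training row with the trees that never saw it,
    # giving an accuracy estimate without touching the test set.
    baggingOob = BaggingClassifier(n_estimators=100, oob_score=True)
    baggingOob.fit(X_train, y_train)
    print(baggingOob.oob_score_)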


In [11]: from sklearn.ensemble import VotingClassifier
         from sklearn import tree
         from sklearn.naive_bayes import GaussianNB
         from sklearn.linear_model import SGDClassifier
         from sklearn.linear_model import LogisticRegression

         decisionTree = tree.DecisionTreeClassifier()
         logisticRegression = LogisticRegression()
         SGD = SGDClassifier()
         GNB = GaussianNB()

         votingClassifier = VotingClassifier(estimators=[
             ('tree', decisionTree), ('lr', logisticRegression), ('sgd', SGD), ('gnb', GNB)],
             weights=[2, 1, 1, 0.5])
         votingClassifier.fit(X_train, y_train)

C:\Users\PC\anaconda3\lib\site-packages\sklearn\linear_model\_logistic.py:460: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(

Out[11]: VotingClassifier(estimators=[('tree', DecisionTreeClassifier()),
                                      ('lr', LogisticRegression()),
                                      ('sgd', SGDClassifier()),
                                      ('gnb', GaussianNB())],
                          weights=[2, 1, 1, 0.5])

In [12]: votingClassifier.score(X_test, y_test)

Out[12]: 0.5469387755102041
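Hard voting only counts predicted labels; soft voting averages class probabilities and often does better when the base models expose predict_proba. A hedged variant (untested here): SGDClassifier's default hinge loss has no predict_proba, so it is dropped from this version.

    # Soft voting over the three probabilistic base learners.
    softVoting = VotingClassifier(
        estimators=[('tree', decisionTree), ('lr', logisticRegression), ('gnb', GNB)],
        voting='soft', weights=[2, 1, 0.5])
    softVoting.fit(X_train, y_train)
    softVoting.score(X_test, y_test)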


In [13]: from sklearn.ensemble import RandomForestClassifier

         randomForest = RandomForestClassifier(n_estimators=100)
         randomForest.fit(X_train, y_train)

Out[13]: RandomForestClassifier()

In [14]: # note: this accuracy is never displayed -- only a cell's last
         # expression is echoed, and display(Markdown(...)) comes last here;
         # wrap the call in print() to actually see the score
         randomForest.score(X_test, y_test)

         title = "Boosting"
         display(Markdown(f"# {title}"))

Boosting
In [15]: from sklearn.ensemble import AdaBoostClassifier

         adaBoost = AdaBoostClassifier(n_estimators=100)
         adaBoost.fit(X_train, y_train)

Out[15]: AdaBoostClassifier(n_estimators=100)

In [16]: adaBoost.score(X_test, y_test)

Out[16]: 0.4142857142857143
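AdaBoost's default base learner is a depth-1 decision stump, which is very weak for an 11-feature, multi-class problem; that likely explains the low score. A tuning sketch (illustrative values, not validated here; in scikit-learn versions before 1.2 the parameter is called base_estimator rather than estimator):

    # Boost deeper trees with a smaller learning rate.
    from sklearn.tree import DecisionTreeClassifier
    adaBoostTuned = AdaBoostClassifier(
        estimator=DecisionTreeClassifier(max_depth=3),  # base_estimator in sklearn < 1.2
        n_estimators=200, learning_rate=0.5)
    adaBoostTuned.fit(X_train, y_train)
    adaBoostTuned.score(X_test, y_test)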

In [17]: from sklearn.ensemble import GradientBoostingClassifier

         gradientBoosting = GradientBoostingClassifier(n_estimators=100, learning_rate=1.0,
                                                       max_depth=1, random_state=0)
         gradientBoosting.fit(X_train, y_train)

Out[17]: GradientBoostingClassifier(learning_rate=1.0, max_depth=1, random_state=0)


In [18]: gradientBoosting.score(X_test, y_test)

Out[18]: 0.5295918367346939
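learning_rate=1.0 combined with depth-1 stumps is an aggressive setting; gradient boosting is usually run with a much smaller learning rate and more, or deeper, trees. A gentler configuration to try (hypothetical values, not validated here):

    # Shrink each tree's contribution and compensate with more, deeper trees.
    gradientBoostingSlow = GradientBoostingClassifier(
        n_estimators=300, learning_rate=0.1, max_depth=3, random_state=0)
    gradientBoostingSlow.fit(X_train, y_train)
    gradientBoostingSlow.score(X_test, y_test)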

In [19]: from sklearn.ensemble import HistGradientBoostingClassifier

         histGradientBoostingClassifier = HistGradientBoostingClassifier(max_iter=100)
         histGradientBoostingClassifier.fit(X_train, y_train)

Out[19]: HistGradientBoostingClassifier()

In [20]: histGradientBoostingClassifier.score(X_test, y_test)

Out[20]: 0.6602040816326531
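With all the ensembles above still in memory, their test accuracies can be gathered in one place. A small recap sketch (assumes the cells above were run in order):

    # Re-score every fitted ensemble on the same held-out test set.
    models = {
        'bagging': baggingClassifier,
        'voting': votingClassifier,
        'random forest': randomForest,
        'AdaBoost': adaBoost,
        'gradient boosting': gradientBoosting,
        'hist gradient boosting': histGradientBoostingClassifier,
    }
    for name, model in models.items():
        print(f'{name}: {model.score(X_test, y_test):.3f}')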

In [21]: title = "Stacking"
         display(Markdown(f"# {title}"))

Stacking


In [22]: from sklearn.ensemble import StackingClassifier
         from sklearn import tree
         from sklearn.naive_bayes import GaussianNB
         from sklearn.linear_model import SGDClassifier
         from sklearn.linear_model import LogisticRegression

         decisionTree = tree.DecisionTreeClassifier()
         logisticRegression = LogisticRegression()
         SGD = SGDClassifier()
         GNB = GaussianNB()
         finalClassifier = LogisticRegression()

         stackingClassifier = StackingClassifier(estimators=[
             ('tree', decisionTree), ('lr', logisticRegression), ('sgd', SGD), ('gnb', GNB)],
             final_estimator=finalClassifier)
         stackingClassifier.fit(X_train, y_train)


C:\Users\PC\anaconda3\lib\site-packages\sklearn\linear_model\_logistic.py:460: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(

(this warning is repeated seven times in the printout: evidently the five cross-validation
fits of the logistic-regression base learner, its refit on the full training set, and the
fit of the logistic-regression final estimator)

Out[22]: StackingClassifier(estimators=[('tree', DecisionTreeClassifier()),
                                        ('lr', LogisticRegression()),
                                        ('sgd', SGDClassifier()),
                                        ('gnb', GaussianNB())],
                            final_estimator=LogisticRegression())

In [23]: stackingClassifier.score(X_test, y_test)

Out[23]: 0.3020408163265306
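The repeated ConvergenceWarnings and the very low stacking score both point at the unscaled inputs hurting the linear base learners. A hedged fix to try (a sketch, results not validated here): wrap each linear model in a pipeline with StandardScaler and raise max_iter.

    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    # Scale inputs for the linear learners; trees and naive Bayes do not need it.
    scaledStacking = StackingClassifier(
        estimators=[
            ('tree', tree.DecisionTreeClassifier()),
            ('lr', make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
            ('sgd', make_pipeline(StandardScaler(), SGDClassifier())),
            ('gnb', GaussianNB()),
        ],
        final_estimator=LogisticRegression(max_iter=1000))
    scaledStacking.fit(X_train, y_train)
    scaledStacking.score(X_test, y_test)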

In [ ]:
