
Green University of Bangladesh

Department of Computer Science and Engineering (CSE)
Faculty of Sciences and Engineering
Semester: Fall, Year: 2023, B.Sc. in CSE

LAB REPORT NO - 05

Course Title: Data Mining LAB


Course Code: CSE 424, Section: D3

Lab Experiment Name: Ensemble Learning

Student Details
Name: Nazmul Hossen
ID: 201902140

Lab Date: 23-12-2023


Submission Date: 30-12-2023
Course Teacher's Name: Meherunnesa Tania

[For Teachers use only: Don’t Write Anything inside this box]
Lab Report Status
Marks: ………………………………… Signature:.....................
Comments:.............................................. Date:..............................
1. TITLE OF THE LAB EXPERIMENT
Implement several ensemble methods using the same dataset.

2. OBJECTIVES/AIM
• To learn more about ensemble learning.
• To increase machine learning models' capacity for accurate prediction.
• To produce predictions that are more accurate, with lower error, than those of individual models.
• To outperform single models in robustness and resistance to overfitting.

3. PROCEDURE / ANALYSIS / DESIGN

An ensemble is a group of parts that are seen as a whole rather than as individual parts. An
ensemble technique creates and combines many models to solve a problem. Ensemble techniques
improve the robustness and generalization of the model. In this study, we will discuss a few
strategies and how they are implemented in Python.
Bagging, stacking, and boosting are the three primary classes of ensemble learning techniques. It
is important to understand each technique in depth and to consider it when working on a
predictive modeling project.

Ensemble learning techniques can be divided into the following categories:


Basic Ensemble Techniques
• Max Voting
• Averaging
• Weighted Average
Advanced Ensemble Techniques
• Bagging
• Stacking
• Boosting
Algorithms based on Bagging and Boosting
• Random Forest
• AdaBoost

Working procedure of Max Voting:

First, train multiple models independently on the same dataset. During prediction, each model
makes its own forecast. The class label that receives the most votes across the models is chosen
as the final prediction, as sketched below.
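A minimal sketch of the voting step itself, using hypothetical predictions from three models (the
values are illustrative, not taken from the dataset used in Section 4):

from collections import Counter

preds_model1 = [1, 0, 1, 1]  # hypothetical predictions
preds_model2 = [0, 0, 1, 1]
preds_model3 = [1, 0, 0, 1]
# For each sample, the label with the most votes wins
final_preds = [Counter(votes).most_common(1)[0][0]
               for votes in zip(preds_model1, preds_model2, preds_model3)]
print(final_preds)  # [1, 0, 1, 1]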

Working procedure of Weighted Average:

Train multiple individual models on the same dataset. During prediction, each model makes its
own prediction. Assign a weight to each model's prediction based on its performance or
reliability, then compute the weighted average of the predictions to obtain the final
prediction, as sketched below.
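A minimal sketch of the weighted-average step, assuming two hypothetical models output class-1
probabilities for three samples and the first model is trusted more:

import numpy as np

proba1 = np.array([0.9, 0.4, 0.2])       # hypothetical probabilities from model 1
proba2 = np.array([0.6, 0.7, 0.1])       # hypothetical probabilities from model 2
weighted = 0.7 * proba1 + 0.3 * proba2   # weights sum to 1
final = (weighted >= 0.5).astype(int)    # threshold into class labels
print(weighted)  # [0.81 0.49 0.17]
print(final)     # [1 0 0]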
Working procedure of Bagging (Bootstrap Aggregating):

Create several bootstrap samples by drawing observations from the original dataset at random
with replacement, then train an independent model on each bootstrap sample. During prediction,
every model makes its own forecast. For classification, the final prediction is found by
majority vote; for regression, it is the average of all the models' forecasts. The
bootstrap-sampling step is sketched below.
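A minimal illustration of the bootstrap step, assuming a hypothetical dataset of 10 rows:

import numpy as np

rng = np.random.default_rng(42)
indices = np.arange(10)                                 # indices of the original rows
bootstrap_sample = rng.choice(indices, size=10, replace=True)
print(bootstrap_sample)  # some indices repeat, others are left out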

Working procedure of Stacking:

Use the same dataset to train a variety of different models, known as base models. Then
construct a meta-model (sometimes called a blender or aggregator) that takes the base models'
predictions as input. Train the meta-model on those predictions so that it learns how to
combine them into the final prediction.

4. IMPLEMENTATION
For Max Voting:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, VotingClassifier, BaggingClassifier
from sklearn.metrics import accuracy_score

# Load the Dataset
data = pd.read_csv('/kaggle/input/diabetes-dataset/diabetes.csv')
X = data.drop('Outcome', axis=1)
y = data['Outcome']
# Split the Dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Implement Base Models
model1 = LogisticRegression(max_iter=1000)  # higher max_iter avoids convergence warnings
model2 = DecisionTreeClassifier()
model3 = RandomForestClassifier()
# Implement Max Voting: each model votes, the majority label wins
max_voting_model = VotingClassifier(
    estimators=[('lr', model1), ('dt', model2), ('rf', model3)],
    voting='hard')
max_voting_model.fit(X_train, y_train)
max_voting_pred = max_voting_model.predict(X_test)
# Evaluate Performance
max_voting_accuracy = accuracy_score(y_test, max_voting_pred)
print("Max Voting Accuracy:", max_voting_accuracy)

For Weighted Average:

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train multiple base models with different hyperparameters
model1 = RandomForestClassifier(n_estimators=100, random_state=42)
model2 = RandomForestClassifier(n_estimators=50, max_depth=5, random_state=42)
model1.fit(X_train, y_train)
model2.fit(X_train, y_train)
# Evaluate individual model performances
acc1 = accuracy_score(y_test, model1.predict(X_test))
acc2 = accuracy_score(y_test, model2.predict(X_test))
# Use class-1 probabilities so the weighted average is meaningful
proba1 = model1.predict_proba(X_test)[:, 1]
proba2 = model2.predict_proba(X_test)[:, 1]
# Assign weights to the models (summing to 1)
weight1, weight2 = 0.7, 0.3
# Weighted average of the probabilities, thresholded into 0/1 labels
ensemble_proba = weight1 * proba1 + weight2 * proba2
ensemble_pred = (ensemble_proba >= 0.5).astype(int)
# Evaluate the ensemble model performance
ensemble_acc = accuracy_score(y_test, ensemble_pred)
# Display the results
print(f'Model 1 Accuracy: {acc1:.4f}')
print(f'Model 2 Accuracy: {acc2:.4f}')
print(f'Weighted Average Accuracy: {ensemble_acc:.4f}')
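Averaging, listed under the basic techniques in Section 3 but not implemented separately above,
is simply the equal-weight special case. A minimal sketch reusing proba1 and proba2 from the
weighted-average code:

# Simple averaging: equal weights for both models
avg_proba = (proba1 + proba2) / 2
avg_pred = (avg_proba >= 0.5).astype(int)
print(f'Averaging Accuracy: {accuracy_score(y_test, avg_pred):.4f}')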

For Stacking:

# Split the Dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Implement Base Models
model1 = LogisticRegression(max_iter=1000)
model2 = DecisionTreeClassifier()
model3 = RandomForestClassifier()
# Train the base models and generate their predictions
model1.fit(X_train, y_train)
model2.fit(X_train, y_train)
model3.fit(X_train, y_train)
# Meta-features: base-model predictions on the training set (the meta-model
# must be trained on training data, not on the test set)
stacking_train_pred = pd.DataFrame({'Model 1': model1.predict(X_train),
                                    'Model 2': model2.predict(X_train),
                                    'Model 3': model3.predict(X_train)})
stacking_test_pred = pd.DataFrame({'Model 1': model1.predict(X_test),
                                   'Model 2': model2.predict(X_test),
                                   'Model 3': model3.predict(X_test)})
# Implement Stacking: the meta-model learns to combine the base predictions
meta_model = LogisticRegression()  # example meta-model; another classifier could be used
meta_model.fit(stacking_train_pred, y_train)
stacking_pred = meta_model.predict(stacking_test_pred)
# Evaluate Performance
stacking_accuracy = accuracy_score(y_test, stacking_pred)
print("Stacking Accuracy:", stacking_accuracy)

For Bagging:

# Split the Dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Implement Base Model
base_model = DecisionTreeClassifier()
# Implement Bagging: 10 trees, each trained on a bootstrap sample
bagging_model = BaggingClassifier(base_model, n_estimators=10, random_state=42)
bagging_model.fit(X_train, y_train)
bagging_pred = bagging_model.predict(X_test)
# Evaluate Performance
bagging_accuracy = accuracy_score(y_test, bagging_pred)
print("Bagging Accuracy:", bagging_accuracy)

5. TEST RESULT / OUTPUT

Fig-1: Loading Datasets

Fig-2: Accuracy for Max Voting

Fig-3: Accuracy for Weighted Average


Fig-4: Accuracy for Stacking

Fig-5: Accuracy for Bagging

6. ANALYSIS AND DISCUSSION

This lab report covers, in general, how the nature of the problem, the properties of the dataset, and the
behavior of the base models influence the choice of ensemble learning method. Generally speaking, it is
best to try a few different approaches and assess each one's effectiveness before deciding on the best one.
Any machine learning endeavor aims to identify the model with the highest predictive accuracy for the
desired outcome. Rather than creating a single model and hoping it yields the most accurate forecast
possible, ensemble techniques combine several models and aggregate their outputs into a single final model.
