LAB REPORT NO - 05
Student Details
Name ID
[For Teachers use only: Don’t Write Anything inside this box]
Lab Report Status
Marks: ………………………………… Signature:.....................
Comments:.............................................. Date:..............................
1. TITLE OF THE LAB EXPERIMENT
Implement ensemble methods (max voting, stacking, and bagging) using the same dataset.
2. OBJECTIVES/AIM
• To learn more about ensemble learning.
• To improve the predictive accuracy of machine learning models.
• To produce predictions that are more accurate and have fewer errors than those of individual models.
• To outperform single models in robustness and resistance to overfitting.
3. THEORY
An ensemble is a group of parts that is viewed as a whole rather than as individual parts. An
ensemble technique creates and combines multiple models to solve a problem, which improves the
robustness and generalization of the final model. In this report, we discuss a few such strategies
and how they are implemented in Python.
Bagging, stacking, and boosting are the three primary classes of ensemble learning techniques. It
is important to understand each technique in depth and to consider it when working on a
predictive modeling project.
4. IMPLEMENTATION
For Max Voting:
# Imports
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.metrics import accuracy_score
# Load the Dataset
data = pd.read_csv('/kaggle/input/diabetes-dataset/diabetes.csv')
X = data.drop('Outcome', axis=1)
y = data['Outcome']
# Split the Dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Implement Base Models
model1 = LogisticRegression()
model2 = DecisionTreeClassifier()
model3 = RandomForestClassifier()
# Implement Max Voting
max_voting_model = VotingClassifier(
    estimators=[('lr', model1), ('dt', model2), ('rf', model3)], voting='hard')
max_voting_model.fit(X_train, y_train)
max_voting_pred = max_voting_model.predict(X_test)
# Evaluate Performance
max_voting_accuracy = accuracy_score(y_test, max_voting_pred)
print("Max Voting Accuracy:", max_voting_accuracy)
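The model above uses hard voting, which counts the class label each base model predicts. A soft-voting variant (an addition here, not part of the original lab) averages the predicted class probabilities instead, which can help when the base models produce reliable probability estimates. A self-contained sketch, using a synthetic stand-in for the diabetes data so it runs without the Kaggle file path:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.metrics import accuracy_score

# Synthetic stand-in for the diabetes dataset (same shape: 768 rows, 8 features)
X, y = make_classification(n_samples=768, n_features=8, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# voting='soft' averages predict_proba outputs instead of counting hard votes
soft_voting_model = VotingClassifier(
    estimators=[('lr', LogisticRegression(max_iter=1000)),
                ('dt', DecisionTreeClassifier(random_state=42)),
                ('rf', RandomForestClassifier(random_state=42))],
    voting='soft')
soft_voting_model.fit(X_train, y_train)
soft_voting_accuracy = accuracy_score(y_test, soft_voting_model.predict(X_test))
print("Soft Voting Accuracy:", soft_voting_accuracy)
```

Soft voting requires every base model to implement predict_proba, which all three classifiers here do.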
For Stacking :
# Split the Dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Implement Base Models
model1 = LogisticRegression()
model2 = DecisionTreeClassifier()
model3 = RandomForestClassifier()
# Generate Base Model Predictions (out-of-fold, so the meta-model is never
# trained on predictions made for data the base models have already seen)
from sklearn.model_selection import cross_val_predict
pred1 = cross_val_predict(model1, X_train, y_train, cv=5)
pred2 = cross_val_predict(model2, X_train, y_train, cv=5)
pred3 = cross_val_predict(model3, X_train, y_train, cv=5)
# Implement Stacking
meta_model = LogisticRegression()  # example meta-model; another classifier also works
stacking_train_pred = pd.DataFrame({'Model 1': pred1, 'Model 2': pred2, 'Model 3': pred3})
# Refit the base models on the full training set before predicting the test set
model1.fit(X_train, y_train)
model2.fit(X_train, y_train)
model3.fit(X_train, y_train)
stacking_test_pred = pd.DataFrame({'Model 1': model1.predict(X_test),
                                   'Model 2': model2.predict(X_test),
                                   'Model 3': model3.predict(X_test)})
meta_model.fit(stacking_train_pred, y_train)
stacking_pred = meta_model.predict(stacking_test_pred)
# Evaluate Performance
stacking_accuracy = accuracy_score(y_test, stacking_pred)
print("Stacking Accuracy:", stacking_accuracy)
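scikit-learn also provides a StackingClassifier that performs the out-of-fold prediction and meta-model fitting internally, so the steps above collapse into a single estimator. A minimal sketch, again on a synthetic stand-in for the diabetes data so it is self-contained:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.metrics import accuracy_score

# Synthetic stand-in for the diabetes dataset
X, y = make_classification(n_samples=768, n_features=8, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# StackingClassifier trains the base models with cross-validation and fits the
# final_estimator on their out-of-fold predictions, avoiding leakage
stacking_model = StackingClassifier(
    estimators=[('lr', LogisticRegression(max_iter=1000)),
                ('dt', DecisionTreeClassifier(random_state=42)),
                ('rf', RandomForestClassifier(random_state=42))],
    final_estimator=LogisticRegression(), cv=5)
stacking_model.fit(X_train, y_train)
stacking_accuracy = accuracy_score(y_test, stacking_model.predict(X_test))
print("Stacking Accuracy:", stacking_accuracy)
```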
For Bagging:
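A minimal bagging sketch, assuming the same train/test split as above (a synthetic stand-in replaces the diabetes CSV here so the example is self-contained). Bagging fits many copies of a base model on bootstrap resamples of the training data and majority-votes their predictions:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import accuracy_score

# Synthetic stand-in for the diabetes dataset
X, y = make_classification(n_samples=768, n_features=8, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Implement Bagging: 100 decision trees, each trained on a bootstrap resample
bagging_model = BaggingClassifier(DecisionTreeClassifier(random_state=42),
                                  n_estimators=100, random_state=42)
bagging_model.fit(X_train, y_train)
bagging_pred = bagging_model.predict(X_test)
# Evaluate Performance
bagging_accuracy = accuracy_score(y_test, bagging_pred)
print("Bagging Accuracy:", bagging_accuracy)
```

With a decision tree as the base model, this is close in spirit to a random forest, except that bagging alone does not subsample features at each split.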
5. DISCUSSION AND CONCLUSION
This lab report covers, in general, how the nature of the problem, the properties of the dataset, and the
behavior of the base models influence the choice of ensemble learning method. Generally speaking, it is best
to try a few different approaches and assess each one's effectiveness before deciding on the best one. Any
machine learning endeavor aims to identify the model with the highest predictive accuracy for the desired
outcome. Rather than building a single model and hoping it yields the best, most accurate forecast possible,
ensemble techniques combine several models and aggregate their predictions into a single final model.