
Date: 20 – 11 – 2023 Muhammad Faizan khan (7th Semester)

Day: Monday

Machine Learning (LAB)

Task
Download a data set from Kaggle.com, import it in a Jupyter Notebook using
libraries such as pandas, Matplotlib, NumPy, and scikit-learn, and apply a
Bagging model using Random Forest.

Observation:

Data set name: healthcare_dataset.csv


Feature (X): Age
Target (y): Medical Condition
Approach: Bagging
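
To make the approach concrete: bagging (bootstrap aggregating) trains each base model on a bootstrap sample, that is, rows drawn with replacement from the training data, and combines the models' predictions by majority vote. The following is a minimal sketch of that idea on toy data. The arrays and the helper name bagged_predict are hypothetical and are not taken from the lab notebook; scikit-learn's BaggingClassifier does this work internally.

```python
from collections import Counter

import numpy as np
from sklearn.tree import DecisionTreeClassifier


def bagged_predict(X_train, y_train, X_new, n_estimators=10, seed=42):
    """Hypothetical helper: hand-rolled bagging with decision trees."""
    rng = np.random.default_rng(seed)
    n_rows = len(X_train)
    all_votes = []
    for _ in range(n_estimators):
        # Bootstrap sample: draw n_rows row indices with replacement.
        idx = rng.integers(0, n_rows, size=n_rows)
        tree = DecisionTreeClassifier(random_state=0)
        tree.fit(X_train[idx], y_train[idx])
        all_votes.append(tree.predict(X_new))
    # Majority vote across the trees for each new row.
    votes_per_row = np.array(all_votes).T
    return np.array([Counter(row).most_common(1)[0][0] for row in votes_per_row])


# Toy data (made up for illustration): one feature, two classes.
X_toy = np.array([[20], [25], [30], [60], [65], [70]])
y_toy = np.array(["young", "young", "young", "old", "old", "old"])
predictions = bagged_predict(X_toy, y_toy, np.array([[22], [68]]))
print(predictions)
```

Because each tree sees a slightly different sample, the averaged vote tends to be more stable than any single tree.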

Code:
Output:
Code Explanation:

1. df = pd.read_csv('Data/health_dataset.csv')
This line reads your medical dataset from a CSV file into a Pandas DataFrame
named df.
2. subset = df[(df['Gender'] == 'Male') & (df['Age'] > 35)]
Here, a subset of the DataFrame is created, containing only rows where 'Gender' is
'Male' and 'Age' is greater than 35.
3. if len(subset) == 0:
       print("No data found for male patients with ages greater than 35.")
   else:
       X = subset[['Age']]  # You can include other features as needed
       y = subset['Medical Condition']
This block checks whether the subset is empty. If it is, a message is printed;
otherwise the subset is separated into features (X) and the target variable (y).
X contains the 'Age' column, and y contains the 'Medical Condition' column.
4. X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
random_state=42)
Here, the data is split into training and testing sets using the train_test_split
function. 80% of the data is used for training (X_train and y_train), and 20% for
testing (X_test and y_test).
5. base_classifier = DecisionTreeClassifier()
This line creates an instance of the Decision Tree classifier, which will serve as the
base classifier for the Bagging algorithm.
6. bagging_classifier = BaggingClassifier(base_classifier, n_estimators=10,
random_state=42)
Here, a Bagging Classifier is created using the Decision Tree as the base classifier.
n_estimators specifies the number of base classifiers to train, and random_state
ensures reproducibility.
7. bagging_classifier.fit(X_train, y_train)
The Bagging Classifier is trained on the training data.
8. y_pred = bagging_classifier.predict(X_test)
Predictions are made on the testing data using the trained Bagging Classifier.
9. accuracy = accuracy_score(y_test, y_pred)
   print(f'Accuracy: {accuracy}')
The accuracy of the model is calculated and printed. You can use other metrics
depending on your specific requirements.
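10. diabetes_count = sum(y_test == 'Diabetes')
    print(f"Number of male patients with ages greater than 35 who have diabetes: {diabetes_count}")
Finally, the number of male patients with ages greater than 35 who have diabetes is
calculated and printed. Note that y_test holds the true labels of the 20% test
split, so this counts actual diabetes cases in the test set only, not predictions.
Use y_pred in place of y_test to count predicted cases, or y to count over the
whole subset. Adjust the value compared in the sum function based on the actual
representation of diabetes in your dataset.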
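
Putting the ten steps together, the walkthrough can be sketched as a single script. The original CSV is not reproduced here, so this sketch builds a small synthetic DataFrame with the same three column names (Gender, Age, Medical Condition). The random values, the make_synthetic_patients helper, and the RandomForestClassifier comparison at the end are assumptions added for illustration and are not part of the original notebook.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def make_synthetic_patients(n_rows=400, seed=0):
    """Hypothetical stand-in for the CSV: random values, same column names."""
    rng = np.random.default_rng(seed)
    return pd.DataFrame({
        "Gender": rng.choice(["Male", "Female"], size=n_rows),
        "Age": rng.integers(18, 90, size=n_rows),
        "Medical Condition": rng.choice(
            ["Diabetes", "Asthma", "Hypertension"], size=n_rows
        ),
    })


df = make_synthetic_patients()

# Steps 2-3: keep male patients older than 35, then pick feature and target.
subset = df[(df["Gender"] == "Male") & (df["Age"] > 35)]
X = subset[["Age"]]
y = subset["Medical Condition"]

# Step 4: 80% of the rows for training, 20% for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Steps 5-7: bagging over ten decision trees.
bagging_classifier = BaggingClassifier(
    DecisionTreeClassifier(), n_estimators=10, random_state=42
)
bagging_classifier.fit(X_train, y_train)

# Steps 8-9: predict on the held-out rows and score.
y_pred = bagging_classifier.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Bagging accuracy: {accuracy}")

# Assumption: a random forest (bagged trees plus random feature selection)
# as one reading of "Bagging Model Using Random Forest" in the task.
forest = RandomForestClassifier(n_estimators=10, random_state=42)
forest.fit(X_train, y_train)
forest_accuracy = accuracy_score(y_test, forest.predict(X_test))
print(f"Random forest accuracy: {forest_accuracy}")

# Step 10: true vs. predicted diabetes cases in the test split.
actual_diabetes = int((y_test == "Diabetes").sum())
predicted_diabetes = int((y_pred == "Diabetes").sum())
print(f"Actual: {actual_diabetes}, predicted: {predicted_diabetes}")
```

Since the synthetic labels are random, the accuracy here carries no meaning; the point is only the shape of the pipeline. With a single feature, the random forest has no feature subsets to choose from, so it behaves much like plain bagging of trees.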
