
DR. VITHALRAO VIKHE PATIL COLLEGE OF ENGINEERING, AHMEDNAGAR


Department of Mechanical Engineering
TE Mechanical Subject-302049: Artificial Intelligence & Machine Learning

Practical No. 03

Aim: To extract features from a given dataset and establish the training data.

Objectives:

1. To load the dataset in a Python IDE.
2. To extract the features and the target from the given dataset.
3. To separate the training and testing data.

Package Used:- Python 3

Problem Definition:-
Diabetes is a disease that occurs when blood glucose, also called blood sugar, is too
high. We work with the Pima Indians Diabetes Dataset (PIDD). PIDD consists of several medical
parameters and one dependent (outcome) parameter with binary values. The dataset covers
female patients and is described as follows:
9 columns, with 8 independent parameters and one outcome parameter, and 768 uniquely identified
observations, of which 268 are positive for diabetes (1) and 500 are negative for diabetes (0).
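
The class counts quoted here can be checked once the labels are loaded. The following is a minimal sketch, assuming the labels are available as a sequence of 0/1 values. The helper name `class_balance` is hypothetical, and the sample labels are only the 15 Label values from the table in this handout, not the full 768 observations.

```python
import pandas as pd


def class_balance(labels):
    """Return (counts, fractions) per class for a sequence of 0/1 labels."""
    series = pd.Series(labels)
    counts = series.value_counts().sort_index()
    fractions = counts / len(series)
    return counts, fractions


# Label column of the 15 sample rows listed in this handout (illustration only).
sample_labels = [1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1]
counts, fractions = class_balance(sample_labels)
print(counts)      # number of rows per class
print(fractions)   # share of rows per class

# For the full dataset, the stated counts give these shares:
full_positive_share = 268 / 768   # about 0.349
full_negative_share = 500 / 768   # about 0.651
```

Applied to the full Label column, the same helper should report 500 rows for class 0 and 268 rows for class 1 if the file matches the description.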

Target Variable:
Label is the target variable.

Input data
1. The dataset is given in the form of a .csv (comma separated values) file.

Description of variables in the above file


Serial No.  Pregnant  Glucose  Bp  Skin  Insulin  Bmi   Pedigree  Age  Label
1           6         148      72  35    0        33.6  0.627     50   1
2           1         85       66  29    0        26.6  0.351     31   0
3           8         183      64  0     0        23.3  0.672     32   1
4           1         89       66  23    94       28.1  0.167     21   0
5           0         137      40  35    168      43.1  2.288     33   1
6           5         116      74  0     0        25.6  0.201     30   0
7           3         78       50  32    88       31    0.248     26   1
8           10        115      0   0     0        35.3  0.134     29   0
9           2         197      70  45    543      30.5  0.158     53   1
10          8         125      96  0     0        0     0.232     54   1
11          4         110      92  0     0        37.6  0.191     30   0
12          10        168      74  0     0        38    0.537     34   1
13          10        139      80  0     0        27.1  1.441     57   0
14          1         189      60  23    846      30.1  0.398     59   1
15          5         166      72  19    175      25.8  0.587     51   1
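
To try the loading and feature-extraction steps without the file on disk, the first three rows of the table above can be written as CSV text and read back. This is only a sketch: it assumes a header row with the column names shown in the table and no Serial No. column, which may not match the actual file.

```python
import io

import pandas as pd

# First three rows of the table, written as comma separated values.
# Assumption: a header row with these names and no "Serial No." column.
csv_text = """Pregnant,Glucose,Bp,Skin,Insulin,Bmi,Pedigree,Age,Label
6,148,72,35,0,33.6,0.627,50,1
1,85,66,29,0,26.6,0.351,31,0
8,183,64,0,0,23.3,0.672,32,1
"""

sample = pd.read_csv(io.StringIO(csv_text))

# Features: every column except the target. Target: the "Label" column.
X_sample = sample.drop(columns="Label")
y_sample = sample["Label"]

print(X_sample.shape)      # (3, 8)
print(y_sample.tolist())   # [1, 0, 1]
```

Selecting columns by name rather than by position avoids mistakes if the column order in the file differs from what is expected.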

Program:-

# Load libraries
import pandas as pd
from sklearn.model_selection import train_test_split  # function that splits data into training and test sets

# Load dataset
data = pd.read_csv(r"E:\pythonProject_experiment\pima-indians-diabetes.csv")
print(data)

# Split dataset into features and target variable.
# The label is the last column; the features are the eight columns just before it.
# Negative indices keep this correct whether or not the file has a leading serial-number column.
X = data.iloc[:, -9:-1]
Y = data.iloc[:, -1]

# Split dataset into training set and test set (70% training and 30% test)
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.3, random_state=1)

print("The training data for X is \n\n",X_train)

print("The training data for Y is \n\n",y_train)
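
To see what `test_size=0.3` and `random_state=1` do in practice, the sketch below repeats the split on synthetic stand-in data with the same number of rows and the same class counts as PIDD (768 rows, 268 positive). The feature values are placeholders, not the real measurements, and the variable names are chosen only for this illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for PIDD: 768 rows, 8 placeholder features,
# 500 negative (0) and 268 positive (1) labels.
X_demo = np.arange(768 * 8).reshape(768, 8)
y_demo = np.array([0] * 500 + [1] * 268)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=1
)
print(len(X_tr), len(X_te))  # 537 231

# The same random_state reproduces the same split on every run.
X_tr2, X_te2, y_tr2, y_te2 = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=1
)
print(np.array_equal(X_te, X_te2))  # True
```

Since 30% of 768 is 230.4, the test set is rounded up to 231 rows and the remaining 537 rows form the training set.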

Conclusion:-

• In the above case, the diabetes dataset is divided into training and testing data.

• To separate the dataset, the "sklearn" library is used, and its "train_test_split" function
is called.
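
Because the two classes are not equally frequent (268 positive against 500 negative), a purely random split can shift the class proportions slightly between the training and test sets. One optional variant, not used in the program of this practical, is the `stratify` argument of `train_test_split`. The sketch below uses synthetic labels with the PIDD class counts as a stand-in for the real data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in: placeholder features, 500 negative and 268 positive labels.
X_demo = np.zeros((768, 8))
y_demo = np.array([0] * 500 + [1] * 268)

# stratify=y_demo asks for the same class proportions in both subsets.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=1, stratify=y_demo
)

# Share of positive labels overall, in the training set, and in the test set.
print(round(y_demo.mean(), 3))
print(round(y_tr.mean(), 3))
print(round(y_te.mean(), 3))
```

With stratification, the three printed shares should be close to one another.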

Artificial Intelligence & Machine Learning Mechanical Engineering Department
