
Developing Curricula for Artificial Intelligence and Robotics (DeCAIR)

618535-EPP-1-2020-1-JO-EPPKA2-CBHE-JP

Course Title Applied Machine Learning

Experiment Number 5

Experiment Name Data Preparation and Regression

Objectives The students learn basic machine learning skills, including data preparation and
regression, using Python and Scikit-Learn.

Introduction This is an introductory experiment in machine learning. The student solves two exercises to
practice some basic skills in data preparation and solving a simple regression problem.

Materials Computer with Python integrated development environment (IDE) software installed
(PyCharm is recommended).
Dataset files: diabetes.features.csv and diabetes.labels.csv

Procedure Exercise 1: Basic Data Preparation


Complete the following Python code to split the DataFrame data into a 70% train set and a 30%
test set, then separate each set into two disjoint DataFrames: a features DataFrame (columns
x1, x2, and x3) and a response DataFrame (column y). Your program should print the shape of
the train set features.

import pandas as pd
from numpy.random import randn
from sklearn.model_selection import train_test_split

data = pd.DataFrame(randn(100, 4),
                    columns=['x1', 'x2', 'x3', 'y'])
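
One possible completion of Exercise 1 is sketched below. It is not the official solution. The variable names (train_set, X_train, and so on) and the random_state value are assumptions added for illustration and reproducibility.

```python
import pandas as pd
from numpy.random import randn
from sklearn.model_selection import train_test_split

data = pd.DataFrame(randn(100, 4),
                    columns=['x1', 'x2', 'x3', 'y'])

# Split the rows into 70% train and 30% test.
# random_state is an assumption; it only makes the split repeatable.
train_set, test_set = train_test_split(data, test_size=0.3, random_state=42)

# Separate each set into a features DataFrame and a response DataFrame.
feature_cols = ['x1', 'x2', 'x3']
X_train, y_train = train_set[feature_cols], train_set[['y']]
X_test, y_test = test_set[feature_cols], test_set[['y']]

print(X_train.shape)
```

Selecting the response with double brackets (train_set[['y']]) keeps it as a DataFrame, as the exercise asks; single brackets would return a Series instead.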

Exercise 2: Regression
The following Python code loads the features and labels of the Diabetes dataset. Complete
this code to evaluate the RMSE of the linear regressor. Normalize the entire feature set using
the standard scaler (StandardScaler), then train and evaluate the linear model using the
scaled features.

import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

X = pd.read_csv('diabetes.features.csv')
y = pd.read_csv('diabetes.labels.csv').to_numpy().flatten()
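
A minimal sketch of the steps described above follows. Because the course CSV files are not reproduced here, it assumes Scikit-Learn's built-in load_diabetes data as a stand-in, so the numbers may differ from those obtained with diabetes.features.csv and diabetes.labels.csv. It follows the exercise text literally and evaluates on the same data used for training.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import StandardScaler

# Stand-in data (assumption): replaces the course CSV files.
X, y = load_diabetes(return_X_y=True)

# Normalize the entire feature set to zero mean and unit variance.
X_scaled = StandardScaler().fit_transform(X)

# Train the linear model on the scaled features.
reg = LinearRegression()
reg.fit(X_scaled, y)

# RMSE is the square root of the mean squared error.
rmse = np.sqrt(mean_squared_error(y, reg.predict(X_scaled)))
print(f"RMSE: {rmse:.2f}")
```

With the course files, the two read_csv lines above would replace the load_diabetes call and the rest should carry over unchanged.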

Try to improve the results by replacing the linear regressor with the following SVM regressor,
which uses a polynomial kernel of degree 5 and a C parameter of 100.
from sklearn.svm import SVR
reg = SVR(kernel="poly", degree=5, C=100)
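
The comparison between the two regressors could be sketched as follows. The helper fit_and_score is hypothetical, and the built-in load_diabetes data is again an assumed stand-in for the course CSV files. Both scores are computed on the training data, as in the exercise, so a lower RMSE here does not by itself show better generalization.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def fit_and_score(reg, X, y):
    """Hypothetical helper: fit reg on (X, y) and return RMSE on the same data."""
    reg.fit(X, y)
    return np.sqrt(mean_squared_error(y, reg.predict(X)))


# Stand-in data (assumption): replaces the course CSV files.
X, y = load_diabetes(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)

rmse_linear = fit_and_score(LinearRegression(), X_scaled, y)
rmse_svr = fit_and_score(SVR(kernel="poly", degree=5, C=100), X_scaled, y)

print(f"Linear RMSE: {rmse_linear:.2f}")
print(f"SVR RMSE:    {rmse_svr:.2f}")
```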

The European Commission's support for the production of this publication does not constitute an endorsement of the contents, which reflect
the views only of the authors, and the Commission cannot be held responsible for any use which may be made of the information contained
therein.


Data Collection Capture the output of your code for the above two exercises.

Data Analysis None

Required Reporting Submit your code and the captured output.

Safety Considerations Standard safety precautions related to using a computer.


References 1. Applied Machine Learning presentation titled “End-to-End Machine Learning Project.”
2. Aurélien Géron, Hands-On Machine Learning with Scikit-Learn, Keras and TensorFlow,
O’Reilly, 3rd Edition, 2022.
