
Fundamentals of Machine Learning

Hakim Hafidi & Youness Moukafih

Lab 1: Introduction to Python Libraries - Pandas, NumPy, and Matplotlib

Objective:
This lab aims to introduce you to three fundamental Python libraries, Pandas, NumPy,
and Matplotlib, used in data analysis. By the end of this lab, you should be able to load a
dataset, perform basic operations, and create visualizations to understand the
relationships between different variables in the dataset.

Prerequisites:
• Basic knowledge of the Python programming language.
• Anaconda installed on your computer.

Step 1: Installing Anaconda


If you haven't installed Anaconda yet, please follow the instructions below:
1. Download Anaconda from Anaconda Individual Edition.
2. Follow the installation instructions for your operating system: Anaconda
Installation Guide.

Step 2: Setting Up Jupyter Notebook


1. Open Anaconda Navigator.
2. Launch Jupyter Notebook.
3. Create a new Python notebook.


Introduction to Pandas, NumPy, and Matplotlib


Pandas
Pandas is a powerful library for data analysis and manipulation.
# Importing Pandas Library
import pandas as pd

NumPy
NumPy provides large, multi-dimensional arrays and matrices, along with mathematical
functions that operate on them.
# Importing NumPy Library
import numpy as np
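
As a minimal sketch of what "arrays and mathematical functions" means in practice, the snippet below builds a small 2-D array and applies a few operations to it. The values and the name `matrix` are arbitrary and chosen only for illustration.

```python
import numpy as np

# A 2-D array (matrix) with 2 rows and 3 columns; the values are arbitrary
matrix = np.array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0]])

print(matrix.shape)         # (number of rows, number of columns)
print(matrix * 2)           # element-wise multiplication, no explicit loop
print(matrix.mean(axis=0))  # mean of each column
print(np.sqrt(matrix))      # mathematical function applied to every element
```

Operations such as `matrix * 2` act on every element at once, which is why NumPy code rarely needs explicit Python loops.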

Matplotlib
Matplotlib is a plotting library for creating static, animated, and interactive visualizations
in Python.
# Importing Matplotlib Library
import matplotlib.pyplot as plt

Lab Tasks:
Task 1: Load a Dataset
Load the 'Iris' dataset from the UCI Machine Learning Repository.
# Loading Iris Dataset
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
column_names = ["sepal_length", "sepal_width", "petal_length", "petal_width",
"class"]
iris = pd.read_csv(url, names=column_names)
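
The Iris file has no header row, which is why the column names are passed through `names`. The sketch below shows the same call on a few illustrative rows held in memory, so it can be run without a network connection. The names `sample_text` and `sample` are assumptions introduced only for this example.

```python
import io
import pandas as pd

# Three illustrative rows in the same comma-separated, header-less layout
sample_text = (
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "7.0,3.2,4.7,1.4,Iris-versicolor\n"
    "6.3,3.3,6.0,2.5,Iris-virginica\n"
)
column_names = ["sepal_length", "sepal_width", "petal_length", "petal_width",
                "class"]

# io.StringIO lets read_csv treat the string like a file
sample = pd.read_csv(io.StringIO(sample_text), names=column_names)

print(sample.shape)   # (number of rows, number of columns)
print(sample.dtypes)  # four numeric columns and one text column
```

Without `names`, pandas would treat the first data row as the header, so that row would be lost from the data.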


Task 2: View the Dataset


View the first 5 rows of the dataset to understand the data.
# Viewing first 5 rows of Iris Dataset
iris.head()
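
Beyond `head()`, a few other calls can help when first inspecting a dataset. The following is a sketch on a small hand-made DataFrame (the name `sample` and its values are illustrative only); the same calls should work on the `iris` DataFrame.

```python
import pandas as pd

# Small illustrative DataFrame standing in for the full dataset
sample = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "sepal_width": [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    "class": ["Iris-setosa", "Iris-setosa",
              "Iris-versicolor", "Iris-versicolor",
              "Iris-virginica", "Iris-virginica"],
})

print(sample.head(3))   # first 3 rows instead of the default 5
print(sample.shape)     # (rows, columns)
print(sample.dtypes)    # data type of each column

# Number of rows per class label
class_counts = sample["class"].value_counts()
print(class_counts)
```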

Task 3: Basic Operations


Calculate the average, median, and standard deviation of the 'sepal_length' column.
# Calculating the average of 'sepal_length'
average_sepal_length = iris['sepal_length'].mean()
print(f"Average Sepal Length: {average_sepal_length}")

# Calculating the median of 'sepal_length'
median_sepal_length = iris['sepal_length'].median()
print(f"Median Sepal Length: {median_sepal_length}")

# Calculating the standard deviation of 'sepal_length'
std_dev_sepal_length = iris['sepal_length'].std()
print(f"Standard Deviation of Sepal Length: {std_dev_sepal_length}")
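
One detail worth knowing when comparing results across libraries: the pandas `std()` method computes the sample standard deviation (`ddof=1`) by default, while `np.std` computes the population standard deviation (`ddof=0`) by default. The sketch below illustrates the difference on a handful of illustrative values; the variable names are assumptions made for this example.

```python
import numpy as np
import pandas as pd

values = pd.Series([5.1, 4.9, 4.7, 4.6, 5.0])  # illustrative values only

mean_value = values.mean()
median_value = values.median()

sample_std = values.std()                              # pandas default: ddof=1
population_std = np.std(values.to_numpy())             # NumPy default: ddof=0
sample_std_numpy = np.std(values.to_numpy(), ddof=1)   # matches pandas

print(f"Mean: {mean_value}")
print(f"Median: {median_value}")
print(f"Sample std (pandas): {sample_std}")
print(f"Population std (NumPy default): {population_std}")
print(f"Sample std (NumPy, ddof=1): {sample_std_numpy}")
```

The two defaults give slightly different numbers on the same data, so it helps to state which one is being reported.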

Task 4: Data Visualization


Create scatter plots to visualize the relationships between 'sepal_length' and
'sepal_width', and between 'petal_length' and 'petal_width'.
# Creating Scatter Plot for 'sepal_length' and 'sepal_width'
plt.scatter(iris['sepal_length'], iris['sepal_width'])
plt.title('Sepal Length vs Sepal Width')
plt.xlabel('Sepal Length (cm)')
plt.ylabel('Sepal Width (cm)')
plt.show()

# Creating Scatter Plot for 'petal_length' and 'petal_width'
plt.scatter(iris['petal_length'], iris['petal_width'])
plt.title('Petal Length vs Petal Width')
plt.xlabel('Petal Length (cm)')
plt.ylabel('Petal Width (cm)')
plt.show()
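
As an optional variation that is not part of the original tasks, the points can be coloured by class to see how the species separate. The sketch below uses a small illustrative DataFrame named `sample`; with the lab data, the `iris` DataFrame could be used in its place.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Small illustrative sample; in the lab, use the full `iris` DataFrame instead
sample = pd.DataFrame({
    "petal_length": [1.4, 1.3, 4.7, 4.5, 6.0, 5.1],
    "petal_width": [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
    "class": ["Iris-setosa", "Iris-setosa",
              "Iris-versicolor", "Iris-versicolor",
              "Iris-virginica", "Iris-virginica"],
})

fig, ax = plt.subplots()

# One scatter call per class, so each class gets its own colour and legend entry
for class_name, group in sample.groupby("class"):
    ax.scatter(group["petal_length"], group["petal_width"], label=class_name)

ax.set_title("Petal Length vs Petal Width by Class")
ax.set_xlabel("Petal Length (cm)")
ax.set_ylabel("Petal Width (cm)")
ax.legend()
plt.show()
```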


Enhanced Visualization and Analysis Tasks:


Task 5: Correlation Matrix
Create a correlation matrix to understand the linear relationships between the numeric
variables in the dataset.
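# Creating Correlation Matrix
# numeric_only=True excludes the text 'class' column, which pandas 2.0 and later
# would otherwise fail on
correlation_matrix = iris.corr(numeric_only=True)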
print(correlation_matrix)
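
By default, `corr()` reports the Pearson correlation coefficient for each pair of columns. To make that number less of a black box, the sketch below computes it by hand for two illustrative arrays and compares the result with `np.corrcoef`. The arrays `x` and `y` are made-up values, not the full dataset.

```python
import numpy as np

# Illustrative values only
x = np.array([1.4, 1.3, 4.7, 4.5, 6.0, 5.1])
y = np.array([0.2, 0.2, 1.4, 1.5, 2.5, 1.9])

# Pearson r: sum of products of deviations, divided by the square root of
# the product of the summed squared deviations
x_centered = x - x.mean()
y_centered = y - y.mean()
r_manual = (x_centered * y_centered).sum() / np.sqrt(
    (x_centered ** 2).sum() * (y_centered ** 2).sum()
)

# Same quantity from NumPy; [0, 1] picks the x-versus-y entry of the 2x2 matrix
r_numpy = np.corrcoef(x, y)[0, 1]

print(f"Manual Pearson r: {r_manual}")
print(f"NumPy Pearson r:  {r_numpy}")
```

Values near 1 or -1 indicate a strong linear relationship, and values near 0 indicate a weak one.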

Task 6: Scatter Plot Matrix


Create a scatter plot matrix to visualize the relationships between all pairs of variables.
# Creating Scatter Plot Matrix
pd.plotting.scatter_matrix(iris, alpha=0.8, figsize=(10, 10), diagonal='hist')
plt.show()

Exercises:
1. Exercise 1: Analyze the correlation matrix and scatter plot matrix. Answer the
following questions:
a. Is there a relationship between 'sepal_length' and 'sepal_width'?
b. Is the relationship between 'petal_length' and 'petal_width' positive or negative?
c. Which pair of variables has the strongest relationship?
2. Exercise 2: Create a scatter plot for 'petal_length' and 'petal_width'. Based on
the plot, hypothesize whether there is any association between the two variables
and whether the association is positive or negative.
3. Exercise 3: Load another dataset of your choice and perform similar operations
and visualizations to understand the relationships between the variables. Answer
questions about the relationships between the variables based on the
visualizations.

Submission:
Submit the Jupyter notebook containing all the executed cells along with the outputs and
your answers to the exercise questions.
