FML Lab 1

Fundamentals of Machine Learning
Hakim Hafidi & Youness Moukafih
Lab 1: Introduction to Python Libraries - Pandas, NumPy, and Matplotlib
Objective:
This lab aims to introduce you to three fundamental Python libraries, Pandas, NumPy,
and Matplotlib, used in data analysis. By the end of this lab, you should be able to load a
dataset, perform basic operations, and create visualizations to understand the
relationships between different variables in the dataset.
Prerequisites:
• Basic knowledge of the Python programming language.
• Anaconda installed on your computer.
Step 1: Installing Anaconda

If you haven't installed Anaconda yet, please follow the instructions below:
1. Download Anaconda from Anaconda Individual Edition.
2. Follow the installation instructions for your operating system: Anaconda
Installation Guide.
Step 2: Setting Up Jupyter Notebook

1. Open Anaconda Navigator.
2. Launch Jupyter Notebook.
3. Create a new Python notebook.
Introduction to Pandas, NumPy, and Matplotlib

Pandas
Pandas is a powerful library for data analysis and manipulation.
# Importing Pandas Library
import pandas as pd
NumPy
NumPy provides large, multi-dimensional arrays and matrices, along with mathematical
functions that operate on them.
# Importing NumPy Library
import numpy as np
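As a minimal sketch of what "arrays and mathematical functions" means in practice, the cell below builds a one-dimensional and a two-dimensional array and applies a few element-wise operations. The values are the sepal measurements of the first five Iris rows, typed in by hand; the variable names are illustrative only.

```python
import numpy as np

# Sepal lengths (cm) of the first five Iris rows, typed in by hand
sepal_length = np.array([5.1, 4.9, 4.7, 4.6, 5.0])

# Element-wise arithmetic: convert every value from cm to mm in one step
sepal_length_mm = sepal_length * 10

# A 2-D array: one row per flower, columns are sepal length and sepal width (cm)
sepals = np.array([[5.1, 3.5], [4.9, 3.0], [4.7, 3.2], [4.6, 3.1], [5.0, 3.6]])

print(sepals.shape)            # (rows, columns)
print(sepals.mean(axis=0))     # mean of each column
print(np.sqrt(sepal_length))   # element-wise square root
```

Operations such as `* 10` or `np.sqrt` apply to every element without an explicit loop, which is the main reason NumPy arrays are preferred over Python lists for numeric work.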
Matplotlib
Matplotlib is a plotting library for creating static, animated, and interactive visualizations
in Python.
# Importing Matplotlib Library
import matplotlib.pyplot as plt
Lab Tasks:
Task 1: Load a Dataset
Load the 'Iris' dataset from the UCI Machine Learning Repository.
# Loading Iris Dataset (the file has no header row, so column names are supplied)
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
column_names = ["sepal_length", "sepal_width", "petal_length", "petal_width",
"class"]
iris = pd.read_csv(url, names=column_names)
Task 2: View the Dataset

View the first 5 rows of the dataset to understand the data.
# Viewing first 5 rows of Iris Dataset
iris.head()
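Beyond `head()`, a few other standard Pandas calls help you get a first impression of a dataset. The sketch below uses a small hand-typed DataFrame (named `sample` here, an assumption for illustration) that holds the first five Iris rows, so the cell runs without a network connection; the same calls can be applied to `iris`.

```python
import pandas as pd

# First five rows of the Iris dataset, typed in by hand so the cell runs offline
sample = pd.DataFrame(
    {
        "sepal_length": [5.1, 4.9, 4.7, 4.6, 5.0],
        "sepal_width": [3.5, 3.0, 3.2, 3.1, 3.6],
        "petal_length": [1.4, 1.4, 1.3, 1.5, 1.4],
        "petal_width": [0.2, 0.2, 0.2, 0.2, 0.2],
        "class": ["Iris-setosa"] * 5,
    }
)

print(sample.shape)                     # (number of rows, number of columns)
print(sample.dtypes)                    # data type of each column
print(sample.isna().sum())              # number of missing values per column
print(sample.describe())                # summary statistics of the numeric columns
print(sample["class"].value_counts())   # number of rows per class label
```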
Task 3: Basic Operations

Calculate the average, median, and standard deviation of the 'sepal_length' column.
# Calculating the average of 'sepal_length'
average_sepal_length = iris['sepal_length'].mean()
print(f"Average Sepal Length: {average_sepal_length}")
# Calculating the median of 'sepal_length'
median_sepal_length = iris['sepal_length'].median()
print(f"Median Sepal Length: {median_sepal_length}")
# Calculating the standard deviation of 'sepal_length'
std_dev_sepal_length = iris['sepal_length'].std()
print(f"Standard Deviation of Sepal Length: {std_dev_sepal_length}")
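One detail worth knowing when comparing results across libraries: Pandas' `std()` computes the sample standard deviation (`ddof=1`) by default, while NumPy's `np.std` computes the population standard deviation (`ddof=0`) by default. The sketch below illustrates the difference on five hand-typed sepal lengths; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Sepal lengths (cm) of the first five Iris rows, typed in by hand
values = pd.Series([5.1, 4.9, 4.7, 4.6, 5.0], name="sepal_length")

pandas_std = values.std()                              # sample std (ddof=1)
numpy_std_default = np.std(values.to_numpy())          # population std (ddof=0)
numpy_std_sample = np.std(values.to_numpy(), ddof=1)   # matches the Pandas default

print(f"Pandas std (ddof=1): {pandas_std:.4f}")
print(f"NumPy std  (ddof=0): {numpy_std_default:.4f}")
print(f"NumPy std  (ddof=1): {numpy_std_sample:.4f}")
```

The two defaults differ only in the divisor (n - 1 versus n), so the gap shrinks as the number of rows grows.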
Task 4: Data Visualization

Create scatter plots to visualize the relationships between 'sepal_length' and
'sepal_width', and between 'petal_length' and 'petal_width'.
# Creating Scatter Plot for 'sepal_length' and 'sepal_width'
plt.scatter(iris['sepal_length'], iris['sepal_width'])
plt.title('Sepal Length vs Sepal Width')
plt.xlabel('Sepal Length (cm)')
plt.ylabel('Sepal Width (cm)')
plt.show()
# Creating Scatter Plot for 'petal_length' and 'petal_width'
plt.scatter(iris['petal_length'], iris['petal_width'])
plt.title('Petal Length vs Petal Width')
plt.xlabel('Petal Length (cm)')
plt.ylabel('Petal Width (cm)')
plt.show()
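The scatter plots above draw all points in one color. A possible extension, sketched below, draws one series per class so the species can be told apart. The helper `scatter_by_class` is a hypothetical function written for this illustration, and `sample` is a small hand-typed DataFrame in the same layout as `iris`; passing `iris` instead plots the full dataset.

```python
import matplotlib.pyplot as plt
import pandas as pd


def scatter_by_class(df, x, y, label_col="class"):
    """Draw one scatter series per distinct value in label_col."""
    fig, ax = plt.subplots()
    for label, group in df.groupby(label_col):
        ax.scatter(group[x], group[y], label=label)
    ax.set_xlabel(x)
    ax.set_ylabel(y)
    ax.legend(title=label_col)
    return fig, ax


# Hand-typed sample rows in the same layout as the Iris data (illustration only)
sample = pd.DataFrame(
    {
        "petal_length": [1.4, 1.3, 4.7, 4.5, 6.0, 5.1],
        "petal_width": [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
        "class": [
            "Iris-setosa", "Iris-setosa",
            "Iris-versicolor", "Iris-versicolor",
            "Iris-virginica", "Iris-virginica",
        ],
    }
)

fig, ax = scatter_by_class(sample, "petal_length", "petal_width")
```

Calling `plt.show()` afterwards displays the figure, as in the cells above.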
Enhanced Visualization and Analysis Tasks:

Task 5: Correlation Matrix
Create a correlation matrix to understand the linear relationship between the different
variables in the dataset.
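# Creating Correlation Matrix
# The 'class' column holds text labels, so it is dropped before computing correlations
correlation_matrix = iris.drop(columns="class").corr()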
print(correlation_matrix)
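Each entry of the correlation matrix is, by default, a Pearson correlation coefficient. To make clear what that number measures, the sketch below computes it by hand for one pair of variables and compares the result with NumPy's `np.corrcoef`. The function `pearson` and the hand-typed sample values are assumptions made for this illustration, not the full dataset.

```python
import numpy as np

# Hand-typed petal measurements (cm) for a few flowers, for illustration only
petal_length = np.array([1.4, 1.4, 1.3, 1.5, 1.4, 4.7, 4.5, 4.9])
petal_width = np.array([0.2, 0.2, 0.2, 0.2, 0.2, 1.4, 1.5, 1.5])


def pearson(x, y):
    """Pearson correlation: covariance of x and y scaled by both spreads."""
    x_centered = x - x.mean()
    y_centered = y - y.mean()
    numerator = (x_centered * y_centered).sum()
    denominator = np.sqrt((x_centered ** 2).sum() * (y_centered ** 2).sum())
    return numerator / denominator


r = pearson(petal_length, petal_width)
r_numpy = np.corrcoef(petal_length, petal_width)[0, 1]

print(f"Pearson r (by hand): {r:.4f}")
print(f"Pearson r (NumPy):   {r_numpy:.4f}")
```

The coefficient always lies between -1 and 1: values near 1 indicate a strong positive linear relationship, values near -1 a strong negative one, and values near 0 little or no linear relationship.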
Task 6: Scatter Plot Matrix

Create a scatter plot matrix to visualize the relationships between all pairs of variables.
# Creating Scatter Plot Matrix
pd.plotting.scatter_matrix(iris, alpha=0.8, figsize=(10, 10), diagonal='hist')
plt.show()
Exercises:
1. Exercise 1: Analyze the correlation matrix and scatter plot matrix. Answer the
following questions:
a. Is there a relationship between 'sepal_length' and 'sepal_width'?
b. Is the relationship between 'petal_length' and 'petal_width' positive or negative?
c. Which pair of variables has the strongest relationship?
2. Exercise 2: Create a scatter plot for 'petal_length' and 'petal_width'. Based on
the plot, hypothesize whether there is any association between the two variables
and whether the association is positive or negative.
3. Exercise 3: Load another dataset of your choice and perform similar operations
and visualizations to understand the relationships between the variables. Answer
questions about the relationships between the variables based on the
visualizations.
Submission:
Submit the Jupyter notebook containing all the executed cells along with the outputs and
your answers to the exercise questions.
