
Smith School of Business MMA, MMAI, and GMMA

MMAI/MMA/GMMA 869: Machine Learning & AI
2022 Syllabus

Version 5: November 12, 2021

COURSE DESCRIPTION
This course will cover artificial intelligence (AI) and in particular machine learning (ML), with a focus on
applications in business decisions and processes. The course will look in-depth at the two major types of
ML: supervised learning (including classification and deep learning) and unsupervised learning (including
clustering, association rule learning, and recommender systems). The course will survey key
technologies and applications that are driving the ML revolution. The course will provide students with a
robust theoretical understanding of popular models and techniques, but the course will be application-
focused. The overall goal of the course is to provide a foundation and framework for understanding
how, when, and why to use AI and ML models in data-driven decision-making.

The course will also cover practical topics in the machine learning process, including feature
engineering, feature selection, feature normalization, model evaluation, parameter tuning, model
selection, and ensemble building.

On completion of this course, students will be able to frame various classes of business problems as AI
and ML problems. Students will understand which AI and ML model to use for a given problem, how to
use the model, how to evaluate the model, and how to deploy the model.

The course will include a team project that will provide an opportunity to apply various AI and ML
models to a real-world business dataset.

INSTRUCTOR
Stephen W. Thomas, PhD

 Office: (613) 533-3390
 stephen.thomas@queensu.ca
 Smith Faculty Page
 DBLP
 Google Scholar
 LinkedIn
 GitHub

TEACHING ASSISTANT
Cecilia Ying, MMA, MSc, PhD (in progress). y.ying@queensu.ca

Page 1 of 16
MMAI/MMA/GMMA 869: Machine Learning and AI

REFERENCES

Textbooks
 Geron, Aurelien. “Hands-On Machine Learning with Scikit-Learn & TensorFlow.” 2017. O’Reilly
Media. ISBN: 9781491962299.
o GitHub resources: github.com/ageron/handson-ml
 Kuhn, Max and Johnson, Kjell. "Applied Predictive Modeling." 2nd Edition, 2018. Springer.
ISBN-13: 978-1461468486.
o Resources: appliedpredictivemodeling.com

We will also draw from the following free resources, all available on the course portal:

 James, Gareth and Witten, Daniela and Hastie, Trevor and Tibshirani, Robert. "An Introduction
to Statistical Learning with Applications in R." 7th Printing. Springer. www.statlearning.com
 Burkov, Andriy. “Machine Learning Engineering.” September 2020.
www.mlebook.com/wiki/doku.php
 VanderPlas, Jake. “Python Data Science Handbook: Essential Tools for Working with Data.”
O'Reilly Media, Inc., 2016. jakevdp.github.io/PythonDataScienceHandbook/index.html
 Molnar, Christoph. “Interpretable Machine Learning.” April 26, 2021.
christophm.github.io/interpretable-ml-book

Course Portal
The course portal contains supplementary readings, lecture notes, and interactive material for each
course session. I highly recommend you visit and review the material before each course session.

EVALUATION
Item (Value): GMMA 2022 Due* | MMA 2022S Due* | MMAI 2022 Due*
Team Project (50%): Nov 12, 2021 | Nov 30, 2021 | Dec 12, 2021
Assignment (50%): Nov 15, 2021 | Dec 12, 2021 | Dec 26, 2021
*By 11:59 pm Eastern Time on the date indicated.

TEAM PROJECT
Please see the separate file: Project Brief – DrivenData.docx.

THE ML CUP
There can only be one winner. Will it be you?

The ML Cup is an individual competition held throughout the course. There will be multiple events in
which students can earn points. The student with the highest number of total points at the end of the
course will win the cup. The winner will receive a prize and, more importantly, life-long bragging rights.


I will not reveal any details about the events until 24 hours before they occur. What I will say is this:
come to each event having prepared for that week’s course material. Be nimble, be quick, and jump
over that candlestick!


COURSE SESSIONS

Session 1: Intro to AI and ML


This session will start you down the road of enlightenment by outlining what AI and ML are and will
motivate you with some success stories from the front lines.

Overview of AI
 Course Materials: Lecture slides; Lecture notes; Pre-recorded lecture video
 Diving Deeper: LL: Big Data in the Age of AI, by Barton Poulson; LL: Artificial Intelligence Foundations: Thinking Machines, by Doug Rose

Overview of ML
 Course Materials: Lecture slides; Lecture notes; Pre-recorded lecture video
 Textbook Readings: Geron: Ch 1, 2; Kuhn: Ch 1; James: Ch 2.1; VanderPlas: Ch 5.1; Burkov: Ch 1, 2
 Diving Deeper: LL: Artificial Intelligence Foundations: Machine Learning, by Doug Rose; Data School: What is Machine Learning?


Session 2: Classification
In this session, we will learn all about classification, a type of supervised machine learning. We will learn
about the different algorithms for building classification models: how they work under the hood, what
their strengths and weaknesses are, and when to use which.

Classification overview: definition, process, applications
 Course Materials: Lecture slides; Lecture notes; Pre-recorded lecture video; Exercise: You Try!; Exercise: Which algorithm?
 Textbook Readings: Geron: Ch 3; Kuhn: Ch 11; James: Ch 4.1–4.3; VanderPlas: Ch 5.1
 Diving Deeper: LL: Machine Learning and AI Foundations: Classification Modeling, by Keith McCormick; LL: Applied Machine Learning: Algorithms, by Derek Jedamski; StatQuest: Logistic Regression; scikit-learn: Supervised Learning

Decision Trees: learning rules from data
 Course Materials: Lecture slides; Lecture notes; Pre-recorded lecture video; GitHub tutorial
 Textbook Readings: Geron: Ch 6; Kuhn: Ch 14; James: Ch 8.1; VanderPlas: Ch 5.8
 Diving Deeper: StatQuest: Decision Trees; LL: Machine Learning and AI Foundations: Decision Trees, by Keith McCormick

Naïve Bayes: using Bayes’ Theorem
 Course Materials: Lecture slides; Lecture notes; Pre-recorded lecture video; GitHub tutorial
 Textbook Readings: Kuhn: Ch 13.6; James: Ch 4.4; VanderPlas: Ch 5.5
 Diving Deeper: StatQuest: Naïve Bayes; 3Blue1Brown: Bayes Theorem

K-Nearest Neighbors: learning by example
 Course Materials: Lecture slides; Lecture notes; Pre-recorded lecture video; GitHub tutorial
 Textbook Readings: Kuhn: Ch 13.5
 Diving Deeper: StatQuest: KNN

Support Vector Machines: dividing the data with a hyperplane
 Course Materials: Lecture slides; Lecture notes; Pre-recorded lecture video; GitHub tutorial
 Textbook Readings: Geron: Ch 5; James: Ch 9; VanderPlas: Ch 5.7
 Diving Deeper: StatQuest: SVM Part 1; StatQuest: SVM Part 2 (Polynomial); StatQuest: SVM Part 3 (RBF)

Neural Networks: a powerful black box that simulates the human brain
 Course Materials: Lecture slides; Lecture notes; Pre-recorded lecture video; GitHub tutorial
 Textbook Readings: Geron: Ch 10, 11; Kuhn: Ch 13.2
 Diving Deeper: StatQuest: Neural Networks Part 1; StatQuest: NN Part 2 (Backpropagation); StatQuest: NN Part 3 (ReLU); StatQuest: NN Part 4 (Inputs and Outputs); StatQuest: NN Part 5 (ArgMax and SoftMax); StatQuest: NN Part 6 (X Entropy); LL: Artificial Intelligence Foundations: Neural Networks; deeplizard: NNs explained
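For a concrete preview of what these classifiers look like in practice, here is a minimal scikit-learn sketch (my own illustration, not course code; the built-in breast-cancer dataset and the two models are arbitrary choices for brevity):

```python
# Fit two of the session's classifiers and compare held-out accuracy.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

for model in (DecisionTreeClassifier(random_state=42), KNeighborsClassifier()):
    model.fit(X_train, y_train)        # learn from labeled training data
    acc = model.score(X_test, y_test)  # accuracy on unseen test data
    print(f"{type(model).__name__}: {acc:.3f}")
```

The same fit/score pattern applies to every classifier covered in this session, which is what makes trying several algorithms on one problem so easy.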

Session 3: Evaluating and Enhancing Supervised ML Algorithms

Evaluation metrics: confusion matrix, accuracy, sensitivity, specificity, P/R, F1, AUC
 Course Materials: Lecture slides; Lecture notes; Pre-recorded lecture video; GitHub tutorial
 Textbook Readings: Geron: Ch 3; Kuhn: Ch 11
 Diving Deeper: StatQuest: Confusion Matrix; Data School: Confusion Matrix; StatQuest: Sensitivity and Specificity; StatQuest: ROC and AUC; StatQuest: Cross Validation; StatQuest: Bias/Variance

Model validation: hold out, bootstrapping, cross validation, leave one out
 Course Materials: Lecture slides; Lecture notes; Pre-recorded lecture video; GitHub tutorial
 Textbook Readings: Geron: Ch 2; Kuhn: Ch 2.2, 4; James: Ch 5; VanderPlas: Ch 5.3
 Diving Deeper: StatQuest: Confusion Matrix; scikit-learn: GridSearchCV

Ensemble methods: boosting, bagging, stacking
 Course Materials: Lecture slides; Lecture notes; Pre-recorded lecture video; GitHub tutorial
 Textbook Readings: Geron: Ch 7; Kuhn: Ch 14.3–14.5; James: Ch 8.2
 Diving Deeper: StatQuest: Random Forests; StatQuest: Adaboost; StatQuest: Gradient Boost; StatQuest: XGBoost; LL: Machine Learning & AI: Advanced Decision Trees, by Keith McCormick
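The session's two core evaluation ideas can be previewed in a few lines (an illustrative sketch, not course code; the built-in breast-cancer dataset stands in for a business dataset):

```python
# Evaluate a classifier with a confusion matrix and with cross-validation.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

clf = LogisticRegression(max_iter=5000).fit(X_train, y_train)

# Rows are actual classes, columns are predicted classes.
cm = confusion_matrix(y_test, clf.predict(X_test))
print(cm)

# 5-fold cross-validation gives a more stable estimate than a single hold-out.
scores = cross_val_score(LogisticRegression(max_iter=5000), X, y, cv=5)
print(f"CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```

From the confusion matrix you can derive most of the metrics in this session: accuracy, sensitivity, specificity, precision, recall, and F1.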


Session 4: Enhancing the ML Process


In this session, we will discuss three important topics that will elevate your supervised machine learning
projects to the next level.

Feature engineering: aggregations and transformations
 Course Materials: Lecture slides; Lecture notes; Pre-recorded lecture video; GitHub tutorial
 Textbook Readings: Geron: Ch 8; Kuhn: Ch 3; VanderPlas: Ch 5.4, 5.9
 Diving Deeper: Data School: ColumnTransformer; Data School: Categorical encoding in scikit-learn; Representing Categorical Data with Target Encoding

Data leakage: target leakage, train-test contamination
 Course Materials: Lecture slides; Lecture notes; Pre-recorded lecture video
 Textbook Readings: Burkov: Ch 3.2.8, 3.5
 Diving Deeper: StatQuest Q&A; Data School: fit vs. transform; Data School: fit_transform vs transform; scikit-learn: Common pitfalls

Feature selection: transformations, aggregations
 Course Materials: Lecture slides; Lecture notes; Pre-recorded lecture video; GitHub tutorial
 Textbook Readings: Kuhn: Ch 18, 19
 Diving Deeper: Data School: Feature Selection

Hyperparameter tuning: grid search, random search, Bayesian, AutoML
 Course Materials: Lecture slides; Lecture notes; Pre-recorded lecture video; GitHub tutorial
 Textbook Readings: Geron: Ch 2; Kuhn: Ch 4; VanderPlas: Ch 5.3
 Diving Deeper: scikit-learn: GridSearchCV

Imbalanced data: class weights, resampling techniques
 Course Materials: Lecture slides; Lecture notes; Pre-recorded lecture video; GitHub tutorial
 Diving Deeper: StatQuest Q&A

ML Pipelines: simplify your code
 Course Materials: Lecture slides; Lecture notes; Pre-recorded lecture video; GitHub tutorial
 Diving Deeper: Data School: Pipelines

Feature importance
 Course Materials: Lecture slides; Lecture notes; Pre-recorded lecture video
 Diving Deeper: scikit-learn: Interpreting Linear Models
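As a taste of how pipelines, hyperparameter tuning, and leakage avoidance fit together, here is a short sketch (my own illustration; the model, scaler, and parameter grid are arbitrary choices):

```python
# A pipeline (scaling + SVM) tuned with GridSearchCV.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

pipe = Pipeline([("scale", StandardScaler()), ("svm", SVC())])

# "svm__C" addresses the C hyperparameter of the step named "svm".
grid = GridSearchCV(pipe, {"svm__C": [0.1, 1, 10]}, cv=5)
grid.fit(X_train, y_train)

print(grid.best_params_, grid.score(X_test, y_test))
```

Because the scaler sits inside the pipeline, it is re-fit on each cross-validation training fold, which avoids the train-test contamination form of data leakage covered in this session.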

Session 5: Unsupervised Learning (Clustering and Association Rules)


In this session, we will cover the two main areas of unsupervised learning: clustering and association
rule analysis.

Clustering overview: definition, use cases
 Course Materials: Lecture slides; Lecture notes; Pre-recorded lecture video
 Diving Deeper: LL: Machine Learning and AI Foundations: Clustering and Association, by Keith McCormick; LL: Python for Data Science Essential Training Part 2, by Lillian Pierson; scikit-learn: Clustering

Practical clustering issues: distance metrics, feature standardization, categorical data
 Course Materials: Lecture slides; Lecture notes; Pre-recorded lecture video; GitHub tutorial #1; GitHub tutorial #2

Clustering algorithms: K-Means, DBSCAN, Hierarchical
 Course Materials: Lecture slides; Lecture notes; Pre-recorded lecture video; GitHub tutorial
 Textbook Readings: Geron: Ch 9; Kuhn: Ch; James: Ch 10.3, 10.5; VanderPlas: Ch 5.11, 5.12
 Diving Deeper: StatQuest: K-means clustering; StatQuest: Hierarchical clustering; Video: K-Means; Tutorial: K-Means; Video: DBSCAN; Tutorial: DBSCAN; Video: Hierarchical; Tutorial: Hierarchical

Interpreting and evaluating clusters
 Course Materials: Lecture slides; Lecture notes; Pre-recorded lecture video; GitHub tutorial #1; GitHub tutorial #2; Video: Interpreting

Association rule learning overview: definitions, applications
 Course Materials: Lecture slides; Lecture notes; Pre-recorded lecture video; GitHub tutorial
 Diving Deeper: LL: Data Science Foundations: Data Mining, by Barton Poulson

Association rule learning algorithms: Apriori, FP-Growth
 Course Materials: Lecture slides; Lecture notes; Pre-recorded lecture video; Video: Apriori; Tutorial: Apriori

Association rule learning interestingness measures: support, confidence, lift
 Course Materials: Lecture slides; Lecture notes; Pre-recorded lecture video; GitHub tutorial
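K-Means with the feature standardization that the practical-issues topic emphasizes, plus a silhouette score for evaluating the resulting clusters, can be sketched as follows (illustrative only; synthetic data stands in for real customer data):

```python
# Standardize features, cluster with K-Means, and score the clustering.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

X, _ = make_blobs(n_samples=300, centers=3, random_state=42)
X = StandardScaler().fit_transform(X)   # put features on a common scale

km = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)

# Silhouette ranges from -1 to 1; closer to 1 means tighter, better-
# separated clusters.
print(silhouette_score(X, km.labels_))
```

Without the standardization step, a feature measured in large units (say, dollars) would dominate the distance metric and distort the clusters.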


Session 6: Recommender Systems


In this session, we will discuss recommender systems, i.e., methods to recommend products to users.

Recommender systems: overview and use cases
 Course Materials: Lecture slides; Lecture notes; Pre-recorded lecture video
 Diving Deeper: LL: Building Recommender Systems with Machine Learning and AI, by Frank Kane

Recommender systems: common algorithms
 Course Materials: Lecture slides; Lecture notes; Pre-recorded lecture video; GitHub tutorial
 Diving Deeper: LL: Building a Recommendation System with Python Machine Learning & AI, by Lillian Pierson

Recommender systems: practical issues
 Course Materials: Lecture slides; Lecture notes; Pre-recorded lecture video
 Diving Deeper: LL: Machine Learning and AI Foundations: Recommendations, by Adam Geitgey

Recommender systems: case studies
 Course Materials: Lecture slides; Lecture notes; Pre-recorded lecture video
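A toy sketch of the item-based collaborative filtering idea behind many of these algorithms (my own illustration with a made-up ratings matrix; production systems use dedicated libraries and much sparser data):

```python
# Recommend the item most similar, by cosine similarity over user
# ratings, to an item the user already liked.
import numpy as np

# Rows = users, columns = items; 0 means "not rated".
ratings = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
], dtype=float)

# Cosine similarity between every pair of item columns.
norms = np.linalg.norm(ratings, axis=0)
sim = (ratings.T @ ratings) / np.outer(norms, norms)

liked_item = 0
np.fill_diagonal(sim, -1)   # an item should not recommend itself
recommended = int(np.argmax(sim[liked_item]))
print(f"Most similar to item {liked_item}: item {recommended}")
```

Here items 0 and 1 are rated similarly by the same users, so liking item 0 leads to a recommendation of item 1.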

Recommended (hah!) textbooks related to Recommender Systems:

 Aggarwal, Charu C. “Recommender Systems: The Textbook.” ISBN 978-3-319-29657-9. Springer. 2016.
 Falk, Kim. “Practical Recommender Systems.” ISBN 9781617292705. Manning Publications. 2019.


Session 7: Team Project Presentations


o Synchronous team project presentations


OTHER RESOURCES
I have created a list of notebooks and datasets that students may find helpful:

 github.com/stepthom/869_course

I also curate a list of ML books, articles, blogs, data sets, tools, and more:

 github.com/stepthom/data_mining_resources

Stats, Data Science, and Python MOOCs


Looking to brush up on some math, stats, and Python skills? Have 1 or 2 hours to spare? Check out these
(optional) courses on LinkedIn Learning. Don’t be overwhelmed: just pick one and go.

 Statistics Foundations: 1, 2, and 3 by Eddie Davila


o Center of data; data variability; distributions; probability; permutations/combinations
o Sampling; Confidence intervals; hypothesis testing
o Comparing populations; chi-square; ANOVA; Regression
 Introduction to Data Science, by Lavanya Vijayan and Madecraft
o Tabular data, EDA, data cleaning, data visualization, inference
 Applied Machine Learning: Foundations, by Derek Jedamski
o EDA and Data Cleaning, Measuring Success, Optimizing, End-to-end
 Python Statistics Essential Training, by Michele Vallisneri
o Anaconda setup, importing/cleaning data, visualizing and describing data, statistical
inference, intro to statistical modelling
 Learning Python, by Joe Marini
o Installing Python, Python basics, working with dates and times, working with files,
working with web data
 Python Quick Start, by Lavanya Vijayan and Madecraft
o Defining Python, data, functions, sequences, conditional statements, iteration,
recursion, object-oriented programming
 Python Essential Training, by Bill Weinman
o Installation, Language overview, types and values, conditionals, operators, loops,
functions, structured data, classes, exceptions, string objects, file input/output, built-in
functions, modules, databases
 Advanced Python, by Joe Marini
o Python language features, built-in functions, advanced python functions, collections,
classes and objects, logging, comprehensions
 Data Science Foundations: Python Scientific Stack, by Miki Tebeka
o Setup, Jupyter Notebooks, Numpy Basics, Pandas, Conda, Folium and Geo, NY Taxi Data,
scikit-learn, plotting, other packages

Data Cleaning
 Data School


o Encoding with OneHotEncoder or OrdinalEncoder
o Missing value imputation for continuous features
o Impute missing values using KNNImputer or IterativeImputer
o Two ways to impute missing values for a categorical feature
o Pipelines
 Data Cleaning in Python: the Ultimate Guide (Towards Data Science, 2020)
 Data Cleaning in Python (Udemy)

Dimensionality Reduction
 StatQuest: PCA, Step-by-Step
 DataCamp: Principal Component Analysis (PCA) in Python
 IEEE: An Analysis of Dimensionality Reduction Techniques on Big Data
 scikit-learn: Decomposing signals in components


COURSE POLICIES

Late Work
There will be a 5% penalty per day for late work.

Extensions
Deadline extensions will only be given for extenuating circumstances.

Rounding
I round marks to the nearest whole number, with halves rounding up. For example, 89.49999 gets
rounded to 89; 89.50000 gets rounded to 90.
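For the curious, this rounding rule can be expressed in a line of Python (an illustration only; note that Python's built-in round() behaves differently, breaking ties to the nearest even integer):

```python
import math

def round_half_up(x: float) -> int:
    # Round to the nearest whole number, with .5 always rounding up,
    # matching the policy above. Python's built-in round() instead uses
    # banker's rounding: round(88.5) == 88, but round_half_up(88.5) == 89.
    return math.floor(x + 0.5)

print(round_half_up(89.49999), round_half_up(89.5))  # 89 90
```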

Appealing Marks
You may appeal a mark if you believe an error has occurred. To appeal, please do the following.

1. Wait until at least THREE DAYS AFTER you receive your mark to submit your appeal. (Cool down
period.)
2. Write a brief email/memo as follows:
a. Clearly state that you want to appeal a mark.
b. For each question/portion you would like to appeal, describe what you believe the error
was and how you recommend it should have been marked.
3. You must submit your appeal within ONE WEEK of receiving your initial mark.

Please note that if I re-mark your assignment, I reserve the right to decrease your mark if I feel that the
initial mark was too high.

PRIVACY
All students are expected to respect the privacy of others. Capturing still and moving images and audio
of other individuals without their express consent is a violation of privacy. Similarly, recordings and
images taken of teaching materials and/or the instructor without permission are a violation of privacy
and fail to recognize that instructors have intellectual property rights to their own teaching materials.
The teaching and learning environment should be a safe space in which all participants are expected to
uphold the values of respect, dignity, and trust.

To learn more, please refer to the following:

Taking and Using Images with Consent | Records Management and Privacy Office (queensu.ca)

Privacy and Remote Teaching and Learning | Records Management and Privacy Office (queensu.ca)


ACADEMIC INTEGRITY

Definition of Academic Integrity


Queen's students, faculty, administrators, and staff all have responsibilities for supporting and upholding
the fundamental values of academic integrity. Academic integrity is constituted by the five core
fundamental values of honesty, trust, fairness, respect, and responsibility, and by the quality of courage
(see www.academicintegrity.org). These values and qualities are central to the building, nurturing, and
sustaining of an academic community in which all members of the community will thrive. Adherence to
the values expressed through academic integrity forms a foundation for the "freedom of inquiry and
exchange of ideas" essential to the intellectual life of the University.

Students are responsible for familiarizing themselves with, and adhering to, the regulations concerning
academic integrity. General information on academic integrity is available at Academic Integrity @
Queen's University; an overview of Smith's own policies and procedures is also important to review.
You may also find these frequently asked questions on academic integrity helpful for your understanding
of the concept and the regulations surrounding it. Departures from academic integrity include, but are
not limited to, plagiarism, use of unauthorized materials, facilitation, forgery, and falsification. Actions
which contravene the academic integrity regulations carry sanctions that can range from a warning, to
loss of grades on an assignment, to failure of a course, to requirement to withdraw from the university.

Individual Work
I will clearly indicate when students can consult with one another or with experts or resources.
Otherwise, you are required to develop an original response to the assigned topic. Assignments and
examinations identified as individual in nature must be the result of the student's individual effort.
Individuals must not look at, access or discuss any aspect of anyone else's solution (including a student
from a previous year), nor allow anyone else to look at any aspect of their own solution. Likewise,
students are prohibited from utilizing the internet or any other means to access others' solutions to, or
discussions of, the assigned material. If the assignment requires outside research, all sources must be
properly cited and referenced; be careful to cite all material, not only of direct quotations but also of
ideas. Help for citing sources is available through the Queen's University library:
http://library.queensu.ca/help-services/citing-sources.

Group Work
I will clearly indicate when groups may consult with one another or with other experts or resources.
Otherwise, in a group assignment, the group members will work together to develop an original,
consultative response to the assigned topic. Group members must not look at, access or discuss any
aspect of any other group's solution (including a group from a previous year), nor allow anyone outside
of the group to look at any aspect of the group's solution. Likewise, you are prohibited from utilizing the
internet or any other means to access others' solutions to, or discussions of, the assigned material. If the
assignment requires outside research, all sources must be properly cited and referenced; be careful to
cite all material, not only direct quotations but also ideas. Help for citing sources is available
through the Queen's University library: http://library.queensu.ca/help-services/citing-sources. The
names of each group member must appear on the submitted assignment, and no one other than the
people whose names appear on the assignment may have contributed in any way to the submitted
solution. In short, the group assignments must be the work of your group, and your group only. All
group members are responsible for ensuring the academic integrity of the work that the group submits.


Turnitin.com
When assignments are submitted through the dropbox on the course website, they will be processed
through turnitin.com. Turnitin is a plagiarism detection tool that checks your submission against other
texts, including websites, journal articles, books, and other student submissions, in order to verify the
originality of the submitted work.
