
Smith School of Business MMA, MMAI, and GMMA

MMAI/MMA/GMMA 869: Machine Learning & AI
2022 Syllabus

Version 5: November 12, 2021

COURSE DESCRIPTION
This course will cover artificial intelligence (AI) and in particular machine learning (ML), with a focus on
applications in business decisions and processes. The course will look in-depth at the two major types of
ML: supervised learning (including classification and deep learning) and unsupervised learning (including
clustering, association rule learning, and recommender systems). The course will survey key
technologies and applications that are driving the ML revolution. The course will provide students with a
robust theoretical understanding of popular models and techniques, but the course will be application-
focused. The overall goal of the course is to provide a foundation and framework for understanding
how, when, and why to use AI and ML models in data-driven decision-making.

The course will also cover practical topics in the machine learning process, including feature
engineering, feature selection, feature normalization, model evaluation, parameter tuning, model
selection, and ensemble building.

On completion of this course, students will be able to frame various classes of business problems as AI
and ML problems. Students will understand which AI and ML model to use for a given problem, how to
use the model, how to evaluate the model, and how to deploy the model.

The course will include a team project that will provide an opportunity to apply various AI and ML
models to a real-world business dataset.

INSTRUCTOR
Stephen W. Thomas, PhD

 Office: (613) 533-3390
 stephen.thomas@queensu.ca
 Smith Faculty Page
 DBLP
 Google Scholar
 LinkedIn
 GitHub

TEACHING ASSISTANT
Cecilia Ying, MMA, MSc, PhD (in progress). y.ying@queensu.ca

Page 1 of 16
MMAI/MMA/GMMA 869: Machine Learning and AI

REFERENCES

Textbooks
 Geron, Aurelien. “Hands-On Machine Learning with Scikit-Learn & TensorFlow.” 2017. O’Reilly
Media. ISBN: 9781491962299.
o GitHub resources: github.com/ageron/handson-ml
 Kuhn, Max and Johnson, Kjell. "Applied Predictive Modeling." 2nd Edition, 2018. Springer.
ISBN-13: 978-1461468486.
o Resources: appliedpredictivemodeling.com

We will also draw from the following free resources, all available on the course portal:

 James, Gareth and Witten, Daniela and Hastie, Trevor and Tibshirani, Robert. "An Introduction
to Statistical Learning with Applications in R." 7th Printing. Springer. www.statlearning.com
 Burkov, Andriy. “Machine Learning Engineering.” September 2020.
www.mlebook.com/wiki/doku.php
 VanderPlas, Jake. “Python Data Science Handbook: Essential Tools for Working with Data.”
O'Reilly Media, Inc., 2016. jakevdp.github.io/PythonDataScienceHandbook/index.html
 Molnar, Christoph. “Interpretable Machine Learning.” April 26, 2021.
christophm.github.io/interpretable-ml-book

Course Portal
The course portal contains supplementary readings, lecture notes, and interactive material for each
course session. I highly recommend you visit and review the material before each course session.

EVALUATION
Item (Value): GMMA 2022 Due* | MMA 2022S Due* | MMAI 2022 Due*
Team Project (50%): Nov 12, 2021 | Nov 30, 2021 | Dec 12, 2021
Assignment (50%): Nov 15, 2021 | Dec 12, 2021 | Dec 26, 2021
*By 11:59 pm Eastern Time on the date indicated.

TEAM PROJECT
Please see the separate file: Project Brief – DrivenData.docx.

THE ML CUP
There can only be one winner. Will it be you?

The ML Cup is an individual competition held throughout the course. There will be multiple events in
which students can earn points. The student with the highest number of total points at the end of the
course will win the cup. The winner will receive a prize and, more importantly, life-long bragging rights.


I will not reveal any details about the events until 24 hours before they occur. What I will say is this:
come to each event having prepared for that week’s course material. Be nimble, be quick, and jump
over that candlestick!


COURSE SESSIONS

Session 1: Intro to AI and ML


This session will start you down the road of enlightenment by outlining what AI and ML are and will
motivate you with some success stories from the front lines.

Overview of AI
 Course Materials: Lecture slides; Lecture notes; Pre-recorded lecture video
 Diving Deeper: LL: Big Data in the Age of AI, by Barton Poulson; LL: Artificial Intelligence Foundations: Thinking Machines, by Doug Rose

Overview of ML
 Course Materials: Lecture slides; Lecture notes; Pre-recorded lecture video
 Textbook Readings: Geron: Ch 1, 2; Kuhn: Ch 1; James: Ch 2.1; VanderPlas: Ch 5.1; Burkov: Ch 1, 2
 Diving Deeper: LL: Artificial Intelligence Foundations: Machine Learning, by Doug Rose; Data School: What is Machine Learning?


Session 2: Classification
In this session, we will learn all about classification, a type of supervised machine learning. We will learn
about the different algorithms for building classification models: how they work under the hood, what
their strengths and weaknesses are, and when to use which.

Classification overview: definition, process, applications
 Course Materials: Lecture slides; Lecture notes; Pre-recorded lecture video; Exercise: You Try!; Exercise: Which algorithm?
 Textbook Readings: Geron: Ch 3; Kuhn: Ch 11; James: Ch 4.1–4.3; VanderPlas: Ch 5.1
 Diving Deeper: LL: Machine Learning and AI Foundations: Classification Modeling, by Keith McCormick; LL: Applied Machine Learning: Algorithms, by Derek Jedamski; StatQuest: Logistic Regression; scikit-learn: Supervised Learning

Decision Trees: learning rules from data
 Course Materials: Lecture slides; Lecture notes; Pre-recorded lecture video; GitHub tutorial
 Textbook Readings: Geron: Ch 6; Kuhn: Ch 14; James: Ch 8.1; VanderPlas: Ch 5.8
 Diving Deeper: StatQuest: Decision Trees; LL: Machine Learning and AI Foundations: Decision Trees, by Keith McCormick

Naïve Bayes: using Bayes’ Theorem
 Course Materials: Lecture slides; Lecture notes; Pre-recorded lecture video; GitHub tutorial
 Textbook Readings: Kuhn: Ch 13.6; James: Ch 4.4; VanderPlas: Ch 5.5
 Diving Deeper: StatQuest: Naïve Bayes; 3Blue1Brown: Bayes Theorem

K-Nearest Neighbors: learning by example
 Course Materials: Lecture slides; Lecture notes; Pre-recorded lecture video; GitHub tutorial
 Textbook Readings: Kuhn: Ch 13.5
 Diving Deeper: StatQuest: KNN

Support Vector Machines: dividing the data with a hyperplane
 Course Materials: Lecture slides; Lecture notes; Pre-recorded lecture video; GitHub tutorial
 Textbook Readings: Geron: Ch 5; James: Ch 9; VanderPlas: Ch 5.7
 Diving Deeper: StatQuest: SVM Part 1; StatQuest: SVM Part 2 (Polynomial); StatQuest: SVM Part 3 (RBF)

Neural Networks: a powerful black box that simulates the human brain
 Course Materials: Lecture slides; Lecture notes; Pre-recorded lecture video; GitHub tutorial
 Textbook Readings: Geron: Ch 10, 11; Kuhn: Ch 13.2
 Diving Deeper: StatQuest: Neural Networks Part 1; StatQuest: NN Part 2 (Backpropagation); StatQuest: NN Part 3 (ReLU); StatQuest: NN Part 4 (Inputs and Outputs); StatQuest: NN Part 5 (ArgMax and SoftMax); StatQuest: NN Part 6 (X Entropy); LL: Artificial Intelligence Foundations: Neural Networks; deeplizard: NNs explained
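For a concrete preview of what these classifiers look like in practice, here is a minimal scikit-learn sketch (my own illustration, not course code; the built-in breast-cancer dataset and the two models are arbitrary choices for brevity):

```python
# Fit two of the session's classifiers and compare held-out accuracy.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

for model in (DecisionTreeClassifier(random_state=42), KNeighborsClassifier()):
    model.fit(X_train, y_train)        # learn from labeled training data
    acc = model.score(X_test, y_test)  # accuracy on unseen test data
    print(f"{type(model).__name__}: {acc:.3f}")
```

The same fit/score pattern applies to every classifier covered in this session, which is what makes trying several algorithms on one problem so easy.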

Session 3: Evaluating and Enhancing Supervised ML Algorithms

Evaluation metrics: confusion matrix, accuracy, sensitivity, specificity, P/R, F1, AUC
 Course Materials: Lecture slides; Lecture notes; Pre-recorded lecture video; GitHub tutorial
 Textbook Readings: Geron: Ch 3; Kuhn: Ch 11
 Diving Deeper: StatQuest: Confusion Matrix; Data School: Confusion Matrix; StatQuest: Sensitivity and Specificity; StatQuest: ROC and AUC; StatQuest: Cross Validation; StatQuest: Bias/Variance

Model validation: hold out, bootstrapping, cross validation, leave one out
 Course Materials: Lecture slides; Lecture notes; Pre-recorded lecture video; GitHub tutorial
 Textbook Readings: Geron: Ch 2; Kuhn: Ch 2.2, 4; James: Ch 5; VanderPlas: Ch 5.3
 Diving Deeper: StatQuest: Confusion Matrix; scikit-learn: GridSearchCV

Ensemble methods: boosting, bagging, stacking
 Course Materials: Lecture slides; Lecture notes; Pre-recorded lecture video; GitHub tutorial
 Textbook Readings: Geron: Ch 7; Kuhn: Ch 14.3–14.5; James: Ch 8.2
 Diving Deeper: StatQuest: Random Forests; StatQuest: Adaboost; StatQuest: Gradient Boost; StatQuest: XGBoost; LL: Machine Learning & AI: Advanced Decision Trees, by Keith McCormick
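The session's two core evaluation ideas can be previewed in a few lines (an illustrative sketch, not course code; the built-in breast-cancer dataset stands in for a business dataset):

```python
# Evaluate a classifier with a confusion matrix and with cross-validation.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

clf = LogisticRegression(max_iter=5000).fit(X_train, y_train)

# Rows are actual classes, columns are predicted classes.
cm = confusion_matrix(y_test, clf.predict(X_test))
print(cm)

# 5-fold cross-validation gives a more stable estimate than a single hold-out.
scores = cross_val_score(LogisticRegression(max_iter=5000), X, y, cv=5)
print(f"CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```

From the confusion matrix you can derive most of the metrics in this session: accuracy, sensitivity, specificity, precision, recall, and F1.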


Session 4: Enhancing the ML Process


In this session, we will discuss three important topics that will elevate your supervised machine learning
projects to the next level.

Feature engineering: aggregations and transformations
 Course Materials: Lecture slides; Lecture notes; Pre-recorded lecture video; GitHub tutorial
 Textbook Readings: Geron: Ch 8; Kuhn: Ch 3; VanderPlas: Ch 5.4, 5.9
 Diving Deeper: Data School: ColumnTransformer; Data School: Categorical encoding in scikit-learn; Representing Categorical Data with Target Encoding

Data leakage: target leakage, train-test contamination
 Course Materials: Lecture slides; Lecture notes; Pre-recorded lecture video
 Textbook Readings: Burkov: Ch 3.2.8, 3.5
 Diving Deeper: StatQuest Q&A; Data School: fit vs. transform; Data School: fit_transform vs transform; scikit-learn: Common pitfalls

Feature selection: transformations, aggregations
 Course Materials: Lecture slides; Lecture notes; Pre-recorded lecture video; GitHub tutorial
 Textbook Readings: Kuhn: Ch 18, 19
 Diving Deeper: Data School: Feature Selection

Hyperparameter tuning: grid search, random search, Bayesian, AutoML
 Course Materials: Lecture slides; Lecture notes; Pre-recorded lecture video; GitHub tutorial
 Textbook Readings: Geron: Ch 2; Kuhn: Ch 4; VanderPlas: Ch 5.3
 Diving Deeper: scikit-learn: GridSearchCV

Imbalanced data: class weights, resampling techniques
 Course Materials: Lecture slides; Lecture notes; Pre-recorded lecture video; GitHub tutorial
 Diving Deeper: StatQuest Q&A

ML Pipelines: simplify your code
 Course Materials: Lecture slides; Lecture notes; Pre-recorded lecture video; GitHub tutorial
 Diving Deeper: Data School: Pipelines

Feature importance
 Course Materials: Lecture slides; Lecture notes; Pre-recorded lecture video
 Diving Deeper: scikit-learn: Interpreting Linear Models
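As a taste of how pipelines, hyperparameter tuning, and leakage avoidance fit together, here is a short sketch (my own illustration; the model, scaler, and parameter grid are arbitrary choices):

```python
# A pipeline (scaling + SVM) tuned with GridSearchCV.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

pipe = Pipeline([("scale", StandardScaler()), ("svm", SVC())])

# "svm__C" addresses the C hyperparameter of the step named "svm".
grid = GridSearchCV(pipe, {"svm__C": [0.1, 1, 10]}, cv=5)
grid.fit(X_train, y_train)

print(grid.best_params_, grid.score(X_test, y_test))
```

Because the scaler sits inside the pipeline, it is re-fit on each cross-validation training fold, which avoids the train-test contamination form of data leakage covered in this session.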

Session 5: Unsupervised Learning (Clustering and Association Rules)


In this session, we will cover the two main areas of unsupervised learning: clustering and association
rule analysis.

Clustering overview: definition, use cases
 Course Materials: Lecture slides; Lecture notes; Pre-recorded lecture video
 Diving Deeper: LL: Machine Learning and AI Foundations: Clustering and Association, by Keith McCormick; LL: Python for Data Science Essential Training Part 2, by Lillian Pierson; scikit-learn: Clustering

Practical clustering issues: distance metrics, feature standardization, categorical data
 Course Materials: Lecture slides; Lecture notes; Pre-recorded lecture video; GitHub tutorial #1; GitHub tutorial #2

Clustering algorithms: K-Means, DBSCAN, Hierarchical
 Course Materials: Lecture slides; Lecture notes; Pre-recorded lecture video; GitHub tutorial
 Textbook Readings: Geron: Ch 9; Kuhn: Ch; James: Ch 10.3, 10.5; VanderPlas: Ch 5.11, 5.12
 Diving Deeper: StatQuest: K-means clustering; StatQuest: Hierarchical clustering; Video: K-Means; Tutorial: K-Means; Video: DBSCAN; Tutorial: DBSCAN; Video: Hierarchical; Tutorial: Hierarchical

Interpreting and evaluating clusters
 Course Materials: Lecture slides; Lecture notes; Pre-recorded lecture video; GitHub tutorial #1; GitHub tutorial #2; Video: Interpreting

Association rule learning overview: definitions, applications
 Course Materials: Lecture slides; Lecture notes; Pre-recorded lecture video; GitHub tutorial
 Diving Deeper: LL: Data Science Foundations: Data Mining, by Barton Poulson

Association rule learning algorithms: Apriori, FP-Growth
 Course Materials: Lecture slides; Lecture notes; Pre-recorded lecture video; Video: Apriori; Tutorial: Apriori

Association rule learning interestingness measures: support, confidence, lift
 Course Materials: Lecture slides; Lecture notes; Pre-recorded lecture video; GitHub tutorial
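K-Means with the feature standardization that the practical-issues topic emphasizes, plus a silhouette score for evaluating the resulting clusters, can be sketched as follows (illustrative only; synthetic data stands in for real customer data):

```python
# Standardize features, cluster with K-Means, and score the clustering.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

X, _ = make_blobs(n_samples=300, centers=3, random_state=42)
X = StandardScaler().fit_transform(X)   # put features on a common scale

km = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)

# Silhouette ranges from -1 to 1; closer to 1 means tighter, better-
# separated clusters.
print(silhouette_score(X, km.labels_))
```

Without the standardization step, a feature measured in large units (say, dollars) would dominate the distance metric and distort the clusters.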


Session 6: Recommender Systems


In this session, we will discuss recommender systems, i.e., methods to recommend products to users.

Recommender systems: overview and use cases
 Course Materials: Lecture slides; Lecture notes; Pre-recorded lecture video
 Diving Deeper: LL: Building Recommender Systems with Machine Learning and AI, by Frank Kane

Recommender systems: common algorithms
 Course Materials: Lecture slides; Lecture notes; Pre-recorded lecture video; GitHub tutorial
 Diving Deeper: LL: Building a Recommendation System with Python Machine Learning & AI, by Lillian Pierson

Recommender systems: practical issues
 Course Materials: Lecture slides; Lecture notes; Pre-recorded lecture video
 Diving Deeper: LL: Machine Learning and AI Foundations: Recommendations, by Adam Geitgey

Recommender systems: case studies
 Course Materials: Lecture slides; Lecture notes; Pre-recorded lecture video
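A toy sketch of the item-based collaborative filtering idea behind many of these algorithms (my own illustration with a made-up ratings matrix; production systems use dedicated libraries and much sparser data):

```python
# Recommend the item most similar, by cosine similarity over user
# ratings, to an item the user already liked.
import numpy as np

# Rows = users, columns = items; 0 means "not rated".
ratings = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
], dtype=float)

# Cosine similarity between every pair of item columns.
norms = np.linalg.norm(ratings, axis=0)
sim = (ratings.T @ ratings) / np.outer(norms, norms)

liked_item = 0
np.fill_diagonal(sim, -1)   # an item should not recommend itself
recommended = int(np.argmax(sim[liked_item]))
print(f"Most similar to item {liked_item}: item {recommended}")
```

Here items 0 and 1 are rated similarly by the same users, so liking item 0 leads to a recommendation of item 1.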

Recommended (hah!) textbooks related to Recommender Systems:

 Aggarwal, Charu C. “Recommender Systems: The Textbook.” ISBN 978-3-319-29657-9. Springer. 2016.
 Falk, Kim. “Practical Recommender Systems.” ISBN 9781617292705. Manning Publications. 2019.


Session 7: Team Project Presentations


o Synchronous team project presentations


OTHER RESOURCES
I have created a list of notebooks and datasets that students may find helpful:

 github.com/stepthom/869_course

I also curate a list of ML books, articles, blogs, data sets, tools, and more:

 github.com/stepthom/data_mining_resources

Stats, Data Science, and Python MOOCs


Looking to brush up on some math, stats, and Python skills? Have 1 or 2 hours to spare? Check out these
(optional) courses on LinkedIn Learning. Don’t be overwhelmed: just pick one and go.

 Statistics Foundations: 1, 2, and 3 by Eddie Davila


o Center of data; data variability; distributions; probability; permutations/combinations
o Sampling; Confidence intervals; hypothesis testing
o Comparing populations; chi-square; ANOVA; Regression
 Introduction to Data Science, by Lavanya Vijayan and Madecraft
o Tabular data, EDA, data cleaning, data visualization, inference
 Applied Machine Learning: Foundations, by Derek Jedamski
o EDA and Data Cleaning, Measuring Success, Optimizing, End-to-end
 Python Statistics Essential Training, by Michele Vallisneri
o Anaconda setup, importing/cleaning data, visualizing and describing data, statistical
inference, intro to statistical modelling
 Learning Python, by Joe Marini
o Installing Python, Python basics, working with dates and times, working with files,
working with web data
 Python Quick Start, by Lavanya Vijayan and Madecraft
o Defining Python, data, functions, sequences, conditional statements, iteration,
recursion, object-oriented programming
 Python Essential Training, by Bill Weinman
o Installation, Language overview, types and values, conditionals, operators, loops,
functions, structured data, classes, exceptions, string objects, file input/output, built-in
functions, modules, databases
 Advanced Python, by Joe Marini
o Python language features, built-in functions, advanced python functions, collections,
classes and objects, logging, comprehensions
 Data Science Foundations: Python Scientific Stack, by Miki Tebeka
o Setup, Jupyter Notebooks, Numpy Basics, Pandas, Conda, Folium and Geo, NY Taxi Data,
scikit-learn, plotting, other packages

Data Cleaning
 Data School


o Encoding with OneHotEncoder or OrdinalEncoder
o Missing value imputation for continuous features
o Impute missing values using KNNImputer or IterativeImputer
o Two ways to impute missing values for a categorical feature
o Pipelines
 Data Cleaning in Python: the Ultimate Guide (Towards Data Science, 2020)
 Data Cleaning in Python (Udemy)

Dimensionality Reduction
 StatQuest: PCA, Step-by-Step
 DataCamp: Principal Component Analysis (PCA) in Python
 IEEE: An Analysis of Dimensionality Reduction Techniques on Big Data
 scikit-learn: Decomposing signals in components


COURSE POLICIES

Late Work
There will be a 5% penalty per day for late work.

Extensions
Deadline extensions will only be given for extenuating circumstances.

Rounding
I round marks to the nearest whole number, with halves rounding up. For example, 89.49999 gets
rounded to 89; 89.50000 gets rounded to 90.
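For the curious, this rounding rule can be expressed in a line of Python (an illustration only; note that Python's built-in round() behaves differently, breaking ties to the nearest even integer):

```python
import math

def round_half_up(x: float) -> int:
    # Round to the nearest whole number, with .5 always rounding up,
    # matching the policy above. Python's built-in round() instead uses
    # banker's rounding: round(88.5) == 88, but round_half_up(88.5) == 89.
    return math.floor(x + 0.5)

print(round_half_up(89.49999), round_half_up(89.5))  # 89 90
```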

Appealing Marks
You may appeal a mark if you believe an error has occurred. To appeal, please do the following.

1. Wait until at least THREE DAYS AFTER you receive your mark to submit your appeal. (Cool down
period.)
2. Write a brief email/memo as follows:
a. Clearly state that you want to appeal a mark.
b. For each question/portion you would like to appeal, describe what you believe the error
was and how you recommend it should have been marked.
3. You must submit your appeal within ONE WEEK of receiving your initial mark.

Please note that if I re-mark your assignment, I reserve the right to decrease your mark if I feel that the
initial mark was too high.

PRIVACY
All students are expected to respect the privacy of others. Capturing still and moving images and audio
of other individuals without their express consent is a violation of privacy. Similarly, recordings and
images taken of teaching materials and/or the instructor without permission are a violation of privacy
and fail to recognize that instructors have intellectual property rights to their own teaching materials.
The teaching and learning environment should be a safe space in which all participants are expected to
uphold the values of respect, dignity, and trust.

To learn more, please refer to the following:

Taking and Using Images with Consent | Records Management and Privacy Office (queensu.ca)

Privacy and Remote Teaching and Learning | Records Management and Privacy Office (queensu.ca)


ACADEMIC INTEGRITY

Definition of Academic Integrity


Queen's students, faculty, administrators, and staff all have responsibilities for supporting and upholding
the fundamental values of academic integrity. Academic integrity is constituted by the five core
fundamental values of honesty, trust, fairness, respect, and responsibility, and by the quality of courage
(see www.academicintegrity.org). These values and qualities are central to the building, nurturing, and
sustaining of an academic community in which all members of the community will thrive. Adherence to
the values expressed through academic integrity forms a foundation for the "freedom of inquiry and
exchange of ideas" essential to the intellectual life of the University.

Students are responsible for familiarizing themselves with, and adhering to, the regulations concerning
academic integrity. General information on academic integrity is available at Academic Integrity @
Queen's University; an overview of Smith's own policies and procedures is also important to review.
You may also find these frequently asked questions on academic integrity helpful for your understanding
of the concept and the regulations surrounding it. Departures from academic integrity include, but are
not limited to, plagiarism, use of unauthorized materials, facilitation, forgery, and falsification. Actions
which contravene the academic integrity regulations carry sanctions that can range from a warning, to
loss of grades on an assignment, to failure of a course, to requirement to withdraw from the university.

Individual Work
I will clearly indicate when students can consult with one another or with experts or resources.
Otherwise, you are required to develop an original response to the assigned topic. Assignments and
examinations identified as individual in nature must be the result of the student's individual effort.
Individuals must not look at, access or discuss any aspect of anyone else's solution (including a student
from a previous year), nor allow anyone else to look at any aspect of their own solution. Likewise,
students are prohibited from utilizing the internet or any other means to access others' solutions to, or
discussions of, the assigned material. If the assignment requires outside research, all sources must be
properly cited and referenced; be careful to cite all material, not only of direct quotations but also of
ideas. Help for citing sources is available through the Queen's University library:
http://library.queensu.ca/help-services/citing-sources.

Group Work
I will clearly indicate when groups may consult with one another or with other experts or resources.
Otherwise, in a group assignment, the group members will work together to develop an original,
consultative response to the assigned topic. Group members must not look at, access or discuss any
aspect of any other group's solution (including a group from a previous year), nor allow anyone outside
of the group to look at any aspect of the group's solution. Likewise, you are prohibited from utilizing the
internet or any other means to access others' solutions to, or discussions of, the assigned material. If the
assignment requires outside research, all sources must be properly cited and referenced; be careful to
cite all material, not only direct quotations but also ideas. Help for citing sources is available
through the Queen's University library: http://library.queensu.ca/help-services/citing-sources. The
names of each group member must appear on the submitted assignment, and no one other than the
people whose names appear on the assignment may have contributed in any way to the submitted
solution. In short, the group assignments must be the work of your group, and your group only. All
group members are responsible for ensuring the academic integrity of the work that the group submits.


Turnitin.com
When assignments are submitted through the dropbox on the course website, they will be processed
through turnitin.com. Turnitin is a plagiarism detection tool that checks your submission against other
texts, including websites, journal articles, books, and other student submissions, in order to verify the
originality of the submitted work.
