
Course Outline - Machine Learning - Spring 2019

Prof. Cynthia Rudin


TTH - 08:30AM to 09:45AM in LSRC B101 (244)

Sakai Site: COMPSCI.571D.01D.S18


Piazza sign up link: https://piazza.com/duke/spring2019/compsci671d01dsp19/home

Course Description: This is an introductory overview course at an advanced level. Covers
standard techniques, such as the perceptron algorithm, decision trees, random forests, boosting,
support vector machines and reproducing kernel Hilbert spaces, regression, K-means, Gaussian
mixture models and EM, neural networks, and multi-armed bandits. Covers introductory
statistical learning theory.

Teaching Assistants:

PhD Students:
Harsh Parikh harsh.parikh@duke.edu
Zhi Chen zhi.chen1@duke.edu
Marco Morucci marco.morucci@duke.edu
Arnab Kar arnab.kar@duke.edu

Undergraduates:
Sachit Menon sachit.menon@duke.edu
Nikhil Ravi nikhil.v.ravi@duke.edu
Webster Bei yijie.bei@duke.edu

Discussion Times and Locations:


M - 08:30AM to 09:45AM in French Science 2237
M - 10:05AM to 11:20AM in Old Chemistry 003
M - 11:45AM to 01:00PM in LSRC A156
M - 11:45AM to 01:00PM in Biological Sciences 155
M - 01:25PM to 02:40PM in French Science 2237
M - 03:05PM to 04:20PM in Sociology Psychology 129
Discussions and lectures are optional; attendance will not be taken.

Office hours for TAs:


Venue: Student Lounge outside LSRC D301
------------
PhD students:
Monday 5:00 PM to 6:00 PM (Marco, Zhi and Harsh will rotate)
Arnab: Tuesday 5:00 PM to 6:00 PM, Thursday 4:00 PM to 5:00 PM, Friday 5:00 PM to 6:00 PM
UG students:
Nikhil: Friday 11 AM to 12 Noon
Sachit: Friday 1:30 PM to 2:30 PM
Webster: Friday 4 PM to 5 PM

Prof. Rudin’s office hour is just after class. Just come up after the lecture and ask your
question.

This is an introductory, applied machine learning course at an advanced level. It is an overview
course, meaning we will cover a lot of material at a rapid pace (breadth, at the expense of depth).

This is graduate machine learning. It assumes fluency with basic skills such as linear algebra,
analysis (including proofs), probability (advanced), and programming. Many of you who
registered do not have the background necessary to understand this material, which requires a
level of fluency in mathematics and computer science. If you do not enjoy proofs, you should
not take this course. If you do not have the necessary background, you won’t get much out of
taking the course now. In that case you are better off taking the course after you have the
prerequisites; the material will be much more useful if you can understand it! You could also
take a different data science course offered at the university.

You may use any programming language for this course (though we can’t realistically help you
troubleshoot anything other than MATLAB, Python, or R). I would recommend Python to those who
can’t decide. MATLAB is expensive outside academia, and for R, several important machine
learning packages are not well maintained and can be difficult to troubleshoot. If the TAs can’t
understand your code, you will lose points, so comment it carefully. Please type your homework
(in a word processor or LaTeX).

Grade: 60% homework, 40% tests (Duke electives must have at least 40% of grade from tests.)
You can do either the last homework or the Kaggle competition but not both (well, you could do
both; we just won’t grade both). The last homework / Kaggle will count for twice as much as each
of the other homework assignments.
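As a worked illustration of these weights (a sketch only, not an official grade calculator): with five homework assignments where the last homework or Kaggle counts double, HW 1 to HW 4 each carry 1/6 of the 60% homework share and the last one carries 2/6. The helper name `final_grade` is hypothetical, and the code assumes scores are percentages and that the two tests are weighted equally within the 40% test share, which this syllabus does not state.

```python
from typing import Sequence

# Hypothetical helper illustrating the grading weights described in the syllabus.
# Assumptions not stated in the syllabus: scores are percentages in [0, 100],
# and the two tests are weighted equally within the 40% test share.
HW_WEIGHTS = [1, 1, 1, 1, 2]  # HW 1-4 count once; HW 5 or Kaggle counts twice


def final_grade(homework: Sequence[float], tests: Sequence[float]) -> float:
    """Return a weighted course grade on a 0-100 scale.

    homework: scores for HW 1-5 in order (last entry = HW 5 or Kaggle report).
    tests:    scores for Test 1 and Exam 2.
    """
    if len(homework) != len(HW_WEIGHTS) or len(tests) != 2:
        raise ValueError("expected 5 homework scores and 2 test scores")
    # Weighted homework average: the last assignment gets 2/6 of the homework share.
    hw_avg = sum(w * s for w, s in zip(HW_WEIGHTS, homework)) / sum(HW_WEIGHTS)
    # Equal-weight test average (an assumption, see above).
    test_avg = sum(tests) / len(tests)
    return 0.6 * hw_avg + 0.4 * test_avg


if __name__ == "__main__":
    # Perfect homework and 50% on both tests: 0.6 * 100 + 0.4 * 50, i.e. about 80.
    print(final_grade([100, 100, 100, 100, 100], [50, 50]))
```

Under these assumptions, a perfect score on only the last homework or Kaggle is worth 0.6 × 2/6 = 20 points of the final grade, the same as a perfect score on two of the other homework assignments combined.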

General topics:
- Basic machine learning evaluation techniques, including ROC curves and cross validation
- Top 10 algorithms in data mining (including optimization and ensemble methods)
- Statistical learning theory
- Introductory online learning - mistake bounds, multi-armed bandits
- Bayesian Methods in ML (Gaussian mixture models, Bayesian and frequentist interpretations)

Note that the lecture/topic schedule is subject to change at any point throughout the semester. The
exam dates will not change.

Lecture 1: January 10
Overview
Perceptron Algorithm
Perceptron convergence proof
Start Winnow algorithm
HW 1 assigned

Lecture 2: January 15
Winnow algorithm
Knowledge Discovery in Databases
ROC Curves, AUC/AUROC

Lecture 3: January 17
Concepts of Supervised Learning (regularized loss minimization)
More ROC Curves
Cross-validation

Lecture 4: January 22
Decision Trees
Information and Entropy
HW 1 due (extension until Jan 24)
HW 2 assigned

Lecture 5: January 24
Decision trees continued

Lecture 6: January 29
Random Forests (Decision Forests)
Variable importance
Boosting
Coordinate descent view and convergence rates

Lecture 7: January 31
Boosting continued
Logistic regression and Boosting, probabilistic interpretation
HW 2 due
HW 3 assigned

Lecture 8: February 5
Logistic regression - Continued from last time
Convexity

Lecture 9: February 7
Convexity - Continued from last time
SVM

Lecture 10: February 12
Test 1 – Covering material through Feb 5 (including logistic regression but not convexity, if the
schedule stays as it is now; I’ll announce the exact coverage closer to the time)

Lecture 11: February 14
SVM continued
Kernels

Lecture 12: February 19
Kernels, continued
Reproducing Kernel Hilbert Spaces
Representer Theorem
HW 3 due
HW 4 assigned

Lecture 13: February 21
Statistical Learning Theory

Lecture 14: February 26
Statistical Learning Theory, Margin theory

Lecture 15: February 28
Ridge Regression: Frequentist and Generative interpretations, PCA
HW 4 due
HW 5 assigned
Kaggle Competition Available
(Note: you can choose to be graded on either HW 5 or the Kaggle competition but not both.)

Lecture 16: March 5
Regularized logistic regression
Kernel Ridge Regression
Kernel Regression
K-means
Hierarchical clustering
Gaussian Mixture Models (start this topic)

Lecture 17: March 7
No class

Spring break

Lecture 18: March 19
EM and Gaussian Mixture Models

Lecture 19: March 21
Neural networks (NNs)
Backpropagation

Lecture 20: March 26
Multi-armed bandits
- Epsilon-greedy, regret bound for epsilon-greedy, UCB

Lecture 21: March 28
More bandits

Lecture 22: April 2
Word2Vec, Naïve Bayes

Lecture 23: April 4
Extra lecture in case we get behind
HW 5 due
HW 5 due

Lecture 24: April 9
Causal Inference, Cynthia’s own view of ML, CORELS algorithm

No class April 11 in order to study for Exam 2

Lecture 25: April 16
Exam 2
April 23: Kaggle last day, reports due

There are 5 homework assignments and 2 tests. Homework is due at 10 pm on its due date and is
submitted on Sakai. Please be kind to the TAs and organize your homework well!

Reference Materials
We will not be following a single book or source for the course material. I often provide
references for each topic separately. There is a list of optional references below. You are not
required to read or purchase any of these references. All of the material taught in this course is
standard, and there are many sets of lecture notes on these topics posted throughout the
Internet.

- The Elements of Statistical Learning – Hastie, Tibshirani, Friedman
- Machine Learning – Tom Mitchell
- Understanding Machine Learning: From Theory to Algorithms – Shalev-Shwartz, Ben-David

Absences

With >150 students, scheduling can be a formidable challenge! Please be very careful not to make
your schedule a problem for the TA team!

We do not accept late homework. Even if you are sick, you are still expected to hand in
homework on time – with the exception of serious illnesses.
If you must miss a test or exam due to an excused absence or illness, you must inform Prof.
Rudin beforehand, obtain her permission, and fill out a STINF if relevant (see
https://trinity.duke.edu/undergraduate/academic-policies/illness). Please do not email the TAs if
you need to be absent. It is unlikely that Prof. Rudin will grant permission for non-emergencies
such as job interviews.

Undergraduate students who miss a test due to a scheduled varsity trip should fill out an online
NOVAP (https://trinity.duke.edu/undergraduate/academic-policies/athletic-varsity-participation).
If you are faced with a personal or family emergency or a long-term or chronic health condition
that interferes with your ability to attend or complete classes, you should contact your academic
dean’s office. In the past we have been very accommodating to students with chronic health
conditions or other severe emergencies. At the same time, we do not have the capacity to handle
most other issues that arise, so be prepared that many requests for absence will be denied.

We have many students who want to skip our tests and exam due to job interviews. This is
unprofessional. Job interviews should be scheduled around tests. As a student with integrity, your
first priority is your classes, and any future employer should understand this, given that you are
a student at a prestigious university, and they should be willing to accommodate your schedule.
We will not give any credit/makeup for tests missed because of a job interview. The only
exception to this rule is a second- or third-round interview for which the student made a (sincere,
clearly recorded in writing) attempt to change the schedule. Even in that case, the student should
request permission from us before confirming the interview schedule, to be sure that the
interview counts as an excused absence.

If you are absent from lecture and wish to get lecture notes, please get the notes from a friend or
from the course Sakai site. As usual, it is not the responsibility of the staff to provide you with
lecture notes if you skip lecture. We welcome students to post their notes from lecture on Sakai to
help others who didn’t catch a concept the first time around, or who were not able to attend class
that day.

Cheating
There is no collaboration on homework or tests. You must do your own work; we feel this is the
best way to learn. If you get stuck, you are welcome to discuss conceptual issues with your
classmates or the TAs. Do not ask another classmate to write code for you or to show you the
answer to the problem.

We have had serious problems with cheating in the past, so we are very harsh on cheating in
order for the course to be enjoyable to the students who do not cheat. Students who suspect
cheating by others should definitely contact a member of the teaching team.

Duke University is a community dedicated to scholarship, leadership, and service and to the
principles of honesty, fairness, respect, and accountability. Citizens of this community commit to
reflect upon and uphold these principles in all academic and non-academic endeavors, and to
protect and promote a culture of integrity. Cheating on exams and quizzes, plagiarism on
homework assignments and projects, lying about an illness or absence and other forms of
academic dishonesty are a breach of trust with classmates and faculty, violate the Duke
Community Standard, and will not be tolerated. Such incidents will result in a 0 grade for all
parties involved as well as being reported to the University Judicial Board. Additionally, there
may be penalties to your final class grade. Please review Duke's Standards of Conduct.

Course Policies
No laptops in class. They are very disruptive to other students. They are even more disruptive to
instructors. Laptops are only permitted during the discussion times with the TAs.
Use of cell phones underneath the desk is permitted if done in a way that is respectful.
Food is permitted. (Snacks/gum are encouraged since they might help you focus. Just don’t make a
mess in the classroom since we don’t want to get yelled at by Facilities.) Do not come late or
leave early. If you need to do so, please sit near the door so as not to disrupt others.

Disabilities
Students with disabilities who believe they may need accommodations in this class are
encouraged to contact the Student Disability Access Office at (919) 668-1267 as soon as possible
to better ensure that such accommodations can be made.

Communication
We will make heavy use of the Piazza discussion forum. Everyone who is registered for the
course should already have access to the Sakai page, from which you can reach the Piazza forum.
Please post technical questions to the Piazza forum rather than sending them by email. This will
allow other students to see what questions have been asked already and to view posted answers. If
you think you know the answer to a question that was posted by someone else, you are
encouraged to try to answer it. If you aren’t sure whether the question should be public, you can
submit it as a private question and the TA can choose whether to make it public later on, but it
would be better for everyone if most of the questions could be public. Do not post the answers to
homework problems on Piazza or anywhere else. This is considered cheating and there will be
consequences. Do not send other people answers to homework by email.
For your protection, the Piazza page is not accessible without logging in, and your conversations
also will not be accessible without logging in.

This syllabus may change throughout the course as I adapt. The latest version will always be
available on Sakai.

You also have the option of creating a blog on the Sakai page if you wish. This could be
particularly useful for your own reference, and for others as well.

There are over 150 students registered for this course, and only a few staff members, so please be
courteous in preserving the time of the staff! If you have a question that you can’t figure out
despite trying hard to answer it yourself:
1) Look at Piazza to see if someone has already answered it.
2) Ask your friend in the class.
3) Ask at office hours.
4) If it’s a conceptual question rather than a specific question about a homework problem,
ask it at the discussion section meeting.
5) Post it on Piazza. Do not ask very specific questions about the homework that
would force others to reveal the answer.
6) If you have a question about grading, it needs to be addressed to the grader who actually
did the grading for that assignment. Your grade could go either up or down if you ask for
a regrade. Approach the TA respectfully (do not be demanding or difficult). I have asked
the TAs not to respond to disrespectful communication. Regrade requests
must be made within 1 week of when the graded homework is returned
to you.
7) Ask Prof. Rudin after class. The best questions to ask Prof. Rudin are questions about the
lecture, or questions you haven’t been able to get the answer to anywhere else. Other
questions go to the other instructors. Thanks!
