
FE590

Introduction to Knowledge Engineering

Instructor: David Starer
email: dstarer@stevens.edu
Office: Babbio 546
Office Hours: By appointment only

Course Goals

This course provides an applied overview of classical linear approaches to statistical learning
and also introduces modern statistical methods. The classical approaches will include
logistic regression, linear discriminant analysis, k-means clustering, and nearest neighbors,
while the more modern approaches will include generalized additive models, decision trees,
boosting, bagging, support vector machines, and others.

Prerequisites

Undergraduate mathematics.
Familiarity with basic probability and statistics.
Knowledge of R, or eagerness to learn it.

Textbook

The required textbook for this course is:


Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani
An Introduction to Statistical Learning with Applications in R. Springer, 2014.
This book is available free from the first author's website at
http://www-bcf.usc.edu/~gareth/ISL/ISLR%20Fourth%20Printing.pdf

Assessment

The precise scoring rubric will be finalized during the semester. However, as a rough guide,
grades will probably be based on a combination of quizzes, homework, and exams as follows:
1. Quizzes. There will be four 30-minute in-class Moodle multiple-choice quizzes.
2. Exams. There will be a 3-hour in-class mid-term exam, and a 3-hour final exam.
3. Homework. There will be four homework assignments in which students will write
programs to solve problems related to statistical learning.
The quizzes are designed to keep students up to date with the material and to pinpoint
any comprehension difficulties before they become major problems. The exams are designed
to test students' understanding of statistical learning. The homework is designed to test
students' ability to put statistical learning into practice.
Weights.
In computing the course grade, each activity will typically be assigned a weight as follows:
Quizzes          approximately 20%
Homework         approximately 40%
Mid-term Exam    approximately 20%
Final Exam       approximately 20%
Total                          100%

Topics to be Covered

The following topics from the textbook will be covered, although they are subject to change.
1. Statistical Learning
Why Estimate f
How to Estimate f
Supervised Versus Unsupervised Learning
Regression Versus Classification Problems
2. Assessing Model Accuracy
3. Linear Regression
Simple Linear Regression
Multiple Linear Regression
K Nearest Neighbors
4. Classification
Logistic Regression
Linear Discriminant Analysis
Quadratic Discriminant Analysis
5. Resampling Methods
Bootstrap
Cross-Validation
6. Linear Model Selection and Regularization
Shrinkage Methods
Dimension Reduction Methods
Ridge Regression
Partial Least Squares
7. Moving Beyond Linearity
Polynomial Regression
Step Functions
Basis Functions
Regression Splines
Smoothing Splines
Generalized Additive Models
8. Decision Trees
Regression Trees
Classification Trees
Random Forests
Bagging
Boosting
9. Support Vector Machines
Maximal Margin Classifier
Support Vector Classifiers
10. Unsupervised Learning
Principal Component Analysis
Clustering Methods