
Data Mining/Machine Learning

Instructor: Chris Ding


This course covers useful data mining techniques.

It also teaches a way of looking at these new techniques.

Machine learning/data mining is a simple subject.

The main purpose is to solve real problems. It is not to discover great new fundamental theories of anything!

The course uses best practices I encountered when I studied or worked at Columbia University, California Institute of Technology, UC Berkeley, and the University of Texas.

It also draws on my 20+ years of research experience on the subject.


(You can learn more about the instructor on WeChat.)
This course has

Two exams, a midterm and a final. Each exam covers half of the material, so
the final exam is as important as the midterm. Each counts for 35% of the final grade.

Many homework assignments, 10%. The TA will grade only half of them. Exam problems
are often modified homework problems.

Programming homework and project, 20%. This is required. Not doing the
project means failing the course, even if you get 100% correct on the exams.

The final grade is based on (1) the student's mastery of the material and
(2) comparison with other students.
For this reason, a make-up exam cannot be done properly, because it is almost
impossible to match two sets of exam problems.
We allow a make-up exam only for medical conditions, with a doctor's letter
showing that the student had a medical condition on the day of the exam and also
showing prior occurrences of the same or a similar condition.
Data Mining: extract knowledge from data
Data Mining / Machine Learning

Fit/merge existing data to a model.

Use the model to analyze new data.
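As a rough illustration of the two steps above, here is a minimal sketch that fits a straight line to existing data and then applies the fitted model to a new input. The helper names `fit_line` and `predict`, and the data values, are hypothetical and chosen only for this example; a least-squares line is just one possible model.

```python
def fit_line(xs, ys):
    """Step 1: fit y = w*x + b to existing data by least squares; returns (w, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); intercept follows from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    w = sxy / sxx
    b = mean_y - w * mean_x
    return w, b


def predict(model, x):
    """Step 2: use the fitted model (w, b) to analyze a new data point."""
    w, b = model
    return w * x + b


# Hypothetical existing data, made up for illustration (it lies on y = 2x + 1).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

model = fit_line(xs, ys)
new_prediction = predict(model, 5.0)
print(model, new_prediction)
```

The same fit-then-predict pattern applies to the other methods in this course; only the model and the fitting procedure change.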
The most important algorithm is k-nearest neighbors.
It was introduced in 1951.

The most widely used data clustering method is k-means clustering. It was invented in 1956.

The most widely used data transformation is Principal Component Analysis (PCA). It was invented in 1901.

The No. 1 learning method for parameter estimation is maximum likelihood estimation (MLE). It was
developed by R. A. Fisher between 1912 and 1922.

Linear regression was invented about 200 years ago. It
is still one of the most trusted methods in machine learning.
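To make the k-nearest neighbors idea from the list above concrete, here is a minimal sketch: a new point receives the majority label of its k closest training points. The function name `knn_predict`, the choice of Euclidean distance, and the toy data are assumptions made for illustration only.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Order training indices by Euclidean distance to the query point.
    order = sorted(
        range(len(train_points)),
        key=lambda i: math.dist(train_points[i], query),
    )
    # Count the labels of the k closest points and return the most common one.
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Hypothetical 2-D training data with two well-separated classes.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels = ["a", "a", "a", "b", "b", "b"]

near_a = knn_predict(points, labels, (1.1, 1.0))
near_b = knn_predict(points, labels, (5.1, 5.0))
print(near_a, near_b)
```

Note that there is no separate training step here: the "model" is simply the stored data, which is part of why the method is so easy to apply.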
3 Examples of Data Mining

• Market basket data

• Handwritten digit recognition
• Cancer detection from gene expression data
Market Basket Data Analysis
Market Basket Data Analysis: find associations
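One simple way to look for associations in market basket data is to count how often each pair of items is bought together. The sketch below does only that counting step; the function name `pair_support` and the baskets are hypothetical, and full association-rule mining involves more than pair counts.

```python
from collections import Counter
from itertools import combinations


def pair_support(baskets):
    """For each unordered item pair, count the baskets that contain both items."""
    counts = Counter()
    for basket in baskets:
        # Deduplicate and sort so each pair has one canonical (a, b) form.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts


# Hypothetical transactions, made up for illustration.
baskets = [
    ["bread", "milk"],
    ["bread", "diapers", "beer"],
    ["milk", "diapers", "beer"],
    ["bread", "milk", "diapers", "beer"],
]

support = pair_support(baskets)
top_pair, top_count = support.most_common(1)[0]
print(top_pair, top_count)
```

Pairs with a high count relative to the number of baskets are candidates for an association worth examining further.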
DNA Gene Expressions to detect cancer

Expression Microarray

Recognize handwritten digits and letters.
All letters mailed through the post office are machine-scanned to automatically
determine the destination ZIP code.
What does the word “learning” mean?

• Like a child learning something from examples

• Previously: a student / teacher setting
• Now: learning a model, i.e., learning the model parameters.
Why is machine learning part of AI?

• Most machine learning algorithms were invented to solve
concrete problems, unrelated to AI.

• But they use ideas and heuristics that are similar to human
behavior and human reasoning.

• The term “supervised learning” comes from AI.

• AI researchers invented several algorithms that turned out to
be not very good and are mostly forgotten.
