
Machine Learning (IN2064)

Lecture 1 - Introduction

Prof. Dr. Stephan Günnemann

Data Mining and Analytics

kdd.in.tum.de

15.10.2019
Register on Piazza

or at least write down the URL to register later

https://piazza.com/tum.de/fall2019/in2064

Data Mining
Introduction to Machine Learning 1 and Analytics
Self-driving cars and robotics

Game playing

Natural language processing

Web, ads and recommendations

Data science

Physics, biology and medicine

What unites all these technologies?

• Computer vision
• Natural language processing
• Recommender systems
• Computational advertising
• Robotics
• Artificial intelligence
• Data science
• Bioinformatics
• Many other fields

All are using machine learning.

Hot topic
[Figure: number of participants in IN2064 per year, 2011 to 2019; vertical axis from 0 to 1000]

What is Machine Learning?

Simple example - classify transactions into legitimate and fraudulent.

Rule-based approaches - rules handcrafted by human experts

Machine learning - learning from data

Figures adapted from https://siftscience.com/sift-edu/prevent-fraud
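The contrast between the two approaches can be sketched in a few lines of Python. This is only an illustration, not the method from the cited figures: the transaction data, the 1000 EUR rule, and the helper names (`rule_based`, `learn_threshold`, `accuracy`) are all made up for this example. The "learning" step simply picks the amount threshold that makes the fewest mistakes on the labeled data.

```python
# Toy transactions: (amount in EUR, is_fraud). Hypothetical data for illustration only.
transactions = [
    (12.0, False), (48.5, False), (230.0, False), (310.0, False),
    (650.0, True), (720.0, False), (990.0, True), (1500.0, True),
]


def rule_based(amount):
    """Handcrafted rule: an expert guesses that amounts above 1000 are fraudulent."""
    return amount > 1000.0


def learn_threshold(data):
    """Learn from data: pick the threshold with the fewest errors on the labeled examples."""
    amounts = sorted(a for a, _ in data)
    # Candidate thresholds are the midpoints between consecutive amounts.
    candidates = [(lo + hi) / 2 for lo, hi in zip(amounts, amounts[1:])]

    def errors(t):
        return sum((a > t) != y for a, y in data)

    return min(candidates, key=errors)


def accuracy(predict, data):
    """Fraction of transactions for which the prediction matches the label."""
    return sum(predict(a) == y for a, y in data) / len(data)


threshold = learn_threshold(transactions)


def learned(amount):
    return amount > threshold


print("rule-based accuracy:", accuracy(rule_based, transactions))
print("learned threshold:  ", threshold)
print("learned accuracy:   ", accuracy(learned, transactions))
```

Note that both accuracies are measured on the same data that was used to choose the threshold, so the learned rule looks better than it may be on new transactions.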


Types of ML problems

Supervised learning

• Given training samples $X_{\text{train}} = \{x_1, \dots, x_N\} \subseteq \mathcal{X}$
  with corresponding targets $y_{\text{train}} = \{y_1, \dots, y_N\} \subseteq \mathcal{Y}$

• Find a function $f$ that generalizes this relationship, i.e. $f(x_i) \approx y_i$.

• Using $f$, make predictions $\hat{y}_{\text{test}}$ for the test data $X_{\text{test}}$.

Training data $X_{\text{train}}, y_{\text{train}}$ → Model $f : \mathcal{X} \to \mathcal{Y}$
Test data $X_{\text{test}}$ → Model $f$ → Predictions $\hat{y}_{\text{test}} = f(X_{\text{test}})$
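The train/predict pipeline above can be sketched in plain Python. As a stand-in for "find a function f", this example uses a 1-nearest-neighbor rule (nearest neighbors appear later in the topic list); the data and the name `fit_1nn` are hypothetical and chosen only for illustration.

```python
def fit_1nn(X_train, y_train):
    """Return a function f that maps x to the target of its closest training sample."""

    def f(x):
        def squared_distance(i):
            return sum((a - b) ** 2 for a, b in zip(X_train[i], x))

        nearest = min(range(len(X_train)), key=squared_distance)
        return y_train[nearest]

    return f


# Made-up 2-D training samples with their targets.
X_train = [(1.0, 1.0), (1.5, 2.0), (5.0, 5.0), (6.0, 5.5)]
y_train = ["a", "a", "b", "b"]

# Test data: no targets are available here, only inputs.
X_test = [(1.2, 1.4), (5.5, 5.0)]

f = fit_1nn(X_train, y_train)          # "training": build f from (X_train, y_train)
y_hat_test = [f(x) for x in X_test]    # predictions for the test data
print(y_hat_test)
```

Any other model could take the place of `fit_1nn`; the shape of the pipeline (fit on training data, then apply f to unseen inputs) stays the same.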

Supervised learning: Classification
If the targets $y_i$ represent categories, the problem is called classification.

Examples
• Handwritten digit recognition
• Transaction classification (fraud, valid)
• Object classification (cat, dog, hotdog, ...)
• Cancer detection

Supervised learning: Regression
If the targets $y_i$ represent continuous numbers, the problem is called regression.

Examples
• Stock market prediction
• Demand forecasting
• User involvement measurement
• Revenue analysis
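A minimal regression sketch, assuming a single input feature and a straight-line model fitted by ordinary least squares (linear regression is listed as a later topic). The data points and the function name `fit_line` are made up for illustration.

```python
def fit_line(xs, ys):
    """Least-squares fit of y ≈ w * x + b for one-dimensional inputs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Closed-form solution: slope = covariance(x, y) / variance(x).
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    w = s_xy / s_xx
    b = mean_y - w * mean_x
    return w, b


# Hypothetical training data: continuous targets, roughly y = 2x + 1 plus noise.
x_train = [1.0, 2.0, 3.0, 4.0]
y_train = [3.1, 4.9, 7.2, 8.8]

w, b = fit_line(x_train, y_train)


def f(x):
    return w * x + b


print("slope:", w, "intercept:", b)
print("prediction for x = 5:", f(5.0))
```

Unlike in classification, the prediction is a real number, so an unseen input such as x = 5 gets a value that never appeared among the training targets.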

Unsupervised learning
Unsupervised learning is concerned with finding structure in unlabeled
data.

Typical tasks
• Clustering
- Group similar objects together
• Dimensionality reduction
- Project down high-dimensional data
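As a sketch of the clustering task, the following plain-Python example groups unlabeled points with k-means (Lloyd's algorithm). This is one possible clustering method, not necessarily the one used in the course; the data, the naive initialization with the first k points, and the name `kmeans` are assumptions made for this illustration.

```python
def kmeans(points, k, n_iters=10):
    """Lloyd's algorithm with a naive initialization (the first k points)."""
    centers = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(n_iters):
        # Assignment step: each point goes to its closest center.
        assignment = [
            min(range(k),
                key=lambda j: sum((a - c) ** 2 for a, c in zip(p, centers[j])))
            for p in points
        ]
        # Update step: each center moves to the mean of its assigned points.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers, assignment


# Unlabeled, made-up 2-D data with two visible groups.
points = [(1.0, 1.0), (5.0, 5.0), (1.2, 0.8), (5.2, 4.8), (0.8, 1.2), (4.8, 5.2)]

centers, assignment = kmeans(points, k=2)
print("cluster centers:", centers)
print("assignments:    ", assignment)
```

No targets are given anywhere: the grouping is found from the positions of the points alone, which is what distinguishes this from the supervised setting.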

Unsupervised learning
Typical tasks - continued
• Generative modeling
- (Controllably) generate new “realistic” data
• Topic models
- Discover hidden semantic structures in text.

Other categories
• Reinforcement learning
- Learning by interacting with a dynamic environment. The goal is to maximize rewards obtained by performing “desirable” actions.

• Semi-supervised learning
- Learning to combine lots of unlabeled data with a few labeled examples for further prediction tasks.

• Active learning
- Learn while obtaining labels by querying an oracle.

• Learning to learn (meta-learning)
- Learning to construct better models for ML. Operates one level above the standard ML techniques.

• Learning to rank
- What are the relevant items for a given query? (e.g. Netflix, web search, ad placement)

• And many more...


A little bit of history

Source: https://beamandrew.github.io/deeplearning/2017/02/23/deep_learning_101_part1.html

General information
Staff
• Lecturer: Prof. Dr. Stephan Günnemann

• Teaching assistants:
Aleksandar Bojchevski, Johannes Klicpera, Anna Kopetzki, Aleksei Kuvshinov, Oleksandr Shchur

Details
• 8 ECTS

• Language - English

• Doesn’t count for Wirtschaftsinformatik (Information Systems) students

• Doesn’t stack with IN2332 in your curriculum

Piazza

• Register at https://piazza.com/tum.de/fall2019/in2064

• All announcements and course materials will be uploaded to Piazza.

• You will miss important information if you don’t register.

• You need an @tum.de email address to enroll. Talk to us after the lecture if you are not a TUM student.

Use Piazza to ask questions - your emails will likely not be answered.

Schedule

Class hours
Day  Time         Room          Session
Mon  10:00–11:45  MW 2001       Lecture
Tue  12:15–13:45  MW 0001       Lecture
Wed  16:00–18:00  MI 00.02.001  Tutorial / homework discussion

Optional
We will hold a Q&A session where you can ask your questions in person.

Wed 12:00–14:00 MI 02.11.018 Q&A session

Schedule & Logistics

• Lecture slides and homework assignments for the topic of week N are uploaded on Thursday of week N − 1.

• Homework of week N is due on Sunday of week N at 23:59.

• Homework of week N will be discussed during the Wednesday 16:00 slot of week N + 1.

• Submit homework via Moodle (see sheet 1 for detailed instructions).

• Homework solutions are published to Piazza.

Grading

Exam
• Written final exam in February

• 120 minutes

• Allowed aid: one handwritten, two-sided A4 sheet with notes

• Bonus of 0.3 if you show sufficient work for ≥ 75% of HW sheets.

• Only grades in the range 1.3 - 4.0 can be improved.

Tentative list of topics

1. Introduction, history, basic concepts
2. K-nearest neighbors, decision trees
3. Probabilistic inference
4. Linear regression
5. Linear classification
6. Optimization
7. Support vector machines
8. Kernel methods
9. Deep learning
10. Dimensionality reduction, matrix factorization
11. Mixture models, clustering
12. Advanced topics (fairness, robustness, privacy, ...)

Contents

• This is an introductory, theoretical Machine Learning course
- There will be a fair amount of theory and mathematics
- We will focus on fundamental Machine Learning concepts
- We will mostly discuss independent and identically distributed (i.i.d.) data

• Next semester, we will cover (even) more advanced topics (IN2323)
- Non-i.i.d. data: graphs & temporal data
  → this is the core focus of our research group :-)
- Scalable algorithms
- etc.

What this course is not about

• This is not a pure Deep Learning course
- look at IN2346, IN2349 instead

• This is not a course about Big Data (Hadoop, etc.)
- look at IN2326 instead

• This is not an applied Data Science / Business Analytics course
- look at IN2028, IN2339 instead

Recommended reading

Our official reading recommendation:

• Christopher M. Bishop, Pattern Recognition and Machine Learning. Springer, Berlin, New York, 2006 (free online version available).

but we also like:

• Kevin Murphy, Machine Learning: A Probabilistic Perspective. MIT Press, 2012.

Linear Algebra and Calculus

Brush up on your linear algebra, calculus and probability theory knowledge.

Read

• http://cs229.stanford.edu/section/cs229-linalg.pdf
• http://cs229.stanford.edu/summer2019/cs229-prob.pdf
• Bishop [ch. 1.2.0 - 1.2.3, 2.1 - 2.3.0]

Today & next week: k-NN and Decision Trees

• Tomorrow (16.10) - Math refresher

• Thursday (17.10) - Homework for k-NN & decision trees is published
