
Ibrahim Sabek

Computer and Systems Engineering Department, Faculty of Engineering, Alexandria University, Egypt


Agenda

1. Machine learning overview and applications
2. Supervised vs. Unsupervised learning
3. Generative vs. Discriminative models
4. Overview of Classification
5. The big picture
6. Bayesian inference
7. Summary
8. Feedback


Machine learning overview and applications

**What is Machine Learning (ML)?**

Definition: algorithms for inferring unknowns from knowns.

What do we mean by inferring? How do we get unknowns from knowns?


ML applications

- Spam detection
- Handwriting recognition
- Speech recognition
- Netflix's recommendation system


Classes of ML models

- Supervised vs. Unsupervised
- Generative vs. Discriminative


Supervised vs. Unsupervised learning

Supervised: given (x_1, y_1), (x_2, y_2), ..., (x_n, y_n), choose a function f such that f(x_i) = y_i.

- x_i ∈ R^2: the data points
- y_i: the class/value


- Classification: y_i ∈ {finite set}
- Regression: y_i ∈ R


Unsupervised: given (x_1, x_2, ..., x_n), find patterns in the data.

- x_i ∈ R^2: the data points


- Clustering
- Density estimation
- Dimensionality reduction


**Variations on Supervised and Unsupervised**

Semi-supervised: given (x_1, y_1), (x_2, y_2), ..., (x_k, y_k), x_{k+1}, x_{k+2}, ..., x_n, predict y_{k+1}, y_{k+2}, ..., y_n.

Active learning: the algorithm chooses which unlabeled points x_i to query labels for.

Decision theory: measure the performance of predictions on unlabeled data.

Reinforcement learning:

- Maximize rewards (minimize losses) by taking actions.
- Maximize the overall lifetime reward.


Generative vs. Discriminative models

Given (x_1, y_1), (x_2, y_2), ..., (x_n, y_n), and a new point (x, y):

Discriminative: estimate p(y = 1 | x) and p(y = 0 | x) for y ∈ {0, 1}.

Generative: estimate the joint distribution p(x, y).


Overview of Classification

**k-Nearest Neighbor classification (kNN)**

Given D = {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)} and a new point x, where x_i ∈ R, y_i ∈ {0, 1}:

- Dissimilarity metric: d(x, x_i) = ||x − x_i||_2; for k = 1, predict the label of the single nearest point.
- Probabilistic interpretation: for a fixed k, p(y | x, D) is the fraction of points x_i in N_k(x) with y_i = y, and the prediction is ŷ = argmax_y p(y | x, D).
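As a sketch of the rule above, here is a minimal NumPy version; the toy dataset and the `knn_predict` helper are illustrative, not from the slides:

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    """Majority vote among the k nearest training points,
    using the Euclidean dissimilarity d(x, x_i) = ||x - x_i||_2."""
    dists = np.linalg.norm(X_train - x, axis=1)   # distance to every x_i
    nearest = np.argsort(dists)[:k]               # indices of N_k(x)
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]              # y_hat = argmax_y p(y | x, D)

# Toy 2-D dataset: two well-separated clusters, labels 0 and 1
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [1.0, 1.0], [0.9, 1.1], [1.1, 0.9]])
y = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(X, y, np.array([0.95, 1.0]), k=3))  # -> 1
```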

**Classification trees (CART)**

Given D = {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)} and a new x, where x_i ∈ R, y_i ∈ {0, 1}:

- Build a binary tree.
- Minimize the classification error in each leaf.
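One way to make the "minimize error in each leaf" idea concrete is a one-level tree (a decision stump) fit by exhaustive search over thresholds on a single feature; `best_stump` and its data are hypothetical, not from the slides:

```python
import numpy as np

def best_stump(x, y):
    """Try every threshold t on a 1-D feature; each leaf of the split
    x <= t / x > t predicts its majority label, and we keep the split
    with the fewest misclassified training points."""
    best = None
    for t in np.unique(x):
        left, right = y[x <= t], y[x > t]
        # errors in a leaf = points disagreeing with its majority label
        err = int(sum(min(np.sum(leaf == 0), np.sum(leaf == 1))
                      for leaf in (left, right) if len(leaf)))
        if best is None or err < best[1]:
            best = (float(t), err)
    return best  # (threshold, number of training errors)

x = np.array([1.0, 2.0, 3.0, 6.0, 7.0, 8.0])
y = np.array([0, 0, 0, 1, 1, 1])
print(best_stump(x, y))  # -> (3.0, 0): x <= 3 separates the classes perfectly
```

A full CART applies this search recursively to grow the binary tree.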

**Regression trees (CART)**

Given D = {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)} and a new x, where x_i ∈ R, y_i ∈ R.

**Bootstrap aggregation (Bagging)**

Given D = {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)} drawn i.i.d. from P, and a new x where x_i ∈ R, y_i ∈ R, we need to find its y value.

- Intuition: averaging makes your prediction close to the true label.
- Draw different training datasets, with each (x_k, y_k) following Uniform(D) i.i.d. (bootstrap resampling).
- The final label y is the average of the labels generated from the different datasets.
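A minimal sketch of this averaging scheme, using a deliberately simple base learner (1-nearest-neighbour regression); the dataset and the `bagged_predict` helper are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def bagged_predict(X, y, x, n_models=50):
    """Fit each base model on a bootstrap resample of D (i.i.d. draws
    from Uniform(D), with replacement) and average the predictions."""
    preds = []
    for _ in range(n_models):
        idx = rng.integers(0, len(X), size=len(X))          # bootstrap sample
        Xb, yb = X[idx], y[idx]
        nearest = np.argmin(np.linalg.norm(Xb - x, axis=1)) # 1-NN base model
        preds.append(yb[nearest])
    return np.mean(preds)                                   # average the labels

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0.0, 1.0, 2.0, 3.0])
print(bagged_predict(X, y, np.array([1.4])))  # a smoothed 1-NN prediction
```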

Random forests

Given D = {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)} where x_i ∈ R, y_i ∈ R. For i = 1, ..., B:

- Choose a bootstrap sample D_i from D.
- Construct tree T_i using D_i such that, at each node, a random subset of the features is chosen and only those features are considered for splitting.

Given x, take the majority vote (for classification) or the average (for regression) over T_1, ..., T_B.
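The loop above can be sketched with a stump as each "tree" and a random feature subset of size one at its single node; everything here (dataset, `forest_predict`) is an illustrative toy, not the slides' algorithm verbatim:

```python
import numpy as np

rng = np.random.default_rng(1)

def forest_predict(X, y, x, B=25):
    """For i = 1..B: bootstrap D_i from D, fit a one-node tree that may
    only split on a randomly chosen feature, then majority-vote."""
    votes = []
    for _ in range(B):
        idx = rng.integers(0, len(X), size=len(X))  # bootstrap sample D_i
        Xb, yb = X[idx], y[idx]
        f = rng.integers(X.shape[1])                # random feature subset (size 1)
        t = np.median(Xb[:, f])                     # a crude split point
        leaf = yb[Xb[:, f] <= t] if x[f] <= t else yb[Xb[:, f] > t]
        if len(leaf):
            votes.append(int(round(leaf.mean())))   # the leaf's majority label
    return int(round(np.mean(votes)))               # majority vote over trees

X = np.array([[0.0, 0.1], [0.2, 0.0], [0.1, 0.3],
              [1.0, 0.9], [0.8, 1.0], [1.1, 1.1]])
y = np.array([0, 0, 0, 1, 1, 1])
print(forest_predict(X, y, np.array([0.9, 1.0])))
```

Real random forests grow deep trees and use feature subsets of size roughly sqrt(d); the size-1 subset here just keeps the sketch short.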

The big picture

**The big picture**

Given the expected loss E[L(y, f(x))] and D = {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)} where x_i ∈ R, y_i ∈ R, we want to estimate p(y | x).

- Discriminative: estimate p(y | x) directly using D (kNN, trees, SVM).
- Generative: estimate p(x, y) directly using D, and then p(y | x) = p(x, y) / p(x); we also have p(x, y) = p(x | y) p(y).
- Parameters/latent variables θ: by including parameters, we have p(x, y | θ). For a discrete space, p(y | x, D) = Σ_θ p(y | x, D, θ) p(θ | x, D), where:
  - p(y | x, D, θ) is nice;
  - p(θ | x, D) is nasty (called the posterior distribution of θ);
  - the summation (or integration, in the continuous case) is nasty and often intractable.
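For a small discrete parameter space the sum is tractable and can be computed directly. A sketch with a coin whose heads-probability θ lives on a 5-point grid (the grid, prior, and data are made up for illustration):

```python
import numpy as np

thetas = np.array([0.1, 0.3, 0.5, 0.7, 0.9])     # discrete parameter space
prior  = np.full(len(thetas), 1 / len(thetas))   # uniform prior p(theta)
data   = [1, 1, 0, 1]                            # observed flips D (1 = heads)

# Likelihood p(D | theta), then posterior p(theta | D) by Bayes' rule
likelihood = np.array([np.prod([t if d else 1 - t for d in data]) for t in thetas])
posterior  = prior * likelihood
posterior /= posterior.sum()                     # normalize

# Posterior predictive: p(heads | D) = sum_theta p(heads | theta) p(theta | D)
p_heads = np.sum(thetas * posterior)
print(round(p_heads, 3))  # -> 0.677
```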

Ways to compute p(y | x, D) = Σ_θ p(y | x, D, θ) p(θ | x, D):

- Exact inference: multivariate Gaussian; graphical models.
- Point estimate of θ: Maximum Likelihood Estimation (MLE); Maximum A Posteriori (MAP), θ_est = argmax_θ p(θ | x, D).
- Deterministic approximation: Laplace approximation; variational methods.
- Stochastic approximation: importance sampling; Gibbs sampling.
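The two point estimates can be compared on a grid; the Beta(5, 5)-style prior and the data below are assumptions for illustration:

```python
import numpy as np

thetas = np.linspace(0.01, 0.99, 99)   # grid of coin heads-probabilities
data   = [1, 1, 0, 1]                  # 3 heads, 1 tail

# MLE: maximize the log-likelihood log p(D | theta)
loglik = sum(np.log(thetas if d else 1 - thetas) for d in data)
theta_mle = thetas[np.argmax(loglik)]

# MAP: maximize log p(theta | D) = log-likelihood + log-prior (+ const),
# here with a Beta(5, 5) prior that favours fair coins
logprior = 4 * np.log(thetas) + 4 * np.log(1 - thetas)
theta_map = thetas[np.argmax(loglik + logprior)]

print(round(float(theta_mle), 2), round(float(theta_map), 2))  # -> 0.75 0.58
```

The prior pulls the MAP estimate from the empirical 3/4 toward 1/2, which is exactly the overfitting-avoidance the next slide credits to Bayesian methods.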

Bayesian inference

"Put distributions on everything, and then use the rules of probability to infer values."

Aspects of Bayesian inference:

- Priors: assume a prior distribution p(θ).
- Procedures: minimize the expected loss (averaging over θ).
- Pros: directly answers questions; avoids overfitting.
- Cons: must assume a prior; exact computation can be intractable.

**Directed graphical models**

Also called "Bayesian networks" or "conditional independence diagrams". Why?

- Tractable inference.
- Factorization of the probabilistic model.
- A notational device.
- Visualization for inference algorithms.

Example of thinking graphically about p(a, b, c):

p(a, b, c) = p(c | a, b) p(a, b) = p(c | a, b) p(b | a) p(a)
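This chain-rule factorization can be checked numerically on an arbitrary joint distribution over three binary variables (the random joint below exists only for the check):

```python
import numpy as np

rng = np.random.default_rng(2)

joint = rng.random((2, 2, 2))
joint /= joint.sum()                      # an arbitrary p(a, b, c)

p_a    = joint.sum(axis=(1, 2))           # p(a): marginalize out b, c
p_ab   = joint.sum(axis=2)                # p(a, b)
p_b_a  = p_ab / p_a[:, None]              # p(b | a)
p_c_ab = joint / p_ab[:, :, None]         # p(c | a, b)

# Rebuild p(a, b, c) = p(c | a, b) p(b | a) p(a) and compare
rebuilt = p_a[:, None, None] * p_b_a[:, :, None] * p_c_ab
print(np.allclose(rebuilt, joint))        # -> True
```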

Summary

Machine learning is an essential field for our lives. It is a broad world, and we have only just started exploring it in this session.

Feedback

Your feedback is welcome at alex.acm.org/feedback/machine/
