L05 Machinelearning

Data Science and
Machine Learning
Terminology and Concepts
Marco Brambilla, Emanuele Della Valle - Data Science for Business Innovation
Agenda
• Reality, Abstraction and Modeling
• Artificial Intelligence (AI)
• Machine Learning (ML)
• Supervised, Unsupervised and Reinforced Learning
• Training, Testing, Validation
• Cross-validation
Marco Brambilla, Emanuele Della Valle - Data Science for Business Innovation 2
Abstraction and human mind
• Complexity of reality –> Simplification
• The human mind continuously re-works reality by applying cognitive
processes
• Abstraction: capability of finding the commonality in many different
observations:
• Generalize specific features of real objects (generalization)
• Classify the objects into coherent clusters (classification)
• Aggregate objects into more complex ones (aggregation)
Models
Model: a simplified or partial representation of reality, defined
in order to accomplish a task or to reach an agreement
Model represents Reality
The Role of Models
Data Science
AI MODELS ML
Big Data
Artificial Intelligence
• Many Interpretations
• Artificial General Intelligence (aka. Strong AI or Full AI)
• General intelligent actions
• Discerning problems
• Acting as humans would
• … up to self-consciousness
• Restricted AI (aka. Weak or Applied AI or Narrow AI)
• Implementing intelligence in specific applications or problem solving tasks
• No full cognitive abilities
Machine Learning
“The field of machine learning is concerned with the question of
how to construct computer programs
that automatically improve with experience.”
Tom Mitchell (1997)
• A program that “learns” from experience with respect to some task

• Performance of this learning process is measurable
• Meaning of learning?
• Supervised vs. unsupervised vs. reinforced learning process
Supervised Learning
• Task to be learnt: to extract a description / labeling or pattern from
the data, based on the training
• Training examples labeled by a (human) supervisor
• Use it to predict the output for further examples
• Performance measured as how accurate the output is
• Example applications
• Credit approval
• Medical diagnosis
• Fraud detection
• Text and image labeling or classification
Unsupervised Learning
• Task to be learnt: finding interesting patterns/ groups/ categories in
the data based on evidence
• No pre-labeled data à detection of facts from raw data
• Performance measured as how good / meaningful the groups /
patterns are
• Customer segmentation
• User behavior categorization
• Grouping of items by similarity
Reinforcement Learning
• The machine learns through trial-and-error interactions
• The goal is to maximize the amount of reward received from the
environment
• Iterative process
• Trained through interactions with the environment
• Rewards assigned upon success
• Performance measured as amount of rewards collected
• Robot learning
• Games
Training, Validation, and Test
Split process into: training, validation, and test
Training Validation Testing

Training
Learn the model over the training set
MODEL
Training
Validation
Optimize the predictive capability of the model using the validation set
MODEL
Training Validation
Testing
See how well the model works on the test set (unseen before)
MODEL
Testing
The error gives an unbiased estimate of the predictive power of a model

Split data into three sets: training, validation, and test
Dataset
Training Validation Test

set set set
Split data into three sets: training, validation, and test
Dataset

set set set
70% 15% 15%

Cross-Validation
Repeat the iteration on training + validation multiple times
Dataset
MODEL
set set set
10-fold cross-validation: pick 10 times random subsets as training and

validation, and average the quality of the results
Wrapping up
• Learning is
• An iterative process
• Learns from statistics of past phenomena / known data
• Understand future data
Overfitting
• Learns «too much» from
existing data
• Does not generalize well
on new data
Overfitting
• Learns «too much» from Training Data
existing data Test Data
• Does not generalize well

on new data
Error
Cycles / Data
Bias
• Learning bias
• Algorithmic bias
• Learns past statistical distributions
• Biased data implies biased models
• Risk on people evaluation

L05 Machinelearning

Uploaded by

Document Information

Original Description:

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

L05 Machinelearning

Uploaded by

Copyright:

Available Formats

Data Science and

Terminology and Concepts

Model represents Reality

• A program that “learns” from experience with respect to some task

Training Validation Testing

The error gives an unbiased estimate of the predictive power of a model

Training Validation Test

Training Validation Test

70% 15% 15%

Training Validation Test

10-fold cross-validation: pick 10 times random subsets as training and

• Does not generalize well

• Learns past statistical distributions

• Biased data implies biased models

• Risk on people evaluation

You might also like