Objectives
• This lesson covers the following objectives:
−Define Supervised Learning
−Define Unsupervised Learning
−Define Classification
−Define Regression
−Define Structured and Unstructured Data
AiML 1-4
Categorizing Data Copyright © 2020, Oracle and/or its affiliates. All rights reserved. 3
Machine Learning
• Underneath an “intelligent machine” is an algorithm
designed by humans
• Algorithms have been developed over many years in
the search for true intelligence
• Machines can learn in different ways, but to learn
there must be some kind of input
• This input may be from voice, files, Internet searches,
electronic instruments, or other sources
Machine Learning
• The algorithms for Machine Learning can be split into
categories
• Two of these categories are:
−Supervised Learning
−Unsupervised Learning
Supervised Learning
• Supervised Learning occurs when an algorithm learns from
examples for which the correct result is already known
• Each example in the training set (data) has a known label
(property or result) that the algorithm learns to predict
• The data that is fed to the learning algorithm falls into two
categories
−Independent data
−Dependent data
Independent and Dependent Data
• Imagine a scenario where someone is applying for a
bank loan
−The criteria used to determine if the loan will be successful are
the independent data
−We use the independent data (predictors) to determine the
dependent data (target/outcome)
−The outcome of the application (approved/not approved) is
the dependent data
−In other words, the dependent data is dependent on the
independent data
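The split between independent and dependent data can be sketched in a few lines of Python. This is only an illustration: the records, the field names (`has_job`, `owns_house`, `credit`, `approved`) and the helper `split_record` are hypothetical and are not taken from the lesson's loan table.

```python
# Hypothetical loan applications; field names and values are invented for illustration.
applications = [
    {"age": "Young", "has_job": "No", "owns_house": "No", "credit": "Fair", "approved": "No"},
    {"age": "Middle", "has_job": "Yes", "owns_house": "Yes", "credit": "Excellent", "approved": "Yes"},
    {"age": "Old", "has_job": "No", "owns_house": "Yes", "credit": "Good", "approved": "Yes"},
]

TARGET = "approved"  # the dependent data (target/outcome)


def split_record(record):
    """Separate one record into independent data (predictors) and dependent data (target)."""
    predictors = {key: value for key, value in record.items() if key != TARGET}
    return predictors, record[TARGET]


predictors, outcome = split_record(applications[1])
print("Independent data:", predictors)
print("Dependent data:", outcome)
```

Everything except the `approved` field is treated as a predictor here, so the outcome is the only value that depends on the others.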
Loan Example
Labels
Supervised Learning
• Another example is an anti-spam filter that predicts whether a
known input (the email) is spam
−The filter will start with a basic guide, but to improve results,
it will ask you if certain emails are spam
−As you give responses, the algorithm will ask less often as it
learns what you wish to label as spam
−Ultimately, you train the algorithm to improve, but it has
known inputs, such as the emails in this example
Supervised Learning
• In Supervised Learning, we train the algorithm with
training data, and test its accuracy with test data
• The goal is for the algorithm to respond accurately to
future data that it has not seen before
• This ability is called generalization
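Training on one part of the labelled data and measuring accuracy on another part could look like the minimal sketch below. The data (a count of suspicious words per email), the 7/3 split and the threshold "model" are all assumptions chosen to keep the example short; they are not the method used in the course.

```python
# Hypothetical labelled data: (number of suspicious words in an email, is_spam).
labelled = [(0, False), (1, False), (2, False), (3, False),
            (5, True), (6, True), (7, True),
            (9, True), (1, False), (8, True)]

# Train on the first seven examples and hold back the rest as test data.
training_data = labelled[:7]
test_data = labelled[7:]


def train(examples):
    """Learn a threshold halfway between the average spam and non-spam counts."""
    spam = [count for count, is_spam in examples if is_spam]
    not_spam = [count for count, is_spam in examples if not is_spam]
    return (sum(spam) / len(spam) + sum(not_spam) / len(not_spam)) / 2


def accuracy(threshold, examples):
    """Fraction of examples where 'count above threshold' matches the label."""
    correct = sum((count > threshold) == is_spam for count, is_spam in examples)
    return correct / len(examples)


threshold = train(training_data)
print("Learned threshold:", threshold)
print("Accuracy on test data:", accuracy(threshold, test_data))
```

The test examples play no part in `train`, so the accuracy figure indicates how well the learned threshold generalizes to data it has not seen.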
Generalization and Overfitting
• It is important that trained models have flexibility
• If the model fits the training data too strictly, then it
may over-fit
• This means that it does not perform properly when
presented with new data that is outside of the
training data
• The aim is to provide good training data so that the
models are capable of generalizing, and of being more
flexible with as yet unseen data
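One way to picture over-fitting is a model that simply memorises its training examples. The sketch below compares such a memoriser with a single flexible rule. The rainfall figures, the 2 mm cut-off and the function names are assumptions made for illustration only.

```python
# Hypothetical data: (rainfall in mm, was the day classed as rainy?).
training_data = [(0.0, False), (0.5, False), (1.0, False),
                 (4.0, True), (6.5, True), (9.0, True)]
unseen_data = [(0.2, False), (1.5, False), (5.0, True), (8.0, True)]

# Over-fitted "model": remembers every training example exactly.
memory = dict(training_data)


def memoriser(rainfall):
    # Any value it has not seen before falls back to False.
    return memory.get(rainfall, False)


def threshold_rule(rainfall):
    # Flexible model: one cut-off (2 mm is an assumed value).
    return rainfall > 2.0


def accuracy(model, examples):
    """Fraction of examples the model labels correctly."""
    return sum(model(x) == label for x, label in examples) / len(examples)


for name, model in [("memoriser", memoriser), ("threshold rule", threshold_rule)]:
    print(name, accuracy(model, training_data), accuracy(model, unseen_data))
```

Both models are perfect on the data they were built from, but only the simpler rule keeps its accuracy on values it has never seen.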
Unsupervised Learning
• Unsupervised Learning is when the structure of the
data set is unknown
• The data is neither classified nor labelled
• In the previous email example, the labels (spam or not spam)
were known, but in Unsupervised Learning the data may be of
unknown types and relationships
• In Unsupervised Learning, algorithms draw inferences
from (usually large) datasets without labelled
responses
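As a rough illustration of drawing inferences without labels, the sketch below groups unlabelled numbers into two clusters by repeatedly moving two centre points (a simplified one-dimensional form of k-means clustering, chosen here as an example technique and not named in the lesson). The readings are hypothetical, and the function assumes the data contains at least two distinct values.

```python
# Hypothetical unlabelled data: rainfall readings in mm, with no labels supplied.
readings = [0.1, 0.4, 0.3, 7.8, 8.2, 0.2, 9.0, 7.5]


def two_means(values, iterations=10):
    """Group values into two clusters by repeatedly moving two centres.

    Assumes at least two distinct values, so neither cluster is ever empty.
    """
    low, high = min(values), max(values)
    for _ in range(iterations):
        near_low = [v for v in values if abs(v - low) <= abs(v - high)]
        near_high = [v for v in values if abs(v - low) > abs(v - high)]
        # Move each centre to the average of the values closest to it.
        low = sum(near_low) / len(near_low)
        high = sum(near_high) / len(near_high)
    return sorted(near_low), sorted(near_high)


group_a, group_b = two_means(readings)
print("Group A:", group_a)
print("Group B:", group_b)
```

Nobody told the algorithm which readings were "dry" or "wet"; the two groups emerge from the values alone, and a person still has to decide what each group means.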
Unsupervised Learning
• Most datasets will contain a mixture of labelled and
unlabelled data
• Supervised Learning will be the focus of this course
Data in more Detail
• Data stored in a label falls into one of 2 categories:
−Classification
−Regression
Classification
• Classification is when independent data is defined as a
class label, and has a definite discrete value
• Example:
−Depending on the actual value of the amount of rain that has
fallen (independent data), the weather could be classified as:
• Rainy
• Dry
• Sunny
Classification
• Remember: the dependent data is dependent on the
independent data
• The prediction (dependent data) is also returned
• Examples:
−Will it rain tomorrow? (based on the weather classification)
• Yes/No
−Approve a loan? (based on the classification of the applicant)
• Yes/No
−Send an email to the junk folder? (based on the classification of
the email content)
• Yes/No
Regression
• Regression is when data is stored as real numbers
that can take any value within a range
• Example:
−Rainfall in mm
• The prediction from regression data is also a value within
a range, rather than one of a closed set of values as in
classification
• Examples:
−How much rain will fall tomorrow?
−What is the current average house price?
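A regression prediction is a number, not a class. As a minimal sketch, the code below fits a straight line to a few points by least squares and uses it to predict rainfall. The humidity and rainfall figures are hypothetical, and least-squares line fitting is just one possible regression technique, picked here for brevity.

```python
# Hypothetical observations: (humidity in %, rainfall in mm).
observations = [(50, 1.0), (60, 2.0), (70, 3.0), (80, 4.0)]


def fit_line(points):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in points)
             / sum((x - mean_x) ** 2 for x, _ in points))
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(observations)

# Predict rainfall for a humidity value that was not in the observations.
predicted_mm = slope * 75 + intercept
print("Predicted rainfall in mm:", round(predicted_mm, 2))
```

The answer can be any real number in a range, which is what separates this from the Yes/No style of prediction used in classification.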
Classification and Regression
• This course will focus on Classification, but in many
cases you can change a regression to a classification
• For example, rainfall:
−Classify it in ranges
• Rainfall < 2mm
• Rainfall ≥ 2mm
• This would then allow use of regression data in the
classifications
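Turning a regression value into a class can be as simple as comparing it with a cut-off. In this sketch the function name and the class names "Low" and "High" are hypothetical, and a reading of exactly 2 mm is placed in the upper class, which is an assumption.

```python
def classify_rainfall(rainfall_mm, cutoff_mm=2.0):
    """Turn a real-valued rainfall reading into one of two class labels."""
    if rainfall_mm < cutoff_mm:
        return "Low"
    # Assumption: a reading equal to the cut-off belongs to the upper class.
    return "High"


readings_mm = [0.0, 1.9, 2.0, 7.3]
classes = [classify_rainfall(r) for r in readings_mm]
print(classes)
```

Once every reading has a class label, the same data can be used with classification techniques.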
Complete the following
Task: Identify if the following data ranges would be
Classification or Regression
Data | Classification | Regression
Temperature – 26, 24, 23, 26 | |
Age – 22, 45, 22, 30 | |
Age – Young, Middle, Old | |
House Value – 100000, 150000, 120000 | |
House Value – Low, Middle, High, Very High | |
Animal – Cat, Dog, Fish | |
Eye Color – Blue, Green, Brown | |
Complete the following
Classification or Regression solution
Structured vs Unstructured Data
• Structured Data has a high degree of organization
where each item falls into a particular type
• Example:
−numbers
−dates
• This would normally be displayed in table format, with
columns containing data of the same type, and each
row containing an instance of the data
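In code, structured data is often held as a list of rows in which every field has a fixed type. The sketch below uses a Python dataclass. The `Reading` type, its field names and its values are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Reading:
    """One row (instance) of the table; each field is a typed column."""
    day: date           # column of dates
    rainfall_mm: float  # column of numbers


# Hypothetical table: each entry is one row.
table = [
    Reading(date(2020, 1, 1), 0.0),
    Reading(date(2020, 1, 2), 3.5),
    Reading(date(2020, 1, 3), 1.2),
]

# Reading down a column gives values that all share one type.
rainfall_column = [row.rainfall_mm for row in table]
print(rainfall_column)
```

Because every row has the same fields, a whole column can be pulled out and processed in one step.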
Structured vs Unstructured Data
• Unstructured Data is data that does not conform to the
rules of Structured Data
• An example is email
• Email may have some structure like a recipient or
subject but the actual message itself is unstructured
−Text
−Pictures
−Sound
Steps for Looking at Data
• To work with Structured Data, there are a number of
steps, such as:
−Gather example inputs and results
−Generate a model from the inputs and results
−Add new inputs (test data) to the model
−Test the results (outputs) from the model and evaluate
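The four steps above can be walked through with a deliberately tiny model. In this sketch the data is hypothetical, each example is a pair (has a job, loan approved), and the "model" simply remembers the most common result for each input value. This is an assumed stand-in, not the algorithm taught in the course.

```python
from collections import Counter, defaultdict

# Step 1: gather example inputs and results (hypothetical values).
examples = [("Yes", "Yes"), ("Yes", "Yes"), ("No", "No"), ("No", "No"), ("Yes", "No")]


# Step 2: generate a model from the inputs and results.
def build_model(pairs):
    """Map each input value to the result seen most often with it."""
    votes = defaultdict(Counter)
    for has_job, approved in pairs:
        votes[has_job][approved] += 1
    return {value: counts.most_common(1)[0][0] for value, counts in votes.items()}


model = build_model(examples)

# Step 3: add new inputs (test data) to the model.
test_data = [("Yes", "Yes"), ("No", "No"), ("No", "Yes")]
predictions = [model[has_job] for has_job, _ in test_data]

# Step 4: test the results (outputs) from the model and evaluate.
correct = sum(p == actual for p, (_, actual) in zip(predictions, test_data))
print("Predictions:", predictions)
print("Correct:", correct, "out of", len(test_data))
```

The model gets one test case wrong, which is the kind of result the evaluation step is meant to surface.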
Loan Example Revisited
The table on the following slide shows the information used
and generated for our loan application example
Task:
• Where can you find examples of classification?
• The loan applicant's information is stored as structured
data. Show an example of an instance of an applicant's data
• Explain with an example how dependent data is used
here
Loan Example
Loan Example
Task Solution:
−Where can you find examples of classification?
• The labels on the table (Age, has Job etc.) show classification as they
have a set of values that can be assigned to each
−The loan applicant's information is stored as structured data.
Show an example of an instance of an applicant's data
• Middle, Yes, Yes, Excellent
−Explain with an example how dependent data is used here
• Independent data is used to generate the result for the dependent
data
• Using the independent data from Q3, the dependent data value of Yes
is generated
Loan Example
Labels (Classification)
Summary
• In this lesson, you should have learned how to:
−Define Supervised Learning
−Define Unsupervised Learning
−Define Classification
−Define Regression
−Define Structured and Unstructured Data