Artificial Intelligence with Machine Learning in Java
1-4
Categorizing Data

Copyright © 2020, Oracle and/or its affiliates. All rights reserved.

Objectives
• This lesson covers the following objectives:
−Define Supervised Learning
−Define Unsupervised Learning
−Define Classification
−Define Regression
−Define Structured and Unstructured Data

Machine Learning
• Underneath an “intelligent machine” is an algorithm
designed by humans
• Algorithms have been developed over many years in
the search for true intelligence
• Machines can learn in different ways, but to learn
there must be some kind of input
• This input may be from voice, files, Internet searches,
electronic instruments, or other sources

Machine Learning
• The algorithms for Machine Learning can be split into
categories
• Two of these categories are:
−Supervised Learning
−Unsupervised Learning

Supervised Learning
• Supervised Learning occurs when an algorithm learns from
example data for which the correct result is already known
• Each example in the training set carries a known label
(the property or result to be predicted)
• The data that is fed to the learning algorithm falls into 2
categories
−Independent data
−Dependent data

Independent and Dependent Data
• Imagine a scenario where someone is applying for a
bank loan
−The criteria used to determine if the loan will be successful are
the independent data
−We use the independent data (predictors) to determine the
dependent data (target/outcome)
−The outcome of the application (approved/not approved) is
the dependent data
−In other words, the dependent data is dependent on the
independent data

Loan Example

Labels: the column headings

Age     Has Job  Owns House  Credit Rating  Approved?
Young   No       No          Fair           No
Young   No       No          Good           No
Young   Yes      No          Good           Yes
Middle  No       No          Good           No
Middle  Yes      Yes         Excellent      Yes
Old     Yes      No          Good           Yes
Old     No       No          Fair           No

Independent Data: Age, Has Job, Owns House, Credit Rating
Dependent Data: Approved?
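
As a sketch of how the loan table might be held in Java, the class below stores each row as one object and keeps the independent data (predictors) apart from the dependent data (outcome). The names LoanTable and LoanApplication are hypothetical; the course material does not prescribe any particular types.

```java
import java.util.List;

// Hypothetical sketch: class and field names are illustrative only.
class LoanTable {

    // One row (instance) of the loan table.
    static final class LoanApplication {
        // Independent data (predictors)
        final String age;
        final boolean hasJob;
        final boolean ownsHouse;
        final String creditRating;
        // Dependent data (target/outcome)
        final boolean approved;

        LoanApplication(String age, boolean hasJob, boolean ownsHouse,
                        String creditRating, boolean approved) {
            this.age = age;
            this.hasJob = hasJob;
            this.ownsHouse = ownsHouse;
            this.creditRating = creditRating;
            this.approved = approved;
        }
    }

    // The seven rows of the loan table, in the same order as on the slide.
    static List<LoanApplication> rows() {
        return List.of(
            new LoanApplication("Young", false, false, "Fair", false),
            new LoanApplication("Young", false, false, "Good", false),
            new LoanApplication("Young", true, false, "Good", true),
            new LoanApplication("Middle", false, false, "Good", false),
            new LoanApplication("Middle", true, true, "Excellent", true),
            new LoanApplication("Old", true, false, "Good", true),
            new LoanApplication("Old", false, false, "Fair", false));
    }

    public static void main(String[] args) {
        // Print each instance as predictors followed by the outcome.
        for (LoanApplication a : rows()) {
            System.out.println(a.age + ", " + a.hasJob + ", " + a.ownsHouse + ", "
                    + a.creditRating + " -> approved: " + a.approved);
        }
    }
}
```

Keeping the outcome in its own field makes it harder to feed the label back into a model as if it were a predictor.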

Supervised Learning
• Another example is an anti-spam filter that predicts whether a
known input (the email) is spam
−The filter will start with a basic guide, but to improve results,
it will ask you if certain emails are spam
−As you give responses, the algorithm will ask less often as it
learns what you wish to label as spam
−Ultimately, you train the algorithm to improve, but it has
known inputs, such as the email in this example

Supervised Learning
• In Supervised Learning, we train the algorithm with
training data, and test its accuracy with test data
• Training improves the algorithm's ability to respond
accurately to future data
• This ability is called generalization

Generalization and Overfitting
• It is important that trained models have flexibility
• If the model fits the training data too strictly, then the
algorithm may over-fit
• This means that it does not handle new data properly
when that data falls outside of the data used for training
• The aim is to produce models that are capable of
generalizing, and of being more flexible with as yet
unseen data
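
The difference between an over-fitted and a more general model can be sketched with the loan table. The example below is a deliberate caricature, not a real learning algorithm: the "memorizer" stores every training row exactly, while the "general" rule is hand-written (approve when the applicant has a job). The split into five training rows and two unseen rows, the fallback answer of "No", and all class and method names are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of over-fitting versus generalizing.
class OverfitDemo {
    // Rows are {age, hasJob, ownsHouse, creditRating, approved}, from the loan table.
    static final List<String[]> TRAINING = List.of(
        new String[] {"Young", "No", "No", "Fair", "No"},
        new String[] {"Young", "No", "No", "Good", "No"},
        new String[] {"Young", "Yes", "No", "Good", "Yes"},
        new String[] {"Middle", "No", "No", "Good", "No"},
        new String[] {"Middle", "Yes", "Yes", "Excellent", "Yes"});
    static final List<String[]> UNSEEN = List.of(
        new String[] {"Old", "Yes", "No", "Good", "Yes"},
        new String[] {"Old", "No", "No", "Fair", "No"});

    // Joins the four predictor columns into one lookup key.
    static String key(String[] r) {
        return String.join("|", r[0], r[1], r[2], r[3]);
    }

    // Over-fitted "model": remembers the outcome of every exact training row.
    static Map<String, String> memorize(List<String[]> rows) {
        Map<String, String> table = new HashMap<>();
        for (String[] r : rows) {
            table.put(key(r), r[4]);
        }
        return table;
    }

    // Returns the memorized outcome, or "No" for a row never seen (assumed fallback).
    static String predictMemorized(Map<String, String> table, String[] r) {
        return table.getOrDefault(key(r), "No");
    }

    // More general hand-written rule: the outcome follows the "Has Job" column.
    static String predictGeneral(String[] r) {
        return r[1];
    }

    static double accuracyMemorized(Map<String, String> table, List<String[]> rows) {
        int correct = 0;
        for (String[] r : rows) {
            if (predictMemorized(table, r).equals(r[4])) correct++;
        }
        return (double) correct / rows.size();
    }

    static double accuracyGeneral(List<String[]> rows) {
        int correct = 0;
        for (String[] r : rows) {
            if (predictGeneral(r).equals(r[4])) correct++;
        }
        return (double) correct / rows.size();
    }

    public static void main(String[] args) {
        Map<String, String> table = memorize(TRAINING);
        System.out.println("Memorizer, training rows: " + accuracyMemorized(table, TRAINING));
        System.out.println("Memorizer, unseen rows:   " + accuracyMemorized(table, UNSEEN));
        System.out.println("General rule, unseen rows: " + accuracyGeneral(UNSEEN));
    }
}
```

The memorizer is perfect on the rows it was built from but has nothing useful to say about a row it has not seen, which is the behaviour the term over-fitting describes.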

Unsupervised Learning
• Unsupervised Learning is when the structure of the
data set is unknown
• The data is neither classified nor labelled
• In the previous email example, the dataset was known,
but in Unsupervised Learning the data may be of
unknown types and relationships
• In Unsupervised Learning, algorithms draw inferences
from (usually large) datasets without labelled
responses
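
To give a feel for drawing inferences from unlabelled data, the sketch below groups a handful of numbers into two clusters. It uses k-means with two clusters, a technique chosen here for illustration and not named in the lesson; the sample values, the class name and the method name are all made up.

```java
// Hypothetical sketch: 1-D k-means with two clusters on unlabelled values.
class TwoMeansDemo {

    // Returns the two cluster centres {low, high} after the given number of passes.
    static double[] cluster(double[] values, int iterations) {
        // Start the two centres at the smallest and largest value.
        double low = values[0];
        double high = values[0];
        for (double v : values) {
            low = Math.min(low, v);
            high = Math.max(high, v);
        }
        for (int i = 0; i < iterations; i++) {
            double sumLow = 0;
            double sumHigh = 0;
            int countLow = 0;
            int countHigh = 0;
            // Assign each value to its nearest centre.
            for (double v : values) {
                if (Math.abs(v - low) <= Math.abs(v - high)) {
                    sumLow += v;
                    countLow++;
                } else {
                    sumHigh += v;
                    countHigh++;
                }
            }
            // Move each centre to the mean of the values assigned to it.
            if (countLow > 0) low = sumLow / countLow;
            if (countHigh > 0) high = sumHigh / countHigh;
        }
        return new double[] {low, high};
    }

    public static void main(String[] args) {
        double[] unlabelled = {1.0, 2.0, 3.0, 10.0, 11.0, 12.0}; // made-up values
        double[] centres = cluster(unlabelled, 10);
        System.out.println("Centres: " + centres[0] + " and " + centres[1]);
    }
}
```

No row carries a label here; the two groups emerge only from how close the values are to each other.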

Unsupervised Learning
• Most datasets will contain a mixture of labelled and
unlabelled data
• Supervised Learning will be the focus of this course

Data in More Detail
• Data stored in a label falls into one of 2 categories:
−Classification
−Regression

Classification
• Classification is when independent data is defined as a
class label, and has a definite discrete value
• Example:
−Depending on the actual amount of rain that has fallen
(independent data), the weather could be classified as:
• Rainy
• Dry
• Sunny

Classification
• Remember: the dependent data is dependent on the
independent data
• The prediction (dependent data) is also returned
• Examples:
−Will it rain tomorrow? (based on the weather classification)
• Yes/No
−Approve a loan? (based on the classification of the applicant)
• Yes/No
−Send an email to the junk folder? (based on the classification of
the email content)
• Yes/No

Regression
• Regression is when ranges of data are stored using real
numbers
• Example:
−Rainfall in mm
• The prediction from regression can also be any value within
a range, rather than one of a closed set of values as in
classification
• Examples:
−How much rain will fall tomorrow?
−What is the current average house price?

Classification and Regression
• This course will focus on Classification, but in many
cases you can change a regression to a classification
• For example, rainfall:
−Classify it in ranges
• Rainfall < 2 mm
• Rainfall ≥ 2 mm
• This would then allow use of regression data in the
classifications
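
Turning a regression value into a class can be sketched as a simple threshold test. In the example below the 2 mm threshold comes from the slide; the class names LOW and HIGH, the enum and the method name are hypothetical, and placing a reading of exactly 2 mm in the upper class is an assumption.

```java
// Hypothetical sketch: binning a real-valued rainfall reading into a class.
class RainfallBins {
    // The discrete set of classes (names are illustrative).
    enum RainClass { LOW, HIGH }

    static final double THRESHOLD_MM = 2.0;

    // Readings below 2 mm are LOW; 2 mm or more is HIGH (boundary choice assumed).
    static RainClass classify(double rainfallMm) {
        return rainfallMm < THRESHOLD_MM ? RainClass.LOW : RainClass.HIGH;
    }

    public static void main(String[] args) {
        double[] readings = {0.0, 1.5, 2.0, 7.2}; // made-up readings in mm
        for (double mm : readings) {
            System.out.println(mm + " mm -> " + classify(mm));
        }
    }
}
```

Once every reading is mapped to one of a closed set of classes, the rainfall column can be used wherever classification data is expected.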

Complete the following
Task: Identify whether each of the following data ranges would be
Classification or Regression

Data                                          Classification  Regression
Temperature – 26, 24, 23, 26
Age – 22, 45, 22, 30
Age – Young, Middle, Old
House Value – 100000, 150000, 120000
House Value – Low, Middle, High, Very High
Animal – Cat, Dog, Fish
Eye Color – Blue, Green, Brown

Complete the following
Solution: Classification or Regression

Data                                          Classification  Regression
Temperature – 26, 24, 23, 26                                  ✓
Age – 22, 45, 22, 30                                          ✓
Age – Young, Middle, Old                      ✓
House Value – 100000, 150000, 120000                          ✓
House Value – Low, Middle, High, Very High    ✓
Animal – Cat, Dog, Fish                       ✓
Eye Color – Blue, Green, Brown                ✓
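
The reasoning behind the exercise can be sketched as a rough check: if every value in a column parses as a number, treat the column as regression data, otherwise as classification data. This is only a heuristic under that assumption (numeric codes that stand for categories would be misjudged), and the class and method names are hypothetical.

```java
// Hypothetical sketch: guessing whether a column holds regression or classification data.
class LabelKind {

    // True when every value can be read as a real number.
    static boolean isNumeric(String[] values) {
        for (String v : values) {
            try {
                Double.parseDouble(v.trim());
            } catch (NumberFormatException e) {
                return false;
            }
        }
        return true;
    }

    // Numeric columns are treated as regression, everything else as classification.
    static String kind(String[] values) {
        return isNumeric(values) ? "Regression" : "Classification";
    }

    public static void main(String[] args) {
        System.out.println("Temperature: " + kind(new String[] {"26", "24", "23", "26"}));
        System.out.println("Age groups:  " + kind(new String[] {"Young", "Middle", "Old"}));
    }
}
```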

Structured vs Unstructured Data
• Structured Data has a high degree of organization
where each item falls into a particular type
• Example:
−numbers
−dates
• This would normally be displayed in table format, with
columns containing data of the same type, and each
row containing an instance of the data

Structured vs Unstructured Data
• Unstructured Data is data that does not conform to the
rules of Structured Data
• An example is email
• Email may have some structure like a recipient or
subject but the actual message itself is unstructured
−Text
−Pictures
−Sound

• This course will focus on Structured Data

Steps for Looking at Data
• To look at Structured Data, there are a number of
steps, such as:
−Gather example inputs and results
−Generate a model from the inputs and results
−Add new inputs (test data) to the model
−Test the results (outputs) from the model and evaluate
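
The four steps above can be sketched end to end on the loan table. The "model" here is a very simple one chosen for illustration: for a single column it remembers the most common outcome for each value. That choice, the split into five example rows and two test rows, the "No" fallback for ties and unseen values, and all names are assumptions, not the method used later in the course.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the four steps using a one-column majority-vote model.
class LoanSteps {
    static final int HAS_JOB = 1;   // column index of "Has Job"
    static final int APPROVED = 4;  // column index of the outcome

    // Step 1: gather example inputs and results (rows from the loan table).
    static final List<String[]> EXAMPLES = List.of(
        new String[] {"Young", "No", "No", "Fair", "No"},
        new String[] {"Young", "No", "No", "Good", "No"},
        new String[] {"Young", "Yes", "No", "Good", "Yes"},
        new String[] {"Middle", "No", "No", "Good", "No"},
        new String[] {"Middle", "Yes", "Yes", "Excellent", "Yes"});

    // Step 3 input: new rows (test data) whose outcomes are known for checking.
    static final List<String[]> TEST = List.of(
        new String[] {"Old", "Yes", "No", "Good", "Yes"},
        new String[] {"Old", "No", "No", "Fair", "No"});

    // Step 2: generate a model that maps each value of one column to its majority outcome.
    static Map<String, String> train(List<String[]> rows, int column) {
        Map<String, Integer> yesMinusNo = new HashMap<>();
        for (String[] r : rows) {
            int vote = r[APPROVED].equals("Yes") ? 1 : -1;
            yesMinusNo.merge(r[column], vote, Integer::sum);
        }
        Map<String, String> model = new HashMap<>();
        for (Map.Entry<String, Integer> e : yesMinusNo.entrySet()) {
            // A tie (0) falls back to "No".
            model.put(e.getKey(), e.getValue() > 0 ? "Yes" : "No");
        }
        return model;
    }

    // Step 3: feed a new input to the model; unseen values fall back to "No".
    static String predict(Map<String, String> model, int column, String[] row) {
        return model.getOrDefault(row[column], "No");
    }

    // Step 4: compare the outputs with the known results and report the fraction correct.
    static double accuracy(Map<String, String> model, int column, List<String[]> rows) {
        int correct = 0;
        for (String[] r : rows) {
            if (predict(model, column, r).equals(r[APPROVED])) correct++;
        }
        return (double) correct / rows.size();
    }

    public static void main(String[] args) {
        Map<String, String> model = train(EXAMPLES, HAS_JOB);
        System.out.println("Model: " + model);
        System.out.println("Accuracy on test rows: " + accuracy(model, HAS_JOB, TEST));
    }
}
```

Passing a different column index to train and accuracy shows how the choice of independent data changes the result on the test rows.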

Loan Example Revisited
The table on the following slide shows the information used
and generated for our loan application example

Task:
• Where can you find examples of classification?
• The loan applicant's information is stored as structured
data. Show an example of an instance of an applicant's data
• Explain with an example how dependent data is used
here

Loan Example

Age     Has Job  Owns House  Credit Rating  Approved?
Young   No       No          Fair           No
Young   No       No          Good           No
Young   Yes      No          Good           Yes
Middle  No       No          Good           No
Middle  Yes      Yes         Excellent      Yes
Old     Yes      No          Good           Yes
Old     No       No          Fair           No

Loan Example
Task Solution:
−Where can you find examples of classification?
• The labels on the table (Age, Has Job, etc.) show classification, as
each has a set of values that can be assigned to it
−The loan applicant's information is stored as structured data.
Show an example of an instance of an applicant's data
• Middle, Yes, Yes, Excellent
−Explain with an example how dependent data is used here
• Independent data is used to generate the result for the dependent
data
• Using the independent data from the instance above, the dependent
data value of Yes is generated

Loan Example

Labels (Classification): the column headings

Age     Has Job  Owns House  Credit Rating  Approved?
Young   No       No          Fair           No
Young   No       No          Good           No
Young   Yes      No          Good           Yes
Middle  No       No          Good           No
Middle  Yes      Yes         Excellent      Yes
Old     Yes      No          Good           Yes
Old     No       No          Fair           No

Independent Data: Age, Has Job, Owns House, Credit Rating
Dependent Data: Approved?
Instance: one row of the table

Summary
• In this lesson, you should have learned how to:
−Define Supervised Learning
−Define Unsupervised Learning
−Define Classification
−Define Regression
−Define Structured and Unstructured Data
