
Artificial Intelligence

Midterm Revision
Agenda

1. AI Overview
2. Knowledge Engineering & Expert system
3. Machine Learning & Examples
1- AI Overview
What Is AI?
The birth year of AI is 1956
Artificial Intelligence (AI) is a new technical science that studies and
develops theories, methods, techniques,
and application systems for simulating and extending human intelligence.
In 1956, the concept of AI was first proposed by John McCarthy, who defined
the subject as the "science and engineering of making intelligent machines,
especially intelligent computer programs".
AI is concerned with making machines work in an intelligent way, similar to the
way that the human mind works.
[Figure: AI sits at the intersection of brain science, cognitive science, computer
science, psychology, philosophy, linguistics, and logic. Timeline: 1950, 1980, 2010.]
Relationship of AI, Machine Learning and Deep Learning

AI: A technical science that focuses on the research and development of
theories, methods, techniques, and application systems for simulating and
extending human intelligence.
Machine learning: A core research field of AI. It focuses on the study of how
computers can obtain new knowledge or skills by simulating or performing the
learning behavior of human beings, and reorganizing existing knowledge
structures to improve performance.
Deep learning: A new field of machine learning. The concept of deep learning
originates from research on artificial neural networks. The multi-layer
perceptron (MLP) is a type of deep learning architecture. Deep learning aims to
simulate the human brain to interpret data such as images, sound, and text.
Three Major Schools of Thought (Theories)

Symbolism, Connectionism, Behaviorism

Three Major Schools of Thought: Symbolism
Basic thoughts:
The cognitive process of human beings is the process of inference and
operation of various symbols.
A human being is a physical symbol system, and so is a computer.
Computers, therefore, can be used to simulate intelligent behavior of
human beings.
The core of AI lies in knowledge representation, knowledge inference,
and knowledge application.
Knowledge and concepts can be represented with symbols.
Cognition is the process of symbol processing while inference refers to
the process of solving problems by using heuristic knowledge and
search.
Representative of symbolism: inference, including symbolic inference
and machine inference
Three Major Schools of Thought: Connectionism
Basic thoughts:
The basis of thinking is neurons rather than the process of symbol
processing.
Human brains differ from computers. A computer working mode based on
connectionism is proposed to replace the working mode based on symbolic
operation.

Representative of connectionism: neural networks and deep learning
Three Major Schools of Thought: Behaviorism
Basic thoughts:
Intelligence depends on perception and action.
The perception-action mode of intelligent behavior is proposed.
Intelligence requires no knowledge, representation, or inference.
AI can evolve like human intelligence. Intelligent behavior can only be
demonstrated in the real world through the constant interaction with
the surrounding environment.

Representative of behaviorism:
behavior control, adaptation, and evolutionary computing
Types of AI
Strong AI
The strong AI view holds that it is possible to create intelligent
machines that can really reason and solve problems.
Such machines are considered to be conscious and self-aware, can
independently think about problems and work out optimal solutions
to problems, have their own system of values and world views, and
have all the same instincts as living things, such as survival and
security needs.
It can be regarded as a new civilization in a certain sense.
Weak AI
The weak AI view holds that intelligent machines cannot really reason and
solve problems.
These machines only look intelligent but do not have real intelligence or
self-awareness, e.g., Siri.
Classification of Intelligent Robots

Currently, there is no unified definition of AI research.
Intelligent robots are generally classified into the following four types:
"Thinking like human beings": weak AI, such as Watson and AlphaGo
"Acting like human beings": weak AI, such as humanoid robots, iRobot, and Atlas of Boston Dynamics
"Thinking rationally (reasonably)": strong AI (currently, no intelligent robots of this type have been created due to the bottleneck in brain science)
"Acting rationally (reasonably)": strong AI
AI Industry Ecosystem

The four elements of AI are data, algorithm, computing power, and scenario.
To meet the requirements of these four elements, we need to combine AI with
cloud computing, big data, and IoT to build an intelligent society.
AI Applications

Intelligent Government: based on Enterprise Intelligence (EI)
Voice Processing
Natural Language Processing (NLP)
Intelligent Healthcare
Smart Home
Smart Retail: unmanned supermarkets
Autonomous Driving: The Society of Automotive Engineers (SAE) in the U.S.
defines 6 levels of driving automation, ranging from 0 (fully manual) to 5
(fully autonomous).
Examples

What are the mainstream AI theories?
Symbolism, Connectionism, Behaviorism

A technical science that studies and develops theories, technologies ... ?
Artificial Intelligence (AI)

State the elements of Artificial Intelligence.
Data, Computing Power, Algorithm, Scenario
2- Knowledge Engineering: Rule-Based Systems
Knowledge

Knowledge is a theoretical or practical understanding of a subject or a domain.
Knowledge is also the sum of what is currently known; apparently, knowledge is power.
Static knowledge
Dynamic knowledge
A domain expert is anyone who has deep knowledge and strong practical
experience in a particular domain.
Knowledge Engineering

Knowledge Engineering (KE) and Knowledge Management (KM) are closely tied.
Knowledge Engineering deals with the development of intelligent information
systems in which knowledge and reasoning play a pivotal role.
Knowledge Management is a developed field at the intersection of computer
science and management science that deals with knowledge as a key resource.
Knowledge Representation Techniques

Logical Representation
Semantic Network Representation
Frame Representation
Production Rules
Logical Representation

A language with concrete communication rules that deals with propositions.
No ambiguity in representation.

Example: All birds fly
∀x bird(x) → fly(x)
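
As a small illustration (not from the slides; the domain and facts are made up), a universally quantified rule like this can be checked over a finite set of facts:

```python
# A minimal sketch (hypothetical facts): checking "All birds fly" over a finite domain.
birds = {"sparrow", "eagle", "penguin"}   # assumed set of birds
flies = {"sparrow", "eagle"}              # assumed set of things that fly

# forall x: bird(x) -> fly(x) holds iff every bird is also in `flies`.
holds = all(x in flies for x in birds)
print(holds)  # False: the penguin is a counterexample to "All birds fly"
```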
Semantic Network Representation
A semantic network is a graphical representation of the semantic relations
between concepts.
Vertices: represent concepts
Edges: represent semantic relations/mappings/connections between concepts
Frame Representation

Knowledge is represented as a set of related frames.
Frame representation is based on the inheritance concept.
A frame is a set of attributes that defines the object's condition.
Every attribute has a name and a value.
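
As a small illustration (the frame names and attributes below are hypothetical, not from the slides), frames with inheritance can be sketched as dictionaries:

```python
# A minimal sketch: frames as dicts of attribute name/value pairs, with a
# "parent" link that implements the inheritance concept.
frames = {
    "Vehicle":   {"parent": None, "wheels": 4, "powered": True},
    "Car":       {"parent": "Vehicle", "doors": 4},
    "SportsCar": {"parent": "Car", "doors": 2, "top_speed_kmh": 300},
}

def get_attribute(frame_name, attr):
    """Look up an attribute, walking up the inheritance chain if needed."""
    frame = frames.get(frame_name)
    while frame is not None:
        if attr in frame:
            return frame[attr]
        frame = frames.get(frame["parent"]) if frame["parent"] else None
    return None

print(get_attribute("SportsCar", "doors"))   # 2 (overridden locally)
print(get_attribute("SportsCar", "wheels"))  # 4 (inherited from Vehicle)
```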
Production Rules

Production rules are defined as IF-THEN structures that relate given
information or facts in the IF part to some action in the THEN part.
The IF part is called the antecedent (premise or condition).
The THEN part is called the consequent (conclusion or action).
A rule provides some description of how to solve a problem.

IF <antecedent>
THEN <consequent>
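
As a small illustration (an assumed representation, not from the slides), rules can be encoded as antecedent/consequent pairs; the sketch reuses the bank-withdrawal rule shown on the next slide:

```python
# A minimal sketch: each rule is (antecedent, consequent); the antecedent is a
# predicate over a dict of facts, and the consequent is a new object/value pair.
rules = [
    (lambda f: f.get("customer") == "teenager" and f.get("cash withdrawal", 0) > 1000,
     ("signature of the parent", "required")),
]

facts = {"customer": "teenager", "cash withdrawal": 1500}

for antecedent, (obj, value) in rules:
    if antecedent(facts):      # IF part matches the facts...
        facts[obj] = value     # ...THEN part adds the conclusion
print(facts["signature of the parent"])  # required
```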
Production Rules(Cont.)

The antecedent of a rule incorporates two parts: an object (linguistic object)
and its value. The object and its value are linked by an operator.
The operator identifies the object and assigns the value.
Operators such as is, are, is not, and are not are used to assign a symbolic
value to a linguistic object.
Mathematical operators can be used to assign a numerical value to a numerical
object.

IF customer is teenager
AND 'cash withdrawal' > 1000
THEN 'signature of the parent' is required
Production Rules(Cont.)
Relation
IF the ‘fuel tank’ is empty
THEN the car is dead
Recommendation
IF the season is autumn
AND the sky is cloudy
AND the forecast is drizzle
THEN the advice is ‘take an umbrella’
Directive
IF the car is dead
AND the ‘fuel tank’ is empty
THEN the action is ‘refuel the car’
Production Rules(Cont.)

Strategy
IF the car is dead
THEN the action is 'check the fuel tank'; step1 is complete

IF step1 is complete
AND the 'fuel tank' is full
THEN the action is 'check the battery'; step2 is complete

Heuristic
IF the spill is liquid
AND the 'spill pH' < 6
AND the 'spill smell' is vinegar
THEN the 'spill material' is 'acetic acid'
Rule-based System

[Chart: rule complexity vs. scale of the problem — manually written, rule-based
algorithms suit simple problems at a small scale; as rule complexity and problem
scale grow, machine learning algorithms take over.]
Expert System

An expert system is a computer program that emulates the decision-making
capability of a human expert.
The expert system development team contains five members: the domain expert,
the knowledge engineer, the programmer, the project manager, and the end-user.
Expert System Structure

Production system model


Expert System Structure(Cont.)
Basic structure of a rule-based expert system
Inference Chaining
Forward Chaining

Forward chaining is data-driven reasoning.
The reasoning starts from the known data and proceeds forward with that data.
Each time, only the topmost rule is executed.
When fired, the rule adds a new fact to the database.
Any rule can be executed only once.
The match-fire cycle stops when no further rules can be fired.
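
A minimal sketch (the rules are made up) of the match-fire cycle described above:

```python
# Rules are (name, antecedent facts, consequent fact); assumed toy rule base.
rules = [
    ("R1", {"A"}, "B"),       # IF A THEN B
    ("R2", {"B"}, "C"),       # IF B THEN C
    ("R3", {"C", "A"}, "D"),  # IF C AND A THEN D
]

facts = {"A"}   # known data (the database)
fired = set()   # any rule can be executed only once

while True:
    for name, antecedent, consequent in rules:  # scan rules from the top
        if name not in fired and antecedent <= facts and consequent not in facts:
            facts.add(consequent)               # firing adds a new fact
            fired.add(name)
            break                               # only the topmost matching rule fires per cycle
    else:
        break   # match-fire cycle stops: no further rules can be fired

print(facts)  # {'A', 'B', 'C', 'D'}
```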
Forward Chaining(Cont.)
Backward Chaining

Backward chaining is goal-driven reasoning.
In backward chaining, an expert system has a goal (a hypothetical solution) and
the inference engine attempts to find the evidence to prove it.
First, the knowledge base is searched to find rules that might have the desired
solution (the goal in their THEN (action) parts).
If such a rule is found and its IF (condition) part matches data in the
database, then the rule is fired and the goal is proved.
However, this is rarely the case.
Backward Chaining(Cont.)

Thus, the inference engine puts aside the rule it is working with (the rule is
said to stack) and sets up a new goal, a subgoal, to prove the IF part of this
rule.
Then the knowledge base is searched again for rules that can prove the subgoal.
The inference engine repeats the process of stacking rules until no rules are
found in the knowledge base to prove the current subgoal.
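
A minimal sketch (hypothetical rules again) of this goal/subgoal recursion:

```python
# Rules are (antecedent facts, consequent fact); assumed toy rule base.
rules = [
    ({"A"}, "B"),   # IF A THEN B
    ({"B"}, "C"),   # IF B THEN C
]
facts = {"A"}       # the database

def prove(goal):
    """True if the goal is a known fact or derivable via some rule (no cycle checks)."""
    if goal in facts:
        return True
    for antecedent, consequent in rules:
        if consequent == goal:                         # rule has the goal in its THEN part
            if all(prove(sub) for sub in antecedent):  # set subgoals: prove the IF part
                facts.add(goal)                        # goal proved; add it to the database
                return True
    return False

print(prove("C"))  # True: C is proved via B, which is proved via the fact A
```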
Backward Chaining(Cont.)
Conflict Resolution

Conflict resolution: a method for choosing a rule to fire when more than one
rule can be fired in a given cycle.

Conflict Resolution Methods
Fire the rule with the highest priority.
Fire the most specific rule.
Fire the rule that uses the data most recently entered in the database. This
method relies on time tags attached to each fact in the database.
Uncertainty Management in Expert Systems

Information can be incomplete, inconsistent, uncertain, or all three.
Uncertainty is defined as the lack of the exact knowledge that would enable us
to reach a perfectly reliable conclusion.
Uncertainty Management Techniques:
Bayesian Reasoning
Certainty Factors
Examples

What are the different types of production rules?
Relation, Directive, Heuristic, Recommendation, Strategy

Mention the knowledge representation techniques.
Logical Representation, Semantic Network Representation, Frame Representation, Production Rules

The theoretical or practical understanding of a subject or a domain …
Knowledge

Knowledge representation technique that is defined as an IF-THEN structure …
Production Rules

A computer program that emulates the decision-making capability of a human expert …
Expert System

A data-driven reasoning technique where the reasoning starts from the known data …
Forward Chaining
3- Machine Learning Overview
Main Problems Solved by Machine Learning
Classification: A computer program needs to specify which of k categories some input belongs to.
The output of classification is discrete category values.
For example, the image classification algorithm in computer vision.

Regression: For this type of task, a computer program predicts the output for the given input.
An example of this task type is to predict the claim amount of an insured person (to set the insurance premium) or predict the security price.
The output of regression is continuous numbers.

Clustering: A large amount of data from an unlabeled dataset is divided into multiple categories according to the internal similarity of the data.
Data in the same category is more similar than data in different categories.
This feature can be used in scenarios such as image retrieval and user profile management.
Machine Learning Classification
Supervised learning: For labeled samples, obtain an optimal model with
required performance through training and learning based on samples of known
categories.
Unsupervised learning: For unlabeled samples, the learning algorithm directly
models the input datasets. Clustering is a common form of unsupervised
learning: we put highly similar samples together, calculate the similarity
between new samples and existing ones, and classify new samples by similarity.
Semi-supervised learning: A machine learning model that automatically uses a
large amount of unlabeled data to assist the learning of a small amount of
labeled data.
Reinforcement learning: An area of machine learning concerned with how agents
ought to take actions in an environment to maximize some notion of cumulative
reward. The difference between reinforcement learning and supervised learning
is the teacher signal: the reinforcement signal provided by the environment
evaluates the quality of an action rather than telling the learning system how
to perform the correct action.
Machine Learning Process

Data collection → Data cleansing/cleaning → Feature extraction and selection →
Model training → Model evaluation → Model deployment and integration, with
feedback and iteration throughout.
Importance of Data Preprocessing

Data is crucial to models. It is the ceiling of model capabilities: without
good data, there is no good model.

Data preprocessing includes:
Data cleansing: fill in missing values, and detect and eliminate causes of dataset exceptions.
Data normalization: normalize data to reduce noise and improve model accuracy.
Data dimension reduction: simplify data attributes to avoid dimension explosion.
Dirty Data

Generally, real data may have some quality problems:
Incompleteness: contains missing values or data that lacks attributes.
Noise: contains incorrect records or exceptions.
Inconsistency: contains inconsistent (conflicting) records.
Dirty Data (cont.)

#  | Id   | Name   | Birthday   | Gender | Is Teacher | #Students | Country     | City   | Quality problem
1  | 111  | John   | 31/12/1990 | M      | 0          | 0         | Ireland     | Dublin |
2  | 222  | Mery   | 15/10/1978 | F      | 1          | 15        | Iceland     |        | Missing value
3  | 333  | Alice  | 19/04/2000 | F      | 0          | 0         | Spain       | Madrid |
4  | 444  | Mark   | 01/11/1997 | M      | 0          | 0         | France      | Paris  |
5  | 555  | Alex   | 15/03/2000 | A      | 1          | 23        | Germany     | Berlin | Invalid value (Gender "A")
6  | 555  | Peter  | 1983-12-01 | M      | 1          | 10        | Italy       | Rome   | Invalid duplicate Id; incorrect date format
7  | 777  | Calvin | 05/05/1995 | M      | 0          | 0         | Italy       | Italy  | Value that should be in another column
8  | 888  | Roxane | 03/08/1948 | F      | 0          | 0         | Portugal    | Lisbon |
9  | 999  | Anne   | 05/09/1992 | F      | 0          | 5         | Switzerland | Geneva | Attribute dependency (Is Teacher = 0 but #Students = 5)
10 | 1010 | Paul   | 14/11/1992 | M      | 1          | 26        | Ytali       | Rome   | Misspelling ("Ytali")
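
As a small illustration (not from the slides), a few of these problems can be flagged with pandas; the miniature DataFrame below mimics a subset of the table's columns:

```python
# A minimal sketch: flagging incompleteness, noise, and duplicates with pandas.
import pandas as pd

df = pd.DataFrame({
    "Id": [111, 222, 555, 555],
    "Gender": ["M", "F", "A", "M"],
    "City": ["Dublin", None, "Berlin", "Rome"],
})

print(df[df["City"].isna()])                       # incompleteness: missing City
print(df[~df["Gender"].isin(["M", "F"])])          # noise: invalid Gender value
print(df[df.duplicated(subset="Id", keep=False)])  # inconsistency: duplicate Ids
```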


Feature Selection
Generally, a dataset has many features, some of which may be redundant or
irrelevant to the value to be predicted.
Feature selection is necessary in the following aspects:
Simplify models to make them easy for users to interpret
Reduce the training time
Avoid dimension explosion
Improve model generalization and avoid overfitting
Feature Selection Methods - Filter
Filter methods are independent of the model during feature selection.

By evaluating the correlation between each feature and the target attribute,
these methods use a statistical measure to assign a value to each feature.
Features are then sorted by score, which is helpful for preserving or
eliminating specific features.

Procedure of a filter method: traverse all features → select the optimal
feature subset → train models → evaluate the performance.

Common methods
• Pearson correlation coefficient
• Chi-square coefficient
• Mutual information

Limitations
• The filter method tends to select redundant variables, as the relationships
between features are not considered.
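
A minimal sketch of a filter method using the first common method above, the Pearson correlation coefficient (the data is synthetic):

```python
# Score each feature by its absolute Pearson correlation with the target,
# then keep the top-k features. Assumed synthetic data.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))   # 100 samples, 5 hypothetical features
y = 3 * X[:, 0] - 2 * X[:, 3] + rng.normal(scale=0.1, size=100)

scores = [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])]
top_k = np.argsort(scores)[::-1][:2]   # indices of the 2 best-scoring features
print(top_k)                           # expected: features 0 and 3
```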
Feature Selection Methods - Wrapper
Wrapper methods use a prediction model to score feature subsets.

Wrapper methods consider feature selection as a search issue for which
different combinations are evaluated and compared. A predictive model is used
to evaluate a combination of features and assign a score based on model
accuracy.

Procedure of a wrapper method: traverse all features → generate a feature
subset → train models → evaluate the models → select the optimal feature
subset.

Common methods
• Recursive feature elimination (RFE)

Limitations
• Wrapper methods train a new model for each subset, resulting in a huge
number of computations.
• A feature set with the best performance is usually provided for a specific
type of model.
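
A minimal sketch of the wrapper method named above, recursive feature elimination, via scikit-learn's RFE (synthetic data; the estimator choice is illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=8, n_informative=3, random_state=0)

# Recursively drop the weakest feature, retraining the model each time,
# until only 3 features remain.
selector = RFE(estimator=LogisticRegression(max_iter=1000), n_features_to_select=3)
selector.fit(X, y)
print(selector.support_)  # boolean mask of the selected features
```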
Feature Selection Methods - Embedded
Embedded methods consider feature selection as a part of model construction.

The most common type of embedded feature selection method is the
regularization method. Regularization methods are also called penalization
methods: they introduce additional constraints into the optimization of a
predictive algorithm that bias the model toward lower complexity and reduce
the number of features.

Procedure of an embedded method: traverse all features → generate a feature
subset → train models and evaluate the effect → select the optimal feature
subset.

Common methods
• Lasso regression
• Ridge regression
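
A minimal sketch of the first common method, L1 (Lasso) regularization, which zeroes out the coefficients of weak features during training (synthetic data; scikit-learn assumed):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = 5 * X[:, 0] + 2 * X[:, 2] + rng.normal(scale=0.1, size=200)

# The L1 penalty drives uninformative coefficients to exactly zero,
# so feature selection happens as part of model construction.
model = Lasso(alpha=0.1).fit(X, y)
print(np.round(model.coef_, 2))  # expected: non-zero weights only for features 0 and 2
```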
Model Validity
Generalization capability: The goal of machine learning is that the model
obtained after learning should perform well on new samples, not just on the
samples used for training. The capability of applying a model to new samples
is called generalization (or robustness).

Error: the difference between the sample result predicted by the model
obtained after learning and the actual sample result.
Training error: the error that you get when you run the model on the
training data.
Generalization error: the error that you get when you run the model on new
samples.

Underfitting: occurs when the model or the algorithm does not fit the data
well enough.
Model Validity(Cont.)

Model capacity: the model's capability of fitting functions, also called
model complexity.
When the capacity suits the task complexity and the amount of training data
provided, the algorithm effect is usually optimal.
Models with insufficient capacity cannot solve complex tasks, and underfitting
may occur.
A high-capacity model can solve complex tasks, but overfitting may occur if
the capacity is higher than that required by the task.
Model Complexity and Error

As the model complexity increases, the training error decreases.
As the model complexity increases, the test error decreases to a certain point
and then increases, forming a convex curve.

[Figure: error vs. model complexity — the training error decreases
monotonically while the testing error is convex; the left side shows high bias
& low variance, the right side low bias & high variance.]
Machine Learning Performance Evaluation - Regression
Machine Learning Performance Evaluation - Classification

Confusion matrix: rows give the actual class (yes/no), columns give the
estimated class (yes/no), with row and column totals.
Machine Learning Performance Evaluation - Classification (Given)
Measurement ratios:

Accuracy and recognition rate
Error rate and misclassification rate
Sensitivity, true positive rate, and recall
Specificity and true negative rate
Precision
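
These ratios were given as formulas on the original slide; the standard definitions, writing TP, TN, FP, and FN for the confusion-matrix counts (with P = TP + FN actual positives and N = FP + TN actual negatives), are:

Accuracy (recognition rate) = (TP + TN) / (P + N)
Error rate (misclassification rate) = (FP + FN) / (P + N) = 1 − Accuracy
Sensitivity (true positive rate, recall) = TP / P
Specificity (true negative rate) = TN / N
Precision = TP / (TP + FP)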
Parameters and Hyperparameters in Models

Parameters are automatically learned by models.
Hyperparameters are manually set.

Hyperparameter Search Procedures
Grid search
Random search
Heuristic intelligent search
Bayesian search
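
As an illustration of the first procedure, grid search, here is a minimal sketch using scikit-learn's GridSearchCV (the model, parameter grid, and synthetic data are illustrative choices, not from the slides):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, random_state=0)

# Every combination in the hyperparameter grid is tried with cross-validation.
grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}
search = GridSearchCV(SVC(), param_grid=grid, cv=5)
search.fit(X, y)
print(search.best_params_)  # hyperparameters of the best-scoring combination
```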
Linear Regression and Overfitting Prevention
Logistic Regression Extension - Softmax Function

Logistic regression applies only to binary classification problems.
For multi-class classification problems, use the Softmax function.

[Figure: a binary classification problem (male? female?) vs. a multi-class
classification problem (grape? orange? apple? banana?).]
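
A minimal sketch (the class scores are made up) of the Softmax function itself, which maps one score per class to a probability distribution over the classes:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))   # subtract the max for numerical stability
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1, -1.0])  # e.g. grape, orange, apple, banana
print(softmax(scores))   # probabilities summing to 1; highest for the first class
```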
Decision Tree Structure

[Figure: a decision tree with a root node at the top, internal nodes below it,
and leaf nodes at the bottom.]
Key Points of Decision Tree Construction
The metrics to quantify 'purity':
Information entropy (non-binary tree): entropy is the measure of the disorder of a feature.
Gini index (binary tree)
Information gain: the feature with the highest information gain is selected as the best one.
Decision Tree Construction Process

Feature selection: Select a feature from the features of the training data as
the split standard of the current node. (Different standards generate
different decision tree algorithms.)
Decision tree generation: Generate internal nodes top-down based on the
selected features, and stop when the dataset can no longer be split.
Pruning: The decision tree may easily become overfitting unless necessary
pruning (including pre-pruning and post-pruning) is performed to reduce the
tree size and optimize its node structure.
Examples

Mention the different types of learning algorithms.
Supervised Learning, Unsupervised Learning, Reinforcement Learning, Semi-supervised Learning

State the hyperparameter search procedures.
Grid search, Random search, Heuristic intelligent search, Bayesian search
Examples(Cont.)
What is the correct order of the following tasks in the machine learning process?
Data collection → Data cleansing → Feature extraction → Feature selection → Model training → Model evaluation → Model deployment & integration

It is an area of machine learning concerned with how agents ought to take actions in an environment to maximize some notion of cumulative reward …
Reinforcement learning
Examples(Cont.)
A science that studies how computers simulate or implement human learning behavior to acquire new knowledge …
Machine Learning

A machine learning model that obtains an optimal model with required performance through training and learning based on samples of known categories …
Supervised learning

A machine learning model that automatically uses a large amount of unlabeled data to assist the learning of a small amount of labeled data …
Semi-supervised learning
Example of Machine Learning Performance Evaluation

              Estimated: yes | Estimated: no | Total
Actual: yes   140            | 30            | 170
Actual: no    20             | 10            | 30
Total         160            | 40            | 200
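
Applying the measurement ratios listed earlier, and taking 'yes' as the positive class (so TP = 140, FN = 30, FP = 20, TN = 10), the worked values are:

Accuracy = (140 + 10) / 200 = 0.75
Error rate = (30 + 20) / 200 = 0.25
Sensitivity (recall) = 140 / 170 ≈ 0.824
Specificity = 10 / 30 ≈ 0.333
Precision = 140 / 160 = 0.875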


Linear Regression
Linear Regression (cont.)
Initial values: a = 1, b = 1
x = [3, 21, 22, 34, 54]
y = [1, 10, 14, 34, 44]
Model: ŷ = ax + b, so the initial prediction is ŷ = x + 1.

x  | y  | ŷ (prediction)
3  | 1  | 4
21 | 10 | 22
22 | 14 | 23
34 | 34 | 35
54 | 44 | 55
Linear Regression (cont.)

x  | y  | ŷ  | y − ŷ | (y − ŷ)²
3  | 1  | 4  | −3    | 9
21 | 10 | 22 | −12   | 144
22 | 14 | 23 | −9    | 81
34 | 34 | 35 | −1    | 1
54 | 44 | 55 | −11   | 121

Sum of squared errors = 356
Loss Function = 356 / 2 = 178
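
A quick check of this worked example; the division by 2 matches the common (1/2)·Σ(ŷ − y)² convention for the squared-error loss used above:

```python
# A minimal check of the worked example: half the sum of squared errors.
x = [3, 21, 22, 34, 54]
y = [1, 10, 14, 34, 44]
a, b = 1, 1                                   # initial parameter values

y_pred = [a * xi + b for xi in x]             # y^ = ax + b = x + 1
loss = sum((yp - yi) ** 2 for yp, yi in zip(y_pred, y)) / 2
print(loss)                                   # 178.0
```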


DECISION TREE
Supervised learning
Entropy(S) = − ∑ p(I) · log2 p(I)   (Given)
ID3 Algorithm: Gain(S, A) = Entropy(S) − ∑ [ p(S|A) · Entropy(S|A) ]

Overall Entropy
The Decision column consists of 14 instances and includes two labels: yes and no.
There are 9 decisions labeled yes and 5 decisions labeled no.
Entropy(Decision) = − p(Yes) · log2 p(Yes) − p(No) · log2 p(No)
Entropy(Decision) = − (9/14) · log2(9/14) − (5/14) · log2(5/14) = 0.940
ID3 Algorithm (cont.)

Wind factor on decision
Gain(Decision, Wind) = Entropy(Decision) − ∑ [ p(Decision|Wind) · Entropy(Decision|Wind) ]

The Wind attribute has two labels, weak and strong, so:
Gain(Decision, Wind) = Entropy(Decision) − [ p(Decision|Wind=Weak) · Entropy(Decision|Wind=Weak) ] − [ p(Decision|Wind=Strong) · Entropy(Decision|Wind=Strong) ]
ID3 Algorithm (cont.)

There are 8 instances for weak wind: 2 decisions are no, whereas 6 are yes.
Entropy(Decision|Wind=Weak) = − p(No) · log2 p(No) − p(Yes) · log2 p(Yes)
Entropy(Decision|Wind=Weak) = − (2/8) · log2(2/8) − (6/8) · log2(6/8) = 0.811

There are 6 instances for strong wind: the decisions are divided into two equal parts.
Entropy(Decision|Wind=Strong) = − p(No) · log2 p(No) − p(Yes) · log2 p(Yes)
Entropy(Decision|Wind=Strong) = − (3/6) · log2(3/6) − (3/6) · log2(3/6) = 1
ID3 Algorithm (cont.)

Gain(Decision, Wind) = Entropy(Decision) − [ p(Decision|Wind=Weak) · Entropy(Decision|Wind=Weak) ] − [ p(Decision|Wind=Strong) · Entropy(Decision|Wind=Strong) ]
Gain(Decision, Wind) = 0.940 − [ (8/14) · 0.811 ] − [ (6/14) · 1 ] = 0.048


ID3 Algorithm (cont.)

Gain(Decision, Wind) = 0.048
Gain(Decision, Outlook) = 0.246
Gain(Decision, Humidity) = 0.151
Outlook has the highest information gain, so it is selected as the split
feature for the root node.
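
A minimal sketch that reproduces the worked numbers above (the entropy and gain functions follow the formulas given earlier):

```python
from math import log2

def entropy(counts):
    """Entropy(S) = -sum p(I) * log2 p(I) over the class counts."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

e_decision = entropy([9, 5])                         # 9 yes, 5 no -> 0.940
e_weak, e_strong = entropy([6, 2]), entropy([3, 3])  # wind=weak, wind=strong

gain_wind = e_decision - (8 / 14) * e_weak - (6 / 14) * e_strong
print(round(e_decision, 3), round(gain_wind, 3))     # 0.94 0.048
```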
Example 2: Dataset for Heart Disease Risk

Calculate the information gain of Gender.
Gain(S, A) = Entropy(S) − ∑ [ p(S|A) · Entropy(S|A) ]

The following formula can be used to calculate the total entropy:
E = − ∑ p(I) · log2 p(I)
So, in this example, the total entropy would be:
E = − (5/14) · log2(5/14) − (9/14) · log2(9/14) = 0.94

Next, we take each column and calculate the gain.
Starting with Gender, we look at the rows for Male and Female, calculate the
entropy of each over "Yes" and "No" for Gender: Female (6/14) and Gender:
Male (8/14), and subtract from the total entropy that we calculated
previously.

Gain(S, Gender) = Total Entropy − ( 6/14 × Entropy(Female) ) − ( 8/14 × Entropy(Male) ) = 0.048
