
# Overview of Machine Learning

Tusty Nadia Maghfira
Purpose of Machine Learning

▰ As human needs grow over time, so does the amount of data stored on computers.
Computers used in public services such as shops, banks, hospitals, and laboratories
capture and store terabytes of data every day. We can organize and categorize this data
to extract important information from it, but doing so by hand is difficult and
time-consuming when the data is this plentiful. Instead, we can use automated
computational methods from machine learning to help us manage the data.
1. Analyze data and make predictions about the future
2. Develop efficient and robust algorithms for handling massive, very high-dimensional
data that statistical methods cannot process efficiently

1. Artificial Neural Network

Basic Concept of Artificial Neural Network

▰ A neural network works much like the neurons in the human brain
Basic Concept of Artificial Neural Network

▰ Input features are transmitted to the current neuron through weighted connections
▰ The strength of the signal is the weighted sum of the inputs:
▰ $h = \sum_{i=1}^{m} w_i x_i$
▰ To decide whether the neuron fires, we pass the result through an activation function, for example:
▰ $o = g(h) = \begin{cases} 1 & \text{if } h > \theta \\ 0 & \text{if } h \le \theta \end{cases}$
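
The weighted sum and threshold test above can be sketched in a few lines of Python. This is only an illustration: the function name `neuron_output` and the input values are hypothetical, chosen to show one neuron that fires.

```python
def neuron_output(x, w, theta):
    """Return 1 if the weighted sum of the inputs exceeds theta, else 0."""
    # h = sum_i w_i * x_i (signal strength arriving at the neuron)
    h = sum(w_i * x_i for w_i, x_i in zip(w, x))
    # Threshold activation: fire only when h is strictly greater than theta
    return 1 if h > theta else 0


# Hypothetical inputs and weights, for illustration only.
x = [1.0, 0.0, 1.0]
w = [0.5, 0.9, 0.3]
print(neuron_output(x, w, theta=0.7))
```

Here the weighted sum is 0.5 + 0.3 = 0.8, which is above the threshold 0.7, so the neuron fires.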

Activation Function

▰ Step Function
▰ Linear Function
▰ Sigmoid Function
▰ TanH
▰ ReLU
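
As a rough sketch, the five activation functions listed above can be written with the Python standard library as follows. The function names and the default threshold `theta=0.0` for the step function are assumptions made for this example.

```python
import math


def step(h, theta=0.0):
    """Step function: 1 above the threshold, 0 otherwise."""
    return 1.0 if h > theta else 0.0


def linear(h):
    """Linear (identity) function: passes the signal through unchanged."""
    return h


def sigmoid(h):
    """Sigmoid: squashes h into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-h))


def tanh(h):
    """Hyperbolic tangent: squashes h into the open interval (-1, 1)."""
    return math.tanh(h)


def relu(h):
    """Rectified linear unit: zero for negative h, identity otherwise."""
    return max(0.0, h)


for g in (step, linear, sigmoid, tanh, relu):
    print(g.__name__, [round(g(h), 3) for h in (-2.0, 0.0, 2.0)])
```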

Perceptron

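▰ Consists of an input layer and an output layer
▰ Binary linear classifier
▰ Uses the step function as its activation
▰ Rosenblatt’s perceptron rule:
▰ 1. Initialize the weights to 0 or small random numbers
▰ 2. For each training sample, compute $h = \sum_{i=1}^{m} w_i x_i + b$ and the output $o$, then update each weight:
▰ $w_i = w_i + \Delta w_i$
▰ $\Delta w_i = \eta (t - o) x_i$, where $\eta$ is the learning rate and $t$ is the target output
▰ $o = \begin{cases} 1 & \text{if } h > \theta \\ -1 & \text{otherwise} \end{cases}$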
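
A minimal sketch of the perceptron rule above, trained on the logical AND function with targets in {-1, 1}. Several details are assumptions made for this example and do not come from the slides: the threshold is fixed at `theta = 0`, the bias is updated with the same rule as the weights (treating it as a weight on a constant input of 1), and the names `predict`, `train_perceptron`, and `and_data` are hypothetical.

```python
def predict(x, w, b, theta=0.0):
    """Step activation on h = sum_i w_i * x_i + b, with outputs in {-1, 1}."""
    h = sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
    return 1 if h > theta else -1


def train_perceptron(samples, eta=0.1, epochs=50):
    """Apply the perceptron rule to (inputs, target) pairs."""
    w = [0.0] * len(samples[0][0])  # step 1: initialize the weights to 0
    b = 0.0
    for _ in range(epochs):
        for x, t in samples:
            o = predict(x, w, b)
            # delta_w_i = eta * (t - o) * x_i; it is zero when o equals t
            w = [w_i + eta * (t - o) * x_i for w_i, x_i in zip(w, x)]
            # Assumption: the bias follows the same rule with input 1
            b += eta * (t - o)
    return w, b


# Logical AND, a linearly separable problem
and_data = [([0, 0], -1), ([0, 1], -1), ([1, 0], -1), ([1, 1], 1)]
w, b = train_perceptron(and_data)
for x, t in and_data:
    print(x, "target:", t, "output:", predict(x, w, b))
```

Because AND is linearly separable, the rule settles on a separating line after a few passes over the data. It would not converge on a problem such as XOR.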
Multi Layer Perceptron – Forward Propagation

Multi Layer Perceptron – Error Backpropagation

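▰ $E(w) = \sum_{n=1}^{N} E_n(w)$
▰ $e_n = \frac{1}{2}(y_n - t_n)^2$
▰ For example, we want to know the partial derivative of the total error with respect to a hidden-layer weight:
▰ $\frac{\partial E_n}{\partial w_{kj}} = \frac{\partial E_n}{\partial a_k} \frac{\partial a_k}{\partial w_{kj}}$
▰ $\frac{\partial E_n}{\partial w_{kj}} = \frac{\partial E_n}{\partial e_k} \frac{\partial e_k}{\partial y_k} \frac{\partial y_k}{\partial a_k} \frac{\partial a_k}{\partial w_{kj}}$
▰ After we get the result, update the weights and biases using the stochastic gradient descent update:
▰ $w_{kj} = w_{kj} - \Delta w_{kj}$
▰ $\Delta w_{kj} = \eta \frac{\partial E_n}{\partial w_{kj}}$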
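
The chain rule above can be traced factor by factor for a single output unit $k$. The sketch below makes several assumptions that the slides do not state: the output activation is a sigmoid, $a_k = \sum_j w_{kj} z_j$ where $z_j$ are the hidden-unit outputs, and $E_n$ is the sum of the per-unit errors $e_k$, so that $\partial E_n / \partial e_k = 1$. All names and numeric values are hypothetical.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def forward(w_k, z):
    """Output y_k = sigmoid(a_k), with a_k = sum_j w_kj * z_j."""
    a_k = sum(w_kj * z_j for w_kj, z_j in zip(w_k, z))
    return sigmoid(a_k)


def error(w_k, z, t_k):
    """Squared error e_k = 1/2 * (y_k - t_k)^2."""
    return 0.5 * (forward(w_k, z) - t_k) ** 2


def gradient(w_k, z, t_k):
    """dE_n/dw_kj for every j, as a product of the four chain-rule factors."""
    y_k = forward(w_k, z)
    dE_de = 1.0                # E_n is assumed to be a plain sum of the e_k
    de_dy = y_k - t_k          # derivative of 1/2 * (y_k - t_k)^2
    dy_da = y_k * (1.0 - y_k)  # derivative of the sigmoid
    return [dE_de * de_dy * dy_da * z_j for z_j in z]  # da_k/dw_kj = z_j


def sgd_step(w_k, z, t_k, eta=0.5):
    """w_kj = w_kj - eta * dE_n/dw_kj."""
    grad = gradient(w_k, z, t_k)
    return [w_kj - eta * g for w_kj, g in zip(w_k, grad)]


# Hypothetical weights, hidden outputs, and target
w_k, z, t_k = [0.1, -0.2], [0.5, 0.8], 1.0
print("error before:", error(w_k, z, t_k))
print("error after: ", error(sgd_step(w_k, z, t_k), z, t_k))
```

A single update with these values moves the output toward the target, so the second printed error is smaller than the first.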
2. Restricted Boltzmann Machines

Restricted Boltzmann Machine

▰ $E(v, h) = -\sum_{i} a_i v_i - \sum_{j} b_j h_j - \sum_{i,j} v_i h_j w_{ij}$
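
The energy function above can be evaluated directly as three sums. In this sketch `v` and `h` are lists of binary unit states, `a` and `b` are the visible and hidden biases, and `w[i][j]` is the weight between visible unit `i` and hidden unit `j`. The function name and the example values are hypothetical.

```python
def rbm_energy(v, h, a, b, w):
    """E(v, h) = -sum_i a_i v_i - sum_j b_j h_j - sum_{i,j} v_i h_j w_ij."""
    visible_term = sum(a_i * v_i for a_i, v_i in zip(a, v))
    hidden_term = sum(b_j * h_j for b_j, h_j in zip(b, h))
    interaction = sum(
        v[i] * h[j] * w[i][j]
        for i in range(len(v))
        for j in range(len(h))
    )
    return -visible_term - hidden_term - interaction


# Two visible units, one hidden unit (illustrative values)
print(rbm_energy(v=[1, 0], h=[1], a=[0.5, 0.5], b=[0.25], w=[[2.0], [3.0]]))
```

With these values only the first visible unit is on, so the energy is -(0.5) - (0.25) - (2.0) = -2.75.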
Training : Gibbs Sampling

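▰ $p(h_j = 1 \mid v) = \frac{1}{1 + e^{-(b_j + \sum_i w_{ij} v_i)}} = \sigma\left(b_j + \sum_i w_{ij} v_i\right)$
▰ $p(v_i = 1 \mid h) = \frac{1}{1 + e^{-(a_i + \sum_j w_{ij} h_j)}} = \sigma\left(a_i + \sum_j w_{ij} h_j\right)$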
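
One Gibbs sampling step alternates between the two conditional distributions above: sample the hidden units given the visible units, then sample the visible units given those hidden states. The sketch below is one way to write this with the standard library; the helper names, the seeded `random.Random` generator, and the example parameters are assumptions made for illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def p_h_given_v(v, b, w):
    """p(h_j = 1 | v) = sigmoid(b_j + sum_i w_ij * v_i) for every hidden unit j."""
    return [
        sigmoid(b[j] + sum(w[i][j] * v[i] for i in range(len(v))))
        for j in range(len(b))
    ]


def p_v_given_h(h, a, w):
    """p(v_i = 1 | h) = sigmoid(a_i + sum_j w_ij * h_j) for every visible unit i."""
    return [
        sigmoid(a[i] + sum(w[i][j] * h[j] for j in range(len(h))))
        for i in range(len(a))
    ]


def sample(probs, rng):
    """Draw a binary state for each unit from its Bernoulli probability."""
    return [1 if rng.random() < p else 0 for p in probs]


def gibbs_step(v, a, b, w, rng):
    """One v -> h -> v' step; returns the new visible and hidden states."""
    h = sample(p_h_given_v(v, b, w), rng)
    v_new = sample(p_v_given_h(h, a, w), rng)
    return v_new, h


rng = random.Random(0)  # seeded only so that repeated runs match
a, b = [0.0, 0.0, 0.0], [0.0, 0.0]
w = [[0.5, -0.5], [1.0, 0.0], [-1.0, 0.5]]
print(gibbs_step([1, 0, 1], a, b, w, rng))
```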
Training: Contrastive Divergence

▰ $\Delta W = v_0 \otimes p(h_0 \mid v_0) - v_k \otimes p(h_k \mid v_k)$
▰ $W_{new} = W_{old} + \Delta W$
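
A hedged sketch of a contrastive divergence (CD-k) weight update following the two formulas above: run k Gibbs steps starting from the data vector $v_0$ to obtain $v_k$, then take the difference of the two outer products. The helper functions are repeated so the block runs on its own. The function names are hypothetical, and the `lr` scaling factor is an assumption added here; its default of 1.0 reproduces the update exactly as written on the slide.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def p_h_given_v(v, b, w):
    return [sigmoid(b[j] + sum(w[i][j] * v[i] for i in range(len(v))))
            for j in range(len(b))]


def p_v_given_h(h, a, w):
    return [sigmoid(a[i] + sum(w[i][j] * h[j] for j in range(len(h))))
            for i in range(len(a))]


def sample(probs, rng):
    return [1 if rng.random() < p else 0 for p in probs]


def cd_k_update(v0, a, b, w, k=1, rng=None, lr=1.0):
    """Return (w_new, delta_w) after one CD-k update from data vector v0."""
    rng = rng or random.Random(0)
    ph0 = p_h_given_v(v0, b, w)          # p(h_0 | v_0), the positive phase
    vk = list(v0)
    for _ in range(k):                   # k steps of Gibbs sampling
        hk = sample(p_h_given_v(vk, b, w), rng)
        vk = sample(p_v_given_h(hk, a, w), rng)
    phk = p_h_given_v(vk, b, w)          # p(h_k | v_k), the negative phase
    # delta_W = v_0 (outer) p(h_0|v_0) - v_k (outer) p(h_k|v_k)
    delta = [[lr * (v0[i] * ph0[j] - vk[i] * phk[j]) for j in range(len(b))]
             for i in range(len(v0))]
    w_new = [[w[i][j] + delta[i][j] for j in range(len(b))]
             for i in range(len(v0))]
    return w_new, delta


a, b = [0.0, 0.0, 0.0], [0.0, 0.0]
w = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
w_new, delta = cd_k_update([1, 0, 1], a, b, w, k=1)
print(delta)
```

In practice the update is usually scaled by a small learning rate and averaged over a batch of training vectors, which is why `lr` is exposed as a parameter.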

Performance Measurement

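▰ Accuracy: $\frac{TP + TN}{TP + FP + FN + TN}$
▰ Sensitivity / Recall: $\frac{TP}{TP + FN}$
▰ Specificity: $\frac{TN}{TN + FP}$
▰ Precision: $\frac{TP}{TP + FP}$

Confusion matrix (rows are predictions, columns are actual classes):

| Prediction \ Actual | Relevant | Nonrelevant |
|---|---|---|
| Retrieved | True Positive (TP) | False Positive (FP) |
| Not retrieved | False Negative (FN) | True Negative (TN) |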
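
The four measures can be computed directly from the confusion-matrix counts. The sketch below uses the standard definition of accuracy, (TP + TN) divided by the total number of samples. The function name and the counts are made up for illustration.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, recall, specificity, and precision from raw counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,  # share of all predictions that are correct
        "recall": tp / (tp + fn),       # sensitivity: relevant items that were retrieved
        "specificity": tn / (tn + fp),  # nonrelevant items correctly left out
        "precision": tp / (tp + fp),    # retrieved items that are relevant
    }


# Hypothetical counts: 40 TP, 10 FP, 20 FN, 30 TN
for name, value in classification_metrics(tp=40, fp=10, fn=20, tn=30).items():
    print(f"{name}: {value:.3f}")
```

The function assumes every denominator is nonzero; a classifier that never retrieves anything, for example, would need a guard for precision.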
Thank you
