
Machine Learning and Deep Learning Applications in Design Automation and Practical Issues
Haoxing (Mark) Ren, DAC’19 Tutorial
OUTLINE

• Classical machine learning applications


• Deep learning applications
• Supervised learning: CNN, FCN, GCN
• Unsupervised learning and Reinforcement learning
• Practical Issues
• Feature selection
• Model selection
• Data imbalance
• Conclusions
CLASSICAL ML MODELS
Linear Regression, Support Vector Machine, Decision Tree, Neural Network

ENSEMBLE MODELS

Decision Tree, Random Forest, XGBoost


https://quantdare.com/what-is-the-difference-between-bagging-and-boosting/
ML FOR DRC PREDICTION
Window based

GR-based DRV prediction

ISPD’17: SVM-based DRV prediction
Routability Optimization In Sub14nm Technologies, ISPD’17


ML TIMER CORRELATION

Hierarchical ML Models: LSQR, ANN, SVM, RF

A deep learning methodology to proliferate golden signoff timing, DATE’14


ML LEGALIZATION CLASSIFICATION
Classify partition legalization max displacement for any arbitrary region

Circuit row density histogram Circuit row overlap histogram

How Deep Learning Can Drive Physical Synthesis Towards More Predictable Legalization, ISPD’19
ML DESIGN SPACE EXPLORATION
Intelligently explore a large design space to find the optimal target

Use a Random Forest model to predict HLS QoR

On Learning-Based Methods for Design-Space Exploration with High-Level Synthesis, DAC’13


THE RISE OF DEEP LEARNING

Source: aiindex.org
CLASSICAL MACHINE LEARNING VS DEEP LEARNING
No need for feature engineering

Figure: performance vs. amount of data, with one curve for deep learning and one for classical machine learning
https://arxiv.org/ftp/arxiv/papers/1803/1803.01164.pdf
LEARNING PARADIGMS

IMAGE CONVOLUTION
Convolution aggregates information from neighboring pixels

Figure labels: learned weights, feature map
Panels: Fully Connected, Convolution
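
As a minimal sketch of how a convolution aggregates information from neighboring pixels, the function below slides a small kernel over an image (no padding, stride 1). The averaging kernel and the 4 x 4 image are illustrative assumptions; in a CNN the kernel values would be learned weights.

```python
def conv2d(image, kernel):
    """2D convolution as used in CNNs (cross-correlation), no padding, stride 1."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = [[0.0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            # Each output pixel is a weighted sum of a kh x kw neighborhood.
            out[i][j] = sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh)
                for dj in range(kw)
            )
    return out


# Illustrative input: a 4 x 4 image and a 2 x 2 averaging kernel (assumed values).
image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
kernel = [[0.25, 0.25], [0.25, 0.25]]
feature_map = conv2d(image, kernel)
```

The same kernel is reused at every position, which is what distinguishes the convolution panel from the fully connected one.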


CONVOLUTIONAL NEURAL NETWORK

LeNet (Yann LeCun 1998)

CNN VARIATIONS

ShuffleNet V2

https://arxiv.org/pdf/1810.00736.pdf

POWER PREDICTION WITH MACHINE LEARNING

Design abstraction levels: SystemC, RTL, gate-level netlist
Trade-off: slow (10-100 cycles/s) with high accuracy vs. fast (1k-10k cycles/s) with low accuracy
Goal: predict gate-level power with RTL or SystemC register traces

PRIMAL: Power Inference using Machine Learning (DAC’19)


POWER PREDICTION WITH CNN

Fast & accurate ML: 1K – 50K inferences/s

Reg-to-pixel mapping: registers (A, B, C, D, E, ...) are mapped to pixels, with one channel per switching state
C0: non-switching
C1: 0→1
C2: 1→0

Model: ShuffleNet V2
PRIMAL: Power Inference using Machine Learning (DAC’19)
POWER PREDICTION WITH CNN
DL has better accuracy than ML for large design

Register & I/O counts across the benchmark designs: 160, 384, 405, 438, 381, 372, 5651, 23531

PRIMAL: Power Inference using Machine Learning, DAC’19
FULLY CONVOLUTIONAL NETWORK (FCN)
No Fully Connected Layers

Upsampling/Transposed Convolution
(Blue input, green output)
Fully Convolutional Networks for Semantic Segmentation, CVPR’15

DRC HOTSPOT PREDICTION WITH FCN

Output: DRC hotspot map

Input tensor constructed by stacking 2D features:
(1) pin density, (2) macro, (3) long-range RUDY, (4) RUDY pins

ROUTENET: Routability Prediction for Mixed-Size Designs Using Convolutional Neural Network, ICCAD’18
ROUTENET MODEL

Pixel-wise loss function

ROUTENET: Routability Prediction for Mixed-Size Designs Using Convolutional Neural Network, ICCAD’18
DRC HOTSPOT DETECTION EVALUATION

Panels: Window-based ML, Ground Truth, RouteNet

ROUTENET: Routability Prediction for Mixed-Size Designs Using Convolutional Neural Network, ICCAD’18
GRAPH CONVOLUTIONAL NETWORK (GCN)
GCN aggregates information from neighboring nodes
$F^{(l+1)} = \sigma(A F^{(l)} W^{(l)})$

Aggregation (mean, sum)
Encoding ($\mathbb{R}^m \to \mathbb{R}^n$, ReLU)

Semi-Supervised Classification with Graph Convolutional Networks, ICLR’17
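
A minimal sketch of one GCN layer, aggregation followed by encoding, in plain Python. It assumes mean aggregation, implemented as a row-normalized adjacency matrix with self-loops, and uses ReLU as the nonlinearity; the star graph, feature values, and weight matrix are made up for illustration.

```python
def matmul(X, Y):
    """Plain matrix product of two lists-of-lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def relu(X):
    return [[max(0.0, v) for v in row] for row in X]


def mean_adjacency(edges, n):
    """Row-normalized adjacency with self-loops (assumed mean aggregation)."""
    A = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for u, v in edges:
        A[u][v] = A[v][u] = 1.0
    return [[a / sum(row) for a in row] for row in A]


def gcn_layer(A, F, W):
    """One layer: aggregate neighbor features (A @ F), then encode (@ W, ReLU)."""
    return relu(matmul(matmul(A, F), W))


# Illustrative star graph: node 0 connected to nodes 1, 2, 3.
A = mean_adjacency([(0, 1), (0, 2), (0, 3)], n=4)
F = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0], [0.0, -4.0]]  # 4 nodes, 2 features each
W = [[1.0, -1.0], [1.0, 1.0]]                          # encodes R^2 -> R^2
F_next = gcn_layer(A, F, W)
```

Stacking k such layers lets each node see information from its k-hop neighborhood.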


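GCN EXAMPLE
Computing the embedding of node 1 with two layers (aggregation followed by encoding in each layer).

1st Layer
Aggregation: each node's neighborhood features [4 x 4] are aggregated into [1 x 4]
  Node 1: neighborhood {1, 2, 3, 4}
  Node 2: neighborhood {1, 2, 5, 6}
  Node 3: neighborhood {1, 3, 7, 8}
  Node 4: neighborhood {1, 4, 8, 9}
Encoding: [1 x 4] → [1 x 32] for each of nodes 1, 2, 3, 4

2nd Layer
Aggregation: the [1 x 32] embeddings of nodes 1, 2, 3, 4 are aggregated into [1 x 32] for node 1
Encoding: [1 x 32] → [1 x 64] for node 1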
GCN BASED TESTABILITY PREDICTION
Input node features ($\mathbb{R}^4$): logic level, SCOAP_C0, SCOAP_C1, SCOAP_OB

Layer 1: weighted sum & ReLU ($\mathbb{R}^4 \to \mathbb{R}^{32}$)
Layer 2: weighted sum & ReLU ($\mathbb{R}^{32} \to \mathbb{R}^{64}$)
Layer 3: weighted sum & ReLU ($\mathbb{R}^{64} \to \mathbb{R}^{128}$)
Fully connected layers: (64, 64, 128, 2)

High Performance Graph Convolutional Networks with Applications in Testability Analysis, DAC’19
TESTABILITY PREDICTION ACCURACY
Figures: testing accuracy (%) over epochs for different K (K: # GCN layers); precision, recall, F1 score, and accuracy

• Test point insertion reduced by 11% over TetraMax.


• Graph based model well-suited for EDA problems.
High Performance Graph Convolutional Networks with Applications in Testability Analysis, DAC’19
UNSUPERVISED LEARNING
• Supervised learning learns to predict y from x, typically with maximum likelihood

• Unsupervised learning: model density, do maximum likelihood on the data instead of the targets

• Generative models

Unsupervised Learning Tutorial, NIPS’18


GENERATIVE MODELS
Autoregressive Autoencoder

GAN

Image credit: PixelRNN (CVPR’16), https://skymind.ai/wiki/generative-adversarial-network-gan


GAN BASICS

Hung-Yi Lee, Generative Adversarial Networks, 2018


GAN-OPC
Design target

Mask with OPC

ILT

Wafer

GAN-OPC : mask optimization with lithography-guided generative adversarial nets, DAC’18


LithoGAN
CGAN

Mask Photo Resist

LithoGAN: End-to-End Lithography Modeling with Generative Adversarial Networks, DAC’19
REINFORCEMENT LEARNING BACKGROUND

• Reward $R(t)$: score you earned at the current step

• State $S$: current screen

• Action $a$: move your board left / right

• Action value function $\hat{Q}(S, a)$: your predicted future total rewards

• Policy $\pi(s)$: how to choose your action
REINFORCEMENT LEARNING CATEGORIES

Value-based: learn the Q function $\hat{Q}(S, a)$ (DQN)
• Simple action policy
• Discrete action space
• Sample efficient

Policy-based: learn the action policy $\pi(s)$ (Policy Gradient)
• Stochastic action
• Continuous space
• Sample inefficient

Actor + Critic (A2C, A3C)
LEARNING Q FUNCTION
Q Learning with Temporal Difference

Deep neural network (DQN)
Hung-Yi Lee, Deep Reinforcement Learning, 2018
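
A minimal sketch of the temporal-difference update behind Q learning, using a lookup table instead of a deep network. The step size `alpha`, discount `gamma`, and the two-step toy episode are assumptions for illustration; a DQN would replace the table with a neural network trained toward the same target.

```python
from collections import defaultdict


def td_update(Q, s, a, r, s_next, actions, alpha=0.5, gamma=0.9, done=False):
    """Move Q(s, a) toward the TD target r + gamma * max_a' Q(s_next, a')."""
    best_next = 0.0 if done else max(Q[(s_next, b)] for b in actions)
    target = r + gamma * best_next
    Q[(s, a)] += alpha * (target - Q[(s, a)])
    return Q[(s, a)]


# Toy episode (assumed): moving right from state 1 ends the game with reward 1.
Q = defaultdict(float)
actions = ["left", "right"]
q_last = td_update(Q, s=1, a="right", r=1.0, s_next=2, actions=actions, done=True)
q_first = td_update(Q, s=0, a="right", r=0.0, s_next=1, actions=actions)
```

The second update shows how the reward propagates backward: state 0 earns value only because its successor state already has a nonzero Q estimate.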
POLICY GRADIENT

Hung-Yi Lee, Deep Reinforcement Learning, 2018


DQN FOR COMBINATORIAL OPTIMIZATION
Replacing heuristics

Minimum Vertex Cover (MVC)
Select a vertex to insert into the cover one at a time

RL Formulation
• Reward: -1 per selected vertex (cost of the vertex cover)
• State: current selected nodes, use GCN to learn graph state
• Action: which node to select
• Q function: use DQN to learn which node has the highest value
• Policy: $\varepsilon$-greedy

Learning Combinatorial Optimization Algorithms over Graphs, NIPS’17
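
The formulation above can be sketched as a construction loop that adds one vertex at a time under an ε-greedy policy. This is only an illustration: `degree_q` is a hand-written stand-in for the learned Q function (it counts uncovered edges at a vertex), not the DQN from the cited paper, and the example graph is made up.

```python
import random


def build_vertex_cover(edges, q_value, epsilon=0.0, seed=0):
    """Add vertices one at a time until every edge is covered."""
    rng = random.Random(seed)
    uncovered = set(map(frozenset, edges))
    cover, total_reward = [], 0
    while uncovered:
        candidates = sorted({v for e in uncovered for v in e})
        if rng.random() < epsilon:
            v = rng.choice(candidates)  # explore: random vertex
        else:
            v = max(candidates, key=lambda c: q_value(c, uncovered))  # exploit
        cover.append(v)
        total_reward -= 1  # reward of -1 per selected vertex
        uncovered = {e for e in uncovered if v not in e}
    return cover, total_reward


def degree_q(v, uncovered):
    """Stand-in for a learned Q function: uncovered edges touching v."""
    return sum(1 for e in uncovered if v in e)


edges = [(0, 1), (0, 2), (0, 3), (3, 4)]
cover, reward = build_vertex_cover(edges, degree_q)  # epsilon=0: pure exploitation
```

With `epsilon=0` the loop reduces to the classic max-degree heuristic, which is the kind of hand-crafted rule a learned Q function is meant to replace.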


DQN WITH DEEP NODE REPRESENTATIONS

Learning Combinatorial Optimization Algorithms over Graphs, NIPS’17


POLICY GRADIENT FOR LOGIC OPTIMIZATION
Stochastic policy to select optimization transforms
Majority Inverter Graph (MIG)
$\mathrm{MAJ}(x, y, z) = (x \wedge y) \vee (x \wedge z) \vee (y \wedge z)$

RL Formulation
• Reward: logic depth reduction
• State: current graph, use GCN to learn graph state
• Action: which move to select
• Policy: use a move-dependent fully connected layer to compute the probability of each move

Deep Learning for Logic Optimization, IWLS’17
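
As a small worked check of the majority function that MIG nodes implement, the sketch below enumerates its truth table. It also confirms two standard identities: fixing one input to 0 gives AND, and fixing it to 1 gives OR, which is why a MIG can express ordinary AND/OR logic.

```python
from itertools import product


def maj(x, y, z):
    """Three-input majority: true when at least two inputs are true."""
    return (x and y) or (x and z) or (y and z)


# Full truth table over the 8 input combinations.
truth_table = {bits: maj(*bits) for bits in product([False, True], repeat=3)}

# Special cases: MAJ(x, y, 0) = x AND y, MAJ(x, y, 1) = x OR y.
and_ok = all(maj(x, y, False) == (x and y) for x, y in product([False, True], repeat=2))
or_ok = all(maj(x, y, True) == (x or y) for x, y in product([False, True], repeat=2))
```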


RL VS OTHER ALGORITHMS

Related algorithm for each RL ingredient:
• $\varepsilon$-greedy policy: Simulated Annealing
• Known environment: Dynamic Programming
• Simple policy: Heuristics
• Gradient free: Evolutionary Algorithm
POTENTIAL OF UNSUPERVISED LEARNING

DATASET ANALYSIS
Analyze dataset with Pandas Profiler

FEATURE SELECTION
• Filter method
• Evaluate relationship between features and target to compute importance of each feature

• F Test, Mutual information, Variance threshold

• Wrapper method
• Add features one at a time

• Eliminate features one at a time

• Embedded method
• Lasso regression: zero weight for unimportant features

• Tree based method: important feature at root of tree
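
A minimal sketch of the filter method, assuming absolute Pearson correlation with the target as the importance score (the slide lists F test, mutual information, and variance threshold; correlation is used here only because it fits in a few lines). The feature names and values are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 for a constant column."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def filter_select(features, target, k):
    """Score each feature independently of any model, keep the top k."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


# Hypothetical features and target.
features = {
    "pin_density": [1.0, 2.0, 3.0, 4.0, 5.0],
    "constant": [7.0, 7.0, 7.0, 7.0, 7.0],
    "noise": [3.0, 1.0, 4.0, 1.0, 5.0],
}
target = [2.1, 3.9, 6.2, 8.1, 9.8]
selected, scores = filter_select(features, target, k=2)
```

A wrapper method would instead retrain the model for each candidate feature set, which is more expensive but accounts for feature interactions.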


WRAPPER METHOD EXAMPLE

Routability Optimization In Sub14nm Technologies, ISPD’17

1D FEATURE ENCODING

Sub-block-level layout of an SoC
➢ on-chip measurement point location
➢ sense point neighborhood-level graph
➢ global and local feature vectors

Robust Power Estimation and Simultaneous Switching Noise Prediction Methods Using Machine Learning, GTC’19
2D FEATURE ENCODING
Map an array of registers to a 2D image

Panels: partition-based encoding, node-embedding-based encoding

PRIMAL: Power Inference using Machine Learning (DAC’19)


DATASET CREATION

Cover a wide range of design frequencies


Cover different types of standard cell sizes
Prevent duplication in training data due to replicated partitions/chiplets
Include more outliers from the chosen designs
Training/Testing split

Using Machine Learning for VLSI Testability and Reliability, GTC’19
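
One way to keep replicated partitions from leaking between training and testing is to split by group rather than by sample. The sketch below is an assumption about how that could be done, not the procedure from the cited talk; the `partition_*` identifiers and the 25% test fraction are hypothetical.

```python
import random


def group_split(samples, group_of, test_fraction=0.25, seed=0):
    """Split so that all samples sharing a group land on the same side."""
    groups = sorted({group_of(s) for s in samples})
    rng = random.Random(seed)
    rng.shuffle(groups)
    n_test = max(1, round(len(groups) * test_fraction))
    test_groups = set(groups[:n_test])
    train = [s for s in samples if group_of(s) not in test_groups]
    test = [s for s in samples if group_of(s) in test_groups]
    return train, test


# Hypothetical dataset: 8 samples drawn from 4 partitions, each replicated twice.
samples = [(f"partition_{i % 4}", i) for i in range(8)]
train, test = group_split(samples, group_of=lambda s: s[0])
```

Because every replica of a partition stays on one side, test accuracy is not inflated by near-duplicate training samples.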

MODELING
Always try classical machine learning to establish a strong baseline
Use Linear Regression, SVM and XGBoost

Model building: think about the underlying physics
The DL model performs better if it adheres to the physics, e.g. a 2D CNN suits problems governed by 2D spatial patterns
Use priors in model construction: graphs, cost functions, etc.

Hyperparameter tuning:
Start with a small dataset and make sure you can overfit it with the model
Gradually increase model complexity if you cannot overfit the training dataset

Cross validation
DATA IMBALANCE ISSUE
It is very common to have many more non-DTs (negative class) than DTs (positive class), with an imbalance ratio of more than 100X

Classifier 1: ok precision, low recall

          Predict: 0   Predict: 1
Fact: 0   133576       290
Fact: 1   3681         432

Recall: 10.5%
Precision: 59.8%

Classifier 2: high recall, low precision

          Predict: 0   Predict: 1
Fact: 0   100919       32927
Fact: 1   114          4069

Recall: 97.3%
Precision: 11.0%

High Performance Graph Convolutional Networks with Applications in Testability Analysis, DAC’19
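
The recall and precision figures on this slide follow directly from the two confusion matrices. The short sketch below recomputes them and adds plain accuracy to show why accuracy alone is misleading on imbalanced data; the function name and the returned dictionary layout are illustrative choices.

```python
def classification_metrics(tn, fp, fn, tp):
    """Precision, recall, and accuracy from confusion-matrix counts."""
    return {
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }


# Counts taken from the two confusion matrices on the slide.
clf1 = classification_metrics(tn=133576, fp=290, fn=3681, tp=432)
clf2 = classification_metrics(tn=100919, fp=32927, fn=114, tp=4069)
```

Classifier 1 misses almost 90% of the positive class, yet its accuracy is still above 97% because negatives dominate the dataset.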

WEIGHTED LABEL
Apply class weights to compensate for the bias; weights tried: {2, 3, 4, 5, ..., 10, 20, 30, 40, 50}

Routability Optimization In Sub14nm Technologies, ISPD’17
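
To illustrate the effect of a class weight, the sketch below applies a positive-class weight inside a binary cross-entropy loss. This is a swapped-in example for a probabilistic classifier, not the SVM weighting used in the cited paper; `pos_weight=10.0` is one value from the range listed above, and the labels and predictions are made up.

```python
from math import log


def weighted_bce(y_true, p_pred, pos_weight=1.0):
    """Binary cross-entropy where errors on positive labels cost pos_weight times more."""
    eps = 1e-12
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # keep log() finite
        total += -(pos_weight * y * log(p) + (1 - y) * log(1 - p))
    return total / len(y_true)


# One positive and one negative sample, both predicted as "probably negative".
y_true = [1, 0]
p_pred = [0.1, 0.1]
plain_loss = weighted_bce(y_true, p_pred)
weighted_loss = weighted_bce(y_true, p_pred, pos_weight=10.0)
```

The missed positive dominates the weighted loss, which pushes training toward higher recall on the rare class.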


MULTI-STAGE CLASSIFICATION
• The networks in the initial stages only filter out negative data points with high confidence (high recall, low precision)
• Positive predictions are sent to the network in the next stage

Network 1 → Network 2 → Network 3: each stage rejects negatives (-) and passes positives (+) on

High Performance Graph Convolutional Networks with Applications in Testability Analysis, DAC’19
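
A minimal sketch of the cascade logic, assuming each stage is a scoring function paired with a threshold. The three lambda scorers and their thresholds are hypothetical placeholders for trained networks; early stages use low thresholds so they only reject confident negatives.

```python
def cascade_predict(x, stages):
    """Return 0 as soon as any stage rejects x; return 1 if every stage passes it."""
    for score_fn, threshold in stages:
        if score_fn(x) < threshold:
            return 0  # filtered out as negative at this stage
    return 1


# Hypothetical stages: thresholds rise so later stages are stricter.
stages = [
    (lambda x: x["s1"], 0.1),
    (lambda x: x["s2"], 0.3),
    (lambda x: x["s3"], 0.5),
]

easy_negative = cascade_predict({"s1": 0.05, "s2": 0.9, "s3": 0.9}, stages)
late_negative = cascade_predict({"s1": 0.9, "s2": 0.2, "s3": 0.9}, stages)
positive = cascade_predict({"s1": 0.9, "s2": 0.8, "s3": 0.7}, stages)
```

Each later stage sees a less imbalanced input, since the earlier stages have already removed most of the easy negatives.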
CONCLUSIONS

• Deep learning and machine learning can improve quality and productivity of design automation in many ways.

• We should focus on innovative methods to apply advanced DL models to hard EDA problems.

• There are still many challenges in applying DL: open datasets, transferability, interpretability.
