Optimization Notice
Copyright © 2022, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
Available Now
XGBoost CPU vs. GPU
[Figure: XGBoost fit v1.1 CPU vs. GPU speed-up (higher is better for Intel) on the higgs1m, airline-ohe, msrank, letters, and abalone datasets; Intel Xeon 8124M vs. Nvidia V100]
• Intel's contributions to the XGBoost project on GitHub (https://github.com/dmlc/xgboost): XGBoost fit CPU acceleration ("hist" method)
• Memory prefetching, nested and advanced parallelism, usage of uint8
[Figure: XGBoost fit acceleration against baseline (v0.81) on Intel CPU, comparing XGB 0.81 (CPU), XGB 0.9 (CPU), XGB 1.0 (CPU), and XGB master 1.1 (CPU)]
SAMPLE USE CASES & PROOF POINTS (View More Case Studies)
• Accrad AI-based Solution Helps Accelerate Lung Disease Diagnosis: acceleration for training + inference. Optimized by Intel® oneAPI Analytics Toolkit & Intel® Distribution of OpenVINO™ toolkit
• AbbVie Machine Translation Solution accelerates natural language processing inference models using processor optimized capabilities
• Acceleration for HPC & AI Inferencing: CERN, SURFsara, & Intel are driving breakthrough performance on scientific, engineering, and financial simulations. Includes strong inference benchmark.
• AI Machine 3D Imaging: for many machine learning & AI tasks, Daspatial achieved significant performance improvements. The company is a design win, moving from Nvidia HW to Intel's HW, and successfully migrated CUDA code to DPC++.
• KFBIO AI-based Solution Helps Accelerate Tuberculosis Diagnosis, demonstrating Intel inference performance leadership
• AsiaInfo AI-based Solution optimized the AI end-to-end workflow for performance, and helps Accelerate 5G Network Intelligence
• Bentley Motors Limited's Car Configurator uses real-time, accurate visualization & AI for 1.7M+ rendered images in providing options to customers
• LAIKA Studios & Intel Join Forces to Expand What's Possible in Stop-Motion Filmmaking: decreases production time by transitioning manual work to AI-based processing
• Intel & Facebook Accelerate PyTorch Performance on training workloads using Intel® Deep Learning Boost on 3rd gen Intel® Xeon® Scalable processors
• Red Hat Optimizing Data Science Workflows
Intel does not control or audit third-party data. You should consult other sources to evaluate accuracy.
Overview of Classifier Characteristics
• Fitting can be slow: must find the best parameters
• Prediction is fast: calculate the expected value
• Decision boundary is simple, less flexible
$$y_\beta(x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x + \varepsilon)}}$$
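As a minimal sketch of why prediction is cheap and the boundary is simple, the logistic expected value can be evaluated directly once the parameters are known. The coefficients below are hypothetical, chosen only for illustration, and the error term ε is omitted.

```python
import math

def logistic(x, beta0, beta1):
    """Expected value y_beta(x) = 1 / (1 + exp(-(beta0 + beta1 * x))); error term omitted."""
    return 1.0 / (1.0 + math.exp(-(beta0 + beta1 * x)))

# Hypothetical fitted coefficients (illustration only)
beta0, beta1 = -2.0, 1.0

for x in [0.0, 2.0, 4.0]:
    p = logistic(x, beta0, beta1)
    # Class 1 when the expected value reaches 0.5
    print(x, round(p, 4), int(p >= 0.5))
```

With one feature, the decision boundary is the single threshold x = -β0/β1 (here x = 2), which is what "simple, less flexible" refers to.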
Introduction to Decision Trees
Regression Trees Predict Continuous Values
[Figure: regression tree predictions with max_depth=2 and max_depth=5]
Source: http://scikit-learn.org/stable/auto_examples/tree/plot_tree_regression.html
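A short sketch in the spirit of the cited scikit-learn example: fit two regression trees of different depths to a noisy sine curve. The data generation here is our own assumption, not copied from that page. Because each leaf predicts one constant, a tree with max_depth=d produces at most 2**d distinct values.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Noisy sine data (assumed for illustration)
rng = np.random.RandomState(1)
X = np.sort(5 * rng.rand(80, 1), axis=0)
y = np.sin(X).ravel()
y[::5] += 3 * (0.5 - rng.rand(16))  # add noise to every 5th point

# Shallow vs. deeper regression tree
shallow = DecisionTreeRegressor(max_depth=2).fit(X, y)
deep = DecisionTreeRegressor(max_depth=5).fit(X, y)

X_grid = np.arange(0.0, 5.0, 0.01)[:, np.newaxis]
y_shallow = shallow.predict(X_grid)
y_deep = deep.predict(X_grid)

# Distinct predicted values vs. number of leaves
print(len(np.unique(y_shallow)), shallow.get_n_leaves())
print(len(np.unique(y_deep)), deep.get_n_leaves())
```

The deeper tree follows the noisy points more closely, which previews why single trees are high variance.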
Building a Decision Tree
For How Long?
Until:
• Leaf node(s) are pure (only one class remains)
• A maximum depth is reached
• A performance metric is achieved
• Choose a maximum depth and prune
Building the Best Decision Tree
• Use greedy search: find the best split at each step
• What defines the best split?
  • One that maximizes the information gained from the split
• How is information gain defined?
[Figure: example split on Temperature >= Mild, with leaves No Tennis and Play Tennis]
Splitting Based on Classification Error
Classification error equation: $E(t) = 1 - \max_i \, [p(i \mid t)]$
Parent node: 8 Yes, 4 No. Split on Temperature >= Mild: 2 Yes and 2 No in one node, 6 Yes and 2 No in the other.
Classification error before the split: $1 - 8/12 = 0.3333$
Classification error change, as a weighted average over the No Tennis node (error 0.5000, 4 of 12 points) and the Play Tennis node (error 0.2500, 8 of 12 points):
$$0.3333 - \frac{4}{12} \cdot 0.5000 - \frac{8}{12} \cdot 0.2500 = 0$$
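The arithmetic for the Temperature >= Mild split can be checked in a few lines. This is a plain-Python sketch; the helper names `classification_error` and `impurity_decrease` are ours, and counts are written as [Yes, No].

```python
def classification_error(counts):
    """E(t) = 1 - max_i p(i|t) for a node with the given class counts."""
    total = sum(counts)
    return 1.0 - max(counts) / total

def impurity_decrease(parent, children, impurity):
    """Parent impurity minus the size-weighted average of the child impurities."""
    n = sum(parent)
    weighted = sum(sum(child) / n * impurity(child) for child in children)
    return impurity(parent) - weighted

parent = [8, 4]              # 8 Yes, 4 No
children = [[2, 2], [6, 2]]  # No Tennis node, Play Tennis node

print(round(classification_error(parent), 4))  # error before the split
print(round(impurity_decrease(parent, children, classification_error), 4))  # change
```

The change is zero, so by classification error this split looks useless even though one child is purer than the parent.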
Splitting Based on Entropy
Entropy equation: $H(t) = -\sum_{i=1}^{n} p(i \mid t) \log_2[p(i \mid t)]$
Parent node: 8 Yes, 4 No; split on Temperature >= Mild
Entropy of the parent node: 0.9183
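Applying the entropy equation to the same split gives a positive information gain, unlike classification error. A plain-Python sketch; the helper names are ours, counts are [Yes, No], and the children (2/2 and 6/2) are those of the Temperature >= Mild example.

```python
import math

def entropy(counts):
    """H(t) = -sum_i p(i|t) * log2 p(i|t); empty classes are skipped (0 * log 0 = 0)."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def information_gain(parent, children):
    """Parent entropy minus the size-weighted average of the child entropies."""
    n = sum(parent)
    weighted = sum(sum(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted

parent = [8, 4]
children = [[2, 2], [6, 2]]

print(round(entropy(parent), 4))                      # 0.9183
print(round(information_gain(parent, children), 4))   # 0.0441
```

The children have entropies 1.0 and about 0.8113, whose weighted average (about 0.8742) is below the parent's 0.9183, so this split is rewarded.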
Classification Error vs Entropy
• Center represents ambiguity: a 50/50 split
• Splitting metrics favor results that are furthest away from the center
[Figure: classification error, $E(t) = 1 - \max_i \, [p(i \mid t)]$, plotted against purity from 0.0 to 1.0]
• Entropy has the same maximum (at the 50/50 split) but is curved
[Figure: classification error and entropy plotted against purity]
The Gini Index
$$G(t) = 1 - \sum_{i=1}^{n} p(i \mid t)^2$$
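A plain-Python sketch of the Gini index on the Temperature >= Mild counts (8 Yes / 4 No, split into 2/2 and 6/2). The helper name `gini` is ours. For reference, `criterion='gini'` is the default in scikit-learn's DecisionTreeClassifier.

```python
def gini(counts):
    """G(t) = 1 - sum_i p(i|t)**2 for a node with the given class counts."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

parent = [8, 4]
children = [[2, 2], [6, 2]]
n = sum(parent)

# Size-weighted average of the child Gini values
weighted = sum(sum(child) / n * gini(child) for child in children)
decrease = gini(parent) - weighted
print(round(gini(parent), 4), round(decrease, 4))
```

Like entropy, and unlike classification error, the Gini index is curved, so this split still yields a positive decrease (about 0.0278).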
Decision Trees are High Variance
Pruning Decision Trees
Strengths of Decision Trees
DecisionTreeClassifier: The Syntax
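Import the class containing the classification method
from sklearn.tree import DecisionTreeClassifier
Create an instance of the class
DTC = DecisionTreeClassifier(criterion='gini', max_features=10, max_depth=5)
Fit the instance on the data and then predict the expected value
DTC = DTC.fit(X_train, y_train)
y_predict = DTC.predict(X_test)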
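To run the syntax end to end, the sketch below supplies `X_train`, `X_test`, `y_train`, `y_test` from a synthetic dataset. The dataset is an assumption: it has 20 features, because `max_features=10` must not exceed the number of features.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data (assumed) with 20 features so that max_features=10 is valid
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

DTC = DecisionTreeClassifier(criterion='gini', max_features=10, max_depth=5, random_state=0)
DTC = DTC.fit(X_train, y_train)
y_predict = DTC.predict(X_test)

# Depth never exceeds max_depth; accuracy on the held-out split
print(DTC.get_depth(), (y_predict == y_test).mean())
```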
Development Environment
▪ 220 GB of file storage
▪ 192 GB of RAM
▪ Ubuntu 20.04
▪ Up to 24 hours of continuous workload execution time
▪ Free 120-day access; account extensions upon request
https://devcloud.intel.com/oneapi/
Refer to software.intel.com/articles/optimization-notice for more information regarding performance & optimization choices in Intel software products.
▪ If you have a DevCloud account:
• Step 0: Launch a terminal from the launcher screen and enter the following:
• Step 1: cp /data/oneapi_workshop/xgboost.tar.gz .
• Step 2: tar -xzf xgboost.tar.gz
• Step 3: Navigate to the AiKit_XGBoost_Predictive_Modeling directory
https://devcloud.intel.com/oneapi/get_started/
Decision Trees are High Variance
Improvement: Use Many Trees
Create many new trees and combine their predictions to reduce variance
How to Create Multiple Trees?
Grow a decision tree from each bootstrapped sample
Distribution of Data in Bootstrapped Samples
• Given a dataset, create n bootstrapped samples
[Figure: values from 0.25 to 0.40 plotted against the number of bootstrapped samples (n), from 0 to 100]
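The curve on this slide is not labeled in this copy. One standard quantity that rises from 0.25 at n = 2 toward about 0.37 is the probability (1 - 1/n)^n that a given point is left out of a bootstrapped sample of size n, which approaches 1/e ≈ 0.368 (the "out-of-bag" fraction). Reading the plot this way is our assumption. The sketch computes it analytically and checks it by simulation.

```python
import numpy as np

def prob_left_out(n):
    """Chance a given point is never drawn in n draws with replacement from n points."""
    return (1.0 - 1.0 / n) ** n

for n in [2, 10, 100]:
    print(n, round(prob_left_out(n), 4))

# Monte Carlo check with one large bootstrapped sample
rng = np.random.default_rng(0)
n = 10_000
sample = rng.integers(0, n, size=n)           # row indices drawn with replacement
frac_out = 1.0 - np.unique(sample).size / n   # fraction of rows never drawn
print(round(frac_out, 3), round(1 / np.e, 3))
```

So each bootstrapped sample contains roughly 63% of the distinct original points, which is what makes the trees grown on them differ.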
Aggregate Results
Trees vote on or average the result for each data point
[Figure: the trees' votes are combined to form a single classifier]
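The bootstrap-then-vote procedure can be sketched by hand with scikit-learn trees. This is an illustrative re-implementation, not BaggingClassifier itself; the helper names `bagging_fit` and `bagging_predict` are hypothetical, and binary 0/1 labels are assumed so a majority vote is a mean above 0.5.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, n_trees=25, seed=0):
    """Grow one decision tree per bootstrapped sample."""
    rng = np.random.default_rng(seed)
    n = len(X)
    trees = []
    for _ in range(n_trees):
        idx = rng.integers(0, n, size=n)  # bootstrap: sample rows with replacement
        trees.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return trees

def bagging_predict(trees, X):
    """Majority vote across trees (binary 0/1 labels assumed)."""
    votes = np.stack([t.predict(X) for t in trees])  # shape (n_trees, n_points)
    return (votes.mean(axis=0) > 0.5).astype(int)

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
trees = bagging_fit(X[:200], y[:200])
y_pred = bagging_predict(trees, X[200:])
print((y_pred == y[200:]).mean())
```

An odd number of trees avoids ties. For regression, the aggregate would be the plain average of the trees' predictions instead of a vote.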
Calculation of Feature Importance
• Bagging performance improvements increase with more trees
• Maximum improvement is generally reached at ~50 trees
[Figure: cross-validated RMSE versus number of bagged trees, 0 to 500]
Strengths of Bagging
Same as decision trees:
• Easy to interpret and implement
• Heterogeneous input data allowed, no preprocessing required
Specific to bagging:
• Less variability than decision trees
• Can grow trees in parallel*
BaggingClassifier: The Syntax
Import the class containing the classification method
from sklearn.ensemble import BaggingClassifier
Create an instance of the class
BC = BaggingClassifier(n_estimators=50)
Fit the instance on the data and then predict the expected value
BC = BC.fit(X_train, y_train)
y_predict = BC.predict(X_test)
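• For n independent trees, each with variance σ², the variance of the bagged average is $\frac{\sigma^2}{n}$
• However, bootstrap samples are correlated (ρ): $\rho\sigma^2 + \frac{1-\rho}{n}\sigma^2$
[Figure: cross-validated RMSE versus number of bagged trees, 0 to 500]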
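A Monte Carlo sketch of the correlated-variance formula. The construction of correlated predictions (a shared component plus an independent one per tree) and the values σ = 1, ρ = 0.3, n = 10 are assumptions chosen for illustration.

```python
import numpy as np

def bagged_variance(sigma, rho, n):
    """Variance of the mean of n predictions with variance sigma**2 and pairwise correlation rho."""
    return rho * sigma**2 + (1 - rho) / n * sigma**2

rng = np.random.default_rng(0)
sigma, rho, n, trials = 1.0, 0.3, 10, 200_000

# Each simulated tree prediction has variance sigma**2 and pairwise correlation rho
shared = rng.standard_normal((trials, 1))
own = rng.standard_normal((trials, n))
preds = sigma * (np.sqrt(rho) * shared + np.sqrt(1 - rho) * own)

simulated = preds.mean(axis=1).var()
print(round(simulated, 3), round(bagged_variance(sigma, rho, n), 3))
```

Setting ρ = 0 recovers σ²/n. For ρ > 0 the second term shrinks with n but the first does not, so the variance levels off at ρσ², which is consistent with improvements flattening after enough trees.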
Introducing More Randomness
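• Use a random subset of features for each split, where m is the total number of features:
  • Classification: $\sqrt{m}$
  • Regression: $m/3$
[Figure: cross-validated RMSE, random forest compared to bagging]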
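In scikit-learn, these rules map onto the `max_features` parameter: the string "sqrt" selects about √m candidate features per split, and a float is treated as a fraction of m, so 1/3 gives about m/3. A sketch on synthetic data; m = 30 is an illustrative assumption.

```python
import math
from sklearn.datasets import make_classification, make_regression
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

m = 30  # number of features (illustrative)
print("classification:", int(math.sqrt(m)), "regression:", m // 3)

Xc, yc = make_classification(n_samples=300, n_features=m, random_state=0)
Xr, yr = make_regression(n_samples=300, n_features=m, random_state=0)

# "sqrt": ~sqrt(m) candidate features per split; 1/3: ~m/3 candidate features per split
clf = RandomForestClassifier(n_estimators=50, max_features="sqrt", random_state=0).fit(Xc, yc)
reg = RandomForestRegressor(n_estimators=50, max_features=1/3, random_state=0).fit(Xr, yr)

print(len(clf.estimators_), len(reg.estimators_))
```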
How Many Random Forest Trees?
[Figure: cross-validated RMSE versus number of trees, random forest compared to bagging]
RandomForest: The Syntax
Import the class containing the classification method
from sklearn.ensemble import RandomForestClassifier
Create an instance of the class
RC = RandomForestClassifier(n_estimators=100, max_features=10)
Fit the instance on the data and then predict the expected value
RC = RC.fit(X_train, y_train)
y_predict = RC.predict(X_test)
ExtraTreesClassifier: The Syntax
Import the class containing the classification method
from sklearn.ensemble import ExtraTreesClassifier
Create an instance of the class
EC = ExtraTreesClassifier(n_estimators=100, max_features=10)
Fit the instance on the data and then predict the expected value
EC = EC.fit(X_train, y_train)
y_predict = EC.predict(X_test)
Hands-on Lab 2
Decision Stump: the Boosting Base Learner
[Figure: decision stump splitting on Humidity < 30%]
Overview of Boosting
• Create an initial decision stump, fit it to the data, and calculate the residuals
• Find a new decision stump to fit the weighted residuals
• Fit the new decision stump to the current residuals
• Calculate errors and weight the data points
• Repeat: find a new decision stump to fit the weighted residuals, then fit it to the current residuals
• Combine the stumps to form a single classifier, with each stump after the first added with weight λ
[Figure: data points (x) and the successive decision stumps at each boosting step]
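A minimal sketch of the residual-fitting loop with decision stumps (depth-1 trees) and a learning rate λ. It uses squared-error loss on a regression target, which is the simplest gradient-boosting case; AdaBoost-style re-weighting of misclassified points is not shown. The helper names and the sine data are assumptions for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def boost_stumps(X, y, n_stumps=100, lam=0.1):
    """Fit decision stumps to the current residuals, adding each one scaled by lam."""
    init = y.mean()                      # initial prediction
    pred = np.full(len(y), init)
    stumps = []
    for _ in range(n_stumps):
        residuals = y - pred             # what the current model still gets wrong
        stump = DecisionTreeRegressor(max_depth=1).fit(X, residuals)
        pred += lam * stump.predict(X)   # add the new stump, shrunk by lambda
        stumps.append(stump)
    return init, stumps

def boost_predict(init, stumps, X, lam=0.1):
    """Combine all stumps into a single model: init + lam * sum of stump outputs."""
    return init + lam * sum(s.predict(X) for s in stumps)

rng = np.random.default_rng(0)
X = rng.uniform(0, 5, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.1, size=200)

init, stumps = boost_stumps(X, y)
mse_start = ((y - init) ** 2).mean()
mse_end = ((y - boost_predict(init, stumps, X)) ** 2).mean()
print(round(mse_start, 3), round(mse_end, 3))
```

Each stump alone is a crude one-split model; the combination of many shrunk stumps traces the sine curve closely.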
Boosting Specifics
• Boosting utilizes different loss functions
Gradient Boosting Loss Function
• Generalized boosting method that can use different loss functions
• Common implementation uses the binomial log likelihood loss function (deviance): $\log(1 + e^{-\text{margin}})$
• More robust to outliers than AdaBoost
[Figure: loss versus margin (from misclassified to correct) for AdaBoost, deviance (gradient boosting), and 0-1 loss]
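A sketch comparing the three losses as functions of the margin (negative margin means misclassified). Using the exponential loss e^(-margin) for AdaBoost is standard but is our addition; the slide only names the curve.

```python
import math

def zero_one_loss(margin):
    return 1.0 if margin < 0 else 0.0   # 1 when misclassified

def exponential_loss(margin):
    return math.exp(-margin)            # loss minimized by AdaBoost

def deviance_loss(margin):
    return math.log(1.0 + math.exp(-margin))  # binomial log likelihood (deviance)

for m in [-5.0, -1.0, 0.0, 1.0, 5.0]:
    print(m, zero_one_loss(m), round(exponential_loss(m), 3), round(deviance_loss(m), 3))
```

At margin -5 the exponential loss is about 148.4 while the deviance is about 5.0: deviance grows roughly linearly for badly misclassified points, so a single outlier pulls the model far less than under AdaBoost.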
Bagging vs Boosting
Bagging | Boosting
Bootstrapped samples | Fit entire data set
Base trees created independently | Base trees created successively
Only data points considered | Use residuals from previous models
No weighting used | Up-weight misclassified points
Excess trees will not overfit | Beware of overfitting
Tuning a Gradient Boosted Model
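• Learning rate (λ): set to <1.0 for regularization; this is also called "shrinkage"
• Subsample: set to <1.0 to use a fraction of the data for each base learner (stochastic gradient boosting)
[Figure: test set error versus boosting iterations for the base model, λ=0.1, subsample=0.5, λ=0.1 with subsample=0.5, and λ=0.1 with max_features=2]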
GradientBoostingClassifier: The Syntax
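Import the class containing the classification method
from sklearn.ensemble import GradientBoostingClassifier
Create an instance of the class
GBC = GradientBoostingClassifier(learning_rate=0.1, subsample=0.5, max_features=2)
Fit the instance on the data and then predict the expected value
GBC = GBC.fit(X_train, y_train)
y_predict = GBC.predict(X_test)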
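A sketch of tracing a "test set error vs. boosting iterations" curve with scikit-learn's `staged_predict`, which yields predictions after each boosting iteration. The synthetic dataset and `n_estimators=200` are illustrative assumptions; λ = 0.1, subsample = 0.5, and max_features = 2 follow the tuning figure's settings.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Shrinkage (learning_rate < 1.0) plus stochastic gradient boosting (subsample < 1.0)
GBC = GradientBoostingClassifier(learning_rate=0.1, subsample=0.5, max_features=2,
                                 n_estimators=200, random_state=0)
GBC = GBC.fit(X_train, y_train)

# Test error after each boosting iteration
test_error = [np.mean(p != y_test) for p in GBC.staged_predict(X_test)]
best_iter = int(np.argmin(test_error)) + 1
print(len(test_error), best_iter, round(min(test_error), 3))
```

Because boosting can overfit, the iteration with the lowest test error is a natural choice for `n_estimators`.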
[Figure: XGBoost fit acceleration against baseline (v0.81) on Intel CPU for the higgs1m, Letters, Airline-ohe, MSRank-30K, and Mortgage datasets]
▪ Visit Intel® oneAPI AI Analytics Toolkit (AI Kit) for more details and up-to-date product information
  ▪ Release Notes
  ▪ Installation Guide
  ▪ Utilize the Getting Started Guide
▪ Download the AI Kit from Intel, Anaconda or any of your favorite package managers
  ▪ Get started quickly with the AI Kit Docker Container
▪ Code Samples
  ▪ Build, test and remotely run workloads on the Intel® DevCloud for free. No software downloads. No configuration steps. No installations.
▪ Machine Learning & Analytics Blogs at Intel Medium
  ▪ Intel AI Blog site
  ▪ Webinars and Articles at Intel® Tech Decoded
▪ Ask questions and share information with others through the Community Forum
  ▪ Discuss with experts at AI Frameworks Forum
Download Now
XGBoost Hands-on Lab
Intel technologies may require enabled hardware, software or service activation. Learn more at intel.com or from the OEM or retailer.
Optimization Notice: Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel
microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or
effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel
microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and
Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice Revision #20110804. https://software.intel.com/en-us/articles/optimization-notice
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors.
Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to
any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated
purchases, including the performance of that product when combined with other products. See backup for configuration details. For more complete information about
performance and benchmark results, visit www.intel.com/benchmarks.
Performance results are based on testing as of dates shown in configurations and may not reflect all publicly available updates. See configuration disclosure for details.
No product or component can be absolutely secure.
No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.
Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from course of performance, course of dealing, or usage in trade.
© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the
property of others.
Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.
XGBoost CPU vs GPU
Test configs: Tested by Intel as of 10/13/2020;
CPU: c5.18xlarge AWS Instance (2 x Intel® Xeon Platinum 8124M @ 18 cores), OS: Ubuntu 20.04.2 LTS, 193 GB RAM. GPU: p3.2xlarge AWS Instance (GPU: NVIDIA Tesla V100 16GB, 8 vCPUs), OS: Ubuntu 18.04.2 LTS, 61 GB RAM. SW: XGBoost 1.1, built from sources; compilers: G++ 7.4, nvcc 9.1. Intel® Data Analytics Acceleration Library (Intel® DAAL): 2019.4 version; Python env: Python 3.6, NumPy 1.16.4, Pandas 0.25, Scikit-learn 0.21.2.
System Board | Intel prototype, TGL U DDR4 SODIMM RVP | ASUSTeK COMPUTER INC. / PRIME Z370-A
CPU | 11th Gen Intel® Core™ i5-1145G7E @ 2.6 GHz | 8th Gen Intel® Core™ i5-8500T @ 3.0 GHz
Sockets / Physical cores | 1/4 | 1/6
HyperThreading / Turbo Setting | Enabled / On | NA / On
Memory | 2 x 8198 MB 3200 MT/s DDR4 | 2 x 16384 MB 2667 MT/s DDR4
OS | Ubuntu* 18.04 LTS | Ubuntu* 18.04 LTS
Kernel | 5.8.0-050800-generic | 5.3.0-24-generic
Software | Intel® Distribution of OpenVINO™ toolkit 2021.1.075 | Intel® Distribution of OpenVINO™ toolkit 2021.1.075
BIOS | Intel TGLIFUI1.R00.3243.A04.2006302148 | AMI, version 2401
BIOS release date | 06/30/2020 | 7/12/2019
BIOS Setting | Load default settings | Load default settings, set XMP to 2667
Test Date | 9/9/2020 | 9/9/2020
Precision and Batch Size | CPU: INT8, GPU: FP16-INT8, batch size: 1 | CPU: INT8, GPU: FP16-INT8, batch size: 1
Number of Inference Requests | 4 | 6
Number of Execution Streams | 4 | 6
Power (TDP Link) | 28 W | 35 W