
Relax and sit tight

The workshop will begin at 9:00AM PT


Ben Odom
Developer Evangelist
About Our Speakers
Ben Odom - Technical Developer Evangelist
Ben Odom is a Technical Developer Evangelist focused on highlighting, training, and showcasing Intel
products and tools to developers worldwide. His recent work in Artificial Intelligence includes developing
coursework for Intel's developer ecosystem and delivering training to both industry and academic
developers. Currently, Ben is developing coursework for oneAPI and DPC++. He has been in the tech
industry for over 20 years and holds a Master's Degree in Computer Science and Engineering from Oregon
Health Sciences University.

Bob Chesebrough - Developer Evangelist for AI and oneAPI


Bob Chesebrough is a Developer Evangelist at Intel with over 20 years of experience in software
development, optimization on Intel platforms, and AI. In his current role, he works with universities and
developers, evangelizing and helping them understand AI and oneAPI concepts. He has experience
with the Python, C, C++, and Fortran programming languages, having supported software and/or clients at two
DOE National Labs, an aerospace company, IBM, and Intel.

Using the Console: Resize and move windows around

Biographies

Resources

Presentation

Speaker video

Live demos

Quick links

Questions and Answers


...

▪ Popular frameworks and middleware are built using one or more of the oneAPI industry specification elements to extend and optimize

▪ AI libraries support machine learning and deep learning solutions

▪ Target any CPU, GPU, or FPGA hardware

Available Now

Visit software.intel.com/oneapi for more details


Some capabilities may differ per architecture and custom-tuning will still be required. Other accelerators to be supported in the future. 6
▪ Accelerate end-to-end AI and Data Science pipelines, achieve drop-in acceleration with optimized Python tools built using oneAPI libraries (e.g., oneMKL, oneDNN, oneCCL, oneDAL, and more)
▪ Achieve high-performance deep learning training and inference with Intel-optimized TensorFlow and PyTorch versions, and low-precision optimization (Intel® Neural Compressor) with support for FP16, INT8, and bfloat16
▪ Expedite development using open-source Intel-optimized pre-trained deep learning models for best performance via Model Zoo for Intel® Architecture (IA)
▪ Enable distributed training through Torch-CCL, and support of the standards-based Horovod library
▪ Seamlessly scale Pandas workflows across multi-node dataframes with Intel® Distribution of Modin, accelerate analytics with performant backends such as Heavy.AI
▪ Increase machine learning model accuracy and performance with algorithms in Scikit-learn and XGBoost optimized for IA
▪ Supports cross-architecture development (Intel® CPUs/GPUs) and compute
7
Optimization Tools for Broad AI Workloads

Intel® AI Analytics Toolkit – Used to accelerate AI and machine learning applications from training to inference deployment. Achieve drop-in acceleration on Intel® CPUs and GPUs with optimized Python libraries and deep learning frameworks.

Intel® Distribution of OpenVINO™ toolkit – Used to deliver high-performance deep learning and simplify inference deployment across multiple types of architectures. This toolkit is powered by oneAPI's oneDNN Library.

Tools for Graphics-specific AI Optimization

Intel® Open Image Denoise (also part of the Intel® oneAPI Rendering Toolkit) – Used to increase image quality with machine-learning algorithms that selectively filter visual noise.

Intel® Game Dev AI Toolkit – Used to integrate AI and machine learning more easily into game applications on Intel CPUs and GPUs.
Learn more

8
XGBoost CPU vs. GPU

[Chart: XGBoost fit v1.1 CPU vs. GPU speed-up (higher is better for Intel), Intel Xeon 8124M vs. Nvidia V100, across the higgs1m, airline-ohe, msrank, letters, and abalone datasets; speed-ups range from roughly 1.0x to 4.5x.]

• Intel's contribution to the XGBoost project on GitHub: https://github.com/dmlc/xgboost
• Memory prefetching, nested and advanced parallelism, usage of uint8
• Reduced memory consumption

[Chart: XGBoost fit CPU acceleration ("hist" method) — acceleration against the baseline (v0.81) on Intel CPU for XGB 0.81, 0.9, 1.0, and master 1.1 across the higgs1m, Letters, Airline-ohe, MSRank-30K, and Mortgage datasets; speed-ups reach up to 15.5x over the baseline.]

*Measured March 2021

For more complete information about performance and benchmark results, visit www.intel.com/benchmarks.
See backup for configuration details. 9

SAMPLE USE CASES & PROOF POINTS > View More Case Studies

• Accrad AI-based Solution Helps Accelerate Lung Disease Diagnosis – acceleration for training + inference
• AbbVie Machine Translation Solution – accelerates natural language processing inference models using processor-optimized capabilities
• Acceleration for HPC & AI Inferencing – CERN, SURFsara, & Intel are driving breakthrough performance on scientific, engineering, and financial simulations. Includes a strong inference benchmark.
• AI Machine 3D Imaging – for many machine learning & AI tasks, Daspatial achieved significant performance improvements. The company is a design win – moving from Nvidia HW to Intel HW, & successfully migrated CUDA code to DPC++.
• KFBIO AI-based Solution Helps Accelerate Tuberculosis Diagnosis – demonstrating Intel inference performance leadership
• AsiaInfo AI-based Solution – optimized the AI end-to-end workflow for performance, and helps accelerate 5G network intelligence
• Bentley Motors Limited's Car Configurator – uses real-time, accurate visualization & AI for 1.7M+ rendered images in providing options to customers
• LAIKA Studios & Intel Join Forces to Expand What's Possible in Stop-Motion Filmmaking – decreases production time by transitioning manual work to AI-based processing
• Intel & Facebook Accelerate PyTorch Performance – on training workloads using Intel® Deep Learning Boost on 3rd gen Intel® Xeon® Scalable processors
• Red Hat Optimizing Data Science Workflows

Optimized by Intel® oneAPI Analytics Toolkit & Intel® Distribution of OpenVINO™ toolkit

Intel does not control or audit third-party data. You should consult other sources to evaluate accuracy.
10
Overview of Classifier Characteristics

• For K-Nearest Neighbors, the training data is the model
• Fitting is fast—just store the data
• Prediction can be slow—lots of distances to measure
• Decision boundary is flexible

[Scatter plot of two classes in the X–Y plane with a flexible decision boundary]

12
Overview of Classifier Characteristics

• For logistic regression, the model is just parameters
• Fitting can be slow—must find the best parameters
• Prediction is fast—calculate the expected value
• Decision boundary is simple, less flexible

[Plot: logistic curve, probability (0.0 to 1.0) versus X]

$y_\beta(x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x + \varepsilon)}}$

13
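A minimal scikit-learn sketch (not from the slides) contrasting the two classifiers above; the synthetic dataset and parameter choices are illustrative assumptions:

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# KNN: fitting just stores the training data; prediction measures many distances
knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
print("KNN accuracy:", knn.score(X_test, y_test))

# Logistic regression: fitting searches for the best parameters; prediction is a quick calculation
logreg = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("LogReg accuracy:", logreg.score(X_test, y_test))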
Introduction to Decision Trees

14
Introduction to Decision Trees

• Want to predict whether to play tennis based on temperature, humidity, wind, outlook
• Segment data based on features to predict the result
• Trees that predict categorical results are decision trees

[Example tree — Nodes: Temperature >= Mild, then Humidity = Normal; Leaves: No Tennis / Play Tennis]

15
Regression Trees Predict Continuous Values

• Example: use slope and elevation in the Himalayas
• Predict average precipitation (a continuous value)
• Values at leaves are averages of members

[Example tree — Nodes: Elevation < 7900 ft., then Slope < 2.5º; Leaves: 13.67 in., 48.50 in., 55.42 in.]

16
Regression Trees Predict Continuous Values

[Plot: decision tree regression fits with max_depth=2 and max_depth=5 on noisy sine data, x from 0 to 5, y from -2.0 to 2.0]

Source: http://scikit-learn.org/stable/auto_examples/tree/plot_tree_regression.html
17
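A short sketch along the lines of the scikit-learn example linked above (noisy sine data is assumed, as in that example), showing how max_depth controls the flexibility of a regression tree:

import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Noisy sine wave, as in the scikit-learn example
rng = np.random.RandomState(1)
X = np.sort(5 * rng.rand(80, 1), axis=0)
y = np.sin(X).ravel()
y[::5] += 3 * (0.5 - rng.rand(16))

# A shallow tree underfits; a deeper tree starts to follow the noise
shallow = DecisionTreeRegressor(max_depth=2).fit(X, y)
deep = DecisionTreeRegressor(max_depth=5).fit(X, y)

X_test = np.arange(0.0, 5.0, 0.01).reshape(-1, 1)
y_depth2, y_depth5 = shallow.predict(X_test), deep.predict(X_test)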
Building a Decision Tree

• Select a feature and split the data into a binary tree
• Continue splitting with available features

18
For How Long?
Until:
• Leaf node(s) are pure (only
one class remains)
• A maximum depth is
reached
• A performance metric is
achieved
• Choose a maximum depth
and prune

19
Building the Best Decision Tree

• Use greedy search: find the best split at each step
• What defines the best split? One that maximizes the information gained from the split
• How is information gain defined?

[Example tree — split on Temperature >= Mild; Leaves: No Tennis / Play Tennis]

20
Splitting Based on Classification Error

[Split on Temperature >= Mild: parent node 8 Yes / 4 No; left child 2 Yes / 2 No (No Tennis); right child 6 Yes / 2 No (Play Tennis)]

Classification Error Equation
$E(t) = 1 - \max_i [p(i|t)]$

Classification Error Before
$1 - 8/12 = 0.3333$

21
Splitting Based on Classification Error

[Same split on Temperature >= Mild: parent error 0.3333; left child (2 Yes / 2 No, No Tennis) error 0.5000; right child (6 Yes / 2 No, Play Tennis) error 0.2500]

Classification Error Equation
$E(t) = 1 - \max_i [p(i|t)]$

Classification Error Change (weighted average)
$0.3333 - (4/12)(0.5000) - (8/12)(0.2500) = 0$

22
Splitting Based on Classification Error

• Using classification error, no further splits would occur
• Problem: end nodes are not homogeneous
• Try a different metric?

[Split on Temperature >= Mild: parent 8 Yes / 4 No; children 2 Yes / 2 No (No Tennis) and 6 Yes / 2 No (Play Tennis)]

23
Splitting Based on Entropy

[Split on Temperature >= Mild: parent 8 Yes / 4 No; children 2 Yes / 2 No (No Tennis) and 6 Yes / 2 No (Play Tennis)]

Entropy Equation
$H(t) = -\sum_{i=1}^{n} p(i|t)\,\log_2 [p(i|t)]$

Entropy Before
$-(8/12)\log_2(8/12) - (4/12)\log_2(4/12) = 0.9183$

24
Splitting Based on Entropy

[Same split: parent entropy 0.9183; left child (2 Yes / 2 No, No Tennis) entropy 1.0000; right child (6 Yes / 2 No, Play Tennis) entropy 0.8113]

Entropy Equation
$H(t) = -\sum_{i=1}^{n} p(i|t)\,\log_2 [p(i|t)]$

Entropy Change
$0.9183 - (4/12)(1.0000) - (8/12)(0.8113) = 0.0441$

25
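A small Python check (not part of the original deck) that reproduces the numbers on the last few slides for the Temperature >= Mild split—zero gain under classification error, but a positive information gain under entropy:

import numpy as np

def class_error(counts):
    p = np.array(counts) / sum(counts)
    return 1 - p.max()

def entropy(counts):
    p = np.array(counts) / sum(counts)
    return -(p * np.log2(p)).sum()

parent, left, right = [8, 4], [2, 2], [6, 2]   # (Yes, No) counts from the slides
w_left, w_right = 4 / 12, 8 / 12               # fraction of samples in each child

# Classification error: 0.3333 - (4/12)*0.5 - (8/12)*0.25 = 0  -> no gain, splitting stops
err_gain = class_error(parent) - w_left * class_error(left) - w_right * class_error(right)

# Entropy: 0.9183 - (4/12)*1.0 - (8/12)*0.8113 = 0.0441        -> positive gain, splitting continues
info_gain = entropy(parent) - w_left * entropy(left) - w_right * entropy(right)
print(round(err_gain, 4), round(info_gain, 4))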
Splitting Based on Entropy

• Splitting based on entropy allows further splits to occur
• Can eventually reach the goal of homogeneous nodes
• Why does this work with entropy but not classification error?

[Split on Temperature >= Mild: parent 8 Yes / 4 No; children 2 Yes / 2 No (No Tennis) and 6 Yes / 2 No (Play Tennis)]

26
Classification Error vs Entropy

• Classification error is a flat function with its maximum at the center
• The center represents ambiguity—a 50/50 split
• Splitting metrics favor results that are furthest away from the center

[Plot: classification error versus purity (0.0 to 1.0), peaking at 0.5]

$E(t) = 1 - \max_i [p(i|t)]$

27
Classification Error vs Entropy

• Entropy has the same maximum but is curved
• Curvature allows splitting to continue until nodes are pure
• How does this work?

[Plot: classification error and cross entropy versus purity (0.0 to 1.0)]

$H(t) = -\sum_{i=1}^{n} p(i|t)\,\log_2 [p(i|t)]$

28
Information Gained by Splitting
• With entropy gain, the
function has a "bulge"
• Allows average information
of children to be less than
parent
• Results in information gain
and continued splitting

29
The Gini Index

• In practice, the Gini index is often used for splitting
• The function is similar to entropy—it has a bulge
• Does not contain a logarithm

[Plot: classification error, cross entropy, and Gini index versus purity (0.0 to 1.0)]

$G(t) = 1 - \sum_{i=1}^{n} p(i|t)^2$

30
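Continuing the same hypothetical tennis split, a quick sketch showing that the Gini index, like entropy, still reports a positive gain where classification error reports none:

import numpy as np

def gini(counts):
    p = np.array(counts) / sum(counts)
    return 1 - (p ** 2).sum()

parent, left, right = [8, 4], [2, 2], [6, 2]   # same split as on the earlier slides
gain = gini(parent) - (4 / 12) * gini(left) - (8 / 12) * gini(right)
print(round(gain, 4))   # ~0.0278: positive, so the split is still worthwhile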
Decision Trees are High Variance

• Problem: decision trees tend to overfit
• Small changes in data greatly affect prediction—high variance
• Solution: prune trees

31
Pruning Decision Trees

• Problem: decision trees tend to overfit
• Small changes in data greatly affect prediction—high variance
• Solution: prune trees

32
Pruning Decision Trees

• How to decide which leaves to prune?
• Solution: prune based on a classification error threshold

$E(t) = 1 - \max_i [p(i|t)]$

33
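For reference, scikit-learn exposes pruning through cost-complexity pruning (ccp_alpha) rather than a raw classification-error threshold; a hedged sketch, assuming X_train and y_train already exist:

from sklearn.tree import DecisionTreeClassifier

# Grow a full tree, inspect the pruning path, then refit with a chosen alpha
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
path = tree.cost_complexity_pruning_path(X_train, y_train)
print(path.ccp_alphas)            # candidate pruning strengths, weakest to strongest

pruned = DecisionTreeClassifier(ccp_alpha=path.ccp_alphas[-2], random_state=0)
pruned.fit(X_train, y_train)      # heavily pruned; pick alpha by cross-validation in practice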
Strengths of Decision Trees

• Easy to interpret and implement—"if … then … else" logic
• Handle any data category—binary, ordinal, continuous
• No preprocessing or scaling required

34
DecisionTreeClassifier: The Syntax
Import the class containing the classification method

from sklearn.tree import DecisionTreeClassifier

Create an instance of the class

DTC = DecisionTreeClassifier(criterion='gini',
                             max_features=10, max_depth=5)

Fit the instance on the data and then predict the expected value

DTC = DTC.fit(X_train, y_train)

y_predict = DTC.predict(X_test)

Tune parameters with cross-validation. Use DecisionTreeRegressor for regression.

35
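The cross-validation step mentioned above might look like the following sketch (illustrative depths; X_train and y_train are assumed to exist):

from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Compare a few candidate depths with 5-fold cross-validation
for depth in (2, 5, 10, None):
    dtc = DecisionTreeClassifier(criterion='gini', max_depth=depth)
    scores = cross_val_score(dtc, X_train, y_train, cv=5)
    print(depth, scores.mean())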

Development Environment
▪ 220 GB of file storage
▪ 192 GB of RAM
▪ Ubuntu 20.04
▪ Up to 24 hours of continuous workload
execution times
▪ Free 120-day access; account extensions upon
request

Quick How-to Resources


▪ Videos
▪ Developer guides

https://devcloud.intel.com/oneapi/

Refer to software.intel.com/articles/optimization-notice for more information regarding performance & optimization choices in Intel software products.
36
▪ If you have a DevCloud account:
• Step 0: Launch a terminal from the launcher screen and enter the following:
• Step 1: cp /data/oneapi_workshop/xgboost.tar.gz .
• Step 2: tar -xzf xgboost.tar.gz
• Step 3: Navigate to AiKit_XGBoost_Predictive_Modeling

▪ If you do not have a DevCloud account and just got one:
• Step 0: Navigate to AiKit_XGBoost_Predictive_Modeling/03_XGBoost
• Step 1: Execute code cell 2

37
https://devcloud.intel.com/oneapi/get_started/
Decision Trees are High Variance

• Problem: decision trees tend to overfit
• Pruning helps reduce variance to a point
• Often not enough for the model to generalize well

40
Improvement: Use Many Trees
Create Many New Trees and Combine predictions to reduce variance

41
How to Create Multiple Trees?
Grow decision tree from each bootstrapped sample

42
Distribution of Data in Bootstrapped Samples

• Given a dataset, create n bootstrapped samples
• For a given record x: $P(\text{record } x \text{ not selected}) = (1 - 1/n)^n$
• Each bootstrap sample contains approximately 2/3 of the records

[Plot: (1 - 1/n)^n versus number of bootstrapped samples n (0 to 100), converging toward ~0.37]

43
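A quick numerical check of the formula above (standalone Python, not from the deck):

# Probability that a given record is left out of a bootstrapped sample of size n
for n in (10, 100, 1000, 10000):
    p_out = (1 - 1 / n) ** n
    print(n, round(p_out, 4), "fraction included:", round(1 - p_out, 4))
# p_out approaches 1/e ≈ 0.368 as n grows, so each sample holds roughly 2/3 of the records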
Aggregate Results
Trees vote on or average result for each data point

Vote to
Form a
Single
Classifier

44
Aggregate Results
Trees vote on or average result for each data point

Data
Point

Vote to
Form a
Single
Classifier

45
Aggregate Results
Trees vote on or average result for each data point

Vote to Form
a Single
Classifier

Results
46
Aggregate Results
Trees vote on or average result for each data point

Vote to
Form a
Single
Classifier

Bagging = Bootstrap Aggregating 47


Bagging error calculations
• Bootstrapped samples provide
built-in error estimate for each
tree
• Create tree based on subset of
data
• Measure error for that tree on
unused samples
• Called "Out-of-Bag" error

48
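In scikit-learn the out-of-bag estimate is available directly; a minimal sketch, assuming X_train and y_train exist:

from sklearn.ensemble import BaggingClassifier

# oob_score=True reuses the rows left out of each bootstrap sample as a built-in validation set
BC = BaggingClassifier(n_estimators=50, oob_score=True, random_state=0)
BC = BC.fit(X_train, y_train)
print("Out-of-bag accuracy:", BC.oob_score_)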
Calculation of Feature Importance

• Fitting a bagged model doesn't produce coefficients like logistic regression
• Instead, feature importances are estimated using OOB error
• Randomly permute data for a particular feature and measure the change in accuracy

49
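scikit-learn's permutation_importance applies the same permute-and-measure idea, though on a held-out set rather than the out-of-bag rows; a hedged sketch assuming train/test splits already exist:

from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

RF = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
# Shuffle one feature at a time and measure how much the held-out accuracy drops
result = permutation_importance(RF, X_test, y_test, n_repeats=10, random_state=0)
print(result.importances_mean)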
• Bagging performance improvements increase with more trees
• Maximum improvement is generally reached at ~50 trees

[Plot: cross-validated RMSE versus number of bagged trees (0 to 500)]

50
Strengths of Bagging
Same as decision trees:
• Easy to interpret and
implement
• Heterogeneous input data
allowed, no preprocessing
required
Specific to bagging:
• Less variability than decision
trees
• Can grow trees in parallel*

51
BaggingClassifier: The Syntax
Import the class containing the classification method

from sklearn.ensemble import BaggingClassifier

Create an instance of the class

BC = BaggingClassifier(n_estimators=50)

Fit the instance on the data and then predict the expected value

BC = BC.fit(X_train, y_train)

y_predict = BC.predict(X_test)

Tune parameters with cross-validation. Use BaggingRegressor for regression. 52


Reduction in Variance Due to Bagging

• For $n$ independent trees, each with variance $\sigma^2$, the bagged variance is: $\frac{\sigma^2}{n}$

• However, bootstrap samples are correlated ($\rho$), giving: $\rho\sigma^2 + \frac{1-\rho}{n}\sigma^2$

[Plot: cross-validated RMSE versus number of bagged trees (0 to 500)]

53
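A tiny numeric illustration of the two formulas above (the values for σ² and ρ are illustrative assumptions):

# Variance of the bagged average: rho*sigma^2 + (1 - rho)*sigma^2 / n
def bagged_variance(sigma2, rho, n):
    return rho * sigma2 + (1 - rho) * sigma2 / n

sigma2 = 1.0
for n in (1, 10, 50, 500):
    independent = bagged_variance(sigma2, rho=0.0, n=n)   # sigma^2 / n: keeps shrinking
    correlated = bagged_variance(sigma2, rho=0.3, n=n)    # flattens out at rho*sigma^2
    print(n, round(independent, 3), round(correlated, 3))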
Introducing More Randomness

• Solution: further de-correlate the trees
• Use a random subset of features for each tree
  • Classification: $\sqrt{m}$
  • Regression: $m/3$
• Called "Random Forest"

[Plot: cross-validated RMSE versus number of bagged trees (0 to 500) for Bagging and Random Forest]

54
How Many Random Forest Trees?

• Errors are further reduced for Random Forest relative to Bagging
• Grow enough trees until the error settles down
• Additional trees won't improve results

[Plot: cross-validated RMSE versus number of bagged trees (0 to 500) for Bagging and Random Forest]

55
RandomForest: The Syntax
Import the class containing the classification method

from sklearn.ensemble import RandomForestClassifier

Create an instance of the class

RC = RandomForestClassifier(n_estimators=100, max_features=10)

Fit the instance on the data and then predict the expected value

RC = RC.fit(X_train, y_train)

y_predict = RC.predict(X_test)

Tune parameters with cross-validation. Use RandomForestRegressor for regression. 56


Create Even More Randomness

• Sometimes additional randomness is desired beyond Random Forest
• Solution: select features randomly and create splits randomly—don't choose greedily
• Called "Extra Random Trees"

57
ExtraTreesClassifier: The Syntax
Import the class containing the classification method
from sklearn.ensemble import ExtraTreesClassifier

Create an instance of the class


EC = ExtraTreesClassifier(n_estimators=100, max_features=10)

Fit the instance on the data and then predict the expected value
EC = EC.fit(X_train, y_train)

y_predict = EC.predict(X_test)

Tune parameters with cross-validation. Use ExtraTreesRegressor for regression.

58
Hands on Lab 2



61
Decision Stump: the Boosting Base Learner

[A single split on Temperature > 50ºF]

62
Decision Stump: the Boosting Base Learner

[Two stumps: one split on Temperature > 50ºF, one split on Humidity < 30%]

63
Overview of Boosting

• Create an initial decision stump; fit it to the data and calculate residuals

[Scatter plot of data points with the first stump's decision boundary]

64
Overview of Boosting

• Find a new decision stump to fit the weighted residuals

[First classifier plus a new stump being fit to the weighted residuals]

65
Overview of Boosting

• Fit the new decision stump to the current residuals
• Calculate errors and weight the data points

[Combined classifiers with misclassified points weighted more heavily]

66
Overview of Boosting

• Find a new decision stump to fit the weighted residuals

[Two classifiers plus another stump being fit to the weighted residuals]

67
Overview of Boosting

• Fit the new decision stump to the current residuals

[Three stumps fit in sequence]

68
Overview of Boosting

• Combine the stumps to form a single classifier

[The three stumps combine into one classifier]

69
Overview of Boosting

• Combine the stumps to form a single classifier

[The combined classifier separates the data points with a piecewise boundary]

70
Overview of Boosting

• The result is a weighted sum of all classifiers
• Successive classifiers are weighted by the learning rate (λ)
• Using a learning rate < 1.0 helps prevent overfitting (regularization)

71
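A sketch (not from the deck) that makes the weighted-sum view concrete with scikit-learn's gradient boosting; staged_decision_function exposes the running score as stumps are added. X_train, y_train, and X_test are assumed to exist:

from sklearn.ensemble import GradientBoostingClassifier

# Depth-1 trees are decision stumps; each adds learning_rate * (its prediction) to the running score
gbc = GradientBoostingClassifier(max_depth=1, n_estimators=100, learning_rate=0.1)
gbc = gbc.fit(X_train, y_train)

for i, score in enumerate(gbc.staged_decision_function(X_test)):
    if i in (0, 9, 99):
        print(i + 1, "stumps, score for the first test sample:", float(score.ravel()[0]))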
Boosting Specifics

• Boosting utilizes different loss functions
• At each stage, the margin is determined for each point
• Margin is positive for correctly classified points and negative for misclassifications
• The value of the loss function is calculated from the margin

[Plot: loss versus margin; incorrectly classified points (negative margin) on the left, correctly classified points (positive margin) on the right]

72
Gradient Boosting Loss Function

• Generalized boosting method that can use different loss functions
• A common implementation uses the binomial log-likelihood loss function (deviance): $\log(1 + e^{-\text{margin}})$
• More robust to outliers than AdaBoost

[Plot: loss versus margin for AdaBoost (exponential loss), deviance (gradient boosting), and 0-1 loss]

73
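A few lines of Python (illustrative, not from the deck) comparing the losses sketched in the plot as a function of the margin:

import numpy as np

margins = np.linspace(-3, 3, 7)
deviance = np.log(1 + np.exp(-margins))       # binomial log-likelihood loss used by gradient boosting
exponential = np.exp(-margins)                # AdaBoost's exponential loss
zero_one = (margins < 0).astype(float)        # 0-1 loss: only the sign of the margin matters

# For badly misclassified points (large negative margin) the exponential loss blows up,
# while deviance grows roughly linearly—hence the better robustness to outliers
print(np.column_stack([margins, deviance, exponential, zero_one]))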
Bagging vs Boosting

Bagging:
• Bootstrapped samples
• Base trees created independently
• Only data points considered
• No weighting used
• Excess trees will not overfit

Boosting:
• Fit the entire data set
• Base trees created successively
• Use residuals from previous models
• Up-weight misclassified points
• Beware of overfitting

74
Tuning a Gradient Boosted Model

• Learning rate (λ): set to < 1.0 for regularization; also called "shrinkage"
• Subsample: set to < 1.0 to use a fraction of the data for the base learners (stochastic gradient boosting)
• Max_features: number of features to consider in the base learners when splitting

[Plot: test set error versus boosting iterations (0 to 1000) for the base model, λ=0.1, subsample=0.5, λ=0.1 with subsample=0.5, and λ=0.1 with max_features=2]

75
GradientBoostingClassifier: The Syntax
Import the class containing the classification method

from sklearn.ensemble import GradientBoostingClassifier

Create an instance of the class

GBC = GradientBoostingClassifier(learning_rate=0.1, max_features=1, subsample=0.5, n_estimators=200)

Fit the instance on the data and then predict the expected value

GBC = GBC.fit(X_train, y_train)

y_predict = GBC.predict(X_test)

Tune with cross-validation. Use GradientBoostingRegressor for regression.

76
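The cross-validation step mentioned above might search exactly the knobs from the tuning slide; a hedged sketch with an illustrative grid (X_train and y_train assumed):

from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import GradientBoostingClassifier

param_grid = {
    'learning_rate': [0.05, 0.1, 0.2],   # shrinkage
    'subsample': [0.5, 1.0],             # < 1.0 gives stochastic gradient boosting
    'max_features': [1, 2, None],        # features considered per split in the base learners
}
search = GridSearchCV(GradientBoostingClassifier(n_estimators=200), param_grid, cv=5)
search = search.fit(X_train, y_train)
print(search.best_params_, search.best_score_)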
[Chart: XGBoost fit CPU acceleration against the v0.81 baseline on Intel CPU for XGB 0.81, 0.9, 1.0, and master 1.1 across the higgs1m, Letters, Airline-ohe, MSRank-30K, and Mortgage datasets; speed-ups reach up to 15.5x]

78
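The CPU speed-ups above come from the histogram-based ("hist") tree method; a minimal training sketch (illustrative parameters; X_train and y_train assumed) that selects it explicitly:

import xgboost as xgb

dtrain = xgb.DMatrix(X_train, label=y_train)
params = {
    'objective': 'binary:logistic',
    'tree_method': 'hist',    # histogram-based split finding, the CPU-optimized code path
    'max_depth': 8,
}
xgb_model = xgb.train(params, dtrain, num_boost_round=100)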

import xgboost as xgb
import daal4py as d4p

# Train a common XGBoost model as usual
xgb_model = xgb.train(params, X_train)

# Convert the XGBoost model to a DAAL model
daal_model = d4p.get_gbt_model_from_xgboost(xgb_model)

# Make a fast prediction with DAAL
daal_prediction = d4p.gbt_classification_prediction(…).compute(X_test, daal_model)

79

for (int i = 0; i < n; i++) {
    // Prefetch the binned row that will be needed a few iterations ahead
    prefetch_row(&bins[row_idx[i + 10] * p]);
    for (int j = 0; j < p; j++) {
        int bin = bins[row_idx[i] * p + j];
        hist[bin].g += g[i];   // accumulate gradient
        hist[bin].h += h[i];   // accumulate hessian
    }
}

80
▪ Visit Intel® oneAPI AI Analytics Toolkit (AI Kit) for more details and up-to-date product information
  • Release Notes
  • Installation Guide
  • Utilize the Getting Started Guide
▪ Download the AI Kit from Intel, Anaconda or any of your favorite package managers
  • Get started quickly with the AI Kit Docker Container
▪ Code Samples
  • Build, test and remotely run workloads on the Intel® DevCloud for free. No software downloads. No configuration steps. No installations.
▪ Machine Learning & Analytics Blogs at Intel Medium
  • Intel AI Blog site
  • Webinars and Articles at Intel® Tech Decoded
▪ Ask questions and share information with others through the Community Forum
  • Discuss with experts at AI Frameworks Forum

Download Now

83
XGBoost Hands-on Lab
85
86
Intel technologies may require enabled hardware, software or service activation. Learn more at intel.com or from the OEM or retailer.

Your costs and results may vary.

Intel does not control or audit third-party data. You should consult other sources to evaluate accuracy.

Optimization Notice: Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel
microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or
effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel
microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and
Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice Revision #20110804. https://software.intel.com/en-
us/articles/optimization-notice

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors.

Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to
any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated
purchases, including the performance of that product when combined with other products. See backup for configuration details. For more complete information about
performance and benchmark results, visit www.intel.com/benchmarks.

Performance results are based on testing as of dates shown in configurations and may not reflect all publicly available updates. See configuration disclosure for details.
No product or component can be absolutely secure.

No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.

Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-
infringement, as well as any warranty arising from course of performance, course of dealing, or usage in trade.

© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the
property of others.

87
Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.

Performance results are based on testing as of dates shown in configurations and may not reflect all publicly available updates. See backup for configuration
details. No product or component can be absolutely secure.

Your costs and results may vary.

Intel technologies may require enabled hardware, software or service activation.

Intel does not control or audit third-party data. You should consult other sources to evaluate accuracy.

© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be
claimed as the property of others.

88
XGBoost CPU vs GPU
Test configs: Tested by Intel as of 10/13/2020;
CPU: c5.18xlarge AWS Instance (2 x Intel® Xeon Platinum 8124M @ 18 cores), OS: Ubuntu 20.04.2 LTS, 193 GB RAM. GPU: p3.2xlarge AWS Instance (GPU: NVIDIA Tesla V100 16GB, 8 vCPUs), OS: Ubuntu 18.04.2 LTS, 61 GB RAM. SW: XGBoost 1.1 built from sources; compiler – G++ 7.4, nvcc 9.1. Intel® Data Analytics Acceleration Library (Intel® DAAL): 2019.4 version; Python env: Python 3.6, Numpy 1.16.4, Pandas 0.25, Scikit-learn 0.21.2.

XGBoost fit CPU acceleration

Test configs: Tested by Intel as of 10/13/2020; c5.24xlarge AWS Instance, CLX 8275 @ 3.0GHz, 2 sockets, 24 cores per socket, HT: on, DRAM (12 slots / 32GB / 2933 MHz);
SW: XGBoost 0.81, 0.9, 1.0 and 1.1 built from sources; compiler – G++ 7.4, nvcc 9.1. Intel® DAAL: 2019.4 version; Python env: Python 3.6, Numpy 1.16.4, Pandas 0.25, Scikit-learn 0.21.2.

End-to-End Census Workload Performance

Tested by Intel as of 10/15/2020. 2x Intel® Xeon® Platinum 8280 @ 28 cores, OS: Ubuntu 19.10, 5.3.0-64-generic, Mitigated, 384GB RAM. SW: Modin 0.8.1, scikit-learn 0.22.2, Pandas 1.0.1, Python 3.8.5, Daal4Py 2020.2, Census Data (21721922, 45). Dataset is from IPUMS USA, University of Minnesota, www.ipums.org, Version 10.0.

Tiger Lake + Intel® Distribution of OpenVINO™ toolkit vs Coffee Lake CPU

System Board: Intel prototype, TGL U DDR4 SODIMM RVP | ASUSTeK COMPUTER INC. / PRIME Z370-A
CPU: 11th Gen Intel® Core™ i5-1145G7E @ 2.6 GHz | 8th Gen Intel® Core™ i5-8500T @ 3.0 GHz
Sockets / Physical cores: 1 / 4 | 1 / 6
HyperThreading / Turbo Setting: Enabled / On | NA / On
Memory: 2 x 8198 MB 3200 MT/s DDR4 | 2 x 16384 MB 2667 MT/s DDR4
OS: Ubuntu* 18.04 LTS | Ubuntu* 18.04 LTS
Kernel: 5.8.0-050800-generic | 5.3.0-24-generic
Software: Intel® Distribution of OpenVINO™ toolkit 2021.1.075 | Intel® Distribution of OpenVINO™ toolkit 2021.1.075
BIOS: Intel TGLIFUI1.R00.3243.A04.2006302148 | AMI, version 2401
BIOS release date: 06/30/2020 | 7/12/2019
BIOS Setting: Load default settings | Load default settings, set XMP to 2667
Test Date: 9/9/2020 | 9/9/2020
Precision and Batch Size: CPU: INT8, GPU: FP16-INT8, batch size: 1 | CPU: INT8, GPU: FP16-INT8, batch size: 1
Number of Inference Requests: 4 | 6
Number of Execution Streams: 4 | 6
Power (TDP Link): 28 W | 35 W

89
