Professional Documents
Culture Documents
ATLAS MLWorkshop
ATLAS MLWorkshop
Tensorflow in CMSSW
Mauro Verzetti (CERN and FWO)
on behalf of the CMS Collaboration
The problem - Jet flavour classification
jet combined
Secondary
PV vertex (SV)
Track selection
based tagger
SV
super
combined
e/µ
Soft-lepton (SL)
based tagger
s = 13 TeV s = 13 TeV
1 1
misid. probability
misid. probability
CMS Simulation Preliminary CMS Simulation Preliminary
tt events tt events
AK4jets (p > 30 GeV) AK4jets (p > 90 GeV)
T T
DeepFlavour phase 1 DeepFlavour phase 1
10−1 10−1
DeepCSV phase 1 DeepCSV phase 1
DeepCSV phase 0 DeepCSV phase 0
udsg udsg
10−2 c 10−2 c
10−3 10−3
0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
b jet efficiency b jet efficiency
Misid. probability
0.9
CMS Simulation Preliminary
QCD events, p = 30-50 GeV
T
0.8 jet p > 30 GeV
T
Similar performance to simpler, 0.7 DeepJet
dedicated binary taggers, but with full 0.6 recurrent
multi-class power. convolutional
0.5
0.4
Significantly better performances in
given regions with different quark 0.3
composition 0.2
0.1
{
P-CNN layers x14 Dense
30 features, sIP ordered Output
521 nodes x1
Flavour
Secondary Vtx x5
14 features,
P-CNN layers x14
displacement ordered
Figure 1. Comparison of the performance of the two BDT taggers and the
CMS-DP-2017-049
CNN taggers in terms of ROC curves in MC simulated events for top jets as
as background. The events correspond to AK8 jets with 1000<pT<1400 GeV
!10 M. Verzetti (CERN and FWO)
• GRU = Gated Recurrent Unit = Recurrent network to reduce
dimensionality of output from Conv1D layers
DeepDoubleB (60×32, 5×32) → (50, 50)
Double-b (27) BN
features 7
Significantly better
300 < jet pT < 2000 GeV
40 < jet mSD < 200 GeV
DeepDoubleBvL, AUC = 97.3%
DeepDoubleBvL
double-b, AUC = 91.3%
than current BDT
Figure 4. Effect on the jet soft-drop mass
10 1 approach!
distribution of misidentified events by the
DeepDoubleBvL identification algorithm
demonstrating the degree to which the
algorithm is dependent on the mass of the
jet. These histograms are obtained for a fixed
overall mistagging rate from a QCD sample.
10 2
Some mass
sculpting
3
10
0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
Tagging efficiency (H ! bb̄)
CMS DP-2018/046
7/5/2018 10
Mass sculpting
gone
13 CMS DP-2018/046
7/5/2018 12
ROC Comparison
DeepDoubleCvB
Figure 3. Performance of the double-b and
the DeepDoubleBvL and DeepDoubleCvB
quark-antiquark pair jet identification
algorithms demonstrating the probability of
misidentifying Hbb jets as a function of the
tagging efficiency. These receiver operating
characteristic curves are obtained from a
combined sample of Hbb and Hcc.
7/5/2018 8 9
CMS DP-2018/046
!13 M. Verzetti (CERN and FWO)
From training to
practice
Two worlds colliding
• Keras + TensorFlow
• Custom framework
• Python-based
• C++ based (speed!)
• Private productions
• Expendable jobs
• Many other concurrent
activities → memory
constraints
•
+ Processes cannot die (e.g.
trigger)
• Access to all the needed internals for production usage with minimal
need for custom code
Multi-threading in TF
•Solved with the implementation of two custom sessions:
Marcel Rieger - 28.2.
●
• Without any threading (NTSession)
• Sharing the thread pool with the rest of the framework (TBBSession)
Similar behavior
→2 in C/C++ API
→ 10
!19 M. Verzetti (CERN and FWO)
Issue 2: Memory footprint
• no post-processing needed
• For each thread, mxnet creates creates a workspace to store temporary variables (partially
computed results).
• If multiple graphs are run simultaneously there might be a race condition, but this does should not
happen since each thread runs one graph at a time…
• …unless the object computing the graph is reattached to a different thread every time, in a
worker-pool design!
Solution: re-attach the workspace every time before running the computation graph (mxnet patch)
• Additional mutex added when initialising the Executor as helgrind reported race conditions in
initialisation (minimal cost)
• For each thread, mxnet creates creates a workspace to store temporary variables (partially
computed results).
• If multiple graphs are run simultaneously there might be a race condition, but this does should not
happen since each thread runs one graph at a time…
• …unless the object computing the graph is reattached to a different thread every time, in a
worker-pool design!
Solution: re-attach the workspace every time before running the computation graph (mxnet patch)
• Additional mutex added when initialising the Executor as helgrind reported race conditions in
initialisation (minimal cost)
MXNet TensorFlow
BLAS library)
• Significantly better operators
• Simple model export for support and coverage (always
inference
cutting edge)
udsg
10−2 c
10−3
0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
b jet efficiency
DeepDoubleC - mass sculpting
ulpting
oubleCvL
on the jet soft-drop mass
misidentified events by the
identification algorithm
the degree to which the
pendent on the mass of the
rams are obtained for a fixed
ng rate from a QCD sample.
11
CMS-DP-2018-XXX
!31 M. Verzetti (CERN and FWO)
Particle-based NN architecture
CMS-DP-2017-013
mance of the b jet identification algorithms demonstrating the
32probability
Figure 5: Performance of the DeepCSV (retrained for the(CERN
M. Verzetti Phaseand
1 detect
FWO)
P-CNNs
particles (sorted)
features
{
{{
zam = SaSj kaa,j xa,(m+j-1)
P-CNNs
particles (sorted)
features
{
{{
zam = SaSj kaa,j xa,(m+j-1)
particles (sorted)
features
{
{{
zam = SaSj kaa,j xa,(m+j-1)
particles (sorted)
features
{
{{
zam = SaSj kaa,j xa,(m+j-1)