DeepDive Eleefficiencies

Deep Dive
June 21st, 2023
Electron ID (online and offline) and efficiency measurements
Riccardo Salvatico for EGM

Electron-matter interactions
Electrons lose energy by interacting with nuclei and atomic electrons
Dominating energy loss mechanism above a few MeV: bremsstrahlung
Photons lose energy by interacting with nuclei and atomic electrons too
Dominating energy loss mechanism above a few tens of MeV: electron-positron pair production
These two processes are at the basis of the electromagnetic showers that we detect in our calorimeters…
[Figure: electromagnetic shower of electrons, positrons and photons developing in the ECAL]
…but in CMS, such interactions with matter start to take place well before the calorimeters
[Figure: an electron radiating a photon in the Tracker, before reaching the ECAL]
The tracker material extends up to ~2 radiation lengths → large probability of bremsstrahlung and pair production before the ECAL
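To get a feeling for what "up to ~2 radiation lengths" means, the textbook approximations can be evaluated directly: an electron keeps on average a fraction exp(-x/X0) of its energy, and a high-energy photon converts with probability 1 - exp(-(7/9) x/X0). This is only a sketch: the function names and the sample thicknesses are illustrative and do not come from the slides.

```python
import math

def mean_energy_fraction_kept(t):
    """Average fraction of its energy an electron retains after t = x/X0 radiation lengths."""
    return math.exp(-t)

def conversion_probability(t):
    """Probability that a high-energy photon converts to e+e- within t = x/X0 radiation lengths."""
    return 1.0 - math.exp(-7.0 * t / 9.0)

# Illustrative thicknesses, in units of radiation lengths
for t in (0.4, 1.0, 2.0):
    print(f"x/X0 = {t:.1f}: <E>/E0 = {mean_energy_fraction_kept(t):.2f}, "
          f"P(conversion) = {conversion_probability(t):.2f}")
```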
Furthermore, tracker and calorimeters are immersed in a 3.8 T magnetic field
• When the electrons emit bremsstrahlung photons, they lose energy and the bending angle of their track increases (“kink”)
• Need for special track reconstruction algorithm to consider energy losses via photon radiation (Gaussian-sum filter)
• Bremsstrahlung can happen multiple times, and so can pair production
• Possibility of multiple energy deposits in ECAL
• Bending along 𝜼 of the low-pT particles within EM showers in the crystals causes further spread of the shower (“mustache”)
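The "kink" can be made quantitative with the usual relation between transverse momentum and radius of curvature, R [m] ≈ pT [GeV] / (0.3 · B [T]) for a unit-charge particle. The sketch below uses the 3.8 T field quoted above; the electron momenta are hypothetical numbers chosen for illustration.

```python
B_FIELD_T = 3.8  # magnetic field quoted in the slides

def bending_radius_m(pt_gev, b_tesla=B_FIELD_T):
    """Radius of curvature (m) in the transverse plane for a unit-charge particle."""
    return pt_gev / (0.3 * b_tesla)

pt_before = 30.0  # hypothetical electron pT (GeV) before radiating
pt_after = 18.0   # hypothetical pT (GeV) after emitting a 12 GeV photon

# A smaller radius after the emission means the track bends more from that point on
print(f"R before: {bending_radius_m(pt_before):.1f} m, "
      f"R after: {bending_radius_m(pt_after):.1f} m")
```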
How to ID an electron
Essentially three categories of identifiers
1. Shower shape
o 𝝈𝒊𝜼𝒊𝜼 → lateral shower-shape variable: spread of an EM shower in the 𝜂 direction. Defined as the log-energy-weighted RMS of the EM shower along 𝜂, in units of crystals. Good discrimination of electrons (and photons) vs jets, whose spread is larger.
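A minimal sketch of such a log-weighted spread on a 5×5 crystal matrix is given below. The weight w = max(0, 4.7 + ln(E_i/E_5x5)) is the form used in CMS publications and is an assumption here, since the slides do not spell it out; the function name and the toy energy matrices are illustrative.

```python
import numpy as np

def sigma_ieta_ieta(energies, w0=4.7):
    """Log-energy-weighted RMS along eta, in crystal units.

    energies: 2D array of crystal energies, with rows indexed along eta.
    """
    e = np.asarray(energies, dtype=float)
    total = e.sum()
    # Logarithmic weights: crystals carrying a tiny energy fraction get zero weight
    with np.errstate(divide="ignore"):
        w = np.maximum(0.0, w0 + np.log(e / total))
    ieta = np.arange(e.shape[0])[:, None] * np.ones_like(e)
    mean = (w * ieta).sum() / w.sum()
    return float(np.sqrt((w * (ieta - mean) ** 2).sum() / w.sum()))

# Toy showers: one confined to a single eta row, one spread uniformly
narrow = np.zeros((5, 5))
narrow[2, :] = [1.0, 5.0, 50.0, 5.0, 1.0]
wide = np.full((5, 5), 1.0)
print(sigma_ieta_ieta(narrow), sigma_ieta_ieta(wide))
```

A jet-like, broad deposit gives a larger value than a collimated electron-like one, which is what the cut exploits.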
[Figure: CMS Simulation (13 TeV). ∆η vs ∆φ [rad] distribution (arbitrary units), for 1 < ET,seed < 10 GeV and 1.48 < η_seed < 1.75]
2. Tracking and cluster-track matching (examples)
o Missing hits → maximum number of allowed missing hits along a track
o 𝛘𝟐 → track fit quality
o 1/E – 1/p → compares energy of the supercluster and momentum measured in the tracker
o 𝒇𝐛𝐫𝐞𝐦 → fraction of momentum lost between the point of closest approach to the vertex and the extrapolation to the surface of the ECAL
o 𝒔² = (∆𝝓₁/𝒂_𝝓₁)² + (∆𝝓₂/𝒂_𝝓₂)² + (∆𝒛/𝒂_𝒛)² (HLT) → measure of the compatibility of ECAL SC and hits in the first two layers of the Pixel detector
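The matching variables listed above are simple enough to write down explicitly. The sketch below assumes the usual definition f_brem = (p_in − p_out)/p_in and takes the s² expression as written in the list; all function names and numerical inputs are hypothetical.

```python
def one_over_e_minus_one_over_p(e_sc, p_trk):
    """1/E - 1/p for supercluster energy E and track momentum p (GeV^-1)."""
    return 1.0 / e_sc - 1.0 / p_trk

def f_brem(p_in, p_out):
    """Fraction of momentum lost between the vertex (p_in) and the ECAL surface (p_out)."""
    return (p_in - p_out) / p_in

def pixel_match_s2(dphi1, dphi2, dz, a_phi1, a_phi2, a_z):
    """s^2: squared residuals of the pixel hits, each normalised by its scale parameter a."""
    return (dphi1 / a_phi1) ** 2 + (dphi2 / a_phi2) ** 2 + (dz / a_z) ** 2

# Hypothetical electron: 50 GeV at the vertex, 30 GeV at the ECAL surface
print(f_brem(50.0, 30.0))
print(one_over_e_minus_one_over_p(e_sc=49.0, p_trk=50.0))
print(pixel_match_s2(0.001, 0.002, 0.01, a_phi1=0.002, a_phi2=0.004, a_z=0.03))
```

Offline IDs typically cut on the absolute value of 1/E − 1/p, so the sign returned here is dropped in practice.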
3. Isolation
o H/E → energy deposited in HCAL / energy deposited in ECAL
o Tracker + PF Cluster → isolation (sometimes relative) based on tracks, ECAL and HCAL PF clusters. Typically computed in a cone of ∆𝑅 = 0.3 (hollow for the tracker isolation)
o PF-based isolation → isolation from charged hadrons, neutral hadrons and photons. Note that this is correlated with H/E
Isolation variables are particularly sensitive to pileup, so they need to be corrected (e.g., by subtracting 𝜌·A_eff, the average pileup energy density times an effective area).
(Figure credit: A. Kapoor)
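One common way to apply the ρ·A_eff correction is to subtract it from the neutral part of the PF isolation and clip at zero before dividing by pT. The sketch below assumes that form; the function name, the effective area and the input values are illustrative and are not official EGM numbers.

```python
def relative_pf_isolation(pt, iso_charged, iso_neutral, iso_photon, rho, a_eff):
    """Pileup-corrected relative PF isolation (rho * A_eff subtraction).

    The neutral component (neutral hadrons + photons) is corrected and
    clipped at zero, so pileup fluctuations cannot make it negative.
    """
    neutral = max(0.0, iso_neutral + iso_photon - rho * a_eff)
    return (iso_charged + neutral) / pt

# Hypothetical 40 GeV electron in an event with rho = 20 GeV per unit area
print(relative_pf_isolation(pt=40.0, iso_charged=1.0, iso_neutral=2.0,
                            iso_photon=1.5, rho=20.0, a_eff=0.1))
```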
And now…?
What an electron should look like in the detector is clearly process-dependent
[Figure: Tracker, ECAL and HCAL signatures of an isolated electron and of an electron in a jet]
Extreme example: isolated electron vs electron in a jet
The variables discussed are the most used to identify standard (?) electrons with 𝟏𝟎 < 𝒑𝐓 < 𝟐𝟎𝟎 𝐆𝐞𝐕
IDs are crafted by combining them as sequential cuts or with multivariate/ML techniques
The performance of the IDs will depend on the variables used, on the training samples and on the application samples!
Typically use pat::Electron, so essentially ECAL clusters associated to a track with high (~98-99%) efficiency
Low-pT (< 10 GeV) and high-pT (> 200 GeV) electrons often require a different level of creativity
→ dedicated talks in today’s session
EGM electron IDs
Generic, standard model-oriented IDs
Not designed to be the most cutting-edge IDs for everyone’s signal!
→ provide a good ID in most of the cases, general stability vs PU, and a few working points to choose from.
Cut-based IDs
Easy to “play with”: analyzers can reproduce them without a quantity they aren’t interested in or add cuts on top
EGM provides four working points: veto, loose, medium, tight
[Figure: CMS Simulation (13 TeV), 2017. Example of background efficiency vs signal efficiency for the BDT-based ID and the cut-based ID, in Barrel and Endcap]
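The "easy to play with" point can be illustrated with a toy sequential-cut ID in which any single requirement can be dropped. Everything here is hypothetical: the variable names, the thresholds and the `skip` argument are made up for the example and are not the official EGM working points.

```python
# Hypothetical thresholds for illustration only; NOT the official EGM values
CUTS = {
    "sigma_ieta_ieta": lambda v: v < 0.011,
    "h_over_e": lambda v: v < 0.05,
    "abs_one_over_e_minus_one_over_p": lambda v: v < 0.1,
    "missing_hits": lambda v: v <= 1,
    "rel_iso": lambda v: v < 0.1,
}

def passes_id(electron, skip=()):
    """Apply all cuts in sequence, ignoring any cut whose name is listed in `skip`."""
    return all(cut(electron[name]) for name, cut in CUTS.items() if name not in skip)

# A toy electron that fails only the isolation requirement
electron = {
    "sigma_ieta_ieta": 0.009,
    "h_over_e": 0.02,
    "abs_one_over_e_minus_one_over_p": 0.01,
    "missing_hits": 0,
    "rel_iso": 0.3,
}
print(passes_id(electron))                    # full ID
print(passes_id(electron, skip=("rel_iso",))) # same ID without the isolation cut
```

Dropping one cut in this way is what makes it possible, for example, to apply a custom isolation on top of an otherwise standard ID.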
MVA (BDT) IDs
• Large number of input variables. Trained on MC DY+Jets → prompt electrons as signal, unmatched and non-prompt as background.
• Generally better performance than cut-based ones (but this is a signal-dependent statement…). Tuned for electrons with 𝒑𝐓 > 𝟏𝟎 𝐆𝐞𝐕.
• EGM provides two working points: 80 and 90% signal efficiency, but one can also choose their own threshold on the discriminant.
• Two versions: iso and non-iso, for more flexibility (many analyses prefer to use the non-iso version and apply isolation criteria on top).
Note that H/E is an input variable for both!
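Choosing one's own threshold on the discriminant amounts to taking a quantile of the score distribution of signal electrons. The sketch below does this on toy scores drawn from beta distributions; the distributions, the random seed and the function name are assumptions made for illustration, not the EGM procedure.

```python
import numpy as np

def threshold_for_efficiency(signal_scores, target_eff):
    """Score threshold such that about `target_eff` of the signal has score >= threshold."""
    return float(np.quantile(signal_scores, 1.0 - target_eff))

rng = np.random.default_rng(seed=1)
signal_scores = rng.beta(5.0, 2.0, size=10_000)      # toy prompt-electron scores
background_scores = rng.beta(2.0, 5.0, size=10_000)  # toy background scores

for eff in (0.80, 0.90):
    thr = threshold_for_efficiency(signal_scores, eff)
    bkg_eff = float(np.mean(background_scores >= thr))
    print(f"target signal eff {eff:.0%}: threshold {thr:.3f}, background eff {bkg_eff:.3f}")
```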
Other electron IDs
Analyzers or large analysis groups often develop their own IDs, targeting better performance for a given signal or topology
HZZ MVA
• Optimized for low-pT (~5-15 GeV), isolated electrons for low mass H searches
• Specifically devised for very high efficiency
“ttH” MVA ID (AN-2022/016)
• Aims at identifying prompt vs non-prompt electrons
• Particularly focused on processes with top quarks
• Uses XGBoost like the EGM MVA IDs, so differences are essentially in training samples, variables and training parameters
Ana Sculac
Other electron IDs
Other proposals presented to EGM include:
• “ParticleNet lepton”: DGCNN to separate prompt, non-prompt, and fake electrons
• “Merged electron ID” for high-mass four-lepton resonance in boosted regime
• Cut-based ID mixing detector-based and PF-based isolation
Few cases of “general purpose, high performance” IDs. Several cases of topology-oriented IDs.
Zoology of electron triggers
Creating their own electron ID is something several people go for. Creating their own trigger is somewhat more complex (or perhaps just less known or less necessary…)
• Single, tight-ID, isolated electron: HLT_Ele30_WPTight_Gsf
• Double electron with “medium” ID/isolation: HLT_Ele23_Ele12_CaloIdL_TrackIdL_IsoVL
• Single, high-pT, non-isolated electron: HLT_Ele115_CaloIdVT_GsfTrkIdT
• Double electron with tight pixel-matching but no full-track reconstruction: HLT_DoubleEle33_CaloIdL_MW
• Double electron with no pixel-matching nor tracking: HLT_DiEle27_WPTightCaloOnly_L1DoubleEG
• Electron scouting: DST_Run3_*_PFScoutingPixelTracking
Electron IDs at HLT
Select only a subset of the possible L1 seeds to start the reconstruction:
L1_SingleLooseIsoEG26er2p5 OR L1_SingleLooseIsoEG26er1p5 OR L1_SingleLooseIsoEG28er2p5 OR
L1_SingleLooseIsoEG28er2p1 OR L1_SingleLooseIsoEG28er1p5 OR L1_SingleLooseIsoEG30er2p5 OR
L1_SingleLooseIsoEG30er1p5 OR L1_SingleEG26er2p5 OR L1_SingleEG38er2p5 OR
L1_SingleEG40er2p5 OR L1_SingleEG42er2p5 OR L1_SingleEG45er2p5 OR L1_SingleEG60 OR
L1_SingleEG34er2p5 OR L1_SingleEG36er2p5 OR L1_SingleIsoEG24er2p1 OR L1_SingleIsoEG26er2p1 OR
L1_SingleIsoEG28er2p1 OR L1_SingleIsoEG30er2p1 OR L1_SingleIsoEG32er2p1 OR
L1_SingleIsoEG26er2p5 OR L1_SingleIsoEG28er2p5 OR L1_SingleIsoEG30er2p5 OR
L1_SingleIsoEG32er2p5 OR L1_SingleIsoEG34er2p5
“EG” in the seed names → at L1, no distinction between electrons and photons!
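The seed names encode their own requirements, which a few lines of code can unpack. The sketch below reads the number after "EG" as the ET threshold in GeV and "erXpY" as an |η| < X.Y restriction; that reading of the naming convention is an assumption, and the helper names are hypothetical.

```python
import re

# Pattern for the single-EG seed names listed above
SEED_RE = re.compile(
    r"^L1_Single(?P<iso>LooseIso|Iso)?EG(?P<et>\d+)(?:er(?P<eta>\d+p\d+))?$"
)

def parse_seed(name):
    """Split a single-EG L1 seed name into isolation tag, ET threshold and |eta| limit."""
    match = SEED_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognized seed name: {name}")
    eta = match.group("eta")
    return {
        "isolation": match.group("iso"),  # None means no isolation requirement
        "et_threshold": int(match.group("et")),
        "max_abs_eta": float(eta.replace("p", ".")) if eta else None,
    }

def l1_accept(fired_seeds, path_seeds):
    """The seeds are OR-ed: one fired seed is enough to start the reconstruction."""
    return any(seed in fired_seeds for seed in path_seeds)

print(parse_seed("L1_SingleLooseIsoEG28er2p1"))
print(l1_accept({"L1_SingleEG60"}, ["L1_SingleIsoEG30er2p5", "L1_SingleEG60"]))
```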
Build the Particle Flow clusters and the ECAL superclusters
Apply transverse momentum threshold (in this case of 32 GeV)
Apply cutoffs over quantities related to the shape of the supercluster (in this case, 𝝈𝒊𝜼𝒊𝜼)
Similar to the ECAL case, reconstruct HCAL-related quantities
Threshold on H/E
Isolation based on ECAL and HCAL PF clusters
ECAL-seeded pixel matching: search for the compatibility of a supercluster with a track (doublet or triplet in the Pixel detector)
Apply quality requirements on the pixel track (𝑠²).
Reconstruct the whole GSF track.
Cutoffs on:
• 1/E – 1/p
• number of allowed missing hits in the tracker layers
• difference in 𝜂 and 𝜙 between the supercluster and the track positions extrapolated to the beamspot
Apply tracker-based isolation.
The end
What not to forget
The purpose of triggering is to reduce the number of events to be saved, processed and stored
• When a new trigger is devised, we always have a certain particle or process in mind. Increasing the purity is a good option for lowering the rate. But…
[Figure: purity vs rate]
• …we also want to be somewhat generic in defining ID criteria, so that we can use the acquired data for multiple analyses, searches, studies. An electron trigger could potentially be exploited to save events containing objects that behave like electrons.
What not to forget, also
• Need to keep low HLT reconstruction time → HLT algorithms often simplified wrt offline ones. Notable examples for electrons:
o Absence of supercluster refinement procedure (i.e., tracker information will not help adding or removing PF clusters to the ECAL mustache supercluster)
o Electron objects can only be ECAL-seeded and not tracker-seeded
• Order matters! Simpler algorithms run first, so that more complex ones can run on fewer events or fewer objects.
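The "order matters" point is essentially an early-exit filter chain: cheap requirements come first, and a candidate that fails one never reaches the expensive steps. The sketch below mimics the sequence of the HLT slides; apart from the 32 GeV threshold quoted there, all thresholds, field names and the `run_path` helper are hypothetical.

```python
def run_path(candidate, filters):
    """Run filters in order and stop at the first failure.

    Returns (accepted, name_of_failed_filter_or_None, number_of_filters_run).
    """
    for n_run, (name, passes) in enumerate(filters, start=1):
        if not passes(candidate):
            return False, name, n_run
    return True, None, len(filters)

# Cheap calorimeter-only filters first, tracking-based ones last
FILTERS = [
    ("et", lambda c: c["et"] > 32.0),
    ("sigma_ieta_ieta", lambda c: c["sigma_ieta_ieta"] < 0.011),
    ("h_over_e", lambda c: c["h_over_e"] < 0.05),
    ("pixel_match", lambda c: c["pixel_s2"] < 70.0),
    ("gsf_track", lambda c: abs(c["one_over_e_minus_one_over_p"]) < 0.01),
]

good = {"et": 45.0, "sigma_ieta_ieta": 0.009, "h_over_e": 0.01,
        "pixel_s2": 3.0, "one_over_e_minus_one_over_p": 0.002}
soft = dict(good, et=20.0)  # same candidate, but below the ET threshold

print(run_path(good, FILTERS))
print(run_path(soft, FILTERS))
```

The soft candidate is rejected after a single filter, so no pixel matching or GSF tracking would be spent on it.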
Electron efficiency measurement
Reconstruction, trigger and ID efficiencies for a given process → needed to interpret the number of events observed
Plus, efficiencies as modeled in MC simulation are never 1:1 with those of collision data → scale factors
Both offline and at HLT:
o Tag & Probe (EGM T&P): the most used, both offline and at HLT.
Relevant at HLT:
o Orthogonal dataset (tutorial): useful, for example, when a measurement in a particular phase space is preferred for analysis-related reasons.
o Reference trigger (tutorial): useful when dealing with cross triggers (e.g., a trigger based on the presence of leptons and jets in the same event)
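A scale factor is the ratio of the efficiency measured in data to the one measured in simulation, usually per (pT, η) bin. The sketch below propagates the two uncertainties assuming they are uncorrelated, which is a simplification; the function name and the numbers are illustrative.

```python
import math

def scale_factor(eff_data, err_data, eff_mc, err_mc):
    """Data/MC efficiency ratio with uncorrelated relative errors added in quadrature."""
    sf = eff_data / eff_mc
    err = sf * math.sqrt((err_data / eff_data) ** 2 + (err_mc / eff_mc) ** 2)
    return sf, err

# Hypothetical efficiencies in one (pT, eta) bin
sf, err = scale_factor(eff_data=0.90, err_data=0.01, eff_mc=0.92, err_mc=0.005)
print(f"SF = {sf:.3f} +/- {err:.3f}")
```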
Tag & Probe – recipe
1. Take a dataset that is rich in your favorite resonance and its decay into two electrons. For the momentum range considered in this talk, this would be a Z boson.
2. Select a particle (TAG) by applying tight electron ID criteria: you want to be as sure as you can that this is really an electron from Z decay.
3. Select a second object (PROBE) within a reasonable invariant mass window around the Z peak and with opposite charge sign wrt the tag.
4. Use the collection of probes to test the efficiency of the desired requirement: each probe is either a passing probe or a failing probe.
5. Efficiencies will be the ratio: Efficiency = Passing probes / (Passing probes + Failing probes)
[Figure: CMS Preliminary, 34.3 fb⁻¹ (13.6 TeV), 2022. Efficiency for 0.00 < |η| < 1.44, 1.57 < |η| < 2.00 and 2.00 < |η| < 2.50]
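The recipe maps onto a short cut-and-count loop. The sketch below is a toy under stated assumptions: the 60-120 GeV mass window, the dictionary layout of the electrons, the simple binomial uncertainty and the function names are all choices made for illustration and do not come from the slides.

```python
import math

Z_WINDOW = (60.0, 120.0)  # hypothetical invariant mass window around the Z peak (GeV)

def tag_and_probe(pairs, passes_tag, passes_requirement):
    """Cut-and-count Tag & Probe efficiency.

    pairs: iterable of (electron_a, electron_b, invariant_mass).
    Returns (efficiency, binomial uncertainty).
    """
    n_pass = n_fail = 0
    for ele_a, ele_b, mass in pairs:
        if not (Z_WINDOW[0] < mass < Z_WINDOW[1]):
            continue  # outside the mass window
        if ele_a["charge"] * ele_b["charge"] >= 0:
            continue  # tag and probe must have opposite charge
        # Either electron may act as the tag; the other one is then the probe
        for tag, probe in ((ele_a, ele_b), (ele_b, ele_a)):
            if passes_tag(tag):
                if passes_requirement(probe):
                    n_pass += 1
                else:
                    n_fail += 1
    n_probes = n_pass + n_fail
    if n_probes == 0:
        raise ValueError("no probes selected")
    eff = n_pass / n_probes
    return eff, math.sqrt(eff * (1.0 - eff) / n_probes)

# Toy electrons: 'tight' marks the tag ID, 'wp' the requirement under test
a = {"charge": +1, "tight": True, "wp": True}
b = {"charge": -1, "tight": True, "wp": True}
c = {"charge": -1, "tight": False, "wp": False}
pairs = [(a, b, 91.0), (a, c, 89.0), (a, b, 150.0), (a, a, 91.0)]

eff, err = tag_and_probe(pairs, lambda e: e["tight"], lambda e: e["wp"])
print(f"efficiency = {eff:.3f} +/- {err:.3f}")
```

With so few probes the normal-approximation uncertainty is not reliable; a real measurement would use far larger samples or an interval such as Clopper-Pearson.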
More on the Tag & Probe
Why Tag & Probe?
• Mainly: it can be applied to datasets which are full of true electrons and provides a good way to distinguish them, namely the matching with the resonance (matching with MC truth is only possible in simulation) → typically small statistical uncertainties.
• If the measurement is performed on an inclusive dataset (not just on a special phase space), results can be reasonably applied to subsets of phase space too, with some uncertainties, of course.
[Figure: fit-based TnP with background subtraction]
Fit-based TnP generally not necessary for HLT efficiencies
→ measurement performed with respect to ID-ed offline electrons
→ almost no background
→ cut & count is enough
Summary
• Identifying electrons in the 10 < pT < 200 GeV range is typically a matter of playing with properties of
o ECAL clusters
o Tracks and track-cluster matching
o Isolation
• BUT how and which ones to use is strongly dependent on what an electron is considered to be in analyses
• Cut-based IDs are flexible and rather straightforward. One can get very creative with multivariate/ML techniques
• The identification at HLT makes use of similar, but simplified algorithms with respect to offline
o Necessity to be generic, but also to control rate and timing
• A few common ways to measure efficiency, offline and at HLT. Tag & Probe still the most common for electrons
BACKUP
ttH MVA – input variables
“ttH” MVA ID AN-2022/016
