
Deep Dive

June 21st, 2023

Electron ID (online and offline) and efficiency measurements

Riccardo Salvatico for EGM


Electron-matter interactions
Electrons lose energy by interacting with nuclei and atomic electrons

Dominant energy-loss mechanism above a few MeV: bremsstrahlung
Electron-matter interactions
Photons lose energy by interacting with nuclei and atomic electrons too

Dominant energy-loss mechanism above a few tens of MeV: electron-positron pair production
Electron-matter interactions
These two processes are the basis of the electromagnetic showers that we detect in our calorimeters…
Electron-matter interactions
…but in CMS, such interactions with matter start to take place well before the calorimeters

The tracker material extends up to ~2 radiation lengths → large bremsstrahlung/pair production probability
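To give a feeling for what ~2 radiation lengths means, here is a minimal sketch based on two textbook high-energy approximations that are not taken from the slides: the mean electron energy falls as exp(-x/X0) through bremsstrahlung, and the photon mean free path for pair production is about 9/7 X0. The function names are invented for this example.

```python
import math

def mean_energy_fraction(thickness_x0):
    """Mean fraction of its energy an electron keeps after crossing
    `thickness_x0` radiation lengths (bremsstrahlung only, approximation)."""
    return math.exp(-thickness_x0)

def conversion_probability(thickness_x0):
    """Probability that a high-energy photon converts into an e+e- pair
    within `thickness_x0` radiation lengths (mean free path ~ 9/7 X0)."""
    return 1.0 - math.exp(-7.0 * thickness_x0 / 9.0)

for x in (0.5, 1.0, 2.0):
    print(f"x = {x:.1f} X0: <E>/E0 = {mean_energy_fraction(x):.2f}, "
          f"P(conversion) = {conversion_probability(x):.2f}")
```

Under these assumptions, at 2 X0 an electron keeps on average about 14% of its energy and a photon has roughly a 79% chance of converting.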
Electron-matter interactions
Furthermore, tracker and calorimeters are immersed in a 3.8 T magnetic field

• When the electrons emit bremsstrahlung photons, they lose energy and the bending angle of their track increases (“kink”)

• Need for special track reconstruction algorithm to consider energy losses via photon radiation (Gaussian-sum filter)

• Bremsstrahlung can happen multiple times, and so can pair production

• Possibility of multiple energy deposits in ECAL

• Bending along 𝜼 of the low-𝑝T particles within EM showers in the crystals causes further spread of the shower (“mustache”)
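The effect of an energy loss on the track bending can be quantified with the standard relation pT [GeV] ≈ 0.3 · B [T] · R [m]. The sketch below is only an illustration: the helper name is invented, and the 30 GeV electron losing half of its momentum is a made-up example.

```python
def bending_radius_m(pt_gev, b_tesla=3.8):
    """Radius of curvature (in meters) of a unit-charge particle with
    transverse momentum `pt_gev` in a solenoidal field of `b_tesla`."""
    return pt_gev / (0.2998 * b_tesla)

pt_before = 30.0   # GeV, hypothetical electron
pt_after = 15.0    # GeV, after radiating half of its momentum
print(f"R before: {bending_radius_m(pt_before):.1f} m")
print(f"R after:  {bending_radius_m(pt_after):.1f} m")
```

The radius halves when the momentum halves, which is the sudden change of curvature that a standard Kalman-filter track fit does not model.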
How to ID an electron

Essentially three categories of identifiers

1. Shower shape

o 𝝈𝒊𝜼𝒊𝜼 → lateral shower-shape variable: spread of an EM shower in the 𝜂 direction. Defined as the RMS of the EM shower along 𝜂, in units of crystals, with each crystal weighted by the logarithm of its energy fraction. Good discrimination of electrons (and photons) vs jets, whose spread is larger.
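A minimal sketch of such a log-weighted RMS, assuming the commonly quoted weight w_i = max(0, 4.7 + ln(E_i / E_total)) over the crystals around the seed. The function name and the toy crystal lists are invented, and the conversion from crystal units to 𝜂 (crystal-size factor) is left out.

```python
import math

def sigma_ieta_ieta(crystals, w0=4.7):
    """Log-energy-weighted RMS along eta, in units of crystals.
    `crystals` is a list of (ieta_index, energy) pairs."""
    hits = [(ieta, e) for ieta, e in crystals if e > 0]
    e_total = sum(e for _, e in hits)
    weighted = [(ieta, max(0.0, w0 + math.log(e / e_total))) for ieta, e in hits]
    w_sum = sum(w for _, w in weighted)
    if w_sum == 0:
        return 0.0
    mean = sum(ieta * w for ieta, w in weighted) / w_sum
    variance = sum(w * (ieta - mean) ** 2 for ieta, w in weighted) / w_sum
    return math.sqrt(variance)

narrow = [(0, 50.0), (-1, 1.0), (1, 1.0)]                        # electron-like toy
wide = [(0, 20.0), (-1, 12.0), (1, 12.0), (-2, 4.0), (2, 4.0)]   # jet-like toy
print(sigma_ieta_ieta(narrow), sigma_ieta_ieta(wide))
```

The logarithmic weight suppresses crystals carrying a tiny fraction of the energy, so noise far from the seed does not inflate the width.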
[Figure: CMS Simulation (13 TeV). Distribution in the ∆η vs ∆φ [rad] plane (color scale in arbitrary units), for 1 < ET(seed) < 10 GeV and 1.48 < η(seed) < 1.75.]
How to ID an electron

Essentially three categories of identifiers

2. Tracking and cluster-track matching (examples)

o Missing hits → maximum number of allowed missing hits along a track

o 𝛘𝟐 → track fit quality

o 1/E – 1/p → compares energy of the supercluster and momentum measured in the tracker

o 𝒇𝐛𝐫𝐞𝐦 → fraction of momentum lost between the point of closest approach to the vertex
and the extrapolation to the surface of the ECAL

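o 𝒔² (HLT) → measure of the compatibility of ECAL SC and hits in the first two layers of the Pixel detector:

$$s^2 = \left(\frac{\Delta\phi_1}{a_{\phi_1}}\right)^2 + \left(\frac{\Delta\phi_2}{a_{\phi_2}}\right)^2 + \left(\frac{\Delta z}{a_z}\right)^2$$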
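The matching variables in this list reduce to a few lines of arithmetic. The sketch below is an illustration only: function and argument names are invented, and the numbers in the example are not real tuning parameters.

```python
def one_over_e_minus_one_over_p(e_supercluster, p_track):
    """1/E - 1/p, comparing supercluster energy and track momentum (GeV^-1)."""
    return 1.0 / e_supercluster - 1.0 / p_track

def f_brem(p_in, p_out):
    """Fraction of momentum lost between the innermost track state (p_in)
    and the extrapolation to the ECAL surface (p_out)."""
    return (p_in - p_out) / p_in

def pixel_match_s2(dphi1, dphi2, dz, a_phi1, a_phi2, a_z):
    """s^2: sum of squared residuals, each divided by its scale parameter a."""
    return (dphi1 / a_phi1) ** 2 + (dphi2 / a_phi2) ** 2 + (dz / a_z) ** 2

print(f_brem(p_in=40.0, p_out=30.0))
print(pixel_match_s2(0.01, 0.02, 0.1, a_phi1=0.01, a_phi2=0.01, a_z=0.1))
```

Dividing each residual by its own scale makes the three terms comparable, so a single threshold on 𝑠² can replace three separate cuts.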
How to ID an electron

Essentially three categories of identifiers

3. Isolation

o H/E → energy deposited in HCAL / energy deposited in ECAL A. Kapoor

o Tracker + PF Cluster → isolation (sometimes relative) based on tracks, ECAL and HCAL PF clusters. Typically computed in a cone of ∆𝑅 = 0.3 (hollow for the tracker isolation)

o PF-based isolation → isolation from charged hadrons, neutral hadrons and photons. Note that this is correlated with H/E

Isolation variables are particularly sensitive to pileup, so they need to be corrected (e.g., by subtracting 𝜌·𝐴_eff).
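One widely used form of the 𝜌·𝐴_eff correction subtracts the pileup estimate from the neutral components only and floors the result at zero. The slide only mentions the subtraction itself, so treat this form, the function names and the numbers as assumptions made for the example.

```python
def rho_corrected_isolation(charged, neutral, photon, rho, effective_area):
    """PF isolation sum (GeV) with a rho * A_eff pileup subtraction applied
    to the neutral part, which cannot be associated with a vertex."""
    return charged + max(0.0, neutral + photon - rho * effective_area)

def relative_isolation(charged, neutral, photon, rho, effective_area, pt):
    """Corrected isolation divided by the electron pT."""
    return rho_corrected_isolation(charged, neutral, photon, rho, effective_area) / pt

# Toy electron: sums in GeV, rho in GeV per unit area, A_eff is hypothetical
print(rho_corrected_isolation(1.0, 2.0, 3.0, rho=20.0, effective_area=0.1))
print(relative_isolation(1.0, 2.0, 3.0, rho=20.0, effective_area=0.1, pt=40.0))
```

The floor at zero prevents an over-subtraction from producing a negative neutral contribution.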

And now…?
What an electron should look like in the detector is clearly process-dependent

Extreme example: isolated electron vs electron in a jet
How to ID an electron

The variables discussed are the most used to identify standard (?) electrons with 𝟏𝟎 < 𝒑𝐓 < 𝟐𝟎𝟎 𝐆𝐞𝐕

IDs are crafted by manipulating them in terms of sequential cuts or using multivariate/ML techniques

The performance of the IDs will depend on the variables used, on the training samples and
on the application samples!

Typically applied to pat::Electron objects, which are essentially ECAL clusters associated with a track with high (~98-99%) efficiency

Low-𝑝T (< 10 GeV) and high-𝑝T (> 200 GeV) electrons often require a different level of creativity
→ dedicated talks in today’s session
EGM electron IDs

Generic, standard model-oriented IDs

Not designed for being the most cutting-edge IDs for everyone’s signal!

→ provide a good ID in most of the cases, general stability vs PU, and a few working points to choose in between.

Cut-based IDs

Easy to “play with”: analyzers can reproduce them without a quantity they aren’t interested in or add cuts on top

EGM provides four working points: veto, loose, medium, tight

[Figure (example): CMS Simulation (13 TeV), 2017. Background efficiency vs signal efficiency for the BDT-based ID and the cut-based ID, in barrel and endcap.]
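A sketch of why sequential cuts are easy to “play with”: the ID is a list of upper thresholds, and dropping one variable is a single argument. The thresholds below are hypothetical placeholders, not an official EGM working point, and the variable names are invented.

```python
# Hypothetical thresholds for illustration only (NOT an EGM working point)
HYPOTHETICAL_WORKING_POINT = {
    "sigma_ieta_ieta": 0.0110,
    "h_over_e": 0.05,
    "abs_one_over_e_minus_one_over_p": 0.10,
    "missing_hits": 1,
    "rel_iso": 0.10,
}

def passes_cut_based_id(electron, cuts=HYPOTHETICAL_WORKING_POINT, skip=()):
    """True if every variable is at or below its threshold.
    Variables listed in `skip` are not checked."""
    return all(electron[name] <= maximum
               for name, maximum in cuts.items() if name not in skip)

electron = {
    "sigma_ieta_ieta": 0.0095,
    "h_over_e": 0.02,
    "abs_one_over_e_minus_one_over_p": 0.01,
    "missing_hits": 0,
    "rel_iso": 0.25,   # fails only the isolation requirement
}
print(passes_cut_based_id(electron))                     # full ID
print(passes_cut_based_id(electron, skip=("rel_iso",)))  # ID without isolation
```

Adding a cut on top works the same way: pass a copy of the dictionary with one more entry.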
EGM electron IDs

Generic, standard model-oriented IDs

Not designed for being the most cutting-edge IDs for everyone’s signal!

→ provide a good ID in most of the cases, general stability vs PU, and a few working points to choose in between.

MVA (BDT) IDs


• Large number of input variables. Trained on MC DY+Jets → prompt electrons as signal, unmatched and non-prompt as background.

• Generally better performance than cut-based ones (but this is a signal-dependent statement…). Tuned for electrons with 𝒑𝐓 > 𝟏𝟎 𝐆𝐞𝐕.

• EGM provides two working points: 80 and 90% signal efficiency, but one can also choose their own threshold on the discriminant.

• Two versions: iso and non-iso, for more flexibility (many analyses prefer to use the non-iso version and apply isolation criteria on top). Note that H/E is an input variable for both!
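Choosing your own threshold on the discriminant amounts to taking a quantile of the score distribution of true electrons. The sketch below shows the idea on toy scores; the function names and the uniform toy distribution are assumptions, and real working points are usually derived in bins of 𝑝T and 𝜂.

```python
import math

def threshold_for_signal_efficiency(signal_scores, target_efficiency):
    """Smallest score such that at least `target_efficiency` of the
    signal electrons satisfy score >= threshold."""
    if not signal_scores:
        raise ValueError("need at least one signal score")
    ordered = sorted(signal_scores, reverse=True)
    n_keep = math.ceil(target_efficiency * len(ordered))
    n_keep = min(max(n_keep, 1), len(ordered))
    return ordered[n_keep - 1]

def efficiency(scores, threshold):
    """Fraction of scores at or above the threshold."""
    return sum(score >= threshold for score in scores) / len(scores)

toy_scores = [i / 100.0 for i in range(1, 101)]   # stand-in for MC signal scores
wp80 = threshold_for_signal_efficiency(toy_scores, 0.80)
wp90 = threshold_for_signal_efficiency(toy_scores, 0.90)
print(wp80, efficiency(toy_scores, wp80))
print(wp90, efficiency(toy_scores, wp90))
```

The 90% working point always has the lower threshold: it keeps more signal at the price of more background.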
Other electron IDs

Analyzers or large analysis groups often develop their own IDs, targeting better performance for a given signal or topology

HZZ MVA

• Optimized for low-𝑝T (~5-15 GeV), isolated electrons for low-mass H searches
• Specifically devised for very high efficiency

“ttH” MVA ID (AN-2022/016)

• Aims at identifying prompt vs non-prompt electrons
• Particularly focused on processes with top quarks
• Uses XGBoost, like the EGM MVA IDs, so differences are essentially in training samples, variables and training parameters

Ana Sculac
Other electron IDs

Analyzers or large analysis groups often develop their own IDs, targeting better performance for a given signal or topology

Other proposals presented to EGM include:

• “ParticleNet lepton”: DGCNN to separate prompt, non-prompt, and fake electrons

• “Merged electron ID” for high-mass 4-lepton resonance in boosted regime

• Cut-based ID mixing detector-based and PF-based isolation

Few cases of “general purpose, high performance” IDs. Several cases of topology-oriented IDs.
Zoology of electron triggers

Creating their own electron ID is something several people go for. Creating their own trigger is somewhat more complex
(or perhaps just less known or less necessary…)

Single, tight-ID, isolated electron
HLT_Ele30_WPTight_Gsf

Double electron with “medium” ID/isolation
HLT_Ele23_Ele12_CaloIdL_TrackIdL_IsoVL

Single, high-𝑝T, non-isolated electron
HLT_Ele115_CaloIdVT_GsfTrkIdT

Double electron with tight pixel-matching but no full-track reconstruction
HLT_DoubleEle33_CaloIdL_MW

Double electron with neither pixel-matching nor tracking
HLT_DiEle27_WPTightCaloOnly_L1DoubleEG

Electron scouting
DST_Run3_*_PFScoutingPixelTracking
Electron IDs at HLT

Select only a subset of the possible L1 seeds to start the reconstruction:

L1_SingleLooseIsoEG26er2p5 OR L1_SingleLooseIsoEG26er1p5 OR L1_SingleLooseIsoEG28er2p5 OR


L1_SingleLooseIsoEG28er2p1 OR L1_SingleLooseIsoEG28er1p5 OR L1_SingleLooseIsoEG30er2p5 OR
L1_SingleLooseIsoEG30er1p5 OR L1_SingleEG26er2p5 OR L1_SingleEG38er2p5 OR
L1_SingleEG40er2p5 OR L1_SingleEG42er2p5 OR L1_SingleEG45er2p5 OR L1_SingleEG60 OR
L1_SingleEG34er2p5 OR L1_SingleEG36er2p5 OR L1_SingleIsoEG24er2p1 OR L1_SingleIsoEG26er2p1
OR L1_SingleIsoEG28er2p1 OR L1_SingleIsoEG30er2p1 OR L1_SingleIsoEG32er2p1 OR
L1_SingleIsoEG26er2p5 OR L1_SingleIsoEG28er2p5 OR L1_SingleIsoEG30er2p5 OR
L1_SingleIsoEG32er2p5 OR L1_SingleIsoEG34er2p5

“EG” in the seed names → at L1, no distinction between electrons and photons!
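The seed requirement is a logical OR: the HLT path starts if any of the listed L1 seeds fired. A minimal sketch, with an invented helper and a shortened expression built from three of the seed names above:

```python
import re

def l1_seed_fired(seed_expression, fired_seeds):
    """True if at least one seed of an 'A OR B OR C' expression is in
    the set of L1 seeds that fired in the event."""
    seeds = [s for s in re.split(r"\s+OR\s+", seed_expression.strip()) if s]
    return any(seed in fired_seeds for seed in seeds)

# Shortened version of the seed list, for illustration
EXPRESSION = """L1_SingleLooseIsoEG26er2p5 OR L1_SingleIsoEG30er2p1 OR
L1_SingleEG60"""

print(l1_seed_fired(EXPRESSION, {"L1_SingleEG60"}))      # one listed seed fired
print(l1_seed_fired(EXPRESSION, {"L1_SomeOtherSeed"}))   # hypothetical unrelated seed
```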


Electron IDs at HLT

Build the Particle Flow clusters and the ECAL superclusters


Electron IDs at HLT

Apply transverse momentum threshold (in this case of 32 GeV)


Electron IDs at HLT

Apply cutoffs on quantities related to the shape of the supercluster (in this case, 𝜎𝑖𝜂𝑖𝜂)
Electron IDs at HLT

Similar to the ECAL case, reconstruct HCAL-related quantities

Threshold on H/E

Isolation based on ECAL and HCAL PF clusters


Electron IDs at HLT

ECAL-seeded pixel matching: search for the compatibility of a supercluster with a track (doublet or triplet in the Pixel detector)
Electron IDs at HLT

Apply quality requirements on the pixel track (𝑠²).

Reconstruct the whole GSF track.

Cutoffs on:

• 1/E – 1/p
• number of allowed missing hits in the tracker layers
• difference in 𝜂 and 𝜙 between the supercluster and the track
positions extrapolated to the beamspot

Apply tracker-based isolation.

The end
What not to forget

The purpose of triggering is to reduce the number of events to be saved, processed and stored

§ When a new trigger is devised, we always have a certain particle or process in mind. Increasing the purity is a good option for lowering the rate. But…

§ …we also want to be somewhat generic in defining ID criteria, so that we can use the acquired data for multiple analyses, searches, studies. An electron trigger could potentially be exploited to save events containing objects that behave like electrons.
What not to forget, also

§ Need to keep low HLT reconstruction time → HLT algorithms often simplified wrt offline ones. Notable examples for
electrons:

o Absence of supercluster refinement procedure (i.e., tracker information will not help adding or removing PF
clusters to the ECAL mustache supercluster)

o Electron objects can only be ECAL-seeded and not tracker-seeded

§ Order matters! Simpler algorithms run first, so that more complex ones can run on fewer events or fewer objects.
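The ordering argument can be sketched as a filter sequence with early exit: a candidate rejected by a cheap filter never reaches the expensive ones. Only the 32 GeV threshold comes from the slides; the other thresholds, the filter names and the candidate format are hypothetical.

```python
def run_sequence(candidate, filters):
    """Run filters in order and stop at the first failure.
    Returns (passed, name_of_failing_filter, number_of_filters_evaluated)."""
    for n_evaluated, (name, passes) in enumerate(filters, start=1):
        if not passes(candidate):
            return False, name, n_evaluated
    return True, None, len(filters)

# Cheapest first, most expensive (tracking) last; thresholds are placeholders
FILTERS = [
    ("et_threshold", lambda c: c["et"] > 32.0),
    ("sigma_ieta_ieta", lambda c: c["sigma_ieta_ieta"] < 0.011),
    ("h_over_e", lambda c: c["h_over_e"] < 0.05),
    ("gsf_tracking", lambda c: c["missing_hits"] <= 1),
]

soft = {"et": 20.0}   # later quantities never need to be computed
good = {"et": 45.0, "sigma_ieta_ieta": 0.009, "h_over_e": 0.01, "missing_hits": 0}
print(run_sequence(soft, FILTERS))
print(run_sequence(good, FILTERS))
```

The soft candidate is rejected after a single evaluation, so the tracking-related quantities are never needed for it.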

Electron efficiency measurement
Reconstruction, trigger and ID efficiencies for given process → number of events observed

Plus, efficiencies as modeled in MC simulation are never 1:1 with those of collision data → scale factors
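A scale factor is the ratio of the efficiency in data to the efficiency in simulation. The sketch below propagates the two uncertainties assuming they are uncorrelated, which is an assumption of this example; the function name and the numbers are invented.

```python
import math

def scale_factor(eff_data, err_data, eff_mc, err_mc):
    """Data/MC efficiency ratio and its uncertainty, assuming the two
    efficiency uncertainties are uncorrelated."""
    sf = eff_data / eff_mc
    relative_error = math.hypot(err_data / eff_data, err_mc / eff_mc)
    return sf, sf * relative_error

sf, sf_err = scale_factor(eff_data=0.90, err_data=0.009, eff_mc=0.95, err_mc=0.0095)
print(f"SF = {sf:.3f} +/- {sf_err:.3f}")
```

Simulated events are then weighted by the scale factor of each selected electron, typically looked up in bins of 𝑝T and 𝜂.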

Both offline and at HLT

o Tag & Probe (EGM T&P)
➢ The most used, both offline and at HLT.

Relevant at HLT

o Orthogonal dataset (tutorial)
➢ Useful, for example, when a measurement in a particular phase space is preferred for analysis-related reasons.

o Reference trigger (tutorial)
➢ Useful when dealing with cross triggers (e.g., a trigger based on the presence of leptons and jets in the same event)
Tag & Probe – recipe

Take a dataset that is rich in your favorite resonance and its decay into two electrons.

For the momentum range considered in this talk, this would be a Z boson.
Tag & Probe – recipe

Select a particle (TAG) by applying tight electron ID criteria: you want to be as sure as you can that this is really an electron from Z decay.
Tag & Probe – recipe

Select a second object (PROBE) within a reasonable invariant mass window around the Z peak and with opposite charge sign wrt the tag.
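The probe requirement can be sketched as follows, treating the electrons as massless. The 60-120 GeV window, the dictionary layout and the function names are assumptions made for this example.

```python
import math

def invariant_mass(pt1, eta1, phi1, pt2, eta2, phi2):
    """Invariant mass (GeV) of two massless particles from pT, eta, phi."""
    m2 = 2.0 * pt1 * pt2 * (math.cosh(eta1 - eta2) - math.cos(phi1 - phi2))
    return math.sqrt(max(m2, 0.0))

def is_valid_probe(tag, probe, mass_window=(60.0, 120.0)):
    """Opposite charge wrt the tag and pair mass inside the window."""
    mass = invariant_mass(tag["pt"], tag["eta"], tag["phi"],
                          probe["pt"], probe["eta"], probe["phi"])
    opposite_charge = tag["charge"] * probe["charge"] < 0
    return opposite_charge and mass_window[0] < mass < mass_window[1]

tag = {"pt": 45.0, "eta": 0.0, "phi": 0.0, "charge": -1}
probe = {"pt": 45.0, "eta": 0.0, "phi": math.pi, "charge": +1}
same_sign = dict(probe, charge=-1)
print(invariant_mass(45.0, 0.0, 0.0, 45.0, 0.0, math.pi))
print(is_valid_probe(tag, probe), is_valid_probe(tag, same_sign))
```

No ID requirement is applied to the probe at this stage, otherwise the measured efficiency would be biased.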
Tag & Probe – recipe

Use the collection of probes to test the efficiency of the desired requirement: each probe is either a passing probe or a failing probe.
Tag & Probe – recipe

Efficiencies will be the ratio

$$\varepsilon = \frac{\text{Passing probes}}{\text{Passing probes} + \text{Failing probes}}$$

[Figure: CMS Preliminary, 2022, 34.3 fb⁻¹ (13.6 TeV). Efficiency in three bins: 0.00 < |η| < 1.44, 1.57 < |η| < 2.00, 2.00 < |η| < 2.50.]
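In cut & count form the ratio is a two-line computation. The uncertainty below uses the simple binomial normal approximation, which is an assumption of this sketch and breaks down for efficiencies close to 0 or 1; the probe counts are invented.

```python
import math

def tnp_efficiency(n_pass, n_fail):
    """Efficiency = passing / (passing + failing), with a binomial
    normal-approximation uncertainty."""
    n_total = n_pass + n_fail
    if n_total == 0:
        raise ValueError("no probes in this bin")
    eff = n_pass / n_total
    err = math.sqrt(eff * (1.0 - eff) / n_total)
    return eff, err

eff, err = tnp_efficiency(n_pass=900, n_fail=100)
print(f"efficiency = {eff:.3f} +/- {err:.3f}")
```

In practice the computation is repeated per bin, for example in 𝑝T and |𝜂|.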
More on the Tag & Probe
Why Tag & Probe?

• Mainly: it can be applied to datasets which are full of true electrons and provides a good way to distinguish them,
namely the matching with the resonance (matching with MC truth is only possible in simulation) → typically small
statistical uncertainties.

• If measurement is performed on inclusive dataset (not just on special phase space), results can be reasonably
applicable to subsets of phase space too – with some uncertainties, of course.

Fit-based TnP with background subtraction

Fit-based TnP generally not necessary for HLT efficiencies

→ measurement performed with respect to ID-ed offline electrons

→ almost no background

→ cut & count is enough
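To see when cut & count stops being enough, the sketch below compares it with a simple sideband subtraction. This is a stand-in for the fit-based method, not the fit itself: it assumes a flat background under the peak, and all counts and window widths are invented.

```python
def sideband_subtracted_yield(n_peak, n_sidebands, peak_width, sideband_width):
    """Signal yield in the peak window after subtracting a flat background
    whose density is estimated from the sidebands (widths in GeV)."""
    background_per_gev = n_sidebands / sideband_width
    return n_peak - background_per_gev * peak_width

def efficiency_cut_and_count(n_pass, n_fail):
    return n_pass / (n_pass + n_fail)

def efficiency_background_subtracted(pass_counts, fail_counts,
                                     peak_width, sideband_width):
    """`pass_counts` and `fail_counts` are (n_peak, n_sidebands) tuples."""
    sig_pass = sideband_subtracted_yield(*pass_counts, peak_width, sideband_width)
    sig_fail = sideband_subtracted_yield(*fail_counts, peak_width, sideband_width)
    return sig_pass / (sig_pass + sig_fail)

# Toy counts: the failing probes contain much more background
pass_counts = (9050, 100)
fail_counts = (1400, 800)
print(efficiency_cut_and_count(pass_counts[0], fail_counts[0]))
print(efficiency_background_subtracted(pass_counts, fail_counts, 20.0, 40.0))
```

With background mostly in the failing probes, cut & count underestimates the efficiency. When the probes are already ID-ed offline electrons, the sidebands are almost empty and the two numbers coincide.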


Summary

• Identifying electrons in the 10 < 𝑝T < 200 GeV range is typically a matter of playing with properties of

§ ECAL clusters
§ Tracks and track-cluster matching
§ Isolation

• BUT how and which ones to use is strongly dependent on what an electron is considered to be in analyses

• Cut-based IDs are flexible and rather straightforward. One can get very creative with multivariate/ML techniques

• The identification at HLT makes use of similar, but simplified algorithms with respect to offline

§ Necessity to be generic, but also to control rate and timing

• A few common ways to measure efficiency, offline and at HLT. Tag & Probe is still the most common for electrons
BACKUP
ttH MVA – input variables

Analyzers or large analysis groups often develop their own IDs, targeting better performance for a given signal or topology

“ttH” MVA ID AN-2022/016

• Aims at identifying prompt vs non-prompt electrons

• Particularly focused on processes with top quarks

• Uses XGBoost, like the EGM MVA IDs, so differences are essentially in training samples, variables and training parameters
