
CMSPOS 2019 E/gamma long exercise

Aachen
Exercise 5: MVA training

Sam Harper, Swagata Mukherjee


Task: draw a decision boundary

[Figure: scatter plots of signal (S) and background (B) events, separated by a decision boundary]

How to find the best decision boundary?


Decision Tree
• Task: divide events into signal and background.
• Need Monte Carlo samples of each. Divide each Monte Carlo sample into two parts: a training sample and a test sample.
• Decide on some ID variables useful for distinguishing between signal and background.
• For each ID variable, order the events by the value of the variable.
• Pick the 1st variable, pick a cut value, and see what happens if the training sample is split into two parts depending on that cut on that variable. (Repeat for other cut values.)
• Pick the cut which gives the best separation: one side having mostly signal and the other mostly background.
• Then repeat this for each variable.

[Diagram: a decision tree with a root node splitting into branch nodes and non-overlapping leaves]
Decision Tree
Define purity as P = S / (S + B), where S (B) is the number of signal (background) events on a branch.

Note that P * (1 - P) is 0 if the sample is pure signal or pure background.

For a given branch, construct Gini = n * P * (1 - P), where n is the number of events on that branch.

Pick the split that minimizes (Gini_left + Gini_right).

Boosting

• If an event is misclassified, i.e., a signal event lands on a background leaf or a background event lands on a signal leaf, then the weight of that event is increased (boosted).
• A second tree is built using the new weights.
• Again the misclassified events have their weights boosted, and the procedure is repeated.
• Typically one may build a few thousand trees this way, giving a forest of boosted decision trees. (Note this is not the same as a Random Forest, which is built by bagging rather than boosting.)
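The weight-boosting step can be sketched as follows. This is an AdaBoost-style update; the function name and the choice to renormalise are illustrative assumptions:

```python
import numpy as np

def boost_weights(weights, misclassified, alpha):
    """Multiply the weights of misclassified events by e^alpha (the 'boost'),
    leave correctly classified events unchanged, then renormalise so the
    weights again sum to 1. The next tree is trained with these weights."""
    w = np.asarray(weights, dtype=float) * np.where(misclassified,
                                                    np.exp(alpha), 1.0)
    return w / w.sum()
```

Each round, alpha is chosen from the error rate of the current tree, so badly misclassified samples get progressively more attention.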
Different types of Boosting

• Gradient Boosting
• XGBoost: eXtreme Gradient Boosting
• The XGBoost algorithm has recently been dominating applied machine learning. Why?
  • Parallel computing: when you run XGBoost, by default it uses all the cores of your machine.
  • Regularization: a technique used to avoid overfitting.
  • Flexibility: supports user-defined evaluation metrics.
  • Availability: currently available for programming languages such as R, Python, Java, etc.
• We will use it to train the electron MVA ID.
The next slides are technical and will be explained as we take you through the tutorial.

Any questions?
Start the exercise by following the instructions here:
https://github.com/guitargeek/ElectronMVATutorial

The first step is running the ntuplizer. Use slc7:


ssh -XY your_username@lxplus7.cern.ch
export SCRAM_ARCH=slc7_amd64_gcc700

Take the code from git and remember to run

scram b
voms-proxy-init -voms cms

Use a Run 3 ROOT file (choose a reasonable maxEvents):

/store/mc/Run3Summer19MiniAOD/DYJets_incl_MLL-50_TuneCP5_14TeV-madgraphMLM-pythia8/MINIAODSIM/2023Scenario_106X_mcRun3_2023_realistic_v3-v1/270000/222889F5-1E13-F34C-B312-B9B102119CBB.root

Once you have run the ntuplizing step, open the ROOT tree and have a look at what’s in there.

Find variable definitions/explanations in:


ElectronIdentification/data/ElectronIDVariables.txt

Now, open the SWAN notebook


Notebook
• Select a cell and then press Shift+Return to run it.
• In[*] means the cell is running; In[some_number] means the run is complete.
• You can insert a new cell from the Insert menu.
• Insert your own username where asked.
• Follow the in-line instructions.

At the end, check the variable importance.
More things to try
• Change hyper-parameters and see if performance improves; for example, you can change the learning rate.
• Consult this: https://xgboost.readthedocs.io/en/latest/parameter.html
