b-tagging activities
Aug 9, 2007
Meenakshi Narain, Brown University
(co-conveners of LPC btag: Gerber & Narain)
July 2007 Workshop @LPC: Goals and Format
Goal: general review of b tagging and vertexing; strategies and plans for how to measure performance with real data.
Format: presentations in the morning, with the afternoon for discussions & decisions.
Topics:
  Monday: vertexing and b tagging
  Tuesday: how to measure efficiency and mistags from data
  Wednesday: 1) how to use the measurements from data in our physics analyses and 2) effect of detector issues on the performance of b tagging
  Thursday: trigger and wrap-up
Documentation and Infos
b tag & vertex algorithm task lists & contacts on the twiki page: https://twiki.cern.ch/twiki/bin/view/cms/btagpog
LPC btag workshop page, with a comprehensive summary of the various activities: http://indico.cern.ch/conferencedisplay.py?confid=15416
Btagging/Vertexing Algos
Many algorithms exist and are implemented in CMSSW.
Performance is being optimized.
Validation suites are being developed.
Vertex Reconstruction (Thomas Speer, 9th July 2007)
Vertex finding: identification of vertices and assignment of tracks to vertices, with a possible estimate of the vertex position
  Offline primary vertex reconstruction
  Vertex finding in jets
Vertex fitting: most precise estimate of the vertex position and of the track parameters at the vertex, from a set of tracks
Vertices and b-tagging:
  Primary vertex: determines the origin of the jet; fragmentation tracks originate from the PV
  Impact parameters, flight distances, etc., are defined relative to the PV
  Secondary vertex reconstruction
Outline:
  Description of the algorithms
  Description of the VertexReco framework
  To-do list!
In the beginning were the tracks...
Persistent track: reco::Track in DataFormats/TrackReco
States stored:
  Initial state:
    For the primary tracks: 2D-PCA to the beamline
    For the other tracks: where it makes the most sense (e.g., for vertex-constrained tracks, at the vertex)
  On the first and last measurement layers
  For all states: (x, p) + curvilinear error (21 floats)
Not suitable for most higher-level algorithms (e.g. vertexing, b/τ-tagging): no access to the magnetic field (no propagation!)
  Use Tracks through TransientTrack
TransientTrack
Transient track: reco::TransientTrack (in TrackingTools/TransientTrack)
  https://twiki.cern.ch/twiki/bin/view/cms/swguidetransienttracks
Gives access to different states, etc.
  http://cmsdoc.cern.ch/releases/cmssw/latest_nightly/doc/html/dd/dc7/classreco_1_1transienttrack.html
New: state at PCA to an arbitrary BeamLine, taking into account tilt (e.g. for TIP w.r.t. to be helix-line PCA)
Has access to the magnetic field
ReferenceCounted (à la TSOS)
Different concrete classes with the same interface: TrackTransientTrack, GsfTransientTrack, TransientTrackFromFTS
In your application, build TransientTracks through the TransientTrackBuilder:
  // get the builder from the EventSetup:
  edm::ESHandle<TransientTrackBuilder> theB;
  iSetup.get<TransientTrackRecord>().get("TransientTrackBuilder", theB);
  // do the conversion:
  std::vector<reco::TransientTrack> t_tks = (*theB).build(trackCollection);
Algorithms
Vertex fitters: https://twiki.cern.ch/twiki/bin/view/cms/swguidevertexfitting
  Kalman Filter
  Adaptive Vertex Fitter
  TrimmedKalmanVertexFitter
  Gaussian-Sum Filter
  and others, not ported to CMSSW: Least Trimmed Squares, Least Median of Squares, Minimum Volume Ellipsoid, Minimum Covariance Determinant, M-estimator...
Vertex finders: https://twiki.cern.ch/twiki/bin/view/cms/swguideofflinesecondaryvertexfinding
  TrimmedKalmanVertexFinder
  AdaptiveVertexReconstructor
  MultiVertexFit
  TertiaryTracksVertexFinder
B-tag introduction
Different b-tagging algorithms may have different features in terms of:
  performance (efficiency vs. mistagging rate)
  robustness against detector effects (e.g. misalignment, tracker inefficiencies)
  possibility to measure their efficiency on data
  need for MC calibration, data-only calibration, or no calibration
So, different analyses may want to use different algorithms.
Structure of b-tagging
CMS b-tagging is now organized as a two-phase process:
  first, some tag info (tagging variables) is computed for jets/tracks/vertices/leptons
  then the computed information is used to compute the discriminators (floats) that can be used in the analysis
RECO inputs: Tracks; Jets (Calo, PF, Gen); JetTracksAssociation; Primary Vertex; Muon data; ECAL data
TagInfos:
  ImpactParameter: IP 3D and 2D, dlen, jetdist, track prob
  CombinedSV: secondary vtx multiplicity, mass, flight dist, ...
  SoftLepton (x2): lepton ID, Ptrel, lepton IP, energy fraction, ...
Discriminators (produced with a pluggable framework): TkCnt HighEff, TkCnt HighPur, JetProb, MVA IP, Comb SV, MVA SV, Soft ele, Soft mu, Soft mu noip, New1, New2, New3
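As an illustration of the two-phase structure, the sketch below separates a tag-info object from a discriminator computed from it. It assumes the usual track-counting definition, in which the discriminator is the impact parameter significance of the N-th track when tracks are ordered by decreasing significance. The names `IPTagInfo` and `trackCountingDiscriminator`, and the default value returned for jets with too few tracks, are hypothetical and are not the CMSSW classes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Phase 1 output (hypothetical): per-jet tagging variables.
struct IPTagInfo {
    std::vector<double> ipSignificance;  // signed IP significance of each selected track
};

// Phase 2 (hypothetical): turn the tag info into a single float.
// Returns the n-th largest IP significance (n = 1 is the largest),
// or `noTag` when the jet has fewer than n tracks.
double trackCountingDiscriminator(const IPTagInfo& info, std::size_t n,
                                  double noTag = -100.0) {
    assert(n >= 1);
    if (info.ipSignificance.size() < n) return noTag;
    std::vector<double> s = info.ipSignificance;  // work on a copy
    auto nth = s.begin() + static_cast<std::ptrdiff_t>(n - 1);
    std::nth_element(s.begin(), nth, s.end(), std::greater<double>());
    return *nth;
}
```

With this split, several discriminators (for example n = 2 and n = 3) can be computed from the same tag info without redoing the track selection.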
Lifetime-based algorithms
Algorithms in CMS exploiting lifetime:
  Combined Secondary Vertex
  Track Counting
  Jet Probability
Pixel detector needed for all lifetime algorithms:
  pixel resolution ~50 µm
  SiStrip-only resolution ~mm
Track quality selection is also applied to reject tracks with badly measured impact parameters.
Combined SV algorithm
In CMS a combined algorithm based on the SV is available:
  Define 3 vertex categories: reco vertex, pseudo vertex, no vertex
  Compute in each case some vertex/jet properties, such as:
    track multiplicity
    invariant mass
    decay length (in the transverse plane)
    track rapidities (w.r.t. the jet direction)
    fraction of energy of the SV
    IP of the first track above the charm mass threshold
  A likelihood function is used to combine the above information
CombinedSV variables
[Figure: CombinedSV input variables and the final discriminator.]
Soft lepton tagging
A b-hadron can decay producing one or more leptons in three ways:
  direct decay, b -> l- (BR 10%)
  via charm, b -> c -> l+ (BR 8%)
  via anti-charm, b -> cbar -> l- (BR 1.6%)
The main backgrounds for this algorithm are light mesons decaying to leptons, photon conversions, and wrong lepton ID.
The Pt_rel and the IP of the lepton are used to increase the discriminating power.
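For reference, Pt_rel is the component of the lepton momentum transverse to the jet axis. A minimal sketch, assuming plain Cartesian momentum vectors (the type `Vec3` and the function `ptRel` are illustrative names, and whether the lepton momentum is included in the jet axis is left to the caller):

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec3 = std::array<double, 3>;  // (px, py, pz) in GeV

// pt_rel = |p_lepton x p_jet| / |p_jet|
double ptRel(const Vec3& lep, const Vec3& jet) {
    const double cx = lep[1] * jet[2] - lep[2] * jet[1];
    const double cy = lep[2] * jet[0] - lep[0] * jet[2];
    const double cz = lep[0] * jet[1] - lep[1] * jet[0];
    const double jetMag =
        std::sqrt(jet[0] * jet[0] + jet[1] * jet[1] + jet[2] * jet[2]);
    assert(jetMag > 0.0);  // the jet axis must be defined
    return std::sqrt(cx * cx + cy * cy + cz * cz) / jetMag;
}
```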
Performances
Tools exist in RecoBTag/Analysis to study algorithm performance in a standard way.
The performance of the algorithms in CMSSW is almost at the level of the PTDR.
Training/calibration still needed to get optimal performance.
MVA very preliminary but promising.
[Figure: performance curves. ORCA/PTDR panel: Track Counting High Eff, Jet Probability, Combined SV. CMSSW panel: Track Counting, CombinedSV, Jet Probability, MVA IP, MVA SV.]
Vertexing/btagging US tasks
Improve the analysis / validation suite (F. Yumiceva, V. Bazterra, C. Kopecky, L. Christofek, Puerto Rico (E. Ramirez et al.)).
Provide an ultra-combined (MVA-based) b tag (L. Christofek, in collaboration with C. Saout).
Make use of the default Muon ID in the b→µ tag, to improve performance at low Pt (Ping Tan).
Check if the Track (HitPattern) RECO/AOD object contains all the info we need for b tagging (Z. Wan).
Investigate the use of track jets and DR association with CaloJets (C. Gerber).
b tag performance w/ data
Use of b→µ to measure b efficiency (Ping/Gerber & Francisco/Narain/Bloch)
Use of negative tags to measure uds efficiency (L. Christofek, Jeremy & Daniel)
Use of t-tbar to measure b and c efficiency (Kukartsev, Narain, Speer, Joris & Steven)
Methods for Performance Studies using data
b-tag efficiency from ttbar events (Santa Barbara, Bruxelles): use a b-enriched sample of semileptonic ttbar events to estimate the b-tag efficiency.
System8 method (FNAL, Brown, Strasbourg): use muon+jet events and two ~uncorrelated taggers to measure the b-tagging efficiency.
Pt-rel method (UIC, FNAL): measure the tagging efficiency of lifetime-based taggers using the pt-rel distribution in muon+jet events.
Light-quark mistagging rate from data (Strasbourg): use the negative impact parameter significance distribution in data to estimate the light-quark mistag rate.
System 8 Method using µ+jet events (D. Bloch, M. Narain, F. Yumiceva)
Method requires events with 2 jets, one with a muon of Pt > 6 GeV.
Make 8 measurements:
  µ+jets
  µ+jets tagged with lifetime
  µ+jets tagged with pt(rel)
  µ+jets tagged with both
  Repeat the four above requiring the away jet to be tagged by lifetime.
Then solve for the unknowns!
[Figure: measured and true efficiencies; panels: b→µ pt_rel efficiency and b lifetime-tag efficiency.]
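One way to write the eight measurements as equations is sketched below. The notation is an assumption, not taken from the slides: n is the µ+jets sample and p the sample with the away jet lifetime-tagged; the subscripts b and cl denote b and non-b (c + light) jets; LT and MT denote the lifetime and pt_rel (muon) tags. The sketch also assumes that the two taggers are fully uncorrelated and that the efficiencies are the same in both samples; in practice, correction factors for residual correlations would multiply the efficiency terms.

```latex
\begin{aligned}
n          &= n_b + n_{cl} \\
p          &= p_b + p_{cl} \\
n^{LT}     &= \varepsilon_b^{LT}\, n_b + \varepsilon_{cl}^{LT}\, n_{cl} \\
p^{LT}     &= \varepsilon_b^{LT}\, p_b + \varepsilon_{cl}^{LT}\, p_{cl} \\
n^{MT}     &= \varepsilon_b^{MT}\, n_b + \varepsilon_{cl}^{MT}\, n_{cl} \\
p^{MT}     &= \varepsilon_b^{MT}\, p_b + \varepsilon_{cl}^{MT}\, p_{cl} \\
n^{LT,MT}  &= \varepsilon_b^{LT}\varepsilon_b^{MT}\, n_b + \varepsilon_{cl}^{LT}\varepsilon_{cl}^{MT}\, n_{cl} \\
p^{LT,MT}  &= \varepsilon_b^{LT}\varepsilon_b^{MT}\, p_b + \varepsilon_{cl}^{LT}\varepsilon_{cl}^{MT}\, p_{cl}
\end{aligned}
```

The eight unknowns are then n_b, n_cl, p_b, p_cl and the four efficiencies, matched by the eight measured counts on the left-hand sides.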
System 8 Method: expected performance in early running
Use the µ-in-jet trigger.
Back-of-the-envelope calculation (M. Narain, D. Bloch, F. Yumiceva):
  Relative systematic errors: ~10% at 10 pb^-1 & ~3% for > 100 pb^-1.
[Figure: relative statistical errors for 1 fb^-1.]
pt-rel Method (P. Tan, C. Gerber)
Use µ+jet events.
Determine the b fraction using a fit of templates to the muon pt-rel distribution: $N_b\, f_b(p_T^{rel}) + N_c\, f_c(p_T^{rel})$
Extract the b-tag efficiency from the above fractions, determined before and after applying another (lifetime-based) tagger.
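The last step can be sketched as a ratio of fitted b yields: if the template fit gives a b fraction f_b in the N muon-jets before tagging and f_b^tag in the N^tag muon-jets that pass the lifetime tagger, then ε_b = (f_b^tag · N^tag) / (f_b · N). The function name and argument layout below are illustrative assumptions; the fit itself is not shown.

```cpp
#include <cassert>

// b-tag efficiency from the b fractions fitted before and after tagging.
//   nAll, fbAll       : number of muon-jets and fitted b fraction before tagging
//   nTagged, fbTagged : the same quantities for jets passing the lifetime tagger
double bTagEfficiency(double nAll, double fbAll, double nTagged, double fbTagged) {
    assert(nAll > 0.0 && fbAll > 0.0);  // need a non-empty b sample before tagging
    return (fbTagged * nTagged) / (fbAll * nAll);
}
```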
The method (Gena Kukartsev, July 30, 2007: Tagging efficiency and ttbar cross section with semileptonic decays)
We use semileptonic decays.
From data:
  $N_1, N_2, N_3$: number of events with 1, 2, 3 tagged jets
  Luminosity
From MC:
  $F_{ijk}$: fractions of events with i b-jets, j c-jets, k light jets (no tagging, MC truth only)
  Selection efficiency $\varepsilon_{sel}$
We expect $\langle N_{1,2,3} \rangle = f(\varepsilon_b, \varepsilon_c, \varepsilon_l, F_{ijk}, \varepsilon_{sel}, \mathrm{lumi}, \sigma_{t\bar{t}})$
Maximize the log-likelihood and find $\varepsilon_b$, $\varepsilon_c$, $\sigma_{t\bar{t}}$:
  $L = \log\left[ \mathrm{Poisson}(N_1, \langle N_1 \rangle) \cdot \mathrm{Poisson}(N_2, \langle N_2 \rangle) \cdot \mathrm{Poisson}(N_3, \langle N_3 \rangle) \right]$
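The log-likelihood above is a sum of three Poisson log-probabilities. A minimal sketch of that sum follows; the function names are illustrative, and the model that turns the efficiencies, F_ijk, luminosity and cross section into expected counts is not shown and is left to the caller.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

// log of the Poisson probability to observe n events when mu are expected:
// log P = n log(mu) - mu - log(n!)
double logPoisson(unsigned n, double mu) {
    assert(mu > 0.0);
    return n * std::log(mu) - mu - std::lgamma(n + 1.0);
}

// L = log[ Poisson(N1,<N1>) * Poisson(N2,<N2>) * Poisson(N3,<N3>) ]
double logLikelihood(const std::array<unsigned, 3>& observed,
                     const std::array<double, 3>& expected) {
    double sum = 0.0;
    for (std::size_t k = 0; k < observed.size(); ++k) {
        sum += logPoisson(observed[k], expected[k]);
    }
    return sum;
}
```

A fit would evaluate `logLikelihood` repeatedly while a minimizer varies the efficiencies and the cross section that determine `expected`.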
Toy result
[Figure: fitted b, c and l efficiencies vs. discriminator value; solid lines show the true MC values.]
Confidence levels
Monte Carlo (equivalent ~100/pb):
  # tags   # events
  0        1691
  1        4256
  2        2806
  3        378
  4        20
  5        1
[Figure: 68% and 95% confidence level contours.]
User Interface Software Design (V. Bazterra / Thomas Speer)
Will create a DB:
  DB contains TRFs for DATA and MC, with effi_b and effi_uds
  Will measure b, c & uds efficiency in data for 4 cuts (according to b-tag efficiency) and for all b-tag algorithms
  Results stored in the DB as a function of Et and rapidity
  Also store the b-tag cut value used
User interface:
  On data:
    bool pass = btag(Combined, Loose, Jet);
  On MC, use the scale factor:
    pair (pass, weight) = btag(Combined, Loose, Jet, Truth);
  Or on MC, use the data TRF:
    float effi = btag(Combined, Loose, Jet, Truth);
Tag Rate Functions
Assume the following are measured in data and available for use:
  b-tag efficiency (TRF $\varepsilon_b$): derived using ttbar events or muon-jet events, or a combination thereof.
  c-tag efficiency (TRF $\varepsilon_c$): derived from ttbar events, or from c-jet MC scaled to dijet data rates: $\varepsilon_c = \varepsilon_c^{MC}\,(\varepsilon_{b\to\mu}^{data} / \varepsilon_{b\to\mu}^{MC})$
  Light-quark mistag rates (TRF lq): derived using negative tags in multijet events + MC correction factors, or maybe a smarter method at a future date which uses all tags.
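The c-jet scaling amounts to multiplying the MC c-tag efficiency by the data/MC ratio measured on muon-tagged b-jets. A small sketch, with a hypothetical function name and with all inputs taken in the same (Et, rapidity) bin:

```cpp
#include <cassert>

// eps_c = eps_c^MC * (eps_{b->mu}^data / eps_{b->mu}^MC)
double cTagEfficiency(double effC_MC, double effBmu_data, double effBmu_MC) {
    assert(effBmu_MC > 0.0);  // the MC b efficiency must be non-zero
    return effC_MC * (effBmu_data / effBmu_MC);
}
```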
TRFs
[Figure: tag rate functions TRF b, TRF c and TRF light.]
Multiple Operating Points
12 at Dzero.
At CMS: if we want ALL b-taggers for ALL jet definitions, we have at least 70 combinations!!!
Need to think very carefully about how to use b tagging in any physics analysis if the performance measurement is given from DATA and is NOT purely MC based.
Agreed on 4 operating points per tagger.
Multiple Operating Points Example from Dzero
Tagging Analysis
Data: Apply Selection Cuts -> Apply b-tagging -> Final Data Sample
MC Background/Signal: Apply Selection Cuts -> Calculating b-tagging probability -> Background/Signal Estimation
The analysis applies b-tagging in two ways, driven by measuring the efficiency on data, not MC!!
Estimating The Signal/Background
Measure the b-tagging efficiency on data, but wish to apply it in MC or in another non-b-quark data sample.
Method 1 (DØ, most analyses): determine the efficiency vs. jet pT, η, etc., on data, and use it as a lookup table in MC.
Method 2 (CDF/DØ): determine the MC-to-data tagging ratio vs. jet pT, η, etc., on data, and use it as a lookup table applied to tags found in MC.
Both require determining, in data, the efficiency of a standard b-jet and matching it to MC jets.
Method 3 (hbb/DØ): use a data sample with the same flavor content as the sample you are interested in, derive a tagging function (pT, η, etc.) and apply it directly.
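Methods 1 and 2 both reduce to a binned lookup in jet pT and |η|. The sketch below is one possible layout, not the workshop's design: `BinnedTable`, its row-major storage, the clamping of out-of-range jets to the edge bins, and the data/MC direction of the ratio in `scaleFactor` are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Efficiency (or any per-jet quantity) binned in jet pT and |eta|.
struct BinnedTable {
    std::vector<double> ptEdges;   // ascending, nPt + 1 entries
    std::vector<double> etaEdges;  // ascending in |eta|, nEta + 1 entries
    std::vector<double> values;    // row-major: values[iPt * nEta + iEta]

    // Index of the bin containing x; values outside the range are clamped
    // to the first or last bin.
    static std::size_t findBin(const std::vector<double>& edges, double x) {
        auto it = std::upper_bound(edges.begin(), edges.end(), x);
        std::size_t i = (it == edges.begin())
                            ? 0
                            : static_cast<std::size_t>(it - edges.begin()) - 1;
        return std::min(i, edges.size() - 2);
    }

    double at(double pt, double absEta) const {
        assert(ptEdges.size() >= 2 && etaEdges.size() >= 2);
        const std::size_t nEta = etaEdges.size() - 1;
        assert(values.size() == (ptEdges.size() - 1) * nEta);
        return values[findBin(ptEdges, pt) * nEta + findBin(etaEdges, absEta)];
    }
};

// Method 2: ratio of the data and MC efficiencies in the jet's bin,
// to be used as a weight for jets tagged in MC.
double scaleFactor(const BinnedTable& data, const BinnedTable& mc,
                   double pt, double absEta) {
    const double effMC = mc.at(pt, absEta);
    assert(effMC > 0.0);
    return data.at(pt, absEta) / effMC;
}
```

For Method 1 the value returned by `at` is used directly as the per-jet tagging probability of an MC jet.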
Jet Tagging Probability
For the MC event: the probability for a jet of a given flavor α (b, c or light jet) to be tagged is the product of the taggability and the tagging efficiency: P_α(pt, η) = taggability(pt, η) × ε_α(pt, η).
Event Tagging Probability
For the MC event: the event tagging probabilities P_event are derived by weighting each reconstructed jet in the event by the per-jet tagging probability P_α(pt, η), according to its flavor α, its pt and its η. The probability to have at least one tag in a given event follows from these per-jet probabilities.
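For the "at least one tag" case, the standard combination of independent per-jet probabilities is P(≥1 tag) = 1 − Π_j (1 − P_j). A minimal sketch under that independence assumption; the struct and function names are illustrative:

```cpp
#include <cassert>
#include <vector>

// Per-jet inputs, already looked up for the jet's flavor, pt and eta.
struct JetTagInput {
    double taggability;  // probability that the jet is taggable
    double efficiency;   // tagging efficiency for the jet's flavor
};

// Per-jet tagging probability: taggability times tagging efficiency.
double jetTagProbability(const JetTagInput& j) {
    assert(j.taggability >= 0.0 && j.taggability <= 1.0);
    assert(j.efficiency >= 0.0 && j.efficiency <= 1.0);
    return j.taggability * j.efficiency;
}

// Probability that at least one jet in the event is tagged,
// treating the jets as independent.
double probAtLeastOneTag(const std::vector<JetTagInput>& jets) {
    double noTag = 1.0;
    for (const auto& j : jets) {
        noTag *= 1.0 - jetTagProbability(j);
    }
    return 1.0 - noTag;
}
```

The returned value is used as an event weight, so every MC event contributes to the tagged-sample prediction instead of being accepted or rejected.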
Using Multiple Operating Points
Consider the case in which the working points are inclusive, i.e. a Tight jet is necessarily Loose. A jet can then be defined as:
  Tight tagged (T)
  Loose but not Tight tagged (L)
  not Loose tagged (U)
The probability of an event to pass a given tagging scheme is given by a sum over all permutations of T, L & U that satisfy the scheme.
Tag Permutations
The weighting procedure allows one to estimate the number of tagged events, but does not give access to the actual tagged jets in the event. If one wants to use kinematic variables built from the tagged or untagged jets, then each permutation in the sum needs to be considered separately.
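The sum over permutations can be sketched by enumerating every assignment of T, L or U to the jets and adding the weights of the assignments accepted by the scheme. The per-jet weights assumed here are P_T = P_tight, P_L = P_loose − P_tight and P_U = 1 − P_loose, which follow from the inclusive working points; the type and function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <initializer_list>
#include <vector>

enum class Tag { Tight, LooseOnly, Untagged };  // T, L, U

// Per-jet tagging probabilities at the two inclusive working points.
struct JetProb {
    double loose;
    double tight;  // tight <= loose
};

// Probability for one jet to end up in the given exclusive state.
double stateProb(const JetProb& j, Tag t) {
    assert(j.tight >= 0.0 && j.tight <= j.loose && j.loose <= 1.0);
    switch (t) {
        case Tag::Tight:     return j.tight;
        case Tag::LooseOnly: return j.loose - j.tight;
        case Tag::Untagged:  return 1.0 - j.loose;
    }
    return 0.0;
}

// A tagging scheme decides whether a full T/L/U assignment passes.
using Scheme = std::function<bool(const std::vector<Tag>&)>;

// Sum of the weights of all 3^n assignments accepted by the scheme.
double eventProb(const std::vector<JetProb>& jets, const Scheme& accept) {
    std::vector<Tag> states(jets.size(), Tag::Untagged);
    double total = 0.0;
    std::function<void(std::size_t, double)> visit =
        [&](std::size_t i, double weight) {
            if (i == jets.size()) {
                if (accept(states)) total += weight;
                return;
            }
            for (Tag t : {Tag::Tight, Tag::LooseOnly, Tag::Untagged}) {
                states[i] = t;
                visit(i + 1, weight * stateProb(jets[i], t));
            }
        };
    visit(0, 1.0);
    return total;
}
```

Because each accepted assignment is visited explicitly, the same loop can also fill kinematic distributions per permutation, weighted by the assignment's probability.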
Conclusions
Many US participants are now plugged into mainstream issues in b tagging/vertexing.
A successful workshop, with a lot of discussion with all key developers of b tagging.
Many issues were discussed, mostly emphasizing how to measure performance from data and how to use the measurements in physics analyses. This led to a change in thinking of the group, and hence to agreement on possible modifications of the taggers to allow this.
Develop a framework to measure performance from data.
Start a dialogue with the physics groups on a proposal for using b tagging in the analyses (plus develop the framework).