VELO readout board RB3: testing
Common L1 board (ROB): specifying
Federica Legger, 10 February 2003
Summary
- LHCb: detectors, online (trigger, DAQ)
- VELO: detector and readout chain
- L1 electronics for VELO
  - Functionalities and prototyping
  - Synchronization and error correction
  - L1T preprocessing
  - L1B implementations (DSP - SBSRAM vs. FPGA - QDR)
  - Testing
- What's next: RB4 -> L1 Common Board (ROB) (VELO, VETO, ST, OT)
LHCb
Systematic studies of CP violation in the beauty sector by measuring particle-antiparticle time-dependent decay rate asymmetries.
Online Data Flow
Event data path (diagram): Detectors -> L0 electronics -> L1 electronics -> DAQ (L2 and L3 trigger) -> Storage
- L0 electronics send L0-trigger data to the L0 Trigger (high pT tracks, no pile-up), which returns the L0-trigger decision
- L1 electronics send L1-trigger data to the L1 Trigger (secondary vertices), which returns the L1-trigger decision
- DAQ (L2 and L3 trigger): reconstructed B events
Trigger/DAQ
Parameters:
- Number of channels: 900000
- Bunch crossing rate: 40 MHz
- Level-0 accept rate: 1 MHz
- Level-1 accept rate: 40 kHz
- Readout rate: 40 kHz
Latencies:
- Level 0 Trigger: fixed latency 4.0 µs
- Level 1 Trigger: variable latency <1 ms
- L2: variable latency ~10 ms
- L3: variable latency ~200 ms
Architecture (diagram): LHCb detector (VDET, TRACK, ECAL, HCAL, MUON, RICH) -> Front-End Electronics -> Front-End Multiplexers (FEM) -> Front End Links -> Read-out Units (RU, Network Processors) -> Read-out Network (RN) -> Sub-Farm Controllers (SFC) -> CPUs (Trigger Level 2 & 3, Event Filter) -> Storage. Timing & Fast Control, the throttle, and the Control & Monitoring LAN connect to this chain.
Data rates along the chain: 40 TB/s, 1 TB/s, 4 GB/s, 2-4 GB/s, 20 MB/s
VELO Detector
Sensor design:
- Double metal layer strips readout
- Analog readout for better hit resolution and monitoring
- 2048 strips/sensor
- 16 Front-End chips per sensor (128 channels per chip, 1344 FE chips in total)
Present configuration:
- 21 stations
- 1 station = 2 modules (left and right)
- 1 module = 2 sensors (84 sensors in total)
- 1 sensor = 1 hybrid (84 hybrids in total)
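The totals quoted on this slide can be cross-checked with a few lines of arithmetic. The sketch below only uses numbers from the slide; the constant names are illustrative, and the total strip count is a derived value that the slide itself does not state.

```python
# Bookkeeping check for the VELO numbers quoted on the slide.
# Constant names are illustrative; the values come from the slide.
STATIONS = 21
MODULES_PER_STATION = 2      # left and right
SENSORS_PER_MODULE = 2
STRIPS_PER_SENSOR = 2048
CHANNELS_PER_FE_CHIP = 128

sensors = STATIONS * MODULES_PER_STATION * SENSORS_PER_MODULE
fe_chips_per_sensor = STRIPS_PER_SENSOR // CHANNELS_PER_FE_CHIP
fe_chips_total = sensors * fe_chips_per_sensor
strips_total = sensors * STRIPS_PER_SENSOR   # derived, not quoted on the slide

print(f"sensors (= hybrids): {sensors}")
print(f"FE chips per sensor: {fe_chips_per_sensor}")
print(f"FE chips in total:   {fe_chips_total}")
print(f"strips in total:     {strips_total}")
```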
Location of the electronics
- Counting room (~40 m): VELO L1 Read-Out, VELO power supplies. Standard components.
- Cavern (~10 m): VELO L0 TFC/ECS. Rad tolerant components.
- VELO (1-2 m): VELO L0 Front-End. Rad hard components. Limited space and access, high radiation levels.
Readout Chain
- VELO L0 (Front-End): FE chips, link drivers, TFC/ECS receivers
- VELO L0 (ECS): TFC/ECS drivers
- VELO L1 (Read-Out): readout boards, power supplies
Path (diagram): silicon sensor (2K strips) with 16 Front-End chips on the hybrid, inside the vacuum vessel -> connectors on hybrid, Kapton cables and feedthroughs (1 m) -> repeater cards, SPECS crate and TFC coupler in the cavern (10 m) -> 64 data links to the L1 electronics (~40 m) -> 64-channel readout board in the counting room, connected to the LHCb TFC/ECS
VELO L1 Board functional overview
Block diagram: Analog -> FADC -> Sync Logic -> L1PPI -> L1T; L1 Buffer -> DAQI -> DAQ; ECS and TFC interfaces
L1 - Prototyping
RB1 (1999): to set up the design and test environment
- 10 MHz 2-ch FADC card
- 2K FIFO
- FPGA
- TTCrx
RB2 (2000):
- 40 MHz 4-ch FADC
- Clock delay line
- Widely used for VELO/IT detector tests
Photo labels: L1T preprocessor, FADC card, ECS prototype
RB3 (2001)
- 16-channel board
- All functionalities of the final board
Photo labels: clock & Vref, synchronization and L1T preprocessor, L1B card, ECS interface, SLINK card to L1T, 4-ch FADC, RU card, TTCrx receiver, SLINK control FPGAs, Front-End Emulator, SLINK card to DAQ, control FPGA, power connectors
L1 Functionality (RB3)
- (16) 64 input analog links from (4) 16 Front-End chips, over twisted pairs
- (4) 8-bit 40 MHz FADC (clock phase and Vref control)
- TFC interface (TTCrx receiver chip, optical TFC link) and Front-End Emulator
- Synchronization and error detection logic
- Vertex L1 Trigger preprocessor and vertex trigger interface: single link to L1 Trigger RU
- L1 Buffer (1900 events) and L1 derandomizer (FIFO/SRAM/QDR)
- Data processing (DSP/FPGA) after L1 accept at 40 kHz
- DAQ interface: single link to DAQ NP
- Control interface: standard LHCb ECS interface (Credit Card PC), ECS link (Ethernet)
- Located in standard LHCb crate with power back-plane
Synchronization and error checking
Each board is equipped with a TTCrx chip. It provides:
- the 40 MHz sampling clock for the ADCs, used together with a delay chip for per-channel phase adjustment
- the L0 accept signal, used in the FEM and to increment the EvId counter, which serves as event identification on the board
- the bunch counter
- the L1 decision
The FE emulator is based on the actual readout chip and accurately reproduces the situation at the detector level.
The event identification consistency check compares neighboring channels, the TTC information and the FE emulator. All errors are flagged and attached to the data.
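The consistency check described on this slide could look roughly as follows. This is a minimal sketch, not the board firmware: the function name, the flag names and the use of plain integers as event identifiers are all assumptions made for illustration.

```python
def check_event_ids(channel_ids, ttc_id, emulator_id):
    """Compare event identifiers from three sources and return error flags.

    channel_ids : identifiers decoded from each input channel (hypothetical format)
    ttc_id      : value of the on-board EvId counter driven by the TTCrx
    emulator_id : identifier produced by the FE emulator
    """
    return {
        # FE emulator disagrees with the TTC-derived counter
        "emulator_mismatch": emulator_id != ttc_id,
        # channels whose identifier differs from the TTC-derived counter
        "channel_mismatch": [i for i, cid in enumerate(channel_ids) if cid != ttc_id],
        # positions i where channel i and channel i+1 disagree
        "neighbour_mismatch": [
            i for i in range(len(channel_ids) - 1)
            if channel_ids[i] != channel_ids[i + 1]
        ],
    }


# Example: channel 2 is out of step; the flags would be attached to the event data.
flags = check_event_ids([7, 7, 8, 7], ttc_id=7, emulator_id=7)
print(flags)
```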
L0 FE Chip: Beetle 1.1
- Analog readout of the silicon sensor: better hit resolution and detector monitoring
- L0 Accept
L1 Pre Processor tasks
Processing chain, from the ADC to the L1T interface:
- Pedestal subtraction (strip individual)
- Channel reordering
- Faulty channel masking
- Common mode suppression
- Hit detection
- Cluster encoding
- Cluster encapsulation
Input to the processor:
- Pedestal table
- Channel mask
- Parameters
- Hit threshold table
- Parameters, alignment table
- Event and board information
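A few of the steps in this chain can be sketched in software. The example below is an illustration under stated assumptions, not the L1PPI implementation: it covers pedestal subtraction, faulty channel masking, hit detection and a simple cluster encoding as (first strip, width), and it leaves out channel reordering, common mode suppression and encapsulation. The function name and the cluster format are hypothetical.

```python
def preprocess(adc, pedestals, mask, thresholds):
    """Return pedestal-corrected values and clusters of adjacent hit strips.

    adc, pedestals, thresholds : per-strip lists of numbers
    mask                       : per-strip list of booleans, True = faulty strip
    """
    # Pedestal subtraction, with masked (faulty) strips forced to zero
    corrected = [0 if bad else a - p for a, p, bad in zip(adc, pedestals, mask)]

    # Hit detection with a strip-individual threshold
    hits = [c > t for c, t in zip(corrected, thresholds)]

    # Cluster encoding: group adjacent hits into (first_strip, width) pairs.
    # The trailing False closes a cluster that reaches the last strip.
    clusters = []
    start = None
    for i, is_hit in enumerate(hits + [False]):
        if is_hit and start is None:
            start = i
        elif not is_hit and start is not None:
            clusters.append((start, i - start))
            start = None
    return corrected, clusters


adc = [12, 11, 30, 28, 10, 50, 10, 40]
corrected, clusters = preprocess(
    adc,
    pedestals=[10] * 8,
    mask=[False, False, False, False, False, True, False, False],
    thresholds=[8] * 8,
)
print(corrected)
print(clusters)
```

In this example strip 5 carries a large value but is masked as faulty, so it does not produce a cluster.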
The LCMS algorithm (1)
Noise in channel i is modelled as y_i = a + b*i.
- Input data after pedestal correction
- Mean value calculation
- Mean value correction
The LCMS algorithm (2)
- Calculate the slope
- Correct the slope
- RMS calculation
- Hit detection
The LCMS algorithm (3)
- Set detected hits to zero for the second iteration
- Calculate the mean value
- Correct the mean value
- Calculate the slope again
- Correct the slope
The LCMS algorithm (4)
- Insert the hits previously set to zero
- Apply strip individual hit threshold mask (a further hit can be found at this stage)
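The sequence of steps on the LCMS slides can be written down as a short floating-point sketch. It is an interpretation, not the bit-accurate FPGA or DSP code: the first-pass hit criterion (a multiple `rms_factor` of the RMS), the choice to apply the second-pass correction to the re-inserted hits, and all function names are assumptions.

```python
import math


def linear_fit(values):
    """Least-squares line through values[i] versus channel index i, as a list."""
    n = len(values)
    centre = (n - 1) / 2.0
    mean = sum(values) / n
    sxx = sum((i - centre) ** 2 for i in range(n))
    slope = sum((i - centre) * (v - mean) for i, v in enumerate(values)) / sxx
    return [mean + slope * (i - centre) for i in range(n)]


def lcms(data, thresholds, rms_factor=2.5):
    """Two-pass linear common mode suppression on pedestal-corrected data."""
    n = len(data)

    # First pass: correct mean and slope, then detect hits from the RMS
    first = [d - f for d, f in zip(data, linear_fit(data))]
    rms = math.sqrt(sum(v * v for v in first) / n)
    is_hit = [v > rms_factor * rms for v in first]

    # Second pass: set detected hits to zero, then fit mean and slope again
    zeroed = [0.0 if hit else v for v, hit in zip(first, is_hit)]
    second_fit = linear_fit(zeroed)

    # Re-insert the hits (assumption: they receive the second correction too)
    corrected = [v - f for v, f in zip(first, second_fit)]

    # Strip-individual hit threshold: keep values above threshold, zero the rest
    return [v if v > t else 0.0 for v, t in zip(corrected, thresholds)]


# 32 detector channels with common mode 5 + 0.5*i and one hit of 40 on strip 10
data = [5 + 0.5 * i for i in range(32)]
data[10] += 40
out = lcms(data, thresholds=[5.0] * 32)
print([(i, round(v, 2)) for i, v in enumerate(out) if v != 0.0])
```

The second pass matters because a hit biases the first fit: in this example the first pass alone underestimates the hit amplitude, while the second pass recovers it to within a fraction of an ADC count.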
Common mode suppression
- Tested by mixing "noise" coming from test beam data and "signal" m(b > ππ) from Monte-Carlo
- LCMS (Linear Common Mode Suppression)
  - Applied on 1 analog channel (32 detector channels)
  - 8-bit precision
- The data processing has been verified to be bit accurate by comparing the FPGA simulation with a C-coded algorithm
- Now implemented on DSP (C/Assembler) to produce the same results
L1PPI prototype
- The LCMS algorithm has been implemented in an ALTERA APEX 100k gate FPGA
- 128 detector channels have been processed, working at 80 MHz
- Resource usage: 3200 LE (77%), 20 ESB (70%)
DSP-based DAQ interface test setup
Photo labels:
- FPGA used as L1B controller and data source
- Control interface (RB2)
- L1 Buffer
- Bus switch
- Evaluation card for Texas Instruments TMS320C6711 DSP
L1 Buffer
L1 electronics requirements for read and write, simultaneously:
- Write: 36 words / 900 ns
- Read: 36 words / 900 ns
(The read time can be longer for some special L1B implementations, where only L1 accepted events are read out.)
L1 buffer size:
- The buffer is 1820 events deep => 64k words deep (36 words / event)
Memory type options:
- FIFO
- SRAM
- QDR/DDR
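The buffer numbers on this slide fit together as shown in the short calculation below. The 25 ns word period is an interpretation (one word per tick of the 40 MHz clock), which happens to reproduce the quoted 900 ns; the constant names are illustrative.

```python
# Arithmetic behind the L1 buffer figures. Values are from the slide, except
# CLOCK_PERIOD_NS, which assumes one word per 40 MHz clock tick.
WORDS_PER_EVENT = 36
CLOCK_PERIOD_NS = 25
EVENTS_DEEP = 1820

transfer_time_ns = WORDS_PER_EVENT * CLOCK_PERIOD_NS   # time to move one event
words_needed = EVENTS_DEEP * WORDS_PER_EVENT           # words for 1820 events
words_64k = 64 * 1024                                  # capacity of a 64k-word memory
events_that_fit = words_64k // WORDS_PER_EVENT         # whole events in 64k words

print(f"one event: {transfer_time_ns} ns")
print(f"{EVENTS_DEEP} events need {words_needed} words (64k = {words_64k})")
print(f"a 64k-word buffer holds {events_that_fit} whole events")
```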
Comparison
FIFO:
- Simple to control since dual port and no addresses
- High I/O count per channel (low bandwidth)
- High cost, increasing depth adds high cost ($100k / 1820 events)
- L1 rejected events have to be read out
SRAM:
- Low cost
- Direct access from DSP possible (SBSRAM supported by DSPs)
- Only L1 accepted events are read out
- Memory controller and bus arbitration necessary (single port memory)
QDR/DDR:
- Very high bandwidth and therefore low I/O usage, allows high integration of the system
- Memory controller but no bus arbitration necessary
- Not directly accessible by DSP
Comparison of two approaches to DAQ Interface: DSP vs FPGA
The data rate after L1 accept is 25 times lower than for the L1PPI (~25 µs / event) => DSP processing is possible.
DSP approach
Advantages:
- Flexible for changes in the algorithm, since most (~90%) of the code can be written in C. The same holds for implementing special acquisition modes.
- Low cost DSPs can be used
Disadvantages:
- Some assembler is still needed
- Limited bandwidth does not allow very high integration
- Processing is distributed over several DSPs
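The factor of 25 and the ~25 µs per event both follow from the Level-0 and Level-1 accept rates quoted on the Trigger/DAQ slide (1 MHz and 40 kHz). A minimal check, with illustrative constant names:

```python
# Rates taken from the Trigger/DAQ slide.
L0_ACCEPT_RATE_HZ = 1_000_000   # input rate seen by the L1 preprocessor
L1_ACCEPT_RATE_HZ = 40_000      # rate seen by the DAQ interface

rate_ratio = L0_ACCEPT_RATE_HZ // L1_ACCEPT_RATE_HZ
mean_time_per_event_us = 1e6 / L1_ACCEPT_RATE_HZ   # average spacing of L1 accepts

print(f"rate reduction after L1 accept: x{rate_ratio}")
print(f"average time per accepted event: {mean_time_per_event_us} us")
```

The 25 µs is an average spacing; the L1 derandomizer exists because accepted events do not arrive evenly spaced.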
Comparison of two approaches to DAQ Interface: DSP vs FPGA
FPGA approach
Advantages:
- Only VHDL needed
- Processing for DAQ and L1 Pre Processor is done on a single chip, which reduces board complexity
- High bandwidth dual port memory can be used for easy bus arbitration
Disadvantages:
- High performance FPGAs need to be used, at higher cost
- Special mode processing also has to be implemented in VHDL
L1B Implementation
Photos: DSP DAQ, FPGA DAQ
Test setup
Photos: Beetle setup for ST and analog transmission; 2 RB3 setups
RB3 Data Flow
Block diagram with the area under test marked. Blocks: TFC link (optical) and TTCrx receiver chip; analog data links (twisted pairs); Vref control and clock phase; FADC; Front-End Emulator; synchronization and error detection; vertex trigger preprocessor and vertex trigger interface; L1 Buffer; L1 derandomizer (L1 accept at 40 kHz); data processing (DSP/FPGA); control interface; DAQ interface. External links: ECS link (Ethernet), DAQ link (to NP), link to L1 trigger (to RU).
RB3 testing in Lausanne
- Event generator on board to test readout from RB3 to L1T (S-Link to PC)
- DSP-based DAQ card with SBSRAM L1B
- FPGA-based card with QDR L1B
- Delayed clock generator for ADCs tested on FPGA
- FADC data read and sent to L1T (S-Link to PC)
- Synchronization of input data with FEM
- Complete readout system with FE chip, analog transmission, FEM-based synchronization, L1B, transmission via S-Link to PC
- L1T preprocessing and DAQ link
What's next
RB3 -> RB4 -> ROB (Common L1 Board)
- VELO and VETO
- Inner and Outer Tracker
Input to L1 board
VELO and VETO:
- Analog transmission from detector to counting room
- FADC on L1 board
- 64 analog inputs (2k detector channels)
SILICON TRACKER:
- Output analog signals digitized in rad hard zone
- Optical link to L1 board (serial transmission)
- Each ST L1 board receives data from 48 FE chips (1536 detector channels)
OUTER TRACKER:
- 32-channel OTIS TDC FE chip
- Optical link to L1 board (as ST -> 1536 detector channels)
Data flow overview
- Max. 24 optical inputs per board
- 2 or 4 RxCards: A-RxCard (analog) or O-RxCard (optical), fed by the FE
- 4 PP-FPGAs, each with its L1B
- 1 SyncLink-FPGA
- 2 RO-TxCards
- TTCrx, CC-PC, FEM
External connections (diagram): ECS, TTC, L1T, DAQ, L1 throttle
Firmware framework
- Many common interfaces have to be implemented
- Blocks in red have to be developed individually for each sub-detector
- Interfaces between blocks will be defined
Block diagram, PP-FPGA: input from RxCard, Sync, L1B and L1B Ctrl, L1T ZSupp, DAQ ZSupp, L1T PPLink, DAQ PPLink, ECS, clock generator, reset generator, broadcast CMD, L1A generator, SyncData generator (bus widths of 8, 16 and 32 bits between blocks)
Block diagram, SyncLink FPGA: links to the PP-FPGAs left and right, FIFOs, DAQ Link & Encapsulation, L1T Link & Encapsulation, ECS, TTCrx, throttle OR, RO-Interface to the RO-TxCards
Physical layout and its constraints
- Maximal card height needed for the A-RxCard; no other interfaces allowed on the panel
- No transition module for analogue signals
Board layout (diagram), from the front panel side: 4 A-RxCards for 16 ADC channels each, with 50-pin and 200-pin connectors; 4 PP FPGAs with RAM; DDR RAM; SyncLink FPGA; 2 64-bit S-Link RO-TxCards; Glue Card; FEM; TTCrx; CC-PC; power. Connections: TTC, throttle, Gbit Ethernet, Ethernet.
To do list
- Finish RB3 testing
- Start writing code for ROB
- Start a physics analysis before next seminar