The Sorting Processor Project

Similar documents
Muon Reconstruction and Identification in CMS

The Track-Finding Processor for the Level-1 Trigger of the CMS Endcap Muon System

Written exam for IE1204/5 Digital Design Thursday 29/

Ignacy Kudla, Radomir Kupczak, Krzysztof Pozniak, Antonio Ranieri

The ATLAS Conditions Database Model for the Muon Spectrometer

CMS Conference Report

Fast pattern recognition with the ATLAS L1Track trigger for the HL-LHC

The ATLAS Level-1 Muon to Central Trigger Processor Interface

ENGR 3410: MP #1 MIPS 32-bit Register File

Muon Trigger Electronics in the Counting Room

Circuit Design and Simulation with VHDL 2nd edition Volnei A. Pedroni MIT Press, 2010 Book web:

WBS Trigger. Wesley H. Smith, U. Wisconsin CMS Trigger Project Manager. DOE/NSF Status Review November 20, 2003

The Compact Muon Solenoid Experiment. CMS Note. Mailing address: CMS CERN, CH-1211 GENEVA 23, Switzerland

ENGR 3410: MP #1 MIPS 32-bit Register File

Altera FLEX 8000 Block Diagram

ENGR 3410: Lab #1 MIPS 32-bit Register File

Department of Electrical and Computer Engineering Introduction to Computer Engineering I (ECSE-221) Assignment 3: Sequential Logic

Outline. EECS150 - Digital Design Lecture 6 - Field Programmable Gate Arrays (FPGAs) FPGA Overview. Why FPGAs?

PoS(EPS-HEP2017)492. Performance and recent developments of the real-time track reconstruction and alignment of the LHCb detector.

ECE 341 Final Exam Solution

arxiv:hep-ph/ v1 11 Mar 2002

Track reconstruction of real cosmic muon events with CMS tracker detector

Adding timing to the VELO

Control and Datapath 8

CMS FPGA Based Tracklet Approach for L1 Track Finding

Chapter 6. CMOS Functional Cells

First results from the LHCb Vertex Locator

First LHCb measurement with data from the LHC Run 2

3. The high voltage level of a digital signal in positive logic is : a) 1 b) 0 c) either 1 or 0

Track-Finder Test Results and VME Backplane R&D. D.Acosta University of Florida

Ripple Counters. Lecture 30 1

1. INTRODUCTION 2. MUON RECONSTRUCTION IN ATLAS. A. Formica DAPNIA/SEDI, CEA/Saclay, Gif-sur-Yvette CEDEX, France

A New Segment Building Algorithm for the Cathode Strip Chambers in the CMS Experiment

Associative Memory Pattern Matching for the L1 Track Trigger of CMS at the HL-LHC

EECS150 - Digital Design Lecture 6 - Field Programmable Gate Arrays (FPGAs)

The Compact Muon Solenoid Experiment. CMS Note. Mailing address: CMS CERN, CH-1211 GENEVA 23, Switzerland

PSEC-4: Review of Architecture, etc. Eric Oberla 27-oct-2012

7 PROGRAMMING THE FPGA Simon Bevan

ECE410 Design Project Spring 2013 Design and Characterization of a CMOS 8-bit pipelined Microprocessor Data Path

FPGA for Complex System Implementation. National Chiao Tung University Chun-Jen Tsai 04/14/2011

Tracker algorithm based on Hough transformation

The LHCb upgrade. Outline: Present LHCb detector and trigger LHCb upgrade main drivers Overview of the sub-detector modifications Conclusions

Physics CMS Muon High Level Trigger: Level 3 reconstruction algorithm development and optimization

Track reconstruction for the Mu3e experiment based on a novel Multiple Scattering fit Alexandr Kozlinskiy (Mainz, KPH) for the Mu3e collaboration

INTERCONNECT TESTING WITH BOUNDARY SCAN

Module Performance Report. ATLAS Calorimeter Level-1 Trigger- Common Merger Module. Version February-2005

Unit 3 and Unit 4: Chapter 4 INPUT/OUTPUT ORGANIZATION

Nevis ADC Design. Jaroslav Bán. Columbia University. June 4, LAr ADC Review. LAr ADC Review. Jaroslav Bán

QUESTION BANK. EE 6502 / Microprocessor and Microcontroller. Unit I Processor. PART-A (2-Marks)

The Compact Muon Solenoid Experiment. CMS Note. Mailing address: CMS CERN, CH-1211 GENEVA 23, Switzerland

PROBLEMS. 7.1 Why is the Wait-for-Memory-Function-Completed step needed when reading from or writing to the main memory?

Outcomes. Spiral 1 / Unit 6. Flip Flops FLIP FLOPS AND REGISTERS. Flip flops and Registers. Outputs only change once per clock period

CS24: INTRODUCTION TO COMPUTING SYSTEMS. Spring 2017 Lecture 13

EECS 150 Homework 7 Solutions Fall (a) 4.3 The functions for the 7 segment display decoder given in Section 4.3 are:

Chapter Operation Pinout Operation 35

Testing Principle Verification Testing

Programmable Logic Design I

Trigger Layout and Responsibilities

13. Brewster angle measurement

R. C. TECHNICAL INSTITUTE, SOLA

1. Prove that if you have tri-state buffers and inverters, you can build any combinational logic circuit. [4]

R07

Picture of memory. Word FFFFFFFD FFFFFFFE FFFFFFFF

Lecture 2 VLSI Testing Process and Equipment

Ting Wu, Chi-Ying Tsui, Mounir Hamdi Hong Kong University of Science & Technology Hong Kong SAR, China

Results on Long Term Performances and Laboratory Tests on the L3 RPC system at LEP. Gianpaolo Carlino INFN Napoli

Synchronization In Digital Systems

An Asynchronous NoC Router in a 14nm FinFET Library: Comparison to an Industrial Synchronous Counterpart

Summer 2003 Lecture 21 07/15/03

Design of memory efficient FIFO-based merge sorter

Outcomes. Spiral 1 / Unit 6. Flip Flops FLIP FLOPS AND REGISTERS. Flip flops and Registers. Outputs only change once per clock period

Spiral 1 / Unit 6. Flip-flops and Registers

ADVANCED COMPUTER ARCHITECTURE TWO MARKS WITH ANSWERS

Introduction to SRAM. Jasur Hanbaba

UNIVERSITY OF HONG KONG DEPARTMENT OF ELECTRICAL AND ELECTRONIC ENGINEERING. Principles of Computer Operation

arxiv:physics/ v1 [physics.ins-det] 18 Dec 1998

Simulation study for the EUDET pixel beam telescope

RTL Power Estimation and Optimization

436 Nuclear Instruments and Methods in Physics Research A278 (1989) North-Holland, Amsterdam. 2. The detector

! Program logic functions, interconnect using SRAM. ! Advantages: ! Re-programmable; ! dynamically reconfigurable; ! uses standard processes.

Updated impact parameter resolutions of the ATLAS Inner Detector

CSC Trigger Motherboard

The CMS alignment challenge

A Time-Multiplexed FPGA

Lecture (04 & 05) Packet switching & Frame Relay techniques Dr. Ahmed ElShafee

Lecture (04 & 05) Packet switching & Frame Relay techniques

CPE300: Digital System Architecture and Design

Chapter 4 Main Memory

Model EXAM Question Bank

Deduction and Logic Implementation of the Fractal Scan Algorithm

RPC Trigger Overview

Efficient VLSI Huffman encoder implementation and its application in high rate serial data encoding

A 3-D Track-Finding Processor for the CMS Level-1 Muon Trigger

Level-1 Regional Calorimeter Trigger System for CMS

R07. Code No: V0423. II B. Tech II Semester, Supplementary Examinations, April

Keywords: Binary Sort, Sorting, Efficient Algorithm, Sorting Algorithm, Sort Data.

Architecture and Partitioning - Architecture

High-Speed Cell-Level Path Allocation in a Three-Stage ATM Switch.

Physical Design of Digital Integrated Circuits (EN0291 S40) Sherief Reda Division of Engineering, Brown University Fall 2006

ISO IEC. INTERNATIONAL ISO/IEC STANDARD Information technology Fibre Distributed Data Interface (FDDI) Part 5: Hybrid Ring Control (HRC)

Transcription:

CMS TN/95-028 6 March, 1995 The Sorting Processor Project G.De Robertis and A.Ranieri Sezione INN di Bari, Italy I.M.Kudla Institute of Experimental Physics,Warsaw University, Poland G.Wrochna on leave of absence from Institute of Experimental Physics,Warsaw University, Poland Abstract The general scheme for a Sorting Processor like tree structure, is presented here together with some simulation results made considering a CMOS 0.7µ technology implementation of such device. This scheme will be implemented as part of the electronics foreseen for the trigger muon system of the CMS experiment. 1

1. Introduction The CMS muon trigger is required to identify muons, measure their transverse momentum p t, and determine the bunch crossing from which they originated. This is achieved with four muon stations in the return yoke of the CMS magnet. Each station will have one or two planes of RPC s as dedicated µ trigger detector [1][2][3]. RPC s have been chosen for their excellent time resolution, to ensure unambiguous bunch crossing assignment. A muon traversing RPC s will give usually at least four points hit pattern used to make the p t cut. Due to the energy loss and the multiple scattering a muons of a given p t and pseudorapidity may produce different hit patterns. A set of those predefined hit patterns are loaded into a Pattern Comparator Trigger (PACT) processor [4]. The PACT will compare the predefined hit pattern with the observed pattern, delivering the p t code of the highest momentum track inside of a detector segment ( η=0.11 and ϕ=2.5), to the sector (ring) processor. The Ring processor, by using a like tree Sorting processors structure, selects 4 highest p t muons code, in a ring of 0.11 rapidity unit and transmits the data to the global muon trigger system where, the trigger informations coming from the others parts of the global CMS muon detector, are combined. The aim of this note, is to describe the internal structure of the Sorting processor, how it works and finally the structure, developed as a Sorting Processors Network, of the Ring Processor, which is the final goal of our project. 2. The Sorting Processor Structure The Sorting Processor, is a specialized VLSI circuit, composed by the following parts: a 6 bits two words comparator; a chain of 4 multiplexers, 2 for data and 2 for addresses values; a 2 fold 4 inputs Sorters; an 8 inputs Merger. These components are connected together to finally form, an 8 Inputs Sorting Network (the Sorting Processor), composed by 2 Sorters plus 1 Merger. The general structure of the Sorting Processor chip is shown in the ig.1. In this figure both the Sorters and the Merger, are more complex elements the structure of which, are explained later. 2

inp1 inp2 inp3 inp4 flip flop Sorter flip flop flip flop out1 out2 out3 out4 Merger inp5 inp6 inp7 inp8 flip flop Sorter flip flop ig.1 8 inputs Sorting Processor scheme 3. The Comparator The basic element to built a sorter is the comparator, the faster is the comparator the faster is the sorting process. Our comparator is formed by a pure combinatorial network, in which two 6 bits input words are compared producing, as a result, a signal A_GT_B used by the subsequent component of the structure, a 4 multiplexers chain, to select the greatest, between the two input words. Such process is performed by the comparator in the worst case, in 3.88 ns. The output of the comparator, is used afterward, to drive a 4 multiplexers chain which distributes the input words on the Sorting Processor s outputs, in such a way to have on the first 6 bits output the greatest word, in terms of absolute value, between the two input data words (the so called exchange mechanism). At the same time, the Sorter gives in output, also the addresses related to the correspondent data values. The proposed solution, allows to propagate, along the network, the data and the associated address with no need for additional circuit and with no increase of the time propagation. 4. The 4 Inputs Sorter ollowing the literature [5], we have decided to use, as basic structure of our Sorting Processor, an architecture composed by a 2 fold 4 inputs Sorter, capable each to give, as output, the 4 input data values, in a decreasing order, plus one 8 inputs Merger using, for this last one, a scheme of the odd-even merge type. The sorter, the structure of which is illustrated in ig.2, behaves like a comparators network into which any comparator causes an interchange of its inputs if necessary, so that the larger number value appears on the higher horizontal line after passing the comparator. At the right of the diagram all the number outcome in a decreasing order from top to bottom. The merger, the structure of which is illustrated in ig.3, behaves exactly as the sorter, only that it needs of input values already ordered in a decreasing order from top to bottom, to give as result the 4 highest number values between its 8 inputs. 3

point. The letter indicated on the right of the dot in ig.2, represents the output of the network at that in1 x1 y1 z1 in2 x2 y2 z2 in3 x3 y3 z3 in4 x4 y4 z4 1 st step 2 nd step 3 rd step ig.2 4 inputs sorter scheme with 3 comparison steps 1 2 3 4 5 6 7 8 1 st step 2 nd step 3 rd step ig.3 8 inputs merger odd-even type with the comparison steps visualized Any vertical segment indicated in the ig.2 and 3, represents the basic element constituting the general Sorter Processor scheme. The representation of this basic element, is shown in ig.4. In both 4

figures, any horizontal line represents the connection between the input on the left and the output on the right of the comparators of various steps (the dots represent the connection points), through which the input and output data flow. The simulation results have shown that, after transforming the various comparison steps as explained later, the more complex structure using sorter plus merger shown in ig.1 and proposed as final solution of our processor, is more faster respect to that which uses only sorter circuits. inp_a<5:0> inp_b<5:0> Comp A_GT_B sort_a<5:0> Multiplexer sort_b<5:0> ig.4 Sorter Processor basic element 5. More about the Sorter The structure shown in ig. 2, in which 3 steps of comparisons are need, can be optimized to make the Sorter s outputs available in shorter time. In fact, if we consider the various steps into which the Sorter s job is subdivided, we can easily transform the last step of comparison, into a simple exchange step into which only a simple multiplexer is enough instead of the more complex structure shown in ig.3. This process is achieved considering the results obtained from the first step and taking into account the results of the second step. This considerations are explained in a schematic form, in the following figures and tabs, where the C letter has the meaning of the output value (one or zero) of the corresponding comparator represented in the structure shown in ig.4, depending on its input data values. 5

x1 y1 z1 C1 x2 C3 y2 z2 x3 y3 C4 C5 z3 C2 x4 y4 z4 1 st step 2 nd step 3 rd step ig.5 4 inputs sorter scheme with in-steps results. Any letter represents the output of the corrisponding comparator. C i = 1 C i = 0 does not exchange does exchange Tab.1 C 3 C 4 C 5 0 0 X 4 > X 1 0 1 X 2 > X 1 (can be fixed to zero) 1 0 X 4 > X 3 (can be fixed to zero) 1 1 X 2 > X 3 Tab.2 The Tab.1, shows the rule used by the generic comparator into the Sorter. If the result of the comparator is 0, the input data are exchanged, otherwise no action is undertaken and the input values are reported in the same order to the output. By using this rule, if we consider the ig.5, it is possible to eliminate the last comparator C5 substituting it only by a multiplexer having as inputs, the results of the 2 nd step comparators and having as a control input, the result concerning one only of the four comparisons indicated in the Tab.2, depending by the output of the comparators C3 and C4. Moreover, the two intermediate lines, indicated in the Tab.2, are really redundant, because the corresponding comparisons, are already calculated during the 1 st step. In such a way, we can reduce the 4 inputs Sorter, having only the first two steps of comparisons and the last step reduced to a simple multiplexer stage (exchange step). This reduce furtherly the propagation time of the result of the sorting process. 6

(Note that in all the tables shown, the > symbol is used to indicate that the operation result depends only on the value entities X i s present on that line of the table). The same argument has been adopted when we consider the Merger. In fact, looking to the ig. 6, the 8 inputs Merger, can be seen as two 4 inputs sorter one below the other, the first one composed (ignoring the first step of comparison as present in the Sorter, which is not present in the Merger) by the comparators C1, C2 and C5 and the other one composed by the comparators C3, C4 and C6. x1 x2 x3 x4 x5 x6 x7 C1 C2 C3 C4 C5 C6 C7 C8 C9 x8 1 st step 2 nd step 3 rd step ig.6 C 1 C 2 C 5 0 0 X 7 > X 1 0 1 X 3 > X 1 can be fixed to zero 1 0 X 7 > X 5 can be fixed to zero 1 1 X 3 > X 5 Tab. 3 C 3 C 4 C 6 0 0 X 8 > X 2 0 1 X 4 > X 2 can be fixed to zero 1 0 X 8 > X 6 can be fixed to zero 1 1 X 4 > X 6 Tab. 4 7

In the Tab. 3 and 4, are shown the 8 possible combinations necessary to calculate the C5 and C6 outputs where, excluding the intermediate comparisons already done because coming from the Sorter s outputs, the necessary comparisons to calculate the C5 and C6 outputs, are reduced to 4. To calculate C7 and C8 outputs, we proceed in the same way as before, taking into account all the possible combinations shown in the following Tables, for the different possible values of C5 and C6. The combinations for C9 are not taken into account, because we are interested only at the first 4 outputs of the Merger. C 1 C 2 C 7 0 0 X 6 > X 1 0 1 X 2 > X 1 can be fixed to zero 1 0 X 6 > X 5 can be fixed to zero 1 1 X 2 > X 5 Tab. 5 Possible value of C7 for C5=0 C 2 C 3 C 7 0 0 X 6 > X 7 can be fixed to one 0 1 X 2 > X 7 1 0 X 6 > X 3 1 1 X 2 > X 3 can be fixed to one Tab. 6 Possible value of C7 for C5=1 C 2 C 3 C 8 0 0 X 2 > X 7 0 1 X 2 > X 3 can be fixed to one 1 0 X 6 > X 7 can be fixed to one 1 1 X 6 > X 3 Tab. 7 Possible value of C8 for C5=0 and C6=0 C 3 C 4 C 8 0 0 X 8 > X 7 can be fixed to zero 0 1 X 4 > X 7 1 0 X 8 > X 3 1 1 X 4 > X 3 can be fixed to zero Tab. 8 Possible value of C8 for C5=0 and C6=1 8

C 1 C 2 C 8 0 0 X 2 > X 1 can be fixed to zero 0 1 X 6 > X 1 1 0 X 2 > X 5 1 1 X 6 > X 5 can be fixed to zero Tab. 9 Possible value of C8 for C5=1 and C6=0 C 1 C 4 C 8 0 0 X 8 > X 1 0 1 X 4 > X 1 can be fixed to zero 1 0 X 8 > X 5 can be fixed to zero 1 1 X 4 > X 5 Tab. 10 Possible value of C8 for C5=1 and C6=1 Examining in more detail all the combinations shown in the Tables from 5 to 10, we can see that some of them are superfluous from the circuit point of view, because the corresponding comparisons, are already calculated in some previous step. To explain this, we take as example the first and forth combinations shown in Tab. 6. Here, looking at ig. 6, we see that the two comparisons X6>X7 and X2>X3, can be excluded from the circuit and their outputs can be substituted by one. The reason of this, relays on the fact that these two combinations are already evaluated from the circuit and therefore we substitute them by one because, in this case, the comparator C7 must not exchange its inputs. By looking at all the combinations, we can conclude that the Merger is constituted by 16 comparators plus 14 Mux es, but that all the possible and useful comparisons are done only at the first step of the merge process, gaining a lot of calculation time. In fact arranging the Merger as described, we obtain only 5.41ns, as worst set-up time for the calculation of the overall 8 data input sorting process. 6. The Sorting Processor Chip We can then conclude that our Sorting Processors, is a chip made by two 4 inputs Sorters, plus an 8 inputs Merger. It has a total of 168 pins, considering only the signals pins, so distributed: inputs 8 data x 6 bits = 48 8 address x 8 bits = 64 outputs 4 data x 6 bits = 4 address x 8 bits = 32. It will contain all the circuitry necessary to make Boundary Scan, for the chip remote test. 9

7. The Ring Processor Our final goal is to build an electronic board capable to sort the 144 inputs coming from a ring, that is a piece of the muon detector subtending an angle of 360 in ϕ and interesting an angular region, along the beam direction, corresponding to η 0.11. To do this, we use a total of 39 chips, which will furnish the final result, in a total of 7 time slices. In fact if we see the result of the simulation shown in the ig.7, the structure of the circuit equivalent to a one Ring Processor, gives the outputs, after 182.9 ns, since the application of the inputs. In this figure the input signals coming from only 16 data sources, are shown as Ini, while the 4 output codes are indicated as OUTi. The timing interval between any input values group and its corresponding output, is just 182.9 ns as indicated in the simulation output. Header: sort6_out_182.9ns User: ranieri Date: Dec 15, 1994 18:58: Time Scale rom: 0. 1ps To: 3527.90 1ps Page: 1 of 1 IN1 01 01 1e 20 IN2 02 20 IN3 02 03 20 IN4 07 07 IN5 03 IN6 09 26 09 IN7 07 0a 15 26 27 0a IN8 26 27 28 IN9 01 01 1e 20 IN10 02 20 IN11 02 03 20 IN12 07 07 IN13 03 IN14 09 26 09 IN15 07 0a 14 26 27 0a IN16 1e 26 27 28 ckout OUT1 OUT2 OUT3 1e OUT4 1e TIME 0. 1ps 881.90 1ps 1763.90 1ps 2645.90 1ps ig.7 Ring Processor timing simulation 10

The structure of the Ring Processor is shown in the ig.8, where only a part of the overall circuit is illustrated. Note that the inputs, are strobed, by using the system clock, into the circuit, in a first stage latch, just to synchronize the incoming data. or all the chips in the network different from those at the first level, the unique part involved in the calculation is the Merger part. This means that the Ring Processor, will use the Sorters only for the chips at the first level. or the others stages instead, the Sorters are excluded by using appropriate mux es and the data will propagate through the network, directly to the Mergers. Onto one Ring processor board, we will have a total of 39 chips and due to the muon trigger segmentation of the overall apparatus, we aspect to have a total of 44x39=1716 chips. 3 O 27 CHIP 1 3 O 9 CHIP 3 CHIP 28 CHIP 2 CHIP 30 4 O 36 CHIP 29 CHIP 37 CHIP 39 4 O 12 CHIP 38 Sorter Sorter Merger SORTING PROCESSOR STRUCTURE ig.8 Ring Processor structure 11

References [1] R.Cardarelli and R.Santonico: Development of Resistive Plate Counter. Nucl.Instr. and Meth. A 187 (1981) 377. [2] R.Cardarelli and R.Santonico: Progress in Resistive Plate Counter. Nucl.Instr. and Meth. A 263 (1988) 20. [3] M.Gòrski: Warsaw RPC test results. Talks given at RD5 Collaboration Meetings, May 27 and June 14,1993. [4] G.Wrochna et al. Pattern Comparator Trigger (PACT) for the Muon System of the CMS Experiment. CMS-TN/94-281 (1994). [5] D.E.Knuth. The Art of Computer Programming. Vol.3 Sorting and Searching. Addison-Wesley Publishing Company. 12