Jason Manley
Internal presentation: Operation overview and drill-down
October 2007
Outline
- System overview
- Achievements to date
- ibob F Engine in detail
- BEE2 X Engine in detail
- Backend System in detail
- Future developments
- Discussion
Achievements to date
- 8, 16 and 32 antenna dual-pol designs
- Bandwidth < 200 MHz using ibobs and BEEs
- Full Stokes
- 2048 frequency channels
- Control, initialisation and monitoring using Python scripts
- 100 Mbps Ethernet output with integration times > 16 seconds
- UDP output packets captured into Miriad format using Python code
- Web interface for near-real-time visualisation
FX architecture
[Figure: F Engines 0 to N-1 and X Engines 0 to N-1, interconnected through a 10GbE switch]
[Figure: hardware mapping. Each ibob hosts two F engines and each BEE2 user FPGA hosts two X engines; the ibobs and BEE2 user FPGAs are interconnected through a 10GbE switch]
ibob F Engine
- Known good design: mirrors the pocket correlator
- Two F engines per ibob
- Dual-polarization design
- Currently uses a combination of the ASTRO and CASPER libraries
- Major data flow components: ADC -> DDC -> Channelizer -> Equalization -> Reformat -> (X Engine)
BEE2 X Engine
- Two X engines per BEE2 user FPGA
- Uses the CASPER library only
- Data flow: F Engine -> Pktize -> 10GbE -> Buffer -> X Eng -> Accum
System overview: clocks and control
- Clocks: each X engine runs off an independent clock
- Sampling is synchronized at the F engines, but the clock is not distributed to the X engines
- Synchronized using a global 1pps signal at the ADCs, propagated to the X engines using out-of-band signaling on the XAUI links
- Headers label the 10GbE Ethernet packet data
- System control: separate 100 Mbps Ethernet network
  - F engines configured through out-of-band signals on the XAUI links
  - Control packets: UDP to a Python server on the BEE2 control FPGAs
  - Python scripts for configuration
ibob F Engine: ADC
- Analogue input sampled at f_s = 600 MHz (up to 800 MHz possible)
- Serial samples t7, t6, t5, t4, t3, t2, t1, t0 are demultiplexed onto four parallel 8-bit outputs: (t4, t0), (t5, t1), (t6, t2), (t7, t3)
- Each output runs at f_out = f_s/4 (normally 150 MHz)
- Data: 4 x 8 bits, signed fixed point 8.7
- Numeric range: -1 to +1 (strictly -1 to +127/128)
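A small software model of the ADC output format may help. It assumes the ADC delivers raw 8-bit two's-complement codes; the function names are hypothetical, and this models the data format only, not the gateware.

```python
def to_fix_8_7(code):
    """Interpret an 8-bit two's-complement code (0..255) as signed fixed point 8.7."""
    value = code - 256 if code >= 128 else code
    return value / 128.0  # 7 fractional bits: -1.0 up to +127/128


def demux4(samples):
    """Split a serial sample stream into four lanes, each running at f_s/4.

    Lane k carries t_k, t_(k+4), t_(k+8), ..., matching the (t4, t0), (t5, t1),
    (t6, t2), (t7, t3) pairing on the slide.
    """
    return [samples[lane::4] for lane in range(4)]
```

For example, `demux4(list(range(8)))` gives `[[0, 4], [1, 5], [2, 6], [3, 7]]`, and `to_fix_8_7(0x80)` gives -1.0.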
ibob F Engine: DDC
- Input: signed fix 8.7 data; 32-bit path (four samples per clock)
- Decimation filter
- Output: 8 bits I, 8 bits Q; 16-bit path
- Extracts a frequency band from the input signal
- Current setup: for f_s = 600 MHz, selects the output band 75 to 225 MHz
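Because the selected 75 to 225 MHz band is centred on f_s/4 = 150 MHz, one common way to mix it to baseband is to multiply by the repeating sequence 1, -j, -1, j, which needs no oscillator. The sketch below assumes that approach and substitutes a plain boxcar average for the real decimation filter, whose coefficients are not given here. Decimating by 4 yields complex samples at 150 MHz, covering -75 to +75 MHz about the band centre.

```python
# exp(-j*pi*n/2) for n = 0, 1, 2, 3: mixing with it shifts f_s/4 down to DC
MIX_F_S_OVER_4 = (1, -1j, -1, 1j)


def ddc(samples, decimation=4):
    """Shift the band centred on f_s/4 to baseband, then low-pass and decimate.

    The low-pass filter is a boxcar average over each block of `decimation`
    samples, a stand-in for the real decimation filter.
    """
    mixed = [x * MIX_F_S_OVER_4[n % 4] for n, x in enumerate(samples)]
    out = []
    for start in range(0, len(mixed) - decimation + 1, decimation):
        block = mixed[start:start + decimation]
        out.append(sum(block) / decimation)
    return out
```

A tone exactly at f_s/4 (samples 1, 0, -1, 0, ...) comes out as a constant 0.5 at DC.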
ibob F Engine: Channelizer
- Input: 8 bits I, 8 bits Q; 16-bit path
- PFB FIR: improves the out-of-band rejection ratio. Output: signed fix 18.17, complex; 36-bit path
- Downshift: output signed fix 18.15, complex; 36-bit path. Downshifting prevents overflow in the first stage of the FFT and is non-detrimental, since the effective signal resolution from the PFB is 8 bits
- FFT: output signed fix 18.17, complex; 36-bit path. Runtime-configurable downshifting through each stage
- Current setup: 2048 channels, 4-tap PFB, Hamming window
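The PFB front end can be sketched as follows: a prototype filter n_chan x n_taps long weights the input, the n_taps sub-blocks are summed, and an FFT follows. The slide specifies only "4 tap PFB, hamming window"; the sinc-times-Hamming prototype, the function names, and the naive DFT (in place of the pipelined FFT and its shift schedule) are assumptions for illustration.

```python
import cmath
import math


def pfb_coeffs(n_chan, n_taps):
    """Prototype filter: sinc spanning n_taps lobes times a Hamming window (assumed design)."""
    n = n_chan * n_taps
    coeffs = []
    for i in range(n):
        x = (i - (n - 1) / 2) / n_chan  # sinc argument, centred on the filter
        sinc = 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)
        hamming = 0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1))
        coeffs.append(sinc * hamming)
    return coeffs


def pfb_frame(samples, coeffs, n_chan, n_taps):
    """Weight n_chan * n_taps samples and sum the n_taps sub-blocks into n_chan values."""
    out = [0j] * n_chan
    for t in range(n_taps):
        for k in range(n_chan):
            i = t * n_chan + k
            out[k] += samples[i] * coeffs[i]
    return out


def dft(x):
    """Naive DFT standing in for the pipelined FFT (no per-stage downshifting)."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
            for k in range(n)]
```

With 16 channels and 4 taps, a complex tone centred on channel 3 produces its peak in channel 3. The real design uses 2048 channels.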
ibob F Engine: Equalization
- Input (four signals: Ax, Ay, Bx, By): signed fix 18.17, complex; 36-bit path each
- Equalizer: multiplies each frequency by a 17.3-bit scale factor from a BRAM lookup table, giving signed fix 35.20, complex. Can be used to correct irregularities in the system frequency response at runtime
- Decimation: selects 4 bits with saturating rounding; numeric range -0.875 to +0.875
- Output: signed fix 4.3, complex; total path 32 bits
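The equalize-then-requantize step can be modelled as below. Rounding half up and the floating-point gains are assumptions, since the hardware uses 17.3 fixed-point gains from a BRAM. Saturating at ±7/8 keeps the 4.3 output symmetric, which matches the -0.875 to +0.875 range on the slide. The function names are hypothetical.

```python
import math

STEP = 1.0 / 8       # signed fix 4.3 has 3 fractional bits
LIMIT = 7 * STEP     # saturate symmetrically at +/-0.875


def quantize_4_3(value):
    """Round to the nearest 1/8 (half up, an assumed mode), then saturate to +/-0.875."""
    q = math.floor(value / STEP + 0.5) * STEP
    return max(-LIMIT, min(LIMIT, q))


def equalize_and_quantize(spectrum, gains):
    """Scale each complex channel by its EQ gain, then requantize re and im to 4.3."""
    out = []
    for x, g in zip(spectrum, gains):
        y = x * g
        out.append(complex(quantize_4_3(y.real), quantize_4_3(y.imag)))
    return out
```

The EQ gain sets where the signal falls within the 4-bit range. A gain that is too high saturates many samples at ±0.875, and a gain that is too low wastes levels.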
ibob F Engine: Reformat
- Input: signed fix 4.3, complex; total path 32 bits. Channels Ch 0, Ch 1, ..., Ch N-1 arrive in turn, each carrying Ax, Ay, Bx, By
- Corner turner: reorders the data so that each frequency channel is sent as a block of consecutive time samples (blocks starting at t0, t128, t256, t384, ...)
- Data: signed fix 4.3, complex, dual pol; total path 32 bits
- Divide by 2
- XAUI output: signed fix 4.3, complex, dual pol, four frequency channels
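A corner turn amounts to a transpose. Spectra arrive one time step at a time, and each channel's samples are then gathered into a block. The sketch assumes a block length of 128 time samples, matching the 128 samples per channel carried by one packet. The function name is hypothetical.

```python
def corner_turn(spectra, block_len=128):
    """Reorder [time][channel] data into per-channel blocks of block_len time samples.

    spectra: list of spectra, one per time step, each a list of channel values.
    Returns (channel, [samples for t0 .. t(block_len - 1)]) pairs in channel order,
    one group of channels for each block of block_len spectra.
    """
    n_chan = len(spectra[0])
    out = []
    for start in range(0, len(spectra) - block_len + 1, block_len):
        block = spectra[start:start + block_len]
        for ch in range(n_chan):
            out.append((ch, [spectrum[ch] for spectrum in block]))
    return out
```

In hardware, the transpose requires buffering a full block of spectra before the first channel can be sent, so its memory cost is block_len x n_chan samples.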
BEE2 X Engine: Pktize
- XAUI input: signed fix 4.3, complex, dual pol, four frequency channels; total path 64 bits
- Sync control, system reset, antenna decode
- Header generation, processing allocation
- Packet layout (64-bit words, MSb to LSb): a header word holding MCNT and ANT, followed by 32 data words (32 x 64b). Each data word carries four consecutive time samples of one frequency channel: f0 t3, f0 t2, f0 t1, f0 t0; then f0 t7 to f0 t4; ...; up to f0 t127 to f0 t124
- Payload size: 32 x 64 bits + 64-bit header = 264 bytes (jumbo packet: 1120 bytes)
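The packet layout can be made concrete with Python's struct module. The slide does not give field widths, so the following are hypothetical: ANT in the low 8 bits of the header word with MCNT above it, the nibble order xr, xi, yr, yi within a 16-bit sample, and big-endian byte order. The sketch shows only how 128 samples fill 32 words, giving 264 bytes in total.

```python
import struct

# Hypothetical layout details (not given on the slide)
ANT_BITS = 8               # ANT in the low bits of the header word, MCNT above it
SAMPLES_PER_WORD = 4       # four 16-bit dual-pol samples per 64-bit word
WORDS_PER_PACKET = 32      # 32 x 4 = 128 time samples of one channel


def pack_sample(xr, xi, yr, yi):
    """Pack one dual-pol complex sample (four signed 4-bit codes, -8..7) into 16 bits."""
    n = [v & 0xF for v in (xr, xi, yr, yi)]
    return (n[0] << 12) | (n[1] << 8) | (n[2] << 4) | n[3]


def build_packet(mcnt, ant, samples16):
    """Header word followed by 32 data words; word 0 holds t3..t0, with t0 in the LSbs."""
    if len(samples16) != SAMPLES_PER_WORD * WORDS_PER_PACKET:
        raise ValueError("expected 128 samples for one frequency channel")
    words = [(mcnt << ANT_BITS) | (ant & ((1 << ANT_BITS) - 1))]
    for w in range(WORDS_PER_PACKET):
        word = 0
        for i in range(SAMPLES_PER_WORD):
            word |= samples16[w * SAMPLES_PER_WORD + i] << (16 * i)
        words.append(word)
    return struct.pack(">%dQ" % len(words), *words)
```

`build_packet(5, 3, list(range(128)))` returns 264 bytes: 8 for the header plus 256 for the data.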
BEE2 X Engine: channel distribution
[Figure: N antennas. Through the 10GbE switch, every F engine sends frequency channels f0, fN, ..., f(n-N) to X Engine 0; f1, fN+1, ..., f(n-N+1) to X Engine 1; and so on, up to f(N-1), f(2N-1), ..., f(n-1) to X Engine N-1]
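The distribution in the diagram is a round-robin over channels. Assuming, as drawn, that there are as many X engines as antennas (N), the mapping reduces to a modulo. The function names are hypothetical.

```python
def x_engine_for_channel(chan, n_xeng):
    """X engine that receives frequency channel `chan` (round-robin assignment)."""
    return chan % n_xeng


def channels_for_x_engine(xeng, n_chan, n_xeng):
    """Channels xeng, xeng + N, xeng + 2N, ... handled by one X engine."""
    return list(range(xeng, n_chan, n_xeng))
```

With n = 2048 channels and N = 8, each X engine handles 256 channels, and X Engine 0 receives f0, f8, f16, and so on.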
BEE2 X Engine: 10GbE
- F engine packet stream -> 10GbE Ethernet -> 10GbE transceiver -> data unpack
- Total packet size: 32 x 64-bit words + 64-bit header = 2112 bits, or 264 bytes (giant packet: 1120 bytes)
BEE2 X Engine: 10GbE with loopback
- F engine packet stream -> 10GbE Ethernet -> 10GbE transceiver -> mux -> data unpack
- The mux selects between the 10GbE transceiver output and a loopback path
BEE2 X Engine: Buffer
- Circular buffer of 8 windows (slots 0 to 7); data is inserted into the position determined by the MCNT in the packet header
- 1 packet = 1 frequency channel, or 128 words
- Timeout if no packet is received for 2^20 clocks
- Ship out a window when the first packet half a buffer ahead is received (i.e. ship window 1 when the first packet of window 5 is received)
- Only accept packets with MCNT from half a buffer back to a quarter of a buffer ahead (i.e. if packets up to window 5 have already been received, accept MCNTs for windows 2, 3, 4, 5 or 6). This prevents spurious locks
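The accept and ship rules can be modelled in software as shown below. This is a sketch, not the gateware. The class and method names are hypothetical, the 2^20-clock timeout is not modelled, and the accept bounds are parameters whose defaults (back=3, ahead=1) reproduce the slide's worked example: after window 5, windows 2 to 6 are accepted.

```python
class MCNTWindowBuffer:
    """Software model of the X engine input buffer (hypothetical names).

    One slot per window, indexed circularly by MCNT. A window is shipped when
    the first packet half a buffer ahead of it arrives.
    """

    def __init__(self, n_windows=8, back=3, ahead=1):
        self.n_windows = n_windows
        self.back = back                  # windows behind the newest still accepted
        self.ahead = ahead                # windows ahead of the newest accepted
        self.slots = [None] * n_windows   # each slot: (mcnt, {antenna: payload}) or None
        self.newest = None                # highest MCNT received so far
        self.shipped = []                 # MCNTs of windows handed on, in order

    def accepts(self, mcnt):
        if self.newest is None:
            return True                   # nothing received yet: lock onto the first packet
        return self.newest - self.back <= mcnt <= self.newest + self.ahead

    def receive(self, mcnt, ant, payload):
        if not self.accepts(mcnt):
            return False                  # outside the accept window: drop the packet
        slot = mcnt % self.n_windows
        if self.slots[slot] is None or self.slots[slot][0] != mcnt:
            self.slots[slot] = (mcnt, {})
        self.slots[slot][1][ant] = payload
        if self.newest is None or mcnt > self.newest:
            first = mcnt if self.newest is None else self.newest + 1
            for m in range(first, mcnt + 1):
                self._ship(m - self.n_windows // 2)
            self.newest = mcnt
        return True

    def _ship(self, mcnt):
        slot = mcnt % self.n_windows
        entry = self.slots[slot]
        if mcnt >= 0 and entry is not None and entry[0] == mcnt:
            self.shipped.append(mcnt)
            self.slots[slot] = None
```

With these defaults, the oldest accepted window (newest - 3) is always newer than the last shipped one (newest - 4), so a late packet can never land in a window that has already left.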
BEE2 X Engine: X Eng (streaming)
- The streaming architecture assumes data is valid on every clock
- Integration occurs inside the X engine, so each antenna input must be valid for integration_period clock cycles
- The output must be filtered, because duplicate baselines occur
BEE2 X Engine: X Eng (streaming), simplified (see detail on last slide)
- Input data: 4.3 bits, dual pol, complex; 16-bit path
- Antenna blocks arrive in turn (A at t0, B at t128, ...). A chain of z^-128 delays aligns each new antenna with earlier ones, forming baselines such as AB, BC, CD, AC, BD and AD
- Each baseline is accumulated for 128 clocks
- Output data: 16.6 bits, complex, 4 terms; 128-bit path. The four terms are the polarization products, e.g. AxBx, AyBy, AxBy, AyBx for baseline AB, and AxAx, AyAy, AxAy, AyAx for AA
- Baselines produced at each tap (columns) as each antenna arrives (rows), for 8 antennas A to H; X and O mark slots without a baseline:

5    4    3    2    1
AA   X    X    X    X
BB   AB   X    X    X
CC   BC   AC   X    X
DD   CD   BD   AD   X
EE   DE   CE   BE   AE
FF   EF   DF   CF   BF
GG   FG   EG   DG   CG
HH   GH   FH   EH   DH
O    AH   AG   AF   AE
O    O    BH   BG   BF
O    O    O    CH   CG
O    O    O    O    DH
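The four accumulated terms for one baseline can be modelled as shown below. Conjugating the second antenna is an assumed convention, since the slide does not state one. The function names are hypothetical, and floating-point values stand in for the 4.3 and 16.6 fixed-point formats.

```python
def pol_products(a, b):
    """Four polarization products of baseline (a, b), in the order xx, yy, xy, yx.

    a and b are (x, y) tuples of complex samples; b is conjugated (assumed convention).
    """
    ax, ay = a
    bx, by = b
    return (ax * bx.conjugate(), ay * by.conjugate(),
            ax * by.conjugate(), ay * bx.conjugate())


def accumulate_baseline(stream_a, stream_b, n_clocks=128):
    """Sum the four products over n_clocks consecutive samples (one accumulation)."""
    acc = [0j] * 4
    for a, b in zip(stream_a[:n_clocks], stream_b[:n_clocks]):
        for term, product in enumerate(pol_products(a, b)):
            acc[term] += product
    return acc
```

For an autocorrelation (a and b from the same antenna), the xx and yy terms are real, and the xy and yx terms are complex conjugates of each other.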
BEE2 X Engine: X Eng re-order
- Windowed buffering of the baseline table on the previous slide, plus data throttling
- Windowed baselines are fed out every second clock:
  t0: AxAx, AyAy, AxAy, AyAx
  t2: AxBx, AyBy, AxBy, AyBx
  t4: BxBx, ByBy, BxBy, ByBx
  ...
  t71: CxHx, CyHy, CxHy, CyHx
- Data: 16.6 bits, complex, 4 terms; 128-bit path
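The tap schedule in the 8-antenna baseline table, including the windowed wrap-around rows and the duplicates that must be filtered, can be reproduced with a short model. It assumes N/2 + 1 taps for N antennas (5 taps for 8 antennas, as in the table). The names are hypothetical.

```python
def xeng_schedule(antennas):
    """Baseline produced by each tap as the antennas stream through the X engine.

    Tap k pairs the arriving antenna with the one k positions earlier, giving
    len(antennas) // 2 + 1 taps. The rows after the last antenna are the windowed
    wrap-around, where antennas from the next pass pair with the end of the
    current one. None marks a slot with no baseline (X or O in the table).
    """
    n = len(antennas)
    n_taps = n // 2 + 1
    rows = []
    for j in range(n + n_taps - 1):
        row = []
        for k in range(n_taps):
            older = j - k
            if 0 <= older < n:
                row.append((antennas[j % n], antennas[older]))
            else:
                row.append(None)
        rows.append(row)
    return rows


def filter_duplicates(rows):
    """Keep the first occurrence of each baseline; return (unique, duplicates)."""
    seen, unique, dupes = set(), [], []
    for row in rows:
        for pair in row:
            if pair is None:
                continue
            key = tuple(sorted(pair))
            if key in seen:
                dupes.append(key)
            else:
                seen.add(key)
                unique.append(key)
    return unique, dupes
```

For 8 antennas, the schedule yields 40 entries but only 36 distinct baselines (8 x 9 / 2). AE, BF, CG and DH each appear twice, once in the normal rows and once in the wrap-around, which is why the output must be filtered.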
BEE2 X Engine: Accum
- Input: 16.6 bits, complex, 4 terms; 128-bit path
- Reformat: increases the number space to 32 bits, giving 32.6 bits, complex, 2 terms; 128-bit path
- DRAM accumulator: integration length is run-time configurable
- Output to shared BRAM: 32.6 bits, complex; 32-bit path
Backend system overview: Listen, Config, Start tx, Start rx, Display
- Software registers on the user FPGAs are addressable by BORPH on the control FPGA
- UDP listener on the BEE2 control FPGA processor
- Automated Python scripts write these registers
- Special command Start TX begins dumping the shared BRAM output on a separate UDP port
- The receiver collects UDP output packets, buffers them and writes them to disk
- Multiple files are generated for storage, display and debugging
- Web interface for plotting output data (useful for debugging)
- Confirmed working using simulated correlator output data generated on the BEE processor
Backend: Listen
- Python script executed on the BEEs
- Allows any software register on the BEE to be programmed by name
- Includes special functions that can start or stop programs or scripts on the BEE
- Starts or stops transmitting data
- Globally programs gains on all connected ibobs
Backend: Config
- Command-line parameterized
- Sends packetized commands to the listener on the BEE
- Arms ibobs, sets the FFT shifting schedule, sets ibob EQ gains to defaults, sets the accumulation length, and sets antenna indices, IP addresses and ports
- Reads debug registers and snap blocks to confirm correct dataflow
- Attempts recovery through block reset and/or reprogramming
Backend: Start tx
- The receiving computer requests a dump start
- The BEE2 control FPGA monitors the shared BRAMs and reads them out when full
- Data is enclosed in a timestamped packet (time determined using BORPH's system time)
- Header (21 bytes): time, X engine number, 4-byte vector number, 4-byte flags, 4-byte payload length (historical)
- Transmitted via UDP packet to a predetermined receiver
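The slide gives the header as 21 bytes but does not specify the width or encoding of the time and X engine number fields. Assuming an 8-byte double for the time and a single byte for the X engine number (8 + 1 + 3 x 4 = 21 bytes), the packet could be assembled with Python's struct module. The layout and the helper name are hypothetical.

```python
import struct
import time

# Assumed layout: 8-byte double time, 1-byte X engine number, then the 4-byte
# vector number, flags and payload length named on the slide. Big-endian with no
# padding, which totals 21 bytes.
HEADER_FORMAT = ">dBIII"


def make_dump_packet(xeng, vector_num, flags, payload, timestamp=None):
    """Prefix a shared-BRAM dump with the 21-byte header (hypothetical helper)."""
    if timestamp is None:
        timestamp = time.time()  # stands in for the BORPH system time
    header = struct.pack(HEADER_FORMAT, timestamp, xeng, vector_num, flags, len(payload))
    return header + payload
```

The result would then be sent to the receiver's address with `socket.sendto` on a UDP socket.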
Backend: Start rx
- Receives packets, decodes the header and appends the data to a buffer (so packets must be received in order)
- If a header is out of order, the data is dumped as invalid; to be corrected with new C code
- Requires system parameters to be passed on the command line when executed
- Can read its source from UDP packets, files or a stdin pipe (untested)
- Generates 4 files:
  - Miriad UV file
  - Info file (n_chans, integration length, system gain, etc.)
  - NumPy database (Python) of the last integration, for plotting
  - NumPy database of raw data (for debugging)
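The in-order requirement can be sketched as shown below, reusing an assumed 21-byte header layout (8-byte double time, 1-byte X engine number, three 4-byte fields). The sketch also assumes that vector_num counts packets from 0 within one dump. On a gap, it discards that X engine's partial buffer and counts the dump as invalid. The class name is hypothetical.

```python
import struct

HEADER_FORMAT = ">dBIII"                     # assumed 21-byte header layout
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)


class InOrderReceiver:
    """Appends payloads per X engine, rejecting any packet that arrives out of order."""

    def __init__(self):
        self.expected = {}       # next vector number expected from each X engine
        self.buffers = {}        # payload bytes collected so far per X engine
        self.invalid_dumps = 0   # dumps discarded because of an ordering gap

    def handle(self, packet):
        _time, xeng, vector_num, _flags, length = struct.unpack(
            HEADER_FORMAT, packet[:HEADER_SIZE])
        payload = packet[HEADER_SIZE:HEADER_SIZE + length]
        if vector_num != self.expected.get(xeng, 0):
            # Out of order: throw away the partial dump and wait for vector 0 again
            self.invalid_dumps += 1
            self.buffers[xeng] = bytearray()
            self.expected[xeng] = 0
            return False
        self.buffers.setdefault(xeng, bytearray()).extend(payload)
        self.expected[xeng] = vector_num + 1
        return True
```

Tracking the expected vector number per X engine lets several X engines interleave their packets without triggering false ordering errors.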
Backend: Display
- CGI script
Future developments
- 10 Gbps output gives sub-1-second integration times
- High-speed, scalable, distributed data capture software
- Walsh codes and phase switching
- 64-antenna design
- Upgrade to 4096 channels
- ROACH hardware: < 550 MHz bandwidth, 16 384 channels, 128 antennas with no architectural changes
Design revisions
- casper_n/cn_i: revision 3.02 currently in testing, using the ASTRO library; revision 4 will use the CASPER library
- casper_n/cn_b: revision 3.08 in testing (DEBUG!); revision 3.07d stable; revision 4 will have 10GbE output