Reconfigurable Hardware Implementation of Mesh Routing in the Number Field Sieve Factorization
|
|
- Avice Jones
- 5 years ago
- Views:
Transcription
1 Reconfigurable Hardware Implementation of Mesh Routing in the Number Field Sieve Factorization Sashisu Bajracharya, Deapesh Misra, Kris Gaj George Mason University Tarek El-Ghazawi The George Washington University
2 Objective & Outline N = P Q factorization P Q P, Q large integers..... FPGA Array. Number Field Sieve (NFS) Factorization 2. Mesh Routing Architecture for the Matrix Step of NFS 3. Implementation. Results 5. Conclusions 6. Reconfigurable computers & future work
3 Number Field Sieve (NFS) Steps Polynomial Selection Sieving (Relation Collection) Matrix (Linear Algebra) Two most computationally intensive steps Square Root
4 Previous work 2 D. J. Bernstein, Circuits for integer factorization: a proposal Mesh approach to the sieving and matrix steps improves asymptotic complexity for NFS performance 22 A. K. Lenstra, A. Shamir, J. Tomlinson, E. Tromer, Analysis of Bernstein's Factorization Circuit, Asiacrypt 22 Detailed design for mesh routing 23 W. Geiselmann, R. Steinwandt, Hardware to solve sparse systems of linear equations over GF(2), CHES 23 Distributing computations among multiple nodes
5 Our Objective Design, describe in RTL VHDL, synthesize & simulate existing theoretical designs for the matrix step of NFS using current generation of FPGA devices. Matrix (Linear Algebra) Focus of this paper
6 Input to the Matrix Step D D Matrix A: D = number of the matrix columns and rows D [ 7, ] d column density (weight) = maximum number of ones per column d << D, e.g., d= for D=
7 Function of the Matrix Step Find linear dependency in the large sparse matrix obtained after sieving step c i c i2 c il c i c i2... c il = D = number of the matrix columns and rows D D
8 Block Wiedemann Algorithm for the Matrix Step of NFS ) Uses multiple matrix-by-vector multiplications of the sparse matrix A with k random vectors v i A v i, A 2 v i,., A k v i where k = 2D/K 2) Post computations leading to the determination of the linear dependence of the matrix columns Most time consuming operation: A [DxD] v [Dx]
9 Mesh Routing Architecture for the Matrix Step
10 Matrix-by-Vector Multiplication v T A v A Sparse Matrix
11 Matrix-by-Vector Multiplication for Sparse Matrices v T A v A Sparse Matrix
12 Mesh Routing m x m mesh where m = D A v T representation of column d = maximum number of non-zero entries per column m D D m
13 Mesh corresponding to the matrix A and vector v Cell representing Column Row indices of non-zero 3 matrix entries Vector bit
14 Routing in the Mesh Fourth cell Each time a packet arrives at the target cell, the packet s vector s bit is xored with the partial result bit on the target cell
15 Packets generated by each cell in the mesh Destination address Cell representing Column Vector bit
16 Only packets with non-zero vector bits need to be routed
17 Mesh Routing with K parallel matrix by vector multiplications Example for K=2 A v and A v 2 computed in parallel A v v mesh
18 Mesh corresponding to the matrix A and vectors v, v 2 Cell representing Column Row indices of non-zero 3 matrix entries 6 7 Vector bits v, v
19 Packets with multiple vector bits generated by each cell in the mesh Cell representing Column Destination address Vector bits v, v
20 Mesh Routing with p multiple columns and K vectors Example for p=2, K=2 A v and A v 2 computed in parallel v v 2 A p columns
21 Mesh with multiple columns and multiple vectors per cell Cell representing Columns,
22 Packets generated by the mesh with multiple columns and multiple vectors per cell Cell representing Columns,
23 Clockwise Transposition Routing Four iterations repeated Cells compareexchange direction
24 Compare-Exchange Cases Between Columns Direction of travel = direction of exchange Result of comparison = Exchange the packets.
25 Compare-Exchange Cases Between Columns Direction of travel direction of exchange Result of comparison = Exchange the packets. Rule for Exchange: Exchange iff the distance to target of the packet which is farthest from its destination gets reduced.
26 Implementation
27 Structure of the Xilinx Virtex FPGA Configurable Logic Block (CLB) slices Block RAMs Block RAMs I/O Blocks Block RAMs
28 Two modes of operation of CLB slices Logic mode Memory mode 8 Combinational logic -bit register -bit register RAM 6x RAM 6x -bit register -bit register CLB slice
29 Target FPGA Device Xilinx Virtex II XC2V8 6,592 CLB slices 93,8 LUTs LookUpTables 93,8 FF (Flip-Flop) Multipliers 8 x 8 Block RAMs Multipliers 8 x 8 Block RAMs Multipliers 8 x 8 Block RAMs FF Carry & Control Logic O LUT CLB slice Carry & Control Logic LUT FF CLB-SLICE CLK Multipliers 8 x 8 Block RAMs I/O Block
30 Mesh parameters for single FPGA m () = 2 m () = m () = 2 p = 6 D () = (m () ) 2 * p = = (2) 2 * 6 = 23 K =5 33 3
31 Synthesis Results on one Virtex II XC2V8 for Improved Mesh Routing Design Matrix Size K CLB slices 23x23 (Mesh 2x2, p=6 ) 6738 (%) LUTs,38 (%) FFs 6,279 (7%) Clock Period (ns) Time for K mult (ns) Time per mult (ns) x23 (Mesh 2x2, p=6 ) 32 29,938 (6%) 5,983 (5%) 9,65 (2%) x23 (Mesh 2x2, p=6 ) 5 3,2 (93%) 7,3 (89%) 27,6 (29%) f CLK-ROUTE = 55.5 MHz T CLK-ROUTE = 8 ns
32 Distributed Computation (Geiselmann, Steinwandt, CHES 23) A v Av A, A,2 A,3 v A A 2, A 2,2 A 2,3 A 3, A 3,2 A 3,3 v 2 v 3 = A 2 A A A, v A v,2 2 A,3 v 3 = A v = s j=, j s A : A j= s, j v v j j
33 Using smaller FPGA arrays to perform the entire computation ) FPGA array performs single sub-matrix by sub-vector multiplication 2) Reuse FPGA array for next sub-computation
34 Parallel Loading & Unloading of Data Vector Non Zero Matrix Entries Result Vector
35 Sub matrix load-compute sequence A, A,2 A,3 A 2, A 2,2 A 2,3 A 3, A 3,2 A 3,3 v v 2 = v 3 A A 2 A 3 A, v + A,2 v 2 + A,3 v 3 = A Time load compute load compute load compute unload load compute load compute load compute unload load compute load compute load compute unload
36 Size of the mesh implemented using f FPGAs Single FPGA m (f) m () FPGA m () Matrix of f FPGAs #cells (f) = #cells () f (m (f) ) 2 = (m () ) 2 f m (f) = m () f m (f)
37 Size of the matrix handled using f FPGAs D () Sub-matrix handled by one FPGA D () A () D (f) A (f) D (f) A D D () = (m () ) 2 * p D (f) = D () * f d (f) = d* D (f) /D Sub-matrix handled by f FPGAs Entire matrix A D d column density of A= maximum number of ones per column of A d (f) column density of A (f)
38 Sizes and the column densities of submatrices handled using f FPGAs D () D =,d= D () A () A (f) D (f) f D (f) d (f) 2,3 D (f) A D 23, ,82 2 2,359,296, 23,, D
39 Mesh Cell Design for Improved Mesh Routing CU matrix data vector data Loading packet matrix data vector/result data annihilate Current packet new packet exchange result status bits Result calculation r c coordinate exchange annihilate row/col Comparator oper eq_packet
40 Matrix Data Status Matrix data Loading Address Routing Address st r L c L r R c R Vector data K k (f) k (f) + k p k(f) k (f) + k p f (#FPGAs) matrix data size (bits) Matrix Data Size = + k (f) + 2 k p = + log 2 m (f) + 2 log 2 p
41 Format of the Packet status row destination column destination local result position st r c l vector bits k (f) k (f) k p K v k (f) = log 2 m (f) k p = log 2 p m (f) = m () f K = 5 # FPGAs packet size
42 Loading Unit Memory of packet destination addresses R[i] p d en decode matrix data en P[i] K p matrix data Memory of vector data vector data comb vector/result data result packet
43 Current Packet Unit packet annihilate annih new packet CP exchange current packet
44 Result Calculation result Memory of the result vector bits P [i] en addr2 Check Dest new packet vector bits new packet excluding vector bits cell coordinate
45 Resource Proportion for LUT Usage.3 RAM Mesh of 2x2 P=6 K= logic %LUT RAM % LUT logic
46 T CLK-IO T CLK-IO T CLK-IO T CLK-IO T CLK-ROUTE = 3 * T CLK-IO m () * packet size 2 2 m () * packet size x x T CLK-ROUTE = 8 ns T CLK-IO = 6 ns m () * packet size 2 2 m () * packet size x x FPGA boundary
47 Slowdown caused by crossing the chips boundaries #bits crossing the boundary x = multiplexing factor = = # of pins per boundary = m () *packet size (f) * 2 2* 73 *2 = (#FPGA_pins / ) 277 = 7 for f=2 Slowdown factor = T CLK-IO + (x-) T CLK-IO + T CLK-ROUTE T CLK-ROUTE T CLK-IO + T CLK-ROUTE = =.33 T CLK-ROUTE
48 Results and Analysis
49 Results for a 52-bit number N K= number of concurrent multiplications=5 p =number of columns handled in one cell D = number of columns in matrix A = 6.7 x 6 d (f) = density of sub-matrix handled by mesh m (f) = mesh dimension n = number of times to repeat sub-multiplications T route = time for K multiplications in the mesh T Load = time for loading and unloading for K multiplications T Total = total time for the Matrix Step = 3 (D/K) n ( T route + T Load ) R = (#FPGAs * T Total )/ ( FPGA * T Total for FPGA) Virtex II chips (f) D p d (f) m (f) n 6.7x x 6 T route (ns) T Load (ns) T Total (days) R 2 6.7x x 5 6. x x x 6.3x x x 6.3 x
50 Results for a 2-bit number N K= number of concurrent multiplications=5 p =number of columns handled in one cell D = number of columns in matrix A = d (f) = density of sub-matrix handled by mesh m (f) = mesh dimension n = number of times to repeat sub-multiplications T route = time for K multiplications in the mesh T Load = time for loading and unloading for K multiplications T Total = total time for the Matrix Step = 3 (D/K) n ( T route + T Load ) R = (#FPGAs * T Total )/ ( FPGA * T Total for FPGA) Virtex II chips (f) D p d (f) m (f) n T route (ns) T Load (ns) T Total (days) R x 3 3,59 3,568.9x x 9.8 x 5.5 x 6.9x x x x.8x x 7.7 x 6.6 x x 8.2
51 Time vs. D Time vs D 2 5 fpga fpgas 256 fpgas 2 fpgas fpgas Time (days) 5 year 6 2, years Matrix Dimension (cols) fpgas fpga fpgas 256 fpgas 2 fpgas fpgas fpgas
52 Time vs. #FPGAs 2 Time (#days) 8 6 D = ~ (#FPGA) 3/ #FPGAs
53 Cost*Time vs. #FPGAs D = #FPGAs * #days 2 $2 bln$year #FPGAs
54 Conclusions
55 Summary & Conclusions () First practical hardware implementation of Mesh Routing for the Number Field Sieve implemented and verified using timing simulation Practical numbers, based on post-placing & routing static timing analysis, obtained for an array of Xilinx Virtex II 8 FPGAs A two-dimensional array of Virtex II chips can perform computations faster than a single FPGA by a factor approximately proportional to (number of FPGAs) 3/2
56 Summary & Conclusions (2) Matrix step for a 52-bit RSA key takes about 5 days on the rectangular array of FPGAs Assuming matrix size D= matrix step for a 2-bit RSA key appears to be prohibitive from the point of view of full (throughput) cost
57 Mapping designs to existing reconfigurable platforms..... Generic Array of FPGAs Existing General Purpose Reconfigurable SuperComputers
58 What is a reconfigurable computer? Microprocessor system FPGA system µp... µp FPGA... FPGA µp memory... µp memory FPGA memory... FPGA memory I/O Interface Interface I/O
59 Example: SRC 6E System Source: [SRC, MAPLD]
60 SRC MAP Reconfigurable Processor Source: [SRC, MAPLD]
61 SRC Hi-Bar TM Based Systems Hi-Bar sustains. GB/s per port with 8 ns latency per tier Up to 256 input and 256 output ports with two tiers of switch Common Memory (CM) has controller with DMA capability Controller can perform other functions such as scatter/gather Up to 8 GB DDR SDRAM supported per CM node SRC Hi-Bar Switch SNAP Memory SNAP Memory MAP MAP Common Memory Common Memory µp PCI-X Gig Ethernet etc. µp PCI-X Chaining GPIO SRC-6 Disk Storage Area Network Local Area Network Customers Existing Networks Wide Area Network Source: [SRC, MAPLD]
62 SRC Programming SRC µp system FPGA system HDL (VHDL) HLL (C) Library Developer Application Programmer
63 Other reconfigurable supercomputers Cray XD (formerly Octiga Bay 2 K) from Cray Inc. SGI Altix 3 from Silicon Graphics Star Bridge Hypercomputer from Star Bridge Systems
64 Advantages of reconfigurable computers general-purpose: cost distributed among multiple users with different needs behave like hardware: - parallel processing - distributed memory - specialized functional units, etc. can be programmed by mathematicians themselves using traditional programming languages or GUI environments encourage innovation and experimentation
65 Our future goal Polynomial Selection Sieving Next target Matrix (Linear Algebra) Square Root
66 Questions?
67 Backup slides
68 Sources of optimism - emergence of new companies supporting reconfigurable supercomputing, including major players in the area of traditional supercomputing, such as Cray Inc. and SGI - constant progress in the capabilities, performance, and flexibility of existing reconfigurable computing platforms - constant progress in the compiler technology, and logic synthesis of high level programming languages.
69 Parameters f = number of FPGAs in the FPGA array m () = mesh dimension for one FPGA = 2 m (f) = mesh dimension for f FPGAs = m () f p = number of columns of matrix handled in one cell of the mesh D (f) = matrix dimension handled by f FPGAs = (m (f) ) 2 p = ( m () f) 2 p = f (m () ) 2 p = f D ()
70 Parameters D D () D (f) = number of columns in matrix A = number of columns of sub-matrix of A handled by one FPGA in the mesh = (m () ) 2 p = number of columns of sub-matrix of A handled by f FPGAs in the mesh = (m (f) ) 2 p = (m () ) 2 f p = D () f d = column density of matrix A d () = density of sub-matrix handled by one FPGA = d D () / D d (f) = density of sub-matrix handled by f FPGAs = d D (f) / D
71 Routing Parameters T CLK_mult = multiplication clock period T CLK_IO = IO clock period x = bits to exchange between FPGAs / (bus size between FPGAs) = 2 ( + 2 k (f) +k p + K) (m () ) 2 )/ 277 T step = Total time needed for transfer of packet between cells across FPGAs = T CLK_IO + (x-) T CLK_IO + T CLK_mult h c = slowdown factor due to limited inter-fpga IO connections = T step / T CLK_mult T routne = routing time for sub-multiplication in the mesh = #entries per cell #steps T CLK_mult h c = p d (f) m (f) T CLK_mult h c
72 Loading Unloading Parameters b () = #pins for data transfer for FPGAs = #maximum FPGA IO/ 2 b (f) = #pins for data transfer for f FPGAs = (#maximum FPGA IO /) f s IO = clock stages between two FPGA connections T = CLK_load loading clock period T load = time for loading and unloading for a sub-multiplication = [( (#matrix entries bits) + (#vector bits to load) + (#vector bits to unload) )/ b (f) + s IO (m (f) -) m (f) * loading packet bits/b (f) ] T CLK_load = (( + k ( f ) + 2 k p ( ) d D f ) ( ) + ( K D b ( f ) f ) ( ) + K D f ) D ( f ) / D + s IO (m (f) -)m (f) ( + k ( f ) + 2 k p + K)/b (f) T CLK_load
73 Parameters n = number of times to repeat sub-multiplications = D 2 /(D (f) ) 2 = D 2 /((m (f) ) 2 p) 2 T Total = total time for a Matrix step = 3 D/K n ( T route + T load )
74 D (f) d (f) entries p columns d (f) p = d p f (m () ) 2 /D p threshold area on FPGA total of d (f) p entries
75 D (f) p entries p m () cells m () cells
76 Total time of routing Total routing takes maximum d m compare-exchange operations, where d matrix density = maximum number of non-zero entries per column m mesh size = D, where D is the matrix size
Implementation of Elliptic Curve Cryptosystems over GF(2 n ) in Optimal Normal Basis on a Reconfigurable Computer
Implementation of Elliptic Curve Cryptosystems over GF(2 n ) in Optimal Normal Basis on a Reconfigurable Computer Sashisu Bajracharya, Chang Shu, Kris Gaj George Mason University Tarek El-Ghazawi The George
More informationAn Implementation Comparison of an IDEA Encryption Cryptosystem on Two General-Purpose Reconfigurable Computers
An Implementation Comparison of an IDEA Encryption Cryptosystem on Two General-Purpose Reconfigurable Computers Allen Michalski 1, Kris Gaj 1, Tarek El-Ghazawi 2 1 ECE Department, George Mason University
More informationImplementation of Elliptic Curve Cryptosystems over GF(2 n ) in Optimal Normal Basis on a Reconfigurable Computer
Implementation of Elliptic Curve Cryptosystems over GF(2 n ) in Optimal Normal Basis on a Reconfigurable Computer Sashisu Bajracharya 1, Chang Shu 1, Kris Gaj 1, Tarek El-Ghazawi 2 1 ECE Department, George
More informationMaster s Thesis Presentation Hoang Le Director: Dr. Kris Gaj
Master s Thesis Presentation Hoang Le Director: Dr. Kris Gaj Outline RSA ECM Reconfigurable Computing Platforms, Languages and Programming Environments Partitioning t ECM Code between HDLs and HLLs Implementation
More informationIMPLICIT+EXPLICIT Architecture
IMPLICIT+EXPLICIT Architecture Fortran Carte Programming Environment C Implicitly Controlled Device Dense logic device Typically fixed logic µp, DSP, ASIC, etc. Implicit Device Explicit Device Explicitly
More informationAn Optimized Hardware Architecture for the Montgomery Multiplication Algorithm
An Optimized Hardware Architecture for the Montgomery Multiplication Algorithm Miaoqing Huang 1, Kris Gaj 2, Soonhak Kwon 3, Tarek El-Ghazawi 1 1 The George Washington University, Washington, D.C., U.S.A.
More informationTools for Reconfigurable Supercomputing. Kris Gaj George Mason University
Tools for Reconfigurable Supercomputing Kris Gaj George Mason University 1 Application Development for Reconfigurable Computers Program Entry Platform mapping Debugging & Verification Compilation Execution
More informationImproved Routing-Based Linear Algebra for the Number Field Sieve
Improved Routing-Based Linear Algebra for the Number Field Sieve Willi Geiselmann, Hubert Köpfer, Rainer Steinwandt IAKS, Arbeitsgruppe Systemsicherheit, Prof. Beth, Universität Karlsruhe, 76 131 Karlsruhe,
More informationPerformance and Overhead in a Hybrid Reconfigurable Computer
Performance and Overhead in a Hybrid Reconfigurable Computer Osman Devrim Fidanci 1, Dan Poznanovic 2, Kris Gaj 3, Tarek El-Ghazawi 1, Nikitas Alexandridis 1 1 George Washington University, 2 SRC Computers
More informationFPGA Implementation of High Throughput Circuit for Trial Division by Small Primes
FPGA Implementation of High Throughput Circuit for Trial Division by Small Primes Gabriel Southern, Chris Mason, Lalitha Chikkam, Patrick Baier, and Kris Gaj George Mason University {gsouther, cmason4,
More informationComparison of Number Field Sieve Factoring Hardware Implementations
Comparison of Number Field Sieve Factoring Hardware Implementations Sashisu Bajracharya and Sang Han ECE 646 Fall 23 Abstract Security of RSA public key cryptosystem lies in the hardness of factoring large
More informationToday. Comments about assignment Max 1/T (skew = 0) Max clock skew? Comments about assignment 3 ASICs and Programmable logic Others courses
Today Comments about assignment 3-43 Comments about assignment 3 ASICs and Programmable logic Others courses octor Per should show up in the end of the lecture Mealy machines can not be coded in a single
More informationFPGA: What? Why? Marco D. Santambrogio
FPGA: What? Why? Marco D. Santambrogio marco.santambrogio@polimi.it 2 Reconfigurable Hardware Reconfigurable computing is intended to fill the gap between hardware and software, achieving potentially much
More informationField Programmable Gate Array
Field Programmable Gate Array System Arch 27 (Fire Tom Wada) What is FPGA? System Arch 27 (Fire Tom Wada) 2 FPGA Programmable (= reconfigurable) Digital System Component Basic components Combinational
More informationIntroduction to Field Programmable Gate Arrays
Introduction to Field Programmable Gate Arrays Lecture 1/3 CERN Accelerator School on Digital Signal Processing Sigtuna, Sweden, 31 May 9 June 2007 Javier Serrano, CERN AB-CO-HT Outline Historical introduction.
More informationRiceNIC. Prototyping Network Interfaces. Jeffrey Shafer Scott Rixner
RiceNIC Prototyping Network Interfaces Jeffrey Shafer Scott Rixner RiceNIC Overview Gigabit Ethernet Network Interface Card RiceNIC - Prototyping Network Interfaces 2 RiceNIC Overview Reconfigurable and
More informationDeveloping a Data Driven System for Computational Neuroscience
Developing a Data Driven System for Computational Neuroscience Ross Snider and Yongming Zhu Montana State University, Bozeman MT 59717, USA Abstract. A data driven system implies the need to integrate
More informationAn FPGA Based Adaptive Viterbi Decoder
An FPGA Based Adaptive Viterbi Decoder Sriram Swaminathan Russell Tessier Department of ECE University of Massachusetts Amherst Overview Introduction Objectives Background Adaptive Viterbi Algorithm Architecture
More informationCPE/EE 422/522. Introduction to Xilinx Virtex Field-Programmable Gate Arrays Devices. Dr. Rhonda Kay Gaede UAH. Outline
CPE/EE 422/522 Introduction to Xilinx Virtex Field-Programmable Gate Arrays Devices Dr. Rhonda Kay Gaede UAH Outline Introduction Field-Programmable Gate Arrays Virtex Virtex-E, Virtex-II, and Virtex-II
More informationFPGA for Complex System Implementation. National Chiao Tung University Chun-Jen Tsai 04/14/2011
FPGA for Complex System Implementation National Chiao Tung University Chun-Jen Tsai 04/14/2011 About FPGA FPGA was invented by Ross Freeman in 1989 SRAM-based FPGA properties Standard parts Allowing multi-level
More informationINTRODUCTION TO FPGA ARCHITECTURE
3/3/25 INTRODUCTION TO FPGA ARCHITECTURE DIGITAL LOGIC DESIGN (BASIC TECHNIQUES) a b a y 2input Black Box y b Functional Schematic a b y a b y a b y 2 Truth Table (AND) Truth Table (OR) Truth Table (XOR)
More informationBasic FPGA Architectures. Actel FPGAs. PLD Technologies: Antifuse. 3 Digital Systems Implementation Programmable Logic Devices
3 Digital Systems Implementation Programmable Logic Devices Basic FPGA Architectures Why Programmable Logic Devices (PLDs)? Low cost, low risk way of implementing digital circuits as application specific
More informationImplementation of the rho, p-1 & the Elliptic Curve Methods of Factoring in Reconfigurable Hardware
Implementation of the rho, p-1 & the Elliptic Curve Methods of Factoring in Reconfigurable Hardware Kris Gaj Patrick Baier Soonhak Kwon Hoang Le Ramakrishna Bachimanchi Khaleeluddin Mohammed Paul Kohlbrenner
More informationSECURE PARTIAL RECONFIGURATION OF FPGAs. Amir S. Zeineddini Kris Gaj
SECURE PARTIAL RECONFIGURATION OF FPGAs Amir S. Zeineddini Kris Gaj Outline FPGAs Security Our scheme Implementation approach Experimental results Conclusions FPGAs SECURITY SRAM FPGA Security Designer/Vendor
More informationHigh-Performance Integer Factoring with Reconfigurable Devices
FPL 2010, Milan, August 31st September 2nd, 2010 High-Performance Integer Factoring with Reconfigurable Devices Ralf Zimmermann, Tim Güneysu, Christof Paar Horst Görtz Institute for IT-Security Ruhr-University
More informationHardware Implementation
Low Density Parity Check decoder Hardware Implementation Ruchi Rani (2008EEE2225) Under guidance of Prof. Jayadeva Dr.Shankar Prakriya 1 Indian Institute of Technology LDPC code Linear block code which
More informationFPGA architecture and design technology
CE 435 Embedded Systems Spring 2017 FPGA architecture and design technology Nikos Bellas Computer and Communications Engineering Department University of Thessaly 1 FPGA fabric A generic island-style FPGA
More informationFast implementation and fair comparison of the final candidates for Advanced Encryption Standard using Field Programmable Gate Arrays
Kris Gaj and Pawel Chodowiec Electrical and Computer Engineering George Mason University Fast implementation and fair comparison of the final candidates for Advanced Encryption Standard using Field Programmable
More informationLab 3 Sequential Logic for Synthesis. FPGA Design Flow.
Lab 3 Sequential Logic for Synthesis. FPGA Design Flow. Task 1 Part 1 Develop a VHDL description of a Debouncer specified below. The following diagram shows the interface of the Debouncer. The following
More informationVHDL for Synthesis. Course Description. Course Duration. Goals
VHDL for Synthesis Course Description This course provides all necessary theoretical and practical know how to write an efficient synthesizable HDL code through VHDL standard language. The course goes
More informationCprE 583 Reconfigurable Computing
CprE / ComS 583 Reconfigurable Computing Prof. Joseph Zambreno Department of Electrical and Computer Engineering Iowa State University Lecture #9 Logic Emulation Technology Recap FPGA-Based Router (FPX)
More informationFast implementations of secret-key block ciphers using mixed inner- and outer-round pipelining
Pawel Chodowiec, Po Khuon, Kris Gaj Electrical and Computer Engineering George Mason University Fast implementations of secret-key block ciphers using mixed inner- and outer-round pipelining http://ece.gmu.edu/crypto-text.htm
More informationSignal Processing Algorithms into Fixed Point FPGA Hardware Dennis Silage ECE Temple University
Signal Processing Algorithms into Fixed Point FPGA Hardware Dennis Silage silage@temple.edu ECE Temple University www.temple.edu/scdl Signal Processing Algorithms into Fixed Point FPGA Hardware Motivation
More informationOutline of Presentation
Built-In Self-Test for Programmable I/O Buffers in FPGAs and SoCs Sudheer Vemula and Charles Stroud Electrical and Computer Engineering Auburn University presented at 2006 IEEE Southeastern Symp. On System
More informationAn Optimized Hardware Architecture for the Montgomery Multiplication Algorithm
An Optimized Hardware Architecture for the Montgomery Multiplication Algorithm Miaoqing Huang 1, Kris Gaj 2, Soonhak Kwon 3, and Tarek El-Ghazawi 1 1 The George Washington University, Washington, DC 20052,
More informationFPGA Polyphase Filter Bank Study & Implementation
FPGA Polyphase Filter Bank Study & Implementation Raghu Rao Matthieu Tisserand Mike Severa Prof. John Villasenor Image Communications/. Electrical Engineering Dept. UCLA 1 Introduction This document describes
More informationEnd User Update: High-Performance Reconfigurable Computing
End User Update: High-Performance Reconfigurable Computing Tarek El-Ghazawi Director, GW Institute for Massively Parallel Applications and Computing Technologies(IMPACT) Co-Director, NSF Center for High-Performance
More informationGraduate Institute of Electronics Engineering, NTU FPGA Design with Xilinx ISE
FPGA Design with Xilinx ISE Presenter: Shu-yen Lin Advisor: Prof. An-Yeu Wu 2005/6/6 ACCESS IC LAB Outline Concepts of Xilinx FPGA Xilinx FPGA Architecture Introduction to ISE Code Generator Constraints
More informationDeveloping Applications for HPRCs
Developing Applications for HPRCs Esam El-Araby The George Washington University Acknowledgement Prof.\ Tarek El-Ghazawi Mohamed Taher ARSC SRC SGI Cray 2 Outline Background Methodology A Case Studies
More informationSpiral 2-8. Cell Layout
2-8.1 Spiral 2-8 Cell Layout 2-8.2 Learning Outcomes I understand how a digital circuit is composed of layers of materials forming transistors and wires I understand how each layer is expressed as geometric
More informationFPGA VHDL Design Flow AES128 Implementation
Sakinder Ali FPGA VHDL Design Flow AES128 Implementation Field Programmable Gate Array Basic idea: two-dimensional array of logic blocks and flip-flops with a means for the user to configure: 1. The interconnection
More informationVirtex-II Architecture. Virtex II technical, Design Solutions. Active Interconnect Technology (continued)
Virtex-II Architecture SONET / SDH Virtex II technical, Design Solutions PCI-X PCI DCM Distri RAM 18Kb BRAM Multiplier LVDS FIFO Shift Registers BLVDS SDRAM QDR SRAM Backplane Rev 4 March 4th. 2002 J-L
More informationConsiderations for Algorithm Selection and C Programming Style for the SRC-6E Reconfigurable Computer
Considerations for Algorithm Selection and C Programming Style for the SRC-6E Reconfigurable Computer Russ Duren and Douglas Fouts Naval Postgraduate School Abstract: The architecture and programming environment
More information! Program logic functions, interconnect using SRAM. ! Advantages: ! Re-programmable; ! dynamically reconfigurable; ! uses standard processes.
Topics! SRAM-based FPGA fabrics:! Xilinx.! Altera. SRAM-based FPGAs! Program logic functions, using SRAM.! Advantages:! Re-programmable;! dynamically reconfigurable;! uses standard processes.! isadvantages:!
More informationA Framework to Improve IP Portability on Reconfigurable Computers
A Framework to Improve IP Portability on Reconfigurable Computers Miaoqing Huang, Ivan Gonzalez, Sergio Lopez-Buedo, and Tarek El-Ghazawi NSF Center for High-Performance Reconfigurable Computing (CHREC)
More informationVHDL-MODELING OF A GAS LASER S GAS DISCHARGE CIRCUIT Nataliya Golian, Vera Golian, Olga Kalynychenko
136 VHDL-MODELING OF A GAS LASER S GAS DISCHARGE CIRCUIT Nataliya Golian, Vera Golian, Olga Kalynychenko Abstract: Usage of modeling for construction of laser installations today is actual in connection
More informationAn Introduction to Programmable Logic
Outline An Introduction to Programmable Logic 3 November 24 Transistors Logic Gates CPLD Architectures FPGA Architectures Device Considerations Soft Core Processors Design Example Quiz Semiconductors Semiconductor
More informationHigh Capacity and High Performance 20nm FPGAs. Steve Young, Dinesh Gaitonde August Copyright 2014 Xilinx
High Capacity and High Performance 20nm FPGAs Steve Young, Dinesh Gaitonde August 2014 Not a Complete Product Overview Page 2 Outline Page 3 Petabytes per month Increasing Bandwidth Global IP Traffic Growth
More informationThe Xilinx XC6200 chip, the software tools and the board development tools
The Xilinx XC6200 chip, the software tools and the board development tools What is an FPGA? Field Programmable Gate Array Fully programmable alternative to a customized chip Used to implement functions
More informationField Programmable Gate Array (FPGA)
Field Programmable Gate Array (FPGA) Lecturer: Krébesz, Tamas 1 FPGA in general Reprogrammable Si chip Invented in 1985 by Ross Freeman (Xilinx inc.) Combines the advantages of ASIC and uc-based systems
More informationECE 636. Reconfigurable Computing. Lecture 2. Field Programmable Gate Arrays I
ECE 636 Reconfigurable Computing Lecture 2 Field Programmable Gate Arrays I Overview Anti-fuse and EEPROM-based devices Contemporary SRAM devices - Wiring - Embedded New trends - Single-driver wiring -
More informationVirtex-II Architecture
Virtex-II Architecture Block SelectRAM resource I/O Blocks (IOBs) edicated multipliers Programmable interconnect Configurable Logic Blocks (CLBs) Virtex -II architecture s core voltage operates at 1.5V
More informationRECONFIGURABLE COMPUTING: A DESIGN AND IMPLEMENTATION STUDY OF ELLIPTIC CURVE METHOD OF FACTORING USING SRC CARTE-C AND CELOXICA HANDEL-C
RECONFIGURABLE COMPUTING: A DESIGN AND IMPLEMENTATION STUDY OF ELLIPTIC CURVE METHOD OF FACTORING USING SRC CARTE-C AND CELOXICA HANDEL-C Committee: by Hoang Le A Thesis Submitted to the Graduate Faculty
More informationChapter 2. FPGA and Dynamic Reconfiguration ...
Chapter 2 FPGA and Dynamic Reconfiguration... This chapter will introduce a family of silicon devices, FPGAs exploring their architecture. This work is based on these particular devices. The chapter will
More information2015 Paper E2.1: Digital Electronics II
s 2015 Paper E2.1: Digital Electronics II Answer ALL questions. There are THREE questions on the paper. Question ONE counts for 40% of the marks, other questions 30% Time allowed: 2 hours (Not to be removed
More informationFPGA design with National Instuments
FPGA design with National Instuments Rémi DA SILVA Systems Engineer - Embedded and Data Acquisition Systems - MED Region ni.com The NI Approach to Flexible Hardware Processor Real-time OS Application software
More informationComparison of the Hardware Performance of the AES Candidates Using Reconfigurable Hardware
Comparison of the Hardware Performance of the AES Candidates Using Reconfigurable Hardware Master s Thesis Pawel Chodowiec MS CpE Candidate, ECE George Mason University Advisor: Dr. Kris Gaj, ECE George
More informationCompute Node Design for DAQ and Trigger Subsystem in Giessen. Justus Liebig University in Giessen
Compute Node Design for DAQ and Trigger Subsystem in Giessen Justus Liebig University in Giessen Outline Design goals Current work in Giessen Hardware Software Future work Justus Liebig University in Giessen,
More informationDesign Methodologies and Tools. Full-Custom Design
Design Methodologies and Tools Design styles Full-custom design Standard-cell design Programmable logic Gate arrays and field-programmable gate arrays (FPGAs) Sea of gates System-on-a-chip (embedded cores)
More informationFPGA Provides Speedy Data Compression for Hyperspectral Imagery
FPGA Provides Speedy Data Compression for Hyperspectral Imagery Engineers implement the Fast Lossless compression algorithm on a Virtex-5 FPGA; this implementation provides the ability to keep up with
More informationHigh-Performance Linear Algebra Processor using FPGA
High-Performance Linear Algebra Processor using FPGA J. R. Johnson P. Nagvajara C. Nwankpa 1 Extended Abstract With recent advances in FPGA (Field Programmable Gate Array) technology it is now feasible
More informationA Hardware Cache memcpy Accelerator
A Hardware memcpy Accelerator Stephan Wong, Filipa Duarte, and Stamatis Vassiliadis Computer Engineering, Delft University of Technology Mekelweg 4, 2628 CD Delft, The Netherlands {J.S.S.M.Wong, F.Duarte,
More informationThe Next Generation 65-nm FPGA. Steve Douglass, Kees Vissers, Peter Alfke Xilinx August 21, 2006
The Next Generation 65-nm FPGA Steve Douglass, Kees Vissers, Peter Alfke Xilinx August 21, 2006 Hot Chips, 2006 Structure of the talk 65nm technology going towards 32nm Virtex-5 family Improved I/O Benchmarking
More informationA SIMULINK-TO-FPGA MULTI-RATE HIERARCHICAL FIR FILTER DESIGN
A SIMULINK-TO-FPGA MULTI-RATE HIERARCHICAL FIR FILTER DESIGN Xiaoying Li 1 Fuming Sun 2 Enhua Wu 1, 3 1 University of Macau, Macao, China 2 University of Science and Technology Beijing, Beijing, China
More informationPROGRAMMABLE MODULES SPECIFICATION OF PROGRAMMABLE COMBINATIONAL AND SEQUENTIAL MODULES
PROGRAMMABLE MODULES SPECIFICATION OF PROGRAMMABLE COMBINATIONAL AND SEQUENTIAL MODULES. psa. rom. fpga THE WAY THE MODULES ARE PROGRAMMED NETWORKS OF PROGRAMMABLE MODULES EXAMPLES OF USES Programmable
More informationELE 758 * DIGITAL SYSTEMS ENGINEERING * MIDTERM TEST * Circle the memory type based on electrically re-chargeable elements
ELE 758 * DIGITAL SYSTEMS ENGINEERING * MIDTERM TEST * Student name: Date: Example 1 Section: Memory hierarchy (SRAM, DRAM) Question # 1.1 Circle the memory type based on electrically re-chargeable elements
More informationHardware Oriented Security
1 / 20 Hardware Oriented Security SRC-7 Programming Basics and Pipelining Miaoqing Huang University of Arkansas Fall 2014 2 / 20 Outline Basics of SRC-7 Programming Pipelining 3 / 20 Framework of Program
More informationWhat is Xilinx Design Language?
Bill Jason P. Tomas University of Nevada Las Vegas Dept. of Electrical and Computer Engineering What is Xilinx Design Language? XDL is a human readable ASCII format compatible with the more widely used
More informationBasic FPGA Architecture Xilinx, Inc. All Rights Reserved
Basic FPGA Architecture 2005 Xilinx, Inc. All Rights Reserved Objectives After completing this module, you will be able to: Identify the basic architectural resources of the Virtex -II FPGA List the differences
More informationIntroduction to FPGA Design with Vivado High-Level Synthesis. UG998 (v1.0) July 2, 2013
Introduction to FPGA Design with Vivado High-Level Synthesis Notice of Disclaimer The information disclosed to you hereunder (the Materials ) is provided solely for the selection and use of Xilinx products.
More informationEECS150 - Digital Design Lecture 16 Memory 1
EECS150 - Digital Design Lecture 16 Memory 1 March 13, 2003 John Wawrzynek Spring 2003 EECS150 - Lec16-mem1 Page 1 Memory Basics Uses: Whenever a large collection of state elements is required. data &
More informationRUN-TIME RECONFIGURABLE IMPLEMENTATION OF DSP ALGORITHMS USING DISTRIBUTED ARITHMETIC. Zoltan Baruch
RUN-TIME RECONFIGURABLE IMPLEMENTATION OF DSP ALGORITHMS USING DISTRIBUTED ARITHMETIC Zoltan Baruch Computer Science Department, Technical University of Cluj-Napoca, 26-28, Bariţiu St., 3400 Cluj-Napoca,
More informationHardware Implementation of TRaX Architecture
Hardware Implementation of TRaX Architecture Thesis Project Proposal Tim George I. Project Summery The hardware ray tracing group at the University of Utah has designed an architecture for rendering graphics
More informationEECS150 - Digital Design Lecture 6 - Field Programmable Gate Arrays (FPGAs)
EECS150 - Digital Design Lecture 6 - Field Programmable Gate Arrays (FPGAs) September 12, 2002 John Wawrzynek Fall 2002 EECS150 - Lec06-FPGA Page 1 Outline What are FPGAs? Why use FPGAs (a short history
More informationECE 297:11 Reconfigurable Architectures for Computer Security
ECE 297:11 Reconfigurable Architectures for Computer Security Course web page: http://mason.gmu.edu/~kgaj/ece297 Instructors: Kris Gaj (GMU) Tarek El-Ghazawi (GWU) TA: Pawel Chodowiec (GMU) Kris Gaj George
More informationOutline. EECS150 - Digital Design Lecture 6 - Field Programmable Gate Arrays (FPGAs) FPGA Overview. Why FPGAs?
EECS150 - Digital Design Lecture 6 - Field Programmable Gate Arrays (FPGAs) September 12, 2002 John Wawrzynek Outline What are FPGAs? Why use FPGAs (a short history lesson). FPGA variations Internal logic
More informationPINE TRAINING ACADEMY
PINE TRAINING ACADEMY Course Module A d d r e s s D - 5 5 7, G o v i n d p u r a m, G h a z i a b a d, U. P., 2 0 1 0 1 3, I n d i a Digital Logic System Design using Gates/Verilog or VHDL and Implementation
More informationEECS Components and Design Techniques for Digital Systems. Lec 20 RTL Design Optimization 11/6/2007
EECS 5 - Components and Design Techniques for Digital Systems Lec 2 RTL Design Optimization /6/27 Shauki Elassaad Electrical Engineering and Computer Sciences University of California, Berkeley Slides
More informationEfficient Hardware Design and Implementation of AES Cryptosystem
Efficient Hardware Design and Implementation of AES Cryptosystem PRAVIN B. GHEWARI 1 MRS. JAYMALA K. PATIL 1 AMIT B. CHOUGULE 2 1 Department of Electronics & Telecommunication 2 Department of Computer
More informationCOE 561 Digital System Design & Synthesis Introduction
1 COE 561 Digital System Design & Synthesis Introduction Dr. Aiman H. El-Maleh Computer Engineering Department King Fahd University of Petroleum & Minerals Outline Course Topics Microelectronics Design
More informationMark Redekopp, All rights reserved. EE 352 Unit 10. Memory System Overview SRAM vs. DRAM DMA & Endian-ness
EE 352 Unit 10 Memory System Overview SRAM vs. DRAM DMA & Endian-ness The Memory Wall Problem: The Memory Wall Processor speeds have been increasing much faster than memory access speeds (Memory technology
More informationSimplify System Complexity
1 2 Simplify System Complexity With the new high-performance CompactRIO controller Arun Veeramani Senior Program Manager National Instruments NI CompactRIO The Worlds Only Software Designed Controller
More informationEECS150 - Digital Design Lecture 16 - Memory
EECS150 - Digital Design Lecture 16 - Memory October 17, 2002 John Wawrzynek Fall 2002 EECS150 - Lec16-mem1 Page 1 Memory Basics Uses: data & program storage general purpose registers buffering table lookups
More informationParallel graph traversal for FPGA
LETTER IEICE Electronics Express, Vol.11, No.7, 1 6 Parallel graph traversal for FPGA Shice Ni a), Yong Dou, Dan Zou, Rongchun Li, and Qiang Wang National Laboratory for Parallel and Distributed Processing,
More informationRC6 Implementation including key scheduling using FPGA
ECE 646, HI-3 1 RC6 Implementation including key scheduling using FPGA (ECE 646 Project, December 2006) Fouad Ramia, Hunar Qadir, GMU Abstract with today's great demand for secure communications systems,
More informationAdvanced FPGA Design Methodologies with Xilinx Vivado
Advanced FPGA Design Methodologies with Xilinx Vivado Alexander Jäger Computer Architecture Group Heidelberg University, Germany Abstract With shrinking feature sizes in the ASIC manufacturing technology,
More informationAL8253 Core Application Note
AL8253 Core Application Note 6-15-2012 Table of Contents General Information... 3 Features... 3 Block Diagram... 3 Contents... 4 Behavioral... 4 Synthesizable... 4 Test Vectors... 4 Interface... 5 Implementation
More informationFPGA based Design of Low Power Reconfigurable Router for Network on Chip (NoC)
FPGA based Design of Low Power Reconfigurable Router for Network on Chip (NoC) D.Udhayasheela, pg student [Communication system],dept.ofece,,as-salam engineering and technology, N.MageshwariAssistant Professor
More informationSystem Verification of Hardware Optimization Based on Edge Detection
Circuits and Systems, 2013, 4, 293-298 http://dx.doi.org/10.4236/cs.2013.43040 Published Online July 2013 (http://www.scirp.org/journal/cs) System Verification of Hardware Optimization Based on Edge Detection
More informationFPGA IMPLEMENTATION FOR REAL TIME SOBEL EDGE DETECTOR BLOCK USING 3-LINE BUFFERS
FPGA IMPLEMENTATION FOR REAL TIME SOBEL EDGE DETECTOR BLOCK USING 3-LINE BUFFERS 1 RONNIE O. SERFA JUAN, 2 CHAN SU PARK, 3 HI SEOK KIM, 4 HYEONG WOO CHA 1,2,3,4 CheongJu University E-maul: 1 engr_serfs@yahoo.com,
More informationEITF35: Introduction to Structured VLSI Design
EITF35: Introduction to Structured VLSI Design Introduction to FPGA design Rakesh Gangarajaiah Rakesh.gangarajaiah@eit.lth.se Slides from Chenxin Zhang and Steffan Malkowsky WWW.FPGA What is FPGA? Field
More informationSpiral 3-1. Hardware/Software Interfacing
3-1.1 Spiral 3-1 Hardware/Software Interfacing 3-1.2 Learning Outcomes I understand the PicoBlaze bus interface signals: PORT_ID, IN_PORT, OUT_PORT, WRITE_STROBE I understand how a memory map provides
More informationA software platform to support dynamically reconfigurable Systems-on-Chip under the GNU/Linux operating system
A software platform to support dynamically reconfigurable Systems-on-Chip under the GNU/Linux operating system 26th July 2005 Alberto Donato donato@elet.polimi.it Relatore: Prof. Fabrizio Ferrandi Correlatore:
More informationFPGA Implementation and Validation of the Asynchronous Array of simple Processors
FPGA Implementation and Validation of the Asynchronous Array of simple Processors Jeremy W. Webb VLSI Computation Laboratory Department of ECE University of California, Davis One Shields Avenue Davis,
More informationINTRODUCTION TO FIELD PROGRAMMABLE GATE ARRAYS (FPGAS)
INTRODUCTION TO FIELD PROGRAMMABLE GATE ARRAYS (FPGAS) Bill Jason P. Tomas Dept. of Electrical and Computer Engineering University of Nevada Las Vegas FIELD PROGRAMMABLE ARRAYS Dominant digital design
More informationAES Core Specification. Author: Homer Hsing
AES Core Specification Author: Homer Hsing homer.hsing@gmail.com Rev. 0.1.1 October 30, 2012 This page has been intentionally left blank. www.opencores.org Rev 0.1.1 ii Revision History Rev. Date Author
More informationImplementation of Galois Field Arithmetic Unit on FPGA
Implementation of Galois Field Arithmetic Unit on FPGA 1 LakhendraKumar, 2 Dr. K. L. Sudha 1 B.E project scholar, VIII SEM, Dept. of E&C, DSCE, Bangalore, India 2 Professor, Dept. of E&C, DSCE, Bangalore,
More informationEE178 Spring 2018 Lecture Module 4. Eric Crabill
EE178 Spring 2018 Lecture Module 4 Eric Crabill Goals Implementation tradeoffs Design variables: throughput, latency, area Pipelining for throughput Retiming for throughput and latency Interleaving for
More informationEE 459/500 HDL Based Digital Design with Programmable Logic. Lecture 15 Memories
EE 459/500 HDL Based Digital Design with Programmable Logic Lecture 15 Memories 1 Overview Introduction Memories Read Only Memories Random Access Memories FIFOs 2 1 Motivation Most applications need memory!
More informationDESIGN AND IMPLEMENTATION OF 32-BIT CONTROLLER FOR INTERACTIVE INTERFACING WITH RECONFIGURABLE COMPUTING SYSTEMS
DESIGN AND IMPLEMENTATION OF 32-BIT CONTROLLER FOR INTERACTIVE INTERFACING WITH RECONFIGURABLE COMPUTING SYSTEMS Ashutosh Gupta and Kota Solomon Raju Digital System Group, Central Electronics Engineering
More information