PARALLEL PERFORMANCE DIRECTED TECHNOLOGY MAPPING FOR FPGA

Laurent Lemarchand
Dépt. Informatique, EA 2215, UBO University, BP 809, F-29285 Brest, France
lemarch@univ-brest.fr

ABSTRACT

An efficient distributed method is developed for the technology mapping of Look-Up Table-based Field Programmable Gate Arrays. Parallelization shortens the design cycle for rapid prototyping of large designs onto FPGA. In our algorithm, the boolean network is partitioned using an effective k-way partitioning tool, the subgraphs are synthesized for performance using the nominal delay predict model, and they are then merged back to form the covering of the circuit. Blocks are processed independently in parallel on a network of workstations. Experimental results for a set of large combinational circuits from the LGSynth'91 benchmark suite show linear speedups. The designs produced are equivalent or better in terms of performance and area compared to designs processed without partitioning.

I. INTRODUCTION

Field Programmable Gate Arrays (FPGA) with user programmability have become very popular for rapid prototyping, DSP and logic emulation due to their short design time and low cost. A Look-Up Table (LUT)-based FPGA consists of an array of LUT which implement sequential and combinational logic functions, and a user-configurable network which provides connections among the LUT. A K-input LUT computes any boolean function of up to K variables. Technology mapping tools convert a design, represented as a boolean network, into a functionally equivalent network of K-LUT. A mapped circuit includes a LUT for each primary output of the boolean network, and for each LUT input which is not a primary input. The objective is to reduce the area (number of LUT), to increase the routability by balancing signals among the LUT, or to optimize the delays of the boolean network. Mixed objectives are also of interest, such as minimizing area while preserving delays. In this paper, we focus on the delay objective.
This criterion is crucial for real-time applications such as DSP. The runtimes of performance-oriented technology mappers are very large for the complex designs that today's FPGA components can accommodate. High-level tools manage the circuit complexity at the behavioral level. We have chosen a divide-and-conquer method based on data partitioning in order to handle complex circuits at the boolean level, and thus to speed up the prototyping process. We first present the performance-oriented technology mapping problem and the algorithms used for combinational circuits. In section III, we detail how to partition the network in order to reduce the problem size and parallelize the synthesis. Experimental results obtained with a distributed implementation of the synthesis are presented last.

II. PERFORMANCE ORIENTED TECHNOLOGY MAPPING

During the technology mapping process, designs are represented as boolean networks. A boolean network is a Directed Acyclic Graph (DAG) G = (V, E) where V is the set of nodes and E the set of edges. A node v ∈ V represents a logic gate or a primary input, and (u, v) ∈ E means that node u is an input of node v, i.e. v is a terminal of the net rooted at u. Node u (resp. v) belongs to the fanin (resp. fanout) of v (resp. u). Each node of a K-bounded network has no more than K inputs. Such a network can be mapped by covering each node with a single LUT.

Performance optimization objectives are based on delay models at the boolean level that reflect the expected actual delays of the mapped circuit. The first static model is the unit delay model [8]: each LUT has a constant delay. The technology mapping objective is then to minimize the number of cascaded LUT in the boolean network. More accurate models include the net delay model [4], in which each node (net) has a pre-assigned (thus also static) delay, the general delay model [13], which associates a propagation delay with each connection between terminals, and the nominal delay model [3], where the cost of a net is proportional to its number of terminals. This last model is dynamic, since the delay of a net varies with the covering solution of the associated node. All models except the unit delay model take into account both the delays induced by the LUT and those of the interconnections within the mapping solution. Intuitively, the routing resources involved in propagating a signal to all of its terminals increase with the number of terminals. Thus the nominal delay model is the most accurate for predicting routing delays. With this model, mappers need to identify congested areas in the network in order to decrease the delay cost of high fanout nodes. This also increases the routability of the design.

Figure 1: Performance-oriented technology mapping for LUT-based FPGA. Flow: boolean network -> (a) decomposition -> 2-bounded network -> (b) covering -> K-bounded network (delay optimal) -> (c) packing -> optimized mapped network (area/delay).

Performance-oriented technology mapping is usually performed in 3 steps, as shown in figure 1:

(a) decomposition: the network is decomposed into a set of 2-input LUT. It has been shown that decomposition increases the solution space for covering, thus leading to better solutions [5]. This is the reason for the 2-LUT decomposition step. Many decomposition methods have been proposed for this pre-processing. Dmig [1], a polynomial algorithm, provides results as good as those of other methods [5].

(b) covering: the 2-bounded network is then covered by a network of K-LUT.
Researchers have mainly focused on static delay optimization, since dynamic delay optimization is an NP-hard problem. The delay of a node depends exclusively on the delays of the nodes on paths from the PI to this node. Current algorithms make use of dynamic programming and network flow computations for the mapping of a network, starting at the PI and processing the nodes in topological order. For a K-bounded network, these algorithms find the optimal solution in polynomial time, according to some static delay model. For the covering, FlowMap-d [4] has been proven to be optimal for the net delay model. It runs in O(Knm log n) time, and prepares the network for area reduction.

(c) packing: finally, an area optimization step reduces the number of LUT while preserving the delays obtained in (b). Various heuristics try to reduce the area as a post-processing step. DF-Map [2] preserves the optimal delay while reducing the area. It exploits MFFC (Maximum Fanout Free Cone) structures in the network to reduce the number of LUT.

Cong introduces in [4] the nominal delay predict model for net-based performance mapping: the delay assigned to each node (net) prior to covering reflects the expected fanout of the node in the mapping solution. Fanout size and reconvergent paths from the node are considered for the delay estimation. His results show an improvement of 3 to 10 % in the actual delays of mapped circuits compared to covering with the unit delay model. We use this model to assign delays to nodes prior to covering the network with FlowMap-d.

III. PARALLEL SYNTHESIS USING PARTITIONING

Even if most of the synthesis algorithms used for performance-oriented technology mapping have polynomial runtimes, these runtimes are prohibitive for the complex designs which fit into today's FPGA. Our approach for reducing the synthesis runtimes consists in partitioning the circuit and processing the subnetworks independently.
The subnetworks are much smaller than the original circuit, and the synthesis can easily be parallelized by distributing the blocks over a network of processors. Both factors speed up the synthesis. However, the synthesis is no longer global over the network, and this could lower the quality of the resulting netlist. Concerning the optimization of performance according to the nominal delay predict model, we must limit the loss of connectivity information induced by the partitioning, since performance optimization is based on network structures. For packing, the partitioning must take into account the MFFC (Maximum Fanout Free Cone) structures exploited by DF-Map. For both partitioning tools, the sizes of the blocks must be balanced: since the blocks are processed in parallel, their runtimes must be balanced to obtain good speedups for the parallel algorithm.

We first present the partitioning algorithm we use for the decomposition and covering steps. Balancing partition sizes is also discussed in this section. Next we detail our MFFC-based partitioning tool. These two algorithms are used for the parallel synthesis of circuits. The overall parallel algorithm is presented last.

A. Partitioning for performance optimization

Given a boolean network and a number k, k-way partitioning consists in finding an assignment of each node v ∈ V to one of the k blocks while minimizing the cut, i.e. the number of nets crossing block boundaries, and balancing the number of nodes per block. By using such an algorithm, we aim at minimizing the maximal size of the subnetworks within the blocks, and at limiting the loss of connectivity information between the blocks. This point is crucial, since delay optimization is mainly based on connection structures within the network.

We extended our PPart algorithm [11], devoted to the rapid logic synthesis of boolean networks using partitioning, to the case of performance optimization for LUT-based FPGA. The partitioning algorithm used is HMetis [10], an efficient multi-level partitioner which was successfully applied in the VLSI domain. The algorithm works on the hypergraph obtained from the boolean network by connecting each node and its fanout to form a hyperedge. HMetis minimizes the number of hyperedges spanning terminals in different blocks. After partitioning, subnetworks are built according to the assignment of nodes to partitions. PO are added to subnetworks for exporting signals used in other partitions. Corresponding PI are also added for nodes using nets from outside the partition they belong to.
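The hypergraph construction and the subnetwork boundaries described above can be sketched as follows. This is a minimal illustration, not the PPart or HMetis code: the function names (`build_hyperedges`, `cut_size`, `boundary_signals`), the dictionary-based network encoding and the toy circuit are all assumptions made for the example, and primary inputs are simply treated as ordinary hypergraph nodes.

```python
from collections import defaultdict

def build_hyperedges(fanin):
    """One hyperedge per driving node: the node itself plus its fanout.

    `fanin` maps every node to the list of nodes that drive it
    (hypothetical encoding of the boolean network).
    """
    fanout = defaultdict(set)
    for node, drivers in fanin.items():
        for driver in drivers:
            fanout[driver].add(node)
    return {driver: {driver} | sinks for driver, sinks in fanout.items()}

def cut_size(hyperedges, block_of):
    """Number of hyperedges whose terminals lie in more than one block."""
    return sum(1 for pins in hyperedges.values()
               if len({block_of[p] for p in pins}) > 1)

def boundary_signals(hyperedges, block_of, k):
    """Signals to export (new PO) and import (new PI) for each of the k blocks."""
    exported = [set() for _ in range(k)]
    imported = [set() for _ in range(k)]
    for driver, pins in hyperedges.items():
        home = block_of[driver]
        for pin in pins:
            if block_of[pin] != home:
                exported[home].add(driver)            # PO added in the driver's block
                imported[block_of[pin]].add(driver)   # PI added in the reader's block
    return exported, imported

# Toy network: a, b, c are primary inputs; g3 is the only output.
fanin = {"a": [], "b": [], "c": [],
         "g1": ["a", "b"], "g2": ["b", "c"], "g3": ["g1", "g2"]}
block_of = {"a": 0, "b": 0, "g1": 0, "c": 1, "g2": 1, "g3": 1}

hyperedges = build_hyperedges(fanin)
cut = cut_size(hyperedges, block_of)
exported, imported = boundary_signals(hyperedges, block_of, 2)
```

In this toy assignment the nets rooted at `b` and `g1` cross the boundary, so block 0 gains two outputs and block 1 gains two matching inputs. A real flow would hand the hypergraph to the partitioner rather than fix `block_of` by hand.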
The building process preserves fanin and fanout sizes, thus allowing good nominal delay estimation. Moreover, the partitioner exposes congested areas in the network: it does not split such areas, because of the cost they would induce for the cut. The networks (blocks) considered are much smaller than the whole circuit. The pre-processing and the covering steps benefit from this problem size reduction. Applying FlowMap-d with the nominal delay predict model on a high density subnetwork gives good results, since delays at the partition boundaries are minimized locally.

Balancing partition sizes. Since PPart processes each subnetwork independently in parallel, balancing the partitions is the second objective of the partitioning tool. The weight of a partition is the sum of the weights of the nodes it includes. The weight of a node must thus reflect the cost it induces for the processing of its subnetwork. Predicting this cost is very difficult, since runtimes depend not only on node and network structures but also on the synthesis algorithms used to process the blocks. ProperPart [7] is a parallel synthesis tool devoted to the logic optimization of combinational circuits with MIS II by partitioning. It uses an experimentally defined cost function for nodes, tuned for the MIS II algorithm. However, results show that even with a specialized cost function, runtime estimations are coarse. We have therefore chosen a general weighting function that can be applied when coupling PPart with different synthesis tools: the weight of a node is the number of literals (positive or complemented variables) in the sum-of-products representation of the boolean function associated with the node.

B. Partitioning for packing

The third step, packing, exploits the Maximum Fanout Free Cone (MFFC) structures.
We have developed another partitioning algorithm, based on MFFC clustering, that guarantees the same results for every partitioning of the clustered network when applying DF-Map. For each node v of a network G = (V, E), the MFFC of v is the maximal set MFFC_v of predecessors of v such that every path from a node of MFFC_v towards the primary outputs passes through v. PI are excluded, and v itself is included in MFFC_v. Intuitively, MFFC_v consists of the nodes that lie on paths converging to v. It is proven that, for any pair of internal nodes in a network, their MFFC are either disjoint or one contains the other [2]. Thus it is possible to cluster the nodes according to the MFFC: nodes belonging to the same MFFC are grouped together and replaced by a single node in the network, until every MFFC consists of a single node. The resulting network is unique, and each initial node is assigned to a single cluster (disjoint partitioning). Such a clustering algorithm has been exploited prior to partitioning for acyclic partitioning of boolean networks [6]. However, when applied in conjunction with HMetis, the MFFC-based clustering gives poor results because, even though the partitioned hypergraph is smaller, its nodes have large weights and dense connectivity. Thus, we apply the clustering technique only when an MFFC-based synthesis algorithm is to be used.

DF-Map reduces the area of a K-bounded network by collapsing nodes into their successors while preserving both the K-feasibility of the network and the delays obtained. Since no duplication of nodes is allowed, it is sufficient to process each disjoint MFFC independently. Due to this restriction, the optimal area is calculated in polynomial time. Since the clusters are processed independently by DF-Map, the pre-clustering technique guarantees the optimality of the solution. Moreover, no cut reduction is needed. This implies that we can partition the clustered network by considering balancing constraints only. Thus, our partitioning algorithm is as follows:

(1) cluster the nodes based on MFFC structures; the weight of a cluster is the sum of the weights of the nodes it includes
(2) partition the clusters into roughly equal-size blocks
(3) restore the original nodes in each partition

The partitioning algorithm sorts the clusters in decreasing weight order and assigns them to partitions in round-robin order.

C. Parallel algorithm

The parallel algorithm for performance-oriented technology mapping of LUT-based FPGA is integrated into the PPart parallel synthesis tool. It is a master/slave algorithm. The master partitions the circuit and distributes synthesis tasks to the different slaves. The master then collects the optimized circuits and merges them back into a netlist. If the number of tasks is greater than the number of available processors, tasks are sent on demand to idle slaves. The number of parts and the synthesis tool used are set according to the user directives.
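The circular assignment of weighted clusters from section III.B and the on-demand distribution of tasks described above can be sketched together. This is only an illustration under stated assumptions: the names `assign_round_robin` and `run_on_demand` are hypothetical, the cluster weights are invented, and a Python thread pool stands in for the PVM-based master/slave scheme (a pool hands the next queued task to whichever worker becomes idle).

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def assign_round_robin(cluster_weights, k):
    """Sort clusters by decreasing weight and deal them out to k blocks in turn."""
    order = sorted(cluster_weights, key=lambda c: (-cluster_weights[c], c))
    blocks = [[] for _ in range(k)]
    for i, cluster in enumerate(order):
        blocks[i % k].append(cluster)
    return blocks

def run_on_demand(tasks, synthesize, n_workers):
    """Master side: queue every task; each idle worker picks up the next one.

    Results are returned in task order so they can be merged deterministically.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        futures = {pool.submit(synthesize, task): i for i, task in enumerate(tasks)}
        for future in as_completed(futures):
            results[futures[future]] = future.result()
    return [results[i] for i in range(len(tasks))]

# Hypothetical cluster weights (e.g. literal counts summed per MFFC cluster).
weights = {"m1": 7, "m2": 5, "m3": 4, "m4": 3, "m5": 1}
blocks = assign_round_robin(weights, 2)

# Stand-in "synthesis": report the total weight handled for each block.
block_loads = run_on_demand(
    blocks, lambda block: sum(weights[c] for c in block), n_workers=2)
```

With these weights the two blocks receive loads of 12 and 8. Round-robin dealing is cheap but only roughly balanced, which matches the "roughly equal-size blocks" wording of step (2).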
Defining different synthesis policies allows the user to target various optimization objectives. For performance optimization, we have defined the following procedure:

(1) partition the network using HMetis
(2) in parallel on each block
    (2.1) predict the nominal delay of each node (delays are calculated as in [4])
    (2.2) perform FlowMap-d
(3) merge the blocks and re-partition using MFFC clustering
(4) in parallel on each block
    (4.1) apply DF-Map
(5) merge the blocks into the final circuit.

The PPart tool has been implemented in C. Parallelization is based on the PVM [9] library routines. The partitioning tools are integrated into the SIS package of UCB [12]. All of the synthesis commands integrated in SIS are eligible for parallel execution on partitioned networks.

IV. RESULTS

We have tested our algorithm on a network of Sun Ultra 1/140 MHz workstations with 128 MB of memory. The tested circuits are large examples from the LGSynth'91 [14] public benchmark suite. The results are given for 4 and 8 partitions. All of them are to be compared with those obtained without partitioning. Our goal is to improve the runtimes of synthesis tools for large circuits without affecting the quality of the mapped designs. We first detail the results in terms of quality (delay and area optimization) and then present the speedups obtained on a network of up to 8 processors.

A. Quality

Table 1 presents the results obtained with the usual unit delay model. Performance according to the nominal delay predict model (calculated as in [4]) is shown in table 2. Table 3 is devoted to the results in terms of area. The results obtained with the unit delay model illustrate the main drawback of the partitioning approach: quality can decrease substantially, since the algorithms are not applied globally on the network. With the unit delay model, the objective function corresponds to the minimization of the critical path lengths in the network.
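To make the unit delay objective concrete, the critical path length of a mapped network can be computed by labelling nodes in topological order. The sketch below is an illustration only: `unit_delay_depth`, the fanin dictionary and the small LUT network are assumptions for the example, and it relies on the standard-library `graphlib` module (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

def unit_delay_depth(fanin, primary_inputs):
    """Depth of every node when each LUT costs one unit and PI cost nothing.

    `fanin` maps each LUT to the nodes feeding it (hypothetical encoding).
    """
    depth = {}
    # static_order() yields every node after all of its predecessors.
    for node in TopologicalSorter(fanin).static_order():
        if node in primary_inputs:
            depth[node] = 0
        else:
            depth[node] = 1 + max((depth[u] for u in fanin.get(node, ())), default=0)
    return depth

# Toy mapped network: l4 is the primary output.
luts = {"l1": ["a", "b"], "l2": ["b", "c"], "l3": ["l1", "l2"], "l4": ["l3", "c"]}
depth = unit_delay_depth(luts, {"a", "b", "c"})
critical_path_length = max(depth.values())
```

Here the longest chain is l1 (or l2), l3, l4, so the critical path length is 3 LUT levels. A mapper working on one block only sees the part of such a chain that falls inside the block.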
Partitioning destroys critical paths and leads to poor results for the final circuits, with an average loss of quality of over 20 %. Conversely, the results for the more accurate nominal delay model go the other way. Even if the synthesis process is no longer global over the network, due to partitioning, an average performance improvement of more than 10 % is obtained for nominal delay (table 2).

[Table 1: Performance according to the unit delay model. Per-circuit entries and average gains are not recoverable.]

[Table 2: Performance according to the nominal delay predict model. Per-circuit entries are not recoverable; surviving average gain entry: 12.7 %.]

[Table 3: Quality obtained in terms of area (number of 4-LUT) when applying the nominal delay predict model. Per-circuit entries are not recoverable; surviving average gain entries: -1.7 % and -3.2 %.]

The nominal delay metric takes congested areas into account. The partitioning exposes such zones, since splitting a high density connection area would increase the cut. Thus the decomposition phase benefits from the partitioning, which makes it possible to improve delays at the boundaries of each congested zone individually. This leads to a better mapped design overall.

Area results are roughly equivalent with or without partitioning. The covering phase thus has a small impact on the packing phase. The MFFC-based clustering guarantees the optimality of the solution obtained by partitioning. Thus the good delay performance is not paid for by a significant loss of area for the design.

B. Speedup

[Table 4: Speedup on a network of workstations (sequential time and speedups per circuit, with the average). Entries are not recoverable.]

Table 4 shows the speedups obtained by distributing the synthesis process on 4 and 8 processors. Times are in seconds for the sequential case, and speedups are given otherwise. Parallel execution times include partitioning, synthesis and merging. Circuits are ordered by increasing runtime of their synthesis on a single processor without partitioning. Results show linear speedups on average. Since the partitioning tools have very small runtimes, partitioning does not penalize the overall execution time if the circuits are large enough.
For a small circuit such as bigkey, which needs less than 2 minutes of CPU time sequentially, the speedups are small (1.3) on 4 or 8 processors. On the other hand, the synthesis of large circuits, pdc and those that follow it, benefits from the parallel approach. The poor result obtained for ex1010 on 8 processors is due to unbalanced partition sizes.

For some of the circuits (C7552, s38417, s ), speedups are super-linear. These results are mainly due to the problem size reduction brought by the partitioning. Partitioning avoids the large memory footprint needed for the direct synthesis of circuits, and thus avoids swapping memory to disk, which slows down processing.

V. CONCLUSION

In this paper we have presented a partitioning approach for the performance-oriented technology mapping of large combinational circuits onto LUT-based FPGA. The synthesis process makes use of both exact and heuristic methods for optimizing the performance and area of circuits. The partitioning tools are adapted to the synthesis algorithms used. Although partitioning degrades performance when a simple unit delay model is used for the delay estimation, results are much better with the more accurate nominal delay predict model. The partitioning exposes congested areas in circuits, which makes it possible to optimize delay propagation locally. This local performance optimization increases the overall solution quality compared to a non-partitioning approach. A specialized partitioning tool limits the loss of area induced by partitioning. The model also increases the routability of the circuits, leading to better results in terms of performance for the placed and routed designs. The partitioning approach makes it easy to parallelize the synthesis on a network of computers. The parallel algorithm provides linear speedups and avoids prohibitive runtimes for the rapid prototyping of large designs.

VI. ACKNOWLEDGEMENTS

Thanks to Prof. Jason Cong, from UCLA, who provided the source of his FlowMap package (version 0.2) for our experiments with SIS.

VII. REFERENCES

[1] K.-C. Chen et al. DAG-map: Graph-based FPGA technology mapping for delay optimisation. IEEE Design and Test of Computers, pages 7-20, September.
[2] J. Cong and Y. Ding. On area/depth trade-off in LUT-based FPGA technology mapping. IEEE Trans. on VLSI Systems, 2(2):137-148, June.
[3] J. Cong and Y. Ding. On nominal delay minimization in LUT-based FPGA technology mapping. Integration, The VLSI Journal, 18:73-94, November.
[4] J. Cong et al. LUT-based FPGA technology mapping under arbitrary net-delay models. Computers and Graphics, 18(4):507-516.
[5] J. Cong and Y.-Y. Hwang. Structural gate decomposition for depth-optimal technology mapping in LUT-based FPGA designs. In Proc. ACM/IEEE Design Automation Conf., Las Vegas, NV, June.
[6] J. Cong et al. Acyclic multi-way partitioning of boolean networks. In Proc. ACM/IEEE Design Automation Conf., pages 670-675.
[7] K. De and P. Banerjee. Parallel logic synthesis using partitioning. In Proc. Int'l Conf. on Parallel Processing.
[8] R. Francis et al. Technology mapping of lookup table-based FPGAs for performance. In Proc. Int'l Conf. on Computer-Aided Design, pages 568-571, Santa Clara, CA, November.
[9] G. Geist et al. PVM: Parallel Virtual Machine - A Users Guide and Tutorial for Network Parallel Computing. MIT Press.
[10] G. Karypis et al. Multilevel hypergraph partitioning: Application in VLSI domain. In Proc. ACM/IEEE Design Automation Conf., June.
[11] L. Lemarchand. Parallel synthesis of large combinational circuits for FPGAs. In Proc. of High Performance Computing and Networking Europe'97, volume 1225 of Lecture Notes in Computer Science, Vienna, Austria, April. Springer-Verlag.
[12] E. Sentovich et al. SIS: a system for sequential circuit synthesis. Memorandum UCB/ERL M92/41, University of California at Berkeley, May.
[13] H. Yang and D.F. Wong. Edge-map: Optimal performance driven technology mapping for iterative LUT-based FPGA designs. In Proc. Int'l Conf. on Computer-Aided Design, pages 150-155, San Jose, CA.
[14] S. Yang. Logic synthesis and optimization benchmarks user guide. Technical report, Stanford University, 1991.

Beyond the Combinatorial Limit in Depth Minimization for LUT-Based FPGA Designs

Beyond the Combinatorial Limit in Depth Minimization for LUT-Based FPGA Designs Beyond the Combinatorial Limit in Depth Minimization for LUT-Based FPGA Designs Jason Cong and Yuzheng Ding Department of Computer Science University of California, Los Angeles, CA 90024 Abstract In this

More information

On Nominal Delay Minimization in LUT-Based FPGA Technology Mapping

On Nominal Delay Minimization in LUT-Based FPGA Technology Mapping On Nominal Delay Minimization in LUT-Based FPGA Technology Mapping Jason Cong and Yuzheng Ding Department of Computer Science University of California, Los Angeles, CA 90024 Abstract In this report, we

More information

FlowMap: An Optimal Technology Mapping Algorithm for Delay Optimization in Lookup-Table Based FPGA Designs

FlowMap: An Optimal Technology Mapping Algorithm for Delay Optimization in Lookup-Table Based FPGA Designs . FlowMap: An Optimal Technology Mapping Algorithm for Delay Optimization in Lookup-Table Based FPGA Designs Jason Cong and Yuzheng Ding Department of Computer Science University of California, Los Angeles,

More information

Simultaneous Depth and Area Minimization in LUT-based FPGA Mapping

Simultaneous Depth and Area Minimization in LUT-based FPGA Mapping Simultaneous Depth and Area Minimization in LUT-based FPGA Mapping Jason Cong and Yean-Yow Hwang Department of Computer Science University of California, Los Angeles, CA 90024 Abstract In this paper, we

More information

CS137: Electronic Design Automation

CS137: Electronic Design Automation CS137: Electronic Design Automation Day 4: January 16, 2002 Clustering (LUT Mapping, Delay) Today How do we map to LUTs? What happens when delay dominates? Lessons for non-luts for delay-oriented partitioning

More information

Acyclic Multi-Way Partitioning of Boolean Networks

Acyclic Multi-Way Partitioning of Boolean Networks Acyclic Multi-Way Partitioning of Boolean Networks Jason Cong, Zheng Li, and Rajive Bagrodia Department of Computer Science University of California, Los Angeles, CA 90024 Abstract Acyclic partitioning

More information

Basic Block. Inputs. K input. N outputs. I inputs MUX. Clock. Input Multiplexors

Basic Block. Inputs. K input. N outputs. I inputs MUX. Clock. Input Multiplexors RPack: Rability-Driven packing for cluster-based FPGAs E. Bozorgzadeh S. Ogrenci-Memik M. Sarrafzadeh Computer Science Department Department ofece Computer Science Department UCLA Northwestern University

More information

Delay Estimation for Technology Independent Synthesis

Delay Estimation for Technology Independent Synthesis Delay Estimation for Technology Independent Synthesis Yutaka TAMIYA FUJITSU LABORATORIES LTD. 4-1-1 Kamikodanaka, Nakahara-ku, Kawasaki, JAPAN, 211-88 Tel: +81-44-754-2663 Fax: +81-44-754-2664 E-mail:

More information

ESE535: Electronic Design Automation. Today. LUT Mapping. Simplifying Structure. Preclass: Cover in 4-LUT? Preclass: Cover in 4-LUT?

ESE535: Electronic Design Automation. Today. LUT Mapping. Simplifying Structure. Preclass: Cover in 4-LUT? Preclass: Cover in 4-LUT? ESE55: Electronic Design Automation Day 7: February, 0 Clustering (LUT Mapping, Delay) Today How do we map to LUTs What happens when IO dominates Delay dominates Lessons for non-luts for delay-oriented

More information

Figure 1. PLA-Style Logic Block. P Product terms. I Inputs

Figure 1. PLA-Style Logic Block. P Product terms. I Inputs Technology Mapping for Large Complex PLDs Jason Helge Anderson and Stephen Dean Brown Department of Electrical and Computer Engineering University of Toronto 10 King s College Road Toronto, Ontario, Canada

More information

Global Clustering-Based Performance-Driven Circuit Partitioning

Global Clustering-Based Performance-Driven Circuit Partitioning Global Clustering-Based Performance-Driven Circuit Partitioning Jason Cong University of California at Los Angeles Los Angeles, CA 90095 cong@cs.ucla.edu Chang Wu Aplus Design Technologies, Inc. Los Angeles,

More information

ABC basics (compilation from different articles)

ABC basics (compilation from different articles) 1. AIG construction 2. AIG optimization 3. Technology mapping ABC basics (compilation from different articles) 1. BACKGROUND An And-Inverter Graph (AIG) is a directed acyclic graph (DAG), in which a node

More information

Simultaneous Depth and Area Minimization in LUT-based FPGA Mapping

Simultaneous Depth and Area Minimization in LUT-based FPGA Mapping Simultaneous Depth and Area Minimization in LUT-based FPGA Mapping Jason Cong and Yean-Yow Hwang Department of Computer Science University of California, Los Angeles, CA 90024 January 31, 1995 Abstract

More information

IMPLEMENTATION DESIGN FLOW

IMPLEMENTATION DESIGN FLOW IMPLEMENTATION DESIGN FLOW Hà Minh Trần Hạnh Nguyễn Duy Thái Course: Reconfigurable Computing Outline Over view Integra tion Node manipulation LUT-based mapping Design flow Design entry Functional simulation

More information

Mapping-aware Logic Synthesis with Parallelized Stochastic Optimization

Mapping-aware Logic Synthesis with Parallelized Stochastic Optimization Mapping-aware Logic Synthesis with Parallelized Stochastic Optimization Zhiru Zhang School of ECE, Cornell University September 29, 2017 @ EPFL A Case Study on Digit Recognition bit6 popcount(bit49 digit)

More information

On Nominal Delay Minimization in LUT-Based FPGA Technology Mapping

On Nominal Delay Minimization in LUT-Based FPGA Technology Mapping On Nominal Delay Minimization in LUT-Based FPGA Technology Mapping Jason Cong and Yuzheng Ding UCLA Computer Science Department, Los Angeles, CA 90024 Abstract We study the nominal delay minimization problem

More information

RASP: A General Logic Synthesis System for SRAM-based FPGAs

RASP: A General Logic Synthesis System for SRAM-based FPGAs RASP: A General Logic Synthesis System for SRAM-based FPGAs Abstract Jason Cong and John Peck Department of Computer Science University of California, Los Angeles, CA 90024 Yuzheng Ding AT&T Bell Laboratories,

More information

Heterogeneous Technology Mapping for FPGAs with Dual-Port Embedded Memory Arrays

Heterogeneous Technology Mapping for FPGAs with Dual-Port Embedded Memory Arrays Heterogeneous Technology Mapping for FPGAs with Dual-Port Embedded Memory Arrays Steven J.E. Wilton Department of Electrical and Computer Engineering University of British Columbia Vancouver, BC, Canada,

More information
