PAPER Accelerating Boolean Matching Using Bloom Filter

IEICE TRANS. FUNDAMENTALS, VOL.E93-A, NO.10 OCTOBER 2010

Chun ZHANG, Member, Yu HU, Lingli WANG a), Lei HE b), and Jiarong TONG, Nonmembers

SUMMARY Boolean matching is a fundamental problem in FPGA synthesis, but existing Boolean matchers do not scale to complex PLBs (programmable logic blocks) and large circuits. This paper proposes a filter-based Boolean matching method, F-BM, which accelerates Boolean matching using lookup tables, implemented as Bloom filters, that store precalculated matching results. To show the effectiveness of F-BM, we have implemented an area-minimizing post-mapping re-synthesis that uses Boolean matching as its kernel. Tested on a broad selection of benchmarks, the re-synthesizer using F-BM is 80X faster, with 0.5% more area, than the one using a SAT-based Boolean matcher.
key words: FPGA, Boolean matching, Bloom filter, SAT, re-synthesis

1. Introduction

Boolean matching is a widely used technique in field programmable gate array (FPGA) technology mapping [7], post-mapping re-synthesis [1] and architecture evaluation [2]. Ideally, an FPGA Boolean matcher should scale to large Boolean functions and complex PLB structures in terms of both runtime and memory, and be flexible enough to accommodate different PLB structures.

Most existing Boolean matching algorithms are based on function decomposition [4], [8] or function canonical forms [5], [6]. Decomposition-based methods try to split each Boolean function into smaller pieces, each of which can be implemented by one component inside the PLB. However, this technique lacks flexibility, because the decomposition strategy must be customized for each PLB architecture. Depending on the function to be matched and the decomposition strategy employed, the runtime efficiency varies greatly.
Canonical form-based methods perform Boolean matching by computing and comparing canonical forms of different functions. A function's canonical form is the unique representative of the equivalence class to which it belongs, and only functions of the same equivalence class match each other. When matching a Boolean function to a PLB, all implementable functions of that PLB must be considered as matching candidates. Because computing canonical forms is expensive, this technique can only handle functions with a limited number of inputs.

Recently, owing to significant improvements in modern SAT solvers [11], SAT-based Boolean matching (SAT-BM) [1] was proposed, which offers great flexibility in handling various PLB structures. However, even with numerous improvements [3], [7], [12], the high computational complexity of SAT-BM still limits its application to complex PLBs.

In this paper, we use Bloom filter [9] based lookup tables to accelerate Boolean matching. The tables store partial sets of pre-calculated implementable and non-implementable functions of the target PLBs. They quickly filter out non-implementable functions across multiple calls of a Boolean matcher, so that the time-consuming SAT-BM is called only for the remaining functions. Unlike lookup table-based Boolean matching [15], our filter-based Boolean matching (F-BM) can handle more complex PLBs. To verify the effectiveness of F-BM, we integrate it into a post-mapping re-synthesis algorithm that minimizes area (i.e., LUT number) under the same logic depth constraint [1].

Manuscript received February 5,
Manuscript revised June 8,
The authors are with the State Key Lab of ASICs and Systems, Fudan University, China.
The author is with Electrical and Computer Engineering Department, University of Alberta, Canada.
The author is with Electrical Engineering Department, UCLA, USA.
a) llwang@fudan.edu.cn
b) lhe@ee.ucla.edu
DOI: /transfun.E93.A.1775
Tested on three different benchmark sets (MCNC, IWLS and Industrial designs) using 3 GB of memory, the re-synthesizer with F-BM is 80X faster, with 0.5% more area, than the one with the state-of-the-art SAT-BM [12]. With a 1500-second timeout, a common practice to avoid excessive runtime, the F-BM-based re-synthesizer reduces 2X more LUTs than the re-synthesizer based on SAT-BM [12].

The remainder of the paper is organized as follows. Section 2 introduces preliminaries. Section 3 describes the proposed F-BM algorithm. Section 4 presents the post-mapping re-synthesis using the proposed F-BM and experimental results. Section 5 concludes the paper.

Copyright © 2010 The Institute of Electronics, Information and Communication Engineers

2. Preliminaries

2.1 Boolean Matching

An FPGA consists of an array of PLBs. As shown in Fig. 1(b), a PLB $H(P)$ consists of a network of interconnected programmable and non-programmable logic devices with a set $P$ of input pins $\{x_1, \ldots, x_p\}$. We sometimes omit the set of input pins and write $H$ to refer to the PLB $H(P)$. A K-input lookup table (K-LUT) consists of K inputs, one output, and $2^K$ configuration bits $\{L_1, \ldots, L_{2^K}\}$.

Fig. 1 (a) Truth table for $f(x_1, x_2, x_3)$. (b) Target PLB structure.

Boolean matching decides the equivalence of two Boolean functions under input negation/permutation and output negation (NPN). For FPGAs, where a PLB can implement only a subset of all P-input functions, the Boolean matching problem takes as input a PLB $H(P)$ and a Boolean function $f(X)$ over variables $X$ such that $|X| \le |P|$, and decides whether $H(P)$ can implement (i.e., realize) $f(X)$. If the function is implementable, correct configurations for the PLB are generated as well. For the simple case where $H$ is a K-LUT, any function $f(X)$ with $|X| \le K$ can be implemented by $H$.

SAT-Based Method

Among various approaches, SAT-based Boolean matching (SAT-BM) offers the greatest flexibility across different PLB structures [7]. It translates Boolean matching into a SAT problem by formulating the target PLB structure in conjunctive normal form (CNF) [16], which is then solved by SAT reasoning [11]. We review the entire flow with an example. Equations (1) and (2) show the CNF encodings for the 2-LUT and the AND gate in Fig. 1(b). Such encodings are consistent with the functionality of the target gate [16] because they are satisfiable only under correct input and output relationships. For example, the first two clauses in equation (1) ensure that the configuration bit $L_1$ is fetched as the output for the input $x_1 = x_2 = 0$.

$$
\begin{aligned}
G_{LUT} ={} & (x_1 + x_2 + \bar{L}_1 + z)(x_1 + x_2 + L_1 + \bar{z}) \\
& (x_1 + \bar{x}_2 + \bar{L}_2 + z)(x_1 + \bar{x}_2 + L_2 + \bar{z}) \\
& (\bar{x}_1 + x_2 + \bar{L}_3 + z)(\bar{x}_1 + x_2 + L_3 + \bar{z}) \\
& (\bar{x}_1 + \bar{x}_2 + \bar{L}_4 + z)(\bar{x}_1 + \bar{x}_2 + L_4 + \bar{z})
\end{aligned}
\tag{1}
$$

$$G_{AND} = (z + \bar{f})(x_3 + \bar{f})(\bar{z} + \bar{x}_3 + f) \tag{2}$$

Combining all components, equation (3) gives the CNF encoding for the target PLB.

$$G = G_{LUT} \cdot G_{AND} \tag{3}$$

To test whether the PLB is capable of implementing $f(X)$ shown in Fig. 1(a), we need to check every entry of the truth table for $f(X)$, by simply extending the CNF encoding into equation (4), where $X = [x_1, x_2, x_3]$ and $X/000$ means assigning 000 to $X$.

$$
\begin{aligned}
G_{SAT} ={} & G(X/000, f/0, z/z_1) \cdot G(X/001, f/0, z/z_2) \cdot G(X/010, f/1, z/z_3) \cdot G(X/011, f/0, z/z_4) \\
& \cdot G(X/100, f/1, z/z_5) \cdot G(X/101, f/1, z/z_6) \cdot G(X/110, f/1, z/z_7) \cdot G(X/111, f/1, z/z_8)
\end{aligned}
\tag{4}
$$

$G_{SAT}$ is then solved by a general SAT solver such as [11]. If the target PLB is capable of implementing $f(X)$, satisfiability (SAT) is returned along with correct assignments for the configuration bits (e.g., $L_i$). Otherwise, unsatisfiability (UNSAT) is reported. Since the SAT solver is called every time as a sub-routine, the computational complexity of SAT-BM is still high, even with numerous recent improvements [3], [7], [12].

2.2 Bloom Filter

A Bloom filter [9], [18] is a space-efficient probabilistic data structure for membership queries against a set. It consists of an m-bit array $M$ and $k$ independent hash functions $h_i(x)$, $1 \le i \le k$, each of which maps (hashes) an element $x$ to one of the $m$ bit-array positions with a uniform random distribution. In practice, instead of using $k$ different hash functions, one can pass $k$ different initial values to a single hash function, or use $k$ different bit-fields from the wide output of a hash function, to form the hash function set.

Initially, all bits in $M$ are set to 0. To insert an element, we feed it to each of the $k$ hash functions to get $k$ array positions, and set the bits at all these positions to 1. To query an element (test whether it is in the set), we feed it to each of the $k$ hash functions to get $k$ array positions. If any of the bits at these positions is 0, the element is definitely not in the set.
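The insertion and query operations described above can be sketched in a few lines of Python. This is only a minimal illustration, not the implementation used in F-BM: the class name `BloomFilter` is hypothetical, and deriving the k hash functions by salting BLAKE2b with k different initial values is an assumed stand-in for whichever hash function a real implementation selects.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: an m-bit array M and k salted hash functions."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        self.bits = bytearray((m + 7) // 8)  # all bits start at 0

    def _positions(self, item: bytes):
        # Form k hash functions by passing k different initial values (salts)
        # to one underlying hash. BLAKE2b is an illustrative assumption.
        for i in range(self.k):
            digest = hashlib.blake2b(
                item, digest_size=8, salt=i.to_bytes(8, "little")
            ).digest()
            yield int.from_bytes(digest, "little") % self.m

    def add(self, item: bytes) -> None:
        # Insertion: set the bit at each of the k positions to 1.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: bytes) -> bool:
        # Query: a single 0 bit proves absence; all 1s means "probably present".
        return all(
            (self.bits[pos // 8] >> (pos % 8)) & 1 for pos in self._positions(item)
        )
```

A truth table can be stored by passing its bit string as bytes, for example `bf.add(bytes([0xE8]))`, after which `bytes([0xE8]) in bf` evaluates to true. A negative answer is always exact, while a positive answer may be a false positive.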
If all k bits are 1, we can claim with high probability that the element is in the set. Note that false positives (i.e., an element is falsely determined to be in the filter while it is actually not) are possible, because the bits at these positions may have been set to 1 during the insertion of other elements. The false positive rate or probability (FPR) of a Bloom filter is approximately [18]

$$\mathrm{FPR} = \left(1 - e^{-kn/m}\right)^k \tag{5}$$

where $n$ is the number of elements already inserted. Taking the derivative with respect to $k$, the FPR is minimized at

$$k = \frac{m}{n} \ln 2 \tag{6}$$

which gives

$$\mathrm{FPR} = \left(\frac{1}{2}\right)^k \tag{7}$$

It is easy to verify that for a 1% false positive rate with the above optimal value of $k$, only 9.6 bits are required per element, regardless of the size of the element.

A Bloom filter has three major advantages over other data structures for representing sets (e.g., binary search trees, tries and hash tables). First, it is space efficient: regardless of an element's actual size, only a constant number of bits is needed per element. Second, it takes constant time (i.e., O(k)) to insert or query an element. Third, one can trade false positive rate against space cost depending on the application.

3. Filter-Based Boolean Matching

Bloom filters are used to build the lookup tables that store partial sets of pre-calculated implementable and non-implementable functions of the target PLBs. Equipped with these tables, non-implementable functions are quickly filtered out before SAT-BM is called explicitly. In this section, we present the details of the proposed F-BM method.

3.1 Building the Lookup Table

A modern SAT solver [11] stops as soon as a satisfying assignment is found, but must explore the whole solution space when the problem is unsatisfiable. As a result, the runtime for checking an implementable function (SAT) and a non-implementable function (UNSAT) differs significantly. Figure 2 compares the average runtime for checking implementability over 100,000 Boolean functions (with 7 to 9 inputs) extracted from MCNC benchmarks against a 9-input PLB (PLB1 in Fig. 3) using the state-of-the-art SAT-BM [12]. It shows that SAT-BM is 5X faster for an implementable function than for a non-implementable one. Since Boolean matching is called as a sub-routine in multiple CAD (Computer-Aided Design) tasks, it is beneficial to prune non-implementable functions and perform the time-consuming SAT-BM only for the remaining ones.
Unlike [7], where coarse-grained SAT solving is used for pruning, the proposed F-BM filters out non-implementable functions by simple table lookup, which is more efficient.

There are different ways to build lookup tables storing implementable and non-implementable functions. The most straightforward way is to enumerate all implementable functions of a PLB. However, this is not practical for large PLBs. For a PLB with P inputs and C configuration bits, the number of functions that it can implement could be as large as $P! \cdot 2^C$. For PLB1 in Fig. 3, where P = 9 and C = 32, this number is up to $10^{15}$, which is too large to enumerate.

Instead of brute-force enumeration, we propose to select a set of training circuits and extract the functions that appear frequently in them to build the lookup tables. For each of these functions, we use SAT-BM [12] to pre-compute its implementability and insert it into the tables. As will be shown in Sect. 3.3, Boolean functions in real circuits exhibit similarities across different benchmarks, and therefore information extracted from one set of circuits can be applied to others.

In our experiments, the training set consists of the 10 largest circuits from the MCNC benchmark set (i.e., apex2, des, ex1010, pdc, spla, clma, elliptic, frisc, s38417 and s38584). We extract Boolean functions with 5 to 9 inputs using the ABC command cut -K <input size> -M 1000 [13]. About 2,700,000 distinct functions are extracted in this procedure. For each of these functions, we compute and store up to 10,000 of its permutations. Overall, an upper bound of 3 billion functions is inserted into the lookup tables. The training took two weeks on a Linux server with a Quad-Core Intel Xeon 2.33 GHz CPU and 32 GB DRAM. However, it is performed only once and the tables can be reused thereafter.

To implement such large lookup tables efficiently in both memory and runtime, we use the Bloom filter described in Sect. 2.2.
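Storing permutations of each training function means generating truth tables of the same function with its inputs reordered. The sketch below shows one way to do that. The function names, the integer truth-table encoding (bit j holds the function value for the assignment whose i-th variable is bit i of j), and the choice to take the first `limit` permutations in lexicographic order are all assumptions made for illustration, since the text does not say which permutations are kept.

```python
from itertools import islice, permutations


def permute_truth_table(tt: int, n: int, perm: tuple) -> int:
    """Truth table of g(x_0..x_{n-1}) = f(x_perm[0], ..., x_perm[n-1]).

    Bit j of `tt` holds f at the assignment whose i-th variable is bit i of j.
    """
    out = 0
    for j in range(1 << n):
        # Build the assignment that f sees when g is evaluated at j.
        src = 0
        for i in range(n):
            if (j >> perm[i]) & 1:
                src |= 1 << i
        if (tt >> src) & 1:
            out |= 1 << j
    return out


def permutation_variants(tt: int, n: int, limit: int = 10_000) -> set:
    """Distinct truth tables obtained from at most `limit` input permutations."""
    return {
        permute_truth_table(tt, n, p)
        for p in islice(permutations(range(n)), limit)
    }
```

For a 9-input function there are 9! = 362,880 input orderings, so a cap such as 10,000 keeps the number of stored variants per function bounded. Symmetric functions collapse to far fewer distinct truth tables, which the set automatically deduplicates.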
Figure 4 compares the memory cost of a Bloom filter-based table and a hash table-based table for storing the training results. For the Bloom filter, we set a 1% false positive rate, so 9.6 bits are required per function. For the hash table, the memory cost grows exponentially with function input size (e.g., 512 bits are required to store a 9-input Boolean function). Note that the extra memory needed to maintain the hash table data structure is ignored. Clearly, the hash table-based implementation quickly reaches the memory limit of a desktop PC (typically less than 4 GB) as the input size increases. In contrast, the capacity of the Bloom filter-based lookup table can be increased further by trading false positive rate against table size.

Fig. 2 Average runtime for SAT-BM with various input sizes.
Fig. 3 PLB structures used in experiments.
Fig. 4 Memory requirements for the lookup table generated by the training set (Bloom filter vs. hash table).
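The 9.6 bits per function quoted above follows from equations (6) and (7): substituting $k = (m/n)\ln 2$ into $\mathrm{FPR} = (1/2)^k$ gives $m/n = -\ln(\mathrm{FPR})/(\ln 2)^2$. The short sketch below evaluates this and contrasts it with the $2^K$ bits of an explicit K-input truth table. The function names are hypothetical and chosen only for this illustration.

```python
import math


def bloom_parameters(fpr: float):
    """Bits per element (m/n) and optimal hash count k for a target FPR.

    Uses k = (m/n) ln 2 and FPR = (1/2)^k, i.e. m/n = -ln(FPR) / (ln 2)^2.
    """
    bits_per_element = -math.log(fpr) / math.log(2) ** 2
    k = bits_per_element * math.log(2)
    return bits_per_element, k


def truth_table_bits(num_inputs: int) -> int:
    """Bits needed to store a function's full truth table explicitly."""
    return 1 << num_inputs


bits, k = bloom_parameters(0.01)
print(f"Bloom filter: {bits:.1f} bits per function, optimal k = {k:.2f}")
print(f"Explicit 9-input truth table: {truth_table_bits(9)} bits")
```

The optimal k is not an integer, so a practical filter would round it, at the cost of a slightly different false positive rate than the 1% target.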

3.2 Selection of Hash Functions

To achieve a scalable implementation with a low false positive rate, the Bloom filter used in F-BM needs to be carefully customized, and the key design factor is the selection of adequate hash functions. Ideally, the hash functions of a Bloom filter are perfectly random, i.e., input keys are hashed to each table position with exactly the same probability. In our application, where a function is represented by its truth table (a 0/1 bit string), different functions often have very similar representations. Therefore, a good hash function should also magnify small differences between input keys, e.g., a one-bit difference should lead to unrelated hash values.

We compare four commonly used hash functions in our implementation: simple (used in a popular open source Bloom filter project [19]), hash2 and hashlittle2 (two general hash functions from Bob Jenkins [20]) and sha256 (a cryptographic hash function [21]). A Bloom filter is implemented using each of these hash functions. For simple, hash2 and hashlittle2, we pass different initial values to generate the independent hash functions. For sha256, we extract different bit-fields from its 256-bit output as independent hash values.

We quantitatively evaluate the randomness of a hash function used in a Bloom filter that stores functions in truth table representation. For a Bloom filter of size $m$ with $k$ perfectly random hash functions, after adding $n$ elements, the expected number of bits that are set to 1 exactly $i$ times is [9]

$$E_i = m \binom{kn}{i} \left(\frac{1}{m}\right)^{i} \left(1 - \frac{1}{m}\right)^{kn - i} \tag{8}$$

Table 1 compares the randomness of the four hash functions by inserting the 1,000 most frequently occurring 9-input functions extracted from the training set, with settings k = 4, m = 10,000 and n = 1,000. The Randomness columns give the number of bits that are set to 1 exactly $i$ times.
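The ideal bit-occupancy profile of equation (8) is simply m times a binomial probability over the kn hash writes, so it can be evaluated directly. The sketch below does this for the settings quoted for Table 1 (k = 4, m = 10,000, n = 1,000). The function name is hypothetical, and the printed values are computed from the formula, not taken from the table.

```python
from math import comb


def expected_bits_set_exactly(i: int, m: int, k: int, n: int) -> float:
    """E_i of equation (8): expected number of bits written exactly i times."""
    p = 1.0 / m
    return m * comb(k * n, i) * p**i * (1.0 - p) ** (k * n - i)


if __name__ == "__main__":
    for i in range(6):
        e_i = expected_bits_set_exactly(i, m=10_000, k=4, n=1_000)
        print(f"i = {i}: E_i = {e_i:.1f}")
```

Comparing a measured histogram of per-bit write counts against these values gives a simple measure of how far a candidate hash function deviates from ideal behavior.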
The data for perfect is calculated by (8) and represents the ideal case. The closer a hash function's numbers are to perfect, the more random it is. In addition, the Time column shows the average runtime to insert one function. Finally, hash2 is chosen to implement the Bloom filter of F-BM as the best trade-off between randomness and efficiency.

3.3 Coverage of the Filter

We now evaluate the coverage of the Bloom filter-based lookup tables generated by the training set described in Sect. 3.1. A 9-input PLB (PLB1 in Fig. 3) is used for testing. Two Bloom filter-based lookup tables are maintained from the training step, one for implementable functions (BF-SAT) and the other for non-implementable functions (BF-UNSAT). The testing set consists of Boolean functions with 5 to 9 inputs extracted from the other 10 MCNC benchmarks (alu4, apex4, bigkey, diffeq, dsip, ex5p, misex3, s298, seq, tseng) using the ABC command cut -K <input size> -M. For each input size, 100,000 functions are randomly selected from the testing set. The following four cases are analyzed:

S-hit: Implementable functions found in BF-SAT;
S-miss: Implementable functions not found in BF-SAT;
U-hit: Non-implementable functions found in BF-UNSAT;
U-miss: Non-implementable functions not found in BF-UNSAT.

Column F-BM of Table 2 shows the result: over 90% of implementable functions are found in BF-SAT (row S-coverage). Of the functions that are not found in either filter, only 20% are implementable (i.e., the percentage of S-miss over the sum of S-miss and U-miss). In other words, any function that is not found in either filter has a high probability of being non-implementable, and one can drop it (i.e., treat it as non-implementable) without significantly degrading the quality.

To further explore the trade-off between runtime and quality, we propose a learn strategy, F-BM-L, which expands the existing Bloom filter-based lookup tables by adding newly trained functions at runtime.
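The coverage percentages and the 20% figure quoted above can be recomputed from the hit and miss counts that Table 2 reports for F-BM. The sketch below does that arithmetic. The counts are copied from Table 2, while the dictionary and function names are hypothetical.

```python
# Counts from the F-BM columns of Table 2: (S-hit, S-miss, U-hit, U-miss).
TABLE2_FBM = {
    7: (80_460, 2_219, 11_621, 5_700),
    8: (69_886, 3_465, 15_680, 10_969),
    9: (48_048, 4_851, 20_022, 27_079),
}


def coverage(hit: int, miss: int) -> float:
    """Fraction of functions of one kind that the filter already contains."""
    return hit / (hit + miss)


def implementable_share_of_misses(table: dict) -> float:
    """S-miss over the sum of S-miss and U-miss, across all input sizes."""
    s_miss = sum(row[1] for row in table.values())
    u_miss = sum(row[3] for row in table.values())
    return s_miss / (s_miss + u_miss)


for inputs, (s_hit, s_miss, u_hit, u_miss) in TABLE2_FBM.items():
    print(
        f"{inputs}-input: S-coverage {coverage(s_hit, s_miss):.1%}, "
        f"U-coverage {coverage(u_hit, u_miss):.1%}"
    )
print(f"Implementable share of misses: {implementable_share_of_misses(TABLE2_FBM):.1%}")
```

Aggregated over the three input sizes, the implementable share of misses comes out slightly below 20%, consistent with the statement in the text. The share differs by input size and is lowest for 9-input functions.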
A testing function found in neither SAT-BF nor UNSAT-BF (i.e., the BF-SAT and BF-UNSAT tables) has definitely not been trained before. Therefore, we propose to train such functions using SAT-BM and add the results to the corresponding tables. In this manner, we are able to capture special characteristics of the testing circuits. A scalable Bloom filter [10], which can grow dynamically when the number of inserted items exceeds the pre-defined filter size, can be used to implement such a strategy. The F-BM-L columns of Table 2 show that the coverage for non-implementable 9-input functions improves by over 17 percentage points (from 42.5% to 60.0%) using this strategy.

Table 1 Randomness and runtime efficiency of hash functions.
Hash function | i = 0 | i = 1 | i = 2 | i = 3 | i = 4 | i = 5 | Error | Time (µs)
perfect       |       |       |       |       |       |       |       |
simple        |       |       |       |       |       |       |       | 18
hash2         |       |       |       |       |       |       |       | 4
hashlittle2   |       |       |       |       |       |       |       | 4.2
sha256        |       |       |       |       |       |       |       | 8.4

Table 2 Coverage of the Bloom filter-based lookup table.
           | F-BM                        | F-BM-L
Type       | 7-input | 8-input | 9-input | 7-input | 8-input | 9-input
S-hit      | 80,460  | 69,886  | 48,048  | 81,482  | 71,100  | 49,257
S-miss     | 2,219   | 3,465   | 4,851   | 1,197   | 2,251   | 3,642
S-coverage | 97.3%   | 95.3%   | 90.8%   | 98.6%   | 97.0%   | 93.1%
U-hit      | 11,621  | 15,680  | 20,022  | 14,111  | 18,852  | 28,212
U-miss     | 5,700   | 10,969  | 27,079  | 3,210   | 7,797   | 18,889
U-coverage | 67.1%   | 58.8%   | 42.5%   | 81.5%   | 70.7%   | 60.0%

Table 3 summarizes, for each combination of query results in the two Bloom filter-based lookup tables, whether SAT-BM is called.

Table 3 Actions based on dual Bloom filters.
SAT-BF | UNSAT-BF | Action F-BM | Action F-BM-L | Indication
Yes    | Yes      | Check       | Check         | Definitely false positive
Yes    | No       | Check       | Check         | Highly possible SAT
No     | Yes      | Drop        | Drop          | Highly possible UNSAT
No     | No       | Drop        | Check         | Definitely not trained, highly possible UNSAT

Action Check makes an explicit call to SAT-BM to decide a function's implementability, while action Drop immediately classifies the function as non-implementable, which is correct with high probability. For F-BM, only one check against SAT-BF is needed, and UNSAT-BF need not be kept. For F-BM-L, however, both lookup tables must be kept in order to identify untrained functions. Compared to F-BM, F-BM-L has better pruning quality (i.e., fewer implementable functions are erroneously pruned).

4. Re-Synthesis with F-BM

The post-mapping re-synthesis described in [1], which minimizes area (i.e., LUT number) under the same logic depth constraint, is adopted as an application of Boolean matching to show the effectiveness of the proposed F-BM. Algorithm 1 shows the pseudo-code of one iteration of the re-synthesis procedure. The algorithm works greedily: it takes a circuit (network) mapped to 3-LUTs by ABC [13] and scans the combinational portion of the circuit in topological order (i.e., each circuit node is scanned strictly after all its fanin nodes have been scanned). During the scan, new logic blocks (i.e., cuts) are generated by enumerating and combining the logic blocks at the inputs of a LUT, which is called cut enumeration [17]. In line 8, each logic block is checked for implementability against PLB1 and PLB2, shown in Fig. 3, by calling the Boolean matching procedure.
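The actions of Table 3 amount to a small decision procedure wrapped around the SAT-based matcher. The sketch below is one possible reading of that table, not the authors' code: the function name, the callable parameters standing in for the two filters and for SAT-BM, and the `on_learned` hook used to record newly trained functions are all hypothetical.

```python
from typing import Callable, Optional


def filter_based_match(
    func: bytes,
    in_sat_bf: Callable[[bytes], bool],
    in_unsat_bf: Callable[[bytes], bool],
    sat_bm: Callable[[bytes], Optional[object]],
    learn: bool = False,
    on_learned: Optional[Callable[[bytes, bool], None]] = None,
) -> Optional[object]:
    """Return a PLB configuration, or None if the function is dropped or UNSAT.

    learn=False follows the F-BM column of Table 3, learn=True the F-BM-L column.
    """
    if in_sat_bf(func):
        # Rows Yes/Yes and Yes/No: always confirm with an explicit SAT-BM call.
        return sat_bm(func)
    if not learn:
        # F-BM: anything missing from the SAT filter is dropped, so the
        # UNSAT filter is never consulted.
        return None
    if in_unsat_bf(func):
        # Row No/Yes: highly likely UNSAT, drop without calling SAT-BM.
        return None
    # Row No/No under F-BM-L: untrained function, so train it now.
    config = sat_bm(func)
    if on_learned is not None:
        on_learned(func, config is not None)
    return config
```

With Bloom filter objects supporting the `in` operator, the two membership callables can be passed as `sat_filter.__contains__` and `unsat_filter.__contains__`, and `on_learned` can insert the function into whichever filter matches the SAT-BM outcome.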
When the Boolean matcher finds an implementable case, the logic block is replaced by the corresponding PLB structure if the replacement reduces the number of LUTs without increasing the logic depth. The algorithm terminates after several full scans of all LUTs (e.g., a number of iterations set by the user), or when no LUT can be further reduced.

Algorithm 1 Resynthesis-one-iteration(network)
 1: for all node of network in topological order do
 2:   cutset = enumerateKFeasibleCut(node)
 3:   for all cut in cutset do
 4:     for all PLB H in PLB library do
 5:       if |cut| ≤ |H| then
 6:         continue {No area reduction}
 7:       end if
 8:       impl = booleanMatching(cut, H)
 9:       if impl ≠ NULL then
10:         updateNetwork(cut, H)
11:       end if
12:     end for
13:   end for
14: end for

Two versions of the re-synthesizer are implemented: one uses the state-of-the-art SAT-BM [12] and the other uses F-BM (including both strategies described in Table 3). To assess the quality of the training set obtained from MCNC benchmarks, two other benchmark sets (IWLS 2005 [14] and Industrial designs) are also tested.

Table 4 compares the re-synthesis results of the different approaches. Column Reduced LUT # gives the number of LUTs removed during re-synthesis, and column Total LUT # gives the number of LUTs in the circuit after re-synthesis. Compared with the re-synthesizer using SAT-BM, the one using F-BM (where only implementable functions are stored in the Bloom filter) is 80X faster with only 0.5% more area on average. In other words, the F-BM-based re-synthesizer achieves a speedup of nearly two orders of magnitude with negligible area overhead. In terms of the number of LUTs reduced, the F-BM-based re-synthesizer removes 11% fewer LUTs than the SAT-BM-based one, because some implementable functions may be pruned. To achieve the same area reduction, the learn strategy F-BM-L (where both implementable and non-implementable functions are kept) can be adopted. Compared to the SAT-BM-based re-synthesizer, the F-BM-L-based one is still 5X faster.
Note that 3 GB of memory is used for both the F-BM and F-BM-L strategies to make the comparison fair.

To further verify the effectiveness of F-BM, Fig. 5 shows the number of reduced LUTs versus runtime for the largest benchmark circuit, leon3 (a microprocessor core with half a million 3-LUTs). The F-BM-based re-synthesizer clearly converges much faster than the SAT-BM-based one. In other words, within the same time, the F-BM-based re-synthesizer reduces more LUTs. As shown, the F-BM-based re-synthesizer obtains 2X more LUT reduction with a 1500 s timeout.

The main reason why F-BM-L is much slower than F-BM is that F-BM-L requires many more calls to the SAT solver. From Table 2, we observe that the majority of untrained functions (i.e., misses) are UNSAT. To prune those functions, F-BM-L needs explicit UNSAT checking, whereas for F-BM simple table lookups are enough. Figure 5 also gives some intuition about the relationship between area reduction and runtime. Along the horizontal axis, the slope of the area reduction curve becomes less steep, indicating that more time is spent on UNSAT checking. In other words, the last few area reductions are hard to obtain. F-BM-L is therefore intended for area-critical applications. For most applications, F-BM achieves the best trade-off between area and runtime.

Table 4 Re-synthesis (SAT-BM vs. F-BM).
Column groups: Runtime (s) [SAT-BM, F-BM, Speedup, F-BM-L, Speedup]; Reduced LUT # [SAT-BM, F-BM, Ratio, F-BM-L, Ratio]; Total LUT # [SAT-BM, F-BM, Ratio, F-BM-L, Ratio].
Benchmarks: MCNC (alu4, diffeq, ex5p, s298, seq); Industrial (five Ex designs); IWLS (two leon designs including leon3, leon3mp, netcard).
Geomean speedup: 80x (F-BM), 5.16x (F-BM-L).

Fig. 5 LUT # reduction vs. runtime for circuit leon3.

5. Conclusions and Future Work

In this paper, we have presented F-BM, which accelerates Boolean matching using Bloom filter-based lookup tables to quickly prune non-implementable functions with affordable memory. Using post-mapping re-synthesis that minimizes area without increasing logic depth as an application, experiments on the MCNC, IWLS 2005 and Industrial design benchmark sets show that the re-synthesizer with F-BM, using 3 GB of memory, is 80X faster than the one with the state-of-the-art SAT-BM [12], with only 0.5% more area. To achieve the same area, F-BM with the learn strategy is still more than 5X faster at the same memory cost.

In the future, we will target functions with more inputs and explore different structures of the Bloom filter. For example, we observe that the top 10% most common 9-input Boolean functions in MCNC circuits cover 50% of 9-input cuts. We can therefore design a multi-level Bloom filter with different false positive rates at different levels, e.g., a lower false positive rate for the 10% most frequent functions and a higher false positive rate for the rest, to trade accuracy for memory. In addition, we plan to apply F-BM to other CAD tasks, such as technology mapping and physical synthesis. Finally, we will seek methods to store matched configurations as well as satisfiability information to further reduce the number of SAT calls.

References

[1] A. Ling, D. Singh, and S. Brown, "FPGA technology mapping: A study of optimality," Proc. ACM Des. Autom. Conf., pp. , June
[2] A. Ling, D. Singh, and S. Brown, "FPGA PLB architecture evaluation and area optimization techniques using Boolean satisfiability," IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol.26, no.7, pp. , July
[3] J. Cong and K. Minkovich, "Improved SAT-based Boolean matching using implicants for LUT-based FPGAs," Proc. ACM Int. Symp. on FPGAs, pp. , Feb
[4] J. Cong and Y.Y. Hwang, "Boolean matching for LUT-based logic blocks with applications to architecture evaluation and technology mapping," IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol.20, no.9, pp. , Sept
[5] L. Benini and D. Micheli, "A survey of Boolean matching techniques for library binding," ACM Trans. Des. Autom. Electron. Syst., vol.2, no.3, pp. , July
[6] A. Abdollahi and M. Pedram, "A new canonical form for fast Boolean matching in logic synthesis and verification," Proc. ACM Des. Autom. Conf., pp. , June
[7] S. Safarpour, A. Veneris, G. Baeckler, and R. Yuan, "Efficient SAT-based Boolean matching for FPGA technology mapping," Proc. ACM Des. Autom. Conf., pp. , July
[8] A. Mishchenko, R.K. Brayton, and S. Chatterjee, "Boolean factoring and decomposition of logic networks," Proc. ACM Int. Conf. Comput.-Aided Des., pp.38-44, Nov
[9] B. Bloom, "Space/time trade-offs in hash coding with allowable errors," Commun. ACM, vol.13, no.7, pp. ,
[10] P.S. Almeida, C. Baquero, N. Preguica, and D. Hutchison, "Scalable Bloom filters," Inf. Process. Lett., vol.101, no.6, pp. , March
[11] N. Een and N. Sorensson, "Minisat v2.0 (beta)," Solver description, SAT Race 2006,
[12] Y. Hu, V. Shih, R. Majumdar, and L. He, "Exploiting symmetry in SAT-based Boolean matching for heterogeneous FPGA technology mapping," Proc. ACM Int. Conf. Comput.-Aided Des., pp. , Nov
[13] ABC: A system for sequential synthesis and verification, alanmi/abc/
[14] IWLS 2005 Benchmarks, html
[15] A. Kennings, K. Vorwerk, A. Kundu, V. Pevzner, and A. Fox, "FPGA technology mapping with encoded libraries and staged priority cuts," Proc. ACM Int. Symp. on FPGAs, pp. , Feb
[16] T. Larrabee, "Test pattern generation using Boolean satisfiability," IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol.11, no.1, pp.6-22, Aug
[17] J. Cong, C. Wu, and Y. Ding, "Cut ranking and pruning: Enabling a general and efficient FPGA mapping solution," Proc. ACM Int. Symp. on FPGAs, pp.29-35, Feb
[18] Bloom filter, filter
[19]
[20]
[21]

Chun Zhang received his B.E. degree from the Microelectronics Department of Fudan University, Shanghai, China. Since 2005, he has been working on his Ph.D. degree at the State Key Lab of ASICs and Systems, School of Microelectronics, Fudan University, Shanghai, China. He is now a visiting student at the Electronic Design Automation Laboratory, Electrical Engineering Department, University of California, Los Angeles. His main research interests include computer-aided design for integrated circuits, design and architectures of field-programmable gate arrays (FPGAs), FPGA error modeling and robust logic synthesis algorithms.

Yu Hu received his B.E. and M.E. degrees in computer science from Tsinghua University, Beijing, China, in 2002 and 2005, respectively, and his Ph.D. degree from the Electrical Engineering Department, University of California, Los Angeles. Since 2010, he has been an Assistant Professor with the Department of Electrical and Computer Engineering at the University of Alberta. His current research interests include CAD tools and architectures for field-programmable gate arrays (FPGAs). Dr. Hu was the recipient of the Outstanding Graduate Student Award in 2005 from Tsinghua University and of the Best Contribution Award of the IEEE Programming Challenge at the International Workshop on Logic and Synthesis in 2008.

Lingli Wang (IET/IEE IEEE member) received his Ph.D. from the School of Engineering, Napier University, UK. He worked at the Altera European Technology Center for 4 years. Currently he is an Associate Professor with the State Key Lab of ASICs and Systems, School of Microelectronics, Fudan University, Shanghai, China. His research interests include FPGA design and optimization, logic synthesis, reconfigurable computing, and quantum computing.

Lei He (IEEE M'99, SM'08) is a professor in the Electrical Engineering Department, University of California, Los Angeles (UCLA), and was a faculty member at the University of Wisconsin, Madison, from 1999. He has also held visiting or consulting positions with Cadence, Empyrean Soft, Hewlett-Packard, Intel, and Synopsys, and was a technical advisory board member for Apache Design Solutions and Rio Design Automation. Dr. He obtained his Ph.D. degree in computer science from UCLA. His research interests include modeling and simulation, VLSI circuits and systems, and cyber physical systems. He has published one book and over 200 technical papers, with 12 best paper nominations, mainly from the Design Automation Conference and the International Conference on Computer-Aided Design, and five best paper or best contribution awards, including the ACM Transactions on Electronic System Design Automation 2010 Best Paper Award.

Jiarong Tong graduated from the Physics Department of Fudan University, Shanghai, China. He is now a full professor and doctoral supervisor with the State Key Lab of ASICs and Systems, School of Microelectronics, Fudan University, Shanghai, China. His main research areas include architectures and CAD techniques for FPGAs and digital circuit design. He has published two books and over 60 technical papers.


algorithms, i.e., they attempt to construct a solution piece by piece and are not able to offer a complete solution until the end. The FM algorithm, l The FMSAT Satisfiability Solver: Hypergraph Partitioning meets Boolean Satisfiability Arathi Ramani, Igor Markov framania, imarkovg@eecs.umich.edu February 6, 2002 Abstract This report is intended to present

More information

Design Diagnosis Using Boolean Satisfiability

Design Diagnosis Using Boolean Satisfiability Design Diagnosis Using Boolean Satisfiability Alexander Smith Andreas Veneris Anastasios Viglas University of Toronto University of Toronto University of Toronto Dept ECE Dept ECE and CS Dept CS Toronto,

More information

Efficient Test Compaction for Combinational Circuits Based on Fault Detection Count-Directed Clustering

Efficient Test Compaction for Combinational Circuits Based on Fault Detection Count-Directed Clustering Efficient Test Compaction for Combinational Circuits Based on Fault Detection Count-Directed Clustering Aiman El-Maleh, Saqib Khurshid King Fahd University of Petroleum and Minerals Dhahran, Saudi Arabia

More information

Fast Dual-V dd Buffering Based on Interconnect Prediction and Sampling

Fast Dual-V dd Buffering Based on Interconnect Prediction and Sampling Based on Interconnect Prediction and Sampling Yu Hu King Ho Tam Tom Tong Jing Lei He Electrical Engineering Department University of California at Los Angeles System Level Interconnect Prediction (SLIP),

More information

Profiling-Based L1 Data Cache Bypassing to Improve GPU Performance and Energy Efficiency

Profiling-Based L1 Data Cache Bypassing to Improve GPU Performance and Energy Efficiency Profiling-Based L1 Data Cache Bypassing to Improve GPU Performance and Energy Efficiency Yijie Huangfu and Wei Zhang Department of Electrical and Computer Engineering Virginia Commonwealth University {huangfuy2,wzhang4}@vcu.edu

More information

BoolTool: A Tool for Manipulation of Boolean Functions

BoolTool: A Tool for Manipulation of Boolean Functions BoolTool: A Tool for Manipulation of Boolean Functions Petr Fišer, David Toman Czech Technical University in Prague Department of Computer Science and Engineering Karlovo nám. 13, 121 35 Prague 2 e-mail:

More information

Resynthesis of Combinational Logic Circuits for Improved Path Delay Fault Testability Using Comparison Units

Resynthesis of Combinational Logic Circuits for Improved Path Delay Fault Testability Using Comparison Units IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 9, NO. 5, OCTOBER 2001 679 Resynthesis of Combinational Logic Circuits for Improved Path Delay Fault Testability Using Comparison

More information