arxiv: v1 [cs.ar] 31 Aug 2017


Advanced Datapath Synthesis using Graph Isomorphism

Cunxi Yu (1), Mihir Choudhury (2), Andrew Sullivan (2), Maciej Ciesielski (1)
(1) ECE Department, University of Massachusetts, Amherst
(2) IBM T.J. Watson Research Center
ycunxi@umass.edu, choudhury@us.ibm.com

Abstract - This paper presents an advanced DAG-based algorithm for datapath synthesis that targets area minimization using logic-level resource sharing. The problem of identifying common specification logic is formulated as an unweighted graph isomorphism problem, in contrast to the weighted graph isomorphism problem that arises with AIGs. In the context of gate-level datapath circuits, our algorithm solves the unweighted graph isomorphism problem in linear time. The experiments are conducted within an industrial synthesis flow that includes the complete high-level synthesis, logic synthesis, and placement and route procedures. Experimental results show a significant runtime improvement compared to the existing datapath synthesis algorithms.

Index Terms - Logic synthesis, datapath synthesis, resource sharing, graph isomorphism

I. INTRODUCTION

Due to a large demand for computing, the complexity of hardware systems has been increasing significantly, raising the challenges in design, verification and synthesis to a new level. In the last ten years, there has been a push to change the optimization algorithms of EDA tools to improve their performance in terms of timing, area and power. Particularly affected are datapath modules in microprocessors and embedded systems, which play an important role in computation and put new demands on logic synthesis.

Traditional datapath synthesis flows include extraction of arithmetic operations from RTL code, high-level synthesis (HLS), logic synthesis, and technology mapping [1][2]. Datapath synthesis techniques have mainly been discussed in the context of traditional high-level synthesis research, such as resource sharing, scheduling and binding, which rely on Data Flow Graph (DFG) representations [3][4][5]. Arithmetic operations such as addition, multiplication, shifting and comparison, as well as control logic, are extracted and modeled as block modules.
At the same time, methods such as carry-prefix and recoded-partial-product based techniques are applied for delay optimization [6]. The remaining part of the design flow produces the technology-mapped netlist using a standard-cell library. Even though most of the datapath synthesis effort is spent in the high-level synthesis stage, there are many unexplored opportunities in bit-level optimization that could improve the results of high-level synthesis.

Recently, high-level optimization techniques, such as resource sharing, have been applied in logic synthesis to overcome some of the limitations of datapath synthesis for standard-cell designs. Specifically, a Directed Acyclic Graph (DAG) based logic synthesis technique that targets area minimization of datapath designs was proposed in [7]. It is a structural optimization technique implemented using And-Inv-Graphs (AIGs) [8], which offers bit-level resource sharing. The method includes three steps: 1) identifying sub-circuit candidates by searching for multiplexer-equivalent AIG structures; 2) identifying common specification logic using graph isomorphism; and 3) finalizing the optimization by relocating multiplexers across the common logic.

The most critical part of the technique is step 2, which solves the problem of identifying common logic and performing Boolean matching. In fact, finding an isomorphism in AIGs is a weighted graph isomorphism problem [7]. This is because, to represent an arbitrary Boolean network using AND nodes, each edge must indicate whether it is an inversion or a plain wire, which classifies an AIG as a weighted graph. Note that solving graph isomorphism in weighted graphs is much more complex than in unweighted graphs [9].

Although the technique of [7] offers new directions in datapath synthesis and promises area reduction, it has some limitations. First, the general graph isomorphism problem belongs to NP, but it is not known whether it is in P or NP-complete. Despite the reduction in complexity offered by DAGs, solving a weighted DAG isomorphism could still cause memory and runtime explosion.
Furthermore, since that technique is implemented on AIGs, it requires transformations between the gate-level network and the AIG representation to produce the technology-mapped netlist. These transformations could affect the optimizations performed by the previous synthesis procedures.

In this work, we develop new algorithms to overcome these limitations. Specifically, we make the following contributions:

1) We propose a novel algorithm for identifying common specification logic that directly supports arbitrary standard-cell netlists, without using AIGs, which maintains the optimizations performed by other synthesis techniques.
2) Instead of solving a weighted graph isomorphism problem, the proposed algorithm formulates the problem as unweighted graph isomorphism, which significantly reduces the complexity of solving the problem.
3) The runtime complexity comparison between the AIG-based algorithm [7] and the one presented here is provided using illustrative examples (Section 3.1), and demonstrated using large datapath designs (Figure 7).
4) The proposed algorithm allows approximate isomorphism classes to be optimized (Section 3.2).
5) This approach has been evaluated in two complete IBM synthesis flows, covering high-level synthesis, logic synthesis and place and route (P&R), which allows meaningful comparisons with other techniques. The experiments were performed using a 14nm technology library.

II. BACKGROUND

A. Boolean Network

A Boolean network can be represented using a directed acyclic graph (DAG) with nodes representing logic gates and directed edges representing wires connecting the gates. If the network is sequential, the memory elements are assumed to be D flip-flops with known initial states. In this work, we only consider combinational logic optimization, which means the flip-flops are treated as primary inputs (PIs) and primary outputs (POs) of the sub-circuits.

In the AIG [8], each node has either zero or two incoming edges. A node with no incoming edges is a primary input. Primary outputs are represented using special output nodes without output edges. Each internal node represents a Boolean AND function. The combinational logic of an arbitrary Boolean network can be transformed into an AIG [10], while the edges can optionally provide inversion. Hence, an AIG is considered a weighted DAG.

Alternatively, the Boolean network can be directly represented using the gate-level netlist. The primary inputs, primary outputs, and flip-flops are constructed based on the standard-cell netlist. Each logic gate is a vertex in the DAG. Logic gates with the same logic function are considered to be the same vertex type. This DAG has only one type of edge, i.e., it is an unweighted DAG, and its typed vertices provide more uniqueness for checking isomorphism.

The comparison between the AIG and our representation is shown in Figure 1. The actual gate-level netlist, including one AOI21 and two NAND2 gates, is shown in Figure 1(a), and its AIG representation is shown in Figure 1(b). The AIG requires four AIG nodes with four inversion edges and five non-inversion edges to represent this netlist. In contrast, the proposed representation in Figure 1(c) has three nodes of two types, and all edges are identical.

Fig. 1: (a) Gate-level netlist; (b) AIG representation; (c) the proposed representation, with node 1 representing the AOI21, and node 2 and node 3 representing the two NAND2 gates.

There are several advantages of the representation shown in Figure 1(c) that we adopted in our work: 1) it avoids the transformation between different Boolean network representations, preserving the original structure and therefore the optimizations done in previous stages; 2) it converts the weighted graph isomorphism problem into an unweighted graph isomorphism problem, improving the runtime of identifying common specification logic.

B. Common Specification Logic

Two combinational circuits are considered common specification logic if they have the same specification [11].
In this work, common specification logic has to be identified in the following context: given the output boundaries of two logic cones, find the input boundaries that result in maximum common logic such that the signals of the input boundaries match. Most techniques for checking whether two designs conform to a common specification are based on combinational equivalence checking (CEC). This problem has been addressed using BDDs [12], SAT [13][14], AIGs [10], etc. However, those methods cannot be applied in this work for the following reasons: 1) the input boundaries of the designs are unknown; and 2) even if the input boundaries are detected, the relationship (Boolean matching) of those inputs is unknown. Furthermore, it is well known that functional methods such as BDDs and SAT are not scalable for gate-level arithmetic designs, such as multipliers.

C. Graph Isomorphism

In graph theory, an isomorphism of graphs G and H is a bijection between the vertex sets V(G) and V(H), f : V(G) → V(H), such that any two vertices u and v of G are adjacent in G if and only if f(u) and f(v) are adjacent in H. Besides the mathematical research on graph isomorphism, the algorithmic approach to graph isomorphism has been widely used in computer engineering, e.g., Boolean matching [15] and program similarity checking [16]. In general, graph isomorphism is applicable to undirected, unlabeled, unweighted graphs. It is known to be in NP, but it is neither known to be NP-complete nor known to be solvable in polynomial time by a deterministic algorithm. However, in the context of Boolean networks, this problem could be solved efficiently by heuristic algorithms. In this work, we propose a novel algorithm that reduces the number of reordering operations by employing the fanin-fanout information of each node (i.e., standard cell) when checking the existence of an isomorphism between two directed acyclic graphs.

III. APPROACH

The overall methodology of our approach consists of three steps. A vector multiplexer is a set of 2-to-1 multiplexers with an identical control signal.
First, vector multiplexers are collected by structurally reverse engineering all the 2-to-1 multiplexers from the gate-level netlist [7] and then classifying them based on their control signals. Note that a multiplexer is eliminated from the collection if any of its data inputs has a fanout. Large multiplexers, such as 64-to-1, are decomposed into 2-to-1 multiplexers [7]. Second, a set of sub-circuits is created based on these vector multiplexers. Each sub-circuit is a combinational logic cone whose primary outputs are the outputs of all multiplexers in the vector multiplexer. These two procedures are pre-processing steps. Third, a multiplexer relocation function is applied to each output of the sub-circuit iteratively. The order of applying multiplexer relocation is sorted by the number of logic gates per multiplexer in the sub-circuit. The original design is updated if the area of the sub-circuit is improved by relocating the multiplexer, i.e., moving the multiplexer backward without changing the functionality of the design. The resulting updated standard-cell netlist is then subjected to the remaining logic synthesis steps and eventually to physical design.

A. Exact Isomorphism Determination

Even though the multiplexer relocation is applied to a sub-circuit that includes vector multiplexers at the primary outputs, the actual relocation is done individually for each multiplexer. The goal of multiplexer relocation is to maximize the sharing of common specification logic in the input cones of the multiplexer, by moving the multiplexer backward. The main challenge is to identify the common specification logic in the sub-circuits created by the pre-processing steps. Specifically, this requires performing common structure identification and Boolean matching. According to the definition of graph isomorphism, the algorithm proposed in [7] determines the isomorphism boundary between two graphs using breadth-first search. To obtain the maximum common logic, a look-ahead heuristic is applied when there are multiple identical choices for constructing the isomorphism.
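To make the breadth-first traversal concrete, the following minimal Python sketch walks two logic cones level by level, starting from the two multiplexer data inputs, and stops at the first level whose gate types differ. It is an illustration under stated assumptions, not the algorithm of [7] nor the one proposed in this paper: it only checks that the multiset of gate types agrees at each level (a necessary condition for isomorphism) and does not decide which gate pairs with which. The netlist encoding, the gate names and types, and the function name common_depth are all hypothetical.

```python
from collections import Counter


def common_depth(cone0, root0, cone1, root1):
    """Count the breadth-first levels on which two cones carry the same gate types.

    Each cone maps a gate name to (gate_type, fanin_names). Names that are
    not keys of the cone are treated as primary inputs and are not expanded.
    """
    frontier0, frontier1 = [root0], [root1]
    depth = 0
    while frontier0 and frontier1:
        types0 = Counter(cone0[g][0] for g in frontier0)
        types1 = Counter(cone1[g][0] for g in frontier1)
        if types0 != types1:
            break  # the type multisets differ: the common logic ends here
        depth += 1
        # Move one level toward the inputs, keeping each gate only once.
        frontier0 = list(dict.fromkeys(
            f for g in frontier0 for f in cone0[g][1] if f in cone0))
        frontier1 = list(dict.fromkeys(
            f for g in frontier1 for f in cone1[g][1] if f in cone1))
    return depth


# Hypothetical cones (assumed for illustration only).
CONE0 = {
    "g0": ("NAND2", ["g1", "g4"]),
    "g1": ("AOI21", ["g2", "g3", "e"]),
    "g2": ("NAND2", ["a", "b"]),
    "g3": ("NAND2", ["c", "d"]),
    "g4": ("NAND2", ["c", "d"]),
}
CONE1 = {
    "g5": ("NAND2", ["g6", "g7"]),
    "g6": ("AOI21", ["g8", "g9", "e"]),
    "g7": ("NAND2", ["c", "d"]),
    "g8": ("NAND2", ["a", "b"]),
    "g9": ("NOR2", ["c", "d"]),  # differs from g3, so level 3 does not match
}

matched_levels = common_depth(CONE0, "g0", CONE1, "g5")
```

In this sketch the first two levels agree and the third does not, so the boundary would lie below the second level. Deciding the actual pairing inside a level, which is where look-ahead becomes necessary, is deliberately left out.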
The look-ahead heuristic could potentially cause exponential runtime and memory explosion, especially in designs with many reconvergent fanouts. In this section, we introduce a novel algorithm to improve the runtime and scalability of identifying common specification logic.

1) Standard-cell based DAG advantages: Instead of using the AIG representation, the standard-cell based representation gives two advantages: 1) optimization effort spent in other stages of the synthesis flow, which may disappear during the transformation between AIG and standard-cell netlist, is maintained; 2) the standard-cell representation significantly reduces the possible choices when checking the existence of an isomorphism. There are three reasons for the second advantage: (a) in each topological level, the total number of possible pairing choices is reduced; (b) edge types no longer need to be considered, which makes the isomorphism problem unweighted; and (c) utilizing the number of inputs and outputs of each standard cell reduces the number of possible choices when checking isomorphism, especially in the representation of logic circuits. We demonstrate these using the examples in Figures 2 and 3.

2) Including side fanout information: Based on the observation shown in Example 1, we can see that providing various types of vertices at each logic level can significantly reduce the total number of pairing attempts for isomorphism determination. Thus, we preserve the fanout information of the standard cells in the vertices. This can significantly improve the runtime for large designs that include many reconvergent fanouts, such as optimized multipliers.

Fig. 2: Determining graph isomorphism using the standard-cell based DAG.

Example 1 (Figure 2): The standard-cell netlist is shown in Figure 2. Signals data0 and data1 are the two inputs to a 2-to-1 multiplexer. Signals a, b, c, d, and e are the primary inputs. In each logic cone, the first two levels of logic include one AOI21 and three NAND2 gates. Each gate is considered a vertex. The determination process starts with g0 and g5. Then, two vectors of vertices are created using breadth-first search, since g0 and g5 are vertices of the same type: V0 = {g1, g4}, V1 = {g7, g6}. To keep the traversed graphs in the same isomorphism class, there exists only one pairing choice, i.e., (g1, g6), (g4, g7). The two vectors are then updated to V0 = {g2, e, g3} and V1 = {g8, e, g9}. Since the entries e are primary inputs, they are paired and eliminated from V0 and V1. Hence, we have two NAND2 vertices in each vector, which gives two pairing options for g2, i.e., (g2, g8) or (g2, g9). However, in the standard-cell based DAG, only one option remains. This is because the AOI21 has two types of inputs: two inputs for the AND and one input for the OR/NOR.
To maintain functional equivalence, g2 must pair with g8, and so g3 must pair with g9. In summary, the total number of possible attempts for determining isomorphism for the first two levels of logic is one.

Fig. 3: Determining graph isomorphism using AIG.

However, determining the maximum isomorphism requires much more effort when using the AIG representation. The AIG representation of this design is shown in Figure 3. According to the algorithm proposed in [7], the first-level logic has two options for pairing node 2, one of which is node 9. The algorithm resolves this using a look-ahead heuristic, which traverses three levels deeper and picks the pairing that gives more common logic. The same situation arises for two further groups of node pairs deeper in the graph. This means that three look-ahead checks and a total of eight attempts are required to identify the same common logic that the standard-cell based DAG finds in a single attempt.

Fig. 4: Illustrative example of utilizing the fanout information of each vertex.

Fig. 5: Approximate isomorphism determination by ignoring inverters. (a) Original design; (b) optimized design using an extra XOR2 gate.

Example 2 (Figure 4): Assume that each logic cone of a 2-to-1 multiplexer includes one XOR4 and four NAND2 gates in the first two levels. Let the number of side fanouts of nets {n0, n1, n2, n3} be {3, 2, 1, 0}, and the number of side fanouts of nets {n4, n5, n6, n7} be {1, 3, 2, 0}. Without the fanout information, the total number of possible pairings is 24 (4!), since the four vertices in the second level are identical. However, if we pair the vertices according to their number of side fanouts, there is only one pairing choice, i.e., (g1, g7), (g2, g8), (g3, g6), and (g4, g9). Although the fanout information can significantly reduce the number of pairings, such cases may not always exist. When they do not, our approach falls back to the look-ahead heuristic pairing process.
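The effect of the side-fanout information in Example 2 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name pair_by_fanout and the dictionary encoding are hypothetical, and the side-fanout counts are assumed values chosen to be consistent with the pairing stated in Example 2. Gates are paired only when their fanout count is unique within each cone; all others are returned for the regular isomorphism check.

```python
from collections import Counter
from math import factorial


def pair_by_fanout(left, right):
    """Pair same-type gates of two cones whose side-fanout count is unique.

    left and right map a gate name to its number of side fanouts.
    Returns (pairs, left_rest, right_rest); the rest lists hold gates that
    could not be paired by fanout alone.
    """
    left_counts = Counter(left.values())
    right_counts = Counter(right.values())
    # Gates of the right cone whose fanout count occurs exactly once there.
    unique_right = {f: g for g, f in right.items() if right_counts[f] == 1}

    pairs, left_rest = [], []
    for gate, fanout in left.items():
        if left_counts[fanout] == 1 and fanout in unique_right:
            pairs.append((gate, unique_right[fanout]))
        else:
            left_rest.append(gate)
    paired_right = {r for _, r in pairs}
    right_rest = [g for g in right if g not in paired_right]
    return pairs, left_rest, right_rest


# Second-level NAND2 gates of the two cones with assumed side-fanout counts.
cone0 = {"g1": 3, "g2": 2, "g3": 1, "g4": 0}
cone1 = {"g6": 1, "g7": 3, "g8": 2, "g9": 0}

pairs, rest0, rest1 = pair_by_fanout(cone0, cone1)
blind_pairings = factorial(len(cone0))  # candidate pairings without fanout data
```

With four distinct counts on each side, every gate is paired directly and nothing is left for look-ahead. Gates that share a fanout count would end up in the leftover lists and still need the regular check.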
B. Approximate Isomorphism Determination

In addition to considering exactly isomorphic graphs as common specification logic, a novel approximate isomorphism determination approach is developed in this work. One observation is that much more common logic exists if inversions are ignored. For example, for a 2-to-1 multiplexer that selects between a less-than operator and a less-than-or-equal-to operator, no common logic can be identified with either representation while inversions are considered. Thus, we propose an approximate isomorphism method to overcome this limitation. Specifically, in the process of identifying common logic, an inverter is replaced by a 2-input XOR, with the extra input coming from the control signal of the multiplexer or its complement.

Example 3 (Figure 5): The original netlist is shown in Figure 5(a). Using the approach described in Section 3.1, there will be only one gate in each instance of the common logic. However, we can see that the two logic cones connected to the 2-to-1 multiplexer are identical if the inverter is not considered. Hence, we continue searching for common logic by skipping the inverter. In this example, the common logic includes two NAND2 gates and one inverter. To maintain the original function, the inverter is replaced by an XOR2 whose extra input is the control signal. In Figure 5(b), signal s determines whether the XOR2 acts as an inverter or as a wire, i.e., when s = 1, the XOR2 is an inverter; and when s = 0, the XOR2 is a buffer.

Fig. 6: A complete example of multiplexer relocation using the proposed approach. (a) Original circuit; (b) circuit optimized with our approach.

C. Implementation

The implementation of single multiplexer relocation is shown in Algorithm 2. The multiplexer relocation function for a sub-circuit with a vector multiplexer at the primary outputs (line 5 in Algorithm 1) applies the single relocation function iteratively on each output bit. The input of Algorithm 2 is a sub-circuit with a single output bit that is generated by a 2-to-1 multiplexer. Algorithm 2 operates in three steps:

Algorithm: Single Multiplexer Relocation
Input: Pre-processed sub-circuit C
Output: An optimized standard-cell netlist

Single_Mux_Relocate(C)
 1: B = RelocationBoundary(PO)
 2: C ← relocate multiplexer to level B, w/o considering inverters
 3: P = inv2xorposition(PO, B)
 4: C ← insert XOR to P based on its location
 5: return C

RelocationBoundary(PO)
 1: m ← level(PO); an inverter is considered as level 0
 2: while m do
 3:   L0_m ← the gates in the (s = 0) logic at level m
 4:   L1_m ← the gates in the (s = 1) logic at level m
 5:   if uniqueFanoutPair(L0_m, L1_m) then
 6:     U0_m, U1_m ← uniqueFanoutPair(L0_m, L1_m)
 7:     L0_m ← L0_m - U0_m; L1_m ← L1_m - U1_m
 8:     L0_m+1, L1_m+1 ← (U0_m, U1_m) + isomorphism(L0_m, L1_m)
 9:   else
10:     if isomorphism(L0_m, L1_m) then
11:       L0_m+1, L1_m+1 ← isomorphism(L0_m, L1_m)
12:     else
13:       exit
14:     end if
15:   end if
16: end while
17: return (level(PO) - 1 - m), (L0_m, L1_m)

inv2xorposition(PO, boundary)
 1: P0 ← the positions of all inverters in the (s = 0) logic up to the boundary level
 2: P1 ← the positions of all inverters in the (s = 1) logic up to the boundary level
 3: return P0 ⊕ P1

a) The key function of this approach is identifying the maximum common specification logic connected to the multiplexer. It is described in function RelocationBoundary in Algorithm 2. Specifically, our algorithm identifies the boundary logic cut where the isomorphism between the two logic cones ends. This function also returns the pairing of the boundary signals that maintains the isomorphism class, which is used for creating the new multiplexers. We traverse the graph backward from the two inputs of the 2-to-1 multiplexer, level by level (lines 1-2). The gates at level m are stored in two vectors (lines 3-4), depending on their selecting signal. As mentioned in Section 3.1.2, our approach benefits significantly from the fanout information. Hence, we first check whether there exist unique fanout pairs. If so, we eliminate those pairs from the two vectors that store the gates. The rest of the gates in the two vectors go through a regular isomorphism check, with a 3-depth look-ahead search [7]. For example, in Figure 6, there are two NAND2 gates in each vector, (a1, a2) and (b1, b2). There are two pairing choices at this level, i.e., (a1, b1) and (a2, b2), or (a1, b2) and (a2, b1). Using the fanout information, there is only one feasible pairing, i.e., (a1, b1) and (a2, b2). This is because a2 and b2 have two fanouts, and a1 and b1 have only one fanout.

b) Relocate the multiplexer across the common specification logic, up to the boundary cut returned by the previous step.
The two logic cones between the boundary and the multiplexer output have a common specification (they are not functionally equivalent) and are denoted cone(s=0) and cone(s=1), depending on the select signal of the multiplexer. To relocate the multiplexer, we disconnect all the pins of one of the two cones and create a set of multiplexers that select between the input signals of the two logic cones. For example, in Figure 6, each new multiplexer m_i selects between x_i and y_i, i = {1, 2, 3, 4, 5}. Then, the inputs of the remaining cone are replaced by the outputs of the new multiplexers. In this case, x_i is replaced by m_i. Finally, the output F is reconnected to the output of the remaining cone.

c) In function RelocationBoundary, we do not consider an inverter as a gate, or a node in the DAG. This enables the approximate isomorphism determination (Section 3.2). As mentioned earlier, this allows us to identify a larger common logic. For example, if we considered inverters as nodes in the graph, the common logic in Figure 6 would consist of only two NOR2 gates, one in each cone. To maintain the functionality of the design, we need to insert XOR2 gates whose extra input is s or its complement, depending on which cone the inverter belongs to. We first record the locations of all inverters in cone(s=0) and cone(s=1), denoted P0 and P1, up to the boundary cut. The locations that require an XOR2 replacement are those in P0 ⊕ P1, i.e., the positions where only one of the two cones has an inverter. This is why the inverters connected to gates a4 and b4 do not require XOR2 insertion: they keep the two cones in the same isomorphism class (Figure 6). The inverter in the b cone that has no counterpart in the other cone does require an XOR2 insertion, and the extra input of the XOR2 that replaces it in Figure 6 is chosen according to the cone it belongs to.

IV. EXPERIMENTAL RESULTS

The approach proposed in Section 3 was implemented in C++ and integrated with the IBM logic synthesis flow [18], and further evaluated with the IBM high-level synthesis flow and Place and Route (P&R) flow. Our approach is performed before technology mapping within the logic synthesis flow. The program was tested on a number of datapath designs in SystemC. The datapath designs include large arithmetic operators, such as 64-bit multipliers. All the experimental results are collected at the end of the complete production design flow. This demonstrates that our approach successfully overcomes the limitations of the existing logic synthesis and high-level synthesis techniques reviewed in Section 1. All of our experimental results are obtained using a high-performance 14nm technology library. To demonstrate the runtime improvement compared to the work of [7], we examine the runtime using a set of designs, including multiplier circuits up to 64 bits. Our experiments were conducted on a machine with Intel(R) Xeon CPU 756 v6 2.2 GHz x32 with 4 TB memory.

TABLE I: Results of arithmetic test cases using the original IBM synthesis flow, the IBM synthesis flow with AIG optimization, and the original IBM synthesis flow with the proposed approach. Columns: (n-bit) Operators; IBM flow (Area, Lev); IBM flow with AIG Opt (Area, Lev); IBM flow with our approach (Area, Lev). (* this design is not used for comparison.)

TABLE II: Evaluation of our approach in the complete production flow using industry designs in 14nm technology. Flow1 is the IBM synthesis flow with AIG optimization; Flow2 is the original IBM synthesis flow. Columns: Benchmark; Flow1 (Area, Delay); Flow1 with our approach (Area, Delay); Flow2 (Area, Delay); Flow2 with our approach (Area, Delay).

Fig. 7: Evaluation of CPU runtime using designs with multipliers compared to [7]. (x-axis: number of standard cells in the multipliers; y-axis: runtime of identifying common specification logic; series: this work, [7].)

We first evaluate our approach using a set of arithmetic designs in which two arithmetic operators are selected by a control signal. The results are shown in Table 1. The first column indicates the bit width of the arithmetic operators and the type of the two operators.
These designs are implemented in SystemC using if-then-else statements. The second and third columns show the area and logic level results produced by the original IBM synthesis flow. The fourth and fifth columns show the results produced by the original flow with combinational AIG optimization [10]. The last two columns show the results produced by the original flow with our approach. The last row shows the average improvement (gain or loss). Specifically, the increase or decrease in area is measured as a percentage of the original flow, and the change in logic level is measured in number of levels. Based on Table 1, we can see that: 1) our approach gives on average a 34% area reduction compared to the other two flows (note that these flows include complete high-level and logic-level optimization techniques); and 2) our approach can handle large, complex arithmetic operators, such as datapaths with large multipliers. With approximate isomorphism determination, we can optimize designs with various combinations of two different operators.

We then evaluate our approach using seven industrial designs implemented in SystemC. Two synthesis flows are used for the experiments: Flow1 is the IBM synthesis flow with AIG optimization; Flow2 is the original IBM synthesis flow. The results are shown in Table 2. The second and third columns show the results produced by Flow1, and the fourth and fifth columns are produced by Flow1 with our approach. The sixth and seventh columns show the results produced by Flow2. We compare the average improvement of the area and the delay in the last row. We can see that both area and delay have been improved in these experiments. Specifically, using Flow1 the area is reduced by 39% on average and the delay by 3% on average, while with Flow2 our approach offers a 5% area reduction with a 23% delay improvement on average. Note that the delay improvements are not provided directly by our approach. The delays are improved because our approach enables other optimization techniques.
Specifically, for those benchmarks, an adder optimization technique [6] implemented in the IBM synthesis flow is enabled and significantly improves the delay after the multiplexers are relocated.

TABLE III: Comparing the P&R results with multiplexer relocation against the original flow. Columns: Benchmark; Route Length; Power; Worst-case delay.

Additionally, we evaluate our approach using four designs, ibm1, ibm2, ibm4 and ibm6, with placement and route (P&R). The inputs of the P&R process are the designs produced by Flow1 with AIG optimization (4th and 5th columns in Table 2). The routing length, power and worst-case delay are included in Table III. The improvement in the area for placing the standard cells remains the same as shown in Table 2, with the same density. The P&R results of ibm2 are shown in Figure 8.

Fig. 8: Comparing the P&R results using design ibm2 with and without our approach. (a) P&R result of design ibm2 without multiplexer relocation. (b) P&R result of design ibm2 with multiplexer relocation.

We can see that, except for ibm6, the designs are improved successfully using our approach without delay overhead. In particular, we observe that the power has been significantly improved compared to the original designs. Moreover, we can see that the improvements of ibm4 and ibm6 gained after P&R are smaller than in the other two designs. The possible reasons are: 1) there are large (≥ 32) fanout signals generated by multiplexer relocation in those two designs; and 2) a large number of the extra multiplexers have been placed tightly, which decreases routability.

We did not compare our approach to the work of [7] in the experiments shown in Table 1 and Table 2 for the following reasons: 1) that algorithm cannot be successfully applied to all of the designs within eight hours; and 2) for the designs on which the algorithm runs successfully, the results are worse, e.g., the 3rd and 4th designs in Table 1. To demonstrate that our approach significantly improves the CPU runtime compared to the existing algorithm in the case of datapaths with multipliers, experimental results are provided in Figure 7. The designs used for the results shown in Figure 7 vary from 4-bit to 64-bit. In Figure 7, the x-axis represents the number of standard cells in the design, and the y-axis represents the CPU runtime of the multiplexer relocation algorithm on a logarithmic scale. It is clear that our algorithm performs much faster than the AIG-based algorithm [7].

V. CONCLUSION

This paper presents an advanced DAG-based algorithm that targets area minimization using logic-level resource sharing. The identification of common specification logic is formulated as an unweighted graph isomorphism problem.
In addition, an approximate isomorphism algorithm is proposed in this paper to identify extra common logic. The proposed approach demonstrates that it can significantly reduce area, and potentially reduce delay, on industrial designs within a complete design flow. The runtime has been reduced from exponential to linear compared to the existing algorithm. Future work will focus on improving the function that identifies common specification logic.

REFERENCES

[1] L. Stok, "Data path synthesis," Integration, the VLSI Journal, vol. 18, pp. 1-71, 1994.
[2] G. D. Micheli, Synthesis and Optimization of Digital Circuits. McGraw-Hill Higher Education, 1994.
[3] M. Potkonjak and J. Rabaey, "Optimizing Resource Utilization using Transformations," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 13, no. 3, 1994.
[4] M. B. Srivastava and M. Potkonjak, "Optimum and Heuristic Transformation Techniques for Simultaneous Optimization of Latency and Throughput," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 3, pp. 2-19, 1995.
[5] J. Cong and J. Xu, "Simultaneous FU and Register Binding-based on Network Flow Method," in Design, Automation and Test in Europe (DATE). IEEE, 2008.
[6] S. Roy, M. Choudhury, R. Puri, and D. Z. Pan, "Towards optimal performance-area trade-off in adders by synthesis of parallel prefix structures," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 33, 2014.
[7] C. Yu, M. J. Ciesielski, M. Choudhury, and A. Sullivan, "DAG-aware logic synthesis of datapaths," in Proceedings of the 53rd Annual Design Automation Conference, DAC 2016, Austin, TX, USA, June 5-9, 2016, pp. 135:1-135:6.
[8] A. Mishchenko, S. Chatterjee, and R. Brayton, "DAG-aware AIG Rewriting: A Fresh Look at Combinational Logic Synthesis," in 43rd DAC. ACM, 2006.
[9] S. Umeyama, "An eigendecomposition approach to weighted graph matching problems," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 10, no. 5, 1988.
[10] A. Mishchenko et al., "ABC: A System for Sequential Synthesis and Verification," URL eecs.berkeley.edu/~alanmi/abc.
[11] E. Goldberg, "Equivalence Checking of Dissimilar Circuits II," Tech. Rep., 2004.
[12] R. E. Bryant, "Graph-based Algorithms for Boolean Function Manipulation," IEEE Transactions on Computers, vol. 100, no. 8, 1986.
[13] A. Kuehlmann and F. Krohm, "Equivalence checking using cuts and heaps," in Proceedings of the 34th Annual Design Automation Conference. ACM, 1997.
[14] E. Goldberg, M. Prasad, and R. Brayton, "Using SAT for combinational equivalence checking," in Proceedings of the Conference on Design, Automation and Test in Europe. IEEE Press, 2001.
[15] M. Soeken, B. Sterin, R. Drechsler, and R. Brayton, "Simulation graphs for reverse engineering," in Proceedings of the 15th FMCAD. FMCAD, 2015.
[16] W. Li, H. Saidi, H. Sanchez, M. Schäf, and P. Schweitzer, "Detecting similar programs via the Weisfeiler-Leman graph kernel," in International Conference on Software Reuse. Springer, 2016.
[17] S. Mitra, L. J. Avra, and E. J. McCluskey, "Efficient multiplexer synthesis techniques," IEEE Design & Test of Computers, vol. 17, no. 4, pp. 90-97, 2000.
[18] L. Stok, D. Kung, et al., "BooleDozer: Logic Synthesis for ASICs," IBM Journal of Research and Development, vol. 40, no. 4, 1996.
