Optimization of Critical Paths in Circuits with Level-Sensitive Latches

Timothy M. Burks, Systems Technology and Architecture Division, IBM Corporation, Austin, TX
Karem A. Sakallah, Department of Electrical Engineering and Computer Science, The University of Michigan, Ann Arbor, MI

Abstract

A simple extension of the critical path method is presented which allows more accurate optimization of circuits with level-sensitive latches. The extended formulation provides a sufficient set of constraints to ensure that, when all slacks are non-negative, the corresponding circuit will be free of late signal timing problems. Cycle stealing is directly permitted by the formulation. However, moderate restrictions may be necessary to ensure that the timing constraint graph is acyclic. Forcing the constraint graph to be acyclic allows a broad range of existing optimization algorithms to be easily extended to better optimize circuits with level-sensitive latches. We describe the extension of two such algorithms, both of which attempt to solve the problem of selecting parts from a library to minimize area subject to a cycle time constraint.

1 The critical path method and timing-driven design

When a circuit must be designed to satisfy stringent timing constraints, we say that the design is timing-driven. Researchers have described a wide variety of timing-driven design problems: logic synthesis, retiming, transistor sizing, part selection, input ordering, and placement and routing. Despite the range and variety of these problems, each approach is derived from a common framework for representing and enforcing timing constraints: the Critical Path Method (CPM) [1]. The application of CPM and a related technique called PERT to digital circuits was first described by Kirkpatrick and Clark [2] and later by Hitchcock, Smith, and Cheng [3].

In this paper, circuits are represented by a graph in which directed arcs represent delays and nodes represent electrically equipotential regions. Two special nodes, the source and sink, group circuit inputs and outputs, respectively. The circuit computation time is obtained by making a single pass through the graph.
Beginning with the source node and proceeding in topological order, an event time for each node is calculated:

$$e(v) = \max_{u \in P(v)} \left[ e(u) + t_{u,v} \right] \qquad (1)$$

where $e(u)$ and $e(v)$ are the event times for nodes $u$ and $v$, $t_{u,v}$ is the delay of arc $u \to v$, and $P(v)$ is the set of predecessors of node $v$. When the event time for the source node is zero, the event time of the sink node gives the delay of the circuit.

A critical path is a sequence of arcs that connects the source and sink nodes and whose delays determine the circuit completion time. Critical paths can be identified by computing required times in a single pass which visits nodes in reverse topological order:

$$r(v) = \min_{u \in S(v)} \left[ r(u) - t_{v,u} \right] \qquad (2)$$

where $r(u)$ and $r(v)$ are the required times for nodes $u$ and $v$ and $S(v)$ is the set of successors of node $v$. The required time for the sink node is set to the time by which the circuit calculation must complete. The slack of a node is defined as:

$$s(v) = r(v) - e(v) \qquad (3)$$

A similar quantity, float, is defined for each arc:

$$f(u \to v) = r(v) - e(u) - t_{u,v} \qquad (4)$$

A critical path can be identified as a sequence of arcs having the smallest (most negative) float in the graph.

Originally, critical path methods were applied only to combinational circuits. Synchronous sequential circuits were analyzed by first partitioning them into combinational sections whose inputs were driven from edge-triggered flip-flops or primary inputs, and with outputs connected to primary outputs or edge-triggered flip-flops. When level-sensitive latches were used, it was necessary to assume fixed signal departure times, effectively treating latches as edge-triggered devices. The purpose of this paper is to relax these assumptions as much as possible.

Permission to copy without fee all or part of this material is granted, provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission.
1994 ACM 0-89791-690-5/94/0011/0468 $3.50
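The forward and backward passes of equations (1) through (4) can be sketched in a few lines. This is only an illustrative sketch, not the authors' implementation: the function name `cpm`, the dictionary-of-arcs representation, and the example delays are all assumptions made here. Nodes without predecessors are treated as sources (event time zero) and nodes without successors as sinks (required time equal to the target).

```python
from collections import defaultdict


def cpm(arcs, required_time):
    """Compute event times, required times, slacks and floats.

    arcs: dict mapping (u, v) -> delay t_uv (hypothetical representation).
    required_time: required time assigned to every sink node.
    """
    succ, pred = defaultdict(list), defaultdict(list)
    nodes = set()
    for (u, v) in arcs:
        succ[u].append(v)
        pred[v].append(u)
        nodes.update((u, v))

    # Topological order (Kahn's algorithm); fails if the graph is cyclic.
    indegree = {n: len(pred[n]) for n in nodes}
    ready = [n for n in nodes if indegree[n] == 0]
    order = []
    while ready:
        n = ready.pop()
        order.append(n)
        for m in succ[n]:
            indegree[m] -= 1
            if indegree[m] == 0:
                ready.append(m)
    if len(order) != len(nodes):
        raise ValueError("constraint graph must be acyclic")

    # Equation (1): forward pass in topological order.
    event = {}
    for v in order:
        event[v] = max((event[u] + arcs[(u, v)] for u in pred[v]), default=0.0)

    # Equation (2): backward pass in reverse topological order.
    required = {}
    for v in reversed(order):
        required[v] = min((required[u] - arcs[(v, u)] for u in succ[v]),
                          default=required_time)

    # Equations (3) and (4): node slack and arc float.
    slack = {v: required[v] - event[v] for v in nodes}
    arc_float = {(u, v): required[v] - event[u] - t for (u, v), t in arcs.items()}
    return event, required, slack, arc_float


# Hypothetical four-arc example with a required time of 4.0 at the sink.
arcs = {("source", "a"): 2.0, ("source", "b"): 1.0,
        ("a", "sink"): 3.0, ("b", "sink"): 1.0}
event, required, slack, arc_float = cpm(arcs, required_time=4.0)
```

In this example the path through node `a` takes 5.0 time units against a budget of 4.0, so both of its arcs share the smallest float and together form the critical path.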

2 CPM-based algorithms for timing-driven part selection

The timing-driven part selection problem can be stated as follows: given a netlist of parts and a library containing a discrete set of implementations for each part, where each implementation has different drive capability, input load, and cost (e.g. area or power), select an implementation for each part to minimize the total cost subject to a fixed constraint on circuit timing, typically a minimum cycle time. A commonly-used delay model gives the delay through a part $P$ as

$$\Delta = \tau + R \, C_L \qquad (5)$$

where $\tau$ is the intrinsic delay of $P$, $R$ is an effective output resistance, and $C_L$ is the capacitance seen by the part output. Typically, $\tau$ and $R$ decrease as the size of $P$ is increased and $C_L$ increases as the sizes of the fanouts of $P$ increase. This nonlinearity complicates the optimization problem, since we cannot guarantee that the fastest circuit will be the one composed of the largest parts.

We examine two CPM-based part selection algorithms, each of which is easily extended to the optimization of circuits with level-sensitive latches. The first is an adaptation of the TILOS algorithm [5] for transistor sizing. An iterative procedure, TILOS first identifies the critical path or paths in a circuit and then selects one transistor from the path(s) to be resized. The transistor chosen is the one with the largest sensitivity value, which is defined as the amount of delay reduction per incremental increase in area. Although later work showed that the TILOS algorithm may not always produce optimal sizings [6], it is generally seen to produce good results with moderate running times.

TILOS was originally developed to size individual transistors and allowed for a nearly-continuous range of sizes. However, a similar approach can be used for the part selection problem, and Lin et al. developed a comparable procedure which also used sensitivity information to guide sizing [7]. In the version of the algorithm that we use, each pass computes actual and required times for each node. The arcs sharing the smallest float are examined, and from these, the gate with the largest sensitivity is resized.
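The selection step of the sensitivity-driven pass might look like the sketch below. It is a simplified illustration under stated assumptions, not the procedure of [5] or [7]: the `Variant` record, the `pick_gate_to_resize` helper, and the library values are hypothetical, each gate is only considered for the next larger variant, and the extra input load that a larger gate places on its own fanins is ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    tau: float   # intrinsic delay
    r: float     # effective output resistance
    area: float  # cost of this implementation


def delay(variant, c_load):
    """Delay model of equation (5): tau + R * C_L."""
    return variant.tau + variant.r * c_load


def pick_gate_to_resize(critical_gates, library, choice, c_load):
    """Return (gate, sensitivity) for the best single upsizing, or None.

    critical_gates: gates on arcs sharing the smallest float.
    library[g]: list of Variant ordered from smallest to largest.
    choice[g]: index of the variant currently selected for gate g.
    c_load[g]: load capacitance currently seen by gate g.
    """
    best = None
    for g in critical_gates:
        k = choice[g]
        if k + 1 >= len(library[g]):
            continue  # already at the largest variant
        cur, nxt = library[g][k], library[g][k + 1]
        gain = delay(cur, c_load[g]) - delay(nxt, c_load[g])
        extra_area = nxt.area - cur.area
        if gain <= 0 or extra_area <= 0:
            continue  # upsizing does not help this gate
        sensitivity = gain / extra_area  # delay reduction per unit area
        if best is None or sensitivity > best[1]:
            best = (g, sensitivity)
    return best


# Hypothetical two-gate example.
library = {
    "g1": [Variant(1.0, 2.0, 1.0), Variant(0.9, 1.0, 2.0)],
    "g2": [Variant(1.0, 2.0, 1.0), Variant(0.8, 1.5, 1.5)],
}
choice = {"g1": 0, "g2": 0}
c_load = {"g1": 1.0, "g2": 1.0}
picked = pick_gate_to_resize(["g1", "g2"], library, choice, c_load)
```

Here `g1` offers the larger absolute delay reduction, but `g2` is picked because its reduction per unit of added area is higher, which is the criterion the text describes.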
Each iteration runs in time linearly related to the number of parts. The number of iterations varies with the number of parts that must be resized to satisfy the timing constraints.

The second algorithm was based on an algorithm for optimally sizing a chain of parts developed by Hinsberger and Kolla, who also developed an algorithm for fanout-free trees [8] and showed that the general problem of timing-driven part selection is NP-complete [8, 9]. To optimize arbitrary circuits, Hinsberger and Kolla proposed using their sizing algorithms to iteratively resize the most critical path or tree in a circuit. Iteration stops when the target cycle time is obtained or when it is impossible to reduce the delay of the critical path. Experiments in [9] suggested that iterations of the path-based approach provided solutions of comparable quality in significantly less time than iteratively resizing trees. As a result, we use the simpler algorithm which iteratively resizes chains of parts.

3 Extending CPM for circuits with level-sensitive latches

The approaches of the previous section were originally designed to work on combinational logic only (footnote 1), but they can be easily extended to optimize across level-sensitive latches, allowing latch arrival and departure times to move freely during the optimization. We distinguish this extended optimization as cross-latch optimization and consider it a superset of the inter-latch optimization techniques originally developed using CPM. Inter-latch optimization optimizes logic between latches; cross-latch optimization is able to optimize across latch boundaries.

To describe the necessary extensions, we use the latch timing model developed by Sakallah, Mudge, and Olukotun [10]. Model equations and constraints are listed in Table 1, where we include only those relevant to the latest arriving signals. Variables describing clock signals include the cycle time $T_c$, phase widths $T_p$, and ending times $e_p$ of each phase $\Phi_p$ specified in a common frame of reference. Circuit model parameters include latch setup times $S_i$ and the maximum delay $\Delta_{ji}$ between each connected pair of latches, from latch $j$ to latch $i$ (footnote 2).
The data input of each latch $i$ is modeled by the latest possible time at which a new signal can arrive, $A_i$. Latch outputs are modeled by the latest times at which new signals depart from the latch, $D_i$. Arrival and departure times are defined in a frame of reference local to the corresponding latch. $p_i$ denotes the clock phase controlling latch $i$. A phase-shift operator $E_{p_j,p_i}$ is used to convert signal times from the frame of reference of latch $j$ to that of latch $i$.

Footnote 1. TILOS allowed optimization across a parameterizable number of latches but recommended that this number be kept small, probably to avoid difficulties due to feedback loops in circuits being optimized.
Footnote 2. For simplicity, latch delays are omitted. They can be included by the addition of terms to equation (7) or (8).

Table 1: Timing Model Summary

$$A_i \le T_c - S_i \qquad (6)$$
$$D_i = \max(A_i,\; T_c - T_{p_i}) \qquad (7)$$
$$A_i = \max_{j \in FI(i)} \left( D_j + \Delta_{ji} - E_{p_j,p_i} \right) \qquad (8)$$
$$E_{p_j,p_i} = T_c - \left( (e_{p_j} - e_{p_i}) \bmod T_c \right) \qquad (9)$$

In [11], it was observed that the constraints in Table 1 can be represented by a graph, which was used to find the optimal clock schedule for a circuit. A similar graph formulation is shown in Figure 1. In these late signal constraint graphs, each latch $i$ is represented by a pair of nodes labeled $A_i$ and $D_i$, which correspond to the arrival and departure times for the latch. A zero-weight arc connects the arrival and departure time nodes and reflects the arrival time terms in equation (7). Arcs labeled $\Delta_x$ model individual gate delays. Arcs labeled $E_{p_j,p_i}$ connect gates to latch inputs and complete the representation of equation (8). All paths through these $E_{p_j,p_i}$ arcs must come from a latch controlled by phase $p_j$; however, fanins of multiple phases can be accommodated by duplicating sections of the constraint graph.

[Figure 1: Late Signal Constraint Graph. Panels: Example Circuit; Constraint Graph.]

The clock system and its associated constraints are incorporated into the constraint graph with two additional vertices and sets of associated arcs. Clock distribution is modeled by connecting a source vertex to each latch departure time vertex with arcs weighted $T_c - T_{p_i}$. These arcs model the occurrence of the rising edge of the clock controlling latch $i$. Arrival time constraints are enforced by connecting arrival time vertices to a sink vertex with arcs weighted $S_i - T_c$. If necessary, clock skew parameters can be included in the weights of these arcs connected to the source and sink nodes. Setting $e(\text{source}) = r(\text{sink}) = 0$, $e(\text{sink}) > 0$ if and only if a setup violation exists in the circuit, and when positive, $e(\text{sink})$ is the amount of the largest setup violation in the circuit.

If the circuit contains cycles of latches, we cannot directly apply the CPM-based techniques of Section 2, since in cyclic circuits of level-sensitive latches, we must be careful not to violate constraints imposed by loops of transparent latches [11, 12]. Each such loop adds a constraint of the form $\Delta_{TOTAL} \le n T_c$, where $\Delta_{TOTAL}$ is the total delay around the loop and $n T_c$ is the time available for signals to propagate around the loop. Although there is one constraint for each loop, they can be combined into a single lower bound on $T_c$.

Timing-driven design requires slack values to indicate which specific constraints are violated and how changes to individual delays affect circuit timing. The late signal constraint graphs were formulated so that slack and float values correspond to the amounts by which times or delays could be increased without violating a setup constraint, but there is no similar quantity available to ensure that the loop constraints are all satisfied. It is not difficult to construct circuits with large setup time slacks but with a critical loop constraint. For these circuits, increases in delays based on setup slacks can result in timing errors.

There are a few alternatives for ensuring that loop constraints are satisfied. If the optimization is formulated as a linear or nonlinear programming problem [13], the constraints will be enforced implicitly. If we require slack values, one solution would enumerate all possible cycles in a graph and calculate loop slacks based on total loop delays and loop timing budgets. Clearly, however, this is impractical for general circuits, as the number of loops can grow exponentially with the number of latches.
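To make the latch timing model of Table 1 concrete, the sketch below evaluates equations (6) through (9) for a single latch-to-latch transfer. The function names and the two-phase clock values are assumptions chosen for illustration; they do not come from the paper.

```python
def phase_shift(e_pj, e_pi, t_c):
    """Equation (9): shift from the frame of phase p_j to that of phase p_i."""
    return t_c - ((e_pj - e_pi) % t_c)


def departure(a_i, t_c, t_pi):
    """Equation (7): a signal departs no earlier than the enabling clock edge."""
    return max(a_i, t_c - t_pi)


def arrival(fanin, e_pi, t_c):
    """Equation (8): latest arrival over all fanin latches.

    fanin: iterable of (D_j, delta_ji, e_pj) tuples, one per fanin latch j.
    """
    return max(d_j + delta_ji - phase_shift(e_pj, e_pi, t_c)
               for d_j, delta_ji, e_pj in fanin)


def setup_ok(a_i, s_i, t_c):
    """Equation (6): the setup constraint at latch i."""
    return a_i <= t_c - s_i


# Hypothetical two-phase clock: T_c = 10, phase 1 ends at 5, phase 2 at 10,
# and both phases are 4 time units wide.
T_C, E1, E2, WIDTH = 10.0, 5.0, 10.0, 4.0

d_j = departure(3.0, T_C, WIDTH)        # latch j on phase 1, early arrival
a_i = arrival([(d_j, 7.0, E1)], E2, T_C)  # latch i on phase 2, delay 7
d_i = departure(a_i, T_C, WIDTH)
```

With these numbers the signal reaches latch i two time units after its enabling edge, so the departure time moves with the arrival time while the setup constraint still holds for a setup time of 1. This is the cycle-stealing behavior that the formulation permits.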

[Figure 2: Cycle-Breaking Strategies. Panels: 0. unbroken subgraph; 1. ALAP; 2. ASAP; 3. ACTUAL.]

Our approach breaks cycles in the constraint graph, modifying the graph to guarantee that all loop constraints will be satisfied. We artificially break the cycles in the constraint graph by fixing the departure times of selected latches to some maximum value and requiring that their arrival times be no greater than these fixed departure times. This ensures that the departure times will be no greater than the specified values, allowing us to safely ignore the dependency between arrival and departure times for this subset of latches. After breaking loops in this manner, we have an acyclic graph to which we can apply a wide variety of CPM-based analysis techniques, including those of Section 2. The approach is conservative, since it only allows the optimization to cross a subset of latches, but it is easy to implement, and the remaining graph accurately models all the unmodified latches in the circuit and allows the arrival and departure times at these latches to vary during the optimization.

We may fix the departure times at breakpoint latches in several ways, including:

1. ALAP: Signals depart as late as possible. Departure times are set to their latest possible values.
2. ASAP: Signals depart as early as possible. Arrival time constraints are tightened to allow this departure.
3. ACTUAL: Signals are assumed to arrive and depart at times $A^*$ and $D^*$ determined by a preliminary timing analysis.

Each of the three cycle-breaking methods is illustrated in Figure 2. ALAP and ASAP arbitrarily fix a departure time at its latest or earliest possible value, possibly making a feasible cycle time appear infeasible under the modified constraints. The third method, ACTUAL, uses times determined by a preliminary analysis. If the analysis is performed at a feasible cycle time, the added constraints will not cause this cycle time to appear infeasible.
However, this requires that all loop constraints be satisfied at the target cycle time from the outset.

When selecting arcs to remove from the constraint graph, a reasonable goal would be to minimize the effects of breaking loops on the timing of the circuit being optimized. Since each broken arc adds extra timing constraints, it would be natural to seek to break as few arcs as possible to make the circuit acyclic. This goal corresponds to the FEEDBACK ARC SET problem, a well-known NP-hard problem [14]. However, a depth-first traversal will quickly find a sufficient set of arcs whose removal makes the graph acyclic. We recursively traverse the graph, marking nodes as they are visited. When a mark is found, a cycle has been located and can be observed in a stack of nodes currently being expanded. We can then simply look back into this stack to find the first $A \to D$ arc, which is removed to break the cycle, and continue until no more cycles remain.

4 Experiments

Inter-latch and cross-latch variations of the optimization algorithms of Section 2 were evaluated using ISCAS89 benchmark circuits. The original ISCAS89 circuits were synchronized using edge-triggered devices and a single-phase clock. To obtain a variety of level-sensitive circuit structures, we transformed the benchmarks in three ways:

1. by replacing edge-triggered devices with level-sensitive latches and considering the late signal constraints only (hold time constraints are ignored). These circuits have names beginning with the letter s, e.g., s953.
2. by replacing edge-triggered devices with pairs of level-sensitive latches controlled by alternate phases of a two-phase clock. The circuits were then retimed to minimize cycle time using a procedure similar to that of Ishii et al. [15]. These circuits have names beginning with the letter t.
3. by using a doubling transformation described by Szymanski [11]. These circuits have names beginning with the letter d.
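Returning to the depth-first loop-breaking traversal described at the end of Section 3, one possible sketch is given below. It simplifies the description in two ways that are assumptions of this example: the search restarts from scratch after each arc is removed instead of continuing the same traversal, and latch arcs are recognized by a hypothetical naming convention in which node `"A3"` pairs with node `"D3"`.

```python
def find_cycle(succ):
    """Return the arcs of one cycle found by depth-first search, or None."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {}
    stack = []  # nodes currently being expanded

    def visit(u):
        color[u] = GRAY
        stack.append(u)
        for v in sorted(succ.get(u, ())):
            state = color.get(v, WHITE)
            if state == GRAY:
                # v is already on the stack, so stack[i:] closes a cycle.
                i = stack.index(v)
                path = stack[i:] + [v]
                return list(zip(path, path[1:]))
            if state == WHITE:
                found = visit(v)
                if found:
                    return found
        stack.pop()
        color[u] = BLACK
        return None

    for n in sorted(succ):
        if color.get(n, WHITE) == WHITE:
            found = visit(n)
            if found:
                return found
    return None


def is_latch_arc(u, v):
    """Hypothetical convention: 'A<k>' -> 'D<k>' is the arc inside latch k."""
    return u.startswith("A") and v.startswith("D") and u[1:] == v[1:]


def break_cycles(succ):
    """Remove A -> D arcs until the graph is acyclic; return removed arcs."""
    removed = []
    while True:
        cycle = find_cycle(succ)
        if cycle is None:
            return removed
        # Look back from the top of the stack for the first A -> D arc.
        for (u, v) in reversed(cycle):
            if is_latch_arc(u, v):
                succ[u].discard(v)
                removed.append((u, v))
                break
        else:
            raise ValueError("cycle contains no latch arc to break")


# Two latches connected in a loop (gate delays folded into the D -> A arcs).
graph = {"A1": {"D1"}, "D1": {"A2"}, "A2": {"D2"}, "D2": {"A1"}}
broken = break_cycles(graph)
```

The recursive search is adequate for small examples like this one; a large constraint graph would probably call for an iterative traversal to stay within Python's recursion limit.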
The circuits used ranged in size from 21 latches and 158 gates (s382) to 1642 latches and 15902 gates (d13207). Each was controlled by a symmetric clock, and all two-phase clocks were required to be non-overlapping. Since the algorithms we consider involve modifying circuit delays to satisfy a fixed clock schedule, this restriction is a convenience only and does not affect the generality of the approach.

Parts were obtained from the Texas Instruments 1-µm CMOS standard cell library [4]. During retiming transformations, each part was assumed to be implemented using the smallest variant in the library, and since existing retiming algorithms do not allow for load-dependent delays, the retimed examples were obtained by assuming that each gate drove a constant number of standard loads. All other analyses included actual loading effects along with standard TI pre-layout estimators for interconnect capacitance.

We sought to compare results obtained using inter-latch optimization with those of cross-latch optimizations using the algorithms of Section 2. Each algorithm was implemented using late signal constraint graphs. The same implementations performed inter-latch or cross-latch optimization, depending on the presence of $A \to D$ arcs in the graph.

The two part selection algorithms can be used to explore the relationship between the area and the minimum cycle time of a circuit. Each circuit is capable of operating at a variety of speeds, depending on the sizing of its component parts. Assuming all parts are at their minimum size, we can compute a certain minimum cycle time for the circuit. In many cases it is possible to reduce this minimum by adding area to the circuit in the form of larger part variants. As a result, we expect an inverse relationship between area and minimum cycle time.

Figure 3-a shows the area vs. minimum cycle time relationship for benchmark t953 obtained using the TILOS algorithm. The SIMPLE curve represents simple inter-latch optimization. The ALAP and ACTUAL curves show results of cross-latch optimizations breaking loops using the respective methods. For ACTUAL, initial times were obtained from a sizing using the SIMPLE strategy to first reduce latch-to-latch delays as much as possible. The arrival and departure times at breakpoints were then computed at the minimum cycle time of the presized circuit. Figure 3-b shows the CPU seconds required by each approach on a lightly loaded DECstation 5000/120 (the ACTUAL curve does not include the constant additional time required for presizing).
CPU times are directly related to the amount of additional area required; optimizations requiring larger amounts of additional area require a proportionately larger number of iterations of the TILOS algorithm. Because of the additional flexibility allowed by trading time across latches, the area-delay curves for the ALAP and ACTUAL approaches are below and to the left of the SIMPLE curve. Because less additional area is required, the running times for these optimizations are also less than those for the SIMPLE strategy. Interestingly, the ACTUAL strategy was superior at small cycle times but was unable to find minimal areas at large cycle times, probably because the optimized starting point introduced an arrival time constraint that could not be satisfied by the minimum-area circuit.

[Figure 3: Optimization of t953 with TILOS algorithm. Panel a: Area vs. target cycle time. Panel b: CPU time (seconds) required to reach target cycle time. Curves in both panels: SIMPLE, ALAP, ACTUAL.]

Similar curves were found for Hinsberger and Kolla's algorithm and are omitted due to space constraints. We observed Hinsberger and Kolla's algorithm to be slightly faster, perhaps because it optimizes an entire path at a time. The TILOS algorithm found smaller implementations for a given cycle time, but the difference was small. Both algorithms show similar improvements when cross-latch optimizations are used, and both produce better results when initialized with the timing of a pre-sized circuit.

Table 2 summarizes additional experiments using the TILOS and Hinsberger-Kolla algorithms. Each benchmark circuit was optimized with a target cycle time of $T_m$, the minimum cycle time reachable using inter-latch optimization. For the cross-latch optimizations, loops were broken using the ALAP method. In the table, the first column identifies benchmark circuits and the remaining columns list ratios of additional area and optimization time required to reach the target cycle times for each circuit. In nearly all cases these ratios were less than one (the one exception among the reported values is a CPU time ratio of 1.01 for t13207 with the Hinsberger-Kolla algorithm), indicating that the cross-latch optimization approaches required less additional area and less CPU time to reach the same cycle time.
Table 2: Experiments conducted at $T_m$. Each entry is the ratio of the ALAP (cross-latch) value to the SIMPLE (inter-latch) value; ΔA is additional area and CPU is optimization time.

             TILOS               Hinsberger-Kolla
circuit      ΔA       CPU        ΔA       CPU
s382         0.38     0.52       0.32     0.57
s444         0.33     0.51       0.39     0.49
s526         0.26     0.64       0.17     0.35
s953         0.32     0.40       0.38     0.62
s1423        0.42     0.69       0.58     0.22
s9234        0.60     0.74       0.53     0.68
s13207       0.75     0.92       0.62     0.89
t382         0.49     0.71       0.45     0.78
t444         0.32     0.43       0.24     0.57
t526         0.29     0.50       0.24     0.53
t953         0.20     0.29       0.23     0.58
t1423        0.43     0.72       0.58     0.23
t9234        0.58     0.77       0.48     0.70
t13207       0.75     0.99       0.71     1.01
d382         0.26     0.35       0.31     0.56
d444         0.33     0.41       0.37     0.50
d526         0.23     0.42       0.15     0.44
d953         0.29     0.35       0.36     0.64
d1423        0.32     0.56       0.46     0.19
d9234        0.56     N/A (a)    0.60     0.60
d13207       0.73     N/A (a)    0.19     0.38
averages     0.44     0.57       0.42     0.56

(a) Unavailable ratios are due to running times outside the measurable range.

5 Conclusions

We see three general benefits of cross-latch optimization. First, it is simple to implement. Each algorithm we examined was formulated using general CPM networks. The extended formulation incorporates level-sensitive latch timing behavior and only requires the additional step of breaking cycles in the constraint graph. Second, cross-latch optimization produces better results. In all cases examined, the additional flexibility of cross-latch optimization found solutions of equal or better quality than those of inter-latch optimization. Third, cross-latch optimization adds no significant computational cost. The only extra computation required is the inexpensive loop-breaking step. In nearly all cases examined, cross-latch optimizations required less time than comparable inter-latch optimizations. The running times of these algorithms depend on the difficulty of satisfying timing constraints; more accurately modeling latch timing eases these constraints, allowing the algorithms to find a feasible solution more quickly.

To these we add the following limitation: cross-latch optimization requires a target clock schedule. Traditional critical path methods minimize cycle time by maximizing slack, increasing the margin on the setup constraints. This margin will have a varying influence on the cycle time, depending on the time budgets of the related paths.
Inter-latch optimization does not necessarily require a target clock. If the time budgets of all paths are equal, then the minimum cycle time can be obtained by maximizing slack regardless of the cycle time target.

We believe that many other CPM-based optimizations can be similarly extended to perform cross-latch optimization. Other areas for research include evaluation of techniques for loop breaking and development of guidelines for their use. Better optimization solutions may be found using iterative approaches that modify breakpoint arrival and departure times during optimization. Incremental timing analysis would reduce running times. Finally, since the loop breaking modifications fundamentally restrict the solution space, approaches which can be directly used on cyclic constraint graphs could allow further improvement.

References

[1] K. Lockyer and J. Gordon, Critical Path Analysis and other Project Network Techniques, Pitman, 1991.
[2] T. I. Kirkpatrick and N. R. Clark, "PERT as an Aid to Logic Design," IBM Journal of Res. and Dev., vol. 10, no. 2, p. 135-141, March 1966.
[3] R. B. Hitchcock, Sr., G. L. Smith, and D. D. Cheng, "Timing Analysis of Computer Hardware," IBM Journal of Res. and Dev., vol. 26, no. 1, p. 100-105, January 1982.
[4] Texas Instruments, TSC 700 Series 1-micron CMOS Standard Cells, R035B-D3857, 1992.
[5] J. P. Fishburn and A. E. Dunlop, "TILOS: A Posynomial Programming Approach to Transistor Sizing," in ICCAD-85 Digest of Technical Papers, p. 326-328, 1985.
[6] J. M. Shyu, A. Sangiovanni-Vincentelli, J. P. Fishburn, and A. E. Dunlop, "Optimization-Based Transistor Sizing," IEEE Journal of Solid-State Circuits, 23(2), p. 400-409, April 1988.
[7] S. Lin, M. Marek-Sadowska, and E. S. Kuh, "Delay and Area Optimization in Standard-Cell Design," in Proc. Design Automation Conf., p. 349-352, 1990.
[8] U. Hinsberger and R. Kolla, "A Cell-Based Approach to Performance Optimization of Fanout-Free Circuits," IEEE Trans. on Computer-Aided Design, 11(10), p. 1317-1321, October 1992.
[9] U. Hinsberger and R. Kolla, "Cell Based Performance Optimization of Combinational Circuits," in Proc. European Conf. on Design Automation, p. 594-599, 1990.
[10] K. A. Sakallah, T. N. Mudge, and O. A. Olukotun, "checkT_c and minT_c: Timing Verification and Optimal Clocking of Synchronous Digital Circuits," in ICCAD-90 Digest of Technical Papers, p. 552-555, 1990.
[11] T. G. Szymanski, "Computing Optimal Clock Schedules," in Proc. Design Automation Conf., p. 399-404, 1992.
[12] T. M. Burks, K. A. Sakallah, and T. N. Mudge, "Identification of Critical Paths in Circuits with Level-Sensitive Latches," in ICCAD-92 Digest of Technical Papers, p. 137-141, 1992.
[13] W. Chuang, S. S. Sapatnekar, and I. N. Hajj, "A Unified Algorithm for Gate Sizing and Clock Skew Optimization to Minimize Sequential Circuit Area," in Proc. Design Automation Conf., p. 220-223, 1993.
[14] R. M. Karp, "Reducibility Among Combinatorial Problems," in R. E. Miller and J. W. Thatcher (eds.), Complexity of Computer Computations, Plenum Press, New York, p. 85-103, 1972.
[15] A. Ishii, C. E. Leiserson, and M. C. Papaefthymiou, "Optimizing Two-Phase Level-Clocked Circuitry," in Advanced Research in VLSI and Parallel Systems: Proceedings of the 1992 Brown/MIT Conference, p. 245-264, 1992.