Yield Enhancement of Asynchronous Logic Circuits through 3-Dimensional Integration Technology

Size: px

Start display at page:

Download "Yield Enhancement of Asynchronous Logic Circuits through 3-Dimensional Integration Technology"

Kristian Elliott
5 years ago
Views:

1 Yeld Enhancement of Asynchronous Logc Crcuts through 3-Dmensonal Integraton Technology Abstract Ths paper presents a systematc desgn for yeld enhancement of asynchronous logc crcuts usng 3-D (3- Dmensonal) ntegraton technology. In ths desgn, the target asynchronous crcuts on one planar devce layer whch s fabrcated wth aggressve technology, are bult on fault tolerant graph models wth extra spare resources, and can be reconfgured by autonomous reconfguraton logc on another planar devce layer whch s fabrcated wth conservatve technology, n the presence of hard errors. The yeld analyss shows that ths method can result n 2 3% overall yeld enhancement. Ths method can be convenently appled to clocked desgns wthout sgnfcant changes. I/O pad Insulatng materals Intra devce layer va Top devce layer (w/ conservatve technology) Inter devce layer va Metal layer Devce layer (w/ aggressve technology) Substrate Heat spreader Fgure 1. Three-dmensonal ntegrated crcuts wth multple planar devce layers. 1 Introducton Aggressve technology scalng results n contnuously ncreasng process complexty and wafer handlng cost, makng t more dffcult and expensve to acheve hgh fabrcaton yeld by dentfyng and elmnatng most defects [13]. Due to ncreased tme-to-market pressures, however, foundres are often forced to start volume fabrcaton on a gven technology before the fabrcaton process becomes completely mature. Hence, to optmze the crcut desgn for better yeld and to shorten yeld learnng stage have become mportant ssues to semconductor ndustry. Wth the challenges facng conventonal devce scalng, 3-D (3-Dmensonal) ntegraton technology allows scalng to contnue by shftng the focus from devce scalng to crcut scalng. In 3-D ntegrated crcuts (IC), planar devce layers are stacked, one on top of another n a 3-D structure where adjacent devce planes can be connected by short, vertcal vas [2]. Usng 3-D ntegraton, the desgner can construct VLSI systems that exhbt lower nterconnect latences; hgher packng denstes of crcuts; and heterogeneous ntegraton of devces of dfferent materals (such as SGe and III-V materals) or technologes (such as submcron and nanometer) [2]. Durng the fabrcaton of 3-D structure, each planar devce layer can be manufactured and tested separately. All layers are then stacked and assembled together wth nter-layer nterconnects. Fgure 1 shows the dagram of a 3-D structure wth two devce layers. Hgher clock frequency and decreased feature szes by rentless technology scalng present a growng challenge to global clock dstrbuton, makng t dffcult and expensve to desgn a sngly-clocked, globally synchronous VLSI system. On the other hand, the absence of clock makes asynchronous crcuts not suffer such problem, and ther expected performance s decded by average-case (nstead of worst-case) path delay. Besdes, t s easy for an asynchronous system to acheve hgh energy effcency because computatons are purely data-drven and no dynamc energy s consumed for an dle component. All these facts make asynchronous desgn become an ncreasngly practcal alternatve [4]. Future VLSI systems could utlze both types of crcuts: clocked logc for local computatons facltates crcut desgn and asynchronous logc for global computatons allevates clock dstrbuton and tmng headaches. Prevous work on crcut desgn for yeld s prmarly based on hardwred N-modular redundancy (NMR) [6, 12] or fully programmable logc/gate arrays [1, 7, 8]. However, t s non-trval to apply hardwred NMR method to asynchronous crcuts wthout sgnfcant tmng assumptons [11] because the local handshake n asynchronous crcuts makes t unclear when the not-drectly-communcated outputs are expected to match, makng the duplcatonand-comparson phlosophy neffectve. For programmable logc/gate arrays, the large number of confguraton bts sgnfcantly slow down the crcut, and explct fault dagnoss effort requred for manual reconfguraton, consderably ncreases product test cost. In ths paper, we propose a new desgn method for yeld enhancement of asynchronous crcuts. By utlzng 3-D n- 1

2 tegraton technology and addng self-reconfguraton logc onto a separate relable devce layer, the VLSI system can recover from hard errors automatcally and results n less product falure probablty. Unlke NMR method, no sgnfcant tmng assumpton has to be made, makng ths desgn suted for asynchronous crcuts. Moreover, partal redundancy of ths desgn leads to smaller slcon area, helpng reducng hard errors. Compared wth fully programmable arrays, only a lmted number of confguraton bts are added n ths desgn, largely reducng performance and wrng overheads. Besdes, self-healng behavor spares fault dagnoss of target crcuts, resultng n less cost of product testng. Secton 2 presents that desgn n detal. Secton 3 evaluates the potental of yeld mprovement. Secton 4 draws the conclusons. 2 Defect Tolerant Asynchronous Desgn wth 3-D Integraton Technology A systematc way to buld an defect tolerant system s to make each module defect tolerant. In ths case, not only desgn complexty can be largely reduced but also smaller hardware cost s requred by addng less redundancy. Fgure 2 shows the framework of our desgn.... FSM Module I Module II Module III Deadlock Detector Reconfg I Reconfg II Reconfg III Fgure 2. Reconfgurable defect-tolerant asynchronous system. In Fgure 2, self-checkng logc s augmented to each hardware module so that crcut deadlocks when t s faulty. Each model s bult on a fault tolerant (FT) graph wth extra spare resources. Pass-gates whose control nputs (confguraton bts) come from the reconfguraton logc, are appled to all nputs/outputs and necessary nternal wres of each VLSI module so that the crcut topology can be changed dynamcally. Whenever any error occurs n a module, ths module deadlocks due to the fal-stop logc. Because of handshake-based data communcaton, any deadlocked module wll cause the whole asynchronous system to stall. Thus, only one deadlock detector (mplemented as a delay lne of a current-starved nverter chan [1]) s requred to detect the stall and trgger self-reconfguraton logc whch reconfgures the modules by replacng the faulty sub-modules wth the spare workable ones. After reconfguraton completes, computaton restarts from ether the begnnng or the last archtectural checkpont.... Due to the absence of comparson procedure, no sgnfcant tmng assumpton s requred, makng ths desgn suted for asynchronous logc. Unlke fully-programmable arrays, reconfguraton s mplemented at coarse granularty (module-level nstead of gate-level), resultng n much less confguraton overhead. Self-reconfguraton spares fault dagnoss and manual reprogrammng, sgnfcantly reducng product test cost. In addton, the extra reconfguraton crcutry does not ncrease packagng expense (whch s the sgnfcant part of overall manufacturng cost) because no addtonal I/O pn s ntroduced. The followng subsectons further explan ths desgn framework. 2.1 Implementaton wth 3-D IC technology In Fgure 2, all crcutry (self-reconfguraton and deadlock detecton) n the dashed box are crtcal (errorsenstve) and must be made hghly relable. Otherwse, the system cannot be reconfgured n the presence of errors. Wth 3-D ntegraton technology, the system can be mplemented as follows. () All the error-senstve transstors are placed onto one planar devce layer (shown as the top layer n Fgure 1) whch s fabrcated wth conservatve technology (such as mcrometer or submcron technology wth very hgh relablty). () The target VLSI crcuts (the VLSI modules whch are outsde the dashed box) are placed onto other planar devce layers whch are fabrcated wth aggressve technology (such as deep-submcron or nanometer technology wth low relablty). The 3-D mplementaton has the followng advantages. (1) By fabrcatng error-senstve transstors wth relable technology, ths 3D layout makes those crtcal crcutry unlkely affected by hard errors. (2) Inter-devce-layer vas largely reduces wrng overhead between reconfguraton logc and target crcut, causng less planar slcon area. (3) Snce the error-senstve crcutry stand by when the system s not faulty, usng relable technology causes less leakage current and further reduces overall power overhead of the defect tolerant desgn, whle wthout notceably hurtng system performance. 2.2 Fal-stop behavor and fault tolerant graph models A wdely-used asynchronous crcut template, precharge half-buffer (PCHB) [9], s chosen for asynchronous logc mplementaton. A PCHB crcut can have multple nputs and outputs, and t can be used to construct almost any asynchronous logc. For example, an asynchronous MnMIPS mcroprocessor uses PCHBs for more than 9% of ts crcuts. Smlar to a precharge domno crcut n synchronous desgn, a PCHB crcut performs computatons usng pulldown (NMOS) networks, makng t fast and compact. In ths crcut, each nput/output varable X s usually dualral encoded (X, X 1 ) wth an explct acknowledge (X e ). Valdty and neutralty of the varables are checked and synchronzed, generatng the common acknowledge to all nputs as well as the precharge/enable sgnal for computaton of current stage. Further detals can be found n [9]. 2

3 Due to the mult-ral encoded data and explct handshake-based event-orderng, the authors n [3] proved that any falure by a sngle stuck-at fault n the PCHB crcut ether deadlocks the crcut or produces an llegally encoded output (X X 1 = 11 ). Thus, fal-stop behavor can be mplemented by addng a checker (NAND gate) onto each output of a PCHB crcut whch blocks the ncomng handshake sgnals n the presence of any llegal output. In order to acheve reconfguraton, each VLSI module s bult on a fault tolerant graph wth extra spare resources so that faulty components can be replaced wth workable spare ones. In [11], we developed a K-fault tolerant lnear array (called mn-spare array) by addng mnmum (K) nodes and necessary redundant nternal connectons between the nodes. The left graph n Fgure 3 shows a constructon example of 2-FT 2-node lnear array, where 2 spare nodes (shaded), 4 external edges (dashed), and 5 nternal edges (bold) are added for any 2-node fault tolerance. As to VLSI modules whch consst of dentcal components, they generally can be modeled as a lnear array (e.g., full adder) or a collecton of lnear arrays (e.g., multpler, FIR flter), gven that ether data or control propagates lnearly through them. The collecton of lnear arrays usually can be further smplfed nto a lnear array through coalescng all nodes of the same row (column). For all these VLSI modules, mn-spare array model can be reasonably appled. For VLSI modules of other topologes, the K-fault tolerant graph can be smply (K + 1) replcas of the target module (called full-duplcaton graph, shown as the rght graph n Fgure 3). One and only one replca s selected by enablng ts connectons wth the outsde. Compared wth NMR method, only K (nstead of 2K) full replcas are added to ths graph. Other effcent fault tolerant graph models can be appled to further reduce hardware cost. e 2 e 1 u 3 u 4 u 2 u 1. Module Replca K+1 Module Replca II Module Replca I. 2.3 Self-reconfguraton Reconfguraton n Fgure 2 s both dynamc and autonomous so that no manual fault dagnoss and external reprogrammng s requred, sgnfcantly reducng product test cost. For asynchronous crcuts, on-the-fly fault locaton s non-trval and generally results n large hardware overhead, compromsng the overall relablty by exposng more transstors to unrelable envronment. In ths paper, we propose a new method: self-reconfguraton s acheved by searchng a workable confguraton out of all possble ones, wthout fault locaton. To mplement exhaustve search, fnte state machnes are utlzed to remember the current confguraton and decde the next confguraton to take. Reconfguraton s actvated repeatedly untl a workable settng s found. The followng explans further detals. Reconfguraton. The reconfguraton can be mplemented usng synchronous logc whch s trggered by tme-out sgnals from deadlock detector(s). Conservatve tmng can be appled to guarantee crcut functonalty. Let an asynchronous system consst of M modules and K defects are to be tolerated. Each module s bult on a fault tolerant graph. Whenever hard error(s) occur, the sys- Fgure 3. Fault tolerant graphs. tem stalls (due to fal-stop) and the deadlock detector act- 3

4 vates the top fnte state machne (shown as block FSM n Fgure 2) of self-reconfguraton logc. Block FSM selects K modules and reconfgures those modules concurrently by actvatng the per-module reconfguraton crcutry (shown as blocks Reconfg I III). Each per-module reconfguraton logc ncludes a fnte-state machne whch s used to search a workable confguraton from all possble confguratons of current VLSI module. Block FSM has M outputs, and exactly K of them can be vald durng a reconfguraton so that exactly K modules are actvated for concurrent reconfguratons. ( Thus, the core of block FSM s a log M 2 K) -bt cyclc counter whch s used to select all possble K modules, together wth necessary combnatonal logc to derve the M outputs accordng to current counter output. For per-module reconfguraton logc, t depends on the fault tolerant graph topology utlzed n the VLSI module. For a VLSI module of full-duplcaton graph, the per-module reconfguraton logc s smply a (K + 1)-bt one-hot counter (a cyclc shft regster wth unque bt- 1 ) for swtchng to a dfferent replca. For a VLSI module of mn-spare array, per-module reconfguraton s completed by selectng N nodes out of N +K ( nodes (suppose a N-node array). Hence a dedcated log N+K ) 2 N -bt (when K N, M Klog2 N) cyclc counter s utlzed to search all possble ( ) N+K N confguratons. All confguraton bts (whch correspond to an embedded somorphc array wth specfc N nodes) are then derved from current counter output, through supportng combnatonal logc. Fgure 4 shows an constructon example of self-reconfguraton logc for 1-FT 2-module system. CLK a a1 Comb Logc b b1 b1 TD CLK 1 Td (FSM) CLK TD Confg Bts of Module I Confg Bts of Module II Fgure 4. Reconfguraton logc. In Fgure 4, Block FSM s a 2-bt one hot counter whch enables 1 module for reconfguraton. Per-module reconfguraton crcutry are shown n two dashed boxes. Module I s bult on a 1-FT 3-node mn-spare array: 2-bt counter searches all 4 confguratons, and combnatonal logc derves all confguraton bts (for 11 edges) accordngly. Module II s bult on a 1-FT full-duplcaton graph (composed of 2 full replcas): a 2-bt one-hot counter chooses the correspondng replca by enablng ts connectons wth external envronment. Prmary nput TO comes from deadlock detector, and t becomes hgh when system deadlocks, actvatng reconfguraton crcutry. TO s delayed so that FSM becomes updated before per-module reconfguraton starts. 1 TO At the end of each reconfguraton, TO wll be reset to low. Defect recovery tme. Defect recovery tme s decded by the number of confguratons the system has tred before t fnds a workable one. Although concurrent per-module reconfguratons mght unnecessarly reconfgure non-faulty modules, they sgnfcantly reduce defect recovery tme. Let µ d be the total tme requred by one reconfguraton. Dependng on the system complexty, µ d s generally n terms of mcroseconds or mllseconds. The worst defect recovery tme T r, whch can be used to estmate the expected defect recovery tme, s that system has tred all possble confguratons and only the last one s workable. In ths herarchcal reconfguraton framework, we have T r = P =1 τ where P = ( M K) and τ s the longest worst defect recovery tme of current selected K modules. If that module s bult on full-duplcaton graph, τ = (K+1)µ d ; If that module s of mn-spare N-node array, τ = ( ) N+K N µd. Defect recovery tme s reduced wth smaller M and N (gven K). M s reduced by parttonng VLSI system at coarser granularty, and N becomes less by dvdng lnear array(s) nto larger nodes. Durng product testng, the defect recovery (test-and-reconfguraton) tme can be mnutes or hours. Hence, the total confguratons of ths defect tolerant desgn can be up to hundreds of thousands or even mllons (f µ d s n terms of mllseconds). Thus, ths self-recovery method can be convenently appled to large VLSI desgns. 3 Evaluaton In ths secton, we nvestgate yeld enhancement of the defect tolerant desgn wth 3-D mplementaton. One problem for tradtonal yeld modelng s that model parameters are specfc to fabrcaton process and layout, and they are generally hard to be reasonably estmated [6]. In order to make general conclusons, we examne the yeld of the defect tolerant desgn based on the assumed yelds of baselne crcut, nstead of physcal parameters such as crtcal area and fault densty. Let Y o be the yeld of baselne crcut, and Y ft be the yeld of K-defect tolerant crcut. To make the conclusons ndependent on fault clusterng factors, we assume faults to occur ndependently. Thus, Y ft s pessmstc and the yeld enhancement results are conservatve. For the defect-tolerant crcut of mn-spare array topology (suppose t s a N-node array and Y n s the survval rate of each node), the yeld s the probablty that at least N out of N + K nodes survve. Y ft = N+K =N ( N + K ) (Y n ) (1 Y n ) N+K (1) Because faults occur ndependently, Y n = N Y o. For the defect-tolerant crcut of full-duplcaton graph topology, the yeld s the probablty that at least 1 out of K + 1 replcas s workable. Y ft = K+1 =1 ( K + 1 ) (Y o ) (1 Y o ) K+1 (2) 4

5 By choosng defect tolerance at coarse granularty (e.g., large module and array node szes), we can easly make the overhead of pass gates and extra wrng n fault tolerant graphs neglgble so that the extra area of vas between confguraton logc and target crcuts can be reasonably omtted. Thus, the calculated yelds usng equatons (1) and (2) are good approxmatons to the real results. Next we extend the yeld calculatons to 3-D structure. For the baselne (non-ft) crcut of 3-D IC wth L layers, we assume that all devce layers are fabrcated wth the same technology and the yeld of each layer s the same (Y o ). Thus, the overall yeld can be calculated as o = (Y o ) L (3) For the defect-tolerant crcut of 3-D IC, an extra layer (called confguraton layer) s added for reconfguraton (and deadlock detecton) for the target crcutry on the other L layers (called target layers). For smplcty, we assume the yeld of each target layer s the same (Y ft ). We also assume all crcuts on target layers are of unform fault tolerant graph topology (ether mn-spare array or fullduplcaton graph) and nvestgate the correspondng yeld enhancement. The yeld of a general defect tolerant desgn wth hybrd graph topologes, can be estmated as some ntermedate between these two extremes. The overall prmtve yeld of defect-tolerant desgn s Y cfg (Y ft ) L. Due to the extra confguraton layer, however, some slcon des whch are supposed to be used for target crcutry are allocated for reconfguraton logc. To count such slcon cost penalty, the overall (effectve) yeld of defect-tolerant crcut s calculated as follows, ft = L L + 1 Y cfg(y ft ) L (4) where Y cfg the yeld of confguraton layer. Y ft n equaton (4) can be derved from Y o usng equaton (1) or (2), dependng on the graph topology. Because confguraton layer uses conservatve technology, Y cfg s calculated n a dfferent way. Suppose the transstor count of reconfguraton crcut for the th target layer s Tcfg and the transstor count of the crcut on the th target layer s To. Let λ cfg and λ o be the mnmum feature szes of confguraton and target layers respectvely. Snce all layers of 3-D structure are of the same area and all reconfguraton logc share the same confguraton layer, the technology scalng factor for confguraton layer can be estmated usng transstor counts as follows, S = λ cfg = 1 L =1 T o λ o L L =1 T (5) cfg Wth yeld scalng model n [5], the yeld of confguraton layer can be estmated as Y cfg = SP 1 Y o (6) where, P s a constant of defect-sze probablty densty functon and usually P 3 [5]. By choosng approprate fault tolerance granularty, t s easy to make Tcfg to be no more than 4 5% of To [11]. Hence S can be approxmated to be 5/ L. Yeld of Yft 3D for both graph topologes can be calculated by pluggng equaton (6) nto equaton (4). In the remanng of ths secton, we nvestgate the changes of yeld enhancement (Yft 3D Yo 3D ) by varyng Y o wth respect to dfferent Ns, Ks and Ls respectvely. Impact of N. Accordng to equatons (1) and (2), only the yeld of mn-spare array crcut depends on N. The mpact of N to the overall (effectve) yeld enhancement s nvestgated wth gven K and L. We choose the number of target layers (L) to be 3, as ths number has been proved to be reasonable n terms of area-performance effcency and heat dsspaton [2]. Fgure 5 shows the results wth K = 1. The curves are smlar for other Ks. Yeld(Mn Spare) Yeld(Base) (%) (K=1) N=2 N=4 N=6 2 N= Yo (%) Fgure 5. Overall yeld enhancement of 3-D FT desgn wth varyng N (K = 1, L = 3). Fgure 5 shows that the defect tolerant desgn can mprove the overall yeld by up to 45%. When baselne yeld s too low (<3%), a very small yeld mprovement can be earned from addng redundances, resultng n only a lttle overall (effectve) yeld ncrease. When baselne yeld s hgh enough (>95%), the potental of prmtve yeld ncrease becomes very lmted so that even the deal prmtve yeld mprovement cannot compensate the nherent slcon cost penalty (due to extra confguraton layer), resultng n even lower overall (effectve) yeld. For other cases, there s sgnfcant yeld mprovement (2% 3% on average). These results show that ths defect tolerant desgn s able to notceably reduce fabrcaton loss for mmature technologes, helpng proftablty of foundres n the presence of ncreased tme-to-market pressures. For the cases of dfferent Ns, fner fault tolerance granularty (larger N) helps reduce overall falure probablty, resultng n hgher yeld enhancement (gven all other parameters). Such mprovement, however, becomes less sgnfcant wth larger N. Note that N cannot be too large. Otherwse, the transstor count of reconfguraton logc wll be ncreased dramatcally [11], whch may break the assumpton made for scalng factor approxmaton and cause the aforementoned yeld estmate naccurate. In the remanng of ths secton, we choose N to be 4 n order to preserve that assumpton whle wthout sgnfcant yeld loss compared wth other Ns (accordng to Fgure 5). 5

6 Yeld(Mn Spare) Yeld(Base) (%) (N=4) K=1 K=2 K=3 K= Yo (%) Yeld(Full Dup) Yeld(Base) (%) 75 K=1 K=2 6 K=3 K= Yo (%) (a) Mn-spare array (N = 4) (b) Full-duplcaton graph Fgure 6. Overall yeld enhancement of 3-D FT desgn wth varyng K (L = 3). Impact of K. Fgure 6 shows the overall yeld enhancement of both fault tolerant graphs wth respect to dfferent Ks. Generally speakng, more faults can be tolerated wth hgher K, and the system s more lkely to survve from defects. Thus, the overall yeld enhancement becomes hgher wth larger K (gven all other parameters). Because mn-spare array acheves fault tolerance at fner granularty, t generally results n more yeld ncrease than fullduplcaton graph (gven K and L). Impact of L. We examne the changes of overall yeld enhancement wth respect to dfferent number of target layers. Fgure 7 shows the results for K = 1, and the curves are smlar for other Ks. In ths fgure, X-l denotes L = X 1 (target layers). The overall (effectve) yeld calculaton takes slcon cost penalty (due to extra confguraton layer) nto account, and such fxed penalty becomes more sgnfcant for less target layers, generally reducng the yeld enhancement wth smaller L. For 2-l case, such penalty becomes 1% and there s no yeld enhancement. At the same tme, the prmtve yelds (wthout consderng such penalty) are always ncreased (for nstance, the curves of X-l-p), and the overall (effectve) yeld s mproved as long as the prmtve yeld ncrease can compensate the confguraton cost. Although more target layers (larger L) tend to amortze slcon cost penalty, t results n more confguraton crcutry on the confguraton layer and thus more transstors there, makng the fabrcaton technology for reconfguraton logc less conservatve and reducng Y cfg. For low Y o, the reducton of Y cfg by larger L s more sgnfcant than the savngs of amortzed confguraton cost penalty. Hence less target layers results n hgher yeld enhancement. When Y o ncreases (>4 5%), slcon cost penalty savngs begn to domnate, and larger L results n better yeld enhancement. Note that there cannot be too many devce layers n a 3-D structure. Otherwse, nter-layer va overhead must be taken nto account, and heat dsspaton as well as electromagnetc nteracton become problematc ssues [2]. Comparson wth NMR. We compare the overall yelds of ths defect-tolerant desgn wth tradtonal NMR method. Both desgns acheve defect tolerance wthout external nterventon. However, our desgn s more suted for asynchronous crcuts because t does not requre sgnfcant tmng assumpton. Wth dfferent amounts of extra hardware resources, two desgn methods acheves dfferent prmtve yelds. For far comparson, we take spare resource overheads nto account for effectve yeld calculatons, and we call the results normalzed yelds, whch reflects the costeffectveness of a desgn for yeld. Let the 3-D structure have L target layers. Regardng the K defect-tolerant desgn of mn-spare array (suppose N-node array), K out of N + K nodes are spare resources. The overall normalzed yeld s, mn = L L + 1 Y N cfg( N + K Y ft) L (7) Regardng the K defect-tolerant desgn of full-duplcaton graph, K full replcas are spare resources. The overall normalzed yeld s, dup = L L + 1 Y 1 cfg( K + 1 Y ft) L (8) For NMR desgn, 2K full replcas are spare resources 1. The overall normalzed yeld s, 1 nmr = ( 2K + 1 Y nmr) L (9) where, Y nmr = 2K+1 ( 2K+1 ) =K+1 (Yo ) (1 Y o ) K+1 Fgure 8 shows of the normalzed yeld enhancements of the defect-tolerant desgn wth respect to NMR method (Ymn 3D Y nmr 3D and Ydup 3D Y nmr) 3D for dfferent Ks. Here we choose N to 4 and L to 3. Several observatons can be made from Fgure 8. Frst, ths defect tolerant desgn results n hgher normalzed yeld than NMR because fault tolerance s acheved at fner granularty: less slcon penalty s ntroduced, whch s able to compensate ts smaller prmtve yeld ncrease. Wth hgher Y o, the prmtve yeld mprovement reduces, makng slcon penalty become domnant. Ths explans why the 1 We omt the majorty voter and the yeld results are optmstc. 6

7 Yeld(Mn Spare) Yeld(Base) (%) l 2 2 l p 3 l 3 l p 4 4 l 5 l 6 l 7 l Yo (%) (N=4, K=1) Yeld(Full Dup) Yeld(Base) (%) l 2 2 l p 3 l 3 l p 4 4 l 5 l 6 l 7 l Yo (%) (K=1) (a) Mn-spare array (N = 4) (b) Full-duplcaton graph Fgure 7. Overall yeld enhancement of 3-D FT desgn wth varyng L (K = 1). Yeld(Full Dup) Yeld(NMR) (%) (N=4) 4 K=1 35 K=2 K=3 3 K= Base Layer Yeld (%) Yeld(Full Dup) Yeld(NMR) (%) 6 K=1 K=2 5 K=3 K= Base Layer Yeld (%) (a) Mn-spare array (N = 4) (b) Full-duplcaton graph Fgure 8. Overall normalzed yeld enhancement of 3-D FT desgn wth varyng K (L = 3). normalzed yeld enhancement curves all monotoncally ncrease wth respect to Y o. When Y o s close to 1%, there s no normalzed yeld enhancement for both desgns. But our desgn results n less normalzed yeld loss than NMR. Wth the aforementoned analyss, t can be concluded that ths fault-tolerant desgn s more cost-effectve than NMR. Second, larger K results n more fault tolerance capablty but requres more extra hardware resource. For the defecttolerant desgn, less normalzed yelds are expected wth larger K because the extra hardware cost requred becomes more sgnfcant than the prmtve yeld mprovement. The only exceptonal case s mn-spare array wth very small baselne yelds (<2 3%) where the prmtve yeld ncrease always domnates. Comparson wth PLA Method. We gve a coarse comparson of our defect-tolerant desgn and PLA method, as PLA hardware structure strongly depends on the crcut functons and t s hard to make general quanttatve comparsons. Although PLA method could result n lower hardware overhead due to ts fault tolerance at even fner granularty (gate-level nstead of module-level), explct fault dagnoss and manual reprogrammng make the defect tolerance offlne and ncrease the product test cost. Besdes, PLA could cause notceable performance degradaton due to the large number of programmable gates. Compared wth PLA method, however, our desgn mplements reconfguraton at coase granularty, makes fault tolerance autonomous (thus reducng product test cost) and requres less confguraton bts (thus smaller performance overhead). 4 Concluson Ths paper proposed a general asynchronous desgn for yeld enhancement. Wth extra spare resources and specfc self-reconfguraton logc, the VLSI crcuts can mantan functonalty n the presence of hard errors. The evaluaton showed that ths desgn results n sgnfcant yeld enhancement ( 2-3% on average) (even wth pessmstc evaluaton). Unlke tradtonal NMR method, ths desgn s suted for asynchronous crcuts (due to no comparson procedure) and more cost-effectve n terms of yeld mprovement (due to fault tolerance at fner granularty); Compared wth PLA and FPGA-based methods, ths desgn largely reduces product test cost (due to purely hardware-based autonomous reconfguraton) and results n less performance overhead (due to less programmng bts). Ths defect tolerance method can be convenently appled to synchronous desgns wthout sgnfcant change, and smlar yeld enhancement curves are expected (although the numbers mght be dfferent). Note that dfferent 7

8 methods should be used to mplement fal-stop behavor n target clocked crcuts. One way to do ths s duplcatng the target crcut and comparng the results off the crtcal path on each clock cycle. If any msmatch s reported, the global clock wll be shut down, actvatng onlne reconfguraton. References [1] M. Abramovc, C. Stroud, and M. Emmert. Usng embedded FPGAs for SoC yeld mprovement. In Proc. Desgn Automaton Conference, 22. [2] K. Banerjee, S. J. Sour, P. Kapur, and K. C. Saraswat. 3-D ICs: a novel chp desgn for mprovng deep-submcrometer nterconnect performance and systems-on-chp ntegraton. Proc. the IEEE, 89(5), 21. [3] I. Davd, R. Gnosar, and M. Yoel. Self-tmed s selfcheckng. Journal of Electronc Testng: Theory and Applcatons, 6(2), [4] K. Emerson. Asynchronous desgn an nterestng alternatve. In Proc. Internatonal Conference on VLSI Desgn, [5] A. V. Ferrs-Prabhu. Yeld mplcatons and scalng laws for submcrometer devces. IEEE Transactons on Semconductor Manufacturng, 1(2), [6] I. Koren and Z. Koren. Defect tolerance n VLSI crcuts: Technques and yeld analyss. Proceedngs of the IEEE, 86(9), [7] S-Y. Kuo and W. K. Fuchs. Fault dagnoss and spare allocaton for yeld enhancement n large reconfgurable PLAs. IEEE Transactons on Computers, 41(2), [8] R. Leveugle, Z. Koren, and I. Koren et al. The hyet defect tolerant mcroprocessor: a practcal experment and ts cost-effectveness analyss. IEEE Transactons on Computers, 43(12), [9] A. M. Lnes. Ppelned asynchronous crcuts. Master s thess, Calforna Insttute of Technology, [1] N. R. Mahapatra, A. Tareen, and S. V. Garmella. Comparson and analyss of delay elements. In Proc. the 45th Mdwest Symposum on Crcuts and Systems, 22. [11] S. Peng and R. Manohar. Fault tolerant asynchronous adder throught dynamc self-reconfguraton. In Proc. Internatonal Conference on Computer Desgn, 25. [12] C. E. Stroud. Yeld modelng for majorty votng based defect-tolerant VLSI crcuts. In Proc. IEEE Southeast Regonal Conference, [13] Y. Zoran. Optmzng manufacturablty by desgn for yeld. In IEEE/CPMT/SEMI Internatonal Electroncs Manufacturng Technology Symposum, 24. 8

Outline. Digital Systems. C.2: Gates, Truth Tables and Logic Equations. Truth Tables. Logic Gates 9/8/2011

Outline. Digital Systems. C.2: Gates, Truth Tables and Logic Equations. Truth Tables. Logic Gates 9/8/2011 9/8/2 2 Outlne Appendx C: The Bascs of Logc Desgn TDT4255 Computer Desgn Case Study: TDT4255 Communcaton Module Lecture 2 Magnus Jahre 3 4 Dgtal Systems C.2: Gates, Truth Tables and Logc Equatons All sgnals