Structure discovery techniques for circuit design and process model visualization


Departament de Ciències de la Computació
Ph.D. in Computing
Structure discovery techniques for circuit design and process model visualization
Javier de San Pedro Martín
Advisor: Jordi Cortadella Fortuny
Barcelona, May 2017

Abstract

Graphs are one of the most used abstractions in many knowledge fields because of the ease and flexibility with which they can represent relationships between objects. The pervasiveness of graphs in many disciplines means that huge amounts of data are available in graph form, which offers many opportunities to extract useful structure from these graphs and gain insight into the data. In this thesis we introduce a series of techniques to resolve well-known challenges in the areas of digital circuit design and process mining. The underlying idea that ties all the approaches together is discovering structures in graphs. We show how many problems of practical importance in these areas can be solved using both common and novel structure mining approaches.

In the area of digital circuit design, this thesis proposes automatically discovering frequent, repetitive structures in a circuit netlist in order to improve the quality of physical planning. These structures can be used during floorplanning to produce regular designs, which are known to be highly efficient and economical. At the same time, detecting these repeating structures can exponentially reduce the total design time.

The second focus of this thesis is the visualization of process models. Process mining is a recent area of research which centers on studying the behavior of real-life systems and their interactions with the environment. Complicated process models, however, hamper this goal. By discovering the important structures in these models, we propose a series of methods that can derive visualization-friendly process models with minimal loss in accuracy.

In addition, combining the areas of circuit design and process mining, this thesis opens the area of specification mining in asynchronous circuits. Instead of the usual design flow, which involves synthesizing circuits from specifications, our proposal discovers specifications from implemented circuits. This area offers many opportunities for verification and re-synthesis of asynchronous circuits.

The proposed methods have been tested using real-life benchmarks, and the quality of the results has been compared to the state of the art.

Acknowledgments

This thesis would have been impossible without the expertise, guidance, and, sometimes, necessary insistence of my advisor, Prof. Jordi Cortadella. As all his students will attest, including myself, it is quite an opportunity to be able to work with him. I am thus extremely thankful for this opportunity, which has allowed me a firsthand view of both the academic and industrial worlds of Computer-aided design, as well as for his ability to make sense of even the most terrible of my explanations.

I would also like to thank every person I have had the pleasure to work with. For the extremely useful guidance during the early days, Nikita Nikitin, Prof. Josep Carmona, and Prof. Jordi Petit. For their insights on Chip multiprocessors and Networks-on-Chip, Francesc Guim, Antoni Roca and Daniel Rivas. For his knowledge on the, for me, novel topic of Process Mining, Jorge Muñoz-Gama. For our discussions on asynchronous circuits and transition systems, Thomas Bourgeat. I hope that I have been as useful to you as you have been to me in, believe me, very stressful times.

And then, I would also like to thank all the people I have not been able to work with, but would have enjoyed working with. From my lab colleagues working on similar topics, I would like to thank Alex Vidal, Alberto Moreno, and Lucas Machado. Special gratitude goes to Seppe vanden Broucke, Joos Buijs, and the CoSeLoG, ActiTraC, and 4TU.Datacentrum projects, for providing the benchmarks that have been used to evaluate many of the methods proposed in this thesis. Plus, big thanks to all the excellent people in UPC, including professors, researchers, and administrative staff.

I obviously will not forget about all the support provided by my family, nor by all the friends made during this experience: Daniel Alonso, Sergi Oliva, Adrià Gascón, Ramón Xuriguera, Josep Lluís Berral, Alberto Gutiérrez, Evelia Lizárraga, Eva Martinez, Carles Creus, Albert Vilamala, Alessandra Tosi, and M. Àngels Cerveró. Good luck to you all in your future endeavors!

This work has been supported by a scholarship from the Catalan Government (FI-DGR 2013), by the Spanish Ministry for Economy and Competitiveness (MINECO) and the European Union (FEDER funds) under grant COMMAS (ref. TIN ), and by a gift from the Intel Corporation.

Contents

1 Introduction
  1.1 Contributions of this thesis
  1.2 Structure of this document
2 Preliminaries
  2.1 Graph mining
  2.2 Process mining
  2.3 VLSI design flow
  2.4 Asynchronous circuits
  2.5 Mathematical optimization
3 Physical planning for chip multiprocessors
  3.1 Motivation
  3.2 Related work
  3.3 Architectural exploration
  3.4 Floorplanning methodology
  3.5 Wire planning
  3.6 Results
  3.7 Conclusions
4 Regularity-constrained floorplanning
  4.1 Motivation
  4.2 Related work
  4.3 Exploring regularity and hierarchy
  4.4 Regular floorplanning algorithm
  4.5 Results
  4.6 Conclusions
5 Log-based simplification of process models
  5.1 Motivation
  5.2 Related work
  5.3 Metrics for relevant arcs
  5.4 Simplification methods
  5.5 Results
  5.6 Conclusions
6 Structured mining of Petri nets
  6.1 Motivation
  6.2 Related work
  6.3 Structured mining flow
  6.4 Construction of an LTS from a log
  6.5 Extraction of LTS slices
  6.6 Synthesis of Petri Nets
  6.7 Results
  6.8 Conclusions
7 Discovery of duplicate tasks
  7.1 Motivation
  7.2 Related work
  7.3 Local Excitation Sets
  7.4 Discovering duplicate tasks
  7.5 Meta-transitions
  7.6 Results
  7.7 Conclusions
8 Specification mining of asynchronous controllers
  8.1 Motivation
  8.2 Related work
  8.3 Circuits with constrained environment
  8.4 Specification mining
  8.5 Properties of the specification model
  8.6 Results
  8.7 Conclusions
9 Conclusions and future work
  9.1 Summary of contributions
  9.2 Future work
Bibliography

Chapter 1. Introduction

The ever increasing amount of information generated and captured during day-to-day activities has firmly entrenched data mining as an essential part of almost every sector in the global economy. The goal of data mining is to allow the extraction of useful knowledge from large amounts of data. This data may be generated either as a by-product of other activities, e.g. trails of consumer interactions with both physical and online services, or expressly captured by all types of sensors.

Data of interest may be represented in various forms. Unstructured data is perhaps the simplest, and includes free-form text, untagged audio or video, or any other data that does not reside in fixed fields [105]. On the other hand, structured data is organized according to some model. Fixed fields, tags and other types of markers are used to separate the individual data elements. With the rise of the Internet, relational databases, semantic tagging and other advances, the amount of usable structured data has increased significantly. Furthermore, in many knowledge domains data naturally lends itself to a certain structure. It is common for trails of consumer interactions to be composed of well-delimited events tagged with timestamps, even if the event information itself is unstructured. It is also common to find data described in terms of relations between objects, such as in relational databases, or any dataset describing a physical structure or a network. Structure mining, a subfield of data mining, centers on extracting information from all types of structured and semi-structured data.

Graphs provide an ideal abstraction model for structured data. At its core, a graph represents the relationships between a set of entities in a specific knowledge domain. Because of the ease and flexibility with which graphs can represent structured data, they are abundant in many topics of interest. Thus, there has been a growing interest in the structured mining of graph data, an area referred to as graph mining [81].

[Figure 1.1: Examples of graph-based data structures. (a) A graph representing a process (Petri net [112]). (b) A graph representing a Chip multiprocessor (block-level netlist).]

The generality of graphs allows graph mining techniques to be used in a variety of different knowledge fields. Figure 1.1 contains two examples of graphs, modeling data sets from the two different domains in which this thesis will primarily operate. However, graph mining is applicable to a myriad of other real-world topics.

Figure 1.1a describes the control flow of a business process using a modeling abstraction known as a Petri net [112]. Petri nets provide a formalism to describe concurrent systems, effectively describing all possible executions of a system. In the figure, a snippet of the behavior of an online request system is detailed. While the vertices represent the different tasks performed by the system, the edges specify the dependencies between the different tasks, e.g. it is necessary to register the request before examining it. The area of Process mining deals with the discovery of process models, such as the one in Fig. 1.1a, from event logs. Process mining also deals with the analysis and extension of these process models. Process models, Petri nets, and the associated area of Process mining are documented in more detail in Section 2.2.

From a different knowledge field, Fig. 1.1b shows a netlist, which describes the physical connectivity of an electronic circuit. In this case, the vertices represent the different high-level functional blocks of a Chip multiprocessor: cores, I/O routers, and L3 cache memory modules. An edge indicates a physical bus, perhaps comprising thousands of individual on-chip wires, between a pair of components. Connectivity graphs like these are used, for example, as inputs during the design stage of chip multiprocessors, to ensure that highly connected components are kept physically close and thus minimize total wire length. Netlists and the general design process of integrated circuits will be further introduced in Section 2.3.

[Figure 1.2: Process model extracted from consumer interactions with the helpdesk of a telecommunications company.]

As in other related fields of data mining, one of the significant challenges in graph and structure mining is the ever growing amount of available data [105]. More information is being produced and captured than ever, and the trend for the next decade is towards exponential growth. Such growth can provide new opportunities to gather additional insights from the data. Due to the increased complexity of the data, however, more advanced mining tools are required before these benefits can be reaped. We center on two requirements, which we describe in more detail below.

First, the danger of overwhelming data, i.e. too much noise and not enough signal to provide useful insight into the data. In Figure 1.2, we show a process model constructed automatically from the event logs of real-life helpdesk consumer interactions of a large telecommunications company [56]. The graph shown is too complex to provide any insight into the underlying process, and thus fails at one of the primary objectives of data mining: providing useful information. Simplifying or entirely avoiding these types of models is one of the desirable objectives.

Second, large data sets demand more efficient algorithms to process the available data under reasonable time constraints. Algorithms for graph mining should always consider efficiency in terms of the size of the input dataset.

Circuit netlists, for example, typically contain millions of individual vertices. Algorithms should be scalable in the number of vertices and edges, whenever possible. Alternatively, problems should be partitioned into smaller instances to avoid excessive runtimes.

This thesis is motivated by the ability to apply graph mining strategies to a wide range of problems in the areas of circuit design and process mining, and by the challenges and opportunities provided by the ever-increasing data available in graph form. The rest of this chapter describes the goals of this thesis in more detail.

1.1 Contributions of this thesis

In this thesis, we propose using graph and structure mining methods to solve various real-life problems in the areas of process mining and circuit design. The underlying idea behind all of the proposed methods is structure discovery, i.e., automatically extracting new structures from structured data, usually in the form of a graph. Specifically, the challenges addressed in this thesis are:

1. Finding repeating structures in netlists to optimize regularity in chip floorplans (Chapters 3 and 4).
2. Simplifying process models by discovering visually-friendly structures (Chapters 5, 6 and 7).
3. Reverse engineering asynchronous circuit specifications from implemented circuits (Chapter 8).

The following subsections provide a high-level summary of each one of these challenges.

1.1.1 Physical planning for regular layouts

In circuit design, the floorplanning problem consists of allocating the physical space required by the functional blocks of a chip. Structures that are highly connected should be placed close together, so as to minimize the total required length of the wires. The computational complexity of the floorplanning problem depends strongly on the number of components of the design. Regular floorplans, which contain runs of repeating subpatterns, each of them with the same subfloorplan, are well known to provide multiple benefits. Each of the repeating identical parts needs to be designed only once, thereby exponentially reducing the design cost and search space of the problem.

In Chip multiprocessors (CMPs), a standard industry practice to exploit regularity is the use of tiles. A tiled CMP is usually entirely composed of a single tile design that is replicated tens or hundreds of times, usually in a grid-like fashion. Chapter 3 presents a method to perform efficient floorplanning in the presence of these tiles, while still guaranteeing the global physical constraints for manufacturing, such as routability or abutability. The goal of the proposed method is to be used during the early exploration of CMP architectures, as an efficient estimator of the physical viability of candidate designs.

In Chapter 4, on the other hand, we present an approach that is more generally oriented towards all types of circuits. For this, we propose automatically finding repeating subgraphs in the design netlist using frequent subgraph mining techniques. We introduce a floorplanner, HiReg, that can use these frequent subgraphs to accelerate floorplanning runtime and automatically produce regular floorplans, with configurable focus on area or wire length minimization. The problem of frequent subgraph mining will be defined in Section 2.1.2, while the algorithm used will be detailed in Chapter 4.

These chapters are based on the following publications:

- J. de San Pedro, N. Nikitin, J. Cortadella, and J. Petit, Physical Planning for the Architectural Exploration of Large-Scale Chip Multiprocessors, in Proceedings of the 2013 IEEE/ACM Seventh International Symposium on Networks-on-Chip, Tempe, Arizona, USA.
- J. Cortadella, J. de San Pedro, N. Nikitin, and J. Petit, Physical-aware system-level design for tiled hierarchical chip multiprocessors, in Proceedings of the 2013 ACM International Symposium on Physical Design, New York, NY, USA, 2013.
- J. de San Pedro, J. Cortadella, and A. Roca, A hierarchical approach for generating regular floorplans, in Proceedings of the 33rd IEEE/ACM International Conference on Computer-aided Design, San Jose, California, USA.

1.1.2 Visualization of process models

As with any other discipline in data mining, a significant challenge in process mining is presenting the results in such a way that new insight may actually be gained from the data. While ever increasing computing power combined with huge data sets provides new opportunities, in practice there is a big gap between what computers can store and what humans can interpret and use. For crucial areas like health care, extracting value from the data is a challenge [7].

The visualization of process models in a way that is understandable for a human is a crucial step towards this end. In this thesis we propose a series of methods with the goal of simplifying existing process models, as well as process discovery techniques that allow the direct generation of visually-friendly models. The proposed methods involve applying graph mining techniques on top of either the process models themselves, or derived transition systems. While the methods are primarily oriented towards Petri nets, many of them can be extended to other process model formalisms.

The first proposed approach, described in Chapter 5, introduces a series of methods to simplify existing process models. We also propose a metric that is able to rank the importance of the different control flow structures when reproducing the behavior of the original process. This way, parts of a model that have high visual complexity but only specify infrequent behavior can be identified and removed. The understandability of the model can be enhanced with a minimal impact on its fitness and precision.

Chapter 6 proposes an alternative approach that allows the simplification of process models without incurring any cost in precision. Representing all the behavior of a process in a single process model may not be possible without sacrificing simplicity, fitness or precision. This is because real-life processes are usually highly unstructured. The proposed approach automatically discovers multiple process models, each of them satisfying certain structural properties while centering on a specific aspect of the behavior of the process. Therefore, the complexity of the obtained models can be kept under check even for complex event logs.

Duplicate tasks [7] are an extension available in many process model formalisms, which allows two or more vertices in a single model to refer to the same event. Duplicate tasks can be used to simplify process models with minimal impact on their accuracy metrics. However, the automatic discovery of duplicate tasks is an open challenge in Process mining. In Chapter 7, we contribute a method to automatically discover duplicate tasks that is compatible with most process mining discovery algorithms. The proposed method utilizes a graph clustering strategy that is resilient to commonly used control flow structures, such as loops, choice or concurrency. Additional extensions to the formalisms of process models are also discussed.

These chapters are based on the following publications:

- J. de San Pedro, J. Carmona, and J. Cortadella, Log-Based Simplification of Process Models, in Business Process Management (BPM), Innsbruck, Austria, September 2015.

- J. de San Pedro and J. Cortadella, Mining Structured Petri Nets for the Visualization of Process Behavior, in Proceedings of the 2016 ACM Symposium on Applied Computing (SAC), Pisa, Italy, April.
- J. de San Pedro and J. Cortadella, Discovering duplicate tasks in transition systems for the simplification of process models, in Business Process Management (BPM), Rio de Janeiro, Brazil, September.

1.1.3 Specification mining for asynchronous circuits

Asynchronous circuits are not driven by a global clock signal, and offer many potential benefits in terms of lower power consumption and higher performing functional units, especially in light of modern manufacturing processes and the challenges they involve. However, they have long been claimed to be more difficult to design. In the traditional design flow of asynchronous control circuits, the desired behavior of the circuit is formally specified in the form of a Signal Transition Graph, from which an implementation can be automatically synthesized [46].

In Chapter 8, we introduce the concept of specification mining, in which a formal specification is obtained from the implementation of a circuit. We propose a method to perform specification mining that is valid for many types of asynchronous controllers and considers several delay models. The proposed method combines both graph mining and process mining techniques, such as the discovery algorithm proposed in Chapter 6.

This chapter is based on the following conference article:

- Javier de San Pedro, Thomas Bourgeat and Jordi Cortadella, Specification mining for asynchronous controllers, in Proceedings of the 2016 IEEE International Symposium on Asynchronous Circuits and Systems (ASYNC), Porto Alegre, Brazil, May.

1.2 Structure of this document

This thesis is structured in 9 chapters. This chapter constitutes the introduction to the thesis. Chapter 2 introduces the basic concepts in graphs, circuit design, and process mining required for understanding this thesis.

Chapters 3 and 4 deal with the exploitation of regularity during floorplanning. In Chapter 3, we focus on Chip multiprocessors. Chapter 4 extends the method to all types of circuits by using frequent subgraph mining strategies and providing a more general version of the constraints defined in the previous chapter.

Chapters 5 to 7 center on the visualization of process models. Chapter 5 introduces a method to simplify process models for visual consumption by directly removing the control flow structures that are least important for reproducing the most frequent behavior in the log. In Chapter 6, we propose an alternative method that allows simplification by generating a sequence of visually-friendly process models, each of them centering on specific aspects of the process, rather than constructing a unique model that may be difficult to simplify. Chapter 7 proposes a method to exploit the concept of duplicate tasks for further simplification of process models.

Chapter 8 opens the topic of specification mining for asynchronous circuits, and shows a method to obtain the specifications of asynchronous controllers with specific constraints for various delay models.

Chapter 2. Preliminaries

This chapter introduces the necessary background of concepts, algorithms and knowledge areas that will be used in the rest of this document. Section 2.1 introduces the area of graph mining, which is behind most of the algorithms and methods presented in this thesis. Section 2.2 gives an overview of the area of process mining, which will be the basis of Chapters 5, 6 and 7. Section 2.3 provides a tour of the basic design flow of VLSI circuits to help understand the context of Chapters 3, 4 and 8. Section 2.4 reviews the area of asynchronous circuit design used in Chapter 8. Finally, Section 2.5 summarizes the area of mathematical optimization, including satisfiability and linear programming.

2.1 Graph mining

The domain of data mining concerns itself with the extraction of patterns and knowledge from data to facilitate understanding and further use of the data. Structure mining is a proper subset of data mining that centers on structured datasets. In particular, graph mining centers on providing efficient algorithms to mine structures embedded in graphs [136]. Graphs, being one of the most generic types of structure, are naturally suited to represent most types of structured datasets. Some examples of popular research areas in graph mining are [81]: frequent subgraph mining; graph classification, clustering, and search; and work-flow mining. This section provides an introduction to two of the most common subareas of graph mining, frequent subgraph mining and graph clustering.

2.1.1 Graph basics

An undirected graph $G = \langle V, E \rangle$ is a two-tuple comprising a set $V$ of vertices and a set $E$ of edges. An edge $e = \{v_1, v_2\} \in E$ is an unordered pair of vertices $v_1, v_2 \in V$. We say that any two vertices $v_1, v_2 \in V$ are adjacent if there is an edge $e = \{v_1, v_2\} \in E$ connecting them. In this case, we also say that $v_1$ and $v_2$ are incident to $e$. The degree of a vertex $v \in V$ is the number of edges incident to $v$. Note that there can be only one edge between any pair of vertices $v_1, v_2 \in V$.

A path in $G$ is a finite sequence of different vertices $v_1, \ldots, v_n \in V$ so that $e_1 = \{v_1, v_2\}, \ldots, e_{n-1} = \{v_{n-1}, v_n\} \in E$. A path $v_1, \ldots, v_n \in V$ where $v_1 = v_n$ is also called a cycle. A graph is connected if there exists a path between each pair of vertices. It is cyclic if it contains any cycle.

In a directed graph $G = \langle V, E \rangle$, $V$ is the set of vertices, while $E \subseteq V^2$ is a set of ordered pairs of vertices. Every $e = (v_1, v_2) \in E$ is a directed edge. Unlike in undirected graphs, every edge $e = (v_1, v_2)$ has a direction, with $v_1, v_2$ being the head and the tail of $e$ respectively. In addition, $v_2$ is a direct successor of $v_1$, while $v_1$ is a direct predecessor of $v_2$. A path in a directed graph respects the direction of the edges: $v_1, \ldots, v_n \in V$ is a path only if $e_1 = (v_1, v_2), \ldots, e_{n-1} = (v_{n-1}, v_n) \in E$. Given a directed graph $G$ and two vertices $v_1, v_2 \in V$, if there exists a path from $v_1$ to $v_2$, we say that $v_2$ is a successor of $v_1$, while $v_1$ is a predecessor of $v_2$. Similarly to undirected graphs, a directed cycle is a path $v_1, \ldots, v_n \in V$ in which $v_1 = v_n$. A directed graph is strongly connected if there is a path between every ordered pair of vertices. Otherwise, a graph is weakly connected if there is a path between each unordered pair of vertices. A directed acyclic graph (DAG) is a directed graph in which there are no cycles.

A subgraph $G'$ of a graph $G = \langle V, E \rangle$ is another graph $G' = \langle V', E' \rangle$ in which $V' \subseteq V$ and $E' \subseteq E$. We say $G'$ is an induced subgraph if every edge of $G$ whose endpoints are both in $V'$ is also in $E'$.

A graph vertex labeling is a function $\lambda : V \to \Sigma$ that assigns a label $l \in \Sigma$ to each vertex, where $\Sigma$ is the alphabet of labels. Similarly, an edge labeling $\lambda : E \to \Sigma$ maps a label to every edge. The combination of a graph and a labeling is often referred to as a labeled graph.

Planar graphs

An embedding or drawing of a graph $G = \langle V, E \rangle$ on a surface $S$ is an assignment of a unique geometric position to each vertex $v \in V$ and of a curve to every edge $e = \{v_1, v_2\} \in E$ so that the starting and ending points of the curve correspond to the positions of $v_1$ and $v_2$. Unless specified, we will always assume $S$ to be the two-dimensional plane, $\mathbb{R}^2$. An embedding is planar if no two edges intersect except possibly at the endpoints. A graph is planar if it has a planar embedding in $\mathbb{R}^2$.

The crossing number of an embedding is the number of all intersections of the curves representing the edges (excluding the common endpoints). For a graph $G$, its crossing number is the minimal crossing number over all of its possible embeddings in $\mathbb{R}^2$. Thus, a graph is planar iff its crossing number is 0. The crossing number of a graph has often been used as a more accurate measure of its complexity than its number of vertices or average degree. For example, the crossing number of a netlist has been used to provide bounds on the area and wire length required for routing a design [102].

Computing the crossing number of an arbitrary graph is a well-known NP-complete problem [68]. Despite that, there are several methods to estimate it. In this thesis we will use a technique derived from the mincross procedure used in the popular graph drawing program dot [66] from the Graphviz suite [67]. As it is only an estimation, mincross may sometimes overestimate the number of crossings in large, dense graphs. These pathological cases are, most often, already highly complex even if the crossing number is overestimated.

Graph isomorphism

Two graphs $G_1 = \langle V_1, E_1 \rangle$, $G_2 = \langle V_2, E_2 \rangle$ are isomorphic if there exists a bijective function $f : V_1 \to V_2$ so that any two vertices $v_1, v_2 \in V_1$ are adjacent in $G_1$ iff $f(v_1), f(v_2)$ are adjacent in $G_2$. This forms an equivalence relation on graphs. Two graphs $G_1, G_2$ labeled respectively with $\lambda_1$ and $\lambda_2$ are usually only considered isomorphic if the bijection $f$ preserves the labeling of the vertices. That is, if $\forall v \in V_1, \lambda_1(v) = \lambda_2(f(v))$.

Whether isomorphism of two general graphs can be decided in polynomial time is currently an open question. There are polynomial-time algorithms for specific types of graphs, such as planar graphs. However, no such algorithm is known for all types of graphs, even if in practice the problem can often be solved efficiently [44].
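The definition of labeled graph isomorphism above can be illustrated with a minimal brute-force sketch in Python. It is only meant to make the definition concrete: the function name `are_isomorphic` and the dictionary-based graph encoding are assumptions made for this example, and trying every bijection takes factorial time, so it is unrelated to the efficient practical algorithms mentioned in the text.

```python
from itertools import permutations


def are_isomorphic(vertices1, edges1, labels1, vertices2, edges2, labels2):
    """Brute-force test of labeled isomorphism for small undirected graphs.

    Graphs are given as a vertex list, a list of 2-element edges and a
    dict mapping each vertex to its label (hypothetical encoding).
    """
    e1 = {frozenset(e) for e in edges1}
    e2 = {frozenset(e) for e in edges2}
    if len(vertices1) != len(vertices2) or len(e1) != len(e2):
        return False
    v1 = list(vertices1)
    # Try every bijection f : V1 -> V2.
    for image in permutations(vertices2):
        f = dict(zip(v1, image))
        # The bijection must preserve vertex labels.
        if any(labels1[v] != labels2[f[v]] for v in v1):
            continue
        # Both graphs have the same number of edges and f is injective,
        # so mapping every edge of G1 onto an edge of G2 is enough to
        # guarantee adjacency is preserved in both directions.
        if all(frozenset({f[a], f[b]}) in e2 for a, b in e1):
            return True
    return False


# A labeled path A - B - A, written with two different vertex namings.
g1 = ([1, 2, 3], [(1, 2), (2, 3)], {1: "A", 2: "B", 3: "A"})
g2 = (["x", "y", "z"], [("y", "x"), ("y", "z")], {"x": "A", "y": "B", "z": "A"})
# Same shape, but the middle vertex now carries label A instead of B.
g3 = (["x", "y", "z"], [("y", "x"), ("y", "z")], {"x": "B", "y": "A", "z": "A"})

print(are_isomorphic(*g1, *g2))
print(are_isomorphic(*g1, *g3))
```

The third graph has the same unlabeled shape as the first, so the two would be isomorphic as unlabeled graphs; they differ only once labels must be preserved.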
Given two graphs $G_1, G_2$, the subgraph isomorphism problem is defined as finding $G_1'$, a subgraph of $G_1$ isomorphic to $G_2$. Unlike graph isomorphism, the problem of subgraph isomorphism is well known to be NP-complete [44].

2.1.2 Frequent subgraph mining

Frequent subgraph mining (FSM) [81, 90] is one of the most important areas of graph mining. The objective of FSM is to extract all the subgraphs in a given dataset (a labeled graph, or a set of labeled graphs) which satisfy certain constraints. The most common goal is to find frequently recurring patterns, i.e. runs of isomorphic or almost isomorphic subgraphs with high occurrence counts. However, additional constraints may be used, for example, finding subgraphs that satisfy specific structural properties. Some of the most important applications of FSM have been seen in the domains of chemistry, biology, and web mining [81].

[Figure 2.1: Frequent subgraph and structure discovery. (a) Frequent subgraph discovery. (b) Iteration of FSM. (c) Discovered subgraphs. (d) Discovered hierarchical structure.]

An example of the objectives of FSM can be seen in Fig. 2.1a [90]. We will refer to the graph shown in this figure as $G$. Different labels for the vertices in $G$ are represented by different visual shapes in the figure. In $G$, subgraph $S_1$ has been identified as the most frequent, since there are 4 instances of $S_1$ in $G$. Any other subgraph of $G$ is either less frequent than $S_1$, or has fewer vertices.

Note how the south-west instance of $S_1$ in Fig. 2.1a contains an additional edge that does not exist in other instances of $S_1$. Most FSM algorithms allow for inexact isomorphism, in which two subgraphs are not required to be entirely isomorphic to be counted as two instances of the same pattern. Instead, an approximate measure of similarity is used, depending on the nature of the underlying problem.

21 2.1. Gaph mining Instances = Instances = 4 Instances = Instances = Figue 2.2: Patial seach tee of fequent subgaph mining. Geneal-pupose FSM is a high-complexity poblem because of its dependence on subgaph isomophism, an NP-complete poblem. The basic idea behind most FSM algoithms is based on two altenating steps: candidate geneation and filteing. Duing candidate geneation, each of the candidate subgaphs fom the pevious iteation is gown by adding new vetices and edges. The new, enlaged subgaphs fom the set of candidates fo the next step. This next step, filteing, counts the numbe of instances of each candidate subgaph: any candidate that is not fequent enough o violates any othe constaint is puged. The pocess iteates until thee ae no futhe candidates. Figue 2.2 illustates this by showing a potential seach tee of FSM in the gaph fom Fig. 2.1a (G). The fist column coesponds to the fist subgaph candidate set, composed of all possible 1-vetex subgaphs of G. The numbe of instances of each subgaph is also shown. On each next iteation (successive columns), evey candidate is extended by adding exactly one vetex. Each candidate has multiple extensions, depending on the numbe of possible vetex labels. Note how, given a candidate, the numbe of instances of each successo is

22 14 hapte 2. Peliminaies Pupose Input gaph Desiable subgaphs Optimizing egulaity in chip Netlist Maximally fequent subgaphs. flooplans (hapte 4) Visually simplify pocess models (hapte 5) Pocess model (e.g. Peti net) Subgaph of a specific size that maximizes an objective function. Mine stuctually-simple pocess models (hapte 6) Mine specifications fom asynchonous cicuits (hapte 8) Labeled tansition system Subgaphs that satisfy stuctual popeties. Table 2.1: Summay of gaph mining vaiations poposed in this thesis. always less o equal that of the candidate. The lagest, most fequent candidate found in the seach tee is indicated by a dashed line. Diffeent FSM algoithms ae distinguished [81] by the candidate geneation method (e.g. whethe to enlage a single vetex at at time o by combining multiple gaphs), seach stategy (e.g. BFS, DFS,... ), and candidate evaluation (metics used fo selecting the best gaph, filteing, etc.). Adapting the algoithm to the natue of the specific application domain may allow significant eductions in the seach space. Subgaph mining is a coe concept behind many of the appoaches poposed in this thesis. In hapte 4, fequent subgaph mining is used to discove epetitive pattens in netlists in ode to incease the egulaity of flooplans. hapte 5 shows how to simplify existing pocess models by extacting a single subgaph that maximizes the quality of the model while keeping the complexity unde check. In both hapte 6 and hapte 8, gaph mining is pefomed on top of a labeled tansition system An oveview of these vaiations is descibed in Table Stuctue discovey One of the pactical applications of FSM is stuctue discovey, sometimes also called hieachical clusteing. The goal of stuctue discovey is to enhance the intepetation of data in gaph fom by poducing a hieachical desciption of the stuctual egulaities in the data [43]. 
While FSM can be used to discover repeated structures in the graph, structure discovery can be used to organize these repeated structures into a hierarchical description of the data, allowing a higher level of abstraction.
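The alternation between candidate generation and filtering described earlier in this section can be sketched in a few lines of Python. This is only a minimal illustration under several assumptions that are not part of the thesis: the graph is undirected with labeled vertices, instances are counted as distinct (possibly overlapping) vertex sets of connected induced subgraphs, and the canonical form is computed by brute force over vertex permutations, which is only viable for very small patterns. The names `canonical_form` and `frequent_subgraphs` are hypothetical.

```python
from collections import defaultdict
from itertools import permutations


def canonical_form(vertices, labels, edges):
    """Brute-force canonical form of the subgraph induced by `vertices`."""
    best = None
    for perm in permutations(list(vertices)):
        index = {v: i for i, v in enumerate(perm)}
        lab = tuple(labels[v] for v in perm)
        es = tuple(sorted(
            tuple(sorted((index[u], index[v])))
            for (u, v) in edges if u in index and v in index))
        key = (lab, es)
        if best is None or key < best:
            best = key
    return best


def frequent_subgraphs(labels, edges, min_support):
    """Level-wise mining: grow every instance by one vertex, then filter."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    # First candidate set: all 1-vertex subgraphs, grouped by pattern.
    level = defaultdict(set)
    for v in labels:
        level[canonical_form([v], labels, edges)].add(frozenset([v]))
    level = {p: inst for p, inst in level.items() if len(inst) >= min_support}

    result = {}
    while level:
        result.update({p: len(inst) for p, inst in level.items()})
        # Candidate generation: extend each instance by exactly one neighbor.
        grown = defaultdict(set)
        for instances in level.values():
            for inst in instances:
                for v in inst:
                    for w in adj[v] - inst:
                        bigger = inst | {w}
                        grown[canonical_form(bigger, labels, edges)].add(bigger)
        # Filtering: purge candidates that are not frequent enough.
        level = {p: i for p, i in grown.items() if len(i) >= min_support}
    return result


if __name__ == "__main__":
    labels = {1: "a", 2: "b", 3: "a", 4: "b", 5: "c"}
    edges = [(1, 2), (3, 4), (2, 5)]
    for pattern, count in frequent_subgraphs(labels, edges, 2).items():
        print(pattern, "instances =", count)
```

Counting overlapping vertex sets is only one possible support definition; other definitions change which candidates can be safely pruned, so the filtering step here should be read as a heuristic rather than as the procedure used in the thesis.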

A simple yet common method to perform structure discovery is by performing multiple passes of FSM. This approach is described in Fig. 2.1. Once a frequent subgraph $S_1$ is discovered in Fig. 2.1a, each of its instances is replaced by a new vertex that acts as a placeholder for the discovered subgraph instance, as seen in Fig. 2.1b. This is commonly referred to as compressing the graph, since it reduces the size of the graph. The better a particular set of frequent subgraphs describes a graph, the more the graph will be compressed by replacing the instances of each subgraph with a placeholder vertex.

Repeated iterations will discover additional subgraphs, including hierarchical ones, containing previously compressed subgraphs. This is exemplified by subgraph $S_2$ in Fig. 2.1b, which comprises two instances of $S_1$. Whenever a newly-discovered subgraph is defined in terms of existing identified subgraphs, these form a hierarchy. For example, the hierarchy tree formed by $S_1$ and $S_2$ is shown in Fig. 2.1d. This structure describes the graph in a much more compact way, and also provides an abstracted view of the regular patterns in the graph.

Structure discovery has been applied to areas such as data compression [117] or knowledge conceptual clustering [84]. The discovered hierarchies can provide varying levels of interpretation, with increased or decreased detail depending on the goals of the data analysis. As in FSM, inexact compression is often used, even allowing for potentially overlapping subgraphs.

In this thesis, structure discovery is the basis of the method described in Chapter 4 to generate regular floorplans. Existing structure in the input netlist is automatically discovered and used to enhance the quality of floorplans.

Graph clustering

Despite the name, graph clustering is the problem of trying to find sets of related vertices in a graph [124]. It should not be confused with the clustering of graphs themselves. Clustering in general is one of the main areas of research in data mining. Unfortunately, no single definition of a cluster is universally accepted.
Formally, the vertices assigned to a particular cluster should be similar and/or connected in some predefined sense. In some applications, it is desirable for clusters of vertices to be connected: the number of edges that remain within a cluster should be high, while there should be few edges that cross cluster boundaries. In this scenario, good clusters usually form dense subgraphs. Alternatively, it might be desirable for clusters to be composed of similar vertices. The higher the similarity index, the more likely two vertices are clustered together. Computing similarities between vertices may not be necessarily

simple. The most straightforward manner to compute a similarity index between two vertices is by using adjacency information, i.e., the overlap of their neighborhoods [124].

The method described in Chapter 7 to simplify process models using duplicate tasks involves clustering vertices on a transition system by their context.

2.2 Process mining

The digital data revolution that is taking place worldwide requires new algorithms that enable acquiring value from the vast amount of data stored by current technology. Process-Aware Information Systems (PAIS) are at the center of this revolution, since they are in charge of monitoring processes taking place in our daily life, like banking, municipalities, shopping, health care, etc. Event data, recorded in a PAIS in the form of event logs, denotes the footprints of process executions, and is an important source of information for reasoning on how the PAIS interacts with its environment when running its processes. The area of process mining uses these logs to discover, analyze and extend process models [2].

Process models deliver valuable insight into the execution of the underlying processes. Models can be used to find errors in real-life systems, such as deadlocks. Bottlenecks and other factors influencing the response time of a system can be identified by using simulation techniques. A process model may also be used as a description or specification of a PAIS.

Discovery, one of the major areas of process mining, fosters this goal by constructing abstract process models that describe the high-level structure of the process. These models are automatically learned from the execution traces of the process, without using any other a priori information.

An additional important area in process mining is conformance, in which existing process models are compared with event logs generated by the same process. Conformance checking verifies whether the behavior described by the model corresponds to the behavior observed in the event log.
Deviations may be detected in either the model or the event log, indicating potential hazards in the execution of the PAIS. The area of enhancement, on the other hand, aims at changing or extending an existing process model to better reflect the behavior observed in the event log.

Traces and event logs

Event logs are the starting point to apply process mining techniques, guided towards the discovery, analysis or extension of process models. Informally, an event log is a set of traces, each of them being the footprint of a single execution

of a process. Traces, thus, are a chronological sequence of events, such as "request rejected". Usually we will refer to the different types of events with a single letter, e.g. A, B, ....

Let $\Sigma$ be an alphabet of events. A trace is a word $\sigma \in \Sigma^*$ that represents a finite sequence of events. An event log $L \in \mathcal{B}(\Sigma^*)$ is a multiset of traces (see footnote 1). Given a trace $\sigma \in \Sigma^*$, the Parikh vector of $\sigma$, $\psi(\sigma) : \Sigma \to \mathbb{N}$, maps every event $e \in \Sigma$ to the number of times it appears in $\sigma$. Table 2.2 illustrates this with an example. Parikh vectors are an important concept in transition systems, as will be seen in the rest of this thesis.

Table 2.2: Event log and Parikh vectors for each trace.

Trace | a | b | c | d
abcd | 1 | 1 | 1 | 1
ac | 1 | 0 | 1 | 0
ba | 1 | 1 | 0 | 0
aaaa | 4 | 0 | 0 | 0

Events usually contain additional attributes, such as the timestamp or the actor that initiated the event. However, this work centers on the control flow itself, and thus event attributes are not used. In many scenarios we propose how the work could be improved by the use of such additional information.

Figure 2.3: Petri net types. (a) Marked graph. (b) Free choice. (c) Non-free choice.

Process models

Process models are formalisms to represent the behavior of a process. Among the different formalisms, Petri nets are perhaps the most popular, due to their well-defined semantics. In this thesis, we will primarily focus on Petri nets as a process model, although some of the work may be adapted to other formalisms like BPMN [140], EPC [88] or similar.
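As a brief aside on the trace notation introduced earlier in this section, the Parikh vector of a trace can be computed with a simple counter. The sketch below assumes traces are given as strings of single-letter events and uses a hypothetical helper name, `parikh_vector`; it reproduces the traces of the event log example.

```python
from collections import Counter


def parikh_vector(trace, alphabet):
    """Number of occurrences of each event of `alphabet` in `trace`."""
    counts = Counter(trace)
    return tuple(counts[e] for e in alphabet)


if __name__ == "__main__":
    alphabet = ("a", "b", "c", "d")
    for trace in ["abcd", "ac", "ba", "aaaa"]:
        print(trace, parikh_vector(trace, alphabet))
```

Note that the Parikh vector discards the order of events: the traces "ab" and "ba" map to the same vector.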

Petri Nets

A Labeled Petri Net [112] is a tuple $N = \langle P, \Sigma, T, \ell, F, m_0 \rangle$, where $P$ is the set of places, $\Sigma$ is the alphabet of labels (corresponding to events), $T$ is the set of transitions, $\ell : T \to \Sigma \cup \{\tau\}$ assigns a label (or the empty label $\tau$) to every transition, $F : (P \times T) \cup (T \times P) \to \mathbb{N}$ is the flow relation, and $m_0$ is the initial marking.

A marking is an assignment of a non-negative integer to each place. If $k$ is assigned to place $p$ by marking $m$, denoted $m(p) = k$, we say that $p$ is marked with $k$ tokens. Given a node $x \in P \cup T$, the set ${}^\bullet x = \{ y \mid F(y, x) \geq 1 \}$ is the pre-set of $x$, while $x^\bullet = \{ y \mid F(x, y) \geq 1 \}$ is the post-set of $x$. A transition $t$ is enabled in a marking $m$ when $\forall p \in {}^\bullet t : m(p) \geq F(p, t)$. When $t$ is enabled, it can fire by removing $F(p, t)$ tokens from each place $p \in {}^\bullet t$ and putting $F(t, p)$ tokens into each place $p \in t^\bullet$.

A marking $m'$ is reachable from $m$ if there is a sequence of firings $t_1 t_2 \ldots t_n$ that transforms $m$ into $m'$, denoted by $m [ t_1 t_2 \ldots t_n \rangle m'$. A sequence $t_1 t_2 \ldots t_n$ is feasible if it is firable from $m_0$. A trace $\sigma$ fits $N$ if there exists a feasible sequence in $N$ with the same labels.

A Petri net is live if for every marking $m$ reachable from $m_0$, and every $t \in T$, there is a marking $m'$ reachable from $m$ which enables $t$. A Petri net is $k$-bounded if for each $p \in P$ and for every reachable marking $m$, $m(p) \leq k$. A 1-bounded Petri net may also be referred to as safe.

In a Petri net, a choice is a place with more than one output transition. Two transitions are said to be concurrent if they do not have dependencies between them, i.e. they can fire in any order. A transition labeled with the empty label $\tau$ is called a silent transition. A duplicate task is a transition with the same label as some other transition in $N$.

A set of restrictions on the structure of Petri nets defines several classes of Petri nets. A Petri net $N$ is a Marked Graph if $\forall p \in P : |{}^\bullet p| \leq 1 \wedge |p^\bullet| \leq 1$. It is a Free-Choice net if $\forall p_1, p_2 \in P,\ p_1 \neq p_2 : p_1^\bullet \cap p_2^\bullet \neq \emptyset \Rightarrow |p_1^\bullet| = |p_2^\bullet| = 1$. Note that every marked graph is a free-choice net. Figure 2.3 illustrates these concepts. In Fig.
2.3c, the choice between a, b is free, but the choice between c, d is not.

Workflow nets

We also introduce two additional classes of Petri nets in which the starting and ending markings are clearly delimited by special source and sink places.

A workflow net [1] is a Petri net $N = \langle P, \Sigma, T, \ell, F, m_0 \rangle$ with exactly one source place $i \in P$, with ${}^\bullet i = \emptyset$, and exactly one sink place $o \in P$ with $o^\bullet = \emptyset$. In addition, in a workflow net there is a path from $i$ to every other node $n \in P \cup T$, and a path from each node $n \in P \cup T$ to $o$.

Footnote 1: $\mathcal{B}(A)$ denotes the set of all multisets over $A$.
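The enabling and firing rules of Petri nets given above can be made concrete with a short sketch. The code below is a minimal illustration, not the implementation used in the thesis. It assumes that the flow relation is stored as two hypothetical dictionaries, `pre[t][p]` for the weight of the arc from place `p` to transition `t` and `post[t][p]` for the arc from `t` to `p`, and that transitions are identified directly by their labels. The example net is a small sequential workflow net with source place `i` and sink place `o`.

```python
class PetriNet:
    def __init__(self, pre, post, m0):
        self.pre = pre      # pre[t][p]: tokens consumed from p when t fires
        self.post = post    # post[t][p]: tokens produced into p when t fires
        self.m0 = dict(m0)  # initial marking, place -> number of tokens

    def enabled(self, m, t):
        """t is enabled in m if every input place holds enough tokens."""
        return all(m.get(p, 0) >= w for p, w in self.pre[t].items())

    def fire(self, m, t):
        """Return the marking reached by firing t in m."""
        if not self.enabled(m, t):
            raise ValueError(f"transition {t!r} is not enabled")
        new = dict(m)
        for p, w in self.pre[t].items():
            new[p] = new.get(p, 0) - w
        for p, w in self.post[t].items():
            new[p] = new.get(p, 0) + w
        # Drop empty places so that equal markings compare equal.
        return {p: k for p, k in new.items() if k > 0}

    def is_feasible(self, sequence):
        """A sequence is feasible if it can be fired from the initial marking."""
        m = self.m0
        for t in sequence:
            if not self.enabled(m, t):
                return False
            m = self.fire(m, t)
        return True


if __name__ == "__main__":
    # i -> a -> p -> b -> o
    net = PetriNet(pre={"a": {"i": 1}, "b": {"p": 1}},
                   post={"a": {"p": 1}, "b": {"o": 1}},
                   m0={"i": 1})
    print(net.is_feasible(["a", "b"]))
    print(net.is_feasible(["b", "a"]))
```

Storing markings as dictionaries without zero entries keeps them easy to compare, which is convenient when enumerating reachable markings.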

The Petri nets in Fig. 2.3b and 2.3c are workflow nets. However, the Petri net in Fig. 2.3a is not, since there is no sink place.

A workflow net is sound [6] if and only if it satisfies the following properties:

Option to completion: from every marking $m$ reachable from $m_0$, a marking $m'$ with $m'(o) > 0$ can always be reached. Thus, the ending marking is always reachable.

Proper completion: for any reachable marking $m$ where $m(o) > 0$, and $p \in P$ with $p \neq o$, $m(p) = 0$. That is, once the ending marking has been reached, no other transition can fire.

No dead transitions: every transition $t \in T$ is enabled in at least one reachable marking.

These properties ensure that a sound workflow net is both bounded and live for all markings except the ending markings, where $m(o) > 0$. Sound workflow nets are heavily used in process mining because their semantics are similar to real-life processes.

Figure 2.4: LTS corresponding to the Petri nets in Fig. 2.3. Panels (a), (b), (c).

Labeled Transition Systems

A finite labeled transition system is a tuple $A = \langle S, \Sigma, T, s_0 \rangle$ where $S$ is a finite set of states, $\Sigma$ is the alphabet of labels, $T \subseteq S \times \Sigma \times S$ is the transition relation between states, labeled with $\Sigma$, and $s_0$ is the initial state. A transition system may also be interpreted as a directed graph where $S$ is the set of vertices and $T$ is the set of edges.

We use $s \xrightarrow{e} s'$ as a shorthand for the arc $(s, e, s') \in T$. Similar to Petri nets, a trace $\sigma = e_1 e_2 \ldots e_n$ fits LTS $A$ if there exists a sequence $s_1, s_2, \ldots, s_n \in S$ with $s_0 \xrightarrow{e_1} s_1 \xrightarrow{e_2} \cdots\, s_{n-1} \xrightarrow{e_n} s_n$.
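The notion of a trace fitting an LTS can be checked directly by following arcs from the initial state. The sketch below assumes that the transition relation is given as a Python set of `(s, e, s')` triples and uses the hypothetical function name `fits`. It tracks a set of current states so that it also works when the LTS is non-deterministic.

```python
def fits(transitions, s0, trace):
    """Return True if `trace` labels some path of the LTS starting at s0."""
    current = {s0}
    for event in trace:
        # All states reachable from any current state through `event`.
        current = {t for (s, e, t) in transitions
                   if s in current and e == event}
        if not current:
            return False
    return True


if __name__ == "__main__":
    # A small LTS in which a and b can occur in any order.
    arcs = {(0, "a", 1), (1, "b", 2), (0, "b", 3), (3, "a", 2)}
    print(fits(arcs, 0, "ab"))
    print(fits(arcs, 0, "ba"))
    print(fits(arcs, 0, "aa"))
```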

Definition 2.1 (Excitation Set). For a given LTS $A$ and event $e \in \Sigma$, we define the Excitation Set of $e$ as the set of states in which $e$ is enabled, i.e., $ES(e) = \{ s \in S \mid \exists s' \in S : s \xrightarrow{e} s' \}$.

The following definitions formalize causality relations between two events:

Definition 2.2 (Concurrency and conflict). Two events $a, b$ are concurrent if there are four states $s_1, \ldots, s_4$ in $S$ such that $s_1 \xrightarrow{a} s_2 \xrightarrow{b} s_4$ and $s_1 \xrightarrow{b} s_3 \xrightarrow{a} s_4$. In this case we will also say that $a, b$ are concurrent in $s_1$. Two events $a, b$ are in conflict if there is a state $s \in ES(a) \cap ES(b)$ and $a, b$ are not concurrent in $s$.

Definition 2.3 (Free-choice conflict). Two events $a$ and $b$ are in free-choice conflict if they are in conflict and $ES(a) = ES(b)$. In this situation the two events are always enabled or disabled simultaneously, which corresponds to a similar situation in Free-Choice nets.

Definition 2.4 (Trigger events). Given two states $s_1, s_2$ with $s_1 \xrightarrow{a} s_2 \in T$, we say $a$ triggers another event $b$ iff $b$ is enabled in $s_2$, but not in $s_1$. In a sense, $a$ triggering $b$ implies a causality relation between the two events. Analogously, we say $a$ disables $b$ iff $b$ is enabled in $s_1$, but not in $s_2$.

Definition 2.5 (Persistence). An event $e \in \Sigma$ is persistent if no event $f \neq e$ disables it.

A bounded Petri net can be transformed into an LTS by creating a state for every reachable marking in the net, and arcs according to the enabled Petri net transitions in each marking. Figure 2.4 shows the LTS associated to the Petri nets from Fig. 2.3. The opposite problem, however, is known as the synthesis problem [45, 60], and is not as straightforward for most Petri net types.

Conformance checking

An important set of techniques in process mining is conformance checking, which compares the observed (log) and modeled behavior in order to evaluate the model. There are four quality dimensions for comparing model and log: replay fitness, simplicity, precision, and generalization [2].
While the four dimensions are not entirely orthogonal, balancing them is an important aspect to produce high-quality models for real-life processes [30].

Replay fitness

The replay fitness of a model indicates how well the model can reproduce the behavior of the process as observed in the event log. A model has perfect replay fitness if all traces in the log can be replayed by the model from beginning to end. This may not be necessary in all scenarios, e.g., if the event log contains noise [7]. Still, fitness is considered the most important metric. Several metrics for fitness exist in the literature [31]. In this thesis we will use the definition provided by [10], which computes an optimal alignment between each trace in the log and the model before calculating the fitness score, providing a more fine-grained evaluation in the presence of small deviations.

Simplicity

The simplicity of a model evaluates how easy it is to analyze and understand it. The simplest model that can explain the behavior seen in the log is the best model, a principle known as Occam's razor. On the other hand, complicated models, with a high number of elements and dense control flow structure, prevent the extraction of useful insight from process mining. These complicated models are often called spaghetti models, such as the one shown in Fig.

Process discovery algorithms may derive spaghetti process models in various situations, e.g., when the log represents a complex process with hundreds of different event classes or when the log contains noise. Other problems like concept drift (the log contains the executions of different versions of the process model) or vertical event granularity (event classes from different hierarchies coexist in the log) may also cause the derivation of a dense process model. Unfortunately, the aforementioned situations happen often in real life [79]. Thus, the discovery of simple process models and the simplification of large models is a significant challenge when presenting data obtained by process mining.

The most common methods to estimate the complexity of a process model involve the size of the model or the average degree of its vertices [108].
In this work, however, we propose the use of the crossing number of a graph as a measure of complexity, a concept closely related to the planarity of a graph. For a formal definition, we refer to Section

Precision

Fitness and simplicity alone are not sufficient to judge the quality of a discovered process model. For example, it is very easy to construct an extremely simple Petri net (flower model, as in Fig. 2.5a) that is able to replay all traces in an event log. However, this model also replays any other event log referring to the

same set of activities. Thus, the model is not useful for describing the behavior of a specific process. Precision measures how much behavior is allowed by the model that is not present in the log. A model is precise if it does not contain behavior that has not been observed in the log. A model that is not precise is underfitting, such as the flower model. In this work, we use the metric proposed in [111], based on the concept of escaping arcs. An escaping arc represents a choice that is available in the model, but never taken while replaying the event log.

Figure 2.5: Underfitting and overfitting Petri net examples. (a) Flower model (underfitting). (b) Trace model (overfitting).

Generalization

On the other hand, event logs contain only observed behavior, and many traces that are possible may not have been captured in the logs. Thus, it may be undesirable to have a model that only allows for the exact behavior seen in the event log. In contrast to precision, a model should generalize and not restrict behavior to just the examples seen in the log. A model that does not generalize is overfitting. Overfitting is the problem that a very specific model is generated whereas it is obvious that the log only holds partial behavior. A good example is a trace model (as in Fig. 2.5b). The model may explain a particular sample log with very high precision, but there is a high probability that the model will be unable to explain the next batch of cases. In this work we will generally use the metric provided in [30], which penalizes models where most parts are visited very infrequently when replaying the logs. If infrequently-visited parts are prevalent in the model, it is unlikely that new, slightly different traces will fit.

Finding a good balance between overfitting and underfitting models is one of the challenges in process mining [5].
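To make the idea of escaping arcs more tangible, the following sketch computes a simplified precision score. It is written in the spirit of escaping-arc metrics but is not claimed to be the exact definition of [111]. It assumes that the model is given as a deterministic LTS encoded as a hypothetical nested dictionary `transitions[state][event] = next_state`, and that every trace in the log fits the model. For every prefix observed in the log, the events enabled in the model are compared with the events that actually follow that prefix in the log; enabled events that are never taken count as escaping.

```python
from collections import Counter, defaultdict


def escaping_arc_precision(log, transitions, initial):
    observed = defaultdict(set)  # prefix -> events seen right after it
    weight = Counter()           # prefix -> number of times it occurs
    for trace in log:
        for i in range(len(trace)):
            prefix = tuple(trace[:i])
            observed[prefix].add(trace[i])
            weight[prefix] += 1

    allowed_total = 0
    escaping_total = 0
    for prefix, w in weight.items():
        # Replay the prefix on the model (assumes the log fits).
        state = initial
        for event in prefix:
            state = transitions[state][event]
        enabled = set(transitions.get(state, {}))
        escaping = enabled - observed[prefix]
        allowed_total += w * len(enabled)
        escaping_total += w * len(escaping)
    return 1 - escaping_total / allowed_total


if __name__ == "__main__":
    log = [["a", "b"]]
    flower = {0: {"a": 0, "b": 0}}        # any event at any time
    trace_model = {0: {"a": 1}, 1: {"b": 2}}  # exactly the trace a b
    print(escaping_arc_precision(log, flower, 0))
    print(escaping_arc_precision(log, trace_model, 0))
```

In this toy example the flower model scores lower than the trace model, matching the intuition that the flower model is underfitting while the trace model reproduces exactly the observed behavior.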

2.3 Very-Large-Scale Integration design flow

In this thesis, Chapters 3 and 4 focus on using graph mining during physical design of large-scale chip designs. This section provides a tour of the basic design flow of Very-Large-Scale Integration (VLSI) circuits to help understand the context of these chapters.

Modern chips are extremely large and complex. In the present day, chip designs with billions of transistors are not uncommon (current commercial designs have well over 7 billion transistors [48]). These designs combine hundreds of cores, on-chip memories, routers, interconnects and many other components in a single chip, possibly pre-designed by third parties. Thus, many of the challenges found during the design of modern chips revolve around managing this complexity.

Because of such complexity, the design process for any chip is partitioned into interrelated tasks, both in space (e.g. the different modules of the chip) and time (e.g. early core architecture design versus gate-level). Hierarchy and abstraction are two concepts often used during VLSI design. Large systems are often partitioned into many structures that can be recursively partitioned into smaller, independent units. To save time, it is preferable to have many instances of the same module versus many different modules. Hence the rise of the Chip multiprocessor (CMP), which allows the construction of highly performant machines at a fraction of the cost by replicating hundreds of pre-designed processing tiles.

Furthermore, different teams usually work concurrently on tasks that normally would be done sequentially. Dependencies between stages are lessened by the use of abstraction mechanisms (e.g., so that the designer of the core does not need to be involved in logic gate internals). Even with these abstraction mechanisms, however, there are still times when the output of one stage is required as input of another stage.
For example, the designer of the core microarchitecture needs to know the characteristics of the physical design, including clock speed, in order to properly create a pipelined architecture. Experienced engineers are required in this case in order to create estimations so that all the teams have something to start with. These estimations are refined continuously as the design process advances.

Current VLSI design flows usually split tasks into 4 big stages, each composed of many smaller stages. These are not fixed and it is common for the stages to be rearranged depending on the implementation of the design flow. An overview of the 4 stages [139] can be seen in Fig. 2.6, and in the following list:

1. Architectural design (also functional or black-box design), which is the first stage, involves deciding on the general (or system-level) structure of the

chip. In the case of CMPs, this stage involves finding the promising configurations (combinations of different core architectures, on-chip memories, etc.) that satisfy the performance, cost and other requirements.

2. Logic design describes how the diverse components of a chip work. For a core, the behavior of the ALU, floating point, etc. is described.

3. Circuit design considers how to implement the logic design into an electronic circuit (selection of transistors, etc.)

4. Physical design lays out all required transistors and wires and thus determines the final geometry of the chip.

Each of these stages will be described in more detail in the following sections.

Figure 2.6: Summary of the main VLSI design flow stages (architectural, logical, circuit and physical).

Architectural design

During architectural design the system-level structure of the chip is defined. A designer usually has a set of required metrics (desired performance, etc.) and a limited budget (in cost, chip area, power usage, etc.). In traditional CPU design, this involved deciding on parameters such as the length of the pipeline, the memory hierarchy, whether to use out-of-order execution, etc.

For example, in the case of CMP design, it is generally assumed that the designer has a library of components (cores, caches, etc.) at his/her disposal, and that he/she needs to decide which (and how many) of those components are to be placed on the CMP, and how to interconnect each of those components. Because these components may have never been built yet, or have been manufactured before but using different physical parameters, there might be no real-world estimations of their parameters. Additionally, the high number of possible combinations and permutations of the components in the library often generates a design space with billions of possible configurations. For this reason, during architectural design fast estimators for each of the diverse component

metrics are necessary. Interpolation, analytical modeling or simulations are used, depending on the accuracy required. These methods will be discussed in Section

Logic design

At this stage, the required functionality is implemented in terms of Boolean functions. The interfaces (inputs and outputs) for the different modules are also usually defined during this stage. However, logic is not necessarily specified in terms of the traditionally used logic gates. This is because during physical design there is a step, technology mapping, in which, depending on the manufacturing technology used, there might be more efficient ways to convey Boolean functions than by using the standard logic gates.

Circuit design

During circuit design, an electronic circuit is created from the previously created logic. Exactly at which level the circuit is designed depends on the manufacturing technology. When using hardware description languages (HDLs), this process is usually fully automated using a tool called a synthesizer, which handles an HDL input and converts it to the required representation. The output is usually a list of logic gates and the connections or nets between them (a netlist). Technology mapping is the process by which the high-level logic gets mapped into lower-level elements such as cells.

Physical design

At the start of physical design, the design is represented in terms of a netlist including all low-level components (cells, gates, transistors, ...) with their shapes and the appropriate interconnections between those components. The result of physical design is a physical layout, including both the positions of each of the design components (the placement) as well as the paths for each of the interconnections (the routing).

From this stage and until the final circuit, limitations of the physical world are taken into account. Depending on the implementation technology, design rules prohibit certain layouts, for example by setting a minimum distance between wires or forcing the insertion of repeaters whenever wires exceed a certain maximum length [75].
Physical design is often split into the following sub-stages [86]:

1. Floorplanning is often the first step during physical design. A floorplan provides an estimation of the shapes and locations of the major units in the circuit. For the case of CMPs, this usually determines the locations of the individual cores, memories, etc. In a hierarchical fashion, a single core might also have a floorplan containing the locations for the arithmetic and logic units, etc. Because no physical information about the innards of the components is available at this early stage of physical design, floorplanning often uses area estimations as inputs. As more accurate estimations become available during later physical design stages, the floorplan is updated. Additionally, creating a floorplan allows early estimations of the length of the largest nets. This is often called wire planning.

2. Placement defines the final locations for the individual low-level units (e.g. cells) inside each high-level block. Unlike floorplanning, regular structures such as grids are often considered because most of the cells have similar sizes or sizes in multiples of a common base unit.

3. Clock tree synthesis. In sequential circuits, keeping the propagation delay of the clock signal to a minimum is usually mandatory to avoid clock skew. Because of this importance and singularity, the clock signal is usually the first net to be routed.

4. Global routing estimates and reserves the routing resources that are required by the interconnections. Usually, global routing considers only those nets that use the most resources. Each net is given an estimated width depending on the design rules and the number of wires composing it, but individual wires are not considered. These estimated paths reduce the search space for the detailed routing stage that will come next.

5. Detailed routing, on the other hand, calculates the final routes for each individual wire of the chip, at the same time verifying that the previously generated global routing is feasible.

6. Timing analysis checks that all of the design's timing constraints are satisfied.
Different logic paths have different timing requirements, and during this process every factor that alters net delay is taken into account: length of wires, repeaters, transistor sizing, etc. If any of the constraints is not met, the design is reiterated, restructuring the layout of the critical paths. Thus, even the floorplan might need to be changed and the physical design process restarted.

In this thesis, Chapter 3 will center on the topic of physical design oriented towards CMPs, while Chapter 4 will expand the proposed methods towards general VLSI design.

2.4 Asynchronous circuits

One focus of this thesis will be to propose the concept of specification mining for asynchronous controllers (Chapter 8). This section introduces the context of this contribution by providing an overview of asynchronous circuit design.

Asynchronous circuits are logic circuits that do not rely on a global synchronization signal, the clock, to dictate when signals are sampled. Instead of clocks, asynchronous circuits favor handshaking to synchronize the different components in a circuit. Asynchronous logic offers many advantages [46], among them: increased performance, reduced power consumption, and better composability and modularity.

While early computers were mostly asynchronous, the prevalence of asynchronous design today has significantly decreased in favor of synchronous design and global clocks. In part, this is because of the complexity usually associated with the design of asynchronous circuits. There are few asynchronous design tools in comparison with the large number of well-entrenched synchronous tools.

In asynchronous circuits two main parts are often distinguished: the data path and the control. While the former comprises wide units performing arithmetic or other types of transformations on the data processed by the circuit, the control unit determines and generates the diverse control signals that will configure the data path units. As control manages the synchronization requirements, this thesis will exclusively focus on the analysis of control circuits.

We define a circuit as a tuple $C = \langle X, G, s_0 \rangle$ where:

$X = I \cup O \cup Z$ is the set of signals, with $I$, $O$ and $Z$ being pairwise disjoint sets that represent the input, output and internal signals of the circuit, respectively.

$G : (O \cup Z) \to f(X)$ is a set of gates that assigns a Boolean function to each non-input signal of the circuit.
We denote by $f_{x_i}(X)$ the Boolean function assigned to signal $x_i$.

$s_0$ is a binary vector representing the value of the signals at the initial state.

The environment of a circuit reacts to the outputs from the circuit and sends new inputs to it.

Operating modes and delay models

A hazard is a deviation from the expected behavior of a circuit caused by the delay present in real-world gates and wires. Hazards are one of the main challenges during asynchronous circuit design. To help manage the complexity associated with asynchronous circuits, several design styles have been developed. Within each style, different models and assumptions are made about the timing and behavior of physical elements of the circuit: gates, wires, and the environment. This section overviews some of the common styles.

Broadly, we distinguish between delay models, i.e. assumptions about the operating delays of gates and wires, and operating modes, models about the interaction between a circuit and its environment. For an exhaustive analysis into the design of asynchronous circuits, we refer the reader to existing documentation [17, 46].

Operating modes

Operating modes model the interaction between the circuit and its environment. In fundamental mode [77], inputs from the environment are constrained to change only when all the outputs are stable, i.e., the environment allows the circuit to stabilize before generating new inputs.

Burst mode [53] is a related operating mode in which a burst of multiple sequential input changes is allowed. However, when all inputs in the burst have changed, the environment must wait for the circuit to stabilize before starting a new burst.

Compare with input-output mode [110], in which new inputs may occur at any time, as part of specified responses to changes in the outputs. Thus, input and output changes may be concurrent.

Delay models

The delay model defines the assumptions made, during the design, about delays in gates and wires. Strong assumptions may simplify the design flow, while less strict ones generally lead to designs that are more robust to manufacturing process variations. In this section we describe some of the common models.

In the bounded delay model, delays of both gates and wires are assumed to lie within given minimum and maximum bounds.
The circuit is guaranteed to work correctly if these bounds are satisfied. While this may lead to smaller circuits, extensive analysis is needed to ensure all bounds are met in all conditions.

Figure 2.7: C-element and specifications. (a) C-element. (b) Truth table. (c) Signal transition graph. (d) State graph.

A speed-independent (SI) circuit works correctly even in the presence of arbitrary but finite delay on its gates. However, it works with the assumption that wires, on the other hand, are ideal, i.e., with zero delay [110].

A circuit is delay-insensitive (DI) when its correct operation depends on neither the delays in the gates nor the delays in the wires. While this is an extremely interesting class of circuits because of its robustness, unfortunately it has been proven that very few circuits can truly be made delay-insensitive [106, 121].

Delay-insensitive interfacing proposes that only inputs to the circuit are required to be handled in a delay-insensitive fashion. This is a compromise in order to alleviate the impracticality of DI circuits while still allowing long interconnects. The assumption is that even if it is not practical to assume that long wires have zero delay, a designer may still keep control of the delays of short internal wires [121].

Signal transition graphs

The most common design methodologies for asynchronous controllers involve first specifying the behavior of the circuit and the necessary requirements from the environment. From these specifications, automated tools generate hazard-free implementations of the circuits [46].

Asynchronous circuits are intrinsically concurrent, and thus naturally suited to Petri nets (Section 2.2.1), one of the most powerful formalisms for reasoning about concurrent systems. A Signal Transition Graph (STG) is a formal model based on Petri nets for conveying the behavior of an asynchronous circuit.

Given a circuit $C = \langle X, G, s_0 \rangle$, a Signal Transition Graph is a labeled Petri net $N = \langle P, \Sigma, T, \ell, F, m_0 \rangle$ in which $\Sigma = (X \times \{+, -\}) \cup \{\tau\}$. Thus, each transition label corresponds to either the transition (rising or falling) of a signal of $C$, or a silent event $\tau$ that does not change the state of the circuit. By $x^+$ and $x^-$, we will distinguish the rising and falling transitions of signal $x \in X$ respectively.

Figure 2.7c shows an example STG of a C-element, one of the fundamental components in asynchronous circuits, whose truth table is depicted in Fig. 2.7b. The C-element is a stateful element that sets its output to 0 when both inputs are 0, and to 1 when both inputs are set to 1. In any other input combination, the output does not change.

State graphs

An STG is just a succinct representation of (a part of) the behavior of the circuit, which focuses on the causality relations amongst events. A state graph also represents the behavior of the circuit by enumerating all of its possible states and transitions between states as a Labeled Transition System (LTS). While this may result in a much larger representation than an STG, many algorithms require exhaustive explorations of the state space. For the full definition of an LTS, we refer to the section on Labeled Transition Systems.

Given a circuit $C = \langle X, G, s_0 \rangle$, a state graph of $C$ is an LTS $A = \langle S, \Sigma, T, s_0 \rangle$ in which:

$S = \{0, 1\}^n$, with $n = |X|$, the set of binary vectors representing all possible states of the signals.

$\Sigma = X \times \{+, -\}$, i.e. the set of signals of the circuit plus the direction of the transition.

$T = \{ s_1 \xrightarrow{x} s_2 \}$ with $s_1, s_2 \in S$ and $x \in \Sigma$, is the set of transitions.

The initial state $s_0$ coincides with the initial state of the circuit.

Given a state $s = (x_1, \ldots, x_n)$, we denote by $s(x_i)$ the value of signal $x_i$ in $s$.
Given a state s = (x 1,..., x i,..., x n ), we denote by s x i = (x 1,..., x i,..., x n ) the state in which the values of the signals ae identical to the ones of s except fo x i, that has the complementay value. Notice thus that fo each s 1 s 2 = s x and s 1 1 = s x. 2 x s 2 T,
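As an illustration of the definitions above, the following Python sketch enumerates a state graph for a C-element. It is not taken from the thesis: the signal order (a, b, c), the function names and, in particular, the environment model (an input may only toggle while its value equals the output c, which mimics the handshake suggested by the STG) are assumptions made for this example.

```python
SIGNALS = ("a", "b", "c")  # assumed order: inputs a, b and output c


def c_element_next(a, b, c):
    """Output of a C-element: follows the inputs when they agree, else holds."""
    return a if a == b else c


def flip(state, i):
    """State identical to `state` except for signal i, which is complemented."""
    return state[:i] + (1 - state[i],) + state[i + 1:]


def enabled(state):
    """Indices of the signals that may change in `state` (assumed environment)."""
    a, b, c = state
    events = [i for i in (0, 1) if state[i] == c]  # inputs wait for the output
    if c_element_next(a, b, c) != c:
        events.append(2)  # the gate is excited and may switch its output
    return events


def state_graph(initial=(0, 0, 0)):
    """Explore all reachable states; returns (states, labeled transitions)."""
    states, transitions, frontier = {initial}, set(), [initial]
    while frontier:
        s = frontier.pop()
        for i in enabled(s):
            t = flip(s, i)
            label = SIGNALS[i] + ("+" if t[i] == 1 else "-")
            transitions.add((s, label, t))
            if t not in states:
                states.add(t)
                frontier.append(t)
    return states, transitions


states, transitions = state_graph()
```

Every transition produced this way connects two states that differ in exactly one signal, which is the property noted at the end of the definition.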

Figure 2.7d shows an example of the state graph associated with the behavior of the C-element described by the STG in Fig. 2.7c.

2.5 Mathematical optimization

Many of the methods described in this thesis involve optimization problems. This section provides a brief introduction to two commonly used subfields of mathematical optimization, although it is not intended to cover the fine details. The approaches described in this section will be frequently used in this thesis when encoding, e.g., desired structural constraints in graphs.

An optimization problem involves finding the best solution out of a set of feasible solutions. The search space of feasible solutions is delimited by a set of constraints, such as formulas, inequalities, etc. Generally, the best solution is the one that maximizes (or minimizes) the value of an objective, usually defined as a real function.

Boolean satisfiability

A formula $P$ is a combination of Boolean variables (denoted by $p, q, \ldots$) built using the three logical operators and, or, and not (represented respectively by $\wedge$, $\vee$ and $\neg$). An interpretation $I$ of $P$ is an assignment of a value in $\{0, 1\}$ to each variable in $P$. $I$ satisfies $P$ iff the evaluation of $P$ under $I$ is 1. $P$ is satisfiable if it is satisfied by at least one interpretation. Generally, formulas are written as conjunctions of clauses, which are disjunctions of (possibly negated) variables.

Satisfiability (SAT) is the problem of determining, given a formula $P$, whether there is an interpretation $I$ that satisfies it. SAT is a well-known NP-complete problem, with all known algorithms having worst-case exponential cost in the size of $P$ [24].

The maximum satisfiability problem (MaxSAT) is the optimization version of SAT. Given a formula $P$, MaxSAT involves finding an interpretation that maximizes the number of clauses that evaluate to 1. Inversely, the minimum satisfiability problem (MinSAT) finds interpretations that minimize the number of clauses that evaluate to 1. As typical extensions, weights can be added to individual clauses, allowing for arbitrarily complicated optimization goals [24]. MaxSAT and MinSAT are heavily used as a natural way to model many optimization problems.
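To make the relation between SAT and MaxSAT concrete, the sketch below evaluates a formula in clause form under every interpretation. It is only a brute-force illustration with exponential cost, not how practical solvers work; the integer encoding of literals (k for a variable, -k for its negation) and the function names are assumptions chosen for this example.

```python
from itertools import product


def clause_value(clause, interpretation):
    """A clause (disjunction of literals) is true if any of its literals is true."""
    return any(interpretation[abs(lit)] == (lit > 0) for lit in clause)


def brute_force_maxsat(clauses, n_vars):
    """Return (number of satisfied clauses, interpretation) maximizing the count."""
    best = None
    for bits in product([False, True], repeat=n_vars):
        interpretation = {v + 1: bits[v] for v in range(n_vars)}
        score = sum(clause_value(c, interpretation) for c in clauses)
        if best is None or score > best[0]:
            best = (score, interpretation)
    return best


def is_satisfiable(clauses, n_vars):
    """The formula is satisfiable iff some interpretation satisfies every clause."""
    return brute_force_maxsat(clauses, n_vars)[0] == len(clauses)


# (p or q) and (not p) and (not q): unsatisfiable, but two clauses can hold at once.
formula = [[1, 2], [-1], [-2]]
max_score, witness = brute_force_maxsat(formula, 2)
```

Here `is_satisfiable(formula, 2)` is false, while the MaxSAT optimum still satisfies two of the three clauses.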

Linear programming

A linear inequality $a \cdot x \le b$ is defined by a vector $a \in \mathbb{R}^n$ and a constant $b \in \mathbb{R}$. A linear programming problem (LP) is a set of linear inequalities plus a linear function that needs to be maximized, called the objective function. It is usually represented as:

maximize $c^T x$
subject to $A x \le b$

where $A$ is a matrix with a row for every linear inequality, $b$ contains the constant terms of the inequalities, and $c$ is a vector with the coefficients of the objective function. A solution of the LP, thus, is a vector $x \in \mathbb{R}^n$ that satisfies all linear inequalities. From the potentially infinite set of solutions, a solution $x$ is optimal if it also maximizes the objective function $c^T x$ over the set of all solutions. An LP is feasible if it has at least one solution.

There are algorithms that solve LPs in polynomial time [91]. However, the most commonly used algorithm is simplex, which is exponential in the worst case but performs efficiently in practice [93].

An integer linear program (ILP) is an LP in which some of the variables are constrained to take only integer values. Unlike LP, ILP is NP-complete. There are many available methods to solve ILPs. ILP solvers may also be used to solve MaxSAT problems [52].
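The difference between an LP and an ILP can be seen on a tiny two-variable instance. The sketch below is a didactic stand-in, not simplex and not a production solver: it finds the LP optimum by enumerating the vertices of the feasible region (which assumes a bounded region with at least one vertex) and the ILP optimum by enumerating integer points in a box. All function names and the box bounds are assumptions of this example.

```python
from fractions import Fraction
from itertools import combinations, product


def feasible(point, A, b):
    """True if every inequality row . point <= rhs holds."""
    return all(r[0] * point[0] + r[1] * point[1] <= rhs for r, rhs in zip(A, b))


def solve_lp_2d(c, A, b):
    """Maximize c.x subject to A x <= b by checking all pairwise intersections."""
    best = None
    for (r1, b1), (r2, b2) in combinations(list(zip(A, b)), 2):
        det = r1[0] * r2[1] - r1[1] * r2[0]
        if det == 0:
            continue  # parallel boundaries define no vertex
        # Cramer's rule for the intersection of the two boundary lines.
        x = Fraction(b1 * r2[1] - r1[1] * b2, det)
        y = Fraction(r1[0] * b2 - r2[0] * b1, det)
        if feasible((x, y), A, b):
            value = c[0] * x + c[1] * y
            if best is None or value > best[0]:
                best = (value, (x, y))
    return best


def solve_ilp_2d(c, A, b, lo, hi):
    """Same problem restricted to integer points with lo <= x, y <= hi."""
    best = None
    for point in product(range(lo, hi + 1), repeat=2):
        if feasible(point, A, b):
            value = c[0] * point[0] + c[1] * point[1]
            if best is None or value > best[0]:
                best = (value, point)
    return best


# maximize x + y subject to 2x + 2y <= 5, x >= 0, y >= 0
A = [(2, 2), (-1, 0), (0, -1)]
b = [5, 0, 0]
c = (1, 1)
lp_value, lp_point = solve_lp_2d(c, A, b)
ilp_value, ilp_point = solve_ilp_2d(c, A, b, 0, 3)
```

On this instance the LP optimum is 5/2, while restricting the variables to integers lowers the optimum to 2.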

Chapter 3

Physical planning for the architectural exploration of Chip Multiprocessors

At the early stages of the design of Chip Multiprocessors (CMPs), physical parameters are often ignored and postponed to later design stages. In this chapter, the importance of physical-aware system-level exploration is investigated. Additionally, this chapter presents a strategy for deriving chip floorplans that include physical constraints specific to tiled hierarchical CMPs. Over-the-cell routing is also used as a major area-saving strategy. Wire planning of the on-chip interconnect is also studied, as its topology and organization affect the physical layout of the system.

This chapter is structured as follows. Section 3.1 introduces and motivates the topic, evaluating the impact of the physical aspects on the selection of architectural parameters. Section 3.2 reviews the existing literature. Section 3.3 describes the proposed combination of architectural exploration and physical planning. Section 3.4 and Section 3.5 give details on how to perform efficient physical planning, centering on floorplanning and wire planning respectively. Finally, in Section 3.6 the proposed flow is evaluated, with future work and conclusions discussed in the final section.

3.1 Motivation

This section justifies the need to adapt the traditional physical design flow with physical planning constraints during CMP design. In addition, it shows how using this new physical design flow during early design stages improves the quality of designs and minimizes the number of design reiterations.

Figure 3.1: Two different configurations for a CMP: (a) tiled flat, (b) tiled hierarchical.

Chip multiprocessors

During the past decades, many-core chip multiprocessors [16] have become the major trend in designing scalable computing architectures. Multiple processing units with distributed memory combined with power-saving schemes are the platforms used today for exploiting application parallelism while keeping power consumption under control.

A CMP integrates more than one computing core in a chip. Each of the cores is individually similar to the one inside a single-core processor, containing an arithmetic and logic unit, registers, private cache, a datapath, and a control unit. In addition to private caches, however, a CMP also contains caches shared by two or more cores, and possibly more than one input/output port to external memories. The interconnect provides communication between cores, shared caches and I/O.

The latency and throughput of the on-chip interconnect are crucial to the overall performance of a CMP. Thus, the interconnect is a very important factor to consider during design. Networks-on-Chip (NoCs) [50] have been firmly established as the paradigm of choice for scalable interconnects.

Tiled CMP architectures facilitate the design process of CMPs by offering a rapid way to assemble platforms with tens or hundreds of cores. A tiled CMP is constructed by replicating pre-designed tiles [16, 18, 76]. An example is shown in Fig. 3.1a, where each tile contains a single core, a cache and a router that connects it with neighboring tiles. Nevertheless, challenges appear when constructing many-core CMPs using tile replication, as the two-dimensional mesh structure means increased distance and power consumption for every new core.

To overcome this problem, hierarchical CMP organizations have been proposed to better exploit spatial locality [16, 51]. Figure 3.1b depicts the block diagram of a tiled hierarchical CMP with 8 cores and distributed L3 cache. The chip is organized as a 2 × 2 regular grid of tiles (clusters), each one including two computing cores with private cache, a distributed shared cache (L3), a router of the global mesh and a local interconnect (I). The two-level hierarchical interconnect constitutes the backbone of this architecture. The purpose of the global mesh is to provide inter-cluster communication, as well as access to the memory controllers (M). Intra-cluster communication is supported by low-latency rings that significantly improve the bandwidth of the system given the locality of memory references inherent to the applications.

The problem of system-level design for a many-core CMP consists of selecting high-level architectural parameters (e.g., number of cores, size of cache, topology of the interconnect, etc.) so as to maximize system performance for the selected workload and satisfy the design constraints (e.g., area and power). System-level design is performed early in the design cycle. The main complexity of this task is determined by the vast space of potential architectural configurations and the inaccuracy of the models that represent the components of the system and the workload.

To alleviate the problem complexity, most strategies for architectural exploration disregard physical parameters and postpone them to later design stages. However, in this chapter we show that physical planning has a non-negligible impact on the performance and area of a CMP. In the rest of this chapter we propose methods for floorplanning and wire planning of tiled hierarchical CMPs and show the impact of physical parameters on the configuration of the architecture.

Physical design flow for CMPs

The problems of physical planning for CMPs are related to traditional problems in VLSI physical design [125]. CMP floorplanning is similar to classical VLSI floorplanning, while wire planning has more in common with global routing. However, there are several aspects inherent to tiled hierarchical CMPs which motivate us to extend existing approaches.

As shown in Fig. 3.1a, the tiled organization of CMPs reduces the floorplanning problem from chip to cluster level. However, the cluster floorplan has to satisfy the property of symmetry in the location of the North/South and East/West ports at the boundaries of the tile. This enables the construction of a full chip by replicating and abutting tiles.

Floorplanning of the local interconnect introduces another complexity into the design. For example, when considering rings, it is required that the links between the ring routers have balanced lengths to guarantee similar hop delays. If the link delays are imbalanced, the communication through the ring may have a negative impact on performance.

A special type of constraints, such as adjacency or maximum net delay constraints, is required to prevent certain components from being placed far from each other. A typical example is a core and its cache. Placing a cache far from the core may increase its access delay and result in a significant performance penalty. While adjacency of the two components may appear to be too strict a constraint, the weaker requirement that the inter-component distance be less than one hop is enough to ensure no loss of performance.

An important observation is the recent tendency to design CMPs with wide links. Communication links of the on-chip interconnect may incorporate thousands of wires, aiming at transferring a complete cache line in one cycle. Given the ITRS prediction for minimal wire spacing [80], links of a global mesh can have a width of about $10^2$ µm, occupying a significant amount of chip area. One of the possible ways to alleviate the area overhead is to benefit from over-the-component routing. Some of the CMP components, such as memories, do not use all the metal layers available in the technology and, therefore, these available resources can be used to implement global nets across the chip.

In this scenario, the most complex components, which use all metal layers, may act as blockages for over-the-component routing. Hence, one of the purposes of wire planning is to verify chip routability. Another purpose is the estimation of wire length, which is one of the main parameters when evaluating design quality [131].

Impact of physical planning in exploration

During architectural exploration, parameters for the system-level design of a CMP, such as the number of cores, size of cache, etc., are selected so as to maximize the performance of the chip and satisfy the area and power constraints. Physical planning usually comes after architectural exploration in the design flow, and thus physical parameters are often disregarded during architectural exploration. However, we will show that disregarding this information at this level has a non-negligible impact on the performance and area of the chip.

Sample configuration

Let us assume that an architectural exploration tool has generated a configuration such as the one shown in Fig. 3.2. This configuration has a total of 224 identical cores, split into tiles of 4 cores each. In total, there are 56 tiles arranged in a 7 × 8 mesh. Figure 3.2 does not include any physical information.

Figure 3.2: Structural representation of a hierarchical tiled CMP with two-level interconnect: a global mesh and a ring inside each tile.

We assume that these cores are predesigned and have an estimated area of 1.2 mm², including private L1 cache, with an aspect ratio of 0.8, where the aspect ratio is defined as the ratio between height and width, $h / w$. The layout of the cores can be flipped and rotated, but not resized. At the same time, each core has an associated private cache of 1 mm², and each tile has 1 mm² of shared L3 cache (for a total of 56 mm² of shared cache in the entire chip). Unlike the cores, we assume that the caches are soft blocks, whose width and height are not fixed as long as the area is kept constant. Soft blocks are generally expected to have at least minimum and maximum aspect ratios that limit the available shapes.

Apart from cores and caches, each tile contains a router for the global mesh interconnect (0.99 mm²), and every component participating in a ring contains a corresponding ring router (0.17 mm²). Similarly to cores, we assume that every router is a hard block with an aspect ratio of 1. From the point of view of over-the-component routing, we also assume that cores and routers are complex components that use all available routing resources (metal layers), while memories leave at least two layers available for routing.

Using conventional floorplanning

When we consider the floorplan of the entire system, we face a problem with about 900 components, including cores, private and L3 caches, and routers. However, in this work we deal with tiled hierarchical CMPs, which have several proven benefits by enabling a divide-and-conquer design strategy. Floorplanning, placement, routing, and timing closure are processes that can be applied to a single tile while guaranteeing correctness for the global system. For this reason, we will center on the floorplanning of a single tile.
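The figures of the sample configuration can be checked with a few lines of Python. The sketch below derives the dimensions of a hard block from its area and aspect ratio (height over width) and sums the component areas of one tile as a lower bound on the tile area. The function names are illustrative, and the number of ring routers per tile is left as a parameter because the text does not state it.

```python
import math

CORES_PER_TILE = 4
MESH_ROWS, MESH_COLS = 7, 8


def block_dimensions(area, aspect_ratio):
    """Width and height of a block, given area = w * h and aspect_ratio = h / w."""
    width = math.sqrt(area / aspect_ratio)
    height = aspect_ratio * width
    return width, height


def tile_area_lower_bound(ring_routers):
    """Sum of component areas in one tile (mm^2); any floorplan needs at least this.

    `ring_routers` is an assumed parameter: the count per tile is not given here.
    """
    cores = CORES_PER_TILE * 1.2           # hard blocks, including private L1
    private_caches = CORES_PER_TILE * 1.0  # soft blocks, one per core
    shared_l3 = 1.0                        # soft block, one per tile
    mesh_router = 0.99                     # hard block, aspect ratio 1
    return cores + private_caches + shared_l3 + mesh_router + ring_routers * 0.17


tiles = MESH_ROWS * MESH_COLS
total_cores = tiles * CORES_PER_TILE
core_w, core_h = block_dimensions(1.2, 0.8)
```

With these values, a core measures roughly 1.22 mm by 0.98 mm, and the sum of component areas gives a floor below which no floorplan of the tile can go.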

Figure 3.3: A minimal-area floorplan for the configuration in Fig. 3.2.

Figure 3.3 depicts a minimum-area floorplan that could be obtained by a conventional floorplanner such as Compass [37]. In this example, the total area of the tile is … mm². However, from the point of view of a hierarchical CMP, this floorplan has some undesirable problems:

1. Some cores are not adjacent to their private caches, potentially increasing the communication latency between them. Similarly, there are long distances between some caches and the corresponding ring routers.
2. Ring routers for the local interconnect are not evenly separated. In a ring, the wire length of the longest hop dictates the maximum speed for the entire ring. If this distance is too long, some timing constraints might be violated. Therefore, it is desirable to minimize the length of each link hop separately instead of minimizing the total link length.
3. Assuming that the cores and the router use all metal layers, the two rightmost ring routers have no available routing area at their boundaries. Thus, the design cannot be routed without whitespace insertion.

Using CMP-aware floorplanning

An alternative floorplan is shown in Fig. 3.4. This floorplan has been generated using all the constraints and enhancements discussed in this work. Since area minimization is no longer the only objective, this floorplan has a 16% area increase (14.57 mm²). However, all of the cores are now adjacent to their private caches. Additionally, a route can be found between all the ring routers so that the link length for each hop is always between 0.1 and 1.1 mm, and the distance between a component and its attached ring router is strictly less than 0.5 mm. As an example, Fig. 3.5 shows a floorplan for the entire system, including all clusters, based on the cluster floorplan from Fig. 3.4.

Figure 3.4: CMP-aware floorplan for the configuration in Fig. 3.2.

Figure 3.5: CMP-aware floorplan (full chip).

A 16% increase in area may induce an unacceptable overhead in manufacturing cost. This fact may encourage a designer to select an alternative architectural configuration, with a slightly lower performance, although with better floorplan properties.

3.2 Related work

Floorplanning as a part of the VLSI design flow has been extensively studied for decades. The traditional definition involves minimizing a linear combination of area and estimated wire length [85], leaving actual wire planning to later stages in the design process.

Hierarchical approaches to floorplanning have already been shown to reduce the algorithm runtime. Quite often hierarchical floorplanning is applied to the design of Systems-on-Chip (SoCs), for which every component can be considered as a fixed-size block. These blocks can be generated using fixed-outline floorplanners such as [12], while the system-level floorplanning can be solved using traditional minimal-area techniques such as [37]. In this work, we will instead exploit the regularity of tiled hierarchical CMPs.

When floorplanning a CMP, it might also be desirable to optimize factors other than area and wire length. Previous approaches exist that evaluate floorplans based on other qualities such as temperature minimization [109, 123] or power consumption [129], using analytical models. For floorplanning at the system level, [145] proposes a method that creates tile arrangements which minimize the overall wire length for several 3D topologies.

Floorplanning with constraints is also commonly considered in modern floorplanning [131]. Examples include restrictions on the valid placement of blocks: adjacency constraints, limits on the distance between pairs of blocks, and objects in fixed positions [146]. Specifically considering CMP constraints at this stage, as in this chapter, is less common. In [142] the authors show how over-simplified models for those constraints (e.g., disregarding pin placement) produce suboptimal floorplans, but only for classic bus-based interconnects.

On the integration of floorplanning with earlier design stages, the work in [20] incorporates a linear programming-based floorplanner into a synthesis framework for application-specific Systems-on-Chip. The floorplanner is used to obtain better area estimates. The influence of physical information on system performance at the microarchitectural level was studied in [42]. The authors proposed physical planning to estimate area and link delay, which were then used to refine the accuracy of throughput estimations obtained by simulation.

3.3 Architectural exploration

This section overviews the flow for architectural exploration of CMPs and introduces the context for physical planning.

Consider the problem of maximizing CMP performance (throughput) subject to a resource budget, i.e., constraints on area and power. This formulation is an example of the architectural exploration problem, whose objective is to efficiently distribute the chip resources among the components of a multi-core system, e.g., cores, memories and interconnect.

The design space for exploration is specified through a set of models and design constraints. The models describe the behavior of individual components. There can be different models for cores characterizing different microarchitectural features that trade off area, power and performance (in-order/out-of-order execution, multi-threading, etc.). The memory models define the size, area and latency of different memory modules. The models for the interconnect define its physical and performance properties (latency, contention, etc.). The expected workload for the CMP requires another type of model that characterizes the observable behavior produced by the generated memory patterns (memory locality, burstiness, etc.). Constraints on power consumption and area are typically defined to confine the design space.

Exploration is a complex optimization problem due to the vast discrete space of architectural variables that determine the configuration of a CMP (e.g., number of cores, cache sizes, interconnect topology, link width). To handle this complexity, in this work we resort to a three-stage divide-and-conquer approach to solve the exploration problem. Figure 3.6 illustrates our methodology, with the main stages being architectural exploration, physical planning and validation.

Figure 3.6: Modified architectural exploration flow that includes physical planning.

Architectural exploration

During the first stage, analytical models are used to rapidly prune the design space and generate a set of promising configurations in the area/power/performance space. The analytical model from [114] is used to evaluate CMP configurations and discriminate those with poor performance. Static and dynamic power are also evaluated using analytical approximations based on the area and activity of the CMP components [115]. The area is approximated as the sum of the areas of all components on chip.

Analytical models are used as a cost estimator for an iterative metaheuristic-based search to efficiently navigate through the design space. This space is described with a set of architectural variables, and a set of transformations is defined to explore the neighborhood of any particular configuration. Some examples of transformations include modifying the dimensions of the top-level mesh, the number of cores per cluster or the topology of the local interconnect, among others. Simulated Annealing [92] and Extremal Optimization [26] are used to explore the design space by probabilistically applying transformations and tracking the best discovered solution.

Physical planning

The objective of this stage is to evaluate wire length and give a more accurate area estimation. The floorplanning and wire planning algorithms at this stage consider physical constraints for individual CMP components, such as the aspect ratio and the number of metal layers. This accuracy comes at the expense of a higher algorithmic cost, which is however tolerated by performing the planning only for a moderate number of configurations, selected during the first stage.

Validation

Finally, the validation phase of the flow is aimed at verifying performance and power, which may differ from the initial analytical estimates. In the current setup we use a cycle-accurate simulation for the CMP interconnect, supplied with probabilistic automata models for cores and memories [55].

The following sections focus on algorithms for the physical planning of hierarchical CMPs. Their objective is to accurately estimate the chip area and wire length, subject to the physical constraints. The methods proposed in this work are applied at the second phase of the described exploration flow.

3.4 Floorplanning methodology

This section presents the first step of physical planning: floorplanning.

Floorplan representations

Floorplanning is the task of defining tentative locations for the blocks of a system under certain geometric constraints. The blocks represent pre-designed CMP components such as cores, memories and routers. The blocks can either have a fixed size or accept a set of different aspect ratios. The traditional floorplanning problem only considers the minimization of the total area occupied by the components. More advanced floorplanning strategies can also consider the minimization of other metrics such as the estimated wire length.

Because of the complexity of the problem, it is essential to select efficient data structures to represent floorplans. In this work, we use Simulated Annealing for the exploration of slicing floorplans, similarly to what is proposed in [141], where the cost function is defined as a linear combination of area and wire length approximated with the half-perimeter wire length. In addition, the cost function is extended with other components that aim at generating floorplans with some properties and constraints for tiled hierarchical CMPs.

Slicing trees

Slicing trees [141] are one of the most popular floorplan representations. They can represent only a family of floorplans called slicing floorplans. This subset of all possible floorplans contains only floorplans that can be represented entirely by a series of horizontal or vertical cuts. A slicing tree is just a tree representation of such a series of cuts (see Fig. 3.7).

Figure 3.7: Example of a slicing floorplan and associated slicing tree.

It has been proven that slicing trees, when combined with a compaction post-process, can represent all possible maximally compact layouts of any given library of components [99]. Hence the use of slicing floorplans does not limit the search space. In this work, compaction is not applied and the generation of area-optimal floorplans is not guaranteed. However, the difference is expected to be acceptable, especially in the presence of soft blocks [147].
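A slicing tree can be written in postfix form and evaluated with a stack. The following sketch computes the bounding box of a slicing floorplan made of fixed-size blocks. The conventions are assumptions of this example rather than the thesis notation: "V" denotes a vertical cut (blocks side by side, widths add) and "H" a horizontal cut (blocks stacked, heights add), and block names must differ from these two operator symbols.

```python
def evaluate_slicing(expression, sizes):
    """Bounding box (width, height) of a slicing floorplan given in postfix form.

    expression: sequence of block names and the operators "H" / "V".
    sizes: dict mapping each block name to its fixed (width, height).
    """
    stack = []
    for token in expression:
        if token in ("H", "V"):
            if len(stack) < 2:
                raise ValueError("operator without two operands")
            w2, h2 = stack.pop()
            w1, h1 = stack.pop()
            if token == "V":   # vertical cut: operands placed side by side
                stack.append((w1 + w2, max(h1, h2)))
            else:              # horizontal cut: operands stacked on top of each other
                stack.append((max(w1, w2), h1 + h2))
        else:
            stack.append(sizes[token])
    if len(stack) != 1:
        raise ValueError("expression does not describe a single floorplan")
    return stack[0]


sizes = {"A": (2, 1), "B": (2, 4), "C": (4, 2)}
# A and B side by side, with C stacked on the result.
width, height = evaluate_slicing(["A", "B", "V", "C", "H"], sizes)
area = width * height
dead_space = area - sum(w * h for w, h in sizes.values())
```

The difference between the bounding-box area and the sum of the block areas is the whitespace that an area-driven search tries to reduce.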

Bounding curves

To represent the possible shapes of the individual components, bounding curves are used. A point $(x, y)$ belongs to the bounding curve of a component if $x$ and $y$ are a valid width and height for that component (Fig. 3.8). There are efficient vertical and horizontal composition operations on bounding curves [116] to calculate all the valid aspect ratios of such compositions.

Figure 3.8: Example of vertical composition of the bounding curves for two components B and C. A curve with the point (2, 1) composed with a curve with the points (2, 4) and (4, 2) yields the points (2, 4+1) and (4, 2+1).

3.4.2 Search strategy

Algorithm 1 Floorplanning algorithm (Simulated Annealing)
  FP ← Initial slicing floorplan
  T ← Initial temperature
  while improvements in the last k iterations do
    for p iterations do
      select FP_new randomly from neighbors of FP
      gain ← COST(FP) − COST(FP_new)
      if RANDOMACCEPT(T, gain) then
        FP ← FP_new
    T ← T · α
  return FP

Even when reducing the search to slicing floorplans, the space of solutions is large enough that the use of metaheuristics is unavoidable [144]. The floorplanning process described in this work uses the Wong-Liu algorithm, extended with changes to the cost function that are described in this section and in the following one.

The Wong-Liu algorithm [141] is a customization of Simulated Annealing [92] for the search of slicing trees. It defines a neighborhood function consisting of three movements that operate on top of Polish expressions, which are a string representation of slicing trees. An overview of the search procedure is presented in Algorithm 1.

One important ingredient of any Simulated Annealing algorithm is the cost function. To allow trade-offs between area and other factors to be explored, we introduce weights for the different factors of the cost function:

$\mathrm{COST}(FP) = \alpha\,\mathrm{Area}(FP) + \beta\,\mathrm{WL}(FP) + \gamma\,\mathrm{WL}_{Eq}(FP) + P(FP)$

In this expression, $FP$ is the floorplan being evaluated, Area is defined as the effective area of the floorplan, WL is the sum of the wire length estimations for each net, and $\mathrm{WL}_{Eq}$ is the sum of the squares of the estimated wire lengths for nets in the ring interconnect, if any. The goal of $\mathrm{WL}_{Eq}$ is to penalize floorplans where equidistantly-spaced nets have excessively diverging lengths. The last term, $P(FP)$, aggregates all possible penalties. We will explain each of these factors in more detail in the following section. The $\alpha$, $\beta$ and $\gamma$ parameters are weights that a designer can use to guide the search towards floorplans with smaller area or towards floorplans with smaller wire lengths. An example of this trade-off will be seen later in this chapter.

CMP-aware floorplanning

In Section 3.1 we mentioned some of the requirements for the physical planning of tiled hierarchical CMPs. In this section we address them in more detail.

Over-the-cell routing

Current CMOS VLSI design is multilayered. Individual devices such as transistors are patterned on the bottom layer, which in modern fabrication is composed of polycrystalline silicon (polysilicon) [139]. This layer is often called the front end of line (FEOL). Successive layers are applied that can be used to make connections between the different transistors and external connections, collectively called the back end of line (BEOL). An example can be seen in Fig. 3.9.

Figure 3.9: Multi-layered CMOS technology: FEOL includes the front-end layers (polysilicon and diffusion), m1-m6 represent the available metal layers.

The BEOL layers are used for the required wiring inside the various CMP components. However, different component types have different requirements of routing resources. On-chip memory often uses fewer layers than more complex circuits such as cores. If the input data models this, these free layers can be used for the wires between the different CMP components.

Because of the prevalence of cache memories in CMP tiles, we can assume that every configuration can be routed using the available metal layers on top of the components without requiring any extra whitespace. During floorplanning, and as part of the wire length estimation that will be described later in this section, unroutable configurations are discarded.

Abutability

Because only a single tile of a chip is floorplanned, some nets that connect different clusters will have floating terminals that must be placed on one of the boundaries of the tile. However, the placement of such a terminal must lie adjacent to the placement of a corresponding terminal on the next cluster. Thus, a special symmetry constraint is created between pairs of nets. All the global interconnect nets have this property.

Wire length constraints

For performance reasons, certain critical nets must have a wire length constraint. If these constraints are violated, the floorplan is rejected. The maximum length depends on the desired interconnect operating frequency, wire sizing and other parameters [80].

Equidistantly-spaced nets

For most interconnects, the communication delay is determined by the maximum length of a set of links. For example, in a ring, the cycle period must be long enough to allow packets to propagate across the longest of the ring hops. In these cases, it is desirable not to strictly minimize the total wire length, but to balance the individual lengths of the respective links. For this reason, nets that must satisfy this requirement are evaluated differently in the cost function (Section 3.4.2), minimizing the sum of the squares of the lengths instead:

$\mathrm{WL}_{Eq}(FP) = \sum_{net \in ring} \mathrm{WL}(net)^2$
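The effect of the squared term can be seen by comparing two rings with the same total length. The sketch below is a direct transcription of the cost expression into Python; the function names, the default weights and the sample lengths are illustrative assumptions, and the area, wire lengths and penalty are taken as given numbers rather than computed from a floorplan.

```python
def wl(net_lengths):
    """WL term: sum of the estimated wire length of every net."""
    return sum(net_lengths)


def wl_eq(ring_lengths):
    """WL_Eq term: sum of the squared lengths of the nets in the ring."""
    return sum(length ** 2 for length in ring_lengths)


def cost(area, net_lengths, ring_lengths, penalty=0.0,
         alpha=1.0, beta=1.0, gamma=1.0):
    """Weighted cost: alpha*Area + beta*WL + gamma*WL_Eq + P."""
    return alpha * area + beta * wl(net_lengths) + gamma * wl_eq(ring_lengths) + penalty


# Two hypothetical rings of four hops with the same total length (4.0 mm).
balanced = [1.0, 1.0, 1.0, 1.0]
imbalanced = [0.25, 0.25, 0.25, 3.25]

same_total = wl(balanced) == wl(imbalanced)
balanced_cost = cost(14.0, balanced, balanced)
imbalanced_cost = cost(14.0, imbalanced, imbalanced)
```

Both rings contribute the same amount to WL, but the imbalanced ring has a much larger WL_Eq (10.75 against 4.0), so the search is steered towards floorplans whose longest hop is shorter.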

Figure 3.10: Maze routing a pair of nets with abutability constraint (floorplan from Fig. 3.4, blockages marked in black).

Wire length estimation

A good wire length estimator is important for the evaluation of the cost function. Wire length estimations are used in the WL(FP) and $\mathrm{WL}_{Eq}(FP)$ terms of the cost function. Additionally, they are used to check the satisfaction of some of the constraints, such as abutability and wire length limits.

In over-the-cell routing, the only space considered for routing is the free space over the components that have the top metal layers available. Since cores and routers typically implement complex internal wiring and thus utilize the highest number of layers, memories are the only components in the entire design that leave some metal layers unused. The relative area of memories in a tile is defined by the configuration, but it usually ranges between 50% and 60% for the best configurations seen in our tests. Thus, the lowest metal layers will typically have no space for routing, while the upper layers will have up to 60% of space available, thereby making over-the-cell routing possible. An example can be seen in Fig. 3.10, which represents a middle metal layer from the floorplan in Fig. 3.4, with the area occupied by components marked in a dark color.

The work on which this floorplanning algorithm is based [141] proposes the use of the half-perimeter wire length as an estimator. In this work, we propose the use of Lee's algorithm [119], often known as maze routing. To reduce the complexity of the algorithm, three relaxations are applied:

1. Routing full links, not individual wires.
2. Routing each net independently, so that no collisions are considered.
3. Routing is performed on one metal layer only.

Thus, routes might be generated that are later found infeasible during wire planning. However, for the case of nets with two terminals, we can guarantee that a route found using this method gives a valid lower bound on the wire length. Thus, this information can be used to verify wire length and routability constraints. Because of simplification (1), the size of the routing grid is determined by the minimum link width.

The use of Lee's algorithm also enables checking for violations of the abutability requirement. When planning pairs of nets with such a requirement, the algorithm will only accept a path if a matching path has been found on the opposite side for the paired net. The algorithm also does not stop at the first path, but rather collects all paths and selects the one where the route is shortest to both opposing extremes of the tile. In Fig. 3.10, this algorithm is applied to estimate the length of the two vertical mesh links (from the router to the north side and from the router to the south side). The shortest route for the north net is discarded because at the opposing side of the tile (same column, last row) no path has been found for the south net.

A more accurate estimation of routability is performed during wire planning (Section 3.5) to discard those floorplans that are unroutable when considering all signals simultaneously.

3.5 Wire planning

In order to fully realize the floorplan estimated in the previous section, we need to establish a wire planning that connects all the required nets between the components and that allows the tiling of the cells. This wire planning must use over-the-cell routing and minimize its wire lengths, while balancing the nets. This problem corresponds to a routing problem, and we solve it in two steps. In the first step, we formulate the routing problem as a Boolean satisfiability problem, for which we obtain a feasible solution with a SAT solver. Then, in the second step, we iteratively reduce the wire length of several nets by converting the satisfiability problem into an integer linear programming problem that we solve with an ILP solver. In the following, we describe their essential elements.


Figure 3.11: Grid structure used for wire planning (top view and cross-section view).

3.5.1 Problem formulation

We formulate the routing problem as a Boolean satisfiability problem along the lines of [78], which we extend with some insights that are needed in the context of CMPs.

At this early design stage, wire planning is only performed on the wide communication links of the system, neglecting local control wires. These links can have more than 10³ wires, e.g. full cache lines with data, address and control bits. The routes are calculated globally for the complete links and not for the individual wires that compose each link. The routing region is represented by a uniformly-sized coarse grid (Fig. 3.11). The grid unit is determined by the minimum width of a link, w = n · p, where n is the number of wires of the narrowest link and p is the wire pitch, i.e., the smallest distance between wires. Routing is performed on a 3D grid with blockages according to the metal layers occupied by the CMP components, as illustrated in Fig. 3.9.

The main variables of the SAT problem correspond to the presence (or absence) of a wire segment between two adjacent nodes of this 3D grid. Another set of variables encodes the assignment of wire segments to specific nets. The SAT problem includes several types of constraints:

Consistency constraints enforce the expected behavior of the variables we have introduced, e.g., if an edge is assigned to a net, then the edge must be occupied by a wire.

Routability constraints define a legal routing between the components. Basically, these constraints establish that a set of wire segments guarantees the connectivity of all pins of a net. The formulation is similar to the one presented in [78] but extended to handle floating terminals. Our solution is based on the idea that routing must be performed among regions of points that define the endpoints of the nets. These regions are characterized by a set of (not necessarily adjacent nor disjoint) points that may describe the location of a component or the set of all possible locations for a pin. The correctness of our routability constraints is based on Euler's graph theory.

Abutability constraints ensure the symmetry between the wires that are used to interconnect tiled cells. These constraints assert that if a wire in the North boundary provides a signal for a net that interconnects adjacent cells, another wire for the same cell must be placed in the same position in the South boundary. Similar relations must also hold in the other direction and for the East/West boundaries.

Optionally, constraints for design rules can be requested in order to fulfill fabric requirements or to reduce running time. One of the typical design rules is to assign one direction to each metal layer.

Solving the previous satisfiability problem provides a first feasible solution for the wire planning problem (or shows the absence of such a solution!).

3.5.2 Reduction of wire length

Once we have a feasible solution for the wire planning problem, we improve it by reducing its wire length while maintaining its feasibility. Our strategy is iterative: each iteration rips up a small set of nets from the feasible solution and reroutes them, subject to the previously specified constraints and minimizing the total wire length. To do so, we convert our Boolean satisfiability problem into an integer linear problem: Boolean variables are transformed into 0/1 variables, Boolean constraints are easily converted to linear inequalities, and the linear function that counts the amount of wire is used as the objective function of the ILP.

Since the above process is applied to a small set of nets at each iteration, the resulting problem is tractable and can be solved with efficient solvers in a moderate amount of time. Note that solving the original problem with all the nets and seeking the absolute minimum is too slow for the problem sizes we face. The currently implemented iterative process proceeds by ripping up and rerouting one net at a time, with the exception of the set of nets that interconnect tiled cells, which are ripped up and rerouted in one step. This process is repeated while reductions in the wire length are obtained, favoring the reduction of long nets before the reduction of shorter nets.
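The conversion of Boolean constraints into linear inequalities can be illustrated with a minimal sketch. It assumes that the SAT constraints are available as CNF clauses over signed integer literals (DIMACS-style), which the text does not state; the names `clause_to_inequality`, `holds` and `clause_value` are hypothetical and are not part of the tool described here. Each clause becomes one inequality over 0/1 variables: a positive literal contributes x, a negated literal contributes (1 - x), and the sum must be at least 1.

```python
from itertools import product
from typing import Dict, List, Tuple


def clause_to_inequality(clause: List[int]) -> Tuple[Dict[int, int], int]:
    """Turn a CNF clause into (coeffs, rhs), meaning sum(coeffs[v] * x_v) >= rhs.

    Assumption: literals are non-zero signed ints, as in the DIMACS format.
    """
    coeffs: Dict[int, int] = {}
    rhs = 1
    for lit in clause:
        var = abs(lit)
        if lit > 0:
            coeffs[var] = coeffs.get(var, 0) + 1
        else:
            # (1 - x_var) contributes -x_var; its constant 1 moves to the right-hand side.
            coeffs[var] = coeffs.get(var, 0) - 1
            rhs -= 1
    return coeffs, rhs


def holds(coeffs: Dict[int, int], rhs: int, assignment: Dict[int, int]) -> bool:
    """Evaluate the linear inequality under a 0/1 assignment."""
    return sum(c * assignment[v] for v, c in coeffs.items()) >= rhs


def clause_value(clause: List[int], assignment: Dict[int, int]) -> bool:
    """Evaluate the original Boolean clause under the same assignment."""
    return any(assignment[abs(lit)] == (1 if lit > 0 else 0) for lit in clause)


if __name__ == "__main__":
    # (x1 OR NOT x2 OR x3) becomes x1 - x2 + x3 >= 0.
    coeffs, rhs = clause_to_inequality([1, -2, 3])
    for bits in product([0, 1], repeat=3):
        assignment = dict(zip([1, 2, 3], bits))
        assert holds(coeffs, rhs, assignment) == clause_value([1, -2, 3], assignment)
```

With the constraints in this form, only the objective (the linear function that counts wire) has to be added before handing the model to an ILP solver.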

| Parameter | Value |
| Maximum chip area | 350 mm² |
| Maximum chip power | 350 W |
| Interconnect frequency | 1.6 GHz |
| Global interconnect types | Mesh |
| Global mesh dimensions | 2 × 2 to |
| Local interconnect types | Bus, ring |
| Local interconnect sizes | Limited by chip area only |
| Memory density | 1 mm²/MB |
| Cache latency (per size) | 5.0 × CacheSize^0.5 cycles |
| Off-chip memory latency | 100 cycles |
| Interconnect link width | 10 µm (10³ wires × 10 nm) |
| Available metal layers | m1, m2, m3, m4 |
| Metal layers used by cores | All |
| Metal layers used by routers | All |
| Metal layers used by cache memories | m1, m2 |
| Core types | |
| Core performance (IPC) | |
| Core area | 1 mm², mm², 2 mm² |
| L1 size | 64, 96, 128 KB per core |
| L2 size | 64 KB to 1 MB per core |
| L3 size | Up to 100 MB per chip |

Table 3.1: Parameters for system-level exploration.

3.6 Results

In this section we demonstrate the impact of using physical planning during system-level exploration, and also show the need for CMP-specific constraints during physical planning for a proper evaluation of architectural configurations.

3.6.1 Exploration setup

All of the experiments in this section use configurations that were obtained using automated system-level exploration [113]. The parameters of this exploration are described in Table 3.1. We limit the search to tiled hierarchical CMPs using a mesh as the global interconnect, with the second-level interconnect being a bus or a ring (bi-directional or uni-directional). The number of tiles, the number of cores and the distribution of cores among the tiles are exploration variables. We assume that three different models of cores are available (1, 2 and 3), with different performance and area characteristics obtained by scaling publicly available data of the Intel Core 2 Duo E6400 processor [48]. We also assume that, while cores and interconnect routers occupy all metal layers, cache memories only use two of them. Therefore, routing can be performed over the cache memories. The operating frequency of the interconnect has been used to define the constraints on the maximum wire length for the links.

The wire planning models were solved using PicoSAT [23], and Gurobi [72] was used to optimize the wire length as per Section 3.5. To characterize the memory accesses, a model extracted from the SPEC2006 soplex benchmark is used.

The exploration generates 200 configurations in around 20 minutes. Each configuration is described by its architectural parameters. For example, the best configuration from this exploration has 25 clusters connected with a 5 × 5 mesh. Each cluster has a bus as local interconnect, two cores of type 2 and two cores of type 3, along with 1 MB of cache per core. The CMP has a total of 50 MB of L3 cache distributed across the 25 clusters. It has an estimated throughput of IPC.

3.6.2 Impact of physical planning

In order to show how the use of physical planning can significantly alter the results of system-level exploration, we applied our physical planning tool to the 200 configurations found by the exploration. This floorplanning process, if run sequentially, takes 5 hours (an average of 90 seconds per configuration). However, on a machine with multiple cores each of the 200 configurations can be run separately.

Figure 3.12: Area as measured by different floorplanning strategies (series: block area, conventional floorplan, NoC-aware floorplan; axes: area [mm²] versus throughput [IPC]).

The results are shown in Fig. 3.12. For each configuration, block area indicates the sum of the areas of all components. The exploration tool, before physical planning, uses this value as an estimator of the expected chip area in order to satisfy the maximum area constraint. In this example, no configuration has a block area larger than 350 mm². Conventional floorplan shows a minimal-area floorplan obtained without using any of the constraints described in this work (abutability, link length optimization, etc.). On the other hand, NoC-aware floorplan depicts the floorplan with minimal area that satisfies these constraints. A dashed line connects the block area data point with the minimal NoC-aware floorplan area for the same configuration.

Despite the fact that all configurations have a block area lower than the limit, a large number exceed the area limit once physical planning is taken into account. As an example, the best configuration found by the exploration (rightmost in Fig. 3.12) has a block area of mm², which is below the area constraint. A conventional, minimal-area floorplan exists with an area of mm², also below the constraint. However, using the tool presented in this work, we find that the smallest floorplan satisfying all floorplanning constraints has an area of mm². This violates the area constraint and, therefore, the configuration is not actually valid. The first viable configuration with area below the limit has a significantly lower performance at IPC.

Out of the 200 configurations selected during the exploration, 39% had no floorplan satisfying all the constraints. Even for the configurations for which such a floorplan was found, only 23% satisfy the 350 mm² area limit. Configurations using rings as local interconnect, despite their excellent performance characteristics, have much stricter physical constraints and thus often violate design constraints. Without physical planning, those configurations would have been tagged as promising and would have been analyzed with more accurate simulation tools.

3.6.3 Physical planning search space

A single CMP configuration can have a large number of alternative floorplans. Nevertheless, it is desirable to select one or a few candidate floorplans. At the same time, we are considering two metrics by which feasible floorplans can be evaluated: area and wire length. Thus, there is a trade-off. In Section 3.1 we showed two candidate floorplans where one had much shorter total wire length at the cost of a 15% increase in the chip area. Since this trade-off might be inconvenient for some designs, the weights in the cost function (described in Section 3.4) can be modified to guide the search towards floorplans with better area or towards shorter wire length.
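The trade-off between the two metrics can be sketched as follows. This is a minimal illustration, not the cost function of Section 3.4: it assumes that each floorplan is summarized as an (area, wire length) pair, that both metrics are minimized, and that the cost is a plain weighted sum. The data points and the names `pareto_front`, `select` and `example_points` are hypothetical.

```python
from typing import List, Tuple

Floorplan = Tuple[float, float]  # (area, wire length); units are illustrative only


def pareto_front(points: List[Floorplan]) -> List[Floorplan]:
    """Keep the floorplans that no other floorplan beats in both area and wire length."""
    front: List[Floorplan] = []
    best_wire = float("inf")
    # After sorting by area, a point is non-dominated only if it strictly
    # improves on the best wire length seen so far.
    for area, wire in sorted(set(points)):
        if wire < best_wire:
            front.append((area, wire))
            best_wire = wire
    return front


def select(points: List[Floorplan], w_area: float, w_wire: float) -> Floorplan:
    """Pick the floorplan with the lowest weighted-sum cost (assumed cost form)."""
    return min(points, key=lambda p: w_area * p[0] + w_wire * p[1])


def example_points() -> List[Floorplan]:
    """Made-up sample data, used only to exercise the functions."""
    return [(300.0, 9.0), (310.0, 7.5), (320.0, 8.0),
            (345.0, 6.0), (345.0, 6.5), (360.0, 6.0)]


if __name__ == "__main__":
    front = pareto_front(example_points())
    print(front)
    print(select(front, w_area=1.0, w_wire=0.0))   # favors area
    print(select(front, w_area=1.0, w_wire=10.0))  # gives wire length more weight
```

Changing the two weights moves the selected floorplan along the non-dominated set, which is the role the cost-function weights play in the search described above.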

Figure 3.13: Example of physical planning search space for a single CMP configuration (axes: wire length [10⁶ µm] versus area [mm²]; design points (a) and (b) are marked).

Figure 3.14: Two design points, (a) and (b), from the exploration space in Fig. 3.13.

Figure 3.13 is an example of the available floorplans for a given CMP configuration. In the chart, each point represents a valid floorplan and its position depends on the area and wire length of that floorplan. The 10 Pareto-dominating solutions are represented as a solid line (Pareto frontier). By changing the weights in the cost function, a designer can decide which of these solutions are most desirable. We expect this small set to be further reduced by area or other types of physical constraints.

To illustrate, we selected two representative floorplans from the Pareto frontier, which we show in Fig. 3.14. These are, respectively, the floorplan with the minimal area (but satisfying all constraints) and the overall best floorplan assuming we give the same weights to both area and wire length minimization.

Publications

As part of this research topic we have published the following articles:

J. de San Pedro, N. Nikitin, J. Cortadella, and J. Petit, "Physical Planning for the Architectural Exploration of Large-Scale Chip Multiprocessors," in Proceedings of the 2013 IEEE/ACM Seventh International Symposium on Networks-on-Chip, Tempe, Arizona, USA, 2013.

J. Cortadella, J. de San Pedro, N. Nikitin, and J. Petit, "Physical-aware system-level design for tiled hierarchical chip multiprocessors," in Proceedings of the 2013 ACM International Symposium on Physical Design, New York, NY, USA, 2013.

The first article centers on the integration of physical planning into architectural exploration, while the second article centers on the specifics of floorplanning for CMPs.

3.7 Conclusions

The impact of physical layout is often neglected when exploring the architectural parameters of a CMP. This chapter has presented a framework in which physical planning has been integrated with architectural exploration to generate physically-viable high-performance CMPs. The presence of physical constraints has been shown to have an important impact on deciding the parameters for the design of CMPs.

This is a first step towards future design frameworks in which accurate power-performance models and advanced technologies with hierarchical on-chip interconnects can be incorporated. Future work in this direction has to address the issues of physical planning for alternative interconnect topologies, such as crossbars. Additionally, the accuracy of performance prediction can be improved by using physical information while estimating the system throughput. Another important task is to look for alternative approaches to physical planning in order to speed up the phase of system-level design for CMPs.

Chapter 4
Regularity-constrained floorplanning

The complexity of the VLSI physical design flow grows dramatically as the level of integration increases. An effective way to manage this increasing complexity is through the use of regular designs, which contain more reusable parts. In this chapter we introduce HiReg, a new floorplanning algorithm that generates regular floorplans.

In Chapter 3, the benefits of regular designs were already demonstrated with tiled Chip Multiprocessors (CMPs). In a tiled CMP, there is a single design for a tile that is designed once but replicated many times, thereby reducing the design effort. HiReg goes further and automatically extracts repeating patterns in a design by using graph mining techniques, without any information from the designer. As will be seen, this also allows HiReg to extract multiple hierarchical levels of repeating patterns, instead of being limited to a single tile pattern.

Regularity is exploited by reusing the same floorplan for multiple instances of a pattern, as long as neither area, wire length nor existing hierarchy constraints are violated or compromised. Experiments will show the scalability of the method for many-core CMPs and competitive results in area and wire length with traditional floorplanners.

This chapter starts by giving a brief motivation of the problem in Section 4.1. The current state of the art is reviewed in Section 4.2. Section 4.3 uses an example to demonstrate the challenges solved by the presented approach. Section 4.4 describes the inner workings of HiReg. In Section 4.5, HiReg is compared with other floorplanning tools to evaluate the benefits of regularity. Conclusions are discussed in Section 4.6.

4.1 Motivation

The computational complexity of the floorplanning problem highly depends on the number of components of the system. For large systems, a flat view makes the floorplanning problem intractable. For this reason, hierarchical methods [135, 143] have been proposed and successfully used to reduce this complexity. Hierarchical methods divide the floorplanning problem into multiple subproblems that may be either fully or partially independent from each other, thereby enhancing scalability.

An important metric often disregarded during floorplanning is regularity, known to lead to efficient and economical designs [128]. Large-scale systems have significant amounts of regular patterns that can be exploited (on-chip memories, many-core CMPs, etc.). The design cost of such systems can be brought down by reducing the number of distinct subcircuits to be designed, and then replicating the pre-designed subcircuits as many times as possible. To allow this reduction, a regular floorplan uses the exact same layout for all replications of a subcircuit. To reduce the complexity of timing closure, it is also desirable for all of the adjacent components to be placed in similar relative positions, so that the interconnect geometries are regular and timing analysis is similar.

In many-core CMPs, tiled layouts [16] are often used to exploit regularity, as in the previous chapter. The design is split into homogeneous tiles that are only floorplanned once and then replicated. However, with industry moving towards heterogeneous CMPs [97], it is no longer possible to assume that most CMPs will have only a single type of tile. Integrated graphics, accelerators and I/O blocks are some of the special types of co-processors that introduce heterogeneity.

On the other hand, enforcing regularity in a design may compromise other floorplan metrics such as area or wiring. Existing designs are often hierarchical in nature. CMPs with hierarchical topologies are a good example. Preserving the pre-defined hierarchy may result in a better wiring quality (e.g. by reducing the number of wires that cross between different subcircuits). Very often, hierarchy is manually enforced by designers to split the design and assign components to different design teams. Thus, breaking the existing hierarchy could be counterproductive to the goal of simplifying design. Another example is the concept of choppability [122]. In a choppable floorplan, large functional blocks can be chopped away, reducing the total die size and varying performance/power metrics in order to construct multiple versions of a product from the same basic design.

In this chapter we introduce a new floorplanning algorithm, HiReg, that considers area, wiring, regularity and hierarchy as floorplanning objectives. Very often these objectives are conflicting, e.g., reducing final area compromises regularity and vice versa. To deal with this issue, HiReg uses a new method that can trade off hierarchy and regularity constraints. Both hierarchy and regularity are automatically discovered from the block netlist.

Figure 4.1: Example CMP floorplans generated using different floorplanning strategies: (a) diamond pattern, (b) Compass, (c) DeFer, (d) HiReg.

4.2 Related work

There has been little work in the area of regular floorplans. Regularity is more common in the area of physical design for analog circuits, where it is often a strict requirement due to the peculiarities of analog design [15]. However, most of the techniques in analog design involve symmetry properties that are not relevant for maximizing design reusability.

| | Hierarchy | Regularity |
| [38] | No | Arrays only |
| REGULAY [145] | Yes | Tiles only |
| DeFer [143] | Yes | No |
| Compass [37] | By similarity | No |
| [135] | Yes | No |
| ArchFP [64] | Manual | Manual |
| HiReg | Yes | Yes |

Table 4.1: Comparison of related work.

[38] acknowledges the importance of regular designs in CMPs and describes a simulated annealing-based floorplanner that organizes groups of similar blocks in regular arrays. However, the technique does not fully exploit regularity since adjacent components may not be placed in aligned locations that enable regular interconnection geometries. The blocks that are to be placed in regular groups must also be manually selected by the designer.

In System-on-Chip design, REGULAY [145] also mentions the importance of preserving regularity and hierarchy. REGULAY discovers the optimal mapping of heterogeneous tiles into a regular grid arrangement, and does not consider the floorplanning of the individual tiles themselves.

On the other hand, the advantages of using hierarchy during floorplanning are not new [49]. Nonetheless, most existing work uses hierarchy only to improve the scalability of the floorplanning problem, allowing efficient generation of floorplans with large numbers of components, and differs in the methodologies used to discover hierarchy.

DeFer [143] uses graph bipartitioning to generate a binary tree of balanced netlist partitions, a method similar to the one proposed in this work for hierarchy discovery. This hierarchy tree is then used to generate a slicing tree, reducing the number of floorplans that need to be explored during the search. Compass [37] automatically clusters blocks with similar or identical shapes, and then creates grid floorplans for them. However, Compass ignores connectivity information. [135] applies a recursive slice-and-partition method derived from cell placement strategies.

All three examples use slicing floorplans and bounding curves, which will also be used in this work to efficiently represent floorplans. Slicing floorplans are not able to represent the entire set of optimal floorplans. However, this difference is minimal given a large number of soft blocks [147], as often occurs in CMPs.

ArchFP [64] describes a different strategy that can produce floorplans that are both regular and hierarchical. However, ArchFP assumes that a designer will construct, prior to the floorplanning process, a manual hierarchy of the CMP components and will choose a floorplanning approach for each group of components. Our work extracts hierarchy and regularity in an automated way, without any previous knowledge of the topology of the input netlist. Table 4.1 summarizes the differences between these strategies.

4.3 Exploring regularity and hierarchy

This section uses an example to illustrate the trade-offs between regularity and hierarchy. Figure 4.1 shows the result of floorplanning the same netlist using HiReg and two other hierarchical floorplanners. Table 4.3 contains whitespace and wire length results.

| | Whitespace | HPWL (m) |
| Compass | 6.3% | 6801 |
| DeFer | 8.2% | 630 |
| HiReg | 12.6% | 516 |

Table 4.3: Floorplanning results for Fig. 4.1.

This netlist represents a hierarchical tiled CMP with 192 cores. It contains 64 tiles, with 48 processing tiles containing 4 cores each, and 16 memory controller tiles. This CMP uses a hierarchical Network-on-Chip topology. An 8 × 8 mesh interconnects all tiles. Inside each tile, a ring provides connectivity. The mapping of processing and memory controller tiles has been selected to match the diamond pattern (Fig. 4.1a, [9]), which maximizes off-chip memory performance. Thus, this configuration is representative of a potential many-core CMP design. For the sake of easy visualization, both types of tiles have similar area requirements in this example. In general, each tile may have a different area constraint.

Figure 4.2: Example floorplans with and without hierarchy constraints: (a) regularity without hierarchy, (b) floorplan generated from (a), (c) regularity preserving hierarchy, (d) floorplan generated from (c).

In addition to cores (), processing tiles contain caches private to each core, a tile-shared L3 cache block, ring routers for intra-tile communication () and mesh routers for inter-tile communication (). The memory controller tiles each contain a buffer (Buf), a mesh router, and a memory controller itself (M). Physical information is described in Table 4.2. Cores come in several hard aspect ratios, but we assume memories to be flexible within a limited range. We also assume every net represents a link with 1024 wires.

| Component | Area | Aspect ratio |
| Core () | 1.38 mm² | or 1.25 |
| cache | 1 mm² | |
| L3 cache | 3 mm² | |
| Ring router () | 0.27 mm² | 1 |
| Mesh router () | 0.99 mm² | 1 |
| Memory controller (M) | 2.5 mm² | or 1.25 |
| Buffer (Buf) | 12 mm² | |

Table 4.2: Physical information for Fig. 4.1.

In Fig. 4.1b, Compass groups blocks by similarity and creates arrays in order to improve packing quality, thus resulting in floorplans that have some regularity. However, connectivity information is not considered, resulting in floorplans with good area metrics but poor wire length.

DeFer minimizes area and wire length, and uses hierarchy to floorplan efficiently. In Fig. 4.1c, we disabled compaction in order to easily visualize the effects of hierarchy, but it was enabled for obtaining the results in Table 4.3. Because of hierarchy, the floorplan is divided into 4 quadrants, with each quadrant also divided into 4 quadrants, and so on. However, small differences in the floorplans used for every quadrant prevent reusing the same design for all subquadrants. The construction of hierarchy from connectivity information results in a 2% area increase compared to Compass, but generates a significantly reduced wire length.

HiReg, on the other hand, constructs a floorplan that exploits the regularity and hierarchy inherent to a tiled CMP design. It is able to extract additional regularity by grouping cores inside tiles in blocks of 2. This tiled structure is discovered despite HiReg not having any previous knowledge of the interconnect topology. The use of hierarchy and regularity causes an additional 4% area increase over the DeFer results, but generates a 20% reduction in wire length. Because of regularity, the entire CMP can now be constructed by replicating the two types of tiles.
At the same time, two cores in every processing tile can be constructed by replication, resulting in significant design time savings.

4.3.1 Discovering regularity

In order to create regular floorplans, HiReg automatically finds repeating patterns in the input netlist. We define a pattern as a subgraph of the netlist. We consider a pattern P to be repeated if there is at least one additional subgraph in the netlist isomorphic to P. We call all the repetitions of P the instances of P.

An example of the way HiReg extracts regularity is shown in Fig. 4.4a. The initial netlist contains 4 instances of the same pattern (composed of, and each). After identifying this pattern, HiReg compresses the graph, replacing every instance with a new vertex that represents the compressed instance. The process iterates until no additional patterns can be found.

Figure 4.3: List of patterns and hierarchy tree: (a) patterns used in Fig. 4.2a, (b) discovered hierarchy tree (full CMP; left and right halves; NE, SE, NW and SW corners).

Figure 4.4: Regular extraction process and example generated DAG: (a) regular extraction process, (b) generated DAG.

Because of this iterative process, the result of regularity discovery is actually a directed acyclic graph (Fig. 4.4b). In this DAG, there is a leaf node (with no exit edges) for each component type in the netlist. Every other node represents a pattern. An edge between two patterns P_a, P_b indicates that pattern P_a contains an instance of P_b. The root pattern (with no entry edges) represents the entire original netlist.

HiReg uses this DAG to apply a divide-and-conquer strategy. Instead of floorplanning the entire netlist, the problem is split into floorplanning every pattern. To ensure regularity, HiReg enforces using the same or similar floorplans for all instances of a pattern, although this restriction may be relaxed if better area or wire length results are required.

4.3.2 Trading off regularity and hierarchy

An important contribution of this chapter is showing the importance of preserving existing hierarchy when discovering regularity.

73 4.4. egula flooplanning algoithm 65 Ou initial appoach completely disegaded hieachy and centeed on egulaity as defined in Section The egulaity extaction pocess is pimaily based on local decisions and lacks a global vision of the entie netlist. By centeing on egulaity only, the esults may contadict existing design hieachy, which can be countepoductive to the goal of educing design time. A visual example is shown in Fig This example shows a MP design identical to the one in Fig. 4.1, containing pocessing () and memoy contolle (M) tiles. In a and b flooplanning is pefomed using discoveed egulaity only, without peseving hieachy. c and d show the esults of flooplanning using both egulaity and hieachy discovey. The set of epeating pattens that have been extacted to constuct a ae shown in Fig. 4.3a. Because the egulaity discovey pocess lacks global vision of the netlist, it discoves a set of pattens that beak the natual tile hieachy of the MP. Despite pattens P 1 and P 2 being fequent pattens, using these to compess the netlist limits futhe extaction of egulaity. The only emaining pattens left afte compessing the netlist with P 1 and P 2 ae combinations that do not espect the mesh topology, such as P 3. P 3 goups a non-ectangula set of tiles. In these situations, good aea and wie length metics cannot be obtained if the same layout must be stictly eplicated fo all instances of P 3. Thus, the egulaity of the design is compomised, as shown in Fig. 4.2b. In Hieg, existing netlist hieachy is automatically discoveed using ecusive gaph bisection (simila to [143], descibed in Section 4.4.1). The ecusive gaph bisection pocedue divides the design into a ecusive seies of aea-balanced patitions. To account fo topologies with a non-powe-of-two numbe of patitions, Hieg allows fo up to 1-to-3 imbalance in the aea of the automated bisections. Altenatively, the hieachy infomation may be povided by the designe. This hieachy infomation is epesented as a tee, such as the one in Fig. 
4.3b, which guides the egulaity discovey pocess. Specifically, no patten instances can coss boundaies delimited by hieachy. In Fig. 4.2c, no patten instance was allowed to extend to moe than one MP quadant, based on the hieachy tee discoveed by bisection (Fig. 4.3b) which sepaates the fou MP quadants. This estiction educes the numbe of P 1 instances, but eventually allows a lage numbe of moe egula pattens to be found. In this way, maintaining hieachy adds a global vision to the egulaity extaction pocess. 4.4 egula flooplanning algoithm The algoithm can be divided in 5 stages, as seen in Fig The fist 3 stages of the algoithm pefom hieachy discovey and egulaity discovey based on

74 66 hapte 4. egulaity-constained flooplanning Netlist Hieachy discovey Flooplan constuction Set of egula flooplans Hieachy tee collapsing Bounding cuve egulaity discovey Set of egula hieachies Bounding cuve constuction Figue 4.5: High-level flow of the algoithm. the input netlist. Duing hieachy tee collapsing, tade-offs between hieachy and egulaity ae exploed. Instead of geneating a single egula hieachy, between stages 3 and 4 we stoe a set of candidate egula hieachies, delaying the selection on which hieachy is most optimal until afte all hieachies have been evaluated. The latte two stages pefom actual flooplanning fo all the candidate hieachies. Stage 4 (bounding cuve constuction) enumeates all possible flooplans fo each of the hieachies, and stoes the outlines efficiently as a single bounding cuve. Afte this stage, the outlines fo each possible flooplan ae known and the designe can select a smalle subset based on physical metics such as aspect atio. Stage 5 (flooplan constuction) constucts the selected flooplans. Algoithm 2 contains a fomal definition of this multiple stage flow, showing all 5 stages. The stages will be explained in detail duing this section Hieachy discovey The fist stage discoves the existing hieachy in the input design. This is pefomed in ode to ensue that an existing high-level topology in the design is peseved. especting existing design hieachies is not only desiable fom a eusability point of view, but also geneates esults with impoved wie length when compaed to esults that ceate layouts which ignoe the existing topology. The method poposed in this section is based on hypegaph patitioning, an extension of the methodology poposed in [143]. We assume that the input netlist, epesented as the hypegaph G, has a natual numbe of patitions. Fo example, a MP with 8 8 tiles would have 64 natual patitions. A 63-way o 65-way patition would esult into moe inteconnections between the diffeent subcicuits than the natual 64-way patition. 
The goal of the algorithm is to discover these natural partitions.

Algorithm 2 General overview of the algorithm
function REGULARFLOORPLANNING(G)
    1. Hierarchy discovery:
    CandidateHierarchies ← ∅
    HierarchyTree ← HIERARCHYDISCOVERY(G)
    for threshold in {0 ... n} do
        2. Hierarchy tree collapsing:
        CollapsedHierarchyTree ← COLLAPSEHIERARCHY(HierarchyTree, threshold)
        3. Regularity discovery:
        RegularHierarchyDAG ← REGULARITYDISCOVERY(CollapsedHierarchyTree)
        append RegularHierarchyDAG to CandidateHierarchies
    ▷ CandidateHierarchies contains candidate regular hierarchy DAGs
    4. Bounding curve construction:
    Γ ← empty bounding curve
    for all RegularHierarchyDAG ∈ CandidateHierarchies do
        Γ ← Γ ∪ CONSTRUCTBOUNDINGCURVE(RegularHierarchyDAG)
    ▷ Γ contains the bounding curve of all possible floorplans for all of the candidate hierarchies
    SelectedPoints ← select desired outlines from Γ
    5. Floorplan construction:
    return CONSTRUCTFLOORPLANS(SelectedPoints)

The input netlist G is partitioned into two smaller circuits, minimizing the total number of interconnections between the two partitions, as long as both partitions have an area imbalance ≤ 2/3. Such an imbalance margin allows handling circuits whose natural number of partitions is not a power of 2, by dividing the circuit into a partition with 2/3 of the area and one with 1/3. Among multiple bipartitions with the same number of interconnections, the most balanced partition is preferred. The process is recursively applied to the two generated partitions, until every partition contains few enough blocks (fewer than MinSize) that further bisection is not required. This process is described in Algorithm 3. During the process a binary tree is created where every leaf node is a subcircuit (with a number of components < MinSize), and every other node is

a bipartition, the root node being a bipartition of the input netlist G. We call this tree the hierarchy tree (Fig. 4.3b).

Algorithm 3 Hierarchy discovery algorithm
function HIERARCHYDISCOVERY(G)    ▷ G is the input netlist
    if |G| < MinSize then    ▷ Trivial case if there are too few elements left
        return G
    G_1, G_2 ← bipartition of G minimizing the number of edges between G_1 and G_2, with AREAIMBALANCE(G_1, G_2) ≤ 2/3
    T_1 ← HIERARCHYDISCOVERY(G_1)
    T_2 ← HIERARCHYDISCOVERY(G_2)
    return CREATETREE(T_1, T_2)

function AREAIMBALANCE(G_1, G_2)
    return MAX(AREA(G_1), AREA(G_2)) / (AREA(G_1) + AREA(G_2))

4.4.2 Hierarchy tree collapsing

The trees generated during hierarchy discovery are used to constrain the regularity discovery procedure and ensure that the existing circuit hierarchy is preserved. However, strictly preserving all hierarchy would prevent the floorplanner from finding regularity, as discussed in Section 4.3. Thus, this stage generates hierarchies that have been relaxed, giving more flexibility to the subsequent regularity discovery procedure. It is often the case that only the high-level hierarchy is significant. For example, it is important to ensure that tile boundaries are honored in a CMP, but the contents of the tiles themselves often have a less well-defined hierarchy, and not preserving that hierarchy has a smaller impact on connectivity. For this reason, it is preferable to relax hierarchy at the leaves of the hierarchy tree. Algorithm 4 shows the details of the algorithm. The input parameter threshold is specified as an absolute area value. Every tree node whose total area is less than this threshold is collapsed: all of its descendants are enumerated and combined into a single subcircuit, and the tree node is replaced by the new leaf subcircuit node. Thus, the resulting collapsed tree will have fewer nodes than the input hierarchy tree, with larger leaf nodes containing more components. See Fig. 4.6 for a visual example.
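The collapsing step described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the tool's actual implementation: the `Node` class and the function names `all_blocks` and `collapse_tree` are hypothetical, a leaf is assumed to store just the areas of its blocks, and a node is kept only when its area is strictly greater than the threshold.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical hierarchy-tree node: a leaf holds block areas,
    an inner node holds the two halves of a bipartition."""
    blocks: Optional[List[float]] = None  # block areas (leaf nodes only)
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    def is_leaf(self) -> bool:
        return self.blocks is not None


def all_blocks(node: Node) -> List[float]:
    """Enumerate the block areas of every descendant of `node`."""
    if node.is_leaf():
        return list(node.blocks)
    return all_blocks(node.left) + all_blocks(node.right)


def collapse_tree(node: Node, threshold: float) -> Node:
    """Return a copy of the tree where every subtree whose total area
    does not exceed `threshold` is replaced by a single leaf."""
    if node.is_leaf():
        return node
    blocks = all_blocks(node)
    if sum(blocks) > threshold:
        # Large enough: keep this bipartition and keep walking the tree.
        return Node(left=collapse_tree(node.left, threshold),
                    right=collapse_tree(node.right, threshold))
    # Too small: merge all descendants into one leaf subcircuit.
    return Node(blocks=blocks)


# Usage: a 9 mm^2 design whose left half (1 + 2 mm^2) is below a 4 mm^2 threshold.
tree = Node(left=Node(left=Node(blocks=[1.0]), right=Node(blocks=[2.0])),
            right=Node(blocks=[6.0]))
collapsed = collapse_tree(tree, threshold=4.0)
```

Calling `collapse_tree` once per candidate threshold would produce the set of relaxed hierarchies among which the later stages choose.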

Figure 4.6: Tree collapsing with a threshold of 4 mm². Labels in leaf nodes indicate block area (in mm²).

Multiple possible hierarchies are generated by this process by automatically testing several values of the threshold parameter. This way, the trade-off between hierarchy and regularity is explored. The selection of the best threshold is thus deferred until floorplans are generated and area, wire length and additional metrics are available.

Algorithm 4 Tree collapsing
function COLLAPSETREE(HierarchyTree, threshold)
    ▷ Any nodes representing parts of the netlist with less area than threshold are collapsed
    if BLOCKAREA(HierarchyTree) > threshold then    ▷ Keep this node intact; continue walking the tree
        T_1 ← COLLAPSETREE(HierarchyTree.leftChild, threshold)
        T_2 ← COLLAPSETREE(HierarchyTree.rightChild, threshold)
        return CREATETREE(T_1, T_2)
    else
        T ← combine all descendants of HierarchyTree into a single node
        return T

4.4.3 Regularity discovery

An essential stage in the floorplanning algorithm is finding repeated patterns of blocks in the netlist. The methodology proposed in this work is based on the ideas of frequent subgraph discovery (FSM), a popular research area within the domain of data mining [90]. As seen in Section 2.1.2, the goal of FSM is to identify repeated subgraphs in a graph. We consider two distinct subgraphs G_1, G_2 of a graph G to be a repetition if they are isomorphic. In such a case, we call both G_1 and G_2 instances of the same

repeating pattern P. In our formulation, each type of block in the netlist (core, router, memory module, etc.) has a different label. Two vertices of a graph are considered to be isomorphic only if they have the same label.

The algorithm is shown in Algorithm 5. REGULARITYDISCOVERY starts from a FlattenedTree as input. At every iteration of the inner loop the most frequent pattern is found, and then the netlist graph is compressed with all the instances of that pattern. This iterative process generates the regularity DAG as seen in Fig. The outer loop ensures that the existing hierarchy indicated by FlattenedTree is preserved. Instead of finding the most frequent pattern of the entire netlist, we initially limit our search to the subcircuits in the nodes of FlattenedTree whose depth is equal to the maximum depth of the entire tree. Only if no repeating patterns are found in those subcircuits does the search proceed by enlarging the search area to include all subcircuits in nodes with a smaller depth, decreasing the minimum allowed depth (CurMinDepth) by one. This way, the search algorithm ensures that patterns that are fully contained inside the partitions marked by hierarchy boundaries are preferred over patterns that are not.

Procedure FINDMOSTFREQUENTPATTERN implements frequent subgraph discovery based on [90]. It is based on a constructive beam search model [120]. At every iteration, we keep a list L of candidate patterns. This list is initialized with all trivial patterns of size 1 (that is, every subgraph with a unique label). At every iteration, every pattern P in L is tested to check which of its instances can be extended by including an adjacent vertex. Each possible extension is stored in L_new. The extended patterns in L_new are sorted by their VALUE and only the best-valued subset of them is kept. The number of surviving patterns is determined by b (the beam width). The algorithm finishes when no further extensions for any pattern in L can be found.
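The beam search loop just described can be illustrated with a generic sketch. It is not the procedure of [90] nor the tool's implementation: `beam_search`, `extend` and `value` are hypothetical names, and the usage example replaces subgraph extension with a one-dimensional toy problem (growing substrings of a label sequence), scored by a made-up function that ranks by frequency first and pattern size second.

```python
from typing import Callable, Hashable, Iterable, List


def beam_search(initial: Iterable[Hashable],
                extend: Callable[[Hashable], Iterable[Hashable]],
                value: Callable[[Hashable], float],
                beam_width: int) -> Hashable:
    """Grow candidate patterns step by step, keeping only the
    `beam_width` best extensions after every iteration."""
    candidates: List[Hashable] = list(initial)
    best = max(candidates, key=value)
    while candidates:
        # Collect every one-step extension, dropping duplicates but
        # preserving insertion order so that ties are deterministic.
        extended = list(dict.fromkeys(
            q for p in candidates for q in extend(p)))
        # Keep only the best `beam_width` extended patterns.
        candidates = sorted(extended, key=value, reverse=True)[:beam_width]
        if candidates and value(candidates[0]) > value(best):
            best = candidates[0]
    return best


# Toy usage: patterns are substrings of a sequence of block labels.
text = "ABCABCABD"


def occurrences(p: str) -> List[int]:
    return [i for i in range(len(text) - len(p) + 1)
            if text[i:i + len(p)] == p]


def extend(p: str) -> List[str]:
    # Extend each instance of p with the label that follows it.
    return [p + text[i + len(p)] for i in occurrences(p)
            if i + len(p) < len(text)]


def value(p: str) -> int:
    # Hypothetical score: frequency dominates, size breaks ties.
    return len(occurrences(p)) * len(text) + len(p)


best = beam_search(dict.fromkeys(text), extend, value, beam_width=2)
```

A small beam width keeps the search cheap at the cost of possibly discarding a pattern whose extensions would later score well, which is the usual trade-off of beam search.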
The VALUE function is used to discriminate between valid patterns when more than one pattern is found for a given graph G. In HiReg the following function is used:

VALUE(P) = (number of instances of P in G) × |G| + |P|

This value ensures all patterns are ordered firstly by their frequency. When comparing two patterns with the same number of repetitions, the pattern with the largest vertex count (|P|) is preferred.

4.4.4 Bounding curve construction

In this stage, a bounding curve is constructed for each one of the regular hierarchy trees discovered by the previous stage. The bounding curve is constructed

Algorithm 5 Frequent subgraph discovery
function FINDMOSTFREQUENTPATTERN(G)    ▷ G: input graph
    L ← {∀ label l ∈ G : a subgraph with a single node v ∈ G with label(v) = l}
    b ← beam width
    P_best ← FRONT(L)
    while ¬EMPTY(L) do
        L_new ← ∅
        for all P ∈ L do
            for all vertex u ∉ P adjacent to v ∈ P do
                L_new ← L_new ∪ EXTEND(P, u)
        SORT(L_new by descending VALUE())
        L ← first b elements of L_new
        if VALUE(FRONT(L)) > VALUE(P_best) then
            P_best ← FRONT(L)
    return P_best

function REGULARITYDISCOVERY(CollapsedTree)
    PatternList ← ∅
    CurMinDepth ← maximum depth of CollapsedTree
    while CurMinDepth ≥ 0 do
        G ← contents of all nodes from CollapsedTree with depth ≥ CurMinDepth
        repeat
            P ← FINDMOSTFREQUENTPATTERN(G)
            append P to PatternList
            G ← COMPRESS(G, P)
        until no repeating patterns in G
        CurMinDepth ← CurMinDepth − 1
    return PatternList

by a post-order walk in the hierarchy tree. This is similar to the techniques used in other hierarchical floorplanners [37, 143]. Every regular hierarchy tree is a directed acyclic graph where every non-leaf node is a subcircuit representing a regular pattern, and an edge between P_1 and P_2 indicates that P_1's definition contains an instance of P_2. The algorithm can construct the bounding curve of a pattern P only once the bounding curves of all the children of P have been constructed.

To construct the bounding curve of a pattern P, two different search strategies are used depending on the number of blocks n. For patterns where n is less than a threshold N, an exhaustive branch-and-bound search algorithm is used. This algorithm explores every possible slicing floorplan. If n ≥ N, a more efficient heuristic search based on simulated annealing and slicing trees is used. This heuristic search generates far fewer results than the branch-and-bound approach, but is capable of handling patterns with a much larger number of components. The threshold N depends on the specifications of the host computer, such as the amount of available memory.

The bounding curves for each hierarchy tree are combined into a single bounding curve that represents the outlines of all floorplans found. Because many hierarchy trees will contain similar sets of patterns, HiReg uses memoization in order to decrease the runtime of this stage.

4.4.5 Floorplan construction

After selecting a subset of points from the final bounding curve (Γ), the final stage of the algorithm constructs floorplans starting from the outlines specified by these points. A single point in a bounding curve can represent multiple floorplans with the same outline, including floorplans that only differ in mirroring and simple block swapping (but also floorplans with entirely different layouts that happen to share the same outline).
Because the outline is fixed, the selection of these floorplans cannot affect whitespace, but it may have a significant impact on the wire length and regularity metrics. Algorithm 6 implements a greedy search that finds a combination of floorplans for a selected point P of the bounding curve that minimizes a given cost function comparing wire length and regularity. By manipulating the cost function, the designer is able to guide the search to either enforce more regular floorplans or, on the other hand, prefer floorplans with increased connectivity quality. An important technique used during this process is terminal propagation [59]. When selecting a floorplan for a pattern T, we know the layout and positions of all instances of other subpatterns t_1, t_2, ..., t_n contained in T. Thus, for all

nets that have a terminal in a child subpattern t_i, but also have other terminals in any other subpatterns of T, the algorithm can propagate the approximate positions of those terminals outside t_i. This allows calculation of the wire length when selecting floorplans for t_i, even for nets external to t_i.

Algorithm 6 Floorplan construction
function CONSTRUCTFLOORPLANS(P, T)
    ▷ P is the selected point of Γ
    ▷ T is the hierarchy tree that was used
    F ← floorplans from Γ with size P
    for all t in CHILDREN(T) do
        propagate terminal positions from F to t
        p ← shape of t in F
        F_t ← CONSTRUCTFLOORPLANS(p, t)
    select combination of F_1, ..., F_n that minimizes COST(F, F_1, ..., F_n)
    expand F with F_1, ..., F_n
    return F

function COST(F, F_1, F_2, ..., F_n)
    return (WIRELENGTH(F_1) + WIRELENGTH(F_2) + ...) / (number of floorplans in F_1, F_2, ... with the same layout)

To increase floorplan quality or in order to generate more than one result, HiReg combines this algorithm with the technique of beam search [120]. Instead of keeping a single current solution, the b best solutions are kept.

4.5 Results

We implemented HiReg in C++ and tested it on a set of design examples. All the experiments in this section were run on an Intel Xeon 2.8 GHz CPU with 32 GB of RAM. While the implementation can make use of multiple cores, it was limited to a single thread for fair comparisons. METIS [87] was used to generate the graph bisections required for hierarchy discovery.

Since there is little previous work on regularity-constrained floorplanning for multi-processors, there are not many available benchmarks designed to compare the quality of regular floorplans. Commonly used public-domain floorplanning benchmarks are mostly from old designs that do not contain much regularity. Thus, in this section, we will use artificial benchmarks based on many-core CMP designs.

It is hard to give a numerical metric for regularity in a floorplan. For HiReg-generated floorplans, we can provide an estimation of regularity based on the

regularity DAG used to generate them. Every node in the DAG with multiple input edges represents a sublayout that has been replicated. Thus, a regularity metric can be built by comparing the area of all nodes in this DAG with an expanded version where none of the layouts are replicated:

regularity = 1 − (area of DAG) / (area of equivalent expanded tree)

Because other tools do not target the creation of regular floorplans, we cannot provide similar regularity metrics for non-HiReg floorplans.

Figure 4.7: Netlist used for the scalability and quality experiments.

4.5.1 Heterogeneous tiled CMP example

We start this section by revisiting the example presented in Section 4.3, the heterogeneous tiled CMP. We also show the differences in wire length between a floorplan that preserves hierarchy and one that does not. The netlist used represents a CMP containing 64 tiles, with 48 tiles being processing tiles, containing 4 cores each, and 16 memory controller tiles. The tiles are arranged according to the diamond pattern from [9]. The benchmark has a total of 816 blocks. The physical information for these blocks is shown in Table 4.4.

For this example CMP configuration, HiReg generates a floorplan (Fig. 4.2d) with 12.6% whitespace and a HPWL of 516 m. Up to 81% of the floorplan area is regular. This floorplan was obtained by combining hierarchy and regularity. HiReg automatically prefers floorplans where hierarchy is preserved to around the tile level, as those provide the best regularity with minimal loss in other metrics. For comparison, the floorplan in Fig. 4.2b, which was created without hierarchy constraints, has a slightly worse whitespace (13.5%), worse regularity (66.6%

Figure 4.8: Comparison of runtime, area and wire length. Three plots for HiReg and DeFer against the number of components: floorplanning runtime [secs.], total whitespace [%] and HPWL [m].

of area) and a much worse HPWL (1076 m). These hierarchy-less floorplans were explored but discarded by HiReg because of their inferior metrics.

When compared to other tools (Table 4.3), HiReg provides results that are competitive in wire length but slightly worse in area. We configured DeFer to optimize for area and wire length given a maximum aspect ratio constraint of 4/5. Compass was configured to optimize for area within the same aspect ratio constraint.

HiReg used 15.4 seconds to generate that example floorplan. 13% of the time was spent during hierarchy and regularity discovery, 27% of the time was spent floorplanning all the discovered patterns, and 60% generating the final floorplans. Both DeFer and Compass took less than one second to generate the floorplans.

4.5.2 Scalability and floorplan quality

This experiment measures the loss of optimality in area and wire length caused by the use of regularity, as well as the execution time required by the implemented algorithm. We compare HiReg, configured to optimize for maximum regularity, to DeFer, configured to optimize for both area and HPWL.

All the test cases were generated based on the configuration shown in Fig. 4.7. This configuration is a ring of rings, where a global ring connects a set of clusters of two alternating types. The physical characteristics are detailed in Table 4.4. For this example, every net contains 1024 wires, and the wire pitch is 0.1 µm. This configuration is used in this test case because the total number of clusters in the ring can be easily parameterized, providing multiple test points with different numbers of components.

Table 4.4: Physical information for Fig. 4.7.

Component                    Area        Aspect ratio
Global ring routers (R)      0.27 mm²    1
Clusters of type A
  4 Core (C1)                1.38 mm²
  o cache                    1 mm²
  L3 cache                   3 mm²
  Local ring router (R)      0.27 mm²    1
Clusters of type B
  2 Core (C2)                3.75 mm²
  o cache                    2 mm²
  L3 cache                   6 mm²
  Local ring router (R)      0.27 mm²    1

Figure 4.9: Regular floorplans for two different ring configurations, (a) and (b).

Figure 4.8 shows the results of the comparison. For a highly regular netlist such as the one used in this experiment, the runtime growth of both hierarchical floorplanners (the proposed one and DeFer) is close to linear. However, a purely hierarchical floorplanner such as DeFer is still much faster. On the other hand, the results show that both the whitespace and HPWL of floorplans generated by HiReg are comparable to the results provided by DeFer.

4.5.3 Trading off hierarchy and regularity

Figure 4.9 shows two different potential arrangements of the ring configuration described in Fig. 4.7. These two configurations differ in the organization of the global ring. In configuration a, clusters A and B appear in alternating order in the global ring. Configuration b contains the same clusters but arranged so that clusters of the same type are grouped together. The net for the global ring is shown in both figures as a dashed line.

For Fig. 4.9b, HiReg generates a regular floorplan where the different cluster types are strictly separate. The floorplan provides good whitespace metrics because similar clusters are packed into arrays. However, in case a, such packing would not be possible without impacting wire length. Therefore, the best floorplan found by HiReg relaxes hierarchy a bit, and combines the two cluster types into a single repeating pattern, providing better packing.

HiReg can also be configured to further relax regularity when improved wire length or other metrics are preferable. In Figure 4.10, we show three different floorplans for the configuration in Fig. 4.9b. Figures 4.10b and c are generated by varying the cost function described in Section 4.4.5, while a

Figure 4.10: Trading off regularity and wire length in the configuration described in Fig. 4.9b. Panels: (a) DeFer, (b) and (c) HiReg. The plot shows HPWL [m] against regularity for HiReg and DeFer.

is generated using DeFer. The plot shows how, by trading off regularity, HPWL can be improved, approaching the HPWL of the floorplan generated by DeFer for the same configuration. As DeFer does not generate regular floorplans, the plot uses only its HPWL as a baseline for comparisons.

4.5.4 Publications

As part of this research topic we have published the following conference article:

J. de San Pedro, J. Cortadella, and A. Roca, "A hierarchical approach for generating regular floorplans," in Proceedings of the 33rd IEEE/ACM International Conference on Computer-Aided Design, San Jose, California, USA.

4.6 Conclusions

This chapter has introduced HiReg, a new floorplanning tool that generates regular floorplans while preserving the inherent regularity of the design. The method is specially suited for CMPs with many cores and can handle systems with heterogeneous tiles. The method delivers layouts with high regularity and acceptable area, and also reduces wire length when compared to other hierarchical approaches.


Chapter 5

Log-based simplification of process models

The visualization of models is essential for user-friendly human-machine interactions during process mining. A simple graphical representation helps to convey intuitive information about the behavior of a system. Quality-preserving model simplifications can be of paramount importance to alleviate the complexity of finding useful and attractive visualizations.

This chapter presents a collection of log-based techniques to simplify process models. The techniques trade off visual-friendly properties against quality metrics related to logs, such as fitness and precision, to avoid degrading the resulting model. The algorithms, either cast as optimization problems or heuristically guided, find simplified versions of the initial process model, and can be applied in the final stage of the process mining life-cycle, between the discovery of a process model and the deployment to the final user. A tool called PNsimpl has been developed and tested on large logs, producing simplified process models that are one order of magnitude smaller while keeping fitness and precision within reasonable margins.

This chapter is organized as follows. Section 5.1 illustrates the goals of this work with a simple example, and Section 5.2 compares these goals with the state of the art. In Section 5.3, a log-based technique to estimate the importance of arcs and places in a Petri net is described. Section 5.4 proposes several algorithms to simplify a Petri net using this information. The techniques are evaluated in Section 5.5. Finally, conclusions are discussed in Section 5.6.

5.1 Motivation

The understandability of a process model can be seriously hampered by a poor visualization. Many factors may contribute to this, complexity being a crucial one: models that are unnecessarily complex (incorporating redundant components, or components with limited importance) are often not useful for understanding the underlying process. On the other hand, process models are expected to satisfy all quality metrics when representing an event log: fitness, precision, simplicity and generalization [2]. In this chapter we present techniques to simplify a process model while retaining the aforementioned quality metrics within reasonable margins.

Given a spaghetti-like process model, one may simply remove arcs and nodes until a nice graphical object is obtained. However, this naive technique has two main drawbacks. First, the capability of the simplified model to replay the process executions may be considerably degraded, thus deriving a highly unfitting model. Second, the model components (arcs and places in a Petri net) are not equally important when replaying process executions, and therefore one may be interested in keeping those components that provide more insight into the real boundaries of what is allowed by the process (i.e., its precision).

Given a Petri net and an event log, PNsimpl first ranks the importance of places and arcs using a simple simulation of the log by the Petri net, and then simplifies the model by retaining those arcs and places that are important in restricting the behavior allowed by the model. Thereby, PNsimpl is more accurate at maintaining the precision of the model. Several alternatives are presented, which extract certain Petri net subclasses (State Machines, Free-Choice) or structural subclasses (Series-Parallel graphs).

Some of the proposed alternatives can, as a user decision, also reduce fitness in order to further increase the simplicity of the model.
This option can be used when dealing with highly complex processes, where a slightly unfit model may be preferable to a non-understandable model. Additionally, this feature may be used to remove the complexity caused by noise [7] in the original process logs. In Chapter 6, we propose an alternative method to handle highly unstructured processes by splitting them into several easier-to-understand slices.

5.1.1 Example

We will illustrate one of the techniques presented in this chapter with the help of an example. We have used the general-purpose tool dot [66] to render these examples. Figure 5.1a shows a process model that has been discovered from a real-life log by the ILP miner, a well-known method for process discovery [138]. This miner guarantees perfect fitness (i.e., the model is able to reproduce all

the traces in the log), but its precision value is low (31.5%), which indicates that the model may generate many traces not observed in the log. Clearly, this model does not give any insight into the executions of the underlying process. Hence, although it is a model with perfect fitness, some of the other quality metrics (precision, simplicity) are not satisfactory.

By applying the simplification techniques of PNsimpl, a process model can be transformed with the objective of improving its understandability. The process models at the bottom of Fig. 5.1 are the result of applying two of the proposed techniques. In Figure 5.1b the model is simplified while preserving as much as possible the quality metrics of the original model. The model has 6 times fewer places and arcs, making it much easier to understand. The resulting fitness is still perfect, but the precision has been reduced to 22.5%. In Figure 5.1c we reduce the model to a series-parallel graph, further improving its simplicity and understandability. Fitness has been reduced to 64.1%, but on the other hand its precision has improved considerably (now 48.7%).

5.2 Related work

Currently, most of the existing academic tools for visualization of process models are based on the dot algorithm [66, 67], which is a four-stage approach for laying out directed graphs. dot tries to minimize both edge crossings and edge length. Using dot on very dense graphs results in spaghetti-like visualizations. All the models displayed in this thesis have been laid out with dot. When the underlying graph has a certain structure (as general business process models have), then ad-hoc algorithms that take advantage of this structure can be considered, like the one presented in [70]. In summary, the aforementioned work does not consider log-based simplifications like the ones presented in this work, and therefore it can be used in combination with the techniques of this chapter to optimize the visualization.
The closest work to the methods of this chapter is [63], where a technique was presented for the simplification of process models that controls the degree of precision and generalization. It applies several stages. First, a log-based unfolding of the model is computed, deriving a precise unfolded model. Second, this unfolding is filtered, retaining only the frequent parts. Finally, a folding technique is applied which controls the generalization of the final model. Further simplifications can be applied, which help to alleviate the complexity of the derived model. There are significant differences between the two approaches: while in our case the techniques rely on light methods and can be oriented towards different objectives, the approach in [63] requires the computation of unfoldings, which can be exponential in the size of the initial

Figure 5.1: Log-based simplification of a spaghetti-like process model. (a) Initial process model. (b) Simplified fitting process model. (c) Simplified series-parallel process model.

model [107]. Also, the filtering of the unfolding is based on a simple frequency selection of the unfolding elements, while in this work the importance of model elements is assessed with the frequency but also with triggering information, which is related to the precision dimension. On the other hand, the techniques of this chapter may need to verify model connectedness at each iteration. In conclusion, both techniques can be combined to further improve the overall simplification of a model.

The simplification of a process model should be done with respect to quality metrics, and in this chapter we have focused on fitness and precision. An alternative to this approach would be to include these quality metrics in the discovery, a feature that has only been considered in the past by the family of genetic algorithms for process discovery [4, 28, 134]. All these techniques include costly evaluations of the metrics in the search for an optimal process model, in order to discard intermediate solutions that are not promising. This makes these approaches extremely inefficient in terms of computing time.

Furthermore, there exist discovery techniques that focus on the most frequent paths [71, 137]. These approaches are meant to be resilient to noise, but on the other hand give no guarantees on the quality of the derived model. Additionally, these methods are oriented towards modeling formalisms with less expressiveness, such as heuristic nets, or formalisms with less strict semantics, such as fuzzy models. A recent technique that is guided towards the discovery of block-structured models (process trees) and that addresses these issues may be a promising direction [101]. However, this technique is guided towards a particular class of Petri nets (workflow and sound), describing a very restricted type of behaviors. An important drawback of this technique is that silent activities need to be introduced in the resulting process model to represent the model with this restricted structure.
For instance, for the log used to generate the example of Figure 5.3, the inductive miner derives a model with twice the size of the models generated by our techniques. Finally, the techniques of this chapter can be combined with abstraction mechanisms to further improve the visualization of the underlying process model.

5.3 Metrics for relevant arcs

Given a Petri net and an event log, in this section we introduce a technique to obtain a scoring of the arcs (and, indirectly, places) of the net with respect to their importance in describing the behavior underlying the log. The idea of the proposed technique is simple: when a Petri net replays a particular trace in the log, some arcs may have more importance than others for that particular trace. Hence, triggering and utilization scores will be defined

to provide an estimation of the importance of the arcs in replaying the log. Arcs (p, t) with a high trigger score correspond to frequent situations in the model where more behavior should not be allowed (i.e., the arc, and therefore p, frequently prevents certain transitions from occurring). By keeping these arcs and places in the model, one aims at deriving a model where precision is not degraded. Conversely, an arc (t, p) with a high utilization score denotes a situation where transition t is frequently fired (thus frequently adding tokens to p), and therefore should not be removed, to avoid degrading fitness.

Definition 5.1 (Trigger Arc). Let N be a Petri net with places P, transitions T and initial marking m_0, σ a fitting trace for N, and e ∈ σ an event that is represented by the firing m [t'⟩ m' in N. For any pair p ∈ P, t ∈ T, an arc (p, t) is trigger in m [t'⟩ m' iff t is not enabled in m but enabled in m', and m(p) is lower than the weight of (p, t) while m'(p) is not.

Intuitively, an arc (p, t) is trigger at every transition t' ∈ σ in which t becomes enabled and p is in the set of places which, in that transition t', received the last tokens required for enabling t. Thus, a frequently-trigger arc indicates that p is important in restricting the behavior allowed by the model, and that p or (p, t) cannot be removed without sacrificing precision. Note that for a single transition t there may be more than one trigger arc, even in the same transition t' ∈ σ. To use this information, we define a trigger score which characterizes the frequency of an arc in playing the trigger role:

Definition 5.2 (Trigger Score of an Arc). Let N be a Petri net as above, and L a log fitting N. The trigger score of an arc (p, t) is the number of transitions from L in which (p, t) is trigger.

For arcs (t, p), we use a simple frequency metric:

Definition 5.3 (Utilization Score of an Arc). Let N be a Petri net as above, and L a log fitting N. The utilization score of an arc (t, p) is the number of times transition t is fired in L.
Given a log and a Petri net, the trigger scores can be obtained by replaying all traces in the log, as shown in Algorithm 7. For every transition in the log, the scores are updated by comparing the markings of the predecessor places of all newly enabled transitions. Figure 5.2 shows the results of computing, on an example trace and model, both trigger and utilization scores. Utilization scores are shown in italics. Finally, notice that the definitions of this section assume fitting traces. Given an unfitting trace (i.e., a trace that cannot be replayed by the model), an alignment between the trace and the model will provide a feasible sequence that is closest to the trace [10]. This widens the applicability of the scoring techniques of this section to any pair (log, model).

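5.3. Metrics for relevant arcs

Figure 5.2: Trigger and utilization score computation for an example trace and model. (a) Example trace. (b) Petri net with trigger/utilization scores.

Algorithm 7 TRIGGERSCORES
Input: An event log $L$ and a Petri net $N = \langle P, \Sigma, T, \mathcal{L}, \mathcal{F}, m_0 \rangle$
Output: A score $\operatorname{trig}(p, t)$ for every arc $\mathcal{F}(p, t) > 0$
  for $\sigma \in L$ do
    Let $m_0 [t_1 \rangle m_1 [t_2 \rangle \ldots [t_n \rangle m_n = \sigma$
    for $i \leftarrow 1 \ldots n$ do
      for $t \in T$ do
        if $t$ is enabled in $m_i$ $\land$ $t$ was not enabled in $m_{i-1}$ then
          for $p \in {}^\bullet t$ do    ($t$ is enabled in $m_i \implies m_i(p) \geq \mathcal{F}(p, t)$)
            if $m_{i-1}(p) < \mathcal{F}(p, t)$ then
              $\operatorname{trig}(p, t) \leftarrow \operatorname{trig}(p, t) + 1$
  return $\operatorname{trig}$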
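The replay of Algorithm 7, together with the utilization count of Definition 5.3, can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: the dictionaries `pre` and `post` (arc weights per transition), the helper names, and the small example net are hypothetical, and every log event is assumed to name a transition directly (no silent or duplicate labels).

```python
from collections import Counter


def enabled(marking, pre, t):
    """t is enabled when each input place p holds at least pre[t][p] tokens."""
    return all(marking[p] >= w for p, w in pre[t].items())


def fire(marking, pre, post, t):
    """Return the marking reached by firing the enabled transition t."""
    nxt = Counter(marking)
    for p, w in pre[t].items():
        nxt[p] -= w
    for p, w in post[t].items():
        nxt[p] += w
    return nxt


def scores(log, pre, post, m0):
    """Replay fitting traces; count trigger (p, t) and utilization (t, p) scores."""
    trigger, utilization = Counter(), Counter()
    for trace in log:
        m = Counter(m0)
        for fired in trace:
            if not enabled(m, pre, fired):
                raise ValueError(f"trace does not fit the net at {fired!r}")
            m_next = fire(m, pre, post, fired)
            for p in post[fired]:            # Definition 5.3: one use per firing
                utilization[(fired, p)] += 1
            for t in pre:                    # Definition 5.1: newly enabled t
                if enabled(m_next, pre, t) and not enabled(m, pre, t):
                    for p, w in pre[t].items():
                        if m[p] < w:         # p received the last missing tokens
                            trigger[(p, t)] += 1
            m = m_next
    return trigger, utilization


# Hypothetical net: a forks into b and c, which are joined by d.
pre = {"a": {"p0": 1}, "b": {"p1": 1}, "c": {"p2": 1}, "d": {"p3": 1, "p4": 1}}
post = {"a": {"p1": 1, "p2": 1}, "b": {"p3": 1}, "c": {"p4": 1}, "d": {"p5": 1}}
log = [["a", "b", "c", "d"], ["a", "c", "b", "d"]]
trigger, utilization = scores(log, pre, post, {"p0": 1})
```

In this example the join transition d is triggered once through p4 (when c fires last) and once through p3 (when b fires last), while the arc (p0, a) never triggers because a is enabled from the start.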

5.4 Simplification methods

The trigger and utilization scores computed in the previous section provide an estimation of the importance of the arcs in replaying the log. Arcs $\mathcal{F}(p, t) > 0$ with a high trigger score correspond to frequent situations in the model where more behavior should not be allowed (i.e., the arc, and therefore $p$, frequently prevents certain transitions from occurring). By keeping these arcs/places in the model, one aims at deriving a model where precision is not degraded. Conversely, an arc $\mathcal{F}(t, p) > 0$ with a high utilization score denotes a situation where transition $t$ is frequently fired (thus frequently adding tokens to $p$), and therefore should not be removed to avoid degrading fitness.

This section presents three different approaches to the simplification problem. Figure 5.3 illustrates these approaches by applying each one to the input model shown in Fig. 5.3a. The first approach reduces the input to a Petri net that is visually close to a series-parallel graph, removing the least important arcs and places according to their scores (Fig. 5.3b). However, it has the greatest computational cost. We introduce a second approach that reduces the simplification problem to an Integer Linear Programming (ILP) optimization problem, which is more efficient and optionally guarantees the preservation of fitness (Fig. 5.3c and d). These two techniques use scoring information computed from the log, as described in the previous section. The third approach, however, does not consider this information. Instead, the Petri net is projected into different structural classes: free choice (Fig. 5.3e) and state machine (Fig. 5.3f). The following subsections describe each one of these approaches in detail.

Simplification to a Series-Parallel Net

A series-parallel net is one obtained by the recursive series or parallel composition of smaller nets. Series-parallel Petri nets are amongst the most comprehensible types of models. In a series-parallel net, forks and choices (and thus concurrency) are immediately visible.
In fact, existing documentation often uses series-parallel nets as examples to illustrate concepts related to Petri nets. For this reason, one of the main contributions of this work is a heuristic that reduces a complex Petri net into an almost series-parallel net. The algorithm iteratively removes the least important edges until the graph is either strictly series-parallel, or no additional reduction can be applied without losing the connectedness of the net. The importance of every arc is determined by its trigger score $\operatorname{trig}(p, t)$, for place-transition arcs, and its utilization score $\operatorname{util}(t, p)$, for transition-place arcs. The approach is grounded in the notion of a set of reduction rules, explained below.

Figure 5.3: Overview of the different simplification techniques. (a) Original model using the algorithm in [138]. (b) Simplified to series-parallel. (c) Simplified using ILP model. (d) Simplified using ILP model, preserving fitness. (e) Simplified to free choice. (f) Simplified to state machine.

Figure 5.4: Applying a transformation and transformation cost. (a) Reduction rule. (b) Source Petri net (with rule violations). (c) Transformed net.

Reduction rules

In [112] a set of reduction rules used for the analysis of large Petri net systems is introduced. Each of the transformations preserves liveness, safeness and boundedness of a Petri net. Thus, verification of these properties can be done in the simplified net instead of the original one. The transformations proposed are: fusion of series places/transitions, fusion of parallel places/transitions and elimination of self-loop places/transitions. A rule can be applied only when its preconditions are satisfied. An example of the fusion of parallel places rule can be seen in Fig. 5.4a.

By construction, a series-parallel Petri net can be reduced to a single place or transition by recursive application of these transformations. Therefore, every violation of the preconditions of a rule indicates a subnet which is not series-parallel.

To reduce a Petri net to a series-parallel skeleton, this work uses these reduction rules in an indirect way. We do not use the transformed Petri net that results from the application of the rules. Instead, the proposed method removes those arcs and places which prevent the rules from being applied.

For every one of the reduction rules, a transformation cost is defined: the sum of the trigger and utilization scores of all the arcs that would need to be removed in order to apply such a transformation. The transformation cost therefore models the importance of the arcs that would need to be removed. Figure 5.4 shows an example rule, the computation of its transformation cost, and the resulting graph after applying the transformation rule. This rule can only be applied in this input Petri net if two arcs (dashed lines in Fig. 5.4b) are removed. Thus, the transformation cost is equal to the sum of the trigger score of arc (1) and the utilization score of arc (2).
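As an illustration of the transformation cost, the following Python sketch computes it for one candidate application of the fusion of parallel places rule. It rests on assumptions that are not taken from the text: two places are treated as fusable when they share exactly the same producing and consuming transitions, so any arc attached to only one of them is counted as violating the precondition. The function name, the dictionaries and the scores in the example are hypothetical, and place and transition names are assumed to be disjoint.

```python
def parallel_fusion_cost(p, q, producers, consumers, trigger, utilization):
    """Cost of removing the arcs that prevent fusing places p and q.

    producers[x] / consumers[x]: sets of transitions feeding / emptying place x.
    trigger is keyed by (place, transition), utilization by (transition, place).
    """
    violating = []
    for t in producers[p] ^ producers[q]:    # transition-place arcs on one side only
        place = p if t in producers[p] else q
        violating.append((t, place))
    for t in consumers[p] ^ consumers[q]:    # place-transition arcs on one side only
        place = p if t in consumers[p] else q
        violating.append((place, t))
    # Each arc appears in exactly one of the two score tables.
    cost = sum(utilization.get(arc, 0) + trigger.get(arc, 0) for arc in violating)
    return cost, sorted(violating)


# Hypothetical situation similar to Fig. 5.4b: two extra arcs block the fusion.
producers = {"p": {"t1"}, "q": {"t1", "t3"}}
consumers = {"p": {"t2", "t4"}, "q": {"t2"}}
trigger = {("p", "t2"): 9, ("q", "t2"): 8, ("p", "t4"): 3}
utilization = {("t1", "p"): 10, ("t1", "q"): 10, ("t3", "q"): 4}

cost, arcs = parallel_fusion_cost("p", "q", producers, consumers, trigger, utilization)
```

Here the cost is the trigger score of the arc (p, t4) plus the utilization score of the arc (t3, q); the arcs shared by both places do not contribute.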

Algorithm

Algorithm 8 describes the main iteration of the method. Function APPLICABLETRANSFORMATIONS identifies all possible applications of the reduction rules, and computes the transformation cost for each of the possible applications.

Algorithm 8 Series-Parallel algorithm
Input: A Petri net $N_0 = \langle P, \Sigma, T, \mathcal{L}, \mathcal{F}, m_0 \rangle$, a trigger score $\operatorname{trig}(p, t)$ for every $(p, t)$ arc, and a utilization score $\operatorname{util}(t, p)$ for every $(t, p)$ arc.
Output: A simplified Petri net
  $N \leftarrow N_0$
  $M \leftarrow$ APPLICABLETRANSFORMATIONS($N$)
  while $|M| > 0$ do
    $m \leftarrow$ transformation with least cost from $M$
    $N' \leftarrow$ APPLYTRANSFORMATION($N$, $m$)
    if $N'$ is disconnected then
      $M \leftarrow M \setminus \{m\}$
      continue
    $N_0 \leftarrow N_0 \setminus$ PRECONDITIONVIOLATINGARCS($N$, $m$)
    $N \leftarrow N'$
    $M \leftarrow$ APPLICABLETRANSFORMATIONS($N$)
  return $N_0$

At every iteration the transformation $m$ with the least cost is selected, that is, the one that requires removing the least amount of important arcs in order to be applied. Function APPLYTRANSFORMATION applies such transformation $m$. If applying the transformation breaks the net into more than one connected component, the next best transformation is selected instead. Otherwise, function PRECONDITIONVIOLATINGARCS enumerates all the arcs that had to be removed in order to satisfy the preconditions of transformation $m$. Those arcs are removed from the original Petri net $N_0$. The next iteration repeats the process on the transformed net $N'$, finding new APPLICABLETRANSFORMATIONS only around the nodes that were changed in the previous iteration.

Once no additional reduction rules can be applied (e.g. because the net is now a single place or transition), the algorithm stops. The current graph $N$ is discarded, and the result of the algorithm is the simplified Petri net $N_0$. A final postprocessing step removes unneeded places (e.g. those without incident arcs).

The nets generated by this heuristic are not necessarily fully series-parallel, since arcs necessary to preserve connectivity are never removed. This is the only method from this work that presents such a global guarantee, with the

other methods providing weaker connectivity constraints. It is also possible to configure the method to generate strictly series-parallel models.

Simplification Using ILP Models

This section shows a different approach to simplify a Petri net for visualization. The selection of which arcs to remove is seen as an optimization problem, and modeled as an Integer Linear Program (ILP). The use of ILP allows for highly efficient solving strategies. On the other hand, some constraints cannot be modeled using ILP. For example, the models attempt to preserve connectivity of the net at a localized level (i.e. ensuring transitions maintain at least one predecessor and successor place), but cannot guarantee global net connectivity.

The aim of the ILP model is to reduce the number of arcs as much as possible. The inputs are a Petri net $N = \langle P, \Sigma, T, \mathcal{L}, \mathcal{F}, m_0 \rangle$, trigger scores $\operatorname{trig}(p, t)$ and utilization scores $\operatorname{util}(t, p)$. We define a binary variable $S(p)$ for every $p \in P$, and a binary variable $A(p, t)$ or $A(t, p)$ for every arc in $N$. In a solution of this model, variable $S(p)$ is 0 when place $p$ is to be removed from the input graph (similarly for arc variables $A(p, t)$ and $A(t, p)$). Below we describe the ILP model in detail.

\begin{align}
\min \quad & \sum_{\mathcal{F}(p,t)>0} A(p, t) + \sum_{\mathcal{F}(t,p)>0} A(t, p) \tag{5.1} \\
\text{s.t.} \quad & \forall p \in P : S(p) \iff \sum_{t \in p^\bullet} A(p, t) > 0 \;\lor\; \sum_{t \in {}^\bullet p} A(t, p) > 0 \tag{5.2} \\
& \sum_{\mathcal{F}(p,t)>0} \operatorname{trig}(p, t)\, A(p, t) \geq \Gamma \tag{5.3} \\
& \sum_{\mathcal{F}(t,p)>0} \operatorname{util}(t, p)\, A(t, p) \geq \Phi \tag{5.4} \\
& \forall t \in T : \sum_{p \in t^\bullet} A(t, p) > 0 \;\land\; \sum_{p \in {}^\bullet t} A(p, t) > 0 \tag{5.5} \\
& \forall p \in P : m_0(p) > 0 \implies S(p) \tag{5.6} \\
& \forall t \in T, p \in P : \mathcal{F}(t, p) > 0 \implies S(p) = A(t, p) \tag{5.7}
\end{align}

The objective function, Eq. 5.1, minimizes the number of preserved arcs. Constraint 5.2 encodes the relationship between the $A$ and $S$ variables. A place is retained in the output net iff at least one predecessor/successor arc is retained.

The model ensures that the most important arcs, according to the trigger scores, are preserved. For this, constraint 5.3 imposes a minimum on the total trigger score of the preserved arcs. $\Gamma$ can be configured as a percentage of the combined trigger

score from all place-transition arcs. A similar threshold constant $\Phi$ is imposed using the utilization score for transition-place arcs (Eq. 5.4).

A fully connected graph cannot be guaranteed by the ILP model. Instead, Eq. 5.5 models a weaker constraint: every transition will preserve at least one predecessor and successor arc. In addition, every place marked in $m_0$ is always preserved, to avoid deriving a structurally deadlocked model (Eq. 5.6).

Preserving fitness (optional)

The ILP model as described so far does not guarantee preservation of fitness from the original Petri net. A simple modification can ensure that the existing fitness is preserved, at the cost of being able to remove only a reduced number of arcs from the model. Following a well-known result in Petri net theory, removing only $(p, t)$ arcs never reduces the fitness of a model for any given log. Constraint 5.7 implements this restriction by keeping every $(t, p)$ arc of a preserved place.

Projection into structural classes

In this section we present ILP models to reduce Petri nets to two types of structural classes: free choice and state machines [112]. These methods do not require a log as they do not use trigger or utilization scores. Therefore, these proposals can be used to simplify Petri nets for visualization even when logs are not available, although their results may be of lower quality since scoring information is not used. Note that [138] can also be configured to generate state machines or marked graphs, but this approach requires having a log. In addition, the models extracted may still be complex because of the requirement to preserve fitness.

Free Choice

In this method, we simplify Petri nets by converting them into free choice nets. This method preserves the fitness of the model, but reduces precision. While this reduction does not necessarily result in models simple enough for visualization, complexity is reduced while maintaining most structural properties. Thus, reducing a dense net into free choice opens the door both to efficient analysis and to further decomposition techniques (state machine or marked graph covers) [57].
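To make the target class concrete, the following Python sketch lists the places of a net that hold a non-free choice. It assumes the characterization that Eq. 5.9 in this section works with: a place $p$ with more than one successor transition is free only if $p$ is the sole predecessor place of all of those transitions. The function name, the `pre` dictionary and the example net are hypothetical.

```python
def non_free_choices(pre):
    """Return the places that hold a non-free choice.

    pre[t] is the set of input places of transition t.
    """
    successors = {}                      # place -> set of successor transitions
    for t, places in pre.items():
        for p in places:
            successors.setdefault(p, set()).add(t)

    offending = set()
    for p, transitions in successors.items():
        if len(transitions) > 1:         # p is a choice
            inputs = set().union(*(pre[t] for t in transitions))
            if len(inputs) > 1:          # another place also constrains the choice
                offending.add(p)
    return offending


# Hypothetical net: p1 chooses between t1 and t2, but t2 also waits for p2.
net = {"t1": {"p1"}, "t2": {"p1", "p2"}, "t3": {"p3"}}
before = non_free_choices(net)

# Removing the arc (p2, t2) turns the choice at p1 into a free one.
net_without_arc = {"t1": {"p1"}, "t2": {"p1"}, "t3": {"p3"}}
after = non_free_choices(net_without_arc)
```

Only place-transition arcs are inspected or removed in this check, which matches the observation that a projection to free choice can leave all transition-place arcs untouched.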
We encode this definition as a set of constraints and create an ILP problem which maximizes the number of arcs. For every $p \in P$, $t \in T$, we define a binary variable $A(p, t)$ which indicates whether arc $(p, t)$ is preserved.

\begin{align}
\max \quad & \sum_{\mathcal{F}(p,t)>0} A(p, t) \tag{5.8} \\
\text{s.t.} \quad & \forall p \in P : |p^\bullet| > 1 \land |{}^\bullet(p^\bullet)| > 1 \implies \sum_{t \in p^\bullet} A(p, t) = 1 \;\lor\; \left( \forall t \in p^\bullet, p' \in {}^\bullet t : p' \neq p \implies A(p', t) = 0 \right) \tag{5.9}
\end{align}

Equation 5.9 guarantees a free choice net. If $|p^\bullet| > 1$ (it is a choice) and $|{}^\bullet(p^\bullet)| > 1$ (it is not free), then $p$ contains a non-free choice, and one of the conditions must be removed. Either only one of the successor arcs of $p$ is preserved, eliminating the choice, or it is turned free by removing every predecessor arc of $p^\bullet$ except for the ones originating from $p$ itself. Because $(t, p)$ arcs are never removed, this simplification preserves fitness.

State Machine

In a state machine Petri net, every transition has exactly one predecessor arc and one successor arc. To encode this requirement into an ILP model, we again define a binary variable $A(p, t)$ or $A(t, p)$ for every arc in $N$.

\begin{align}
\max \quad & \sum_{\mathcal{F}(p,t)>0} A(p, t) + \sum_{\mathcal{F}(t,p)>0} A(t, p) \tag{5.10} \\
\text{s.t.} \quad & \forall t \in T : \sum_{p : \mathcal{F}(p,t)>0} A(p, t) = 1 \tag{5.11} \\
& \forall t \in T : \sum_{p : \mathcal{F}(t,p)>0} A(t, p) = 1 \tag{5.12}
\end{align}

Constraints 5.11 and 5.12 encode the definition of a state machine. However, note that this method may reduce the fitness of the model. A similar ILP model can be created to extract a marked graph.

5.5 Results

The methods proposed in this chapter have been implemented in C++. The ILP-based methods have been implemented using a commercial ILP solver, Gurobi [72]. To obtain the input models, the ILP miner [138] available in ProM 6.4 was used over a set of 10 complex logs, both real-life [29, 56] and synthetic [137]. The publicly available dot utility [66] has been used to generate the visualizations of all the models of this section. The measurements

of fitness and precision have been made using alignment-based conformance checking techniques [10]. Both the logs and our implementation are publicly available.

Table 5.1: Simplicity, precision and fitness comparison for the models in Fig. 5.3.

Model                 Nodes  Arcs  Crossings  Fitness  Precision
(a) Original net                                       43.1%
(b) Series-parallel                                    37.9%
(c) ILP model                                          75.4%
(d) ILP (fitting)                                      40.7%
(e) Free choice                                        31.3%
(f) State machine                                      81.3%

Comparison of the simplification techniques

In Section 5.4 (Fig. 5.3), an artificial model was used to illustrate the different simplification techniques presented in this work. Table 5.1 shows the details for each one of the simplified models.

Several metrics are used to evaluate the results from the simplification techniques. To evaluate the understandability and simplicity of a model, we use the size of the graph, in number of nodes and arcs, as well as the number of crossings. This is the number of arcs that intersect when the graph is embedded on a plane. Thus, a planar graph has no crossings. A graph with many crossing arcs is clearly a spaghetti that is poorly suited for visualization. To approximate the number of crossings, the mincross algorithm from dot [66] is used. To measure how well the simplified Petri nets model the behavior of the original process we use fitness and precision, as defined in [10].

In this example, the series-parallel reduction offers perfect fitness, and only a 5% loss of precision, while removing half of the arcs and all crossings. However, the other methods also remain interesting. For example, the state machine simplification offers the greatest reduction in complexity and increases the precision of the model to roughly 80%, at the cost of reducing the fitness by 50%.

This section also includes a comparison with some of the previous work in the area: the Inductive Miner (IM) [101] and an unfolding-based method [63]. The IM is a miner guided towards discovering block-structured models, which we see as a promising technique (see Section 5.2) since it can be tuned to guarantee perfect fitness.
On the other hand, the unfolding procedure is more closely related to the methods proposed in this work. This technique uses an unfolding process to simplify an existing Petri net. We have evaluated both methods using the same nets as with our proposed methods.

Figure 5.5 shows the Petri nets produced by the different techniques on a real-life log [56] that is more spaghetti-like. The high number of crossings in the original model makes it unsuitable for visualization. In this example, the series-parallel method no longer offers perfect fitness but still shows a good trade-off between complexity and fitness/precision. The other methods may be used if, for example, strict fitness preservation is required, at the cost of more complex models.

In Fig. 5.6, we compare numerically the techniques of this chapter for the 10 logs. For most of the logs, the series-parallel reduction and the ILP-based techniques are able to reduce the number of crossing edges by several orders of magnitude (note the logarithmic scale), creating small visualizable graphs from models that would otherwise be impossible to lay out. On the other hand, the simplification to free choice results in very large and complex models. As mentioned, the benefits of deriving free choice models come from the ability to apply additional reduction strategies. Simplifying to state machines generally produces poorly fitting models, but they tend to have very few crossings and high precision.

The models generated by the Inductive Miner, one of the existing methods included in the comparison, generally contain fewer crossings, caused by the addition of a significant number of silent transitions¹ which increase the size of the model. For example, in the incidenttelco example the number of transitions of the model derived by the IM is 37, whilst the original model (and, correspondingly, those generated by the simplification techniques) has 22. The addition of silent activities can be beneficial for visualization, especially if the underlying process model is meant to be block-structured. On the other hand, the unfolding procedure is more closely related to the methods proposed in this work.
This technique uses an unfolding process to simplify an existing Petri net, and has been evaluated using the same nets as with our proposed methods. In general, it produces better results in terms of fitness and precision with respect to the ILP models, at the expense of longer computation time. When compared with the series-parallel method, the results in fitness and precision are comparable, but the unfolding method requires more computation time and the results are worse in terms of visualization.

In Fig. 5.7 we compare the runtimes of the different methods. The ILP solver resolved all the ILP simplification models in less than 1 minute, even for the largest of the input Petri nets from the test set (25K nodes and arcs). The series-parallel simplification, which is not ILP based, has lower performance. However, there are many parts where the algorithm could be optimized. Still, the total execution runtime for the largest graph (25 minutes) was less than the

¹ A silent activity in the model is not related to any event in the log.

Figure 5.5: Running all methods on real-life log (incidenttelco). (a) Original Petri net. (b) Simplified to Free Choice. (c) Using ILP model, 60% threshold. (d) Using ILP model (fitting), 60% threshold. (e) Simplified to Series-parallel. (f) Using Inductive Miner [101]. (g) Using unfoldings-based method [63]. (h) Fitness and precision results.

Model  Nodes  Arcs  Crossings  Fitness  Precision
(a)    54     448   9805       100%     31.5%
(b)    54     320   5069       100%     19.4%
(c)    44     93    76         76.7%    42.2%
(d)    37     163   728        100%     15.6%
(e)                                     37.8%
(f)                                     47.24%
(g)                                     25.8%
