Parallelization Optimization of System-Level Specification


Lukai Cai, Daniel D. Gajski
Center for Embedded Computer Systems
University of California, Irvine, CA 92697, USA
{lcai, gajski}@cecs.uci.edu

Abstract

This paper introduces the parallelization optimization of system-level specification, which explores maximal parallelism among functional blocks of the design. We introduce two tools, the spec profiler and the spec optimizer, to implement the parallelization optimization automatically.

Index

1 Introduction
2 Implementation of Parallelization Optimization
  2.1 Parallelization Optimization Tasks
  2.2 Sequential Behavior Searching
  2.3 Dependency Analysis
    2.3.1 Definition
    2.3.2 Dependency Analysis
  2.4 Instance Structure Optimization
    2.4.1 Hierarchical Parallel Structure
    2.4.2 Goals of Instance Structure Optimization
    2.4.3 Algorithms for Instance Structure Optimization
3 Specification Modeling Process
4 Experimental Results
  4.1 Manual Parallelization vs. Automatic Parallelization
  4.2 Results for 10 Instance Examples
  4.3 Results for 20 Instance Examples
  4.4 Real Project Examples
5 Conclusion
Reference

List of Figures

Figure 1: Extended Gajski and Kuhn's Y chart
Figure 2: Example 1 of parallelization optimization
Figure 3: Parallelization optimization tasks
Figure 4: Example 2 of parallelization optimization
Figure 5: Three types of ParGroup
Figure 6: Four cases of inserting an instance to a Flat ParGroup
Figure 7: Three cases of inserting an instance to a Par ParGroup
Figure 8: Three cases of inserting an instance to a Sequ ParGroup
Figure 9: An example of designers' improvements on the results of the constructive algorithm

List of Tables

Table 1: Overview of three solutions for the example in Figure 9
Table 2: Design time of the manual parallelization
Table 3: Results for 10 instance examples
Table 4: Results for 20-30 instance examples
Table 5: Results for JPEG and Vocoder project examples

1 Introduction

In order to handle the ever-increasing complexity and time-to-market pressures in the design of systems-on-chip (SoCs) and embedded systems, design has been raised to the system level to increase productivity. Figure 1 illustrates the extended Gajski and Kuhn Y chart [1] representing the entire design flow, which is composed of four levels: system level, RTL level, logic level, and transistor level. The thick arc represents system-level design. It starts from the specification representing the design's functionality, which is denoted by point S. System-level design then synthesizes the specification into the system architecture, which is also marked as a point on the chart. A system architecture consists of a number of PEs (processing elements) connected by buses. Each PE implements a number of functional blocks of the specification. System-level design contains a series of tasks, including PE allocation and behavior binding. PE allocation selects PEs for the architecture. Behavior binding maps different functional blocks in the specification to different PEs.

Figure 1: Extended Gajski and Kuhn's Y chart (axes: Behavioral, Architectural, Physical; levels: System, RTL, Logic, Transistor; specification tuning marked around point S)

In addition to the tasks in existing system-level design, we add a task, specification tuning, to the system design flow; it is denoted by the dotted circle around point S. Specification tuning not only reduces the complexity of the specification, but also explores the maximal parallelism existing in the specification, which is used by PE allocation and behavior binding in later steps. For example, if only two functional blocks can be executed in parallel in the specification, then PE allocation will choose no more than two PEs for the architecture, because no more than two blocks can execute in parallel. For the same reason, behavior binding maps the two functional blocks to different PEs.

This paper introduces the parallelization optimization of specification tuning, which exploits maximal parallelism among functional blocks of the design's specification. Designers can implement parallelization optimization manually. In general, designers start modeling the specification from existing C/C++ code. Since the C/C++ language does not support parallelism, designers must find the parallelism manually by analyzing the code or the design's algorithms, which is time-consuming. After finding the parallelism among the functional blocks in the specification, designers must determine the hierarchical parallel structure of the specification.

After parallelization optimization, one original specification may produce different hierarchical parallel structures. For example, in Figure 2(a), four functional blocks are executed sequentially. Figure 2(b) displays the dependencies among the functional blocks: two of the blocks can each be executed only after a different one of the other two blocks. Figure 2(c) and (d) show two possible hierarchical parallel structures. The functional blocks separated by a dotted line are executed in parallel. In Figure 2(c), the two dependent blocks are executed in parallel after the parallel execution of the two blocks they depend on. In Figure 2(d), each dependent block is executed after the block it depends on, and the two resulting two-block sequences are executed in parallel with each other.

Figure 2: Example 1 of parallelization optimization. (a) Original executing sequence. (b) Dependencies among functional blocks. (c) Solution 1 after parallelization optimization. (d) Solution 2 after parallelization optimization.

Because one original specification may produce different hierarchical parallel structures, we prefer implementing parallelization optimization structurally by tools rather than randomly by hand. We therefore built two tools, the spec profiler and the spec optimizer, to implement parallelization optimization automatically: the spec profiler analyzes the dependencies among functional blocks, and the spec optimizer finds the hierarchical parallel structure with maximal parallelism. We compared manual parallelization with automatic parallelization and concluded that automatic parallelization produces better results in terms of design time and hierarchical parallel structures.

We use the SpecC language [2][3] to model the specification. In contrast to other system-level design languages such as SystemC [4], SpecC is a synthesis-based design language because it provides keywords such as par and pipe to model parallel and pipelined execution relations among functional blocks. Explicitly specifying the execution relations enables system-level synthesis tools to recognize the hierarchical parallel structures, which makes it possible for them to implement PE selection and behavior binding automatically.

SpecC uses the keyword behavior to represent a functional block. Each behavior contains a number of methods that define the functionality, a set of ports that connect it with other behaviors, and a number of behavior instances to support hierarchical behavior modeling.

The paper is organized as follows: Section 2 describes the implementation of the automatic parallelization; Section 3 introduces the specification modeling process with the automatic parallelization; Section 4 gives experimental results. Finally, the conclusion is given in Section 5.

2 Implementation of Parallelization Optimization

2.1 Parallelization Optimization Tasks

In this paper, we parallelize sequential behaviors. A sequential behavior is defined as a behavior that contains only a number of sequentially executing behavior instances.

The parallelization optimization contains three tasks, shown in Figure 3. The first task, sequential behavior searching, finds all the sequential behaviors in the specification. The second task, dependency analysis, computes the dependencies among behavior instances of the sequential behaviors. Finally, the third task, instance structure optimization, finds the hierarchical parallel structure for each sequential behavior according to the dependencies.

Figure 3: Parallelization optimization tasks (sequential behavior searching, dependency analysis, instance structure optimization)

2.2 Sequential Behavior Searching

We first find all the sequential behaviors in the specification. Sequential behaviors are identified by internal attributes of the SpecC internal representation format.

2.3 Dependency Analysis

2.3.1 Definition

Consider a sequential behavior that contains two behavior instances, a set of local variables V_i, and a set of ports P_j, where the first instance is executed before the second. The second instance depends on the first if any of the following holds:

(a) the first instance writes to some V_i/P_j and the second instance reads from the same V_i/P_j, or
(b) both instances write to the same V_i/P_j, or
(c) there exists another behavior instance of the same sequential behavior that depends on the first instance and on which the second instance depends.

If one behavior instance depends on another, it must be executed after the execution of the instance it depends on. Otherwise, the two behavior instances can be executed in parallel.

2.3.2 Dependency Analysis

We compute the dependencies among behavior instances by analyzing the port traffic of behavior instances.

First, we use the spec profiler [5] to produce the specification statistics. The spec profiler generates the static traffic and dynamic traffic of behavior ports. The static traffic of a port refers to the number of ports of leaf behaviors to which it is connected. A leaf behavior is a behavior containing only a set of methods without any behavior instances, which is used as an instance of other behaviors. The dynamic traffic of a port refers to the number of port accesses during simulation. If the port is an input port and its static/dynamic traffic is greater than 0, we conclude that the behavior statically/dynamically reads from the port. Likewise, if the port is an output port and its static/dynamic traffic is greater than 0, we conclude that the behavior statically/dynamically writes to the port. An inout port can be treated in a similar way.

Second, we analyze the port connections of behavior instances. If two behavior instances of a sequential behavior meet condition (a) or (b) in 2.3.1 statically or dynamically, then the later instance statically or dynamically depends on the earlier one.

Finally, we find the static/dynamic behavior dependencies based on condition (c) in 2.3.1.

After dependency analysis, designers can determine whether one behavior instance depends on another based on either the static dependency or the more greedy dynamic dependency.

2.4 Instance Structure Optimization

2.4.1 Hierarchical Parallel Structure

Instance structure optimization changes the instance structure from a one-level, purely sequential structure to a multilevel hierarchical parallel structure.
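Returning to the dependency definition of Section 2.3.1, the three conditions can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name dependencies, the read/write sets passed as dictionaries, and the instance and port names in the example are all hypothetical, and the sets stand in for the static or dynamic port traffic that the spec profiler reports.

```python
def dependencies(order, reads, writes):
    """Map each instance to the set of earlier instances it depends on.

    order  -- instance names in original execution order
    reads  -- dict: instance name -> set of variables/ports it reads
    writes -- dict: instance name -> set of variables/ports it writes
    (All three inputs are hypothetical stand-ins for profiler output.)
    """
    deps = {name: set() for name in order}
    for j, later in enumerate(order):
        for earlier in order[:j]:
            read_after_write = writes[earlier] & reads[later]     # condition (a)
            write_after_write = writes[earlier] & writes[later]   # condition (b)
            if read_after_write or write_after_write:
                deps[later].add(earlier)
    # Condition (c): close the relation transitively. Visiting instances in
    # execution order means every earlier instance's set is already complete.
    for later in order:
        for mid in list(deps[later]):
            deps[later] |= deps[mid]
    return deps


# Hypothetical three-instance behavior communicating through ports p and q.
order = ["load", "filter", "store"]
reads = {"load": set(), "filter": {"p"}, "store": {"q"}}
writes = {"load": {"p"}, "filter": {"q"}, "store": set()}
deps = dependencies(order, reads, writes)
```

In this example "store" depends on "filter" directly through port q and on "load" only through condition (c). Running the function once with static sets and once with dynamic sets would give the two kinds of dependency discussed in Section 2.3.2.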
Figure 4 gives an example of the hierarchical parallel structure. After instance structure optimization, the produced hierarchical parallel structure has three levels, shown in Figure 4(c). In the first level, instance E and one other instance are executed in parallel. In the second level, a third instance is executed before that pair. In the third level, the two remaining instances, which are executed one after the other, run in parallel with the other three. Note that two of the three levels are parallel structures.

Figure 4: Example 2 of parallelization optimization. (a) Original executing sequence. (b) Dependencies among functional blocks. (c) Hierarchical parallel structure after parallelization optimization.

2.4.2 Goals of Instance Structure Optimization

During instance structure optimization, we want to achieve two goals.

(a) Minimize the number of added dependencies among behavior instances. After instance structure optimization, some independent behavior instances become dependent behavior instances because of the overuse of parallelism. For example, the solution shown in Figure 2(c) adds two dependencies that do not exist in Figure 2(b). Adding dependencies among behavior instances is unavoidable; therefore we choose minimizing the number of added dependencies as the first goal.

(b) Minimize the length of the critical path of the produced hierarchical parallel structure. The length of the critical path of a hierarchical parallel structure is defined as the number of behavior instances on the longest path from the first starting behavior instance to the last ending behavior instance, where parallel-executed instances are executed simultaneously.

2.4.3 Algorithms for Instance Structure Optimization

We implemented two algorithms for instance structure optimization: the ASAP (as soon as possible) algorithm and the constructive algorithm.

2.4.3.1 ASAP Algorithm

Algorithm 1 outlines the ASAP algorithm for instance structure optimization of each sequential behavior. B is an instance group that contains the set of behavior instances in the sequential behavior. Hier_Struct denotes the generated hierarchical parallel structure, which contains a link of groups that are executed sequentially from the head to the tail of the link. The function DependentOn(b) returns Φ if no behavior instance on which b depends is in B; otherwise it returns the first instance on which b depends. CurGroup.Append(b) inserts b into group CurGroup. All the instances in CurGroup are executed in parallel. After each execution of the for loop, CurGroup records a set of parallel-executing behavior instances. Hier_Struct.Append(CurGroup) then appends the current CurGroup at the end of the link of Hier_Struct. Figure 2(c) is the hierarchical parallel structure generated by the ASAP algorithm.

The ASAP algorithm has only goal (b), which is to minimize the length of the critical path. It gives the optimal solution in terms of the critical path but may add a large number of dependencies among behavior instances.

Algorithm 1: ASAP algorithm
  B = {all the behavior instances};
  Hier_Struct = {};
  while B ≠ Φ do
    CurGroup = {};
    for each instance b_i in B do
      if DependentOn(b_i) = Φ then
        CurGroup = CurGroup.Append(b_i, Par);
      endif
    endfor
    B = B - CurGroup;
    Hier_Struct = Hier_Struct.Append(CurGroup, Sequ);
  endwhile

2.4.3.2 Constructive Algorithm

Besides the ASAP algorithm, we also implemented a constructive algorithm. The constructive algorithm schedules one behavior instance at a time, in the order of the execution sequence in the original sequential behavior, and produces a temporary hierarchical parallel structure.
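For comparison with the constructive approach, the ASAP procedure of Algorithm 1 can be sketched in Python as follows. This is a minimal sketch, not the spec optimizer's code: the function name asap, the dictionary deps (instance name to the set of instances it depends on), and the four instance names are assumptions made for illustration. The sketch removes scheduled instances only after each pass, so that all instances placed in one group are mutually independent.

```python
def asap(order, deps):
    """Group instances into levels; each level runs in parallel, levels run in sequence.

    order -- instance names in original execution order
    deps  -- dict: instance name -> set of instance names it depends on
    """
    remaining = list(order)
    levels = []
    while remaining:
        pending = set(remaining)
        # An instance is ready when none of its dependencies is still pending.
        group = [b for b in remaining if not (deps.get(b, set()) & pending)]
        if not group:
            raise ValueError("cyclic dependencies")  # defensive; cannot occur for a sequential behavior
        levels.append(group)
        remaining = [b for b in remaining if b not in group]
    return levels


# Hypothetical example: C depends on A, D depends on B.
levels = asap(["A", "B", "C", "D"], {"C": {"A"}, "D": {"B"}})
```

The number of levels equals the critical path length of the result. Because every instance of one level must finish before the next level starts, the grouping also orders pairs such as A before D that were independent in the input, which is the source of the added dependencies.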
The algorithm is constructive because it constructs the hierarchical parallel structure without performing any backtracking, i.e. without changing the previously produced temporary structure. The constructive algorithm pursues both goals (a) and (b) during instance structure optimization. Figure 2(d) is the hierarchical parallel structure generated by the constructive algorithm.

2.4.3.2.1 Data Structure of Hierarchical Parallel Structure

Before introducing the constructive algorithm, we first specify the data structure used for the hierarchical parallel structure in the algorithm. Each hierarchical parallel structure is represented by the data structure ParGroup. Each ParGroup contains a set or link of child ParGroups, or a link of behavior instances. The items in a link are executed sequentially from the head to the tail of the link. The items in a set are executed in parallel. Figure 5 shows the three types of ParGroup. A Flat ParGroup contains a behavior instance link without any child ParGroups. A Par ParGroup contains a set of child ParGroups. A Sequ ParGroup contains a link of child ParGroups.
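One possible way to model ParGroup, and to evaluate the two goals of Section 2.4.2 on it, is sketched below. The class layout, the helper names members, critical_path and implied_pairs, and the instance names A to D are assumptions made for this illustration; the report does not give the tool's actual representation. The set named original is assumed to hold the transitively closed dependencies as (earlier, later) pairs.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class ParGroup:
    kind: str                                   # "FLAT", "PAR", or "SEQU"
    items: list = field(default_factory=list)   # instance names (FLAT) or child ParGroups


def members(g):
    """All instance names contained in a group, in stored order."""
    if g.kind == "FLAT":
        return list(g.items)
    return [m for child in g.items for m in members(child)]


def critical_path(g):
    """Number of instances on the longest sequential path (goal (b))."""
    if g.kind == "FLAT":
        return len(g.items)
    lengths = [critical_path(c) for c in g.items]
    return max(lengths, default=0) if g.kind == "PAR" else sum(lengths)


def implied_pairs(g):
    """Set of (earlier, later) pairs that the structure forces to run in order."""
    if g.kind == "FLAT":
        return set(combinations(g.items, 2))
    pairs = set().union(*(implied_pairs(c) for c in g.items))
    if g.kind == "SEQU":
        for first, second in combinations(g.items, 2):
            pairs |= {(x, y) for x in members(first) for y in members(second)}
    return pairs


# Two hypothetical structures for four instances where C depends on A and D on B.
original = {("A", "C"), ("B", "D")}
levels = ParGroup("SEQU", [
    ParGroup("PAR", [ParGroup("FLAT", ["A"]), ParGroup("FLAT", ["B"])]),
    ParGroup("PAR", [ParGroup("FLAT", ["C"]), ParGroup("FLAT", ["D"])]),
])
chains = ParGroup("PAR", [ParGroup("FLAT", ["A", "C"]), ParGroup("FLAT", ["B", "D"])])
added_by_levels = len(implied_pairs(levels) - original)   # goal (a) for the level structure
added_by_chains = len(implied_pairs(chains) - original)   # goal (a) for the chain structure
```

Both structures have a critical path of two instances, but the level structure orders two extra pairs while the chain structure adds none, which illustrates why the two goals are tracked separately.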

Figure 5: Three types of ParGroup. (a) G1 is a Flat ParGroup. (b) G1 is a Par ParGroup. (c) G1 is a Sequ ParGroup.

2.4.3.2.2 Algorithm Overview

Algorithm 2.1 outlines the constructive algorithm. The behavior instance link B contains all of the behavior instances of the sequential behavior, saved in the order of the execution sequence of the sequential behavior. Starting from the head of link B, one instance of B at a time is inserted into ParGroup Hier_Struct by function Insert. Function Insert calls different inserting functions according to the type of Hier_Struct. After all of the instances in B are inserted, Hier_Struct represents the final hierarchical parallel structure.

Algorithm 2.1: The constructive algorithm
  B = {all the behavior instances};
  Hier_Struct = {};
  for each instance b_i in B do
    Hier_Struct = Insert(Hier_Struct, b_i);
  endfor

  Function Insert(Hier_Struct, b_i)
    switch Type(Hier_Struct) do
      case FLAT: Hier_Struct = InsertToFlat(Hier_Struct, b_i); break;
      case PAR:  Hier_Struct = InsertToPar(Hier_Struct, b_i); break;
      case SEQU: Hier_Struct = InsertToSequ(Hier_Struct, b_i);
    endswitch
    return Hier_Struct;

Figure 6: Four cases of inserting an instance to a Flat ParGroup. (a) Case 1: no instantiation in the group. (b) Case 2: the instance does not depend on any instance in the group. (c) Case 3: the instance depends on the last instance in the group. (d) Case 4: the instance depends on an instance other than the last one.

2.4.3.2.3 Insert to Flat ParGroup

Algorithm 2.2 outlines the function InsertToFlat, which inserts an instance b into a Flat ParGroup Hier_Struct. InsertToFlat contains four different cases according to the dependency relations between b and the instances in Hier_Struct's instance link. The result records the produced hierarchical parallel structure. Examples of the four cases are displayed in Figure 6.

Algorithm 2.2: InsertToFlat(Hier_Struct, b)
  // case 1
  if NoInstInGroup(Hier_Struct) = 1 do
    result = AppendInst(Hier_Struct, b);
  // case 2
  else if NotDependOnGroup(Hier_Struct, b) = 1 do
    new_group = Group(b, FLAT);
    result = Group(Hier_Struct, new_group, PAR);
  // case 3
  else if DependOnLastInst(Hier_Struct, b) = 1 do
    result = AppendInst(Hier_Struct, b);
  // case 4
  else
    d1 = FindLastDependInst(Hier_Struct, b);
    new_group1 = Group(b, FLAT);
    new_group2 = Group(AllSucc(Hier_Struct, d1), FLAT);
    new_group3 = Group(AllPred(Hier_Struct, d1), FLAT);
    new_group4 = Group(new_group1, new_group2, PAR);
    result = Group(new_group3, new_group4, SEQU);
  endif
  return result;

In the first case, function NoInstInGroup checks whether Hier_Struct's instance link contains any instances. If not, b is inserted into the link by function AppendInst.

In the second case, if function NotDependOnGroup finds that b does not depend on any instance in the link, then function Group creates a new Flat ParGroup new_group containing only b and creates a new Par ParGroup result, which contains Hier_Struct and new_group as its child ParGroups.

In the third case, function DependOnLastInst checks whether b depends on the last instance in the link. If so, AppendInst appends b to the end of the instance link of Hier_Struct.

In the last case, function FindLastDependInst finds the latest instance d1 on which b depends. The latest instance is the instance closest to the tail of the instance link of Hier_Struct. New_group1 is a new Flat ParGroup containing b. New_group2 is another new Flat ParGroup containing all the instances following d1 in the instance link of Hier_Struct, stored in the same order as in Hier_Struct. New_group3 is the third new Flat ParGroup, which contains all the instances up to and including d1, saved in the same order as in Hier_Struct. New_group4 is a Par ParGroup containing new_group1 and new_group2. The result is a new Sequ ParGroup containing new_group3 followed by new_group4 in its child ParGroup link.

2.4.3.2.4 Insert to Par ParGroup

Algorithm 2.3 outlines the function InsertToPar, which inserts an instance b into a Par ParGroup Hier_Struct. InsertToPar contains three different cases according to the dependency relations between b and the child ParGroups of Hier_Struct. We define that an instance depends on a ParGroup if and only if it depends on at least one instance in the ParGroup. The result records the produced hierarchical parallel structure. Examples of the three cases are displayed in Figure 7.

Algorithm 2.3: InsertToPar(Hier_Struct, b)
  // case 1
  if NotDependOnChildGroup(Hier_Struct, b) = 1 do
    new_group = Group(b, FLAT);
    result = AddChildGroup(Hier_Struct, new_group);
  // case 2
  else if DependOnOneChildGroup(Hier_Struct, b) = 1 do
    sub_group = FindDependChildGroup(Hier_Struct, b);
    result = Insert(sub_group, b);
  // case 3
  else
    new_group1 = Group(b, FLAT);
    new_group2 = Group(DependChildGroup(Hier_Struct, b), PAR);
    new_group3 = Group(IndependChildGroup(Hier_Struct, b), PAR);
    new_group4 = Group(new_group2, new_group1, SEQU);
    result = Group(new_group3, new_group4, PAR);
  endif

In the first case, if function NotDependOnChildGroup finds that b does not depend on any child ParGroup of Hier_Struct, then function Group creates a new Flat ParGroup new_group containing b. Function AddChildGroup then adds new_group to Hier_Struct as its child ParGroup.

In the second case, if function DependOnOneChildGroup finds that b depends on only one child ParGroup of Hier_Struct, denoted by sub_group, then function Insert, described in Algorithm 2.1, inserts b into sub_group.

In the last case, if b depends on more than one child ParGroup of Hier_Struct, new ParGroups are produced. New_group1 is a Flat ParGroup containing b. New_group2 is a Par ParGroup containing all the child ParGroups of Hier_Struct that b depends on. New_group3 is a Par ParGroup containing all the child ParGroups of Hier_Struct that b does not depend on. New_group4 is a Sequ ParGroup containing new_group2 followed by new_group1 in its child ParGroup link. Finally, the result is a Par ParGroup containing new_group3 and new_group4 in its child ParGroup set.

Figure 7: Three cases of inserting an instance to a Par ParGroup. (a) Case 1: the instance does not depend on any child ParGroup. (b) Case 2: the instance depends on one child ParGroup and is inserted into that ParGroup by Insert. (c) Case 3: the instance depends on two child ParGroups.

2.4.3.2.5 Insert to Sequ ParGroup

Algorithm 2.4 outlines the function InsertToSequ, which inserts an instance b into a Sequ ParGroup Hier_Struct. InsertToSequ contains three different cases according to the dependency relations between b and the child ParGroups in Hier_Struct. Examples of the three cases are displayed in Figure 8.

Algorithm 2.4: InsertToSequ(Hier_Struct, b)
  // case 1
  if NotDependOnChildGroup(Hier_Struct, b) = 1 do
    new_group = Group(b, FLAT);
    result = AddChildGroup(Hier_Struct, new_group);
  // case 2
  else if DependOnLastChildGroup(Hier_Struct, b) = 1 do
    child_group = FindLastChildGroup(Hier_Struct);
    result = Insert(child_group, b);
  // case 3
  else
    last_depend_child = FindLastDependChildGroup(Hier_Struct, b);
    next_child = Next(last_depend_child);
    solution1 = Insert(last_depend_child, b);
    solution2 = Insert(next_child, b);
    result = Best(solution1, solution2);
  endif

The first case of InsertToSequ is the same as the first case of InsertToPar.

In the second case, if function DependOnLastChildGroup finds that b depends on the tail ParGroup of the child ParGroup link of Hier_Struct, then function Insert, described in Algorithm 2.1, inserts b into this child ParGroup child_group.

In the third case, function FindLastDependChildGroup finds last_depend_child, which is the child ParGroup closest to the tail of the child ParGroup link among the child ParGroups on which b depends. Next_child is the immediate successor of last_depend_child in the link. Then two alternative solutions, inserting b into last_depend_child and inserting b into next_child, are explored. The first solution ensures that its number of added dependencies is not greater than that of the second solution, while the second solution ensures that its critical path is not longer than that of the first solution. Finally, function Best chooses solution1 if the critical path of solution1 is not longer than that of solution2. Otherwise, Best chooses solution2 as the result.
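The three insertion functions can be combined into a compact Python sketch of the constructive algorithm. It is one reading of Algorithms 2.1 to 2.4 under stated assumptions, not the spec optimizer's implementation: the ParGroup class, all function names, and the example instance names are hypothetical; deps is assumed to be transitively closed; every insertion returns a new group rather than modifying its argument; in case 3 of insert_to_par the independent children are kept directly in the outer Par group instead of being wrapped in new_group3; and case 1 of insert_to_sequ places the new instance in parallel with the whole Sequ group, which is how Figure 8 appears to show it.

```python
from dataclasses import dataclass, field


@dataclass
class ParGroup:
    kind: str                                   # "FLAT", "PAR", or "SEQU"
    items: list = field(default_factory=list)   # instance names (FLAT) or child ParGroups


def members(g):
    if g.kind == "FLAT":
        return list(g.items)
    return [m for child in g.items for m in members(child)]


def critical_path(g):
    if g.kind == "FLAT":
        return len(g.items)
    lengths = [critical_path(c) for c in g.items]
    return max(lengths, default=0) if g.kind == "PAR" else sum(lengths)


def depends_on(deps, b, g):
    """True if b depends on at least one instance in group g."""
    return any(m in deps.get(b, ()) for m in members(g))


def insert(g, b, deps):
    if g.kind == "FLAT":
        return insert_to_flat(g, b, deps)
    if g.kind == "PAR":
        return insert_to_par(g, b, deps)
    return insert_to_sequ(g, b, deps)


def insert_to_flat(g, b, deps):
    if not g.items or g.items[-1] in deps.get(b, ()):       # cases 1 and 3: append
        return ParGroup("FLAT", g.items + [b])
    if not depends_on(deps, b, g):                           # case 2: run beside the link
        return ParGroup("PAR", [g, ParGroup("FLAT", [b])])
    # case 4: split the link after the latest instance that b depends on
    d1 = max(i for i, m in enumerate(g.items) if m in deps.get(b, ()))
    pred, succ = g.items[:d1 + 1], g.items[d1 + 1:]
    tail = ParGroup("PAR", [ParGroup("FLAT", [b]), ParGroup("FLAT", succ)])
    return ParGroup("SEQU", [ParGroup("FLAT", pred), tail])


def insert_to_par(g, b, deps):
    dep = [c for c in g.items if depends_on(deps, b, c)]
    indep = [c for c in g.items if not depends_on(deps, b, c)]
    if not dep:                                              # case 1
        return ParGroup("PAR", g.items + [ParGroup("FLAT", [b])])
    if len(dep) == 1:                                        # case 2
        return ParGroup("PAR", [insert(c, b, deps) if c is dep[0] else c for c in g.items])
    chain = ParGroup("SEQU", [ParGroup("PAR", dep), ParGroup("FLAT", [b])])  # case 3
    return ParGroup("PAR", indep + [chain]) if indep else chain


def insert_to_sequ(g, b, deps):
    hits = [i for i, c in enumerate(g.items) if depends_on(deps, b, c)]
    if not hits:                                             # case 1 (assumed reading)
        return ParGroup("PAR", [g, ParGroup("FLAT", [b])])

    def with_child(i):
        kids = list(g.items)
        kids[i] = insert(kids[i], b, deps)
        return ParGroup("SEQU", kids)

    last = hits[-1]
    if last == len(g.items) - 1:                             # case 2
        return with_child(last)
    sol1, sol2 = with_child(last), with_child(last + 1)      # case 3
    return sol1 if critical_path(sol1) <= critical_path(sol2) else sol2


def constructive(order, deps):
    struct = ParGroup("FLAT")
    for b in order:
        struct = insert(struct, b, deps)
    return struct


# Hypothetical example: C depends on A, D depends on B.
result = constructive(["A", "B", "C", "D"], {"C": {"A"}, "D": {"B"}})
```

For this input the sketch produces a Par group with the two chains A, C and B, D, so no ordering is imposed between the chains. The tie rule in the last line of insert_to_sequ mirrors function Best, which prefers solution1 whenever its critical path is not longer.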

Figure 8: Three cases of inserting an instance to a Sequ ParGroup. (a) Case 1: the instance does not depend on any child ParGroup. (b) Case 2: the instance depends on the last child ParGroup. (c) Case 3, solution 1. (d) Case 3, solution 2; in the example, solution 2 is the result.

3 Specification Modeling Process

We introduce the process of specification modeling using the spec profiler and the spec optimizer, which contains three steps.

First, designers write a SpecC specification model by referencing the original C/C++ code. Designers can specify top-level parallelism among behavior instances according to the design's algorithms or standards.

Second, designers use the spec profiler and the spec optimizer. The tools read the SpecC specification model and generate hierarchical parallel structures for the sequential behaviors in the format of a textual file.

Third, designers optimize the specification model based either on the result of the ASAP algorithm or on the result of the constructive algorithm.

In the third step, designers can also change the result of the constructive algorithm by referencing the ASAP result to obtain a shorter critical path, as illustrated in Figure 9. By referencing the result of the ASAP algorithm shown in Figure 9(b), designers can change the result of the constructive algorithm shown in Figure 9(c) so that instances e and f are executed in parallel. As shown in Figure 9(d), the improved solution has a shorter critical path than the solution in Figure 9(c). Table 1 gives an overview of the three solutions.

Figure 9: An example of designers' improvements on the results of the constructive algorithm. (a) Dependencies among instantiations. (b) Solution 1: ASAP algorithm's result. (c) Solution 2: constructive algorithm's result. (d) Solution 3: improved result.

Table 1: Overview of three solutions for the example in Figure 9

Solution        Added dependency    Length of critical path
ASAP            4                   3
Constructive    1                   4
Improved        2                   3

4 Experimental results

We evaluate the efficiency of the SpecC profiler and the SpecC optimizer in terms of design time, the length of the critical path of the resulting hierarchical parallel structure, and the added dependencies of the resulting structure. We chose four sets of test examples. First, we chose three examples for comparing manual parallelization with automatic parallelization. Second, we chose behaviors with no more than 10 instances. Third, we chose behaviors with no fewer than 20 instances. Finally, we chose real project examples.

4.1 Manual Parallelization vs. Automatic Parallelization

First, we evaluate the efficiency of the SpecC profiler and the SpecC optimizer in terms of design time. We randomly generated three sequential behaviors, two of which contain 10 behavior instances and one of which contains 20 behavior instances. The design time required for manual parallelization is listed in Table 2. Table 2 also shows that manual parallelization cannot analyze dynamic dependency.

Table 2: Design time of manual parallelization (minutes)

Manual design task            Ex. 1 (10 inst.)   Ex. 2 (10 inst.)   Ex. 3 (20 inst.)
Analyze static dependency            6                  7                  16
Analyze dynamic dependency       Not avail.         Not avail.         Not avail.
ASAP                                 2                  2                   6
Constructive                         3                  2                  17
Total                               11                 11                  39

In contrast to the 11/11/39 minutes required by manual parallelization in these examples, automatic parallelization took less than 3 seconds for each example, which is more than 220/220/780 times faster. As the complexity of the design increases, designers can save more time by using the tools.

4.2 Results for 10-Instance Examples

Table 3: Results for 10-instance examples

                             Ex. 1     Ex. 2     Ex. 3     Ex. 4     Ex. 5
Original
  Num. instantiation             8        10        10        10        10
  Num. dependency               14        34        13        22        36
  Length. CP                     8        10        10        10        10
Constructive algorithm
  Num. dependency               18        39        15        23        37
  Length. CP                     4         8         4         5         7
  Added dependency (%)      28.57%    14.71%    15.38%     4.55%     2.78%
  Reduced CP (%)            50.00%    20.00%    60.00%    50.00%    30.00%
ASAP algorithm
  Num. dependency               24        42        32        38        40
  Length. CP                     4         7         4         5         6
  Added dependency (%)      71.43%    23.53%   146.15%    72.73%    11.11%
  Reduced CP (%)            50.00%    30.00%    60.00%    50.00%    40.00%

We randomly generated five behaviors, each of which contains no more than 10 instances. Table 3 shows the results of parallelization optimization for these examples. Length. CP is the length of the critical path. Num. dependency is the number of static dependencies among the instances of a behavior. Added dependency (%) is the difference between Num. dependency (constructive/ASAP algorithm) and Num. dependency (original), divided by Num. dependency (original). Reduced CP (%) is the difference between Length. CP (original) and Length. CP (constructive/ASAP algorithm), divided by Length. CP (original).

Table 3 shows that the average added dependency is 13.2% for the constructive algorithm and 65.0% for the ASAP algorithm. Therefore, the constructive algorithm is much better than the ASAP algorithm with respect to the added-dependency goal described in Section 2.4.2. On the other hand, the average Reduced CP is 42% for the constructive algorithm and 46% for the ASAP algorithm, which are similar. Considering the number of dependencies as well as the length of the critical path, we conclude that the constructive algorithm is better for 10-instance behaviors.

4.3 Results for 20-Instance Examples

We generated another five examples, shown in Table 4, each of which has no fewer than 20 instances. Ex. 6 and Ex. 9 do not have the locality attribute, while Ex. 7, Ex. 8, and Ex. 10 do. For a behavior without the locality attribute, an instance has the same probability of having a dependency relation with any other instance. For a behavior with the locality attribute, an instance has a larger probability of having dependency relations with instances close to it than with instances not close to it. The closeness between two instances is equal to the number of instances between them during the execution of the original sequential behavior. The locality attribute exists in most designs. In this paper, when the closeness of two instances is no more than 4, we call them close instances. For Ex. 7, Ex. 8, and Ex. 10, each instance can depend on any of its close instances, but on only one un-close instance.

Table 4: Results for 20-30 instance examples

                             Ex. 6     Ex. 7     Ex. 8     Ex. 9    Ex. 10
Original
  Attribute                 random  locality  locality    random  locality
  Num. instantiation            20        20        21        30        30
  Num. dependency               98        95        79       193       109
  Length. CP                    20        20        21        30        30
Constructive algorithm
  Num. dependency              158       135       126       328       239
  Length. CP                    11         8         7        12         7
  Added dependency (%)      61.22%    42.11%    59.49%    69.95%   119.27%
  Reduced CP (%)            45.00%    60.00%    66.67%    60.00%    76.67%
ASAP algorithm
  Num. dependency              169       173       185       400       375
  Length. CP                     7         8         7        10         7
  Added dependency (%)      72.45%    82.11%   134.18%   107.25%   244.04%
  Reduced CP (%)            65.00%    60.00%    66.67%    66.67%    76.67%

Table 4 shows that the average added dependency is 70% for the constructive algorithm and 128% for the ASAP algorithm. Although the added dependency of the constructive algorithm is much better than that of the ASAP algorithm, both are much worse than the results for 10-instance behaviors. This is reasonable because later executed instances add more dependencies than earlier ones. On the other hand, the average Reduced CP is 61% for the constructive algorithm and 67% for the ASAP algorithm, which are similar. The Reduced CPs of the 20-instance behaviors are clearly greater than those of the 10-instance behaviors.

Furthermore, we looked more closely at examples 7, 8, and 10, the behaviors with the locality attribute. We find that the Reduced CPs of the constructive algorithm and the ASAP algorithm are the same for these examples. Because of this and the analysis of the constructive algorithm, we conclude that the ASAP and constructive results are more likely to have similar critical path lengths for behaviors with the locality attribute than for behaviors without it. Therefore, the constructive algorithm is more suitable for behaviors with the locality attribute.
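The two percentage metrics defined for Tables 3 and 4 can be written out as a short sketch. The function names are ours, chosen for illustration; the input numbers are the Ex. 1 column of Table 3.

```python
def added_dependency_pct(orig_deps, new_deps):
    """Added dependency (%): extra dependencies relative to the original."""
    return 100.0 * (new_deps - orig_deps) / orig_deps


def reduced_cp_pct(orig_cp, new_cp):
    """Reduced CP (%): critical path shortening relative to the original."""
    return 100.0 * (orig_cp - new_cp) / orig_cp


# Ex. 1 of Table 3: 14 original dependencies, original critical path 8.
constructive_added = added_dependency_pct(14, 18)  # constructive: 18 deps
asap_added = added_dependency_pct(14, 24)          # ASAP: 24 deps
constructive_reduced = reduced_cp_pct(8, 4)        # both reach length 4
asap_reduced = reduced_cp_pct(8, 4)

print(f"{constructive_added:.2f}% {asap_added:.2f}%")      # 28.57% 71.43%
print(f"{constructive_reduced:.2f}% {asap_reduced:.2f}%")  # 50.00% 50.00%
```

Both metrics are normalized by the original sequential behavior, so they can be compared across examples of different sizes.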
4.4 Real Project Examples

Table 5: Results for JPEG and Vocoder project examples

                           JPEG_Init   Pre_process   Ex_syn_upd_sh   Lp_analysis
                            (Jpeg)      (vocoder)      (vocoder)      (vocoder)
Original
  Num. instantiation            3            3              5              9
  Num. dependency               2            2              8             30
  Length. CP                    3            3              5              9
Constructive algorithm
  Num. dependency               2            2              8             30
  Length. CP                    2            2              3              6
  Added dependency (%)      0.00%        0.00%          0.00%          0.00%
  Reduced CP (%)           33.33%       33.33%         40.00%         33.33%
ASAP algorithm
  Num. dependency               2            2              8             32
  Length. CP                    2            2              3              6
  Added dependency (%)      0.00%        0.00%          0.00%          6.67%
  Reduced CP (%)           33.33%       33.33%         40.00%         33.33%

We applied the SpecC profiler and the SpecC optimizer to the JPEG project [6] and the Vocoder project [7]. Although designers had already implemented parallelization optimization manually for these projects, the tools still found parallelizable instances in the four sequential behaviors shown in Table 5. We updated the specification based on the produced hierarchical parallel structures and obtained the same simulation results as with the original specifications. This indicates that using the tools is more reliable than implementing parallelization optimization manually.

In addition to the sequential behaviors shown in Table 5, the tools also reported that the sequential behavior Coder_12k2 of the Vocoder contained behavior instances that could be executed in parallel. However, the simulation result of the specification updated according to the tools differs from the simulation result of the original specification. The reason for this difference is that the address of one of Coder_12k2's ports is assigned as a value to the address of another of Coder_12k2's ports. Since the task dependency analysis could not treat read/write accesses to the second port as read/write accesses to the first port, the tools produced a wrong result. To prevent this from happening, designers need to avoid address transfer between ports in the specification.
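The Coder_12k2 failure can be illustrated with a minimal sketch of a name-based dependency check. This is not the SpecC profiler's implementation; the function, the port names, and the alias map are hypothetical and only show why two ports that share an address defeat an analysis that compares port names.

```python
def must_stay_ordered(first, second, alias=None):
    """Return True if `second` must still execute after `first`.

    Each instance is a dict with "reads" and "writes" sets of port names.
    `alias` optionally maps a port name to the port whose storage it
    shares (hypothetical; a purely name-based analysis has no such map).
    """
    alias = alias or {}

    def resolve(names):
        # Replace every aliased name by the port it actually refers to.
        return {alias.get(name, name) for name in names}

    first_r, first_w = resolve(first["reads"]), resolve(first["writes"])
    second_r, second_w = resolve(second["reads"]), resolve(second["writes"])
    # Read-after-write, write-after-read, or write-after-write conflict.
    return bool(first_w & second_r or first_r & second_w or first_w & second_w)


# Hypothetical instances: one writes port1, the other reads port2.
writer = {"reads": set(), "writes": {"port1"}}
reader = {"reads": {"port2"}, "writes": set()}

# Name-based view: no shared names, so the two look parallelizable.
print(must_stay_ordered(writer, reader))                            # False
# If port2 was given the address of port1, they are in fact dependent.
print(must_stay_ordered(writer, reader, alias={"port2": "port1"}))  # True
```

Avoiding address transfer between ports, as recommended above, keeps the name-based view and the actual storage in agreement.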

5 Conclusion

This paper introduces parallelization optimization for specification tuning. Parallelization optimization is a critical optimization of the design specification, which will be used for PE allocation and behavior binding.

We introduce two tools for parallelization optimization. The SpecC profiler analyzes the static and dynamic dependencies among behavior instances. The SpecC optimizer produces the hierarchical parallel structure based on the ASAP algorithm and the constructive algorithm.

In comparison to manual parallelization, automatic parallelization has three advantages. First, it shortens the design time. Automatic parallelization is more than 200 times faster than manual parallelization for 10-instance behaviors and more than 700 times faster for 20-instance behaviors. As the complexity of the design increases, automatic parallelization can save more time. Second, it generates the required hierarchical parallel structures. The ASAP algorithm produces the optimal structures in terms of the length of the critical path. The constructive algorithm produces structures whose critical path length is similar to that of the ASAP algorithm and whose number of added dependencies among behavior instances is much smaller. Third, it optimizes every possible parallelism in the design.

We also find that with an increase in the number of instances of a behavior, or with the loss of a behavior's locality attribute, it is impossible to keep both the length of the critical path and the amount of added dependencies to a minimum for the generated structures. This is due to the nature of the problem rather than a limitation of the tools.

References

[1] D. Gajski, Silicon Compilers, Addison-Wesley, 1987.
[2] D. Gajski, J. Zhu et al., SpecC: Specification Language and Design Methodology, Kluwer Academic Publishers, 2000.
[3] A. Gerstlauer, R. Doemer, et al., System Design: A Practical Guide with SpecC, Kluwer Academic Publishers, 2001.
[4] www.systemc.org
[5] Lukai Cai, Dan Gajski, Introduction of Design-Oriented Profiler of SpecC Language, University of California, Irvine, Technical Report ICS-00-47, June 2001.
[6] Lukai Cai, Junyu Peng et al., Design of a JPEG Encoding System, University of California, Irvine, Technical Report ICS-99-54, Nov. 1999.
[7] Andreas Gerstlauer, Shuqing Zhao et al., Design of a GSM Vocoder using SpecC Methodology, University of California, Irvine, Technical Report ICS-99-11, Feb. 1999.