PARALLEL AND DISTRIBUTED MULTI-ALGORITHM CIRCUIT SIMULATION. A Thesis RUICHENG DAI


PARALLEL AND DISTRIBUTED MULTI-ALGORITHM CIRCUIT SIMULATION

A Thesis by RUICHENG DAI

Submitted to the Office of Graduate Studies of Texas A&M University in partial fulfillment of the requirements for the degree of MASTER OF SCIENCE

August 2012

Major Subject: Computer Engineering

Approved by:
Chair of Committee: Peng Li
Committee Members: Nancy Amato, Jiang Hu
Head of Department: Costas N. Georghiades

ABSTRACT

Parallel and Distributed Multi-Algorithm Circuit Simulation. (August 2012)
Ruicheng Dai, B.S., Zhejiang University
Chair of Advisory Committee: Dr. Peng Li

With the proliferation of parallel computing, parallel computer-aided design (CAD) has received significant research interest. Transient transistor-level circuit simulation plays an important role in digital/analog circuit design and verification. Increased VLSI design complexity has made circuit simulation an ever growing bottleneck, making parallel processing an appealing solution for addressing this challenge. In this thesis, we propose and develop a parallel and distributed multi-algorithm approach to leverage the power of multi-core computer clusters for speeding up transistor-level circuit simulation. The targeted multi-algorithm approach provides a natural paradigm for exploiting parallelism for circuit simulation. Parallel circuit simulation is facilitated through the exploration of algorithm diversity, where multiple simulation algorithms collaboratively work on a single simulation task. To utilize computer clusters comprising multi-core processors, each algorithm is executed on a separate node with sufficient system resources such as processing power, memory and I/O bandwidth. We propose two communication schemes, namely master-slave and peer-to-peer schemes, to allow for inter-algorithm communication. Compared with the shared-memory based multi-algorithm implementation, the proposed simulation approach alleviates the cache/memory contention that results from multi-algorithm execution and provides further runtime speedups.

DEDICATION

To my parents

ACKNOWLEDGEMENTS

First and foremost, I would like to thank my advisor, Dr. Peng Li. Dr. Li has supervised, advised and guided me from the very beginning of this work, and gave me extraordinary experiences throughout the research. His dedication to excellence, encouragement of students, and enthusiasm for research will leave a lasting imprint on me. I would like to thank the other professors as well, who were always willing to hold discussions with me and offer new ideas. Particular thanks go to Dr. Amato and Dr. Hu for their constructive comments on this thesis. Thanks also to my colleagues and the department faculty and staff for making my time at Texas A&M University a great experience. Finally, I am grateful to my family and friends. Thanks to my mother and father for their encouragement and love.

TABLE OF CONTENTS

ABSTRACT
DEDICATION
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
1. INTRODUCTION
    Motivation
    Previous Work and Limitations
    Overview and Organization
2. BACKGROUND
    Transistor-level Circuit Simulation
    Parallel Computing
3. MULTI-ALGORITHM PARALLELISM
    Multi-Algorithm Parallelism
    Simulation Algorithms
        Diversity in Nonlinear Iterative Methods
        Diversity in Numerical Methods
        Algorithm Selection
4. HIERARCHY OF PARALLEL AND DISTRIBUTED CIRCUIT SIMULATION
    Multi-Algorithm Communication Structure
        Master-slave Structure
        Peer-to-Peer Structure
    Multiple Threads in a Single Algorithm
        Parallel Device Evaluation
        Parallel Matrix Solver
5. RESULT AND ANALYSIS
    Supercomputer
    Result
    MPI vs. Sequential Algorithm
    MPI vs. HMAPS
    Comparison Between MPI Methods
    Accuracy
6. CONCLUSIONS
REFERENCES
VITA

LIST OF FIGURES

Figure 1. Transistor-level circuit simulation in digital/ASIC design flow
Figure 2. A simple circuit
Figure 3. Work flow of the transient circuit simulation
Figure 4. A sample for circuit simulation result
Figure 5. Illustration of the multi-algorithm parallelism
Figure 6. Newton-Raphson method
Figure 7. Successive Chord method
Figure 8. Stability region of numerical integration methods
Figure 9. Stability region of Absolute Stability
Figure 10. A computer cluster
Figure 11. Global synchronizer node in Master-Slave structure
Figure 12. Details of an algorithm node in the Master-Slave structure
Figure 13. Flow chart of the algorithm node and global synchronizer in Master-Slave scheme
Figure 14. Peer-to-Peer communication scheme
Figure 15. Flow chart of the algorithm node in peer-to-peer scheme
Figure 16. A snapshot of the supercomputer Hydra
Figure 17. Comparison between master-slave and peer-to-peer communication structure
Figure 18. Simulation results on Node
Figure 19. Simulation results on Node

LIST OF TABLES

Table 1. Comparison between sequential algorithm and MPI methods
Table 2. Resource allocation between HMAPS and MPI methods
Table 3. Comparison between HMAPS and MPI methods

1. INTRODUCTION

1.1 Motivation

As a fundamental technology in computer-aided design, circuit simulation provides insights into electronic circuits by leveraging mathematical models to replicate the behavior of an actual electronic device or circuit [1]. In transistor-level time-domain circuit simulation, DC analysis is used to obtain the quiescent operating point and transient analysis is employed to compute the time-domain response of the circuit. Accurate, fast and robust transistor-level circuit simulation plays a critical part in the design and verification of digital/analog circuits.

In 1965, Gordon E. Moore, the co-founder of Intel, put forward that the number of transistors on integrated circuits would double every two years. This prophecy, also known as Moore's law, became the guidance for the development of integrated circuit technology in later decades. A typical Very Large Scale Integrated (VLSI) circuit may integrate millions of transistors and other components in a few square millimeters on a chip. Simulation of large IC designs, together with inherently high accuracy requirements, places a heavy burden on circuit simulation. For instance, circuit designers may have to spend several days or even weeks on expensive circuit simulation, which greatly reduces design efficiency.

However, with the industry's recent shift to multi- and many-core processor technology, parallel computing is ubiquitous and is changing the landscape of computing and data processing. This change has profound implications for the development of compute-intensive applications. Leveraging the available parallel compute hardware brings new opportunities and challenges to large-scale circuit simulation.

This thesis follows the style of IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.

1.2 Previous Work and Limitations

Parallel circuit simulation is not a new topic. The two key challenges of applying parallelism to the CAD area are parallel algorithm development and parallel program implementation. Prior work attempted to realize more parallelism from several different perspectives. Parallel device evaluation and matrix solve [2][3] are the most direct methods. Device evaluation and matrix solve are the most time-consuming parts of simulation and dominate the total simulation time. It is straightforward to leverage more threads/CPUs in these two parts to gain large parallelism. However, the speedup is not linear, due to the characteristics of the circuit and of multi-core computers. Creating, terminating and synchronizing threads also adds overhead to the system.

There have also been attempts to realize parallel capabilities in a single simulation algorithm. The waveform pipelining approach [4] simultaneously computes circuit solutions at multiple adjacent time points in a way resembling hardware pipelining. Circuit decomposition can divide a large circuit into several small subcircuits which can be solved in parallel. However, decomposition-based circuit simulation algorithms like the multilevel Newton algorithm [5] and the waveform relaxation algorithm [6] have issues in terms of convergence. In addition, these two methods exploit fine-grained parallelism and hence require large programming effort. The multi-algorithm parallel approach [7] exploits inter-algorithm parallelism by running several simulation algorithms on a shared-memory multi-core machine simultaneously.

However, most of these works are carried out on multi-core shared-memory machines. While the methods gain benefits from these platforms, like low on-chip communication overhead, they also pay a price for the drawbacks. For instance, the memory on a multi-core machine is shared by all processes/threads, and the number of CPUs on one computer is limited by the manufacturing process and power consumption. Hence, memory contention is inevitable, as is severe thread contention when the number of threads is greater than the number of CPUs, and system performance suffers noticeable degradation. Computer clusters offer a promising computing solution to address ever more complex, computationally intensive simulation problems with sufficient computing resources and high memory bandwidth.

1.3 Overview and Organization

In this thesis, we propose a distributed and parallel multi-algorithm circuit simulation where multiple simulation algorithms are mapped onto separate nodes in a supercomputer and work on the same simulation task, with effective communication schemes to realize on-the-fly synchronization and exploration of algorithm diversity. With sufficient computing resources utilized for parallel device evaluations and parallel matrix solvers in each algorithm, simulation runtime is further reduced. As a coarse-grained parallel approach, the proposed distributed circuit simulation requires less programming effort and is applicable to an increasing number of simulation algorithms.

This thesis is organized as follows. In Chapter 2, we introduce the background for time-domain circuit simulation and parallel computing. The principle of multi-algorithm circuit simulation, as well as the diversity of numerical integration methods and nonlinear iterative methods, is discussed in Chapter 3. In Chapter 4, we present the details of the MPI-based parallel and distributed circuit simulation. In Chapter 5, the platform on which the experiments are carried out and the experimental results are given. Finally, conclusions are drawn in Chapter 6.

2. BACKGROUND

2.1 Transistor-level Circuit Simulation

Transistor-level time-domain circuit simulation, a computer-aided design tool, greatly improves design efficiency and reduces the labor intensity in digital/ASIC VLSI circuit design. Figure 1 is a flow chart of digital/ASIC circuit design. First, system specifications and requirements need to be completed. A graph editor or text editor is used to describe the circuit's structure and behavior. After the behavioral description, synthesis realizes the automatic conversion from high-level abstraction to low-level description, where RTL code is translated to a gate-level circuit. Physical design, including floorplanning, placement and routing, is then carried out to generate the layout of the design. At last, the manufacturing process fabricates designs onto silicon dies which are packaged into ICs [1].

Transistor-level circuit simulation can be performed at the circuit design level based on the pre-layout schematic. It may also be performed after the post-layout circuit netlists are extracted. It is not surprising that simulation plays a vital part in predicting circuit performance and rejecting a failing design. Transistor-level circuit simulation also plays an important role in the design of analog and RF circuits.

Figure 1. Transistor-level circuit simulation in digital/ASIC design flow.

In transistor-level circuit simulation, the circuit analysis problem is formulated according to circuit structure, device parameters and analysis requirements. KVL (Kirchhoff's voltage law) and KCL (Kirchhoff's current law) are the two basic principles in simulation. Hence, an electronic circuit can be described as a differential-algebraic equation,

$$\frac{d}{dt}q(x) + f(x) = u(t) \tag{2.1}$$

where $u(t)$ is the input vector and $x(t)$ is the vector of nodal voltages and branch currents. $q(x)$ and $f(x)$, corresponding to dynamic elements and static elements, are nonlinear functions. In equation (2.1), the nonlinear functions $q(x)$ and $f(x)$ arise because the transistors in CMOS technology are nonlinear elements with complex nonlinear characteristics. The differential operation represents the behavior of energy storage components like capacitors and inductors, which lag in following the changes of input sources. For instance, the simple circuit in Figure 2 can be described by equation (2.2), a linear system in the node voltages $V_1$ and $V_2$ whose coefficients are formed from the resistances $R_1$ to $R_4$ and whose right-hand side contains the source $E$.
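As a small worked illustration of how KCL produces such nodal equations, consider a hypothetical voltage divider (not the circuit of Figure 2): a source $E$ drives node 1 through $R_1$, and $R_2$ connects node 1 to ground. The symbols reuse the document's naming, but the topology is an assumption made only for this sketch. Summing the currents leaving node 1 gives:

```latex
\frac{V_1 - E}{R_1} + \frac{V_1}{R_2} = 0
\quad\Longrightarrow\quad
\left(\frac{1}{R_1} + \frac{1}{R_2}\right) V_1 = \frac{E}{R_1}
\quad\Longrightarrow\quad
V_1 = \frac{R_2}{R_1 + R_2}\,E .
```

With more nodes, the same procedure yields one such equation per node, which is why the result takes the form of a conductance matrix multiplying the vector of node voltages.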

Figure 2. A simple circuit.

To solve equation (2.1), DC analysis is used to obtain an initial operating point. In DC analysis, all the dynamic circuit elements are removed and a nonlinear iterative method is applied until the solution converges, typically within several iterations. Then a numerical integration method is applied to calculate the transient solutions. At each time point, transient analysis similarly needs the nonlinear iterative method to obtain a converged solution. In other words, by adopting a numerical integration formula, the time-domain transient response of the circuit is obtained by solving a sequence of equivalent nonlinear DC problems sequentially at all time points [8]. The flow chart of the circuit simulation is shown in Figure 3.

In transistor-level circuit simulation, device evaluation and matrix solve are the two most time-consuming parts. At each iteration in a single time point, device evaluation is performed to obtain equivalent mathematical models of circuit components. The evaluation requires numerous computations, especially for nonlinear components such as diodes, transistors, nonlinear resistances and nonlinear capacitances, which have a large number of device model derivatives. For instance, a diode's voltage and current are related by

$$I_D = I_S\left(e^{V_D/V_T} - 1\right) \tag{2.3}$$

Here, $I_S$ is the reverse-bias saturation current and $V_T$ is the thermal voltage. The device model has an important position in the whole procedure of circuit analysis because the accuracy of simulation results depends significantly on the precision of the model.

Matrix solve is then applied to obtain the solution for that specific iteration. We LU-decompose the matrix to solve the equations. When the coefficient matrix is a sparse matrix, the time complexity of solving the equations will be approximately O() [9]; here n is the number of nodes in the circuit.
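To make the idea of device evaluation concrete, the following minimal Python sketch evaluates the diode model of equation (2.3) and its derivative (the small-signal conductance that a nonlinear solver would place in the Jacobian). The function name `eval_diode` and the parameter values for $I_S$ and $V_T$ are illustrative assumptions, not values taken from the thesis.

```python
import math


def eval_diode(v_d, i_s=1e-14, v_t=0.02585):
    """Evaluate I_D = I_S * (exp(V_D / V_T) - 1) and its derivative.

    i_s and v_t are hypothetical example values. Returns a tuple
    (current, conductance), where conductance = dI_D/dV_D.
    """
    exp_term = math.exp(v_d / v_t)
    current = i_s * (exp_term - 1.0)
    conductance = i_s * exp_term / v_t
    return current, conductance


if __name__ == "__main__":
    i_d, g_d = eval_diode(0.6)
    print(f"I_D = {i_d:.3e} A, g_D = {g_d:.3e} S")
```

Even this simple device needs an exponential and a derivative per evaluation; transistor models are far more involved, which is consistent with device evaluation being one of the dominant costs.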

Figure 3. Work flow of the transient circuit simulation.

2.2 Parallel Computing

From the perspective of computer architecture, a symmetric multiprocessor (SMP) machine is a system with two or more homogeneous processors on one chip, sharing the memory subsystem and bus structure. Although multiple CPUs are running at the same time, they perform as a single machine. The system distributes the tasks in a queue symmetrically over multiple CPUs, thus greatly improving the data processing ability of the whole system.

Computer clusters emerged as a result of the development of low-cost microprocessors and high-speed networks. Many independent computer nodes are connected to each other in the cluster through fast local area networks. One computer node can be a single-processor or a multiple-processor system, which has memory, I/O devices and an operating system. The system can provide a fast and reliable service solution, which can hardly be obtained even through a very expensive shared-memory system.

For these parallel platforms, Pthreads and MPI are the two most popular parallel programming APIs. POSIX threads [10], commonly known as Pthreads, specifies a set of interfaces (functions, header files) for threaded programming where a single process can create multiple threads. Every thread can be assigned a different kind of work and run independently. These threads share data and heap segments, but each thread has its own stack to store automatic variables.

MPI (Message Passing Interface), released in May 1994, is a standard for message-passing function libraries [11]. It absorbs benefits from many existing message-passing function libraries and has become one of the most popular parallel programming environments, especially for distributed-memory computers and network-based workstations. MPI has many advantages in providing the necessary conditions for the development of the parallel software industry:

- portable and flexible
- complete asynchronous communication functions
- formal, detailed and precise definition

In the MPI-based programming model, a fixed set of processes is created at the initialization of the program. Processes receive and send messages by calling library functions. These processes can execute the same or different code paths, correspondingly called single program multiple data (SPMD) or multiple program multiple data (MPMD). Communications between the processes can be point-to-point or collective.

3. MULTI-ALGORITHM PARALLELISM

3.1 Multi-Algorithm Parallelism

From the foregoing discussion, the transient circuit simulation problem can be formulated as equation (3.1).

$$\frac{d}{dt}q(x(t)) + f(x(t)) = u(t) \tag{3.1}$$

In a circuit simulation algorithm, one nonlinear iterative method is utilized to linearize the nonlinear functions and one numerical integration method replaces the differential operation with a difference operation. Newton-Raphson and Successive Chord are typical nonlinear iterative methods, while Backward Euler, Gear2 and DASSL are classic numerical integration methods. A variety of simulation algorithms is then generated by combining these two kinds of methods. SPICE (Simulation Program with Integrated Circuit Emphasis) [12] takes Newton-Raphson and Backward Euler as its basic circuit simulation algorithm. It is a general-purpose, open-source electronic circuit simulator for integrated circuit and board-level design. Compared to the Newton-Raphson and Backward Euler algorithm, Successive Chord has a much lower cost per iteration and can be faster. While the algorithm pool provides great diversity, it also brings complexity in choosing an optimal algorithm for a specific circuit, because the algorithms behave quite differently for different kinds of circuits, and even in different stages of the same circuit during the whole simulation time.
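As a worked illustration of how an integration method and a nonlinear method fit together, the sketch below applies Backward Euler with a fixed step $h$ to equation (3.1), taken here in the form $\frac{d}{dt}q(x) + f(x) = u(t)$. The fixed step and the subscript $n$ for the time index are assumptions made for this example.

```latex
\frac{q(x_{n+1}) - q(x_n)}{h} + f(x_{n+1}) = u(t_{n+1})
\quad\Longrightarrow\quad
F(x_{n+1}) \equiv \frac{q(x_{n+1}) - q(x_n)}{h} + f(x_{n+1}) - u(t_{n+1}) = 0 .
```

The differential equation has become a nonlinear algebraic equation in the unknown $x_{n+1}$, which the chosen nonlinear iterative method then solves at every time point. Swapping either ingredient gives a different member of the algorithm pool.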

Figure 4. A sample for circuit simulation result.

Figure 4 shows simulation results obtained by using the SC algorithm and the Newton + BE algorithm for an inverter chain circuit. During the simulation, we find that the SC algorithm prints out results much faster on parts A and C but slower on part B. From the figure, we can see that the waveform remains stable during parts A and C. Here the SC algorithm's advantage shows: it converges very quickly and the cost of each iteration is very small because it uses a constant Jacobian matrix. In part B, the waveform changes significantly and the SC algorithm needs a large number of iterations to converge to the final solution at every time step. Although the cost of each iteration is still small, the time spent on one time step increases significantly. When the waveform gets steeper, SC will probably diverge. Inspired by this observation, we know that a near-optimal result is obtained if the benefit of the SC algorithm on parts A and C is exploited together with the benefit of the Newton + BE algorithm on part B.

Consequently, we refer to the multi-algorithm approach in [7] and propose a new approach that builds on a distributed-memory platform to run multiple simulation algorithms on multiple computer nodes in parallel to exploit the diversity of these algorithms. To illustrate, we assume two algorithms are initiated on the same circuit simulation. In Figure 5, part A corresponds to the first time period while part B is the second period. In the first period, algorithm SC is the fastest for the reason discussed above, and it can pass its results to algorithm BE + Newton at the end of the first period. With this faster solution, algorithm BE + Newton can skip its slow part and begin its next period calculation. In part B, algorithm BE + Newton turns out to be faster and it shares the solution with algorithm SC. In this way, when we adopt more algorithms, we pick out the best-performing algorithm for every small period along the whole simulation, so that the benefits of all algorithms are exploited and the simulation speed approaches the best achievable with the algorithm pool.

Figure 5. Illustration of the multi-algorithm parallelism.
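The benefit described above can be sketched numerically. The following Python example assumes made-up per-interval runtimes for two algorithms over parts A, B and C and ignores communication overhead; the function name `multi_algorithm_runtime` and all numbers are hypothetical and serve only to illustrate the "fastest algorithm wins each interval" idea.

```python
def multi_algorithm_runtime(costs):
    """Return (winners, total) for a multi-algorithm run.

    costs[name][i] is the hypothetical runtime of algorithm `name` on
    interval i. At each synchronization interval every algorithm continues
    from the fastest result, so the elapsed time per interval is the
    minimum over algorithms. Communication overhead is ignored.
    """
    names = list(costs)
    n_intervals = len(costs[names[0]])
    winners = []
    total = 0.0
    for i in range(n_intervals):
        best = min(names, key=lambda name: costs[name][i])
        winners.append(best)
        total += costs[best][i]
    return winners, total


if __name__ == "__main__":
    # Made-up runtimes (seconds) for parts A, B and C.
    costs = {
        "SC": [1.0, 9.0, 1.0],
        "BE+Newton": [4.0, 3.0, 4.0],
    }
    winners, total = multi_algorithm_runtime(costs)
    print(winners, total)
    print({name: sum(c) for name, c in costs.items()})
```

With these example numbers each algorithm alone needs 11 seconds, while the combined run needs 5, because each part is finished by whichever algorithm suits it.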

Concerning the communication granularity, if we set the interval to the whole simulation time, the system merely picks out the fastest simulation algorithm for the simulation task, and the diversity is not fully exploited. However, if we choose a small interval, the communication will be frequent and will slow the calculation as mutual memory access conflicts increase. Hence, there is a tradeoff between efficiency and communication frequency. In the implementation, we need to choose a reasonable granularity and make the information sharing among all the algorithms efficient. This will be discussed in Chapter 4.

3.2 Simulation Algorithms

In this section, we discuss the advantages and disadvantages of different nonlinear iterative methods and numerical integration methods as well as their roles in simulation algorithm selection.

3.2.1 Diversity in Nonlinear Iterative Methods

At a single time point, equation (3.1) can be represented as equation (3.2).

$$F(x) = 0 \tag{3.2}$$

A. Newton-Raphson

Newton-Raphson is an effective method for solving nonlinear equations [12]. The solution at iteration $k+1$ is determined by equation (3.3),

$$J(x_k)\,(x_{k+1} - x_k) = -F(x_k) \tag{3.3}$$

where $J(x_k)$ is called the Jacobian matrix,

$$J(x_k) = \left.\begin{bmatrix}
\dfrac{\partial F_1}{\partial x_1} & \dfrac{\partial F_1}{\partial x_2} & \cdots & \dfrac{\partial F_1}{\partial x_n} \\
\dfrac{\partial F_2}{\partial x_1} & \dfrac{\partial F_2}{\partial x_2} & \cdots & \dfrac{\partial F_2}{\partial x_n} \\
\vdots & \vdots & & \vdots \\
\dfrac{\partial F_n}{\partial x_1} & \dfrac{\partial F_n}{\partial x_2} & \cdots & \dfrac{\partial F_n}{\partial x_n}
\end{bmatrix}\right|_{x = x_k} \tag{3.4}$$

Assuming the solution at iteration $k$ is known, the Jacobian matrix and $F(x_k)$ can be calculated by device evaluation, and the solution at iteration $k+1$ is obtained by solving equation (3.3). If the difference between the solutions at iterations $k+1$ and $k$ is smaller than a given threshold, it is accepted as the converged solution. If not, we need to proceed to the next iteration. For instance, $r_1$ is the root of the equation $f(x) = 0$ in Figure 6. The initial solution is assumed at point $P_0(x_0, y_0)$, and $x_1$ is obtained by using tangent line 1, which corresponds to equation (3.3). However, $y_1$ is larger than expected. The next solution $x_2$ is calculated based on point $P_1$ similarly.
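The iteration can be sketched in a few lines of Python for the scalar case, taking equation (3.3) in the form $J(x_k)(x_{k+1} - x_k) = -F(x_k)$. The test problem below, a voltage source in series with a resistor and a diode, and all of its parameter values (`E`, `R`, `I_S`, `V_T`, the initial guess of 0.6 V) are hypothetical choices for illustration, not a circuit from the thesis.

```python
import math


def newton_raphson(F, J, x0, tol=1e-12, max_iter=100):
    """Solve the scalar equation F(x) = 0 by Newton-Raphson.

    Each step solves J(x_k) * (x_{k+1} - x_k) = -F(x_k) and stops when
    successive iterates differ by less than `tol`.
    Returns (solution, number_of_iterations).
    """
    x = x0
    for k in range(1, max_iter + 1):
        dx = -F(x) / J(x)
        x += dx
        if abs(dx) < tol:
            return x, k
    raise RuntimeError("Newton-Raphson did not converge")


# Hypothetical example: source E feeding a diode through resistor R.
E, R, I_S, V_T = 1.0, 1000.0, 1e-14, 0.02585


def kcl_residual(v):
    """KCL at the diode node: resistor current plus diode current."""
    return (v - E) / R + I_S * (math.exp(v / V_T) - 1.0)


def kcl_jacobian(v):
    """Derivative of the residual; requires re-evaluating the device."""
    return 1.0 / R + I_S * math.exp(v / V_T) / V_T


if __name__ == "__main__":
    v, iterations = newton_raphson(kcl_residual, kcl_jacobian, x0=0.6)
    print(f"diode voltage = {v:.6f} V after {iterations} iterations")
```

Note that `kcl_jacobian` is called in every iteration, which mirrors the per-iteration device evaluation and Jacobian update that Newton-Raphson requires.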

Figure 6. Newton-Raphson method.

When $x_k$ is close to the exact solution $x^*$, it can be proved that [12]

$$\left\| x_{k+1} - x^* \right\| \le C \left\| x_k - x^* \right\|^2 \tag{3.5}$$

Here $C$ is a constant. Hence, Newton's method has a quadratic convergence rate.

When Newton's method is applied in circuit simulation, its Jacobian matrix needs to be recalculated by evaluating all the devices, and decomposed, in each iteration. This involves a large number of expensive derivative computations. Although Newton's method is robust with its quadratic convergence rate, the cost of each iteration is high and the simulation time for one step is large.

B. Successive Chord method

Another nonlinear iterative method is the Successive Chord method (SC) [13]. It can be represented as

$$J_{sc}\,(x_{k+1} - x_k) = -F(x_k) \tag{3.6}$$

where the Jacobian matrix $J_{sc}$ is constant. In Figure 7, we get $x_1$ by using tangent line 1, which corresponds to equation (3.6). The final solution $x_2$ is obtained in the next iteration based on point $P_1$. The obvious difference from Newton-Raphson is that the lines are parallel.

Figure 7. Successive Chord method.

Compared to Newton-Raphson, the SC method's advantage is that it uses a constant Jacobian matrix $J_{sc}$ in simulation. The Jacobian matrix is constructed and decomposed at the beginning, and the lower-upper triangular (LU) factors are stored for efficient reuse. So the method does not need to calculate the derivatives of device equations during the whole simulation. Consequently, the cost of each iteration in the SC method is very small. However, the convergence rate of the SC method is linear, which means that for every time step the method probably needs more iterations. The strict convergence criterion for the SC method is

$$\left\| I - J_{sc}^{-1} J_F(v) \right\| < 1 \tag{3.7}$$

Here, $I$ is the identity matrix, $J_{sc}$ is the chord value and $J_F(v)$ is the exact Jacobian matrix. Consequently, the $J_{sc}$ matrix should be selected wisely; otherwise the method will probably diverge. According to our research, the SC method converges with difficulty for analog circuits, which have greater changes compared to combinational circuits.

3.2.2 Diversity in Numerical Methods

In transient analysis, equation (3.1) may be represented as a first-order differential equation:

$$\dot{x} = f(x, t), \qquad t_0 \le t \le T \tag{3.8}$$

with initial condition $x(t_0) = x_0$. Here, $\dot{x}$ is the derivative of $x$ and $t$ is the time variable. The initial solution $x(t_0) = x_0$ is solved by DC analysis. In order to solve the differential-algebraic equations, we first discretize $[t_0, T]$ into several distinct time points $(t_0, t_1, t_2, \ldots, t_m = T)$. Then we use a difference equation to replace the differential equation to get the approximate values $(x_0, x_1, x_2, \ldots, x_m)$ at these points. For the solution at $t_{n+1}$, the number of previous solutions $(x_n, x_{n-1}, \ldots)$ used is determined by the numerical method; the methods can be classified into one-step and multi-step methods.

A. One-step methods

Backward Euler is a one-step method [12] with

$$x_{n+1} = x_n + h\,\dot{x}_{n+1} \tag{3.9}$$

The magnitude of the local truncation error (LTE) is

$$\left| LTE_{BE} \right| = \frac{h^2}{2} \left| \ddot{x}(\xi) \right| \tag{3.10}$$

where $h = t_{n+1} - t_n$. In circuit simulation, a fixed step-size method is adopted if $h$ is fixed at a reasonable value. There is also a variable step-size method for Backward Euler. After an acceptable value $\varepsilon$ is chosen as the bound for the local truncation error, the variable $h$ is calculated as

$$h = \sqrt{\frac{2\varepsilon}{\left| \ddot{x}(\xi) \right|}} \tag{3.11}$$

Here, $\ddot{x}(\xi)$ is the second-order derivative. $x_{n+1}$ is calculated by equation (3.9). If the local truncation error at $t_{n+1}$ is smaller than $\varepsilon$, the solution is acceptable. Otherwise, it is abandoned and the solution is recomputed with a smaller $h$ until it satisfies the error tolerance. The variable step-size method enhances the Backward Euler method with a larger time step. Forward Euler is also a one-step method with

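$$x_{n+1} = x_n + h\,\dot{x}_n \tag{3.12}$$

It does not include $\dot{x}_{n+1}$, so the calculation is explicit and simple. The solution at any time point can be obtained from the previous solution alone, which contributes to its fast speed as well as its low robustness. Another one-step method is the Trapezoidal method [14]. The formula is

$$x_{n+1} = x_n + \frac{h}{2}\left(\dot{x}_{n+1} + \dot{x}_n\right) \tag{3.13}$$

with local truncation error magnitude

$$\left| LTE_{TR} \right| = \frac{h^3}{12} \left| \dddot{x}(\xi) \right| \tag{3.14}$$

It has a smaller local truncation error and allows a larger step size.

B. Multi-step methods

Multi-step methods employ the solutions $(x_n, x_{n-1}, \ldots, x_{n+1-p})$ at points $(t_n, t_{n-1}, \ldots, t_{n+1-p})$ in numerical integration:

$$x_{n+1} = \sum_{i=1}^{p} \alpha_i\, x_{n+1-i} + h \sum_{i=0}^{p} \beta_i\, \dot{x}_{n+1-i} \tag{3.15}$$

where $p$ is the order of the integration method. The Gear2 method [15] uses the following formula to get the solution at $t_{n+1}$:

$$x_{n+1} = \frac{(h_{n+1} + h_n)^2}{h_n (2h_{n+1} + h_n)}\, x_n - \frac{h_{n+1}^2}{h_n (2h_{n+1} + h_n)}\, x_{n-1} + \frac{h_{n+1}(h_{n+1} + h_n)}{2h_{n+1} + h_n}\, \dot{x}_{n+1} \tag{3.16}$$

Here, $h_{n+1} = t_{n+1} - t_n$ and $h_n = t_n - t_{n-1}$. The local truncation error is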

$$\left| LTE_{Gear2} \right| = \frac{h_{n+1}^2 (h_{n+1} + h_n)^2}{6\,(2h_{n+1} + h_n)} \left| \dddot{x}(\xi) \right| \tag{3.17}$$

Here $t_n \le \xi \le t_{n+1}$. Compared to Backward Euler, Gear2 has a more complicated integration formula and is much faster, with a smaller LTE and a larger time step size.

DASSL [16], a variable-order variable-stepsize method, uses a predictor and a corrector to solve the differential equation. The predictor polynomial $\omega^P_{n+1}$ for a $k$-th order formula is generated by interpolating the last $k+1$ solutions,

$$\omega^P_{n+1}(t_{n-i}) = x_{n-i}, \qquad i = 0, 1, \ldots, k. \tag{3.18}$$

Hence, the solution at time $t_{n+1}$ can be predicted by using the predictor function,

$$x^{(0)}_{n+1} = \omega^P_{n+1}(t_{n+1}), \qquad \dot{x}^{(0)}_{n+1} = \dot{\omega}^P_{n+1}(t_{n+1}) \tag{3.19}$$

The corrector polynomial $\omega^C_{n+1}$ is an interpolation of the predictor at the last $k$ time points and can be solved by equation (3.20),

$$\alpha_s \left( \omega^C_{n+1}(t_{n+1}) - x^{(0)}_{n+1} \right) + h_{n+1} \left( \dot{\omega}^C_{n+1}(t_{n+1}) - \dot{x}^{(0)}_{n+1} \right) = 0 \tag{3.20}$$

where $\alpha_s = -\sum_{j=1}^{k} \frac{1}{j}$ and $h_{n+1}$ is the predicted step size for $t_{n+1}$. After the corrector $\omega^C_{n+1}$ at $t_{n+1}$ is obtained, the circuit solution is solved by equation (3.21), with the LTE applied to determine whether $x_{n+1}$ is accepted or not.

$$F\left( t_{n+1},\ \omega^C_{n+1}(t_{n+1}),\ \dot{\omega}^C_{n+1}(t_{n+1}) \right) = 0 \tag{3.21}$$

DASSL uses the LTE to control the step size and the integration order dynamically. Before calculating $x_{n+1}$, DASSL utilizes the existing step size and the order $k$ to estimate the LTE at $t_{n+1}$. With the estimated LTE, DASSL determines the order $k$ for the next time step. After $x_{n+1}$ is solved with the above equations, $k$ is used to solve the next time point solution or to recompute $x_{n+1}$, based on whether $x_{n+1}$ is accepted or not. DASSL has a very complex control scheme to maintain stability and can achieve significant speedup.

3.2.3 Algorithm Selection

For the nonlinear iterative methods, we use the Newton-Raphson and Successive Chord methods. In the numerical methods, the values $(x_0, x_1, x_2, \ldots, x_m)$ obtained at $(t_0, t_1, t_2, \ldots, t_m = T)$ are approximations to the exact values. The errors are introduced in two ways. First, local truncation error is introduced because, at time $t_{n+1}$, we abandon the high-order differential terms. Second, we get the solution at time $t_{n+1}$ from the previous solutions $(x_n, x_{n-1}, \ldots)$, which we assume are exact values. However, these solutions are themselves approximations because of the LTE. Hence, the errors may accumulate. If the influence of previous errors on later time points does not increase with time, the method is stable. If the errors accumulate and exceed the error limit, the method is not stable. In order to clarify this, we introduce a test equation,

$$\dot{x} = -\lambda x \tag{3.22}$$

If we apply Forward Euler to the test equation, we will get

$$x_{n+1} = x_n + \dot{x}_n h = x_n (1 - \lambda h) = x_0 (1 - \lambda h)^{n+1} \tag{3.23}$$

When the error at the initial solution is assumed to be $\varepsilon_0$, the error at time $t_{n+1}$ is

$$\varepsilon_{n+1} = \varepsilon_0 (1 - \lambda h)^{n+1} \tag{3.24}$$

where $\lambda > 0$ and real. Consequently, when $|1 - \lambda h| < 1$, or $0 < \lambda h < 2$, $\varepsilon_{n+1}$ is bounded and the method is stable. If we represent $|1 - \lambda h| < 1$ in the complex plane of $\lambda h$, it will be like Figure 8(a). The shaded part is called the stability region.

A stability concept, called Absolute Stability, specifies that a method is absolutely stable if the region of absolute stability covers the entire left plane as in Figure 9. According to this concept, Forward Euler is not absolutely stable (it is stable only under the step-size condition above), while the Backward Euler, Trapezoidal and fixed step-size Gear2 methods in Figure 8(b)(c)(d) are unconditionally stable.

Stability and local truncation error are the two major considerations in selecting numerical integration methods. BE is robust and easy to implement, with a large local truncation error and a small time step size. Fixed step-size Gear2 has a much smaller local truncation error and a larger time step size. However, Gear2 is much more complex to implement and brings a larger computation cost at every time point. The stability of the DASSL method is more difficult to analyze. According to the experiments, DASSL is stable in most cases, as in Figure 9, and potentially leads to the largest time step size.

In practice, the performance of a particular algorithm is determined by the circuit type and input signal. It is difficult to tell which one is optimal before executing it once. In the system, we choose the Newton-Raphson method (Newton) as a solid base for the system, and the Successive Chord method (SC), Gear2 + Newton and DASSL + Newton as aggressive algorithms to speed up the whole system.

Figure 8. Stability region of numerical integration methods.

Figure 9. Stability region of Absolute Stability.
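The stability discussion in this section can be checked numerically with a short Python sketch. It assumes the test equation has the form $\dot{x} = -\lambda x$ with $\lambda > 0$ and a fixed step $h$; the function name `integrate_test_equation` and the chosen values of $\lambda$, $h$ and the step count are illustrative assumptions.

```python
def integrate_test_equation(lam, h, steps, method):
    """Integrate dx/dt = -lam * x from x(0) = 1 with a fixed step h.

    method is "forward_euler" or "backward_euler". Returns x after
    `steps` steps; the exact solution decays toward zero.
    """
    x = 1.0
    for _ in range(steps):
        if method == "forward_euler":
            # x_{n+1} = x_n + h * (-lam * x_n)
            x = x * (1.0 - lam * h)
        elif method == "backward_euler":
            # x_{n+1} = x_n + h * (-lam * x_{n+1}), solved for x_{n+1}
            x = x / (1.0 + lam * h)
        else:
            raise ValueError(f"unknown method: {method}")
    return x


if __name__ == "__main__":
    for h in (0.5, 2.5):
        fe = integrate_test_equation(1.0, h, 20, "forward_euler")
        be = integrate_test_equation(1.0, h, 20, "backward_euler")
        print(f"lam*h = {h}: forward Euler {fe:.3e}, backward Euler {be:.3e}")
```

With $\lambda h = 0.5$ both methods decay, while with $\lambda h = 2.5$ (outside $0 < \lambda h < 2$) the Forward Euler result grows in magnitude and Backward Euler still decays, matching the stability condition derived above.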

4. HIERARCHY OF PARALLEL AND DISTRIBUTED CIRCUIT SIMULATION

The hierarchy of parallel and distributed circuit simulation, built on the computer cluster in Figure 10, adopts two levels of parallelism: inter-algorithm parallelism and intra-algorithm parallelism. At the higher level of parallelism, multiple simulation algorithms are performed in parallel on separate computer nodes, with MPI methods transferring data between them to exploit the algorithm diversity. The cloud in Figure 10 represents the communication structures between nodes. Two MPI communication structures are proposed, namely the master-slave structure and the peer-to-peer structure, with different characteristics corresponding to the type and size of the circuit. At the lower level of parallelism, each algorithm has full control of all resources of its node, like CPUs, memory bandwidth and I/O, which allows it to reach high intra-algorithm parallelism.

Figure 10. A computer cluster.

Multi-Algorithm Communication Structure

Master-slave Structure

In the master-slave structure, a flexible global synchronizer is used. Each algorithm communicates with the global synchronizer rather than talking to the other algorithms directly during the simulation. The synchronizer broadcasts the new solutions to all the algorithms. The communication between the synchronizer and the algorithm nodes is shown in Figure 11.

Figure 11. Global synchronizer node in the Master-Slave structure.

To give a clear view of the hierarchy, we discuss the main roles played by the algorithm node side and the global synchronizer side. An algorithm node is required to send the information for all circuit nodes, including voltages or currents, to the other algorithms to bring them up to the time point it has reached. In addition, some algorithms such as Gear2 and DASSL need not only the information at the most recent time point but also the solutions of several previous time steps to calculate the new result. Hence, every algorithm sends the results of k time steps to the global synchronizer. Here, k is determined by the highest order among the numerical integration methods in the system. From the foregoing discussion, Newton-Raphson needs the solution of one previous time step, Gear2 needs the solutions of two previous time steps, and DASSL needs the solutions of five previous time steps. We set k to 6 to account for the new solution as well.

In addition, an algorithm node fully controls the granularity of its communication with the global synchronizer. In this implementation, we choose a granularity of one time step for all the algorithms. Hence, the algorithm node signals a communication thread to transfer the solution after it finishes the computation of one time step. A separate thread takes over the interaction task in order to decouple communication from computation. Although the algorithm node can use the nonblocking MPI send method to transfer its own solutions, the MPI broadcast method used to receive the most recent solutions back is blocking. Figure 12 shows a compute node with 4 cores onto which BE + Newton is mapped.
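The choice of k and the hand-off from the computation thread to a separate communication thread can be illustrated with a minimal Python sketch. It is not the thesis implementation: the names `HISTORY_DEPTH`, `comm_worker` and `simulate` are hypothetical, a `queue.Queue` stands in for the MPI send to the synchronizer, and the "solution" is a placeholder number.

```python
import queue
import threading
from collections import deque

# Previous time-step solutions needed by each method, as stated in the text.
HISTORY_DEPTH = {"Newton-Raphson": 1, "Gear2": 2, "DASSL": 5}
# One extra slot holds the newly computed solution, giving k = 6.
K = max(HISTORY_DEPTH.values()) + 1


def comm_worker(outbox, sent):
    """Communication thread: forwards each history window as it arrives."""
    while True:
        window = outbox.get()
        if window is None:      # sentinel: the simulation has finished
            break
        sent.append(window)     # stand-in for an MPI send to the synchronizer


def simulate(num_steps):
    """Computation thread: one hand-off per time step (granularity of one step)."""
    outbox, sent = queue.Queue(), []
    worker = threading.Thread(target=comm_worker, args=(outbox, sent))
    worker.start()
    history = deque(maxlen=K)   # keeps only the K most recent solutions
    for step in range(1, num_steps + 1):
        history.append((step, 0.1 * step))  # placeholder (time index, value)
        outbox.put(list(history))           # returns at once; computing continues
    outbox.put(None)
    worker.join()
    return sent
```

Because `put` returns immediately, the loop that stands in for the solver never waits on the transfer, which is the decoupling the separate thread is meant to provide.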

Figure 12. Details of an algorithm node in the Master-Slave structure.

Because the communication load on the global synchronizer is very large, the synchronizer is mapped to a single node to avoid memory contention. During simulation, it monitors all algorithm nodes. As soon as an algorithm node sends a new solution, the synchronizer makes the connection and receives the solution. Throughout the simulation, the synchronizer maintains a data structure holding the most recent solutions in the system. The data structure contains the solutions of k time steps. After the synchronizer receives a new message, the message is merge-sorted with the stored data, the first k solutions are kept, and the data structure is updated. If the new solution provided by an algorithm is ahead of the existing solutions, the merge sort inserts it into the data structure.

However, if the new solution is stale and lags behind the existing solutions, it is discarded and the solution structure stays unchanged. After the global synchronizer processes a message and is updated, it broadcasts the new solutions to all algorithm nodes. Hence, all algorithms are updated with the latest solutions and begin their next step calculation. In this way, the global synchronizer always keeps the most recent solutions, and the algorithm nodes interact with each other indirectly. The detailed work flow of the system is shown in Figure 13.

In the master-slave structure, all algorithms are synchronized continuously. Slow execution by any one algorithm is sidestepped by the others, and the advantages of each are fully exploited. However, the global synchronizer needs to process and transfer a large amount of data, since several nodes continuously send messages to it. Consequently, the synchronizer may easily become the bottleneck of the system and reduce its efficiency.
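The synchronizer's update rule described above (merge the incoming message with the stored solutions, keep the k most recent, discard a stale message) might look as follows. This is a sketch under stated assumptions, not the thesis code: solutions are modeled as `(time, values)` pairs sorted newest-first, "stale" is taken to mean that the newest time point in the message does not advance past the newest stored one, and the function name `merge_solutions` is hypothetical.

```python
import heapq


def merge_solutions(stored, incoming, k):
    """Merge two newest-first lists of (time, values); keep the k newest.

    Returns (new_store, updated). A message that does not advance past the
    newest stored time point is treated as stale and discarded.
    """
    if not incoming:
        return stored, False
    if stored and incoming[0][0] <= stored[0][0]:
        return stored, False            # stale: the store stays unchanged
    merged, seen = [], set()
    # heapq.merge performs the merge step of merge sort on sorted inputs.
    for t, values in heapq.merge(incoming, stored,
                                 key=lambda s: s[0], reverse=True):
        if t not in seen:               # keep one entry per time point
            seen.add(t)
            merged.append((t, values))
    return merged[:k], True
```

When `updated` is true, the synchronizer would then broadcast the new store to every algorithm node; when it is false, no broadcast is needed.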

Figure 13. Flow chart of the algorithm node and global synchronizer in the Master-Slave scheme.

Peer-to-Peer Structure

To avoid the bottleneck at the synchronizer, we propose a peer-to-peer scheme. In this structure, each algorithm node similarly creates two threads, one for computation and one for communication. The communication thread receives messages from its preceding node, merges the received message with its own solutions, then sends the updated solution to the next node. The four algorithms form a loop, and the most recent solutions keep circulating in the loop to synchronize all algorithms and explore their diversity. The communication structure is shown in Figure 14.

Figure 14. Peer-to-Peer communication scheme.

This structure saves resources by dropping the global synchronizer and distributing its large data processing burden across the algorithm nodes. It eliminates the bottleneck and also decreases the network load: in the master-slave structure the communication is collective, and an algorithm node may be unaware of the status of the global synchronizer and send a stale solution, which occupies network bandwidth and hampers effective communication. The main disadvantage is that, in the peer-to-peer structure, all algorithms are updated only when a full loop of data transfer is completed, whereas in the master-slave structure all other algorithms are informed as soon as any one algorithm obtains a new effective solution.

In this loop structure, deadlock and the start and exit of the program need additional attention. For instance, a deadlock happens when a successor node waits on a blocking MPI message from a predecessor node that has reached the end of the simulation and exited. In our implementation, the BE + Newton algorithm, which is the most stable and has a low computational cost for the initial time steps, is used to start the transfer of data around the loop. At the end of the simulation, a flag is used to track how many nodes have finished. Every node increments the flag before it exits. The flag is stored in the MPI message. Hence, when a node receives a message with a flag value equal to the number of all other algorithms, it knows that all the other nodes have finished, so it skips sending the message to the next node and exits. This way, the system can exit correctly. Figure 15 shows the detailed work flow in this structure.
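The exit protocol can be traced with a small single-process simulation of the loop. The sketch below is a deliberate simplification and not the thesis implementation: no MPI is involved, each node advances the shared solution by one hypothetical step each time the token reaches it, and the names `run_ring`, `step_sizes` and `t_end` are illustrative assumptions.

```python
def run_ring(step_sizes, t_end):
    """Circulate one token around the loop until every node has exited.

    step_sizes[i] is the time advance node i contributes per visit.
    Returns (exit_order, number_of_sends).
    """
    n = len(step_sizes)
    token = {"time": 0, "finished": 0}   # the flag travels inside the message
    exit_order, sends = [], 0
    node = 0                             # node 0 starts the loop
    while True:
        if token["time"] < t_end:
            token["time"] = min(t_end, token["time"] + step_sizes[node])
        if token["time"] >= t_end:       # this node has reached the end
            exit_order.append(node)
            if token["finished"] == n - 1:
                break                    # all other nodes are done: do not send
            token["finished"] += 1       # increment the flag before exiting
        sends += 1                       # forward the token to the next node
        node = (node + 1) % n
    return exit_order, sends
```

With four nodes, the last node to receive the token sees a flag value of 3, so it exits without sending, and no node is left blocked on a receive from a predecessor that has already exited.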

Figure 15. Flow chart of the algorithm node in the peer-to-peer scheme.

Multiple Threads in a Single Algorithm

Transient analysis may be conducted over a large number of time steps. At every time step, several iterations are needed to reach convergence. Hence, the total number of iterations can be very high. The device evaluation and matrix solve carried out at every iteration are very time consuming and account for nearly the whole simulation time. As discussed previously, different nonlinear iterative methods trade off the number of iterations per time step against the cost of each iteration. Here we further use the power of multi-core processors to speed up the device evaluation and matrix solve within a single algorithm node. A distributed platform makes it possible to fully realize intra-algorithm parallelism, since an algorithm mapped onto one node has exclusive access to all of its compute and memory resources.

Parallel Device Evaluation

In the device evaluation, the Jacobian matrix J(x_k) has a large number of partial derivative terms. In the parallelization, nonlinear elements are divided into several groups, and each group is handled by one thread. The speedup can approach linear scaling when there are sufficient nonlinear elements. However, because of the cost of spawning, executing and terminating threads, the benefits of parallelization may be reduced, especially when the circuit has few nonlinear elements.
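The grouping of nonlinear elements across threads can be sketched as below. This is an illustration only: the diode model, its constants `I_SAT` and `V_T`, and the function names are hypothetical stand-ins for the transistor models evaluated in the simulator. In CPython the global interpreter lock prevents a real speedup for this kind of arithmetic, so the sketch shows the partitioning rather than the performance.

```python
import math
from concurrent.futures import ThreadPoolExecutor

I_SAT = 1e-14   # hypothetical diode saturation current (A)
V_T = 0.025     # hypothetical thermal voltage (V)


def eval_diode(v):
    """Return (current, conductance dI/dV) for one diode at voltage v."""
    e = math.exp(v / V_T)
    return I_SAT * (e - 1.0), I_SAT * e / V_T


def eval_group(voltages):
    """Evaluate one group of elements sequentially (the work of one thread)."""
    return [eval_diode(v) for v in voltages]


def parallel_device_eval(voltages, num_threads):
    """Split the elements into contiguous groups, one group per thread."""
    if not voltages:
        return []
    size = -(-len(voltages) // num_threads)      # ceiling division
    groups = [voltages[i:i + size] for i in range(0, len(voltages), size)]
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        results = pool.map(eval_group, groups)   # results keep group order
    return [entry for group in results for entry in group]
```

The conductance entries are the partial derivatives that would be stamped into the Jacobian; with only a handful of elements, the cost of creating the pool outweighs the work, which mirrors the overhead noted above.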

Parallel Matrix Solver

In our platform, SuperLU [17] is used as the parallel matrix solver. SuperLU is a general purpose library for the direct solution of large, sparse, nonsymmetric systems of linear equations on high performance machines. The library routines perform LU decomposition with partial pivoting and triangular system solves through forward and backward substitution. It exploits two sources of parallelism in the sparse LU factorization. The coarse level of parallelism comes from the sparsity of the matrix and is exposed by the column elimination tree of the matrix. The second level of parallelism comes from pipelining the computations of dependent columns.

The performance of the matrix solve stops improving once the number of threads reaches a certain value, which depends on the characteristics of the circuit and of the compute node. First, when more threads are used in SuperLU, accesses to critical sections via locks increase and degrade parallel performance. The more processors there are, the larger the communication loss. Second, the solver needs to divide the matrix into several parts and pipeline the operations on every part. Hence, a small, dense matrix generated by device evaluation has more dependencies and is hard to divide into several independent parts, which worsens parallel performance. In contrast, the speedup is large for large, sparse matrices.

The compute node on our platform is a symmetric multi-processor system with 8 dual-core processors. The communication between the two cores in one packaged processor chip is twice as fast as the communication between cores in different processor chips. Hence, the performance of the parallel matrix solve degrades when the number of cores is odd, since the newly added core needs to transfer data to cores in other chips. We therefore use an even number of threads for parallel device evaluation and matrix solve, which achieves better speedups.
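For readers unfamiliar with the two phases named above, LU decomposition with partial pivoting followed by forward and backward substitution can be written compactly for a small dense matrix. The following Python sketch is sequential and dense, so it shows only the numerical steps and none of SuperLU's sparse data structures or parallelism; `lu_solve` is a hypothetical name and not part of the SuperLU interface.

```python
def lu_solve(a, b):
    """Solve a x = b by LU decomposition with partial pivoting."""
    n = len(a)
    lu = [row[:] for row in a]      # L (below diagonal) and U share storage
    perm = list(range(n))           # perm[i] = original index of row i
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(lu[r][col]))
        if lu[pivot][col] == 0.0:
            raise ValueError("matrix is singular")
        lu[col], lu[pivot] = lu[pivot], lu[col]
        perm[col], perm[pivot] = perm[pivot], perm[col]
        for r in range(col + 1, n):
            lu[r][col] /= lu[col][col]              # multiplier, stored in L
            for c in range(col + 1, n):
                lu[r][c] -= lu[r][col] * lu[col][c]
    # Forward substitution: L y = P b (L has a unit diagonal).
    y = [0.0] * n
    for i in range(n):
        y[i] = b[perm[i]] - sum(lu[i][j] * y[j] for j in range(i))
    # Backward substitution: U x = y.
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(lu[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (y[i] - tail) / lu[i][i]
    return x
```

The inner update loop over `c` is where each column depends on earlier ones; in a small, dense matrix these dependencies leave little independent work to hand to separate threads.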

5. RESULT AND ANALYSIS

5.1 Supercomputer

Hydra (see Figure 16) is a 52-node, 832-processor IBM cluster. The 52 nodes are organized and housed in five physical frames [18]. The cluster uses the IBM high-performance communication switch (HPS) for parallel processing and other communication between the nodes. Each node connects to the HPS network using two adapters. The HPS routes a message packet to another node [18].

Figure 16. A snapshot of the supercomputer Hydra.

On Hydra, when running a Pthreads program, the number of threads during execution can be set by the environment variable OMP_SET_NUM_THREADS. An MPI program is executed under the Parallel Operating Environment (POE). When the program is being executed, the number of tasks can be set by the environment variable PROCS. Typically, tasks are mapped one-to-one onto processors. In the batch file, we can specify how tasks are to be assigned. We assign the MPI tasks to 5 nodes with the variable node. Every node can use 4 CPUs and 1.5 GB of memory by setting ConsumableCpus to 4 and ConsumableMemory to 1500mb, where 1500 MB is the aggregate amount of memory taken up by 4 threads.

5.2 Result

MPI vs. Sequential Algorithm

First, we compare the runtime of the MPI master-slave (MPI-MS) structure with that of the four single sequential algorithms, Newton + BE, SC, Newton + Gear2 and Newton + DASSL, for several circuits in Table 1. The runtimes are in seconds. MPI-MS 1 core means that we use one core for each algorithm in the system. Speedup1 is the speedup of MPI-MS 1 core over Newton + BE, which is the basic SPICE setup. MPI-MS 2 cores means that we assign 2 cores to every algorithm. Speedup2 is its speedup over MPI-MS with 1 core. N/A in the table means that the algorithm is unstable or diverges in the simulation.
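Written out, the two speedup columns defined above are ratios of runtimes. The symbols T with subscripts are notation introduced here for illustration and do not appear in the original text.

```latex
\[
\text{speedup1} = \frac{T_{\text{Newton+BE}}}{T_{\text{MPI-MS, 1 core}}},
\qquad
\text{speedup2} = \frac{T_{\text{MPI-MS, 1 core}}}{T_{\text{MPI-MS, 2 cores}}}
\]
```

Under the assumption of perfect scaling from one core to two cores per algorithm, speedup2 would equal 2, so values close to 2 indicate that the intra-algorithm parallelism is used efficiently.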

Table 1. Comparison between sequential algorithms and MPI methods.
Columns: Circuit; size/MB; No. of lin. ele.; No. of FETs; No. of nodes; Newton BE/s; SC/s; Newton Gear2/s; Newton DASSL/s; MPI-MS 1 core/s; speedup1; MPI-MS 2 cores/s; speedup2.
mesh
mesh N/A
mesh18k k 50 10k N/A
mesh28k k 50 15k N/A
inv_chain
inv_chain
grid20k k
grid30k k 0 12k
b_adder
la_mixer N/A
mixer N/A

For mesh circuits [19], which have many linear elements and few nonlinear transistors, the SC method is the fastest algorithm because it avoids repeatedly evaluating devices and factorizing a large matrix. It converges quickly at every time point. The other algorithms do not save this large amount of time and need longer to finish the simulation. The effect is more pronounced for larger mesh circuits such as mesh18k and mesh28k, which take the BE + Newton algorithm several hours to complete. The MPI master-slave structure takes advantage of the SC method and achieves a significantly large speedup over Newton + BE.

The inverter-chain circuits have more nonlinear elements. The SC algorithm demands many iterations to converge because of the more complicated circuit operating conditions and its worse convergence rate. In this case, the number of iterations dominates the cost of each time step even though the cost of one iteration is still small. The multi-step integration methods perform better on these circuits, especially when the circuits are small. The MPI master-slave structure, which exploits the diversity of the algorithms and their advantages in different stages, achieves the shortest simulation time.

Mixer circuits are a kind of analog circuit with small size, high accuracy requirements and complex changes in transistor operating conditions. The SC algorithm may not converge over the whole simulation time. The Newton + Gear2 algorithm produces results quickly. The MPI master-slave method runs a little faster than Newton + Gear2 thanks to the contributions of the other algorithms.

After applying more threads to each algorithm in the distributed system, we find that speedup2 almost reaches the optimum for the inverter chain circuits. This may be because the inverter chain circuits consist of a large number of transistors, which can be divided equally into two groups and handled efficiently by two threads. In addition, the size of the matrix obtained by device evaluation is suitable for the parallel matrix solver. The speedup for other circuits is not as good as for inverter chains. Even worse, the performance of analog circuits drops when two threads are applied to a single algorithm. Analog circuits are either small or have a small number of nonlinear elements, and incur a large overhead in parallel device evaluation and matrix solve. Creating and terminating threads introduces a relatively larger cost for these small circuits, and the benefits of multiple threads are smaller than the overhead. These results demonstrate the benefits of MPI-based multi-algorithm circuit simulation, and of multiple threads in a single algorithm, for certain classes of circuits.

MPI vs. HMAPS

In this section, the results of HMAPS [20] and the MPI-based distributed simulation are compared. HMAPS runs on one node with 8 threads and 2 gigabytes of memory, while the MPI methods use two threads for each algorithm on several nodes. The resource allocation and results are in Table 2 and Table 3. The size/MB column shows the memory size of one copy of the circuit data. The HMAPS, MPI-MS and MPI-P2P columns show the runtimes in seconds. The MPI-MS speedup is the speedup of the MPI master-slave structure over HMAPS, while the MPI-P2P speedup is the speedup of the MPI peer-to-peer structure over HMAPS.

Table 2. Resource allocation between HMAPS and MPI methods.
Columns: Threads/algorithm; Nodes; Threads/node; Memory (GB).
Rows: HMAPS; MPI master-slave; MPI Peer-to-Peer.

Table 3. Comparison between HMAPS and MPI methods.
Columns: Circuit; size/MB; HMAPS/s; MPI-MS/s; MPI-MS speedup; MPI-P2P/s; MPI-P2P speedup.
Rows: mesh; mesh; mesh; mesh; inv_chain; inv_chain; grid20k; grid30k; b_adder; la_mixer; mixer.

In HMAPS [20], multiple algorithms are mapped to a single shared-memory system and share its computing resources. In the results above, four algorithms each run with their own copy of the circuit data, for a total of four copies on one compute node. This requires 3 gigabytes of memory for the mesh18 circuit and 2.5 gigabytes for grid30k. On the 2-gigabyte shared-memory system, memory contention is high and the simulation takes longer to finish. The MPI-based distributed system runs the algorithms on separate nodes. The memory used on one node is 800 megabytes for the mesh18 circuit and 600 megabytes for grid30k. Hence, memory contention is lower and the speedup can reach as high as [...]. The MPI-based methods are about 15 percent faster for the mesh4, mesh6, grid20k and inverter chain circuits. These circuits normally need only several hundred megabytes of memory, but the MPI structures have more communication overhead than HMAPS, where threads access shared local memory quickly and communication between the algorithms can be frequent. In the distributed system, communication speed is limited by the network bandwidth and the size of the messages. The communication cost and delay can be large when simulating large circuits. However, the MPI-based platform can incorporate more algorithms to further exploit inter-algorithm parallelism, which is more difficult for HMAPS.

Comparison Between MPI Methods

The comparison between the MPI master-slave structure and the peer-to-peer structure is shown in Figure 17. The speedups are those of the two MPI-based methods over HMAPS.

Figure 17. Comparison between master-slave and peer-to-peer communication structure.

For small circuits such as mesh4, mesh6, mesh8 and grid20k, each algorithm in the MPI master-slave structure updates the global synchronizer quickly after obtaining its own solution. The synchronizer also immediately broadcasts the most recent solution to every algorithm. There is little bottleneck because the circuit size is small and the data processing is quick. However, the MPI peer-to-peer structure has a delay in updating all algorithms, because the algorithms receive the latest solution only after it has traveled around the loop once. This is demonstrated in Figure 17, which shows that the MPI master-slave structure is faster than the MPI peer-to-peer structure.

For large circuits such as mesh18 and grid30k, the speedup of the MPI master-slave scheme is much smaller than that of the MPI peer-to-peer scheme. In these circuits, the messages are very large, the synchronizer needs to receive a large amount of data from the algorithm nodes during the simulation, and the data processing time in the synchronizer increases. These factors put a heavy workload on the global synchronizer and cause a bottleneck. Moreover, the algorithms may send stale solutions to the synchronizer because they are not aware of its status. In this case, network bandwidth is occupied and wasted by this useless communication. In the peer-to-peer scheme, the processing and network load are distributed among all the algorithm nodes and the bottleneck effect is alleviated. In addition, the node that would be occupied by the synchronizer is saved.

Accuracy

We compare the results of BE + Newton and the distributed circuit simulation at two nodes of the mesh4 circuit in Figure 18 and Figure 19. BE + Newton is the basic SPICE setup and is accurate. We compare the voltages from the two methods at the same time points, and the standard deviation is smaller than [...] volt. Hence, the simulation results are acceptable.
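The accuracy check can be expressed as a short computation. The sketch below assumes that the reported figure is the population standard deviation of the pointwise voltage differences between the two waveforms, which the text does not state explicitly; the function name `waveform_deviation` is hypothetical.

```python
import statistics


def waveform_deviation(reference, candidate):
    """Population standard deviation of the pointwise voltage differences."""
    if len(reference) != len(candidate):
        raise ValueError("waveforms must share the same time points")
    diffs = [c - r for r, c in zip(reference, candidate)]
    return statistics.pstdev(diffs)
```

Here `reference` would hold the BE + Newton voltages and `candidate` the voltages from the distributed simulation, both sampled at the same time points of one circuit node.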


More information

. Written in factored form it is easy to see that the roots are 2, 2, i,

. Written in factored form it is easy to see that the roots are 2, 2, i, CMPS A Itroductio to Programmig Programmig Assigmet 4 I this assigmet you will write a java program that determies the real roots of a polyomial that lie withi a specified rage. Recall that the roots (or

More information

BOOLEAN MATHEMATICS: GENERAL THEORY

BOOLEAN MATHEMATICS: GENERAL THEORY CHAPTER 3 BOOLEAN MATHEMATICS: GENERAL THEORY 3.1 ISOMORPHIC PROPERTIES The ame Boolea Arithmetic was chose because it was discovered that literal Boolea Algebra could have a isomorphic umerical aspect.

More information

What are we going to learn? CSC Data Structures Analysis of Algorithms. Overview. Algorithm, and Inputs

What are we going to learn? CSC Data Structures Analysis of Algorithms. Overview. Algorithm, and Inputs What are we goig to lear? CSC316-003 Data Structures Aalysis of Algorithms Computer Sciece North Carolia State Uiversity Need to say that some algorithms are better tha others Criteria for evaluatio Structure

More information

1 Enterprise Modeler

1 Enterprise Modeler 1 Eterprise Modeler Itroductio I BaaERP, a Busiess Cotrol Model ad a Eterprise Structure Model for multi-site cofiguratios are itroduced. Eterprise Structure Model Busiess Cotrol Models Busiess Fuctio

More information

Load balanced Parallel Prime Number Generator with Sieve of Eratosthenes on Cluster Computers *

Load balanced Parallel Prime Number Generator with Sieve of Eratosthenes on Cluster Computers * Load balaced Parallel Prime umber Geerator with Sieve of Eratosthees o luster omputers * Soowook Hwag*, Kyusik hug**, ad Dogseug Kim* *Departmet of Electrical Egieerig Korea Uiversity Seoul, -, Rep. of

More information

APPLICATION NOTE. Automated Gain Flattening. 1. Experimental Setup. Scope and Overview

APPLICATION NOTE. Automated Gain Flattening. 1. Experimental Setup. Scope and Overview APPLICATION NOTE Automated Gai Flatteig Scope ad Overview A flat optical power spectrum is essetial for optical telecommuicatio sigals. This stems from a eed to balace the chael powers across large distaces.

More information

The University of Adelaide, School of Computer Science 22 November Computer Architecture. A Quantitative Approach, Sixth Edition.

The University of Adelaide, School of Computer Science 22 November Computer Architecture. A Quantitative Approach, Sixth Edition. Computer Architecture A Quatitative Approach, Sixth Editio Chapter 2 Memory Hierarchy Desig 1 Itroductio Programmers wat ulimited amouts of memory with low latecy Fast memory techology is more expesive

More information

Neural Networks A Model of Boolean Functions

Neural Networks A Model of Boolean Functions Neural Networks A Model of Boolea Fuctios Berd Steibach, Roma Kohut Freiberg Uiversity of Miig ad Techology Istitute of Computer Sciece D-09596 Freiberg, Germay e-mails: steib@iformatik.tu-freiberg.de

More information

Running Time. Analysis of Algorithms. Experimental Studies. Limitations of Experiments

Running Time. Analysis of Algorithms. Experimental Studies. Limitations of Experiments Ruig Time Aalysis of Algorithms Iput Algorithm Output A algorithm is a step-by-step procedure for solvig a problem i a fiite amout of time. Most algorithms trasform iput objects ito output objects. The

More information

A SOFTWARE MODEL FOR THE MULTILAYER PERCEPTRON

A SOFTWARE MODEL FOR THE MULTILAYER PERCEPTRON A SOFTWARE MODEL FOR THE MULTILAYER PERCEPTRON Roberto Lopez ad Eugeio Oñate Iteratioal Ceter for Numerical Methods i Egieerig (CIMNE) Edificio C1, Gra Capitá s/, 08034 Barceloa, Spai ABSTRACT I this work

More information

Running Time ( 3.1) Analysis of Algorithms. Experimental Studies. Limitations of Experiments

Running Time ( 3.1) Analysis of Algorithms. Experimental Studies. Limitations of Experiments Ruig Time ( 3.1) Aalysis of Algorithms Iput Algorithm Output A algorithm is a step- by- step procedure for solvig a problem i a fiite amout of time. Most algorithms trasform iput objects ito output objects.

More information

Analysis of Algorithms

Analysis of Algorithms Aalysis of Algorithms Iput Algorithm Output A algorithm is a step-by-step procedure for solvig a problem i a fiite amout of time. Ruig Time Most algorithms trasform iput objects ito output objects. The

More information

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 4. The Processor. Part A Datapath Design

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 4. The Processor. Part A Datapath Design COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Iterface 5 th Editio Chapter The Processor Part A path Desig Itroductio CPU performace factors Istructio cout Determied by ISA ad compiler. CPI ad

More information

Pattern Recognition Systems Lab 1 Least Mean Squares

Pattern Recognition Systems Lab 1 Least Mean Squares Patter Recogitio Systems Lab 1 Least Mea Squares 1. Objectives This laboratory work itroduces the OpeCV-based framework used throughout the course. I this assigmet a lie is fitted to a set of poits usig

More information

1. SWITCHING FUNDAMENTALS

1. SWITCHING FUNDAMENTALS . SWITCING FUNDMENTLS Switchig is the provisio of a o-demad coectio betwee two ed poits. Two distict switchig techiques are employed i commuicatio etwors-- circuit switchig ad pacet switchig. Circuit switchig

More information

Data Structures and Algorithms. Analysis of Algorithms

Data Structures and Algorithms. Analysis of Algorithms Data Structures ad Algorithms Aalysis of Algorithms Outlie Ruig time Pseudo-code Big-oh otatio Big-theta otatio Big-omega otatio Asymptotic algorithm aalysis Aalysis of Algorithms Iput Algorithm Output

More information

Lecture 28: Data Link Layer

Lecture 28: Data Link Layer Automatic Repeat Request (ARQ) 2. Go ack N ARQ Although the Stop ad Wait ARQ is very simple, you ca easily show that it has very the low efficiecy. The low efficiecy comes from the fact that the trasmittig

More information

EE 435. Lecture 26. Data Converters. Architectures. Characterization

EE 435. Lecture 26. Data Converters. Architectures. Characterization EE 435 Lecture 26 Data Coverters Architectures Characterizatio . Review from last lecture. Data Coverters Types: A/D (Aalog to Digital) Coverts Aalog Iput to a Digital Output D/A (Digital to Aalog) Coverts

More information

Session Initiated Protocol (SIP) and Message-based Load Balancing (MBLB)

Session Initiated Protocol (SIP) and Message-based Load Balancing (MBLB) F5 White Paper Sessio Iitiated Protocol (SIP) ad Message-based Load Balacig (MBLB) The ability to provide ew ad creative methods of commuicatios has esured a SIP presece i almost every orgaizatio. The

More information

Markov Chain Model of HomePlug CSMA MAC for Determining Optimal Fixed Contention Window Size

Markov Chain Model of HomePlug CSMA MAC for Determining Optimal Fixed Contention Window Size Markov Chai Model of HomePlug CSMA MAC for Determiig Optimal Fixed Cotetio Widow Size Eva Krimiger * ad Haiph Latchma Dept. of Electrical ad Computer Egieerig, Uiversity of Florida, Gaiesville, FL, USA

More information

Improvement of the Orthogonal Code Convolution Capabilities Using FPGA Implementation

Improvement of the Orthogonal Code Convolution Capabilities Using FPGA Implementation Improvemet of the Orthogoal Code Covolutio Capabilities Usig FPGA Implemetatio Naima Kaabouch, Member, IEEE, Apara Dhirde, Member, IEEE, Saleh Faruque, Member, IEEE Departmet of Electrical Egieerig, Uiversity

More information

EE 459/500 HDL Based Digital Design with Programmable Logic. Lecture 13 Control and Sequencing: Hardwired and Microprogrammed Control

EE 459/500 HDL Based Digital Design with Programmable Logic. Lecture 13 Control and Sequencing: Hardwired and Microprogrammed Control EE 459/500 HDL Based Digital Desig with Programmable Logic Lecture 13 Cotrol ad Sequecig: Hardwired ad Microprogrammed Cotrol Refereces: Chapter s 4,5 from textbook Chapter 7 of M.M. Mao ad C.R. Kime,

More information

Python Programming: An Introduction to Computer Science

Python Programming: An Introduction to Computer Science Pytho Programmig: A Itroductio to Computer Sciece Chapter 6 Defiig Fuctios Pytho Programmig, 2/e 1 Objectives To uderstad why programmers divide programs up ito sets of cooperatig fuctios. To be able to

More information

Algorithms for Disk Covering Problems with the Most Points

Algorithms for Disk Covering Problems with the Most Points Algorithms for Disk Coverig Problems with the Most Poits Bi Xiao Departmet of Computig Hog Kog Polytechic Uiversity Hug Hom, Kowloo, Hog Kog csbxiao@comp.polyu.edu.hk Qigfeg Zhuge, Yi He, Zili Shao, Edwi

More information

CIS 121 Data Structures and Algorithms with Java Spring Stacks, Queues, and Heaps Monday, February 18 / Tuesday, February 19

CIS 121 Data Structures and Algorithms with Java Spring Stacks, Queues, and Heaps Monday, February 18 / Tuesday, February 19 CIS Data Structures ad Algorithms with Java Sprig 09 Stacks, Queues, ad Heaps Moday, February 8 / Tuesday, February 9 Stacks ad Queues Recall the stack ad queue ADTs (abstract data types from lecture.

More information

Fast Fourier Transform (FFT) Algorithms

Fast Fourier Transform (FFT) Algorithms Fast Fourier Trasform FFT Algorithms Relatio to the z-trasform elsewhere, ozero, z x z X x [ ] 2 ~ elsewhere,, ~ e j x X x x π j e z z X X π 2 ~ The DFS X represets evely spaced samples of the z- trasform

More information

FAST BIT-REVERSALS ON UNIPROCESSORS AND SHARED-MEMORY MULTIPROCESSORS

FAST BIT-REVERSALS ON UNIPROCESSORS AND SHARED-MEMORY MULTIPROCESSORS SIAM J. SCI. COMPUT. Vol. 22, No. 6, pp. 2113 2134 c 21 Society for Idustrial ad Applied Mathematics FAST BIT-REVERSALS ON UNIPROCESSORS AND SHARED-MEMORY MULTIPROCESSORS ZHAO ZHANG AND XIAODONG ZHANG

More information

GPUMP: a Multiple-Precision Integer Library for GPUs

GPUMP: a Multiple-Precision Integer Library for GPUs GPUMP: a Multiple-Precisio Iteger Library for GPUs Kaiyog Zhao ad Xiaowe Chu Departmet of Computer Sciece, Hog Kog Baptist Uiversity Hog Kog, P. R. Chia Email: {kyzhao, chxw}@comp.hkbu.edu.hk Abstract

More information

Introduction to SWARM Software and Algorithms for Running on Multicore Processors

Introduction to SWARM Software and Algorithms for Running on Multicore Processors Itroductio to SWARM Software ad Algorithms for Ruig o Multicore Processors David A. Bader Georgia Istitute of Techology http://www.cc.gatech.edu/~bader Tutorial compiled by Rucheek H. Sagai M.S. Studet,

More information

Improving Template Based Spike Detection

Improving Template Based Spike Detection Improvig Template Based Spike Detectio Kirk Smith, Member - IEEE Portlad State Uiversity petra@ee.pdx.edu Abstract Template matchig algorithms like SSE, Covolutio ad Maximum Likelihood are well kow for

More information

CIS 121 Data Structures and Algorithms with Java Fall Big-Oh Notation Tuesday, September 5 (Make-up Friday, September 8)

CIS 121 Data Structures and Algorithms with Java Fall Big-Oh Notation Tuesday, September 5 (Make-up Friday, September 8) CIS 11 Data Structures ad Algorithms with Java Fall 017 Big-Oh Notatio Tuesday, September 5 (Make-up Friday, September 8) Learig Goals Review Big-Oh ad lear big/small omega/theta otatios Practice solvig

More information

CS2410 Computer Architecture. Flynn s Taxonomy

CS2410 Computer Architecture. Flynn s Taxonomy CS2410 Computer Architecture Dept. of Computer Sciece Uiversity of Pittsburgh http://www.cs.pitt.edu/~melhem/courses/2410p/idex.html 1 Fly s Taxoomy SISD Sigle istructio stream Sigle data stream (SIMD)

More information

Using VTR Emulation on Avid Systems

Using VTR Emulation on Avid Systems Usig VTR Emulatio o Avid Systems VTR emulatio allows you to cotrol a sequece loaded i the Record moitor from a edit cotroller for playback i the edit room alog with other sources. I this sceario the edit

More information

A Note on Least-norm Solution of Global WireWarping

A Note on Least-norm Solution of Global WireWarping A Note o Least-orm Solutio of Global WireWarpig Charlie C. L. Wag Departmet of Mechaical ad Automatio Egieerig The Chiese Uiversity of Hog Kog Shati, N.T., Hog Kog E-mail: cwag@mae.cuhk.edu.hk Abstract

More information

Isn t It Time You Got Faster, Quicker?

Isn t It Time You Got Faster, Quicker? Is t It Time You Got Faster, Quicker? AltiVec Techology At-a-Glace OVERVIEW Motorola s advaced AltiVec techology is desiged to eable host processors compatible with the PowerPC istructio-set architecture

More information

CMSC Computer Architecture Lecture 11: More Caches. Prof. Yanjing Li University of Chicago

CMSC Computer Architecture Lecture 11: More Caches. Prof. Yanjing Li University of Chicago CMSC 22200 Computer Architecture Lecture 11: More Caches Prof. Yajig Li Uiversity of Chicago Lecture Outlie Caches 2 Review Memory hierarchy Cache basics Locality priciples Spatial ad temporal How to access

More information

SCI Reflective Memory

SCI Reflective Memory Embedded SCI Solutios SCI Reflective Memory (Experimetal) Atle Vesterkjær Dolphi Itercoect Solutios AS Olaf Helsets vei 6, N-0621 Oslo, Norway Phoe: (47) 23 16 71 42 Fax: (47) 23 16 71 80 Mail: atleve@dolphiics.o

More information

A General Framework for Accurate Statistical Timing Analysis Considering Correlations

A General Framework for Accurate Statistical Timing Analysis Considering Correlations A Geeral Framework for Accurate Statistical Timig Aalysis Cosiderig Correlatios 7.4 Vishal Khadelwal Departmet of ECE Uiversity of Marylad-College Park vishalk@glue.umd.edu Akur Srivastava Departmet of

More information

Numerical Methods Lecture 6 - Curve Fitting Techniques

Numerical Methods Lecture 6 - Curve Fitting Techniques Numerical Methods Lecture 6 - Curve Fittig Techiques Topics motivatio iterpolatio liear regressio higher order polyomial form expoetial form Curve fittig - motivatio For root fidig, we used a give fuctio

More information

Higher-order iterative methods free from second derivative for solving nonlinear equations

Higher-order iterative methods free from second derivative for solving nonlinear equations Iteratioal Joural of the Phsical Scieces Vol 6(8, pp 887-89, 8 April, Available olie at http://wwwacademicjouralsorg/ijps DOI: 5897/IJPS45 ISSN 99-95 Academic Jourals Full Legth Research Paper Higher-order

More information

n Explore virtualization concepts n Become familiar with cloud concepts

n Explore virtualization concepts n Become familiar with cloud concepts Chapter Objectives Explore virtualizatio cocepts Become familiar with cloud cocepts Chapter #15: Architecture ad Desig 2 Hypervisor Virtualizatio ad cloud services are becomig commo eterprise tools to

More information

Appendix A. Use of Operators in ARPS

Appendix A. Use of Operators in ARPS A Appedix A. Use of Operators i ARPS The methodology for solvig the equatios of hydrodyamics i either differetial or itegral form usig grid-poit techiques (fiite differece, fiite volume, fiite elemet)

More information

Chapter 11. Friends, Overloaded Operators, and Arrays in Classes. Copyright 2014 Pearson Addison-Wesley. All rights reserved.

Chapter 11. Friends, Overloaded Operators, and Arrays in Classes. Copyright 2014 Pearson Addison-Wesley. All rights reserved. Chapter 11 Frieds, Overloaded Operators, ad Arrays i Classes Copyright 2014 Pearso Addiso-Wesley. All rights reserved. Overview 11.1 Fried Fuctios 11.2 Overloadig Operators 11.3 Arrays ad Classes 11.4

More information

Computer Graphics Hardware An Overview

Computer Graphics Hardware An Overview Computer Graphics Hardware A Overview Graphics System Moitor Iput devices CPU/Memory GPU Raster Graphics System Raster: A array of picture elemets Based o raster-sca TV techology The scree (ad a picture)

More information

ISSN (Print) Research Article. *Corresponding author Nengfa Hu

ISSN (Print) Research Article. *Corresponding author Nengfa Hu Scholars Joural of Egieerig ad Techology (SJET) Sch. J. Eg. Tech., 2016; 4(5):249-253 Scholars Academic ad Scietific Publisher (A Iteratioal Publisher for Academic ad Scietific Resources) www.saspublisher.com

More information

UNIVERSITY OF MORATUWA

UNIVERSITY OF MORATUWA UNIVERSITY OF MORATUWA FACULTY OF ENGINEERING DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING B.Sc. Egieerig 2014 Itake Semester 2 Examiatio CS2052 COMPUTER ARCHITECTURE Time allowed: 2 Hours Jauary 2016

More information

Announcements. Reading. Project #4 is on the web. Homework #1. Midterm #2. Chapter 4 ( ) Note policy about project #3 missing components

Announcements. Reading. Project #4 is on the web. Homework #1. Midterm #2. Chapter 4 ( ) Note policy about project #3 missing components Aoucemets Readig Chapter 4 (4.1-4.2) Project #4 is o the web ote policy about project #3 missig compoets Homework #1 Due 11/6/01 Chapter 6: 4, 12, 24, 37 Midterm #2 11/8/01 i class 1 Project #4 otes IPv6Iit,

More information

Master Informatics Eng. 2017/18. A.J.Proença. Memory Hierarchy. (most slides are borrowed) AJProença, Advanced Architectures, MiEI, UMinho, 2017/18 1

Master Informatics Eng. 2017/18. A.J.Proença. Memory Hierarchy. (most slides are borrowed) AJProença, Advanced Architectures, MiEI, UMinho, 2017/18 1 Advaced Architectures Master Iformatics Eg. 2017/18 A.J.Proeça Memory Hierarchy (most slides are borrowed) AJProeça, Advaced Architectures, MiEI, UMiho, 2017/18 1 Itroductio Programmers wat ulimited amouts

More information

Big-O Analysis. Asymptotics

Big-O Analysis. Asymptotics Big-O Aalysis 1 Defiitio: Suppose that f() ad g() are oegative fuctios of. The we say that f() is O(g()) provided that there are costats C > 0 ad N > 0 such that for all > N, f() Cg(). Big-O expresses

More information

It just came to me that I 8.2 GRAPHS AND CONVERGENCE

It just came to me that I 8.2 GRAPHS AND CONVERGENCE 44 Chapter 8 Discrete Mathematics: Fuctios o the Set of Natural Numbers (a) Take several odd, positive itegers for a ad write out eough terms of the 3N sequece to reach a repeatig loop (b) Show that ot

More information

Performance Plus Software Parameter Definitions

Performance Plus Software Parameter Definitions Performace Plus+ Software Parameter Defiitios/ Performace Plus Software Parameter Defiitios Chapma Techical Note-TG-5 paramete.doc ev-0-03 Performace Plus+ Software Parameter Defiitios/2 Backgroud ad Defiitios

More information

Analysis of Algorithms

Analysis of Algorithms Presetatio for use with the textbook, Algorithm Desig ad Applicatios, by M. T. Goodrich ad R. Tamassia, Wiley, 2015 Aalysis of Algorithms Iput 2015 Goodrich ad Tamassia Algorithm Aalysis of Algorithms

More information

Outline and Reading. Analysis of Algorithms. Running Time. Experimental Studies. Limitations of Experiments. Theoretical Analysis

Outline and Reading. Analysis of Algorithms. Running Time. Experimental Studies. Limitations of Experiments. Theoretical Analysis Outlie ad Readig Aalysis of Algorithms Iput Algorithm Output Ruig time ( 3.) Pseudo-code ( 3.2) Coutig primitive operatios ( 3.3-3.) Asymptotic otatio ( 3.6) Asymptotic aalysis ( 3.7) Case study Aalysis

More information

NON-LINEAR MODELLING OF A GEOTHERMAL STEAM PIPE

NON-LINEAR MODELLING OF A GEOTHERMAL STEAM PIPE 14thNew Zealad Workshop 1992 NON-LNEAR MODELLNG OF A GEOTHERMAL STEAM PPE Y. Huag ad D. H. Freesto Geothermal stitute, Uiversity of Aucklad SUMMARY Recet work o developig a o-liear model for a geothermal

More information

CMSC Computer Architecture Lecture 10: Caches. Prof. Yanjing Li University of Chicago

CMSC Computer Architecture Lecture 10: Caches. Prof. Yanjing Li University of Chicago CMSC 22200 Computer Architecture Lecture 10: Caches Prof. Yajig Li Uiversity of Chicago Midterm Recap Overview ad fudametal cocepts ISA Uarch Datapath, cotrol Sigle cycle, multi cycle Pipeliig Basic idea,

More information