
Performance Considerations of Shared Virtual Memory Machines

Xian-He Sun, Department of Computer Science, Louisiana State University, Baton Rouge, LA
Jianping Zhu, NSF Engineering Research Center, Dept. of Math. and Stat., Mississippi State University, Mississippi State, MS

Abstract

Generalized speedup is defined as parallel speed over sequential speed. In this paper the generalized speedup and its relation with other existing performance metrics, such as traditional speedup, efficiency, scalability, etc., are carefully studied. In terms of the introduced asymptotic speed, we show that the difference between the generalized speedup and the traditional speedup lies in the definition of the efficiency of uniprocessor processing, which is a very important issue in shared virtual memory machines. A scientific application has been implemented on a KSR-1 parallel computer. Experimental and theoretical results show that the generalized speedup is distinct from the traditional speedup and provides a more reasonable measurement. In the study of different speedups, an interesting relation between fixed-time and memory-bounded speedup is revealed. Various causes of superlinear speedup are also presented.

Manuscript received March 5, 1994; revised Nov. 14, 1994 and March 14. This research was supported in part by the National Aeronautics and Space Administration under NASA contract NAS and NAS1-1672/MMP.

Index Terms: High Performance Computing, Parallel Processing, Performance Evaluation, Performance Metrics, Scalability, Speedup, Shared Virtual Memory

Address for correspondence: Xian-He Sun, Department of Computer Science, Louisiana State University, Baton Rouge, LA, (504)

1 Introduction

In recent years parallel processing has enjoyed unprecedented attention from researchers, government agencies, and industries. This attention is mainly due to the fact that, with the current circuit technology, parallel processing seems to be the only remaining way to achieve higher performance. However, while various parallel computers and algorithms have been developed, their performance evaluation is still elusive. In fact, the more advanced the hardware and software, the more difficult it is to evaluate the parallel performance. In this paper, targeting the recent development of shared virtual memory machines, we study the generalized speedup [1] performance metric, its relation with other existing performance metrics, and the implementation issues.

Distributed-memory parallel computers dominate today's parallel computing arena. These machines, such as the Kendall Square KSR-1, Intel Paragon, TMC CM-5, and IBM SP2, have successfully delivered high performance computing power for solving some of the so-called "grand-challenge" problems. From the viewpoint of processes, there are two basic process synchronization and communication models. One is the shared-memory model, in which processes communicate through shared variables. The other is the message-passing model, in which processes communicate through explicit message passing. The shared-memory model provides a sequential-like programming paradigm. Virtual address space separates the user logical memory from physical memory. This separation allows an extremely large virtual memory to be provided on a sequential machine when only a small physical memory is available. Shared virtual address space combines the private virtual address spaces distributed over the nodes of a parallel computer into a globally shared virtual memory [2]. With shared virtual address space, the shared-memory model supports shared virtual memory, but requires sophisticated hardware and system support. An example of a distributed-memory machine which supports shared virtual address space is the Kendall Square KSR-1. (Traditionally, the message-passing model is bounded by the local memory of the processing processors. With recent technology advancement, the message-passing model has extended its ability to support shared virtual memory.)

Shared virtual memory simplifies the software development and porting process by enabling even extremely large programs to run on a single processor before being partitioned and distributed across multiple processors. However, memory access in a shared virtual memory system is non-uniform [2]. The access times of local memory and remote memory are different. Running a large program on a small number of processors is possible but could be very inefficient. The inefficient sequential processing will lead to a misleadingly high performance in terms of speedup or efficiency. Generalized speedup, defined as parallel speed over sequential speed, is a newly proposed performance metric [1]. In this paper, through both theoretical proofs and experimental results, we show that generalized speedup provides a more reasonable measurement than traditional speedup. In the process of studying generalized speedup, the relations between the generalized speedup and many other metrics, such as efficiency, scaled speedup, and scalability, are also studied. The relation between fixed-time and memory-bounded scaled speedup is analyzed. Various reasons for superlinearity in different speedups are also discussed. Results show that the main difference between the traditional speedup and the generalized speedup is how to evaluate the efficiency of the sequential processing on a single processor.

The paper is organized as follows. In Section 2 we study traditional speedup, including the scaled speedup concept, and introduce some terminology. Analysis shows that the traditional speedup, fixed-size or scaled-size, may achieve superlinearity on shared virtual memory machines. Furthermore, with the traditional speedup metric, the slower the remote memory access is, the larger the speedup. Generalized speedup is studied in Section 3. The term asymptotic speed is introduced for the measurement of generalized speedup. Analysis shows the differences and the similarities between the generalized speedup and the traditional speedup. Relations between different performance metrics are also discussed. Experimental results of a production application on a Kendall Square KSR-1 parallel computer are given in Section 4. Section 5 contains a summary.

2 The Traditional Speedup

One of the most frequently used performance metrics in parallel processing is speedup. It is defined as sequential execution time over parallel execution time. Parallel algorithms often exploit parallelism by sacrificing mathematical efficiency. To measure the true parallel processing gain, the sequential execution time should be based on a commonly used sequential algorithm. To distinguish it from other interpretations of speedup, the speedup measured with a commonly used sequential algorithm has been called absolute speedup [3]. Another widely used interpretation is the relative speedup [3], which uses the uniprocessor execution time of the parallel algorithm as the sequential time. There are several reasons to use the relative speedup. First, the performance of an algorithm varies with the number of processors. Relative speedup measures the variation. Second, relative speedup avoids the difficulty of choosing the practical sequential algorithm, implementing the sequential algorithm, and matching the implementation/programming skill between the sequential algorithm and the parallel algorithm. Also, when problem size is fixed, the time ratio of the chosen sequential algorithm and the uniprocessor execution of the parallel algorithm is fixed. Therefore, the relative speedup is proportional to the absolute speedup. Relative speedup is the speedup commonly used in performance studies. In this study we will focus on relative speedup and reserve the terms traditional speedup and speedup for relative speedup. The concepts and results of this study can be extended to absolute speedup.

From the problem size point of view, speedup can be divided into the fixed-size speedup and the scaled speedup. Fixed-size speedup emphasizes how much execution time can be reduced with parallel processing. Amdahl's law [4] is based on the fixed-size speedup. The scaled speedup is concentrated on exploring the computational power of parallel computers for solving otherwise intractable large problems. Depending on the scaling restrictions of the problem size, the scaled speedup can be classified as the fixed-time speedup [5] and the memory-bounded speedup [6]. As the number of processors increases, fixed-time speedup scales problem size to meet the fixed execution time. Then the scaled problem is also solved on a uniprocessor to get the speedup. As the number of processors increases, memory-bounded speedup scales problem size to utilize the associated memory increase. A detailed study of the memory-bounded speedup can be found in [6].

Let p and S_p be the number of processors and the speedup with p processors.

Definition 1:
  Superlinear speedup: S_p > p.
  Unitary speedup: S_p = p.
  Linear speedup: S_p = a·p, for some constant a > 0.

It is debatable whether any machine-algorithm pair can achieve "truly" superlinear speedup. Seven possible causes of superlinear speedup are listed in Fig. 1. The first four causes in Fig. 1 are patterned from [7].

1. cache size increased in parallel processing
2. overhead reduced in parallel processing
3. latency hidden in parallel processing
4. randomized algorithms
5. mathematical inefficiency of the serial algorithm
6. higher memory access latency in the sequential processing
7. profile shifting

Figure 1. Causes of Superlinear Speedup.

Cause 1 is unlikely to be applicable for scaled speedup, since when problem size scales up, by memory or by time constraint, the cache hit ratio is unlikely to increase. Cause 2 in Fig. 1 can be considered theoretically [8]; there is no measured superlinear speedup ever attributed to it. Cause 3 does not exist for relative speedup, since both the sequential and parallel execution use the same algorithm. Since parallel algorithms are often mathematically inefficient, cause 5 is a likely source of superlinear speedup of relative speedup. A good example of superlinear speedup based on cause 5 can be found in [9]. Cause 7 will be explained at the end of Section 3, after the generalized speedup is introduced. With the virtual memory and shared virtual memory architecture, cause 6 can lead to an extremely high speedup, especially for scaled speedup where an extremely large problem has to be run on a single processor. Figure 5 shows a measured superlinear speedup on a KSR-1 machine. The measured superlinear speedup is due to the inherent deficiency of the traditional speedup metric. To analyze the deficiency of the traditional speedup, we need to introduce the following definition.

Definition 2: The cost of parallelism i is the ratio of the total number of processor cycles consumed in order to perform one unit operation of work when i processors are active to the machine clock rate.

The sequential execution time can be written in terms of work:

  Sequential execution time = (Amount of work x Processor cycles per unit of work) / Machine clock rate.   (1)

The ratio on the right hand side of Eq. (1), processor cycles per unit of work over machine clock rate, is the cost of sequential processing. Work can be defined as arithmetic operations, instructions, transitions, or whatever is needed to complete the application. In scientific computing the number of floating-point operations (FLOPS) is commonly used to measure work. In general, work may be of different types, and units of different operations may require different numbers of instruction cycles to finish. (For example, the times consumed by one division and one multiplication may be different depending on the underlying machine, and the operation and memory reference ratio may be different for different computations.) The influence of work type on the performance is one of the topics studied in [1]. In this paper, we study the influence of inefficient memory access on the performance. We assume that there is only one work type and that any increase in the number of processor cycles is due to inefficient memory access. In a shared virtual memory environment, the memory available depends on the system size. Let W_i be the amount of work executed when i processors are active (work performed in all steps that use i processors), and let W = Σ_{i=1}^{p} W_i represent the total work. The cost of parallelism i in a p-processor system, denoted as c_p(i, W), is the elapsed time for one unit operation of work when i processors are active. Then, W_i c_p(i, W) gives the accumulated elapsed time where i processors are active. c_p(i, W) contains both computation time and remote memory access time. The uniprocessor execution time can be represented in terms of uniprocessor cost,

  t(1) = Σ_{i=1}^{p} W_i c_p(s, W),

where c_p(s, W) is the cost of sequential processing on a parallel system with p processors. It is different from c_p(1, W), which is the cost of the sequential portion of the parallel processing. Parallel execution time can be represented in terms of parallel cost,

  t(p) = Σ_{i=1}^{p} (W_i / i) c_p(i, W).

The traditional speedup is defined as

  S_p = t(1) / t(p) = Σ_{i=1}^{p} W_i c_p(s, W) / Σ_{i=1}^{p} (W_i / i) c_p(i, W).   (2)

Depending on the architecture memory hierarchy, in general c_p(i, W) may not equal c_p(j, W) for i ≠ j [10]. If c_p(i, W) = c_p(p, W) for 1 ≤ i ≤ p, then

  S_p = [c_p(s, W) / c_p(p, W)] · [W / Σ_{i=1}^{p} (W_i / i)].   (3)

The first ratio of Eq. (3) is the cost ratio, which gives the influence of memory access delay. The second ratio,

  W / Σ_{i=1}^{p} (W_i / i),   (4)

is the simple analytic model based on degree of parallelism [6]. It assumes that memory access time is constant as problem size and system size vary. The cost ratio distinguishes the different performance analysis methods with or without consideration of the memory influence. In general, the cost ratio depends on memory miss ratio, page replacement policy, data reference pattern, etc. Let the remote access ratio be the quotient of the number of remote memory accesses and the number of local memory accesses. For a simple case, if we assume there is no remote access in parallel processing and the remote access ratio of the sequential processing is (p-1)/p, then

  c_p(s, W) / c_p(p, W) = 1/p + [(p-1)/p] · (time per remote access / time per local access).   (5)

Equation (5) approximately equals the time per remote access over the time per local access. Since the remote memory access is much slower than the local memory access under the current technology, the speedup given by Eq. (3) could be considerably larger than the simple analytic model (4). In fact, the slower the remote access is, the larger the difference. For the KSR-1, the time ratio of remote and local access is about 7.5 (see Section 4). Therefore, for p = 32, the cost ratio is 7.3. For any W / (p · Σ_{i=1}^{p} W_i / i) > 0.14, that is, whenever the efficiency predicted by the simple model (4) exceeds 0.14, under the assumed remote access ratio we will have a superlinear speedup.
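To make the cost-ratio argument concrete, the following small Python sketch evaluates Eqs. (3)-(5) for the KSR-1 figures quoted above (a remote/local access time ratio of about 7.5 and p = 32). It is an illustration only; the function names and the sample workload profile are ours and are not part of the original measurements.

    # Illustrative sketch of Eqs. (3)-(5); names and sample workload are ours.

    def cost_ratio(p, remote_over_local):
        # Eq. (5): sequential cost over parallel cost, assuming no remote
        # access in parallel processing and a remote access ratio of
        # (p-1)/p for the sequential run.
        return 1.0 / p + (p - 1.0) / p * remote_over_local

    def traditional_speedup(work, p, remote_over_local):
        # Eq. (3): cost ratio times the simple degree-of-parallelism model,
        # where work[i-1] is the work W_i executed with i processors active.
        W = sum(work)
        simple_model = W / sum(W_i / i for i, W_i in enumerate(work, start=1))  # Eq. (4)
        return cost_ratio(p, remote_over_local) * simple_model

    p = 32
    ratio = cost_ratio(p, 7.5)                # about 7.3 on the KSR-1
    perfect = [0.0] * (p - 1) + [1.0]         # hypothetical fully parallel workload
    print(ratio, traditional_speedup(perfect, p, 7.5))

With an all-parallel workload the sketch reports a cost ratio of about 7.3 and a speedup well above p, which is exactly the superlinear effect discussed above.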

3 The Generalized Speedup

While parallel computers are designed for solving large problems, a single processor of a parallel computer is not designed to solve a very large problem. A uniprocessor does not have the computing power that the parallel system has. While solving a small problem is inappropriate on a parallel system, solving a large problem on a single processor is not appropriate either. To create a useful comparison, we need a metric that can vary problem sizes for the uniprocessor and multiple processors. Generalized speedup [1] is one such metric:

  Generalized Speedup = Parallel Speed / Sequential Speed.   (6)

Speed is defined as the quotient of work and elapsed time. Parallel speed might be based on scaled parallel work. Sequential speed might be based on the unscaled uniprocessor work. By definition, generalized speedup measures the speed improvement of parallel processing over sequential processing. In contrast, the traditional speedup (2) measures the time reduction of parallel processing. If the problem size (work) for both parallel and sequential processing is the same, the generalized speedup is the same as the traditional speedup. From this point of view, the traditional speedup is a special case of the generalized speedup. For this and for historical reasons, we sometimes call the traditional speedup the speedup, and call the speedup given in Eq. (6) the generalized speedup. Like the traditional speedup, the generalized speedup can also be further divided into fixed-size, fixed-time, and memory-bounded speedup. Unlike the traditional speedup, for the generalized speedup, the scaled problem is solved only on multiple processors. The fixed-time generalized speedup is sizeup [1]. The fixed-time benchmark SLALOM [11] is based on sizeup.

If memory access time is fixed, one might always assume that the uniprocessor cost c_p(s, W) will stabilize after some initial decrease (due to initialization, loop overhead, etc.), assuming the memory is large enough. When cache and remote memory access are considered, cost will increase when a slower memory has to be accessed. Figure 2 depicts the typical cost changing pattern. From Eq. (1), we can see that uniprocessor speed is the reciprocal of uniprocessor cost. When the cost reaches its lowest value, the speed reaches its highest value. The uniprocessor speed corresponding to the stabilized main memory cost is called the asymptotic speed (of the uniprocessor). Asymptotic speed represents the performance of the sequential processing with efficient memory access. The asymptotic speed is the appropriate sequential speed for Eq. (6). For memory-bounded speedup, the appropriate memory bound is the largest problem size which can maintain the asymptotic speed. After choosing the asymptotic speed as the sequential speed, the corresponding asymptotic cost has only local access and is independent of the problem size. We use c(s, W_0) to denote the corresponding asymptotic cost, where W_0 is a problem size which achieves the asymptotic speed. If there is no remote access in parallel processing, as assumed in Section 2, then c(s, W_0)/c_p(p, W_0) = 1. By Eq. (3), the corresponding speedup equals the simple speedup which does not consider the influence of memory access time.

Figure 2. Cost Variation Pattern (sequential cost versus problem size, with regions where the data fits in cache, in main memory, and in remote memory; insufficient memory increases sequential execution time).

In general, parallel work W is not the same as W_0, and c_p(i, W) may not equal c_p(p, W) for 1 ≤ i ≤ p. So, in general, we have

  Generalized Speedup = [W / Σ_{i=1}^{p} (W_i/i) c_p(i, W)] / [1 / c(s, W_0)] = W c(s, W_0) / Σ_{i=1}^{p} (W_i/i) c_p(i, W).   (7)

Equation (7) is another form of the generalized speedup. It is a quotient of sequential and parallel time, as is the traditional speedup (2). The difference is that, in Eq. (7), the sequential time is based on the asymptotic speed. When remote memory is needed for sequential processing, c(s, W_0) is smaller than c_p(s, W). Therefore, the generalized speedup gives a smaller speedup than the traditional speedup.

Parallel efficiency is defined as

  Efficiency = speedup / number of processors.   (8)

The generalized efficiency can be defined similarly as

  Generalized Efficiency = generalized speedup / number of processors.   (9)

By definition,

  Efficiency = W c_p(s, W) / [p Σ_{i=1}^{p} (W_i/i) c_p(i, W)],   (10)

and

  Generalized Efficiency = W c(s, W_0) / [p Σ_{i=1}^{p} (W_i/i) c_p(i, W)].   (11)
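As a hedged illustration of how Eqs. (6), (9) and (11) are used in practice, the sketch below computes the generalized speedup and generalized efficiency from a measured asymptotic sequential speed and a timed parallel run on a scaled problem; all numbers and helper names are hypothetical.

    # Illustrative use of Eqs. (6) and (9); numbers and names are hypothetical.

    def speed(work, elapsed_time):
        # Speed is the quotient of work and elapsed time (e.g., flops/s).
        return work / elapsed_time

    def generalized_speedup(parallel_work, parallel_time, asymptotic_speed):
        # Eq. (6): parallel speed over the asymptotic sequential speed.
        return speed(parallel_work, parallel_time) / asymptotic_speed

    def generalized_efficiency(parallel_work, parallel_time, asymptotic_speed, p):
        # Eq. (9): generalized speedup over the number of processors.
        return generalized_speedup(parallel_work, parallel_time, asymptotic_speed) / p

    asymptotic_speed = 5.5e6                  # flops/s, stabilized uniprocessor speed
    p, scaled_work, t_p = 16, 4.0e9, 50.0     # hypothetical scaled run on p processors
    print(generalized_speedup(scaled_work, t_p, asymptotic_speed))
    print(generalized_efficiency(scaled_work, t_p, asymptotic_speed, p))

Note that the scaled problem is only ever timed on the p processors; the sequential side of the comparison is the asymptotic speed, never a large run on one processor.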

Equations (10) and (11) show the difference between the two efficiencies. Traditional speedup compares parallel processing with the measured sequential processing. Generalized speedup compares parallel processing with the sequential processing based on the asymptotic cost. From this point of view, generalized speedup is a reform of traditional speedup. The following lemmas are direct results of Eq. (7).

Lemma 1: If c_p(s, W) is independent of problem size, traditional speedup is the same as generalized speedup.

Lemma 2: If the parallel work, W, achieves the asymptotic speed, that is W = W_0, then the fixed-size traditional speedup is the same as the fixed-size generalized speedup.

By Lemma 1, if the simple analytic model (4) is used to analyze performance, there is no difference between the traditional and the generalized speedup. If the problem size W is larger than the suggested initial problem size W_0, then the single processor speedup S_1 may not equal one. S_1 measures the sequential inefficiency due to the difference in memory access.

The generalized speedup is also closely related to the scalability study. Isospeed scalability has been proposed recently in [12]. The isospeed scalability measures the ability of an algorithm-machine combination to maintain the average (unit) speed, where the average speed is defined as the speed over the number of processors. When the system size is increased, the problem size is scaled up accordingly to maintain the average speed. If the average speed can be maintained, we say the algorithm-machine combination is scalable, and the scalability is

  ψ(p, p') = p' W / (p W'),   (12)

where W' is the amount of work needed to maintain the average speed when the system size has been changed from p to p', and W is the problem size solved when p processors were used. By definition,

  Average Speed = W / [p Σ_{i=1}^{p} (W_i/i) c_p(i, W)].

Since the sequential cost is fixed in Eq. (11), fixing the average speed is equivalent to fixing the generalized efficiency. Therefore the isospeed scalability can be seen as the iso-generalized-efficiency scalability. When the memory influence is not considered, i.e. c_p(s, W) is independent of the problem size, the iso-generalized-efficiency will be the same as the iso-traditional-efficiency. In this case, the isospeed scalability is the same as the isoefficiency scalability proposed by Kumar [13, 2].

Lemma 3: If the sequential cost c_p(s, W) is independent of problem size or if the simple analysis model (4) is used for speedup, the isoefficiency and isospeed scalability are equivalent to each other.

The following theorem gives the relation between the scalability and the fixed-time speedup.
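The following minimal sketch, with invented numbers, shows how the isospeed scalability of Eq. (12) is evaluated from two measurements of the work needed to hold the average speed.

    # Illustrative computation of the isospeed scalability psi(p, p') of Eq. (12).

    def average_speed(work, p, elapsed_time):
        # Speed per processor: work over (number of processors times elapsed time).
        return work / (p * elapsed_time)

    def isospeed_scalability(p, work_p, p_prime, work_p_prime):
        # Eq. (12): psi(p, p') = p' * W / (p * W'), where W' is the work needed
        # on p' processors to keep the average speed attained with W on p.
        return (p_prime * work_p) / (p * work_p_prime)

    # Hypothetical example: going from 4 to 8 processors required 2.5x the work
    # to hold the average speed, so psi(4, 8) = 8*W / (4*2.5*W) = 0.8.
    print(isospeed_scalability(4, 1.0, 8, 2.5))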

Theorem 1: Scalability (12) equals one if and only if the fixed-time generalized speedup is unitary.

Proof: Let c(s, W_0), c_p(i, W), W_i, and W be as defined in Eq. (7). If scalability (12) equals 1, let W' and p' be as defined in Eq. (12), and define W'_i similarly to W_i. We have

  W/p = W'/p'   (13)

for any number of processors p and p'. By definition, the generalized speedup

  G_S_{p'} = W' c(s, W_0) / Σ_{i=1}^{p'} (W'_i/i) c_{p'}(i, W').

With some arithmetic manipulation, we have

  W' = (G_S_{p'}/p') · p' Σ_{i=1}^{p'} (W'_i/i) c_{p'}(i, W') / c(s, W_0).

Similarly, we have

  W = (G_S_p/p) · p Σ_{i=1}^{p} (W_i/i) c_p(i, W) / c(s, W_0).

By Eq. (13) and the above two equations,

  (G_S_{p'}/p') Σ_{i=1}^{p'} (W'_i/i) c_{p'}(i, W') / c(s, W_0) = (G_S_p/p) Σ_{i=1}^{p} (W_i/i) c_p(i, W) / c(s, W_0).   (14)

For fixed speed,

  W' / [p' Σ_{i=1}^{p'} (W'_i/i) c_{p'}(i, W')] = W / [p Σ_{i=1}^{p} (W_i/i) c_p(i, W)].

By equation (13),

  Σ_{i=1}^{p'} (W'_i/i) c_{p'}(i, W') = Σ_{i=1}^{p} (W_i/i) c_p(i, W).   (15)

Substituting Eq. (15) into Eq. (14), we have

  G_S_{p'}/p' = G_S_p/p.

For p = 1,

  G_S_{p'} = p' G_S_1.   (16)

Equation (16) is the corresponding unitary speedup when G_S_1 is not equal to one. If the work W equals W_0, then G_S_1 = 1 and Eq. (16) becomes

  G_S_{p'} = p',

which is the unitary speedup defined in Definition 1.

If the fixed-time generalized speedup is unitary, then for any numbers of processors p and p' and the corresponding problem sizes W and W', where W' is the scaled problem size under the fixed-time constraint, we have

  W c(s, W_0) / Σ_{i=1}^{p} (W_i/i) c_p(i, W) = p,

and

  W' c(s, W_0) / Σ_{i=1}^{p'} (W'_i/i) c_{p'}(i, W') = p'.

Therefore,

  W / [p Σ_{i=1}^{p} (W_i/i) c_p(i, W)] = W' / [p' Σ_{i=1}^{p'} (W'_i/i) c_{p'}(i, W')].

The average speed is maintained. Also, since we have the fixed-time equality

  Σ_{i=1}^{p} (W_i/i) c_p(i, W) = Σ_{i=1}^{p'} (W'_i/i) c_{p'}(i, W'),

it follows that W/p = W'/p'. The scalability (12) equals one. □

The following theorem gives the relation between memory-bounded speedup and fixed-time speedup. The theorem is for generalized speedup. However, based on Lemma 1, the result is true for traditional speedup when the uniprocessor cost is fixed or the simple analysis model is used.

Theorem 2: If problem size increases proportionally to the number of processors in memory-bounded scaleup, then memory-bounded generalized speedup is linear if and only if fixed-time generalized speedup is linear.

Proof: Let c(s, W_0), c_p(i, W), and W be as defined in Theorem 1. Let W' and W* be the scaled problem sizes of fixed-time and memory-bounded scaleup respectively, and let W'_i and W*_i be defined accordingly. If memory-bounded speedup is linear, we have

  W c(s, W_0) / Σ_{i=1}^{p} (W_i/i) c_p(i, W) = a p,

and

  W* c(s, W_0) / Σ_{i=1}^{p'} (W*_i/i) c_{p'}(i, W*) = a p',

for some constant a > 0. Combining the two equations, we have the equation

  W / [p Σ_{i=1}^{p} (W_i/i) c_p(i, W)] = W* / [p' Σ_{i=1}^{p'} (W*_i/i) c_{p'}(i, W*)].   (17)

By assumption, W* is proportional to the number of processors available,

  W* = (p'/p) W.   (18)

Substituting Eq. (18) into Eq. (17), we get the fixed-time equality:

  Σ_{i=1}^{p'} (W*_i/i) c_{p'}(i, W*) = Σ_{i=1}^{p} (W_i/i) c_p(i, W).   (19)

That is W' = W*, and the fixed-time generalized speedup is linear.

If fixed-time speedup is linear, then, following similar deductions as used for Eq. (17), we have

  W / [p Σ_{i=1}^{p} (W_i/i) c_p(i, W)] = W' / [p' Σ_{i=1}^{p'} (W'_i/i) c_{p'}(i, W')].   (20)

Applying the fixed-time equality Eq. (19) (with W' in place of W*) to Eq. (20), we have the reduced equation

  W'/W = p'/p.   (21)

With the assumption Eq. (18), Eq. (21) leads to W* = W', and memory-bounded generalized speedup is linear. □

The assumption of Theorem 2 is that problem size (work) increases proportionally to the number of processors. The assumption is true for many applications. However, it is not true for dense matrix computation, where the memory requirement is a square function of the order of the matrix and the computation is a cubic function of the order of the matrix. For this kind of computation-intensive application, in general, memory-bounded speedup will lead to a larger speedup. The following corollaries are direct results of Theorem 1 and Theorem 2.

Corollary 1: If problem size increases proportionally to the number of processors in memory-bounded scaleup, then memory-bounded generalized speedup is unitary if and only if fixed-time
generalized speedup is unitary.

Corollary 2: If work increases proportionally with the number of processors, then scalability (12) equals one if and only if the memory-bounded generalized speedup is unitary.

Since uniprocessor cost varies on shared virtual memory machines, the above theoretical results are not applicable to traditional speedup on shared virtual memory machines.

Finally, to complete our discussion on the superlinear speedup, there is a new cause of superlinearity for generalized speedup. The new source of superlinear speedup is called profile shifting [11], and is due to the problem size difference between sequential and parallel processing (see Figure 1). An application may contain different work types. While problem size increases, some work types may increase faster than the others. When the work types with lower costs increase faster, superlinear speedup may occur. A superlinear speedup due to profile shifting was studied in [11].

4 Experimental Results

In this section, we discuss the timing results for solving a scientific application on KSR-1 parallel computers. We first give a brief description of the architecture and the application, and then present the timing results and analyses.

4.1 The Machine

The KSR-1 computer discussed here is a representative of parallel computers with shared virtual memory. Figure 3 shows the architecture of the KSR-1 parallel computer [14]. Each processor on the KSR-1 has 32 Mbytes of local memory. The CPU is a super-scalar processor with a peak performance of 40 Mflops in double precision. Processors are organized into different rings. The local ring (ring:0) can connect up to 32 processors, and a higher level ring of rings (ring:1) can contain up to 34 local rings with a maximum of 1088 processors. If a non-local data element is needed, the local search engine (SE:0) will search the processors in the local ring (ring:0). If the search engine SE:0 cannot locate the data element within the local ring, the request will be passed to the search engine at the next level (SE:1) to locate the data. This is done automatically by a hierarchy of search engines connected in a fat-tree-like structure [14, 15].

Figure 3. Configuration of KSR-1 parallel computers (ring:1 connects up to 34 ring:0's; each ring:0 connects up to 32 processors; P: processor, M: 32 Mbytes of local memory).

The memory hierarchy of the KSR-1 is shown in Fig. 4. Each processor has 512 Kbytes of fast subcache, which is similar to the normal cache on other parallel computers. This subcache is divided into two equal parts: an instruction subcache and a data subcache. The 32 Mbytes of local memory on each processor is called a local cache. A local ring (ring:0) with up to 32 processors can have 1 Gbyte total of local cache, which is called Group:0 cache. Access to the Group:0 cache is provided by Search Engine:0. Finally, a higher level ring of rings (ring:1) connects up to 34 local rings with 34 Gbytes of total local cache, which is called Group:1 cache. Access to the Group:1 cache is provided by Search Engine:1. The entire memory hierarchy is called ALLCACHE memory by Kendall Square Research. Access by a processor to the ALLCACHE memory system is accomplished by going through different Search Engines as shown in Fig. 4. The latencies for different memory locations [16] are: 2 cycles for subcache, 20 cycles for local cache, 150 cycles for Group:0 cache, and 570 cycles for Group:1 cache.

Figure 4. Memory hierarchy of KSR-1 (processor, 512 KB subcache, 32 MB local cache, 1 GB Group:0 cache through Search Engine:0, 34 GB Group:1 cache through Search Engine:1).
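As a rough, hedged illustration of how these latencies feed the cost ratios of Section 2, the sketch below estimates an average memory access cost from the cycle counts quoted above and an assumed distribution of accesses over the hierarchy; the access fractions are invented for illustration and are not measurements from this study.

    # Hypothetical estimate of average access cost on the KSR-1 hierarchy.
    # Latencies (in cycles) are those quoted from [16]; the access fractions
    # are assumed for illustration only.

    LATENCY = {"subcache": 2, "local_cache": 20, "group0": 150, "group1": 570}

    def average_access_cycles(fractions):
        # fractions maps each memory level to the share of accesses it serves.
        assert abs(sum(fractions.values()) - 1.0) < 1e-9
        return sum(fractions[level] * LATENCY[level] for level in fractions)

    # A run whose working set fits in local cache versus one that spills into
    # Group:0 cache (remote memory within a ring:0).
    local_run  = {"subcache": 0.90, "local_cache": 0.10, "group0": 0.00, "group1": 0.00}
    remote_run = {"subcache": 0.90, "local_cache": 0.02, "group0": 0.08, "group1": 0.00}
    print(average_access_cycles(local_run), average_access_cycles(remote_run))

Even a small fraction of Group:0 accesses raises the average access cost severalfold, which is the effect behind the degraded uniprocessor speeds reported in Section 4.3.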

4.2 The Application

Regularized least squares problems (RLSP) [17] are frequently encountered in scientific and engineering applications [18]. The major work is to solve the equation

  (A^T A + α I) x = A^T b   (22)

by orthogonal factorization schemes (Householder Transformations and Givens rotations). Efficient Householder algorithms have been discussed in [19] for shared memory supercomputers, and in [20] for distributed memory parallel computers. Note that Eq. (22) can also be written as

  (A^T, √α I) [A; √α I] x = (A^T, √α I) [b; 0],   (23)

or

  B^T B x = B^T b_0,   (24)

where [X; Y] denotes X stacked on top of Y, B = [A; √α I], and b_0 = [b; 0], so that the major task is to carry out the QR factorization of the matrix B, which is neither a completely full matrix nor a sparse matrix. The upper part is full and the lower part is sparse (in diagonal form). Because of the special structure of B, not all elements in the matrix are affected in a particular step. Only a submatrix of B will be transformed in each step. If the columns of the submatrix B_i at step i are denoted by B_i = [b_i, b_{i+1}, ..., b_n], then the Householder Transformation can be described as:

  Householder Transformation
    Initialize matrix B
    for i = 1, n
      1: α_i = -sign(a_ii^(i)) (b_i^T b_i)^(1/2)
      2: w_i = b_i - α_i e_1
      3: β_j = w_i^T b_j / (α_i^2 - a_ii^(i) α_i),   j = i+1, ..., n
      4: b_j = b_j - β_j w_i,                        j = i+1, ..., n
    end for

The calculation of the β_j's and the updating of the b_j's can be done in parallel for different indices j.
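For readers who wish to experiment, the following minimal NumPy sketch forms the stacked matrix B of Eqs. (23)-(24) and applies the column-oriented Householder updates of steps 1-4 above. It is a dense illustration under our own naming; it ignores the sparse diagonal structure of the lower block and is not the KSR Fortran code timed in Section 4.3.

    import numpy as np

    def regularized_ls_householder(A, b, alpha):
        # Solve (A^T A + alpha I) x = A^T b via Householder QR of B = [A; sqrt(alpha) I].
        # Dense illustrative sketch of steps 1-4; the sparsity of the lower block
        # is not exploited here.
        m, n = A.shape
        B = np.vstack([A, np.sqrt(alpha) * np.eye(n)])   # Eqs. (23)-(24)
        rhs = np.concatenate([b, np.zeros(n)])           # b_0 = [b; 0]
        for i in range(n):
            bi = B[i:, i]
            alpha_i = -np.copysign(np.linalg.norm(bi), B[i, i])   # step 1
            w = bi.copy()
            w[0] -= alpha_i                                        # step 2: w = b_i - alpha_i e_1
            gamma = alpha_i**2 - B[i, i] * alpha_i                 # equals w^T w / 2
            for j in range(i + 1, n):                              # steps 3-4 (parallel over j)
                beta_j = w @ B[i:, j] / gamma
                B[i:, j] -= beta_j * w
            rhs[i:] -= (w @ rhs[i:] / gamma) * w                   # apply the reflector to b_0
            B[i:, i] = 0.0
            B[i, i] = alpha_i
        # Back substitution on the triangular factor R = B[:n, :n].
        return np.linalg.solve(np.triu(B[:n, :n]), rhs[:n])

    # Usage: compare with the normal-equations solution of Eq. (22).
    rng = np.random.default_rng(0)
    A, b, alpha = rng.standard_normal((200, 50)), rng.standard_normal(200), 1e-2
    x = regularized_ls_householder(A, b, alpha)
    x_ref = np.linalg.solve(A.T @ A + alpha * np.eye(50), A.T @ b)
    print(np.allclose(x, x_ref))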

4.3 Timing Results

The numerical experiments reported here were conducted on the KSR-1 parallel computer installed at the Cornell Theory Center. There are 128 processors altogether on the machine. During the period when our experiments were performed, however, the computer was configured as two stand-alone machines with 64 processors each. Therefore, the numerical results were obtained using fewer than 64 processors.

Figure 5 shows the traditional fixed-size speedup curves obtained by solving the regularized least squares problem with different matrix sizes n. The matrix is of dimensions 2n x n. We can see clearly that as the matrix size n increases, the speedup gets better and better. For the case when n = 2048, the speedup is 76 on 56 processors. Although it is well known that on most parallel computers the speedup improves as the problem size increases, what is shown in Fig. 5 is certainly too good to be a reasonable measurement of the real performance of the KSR-1. The problem with the traditional speedup is that it is defined as the ratio of the sequential time to the parallel time used for solving the same fixed-size problem. The complex memory hierarchy on the KSR-1 makes the computational speed of a single processor highly dependent on the problem size. When the problem is so big that not all data of the matrix can be put in the local memory (32 Mbytes) of the single computing processor, part of the data must be put in the local memory of other processors on the system. These data are accessed by the computing processor through Search Engine:0. As a result, the computational speed on a single processor slows down significantly due to the high latency of Group:0 cache. The sustained computational speed on a single processor is 5.5 Mflops, 4.5 Mflops and 2.7 Mflops for problem sizes 1024, 1600 and 2048 respectively. On the other hand, with multiple processors, most of the data needed are in the local memory of each processor, so the computational speed suffers less from the high Group:0 cache latency. Therefore, the excellent speedups shown in Fig. 5 are the results of significant uniprocessor performance degradation when a large problem is solved on a single processor.

Figure 5. Fixed-size (Traditional) Speedup on KSR-1 (ideal speedup and measured speedups for n = 1024, 1600, and 2048 versus the number of processors).

Figure 6 shows the measured single processor speed as a function of problem size n. The Householder Transformation algorithm given before was implemented in KSR Fortran. The algorithm has a numerical complexity of W = 2n^3 + 8.5n^2 + lower-order terms in n, and the speed is calculated using s = W/t, where t is the CPU time used to finish the computation. As can be seen from Fig. 6, the three segments represent significantly different speeds for different matrix sizes. When the whole matrix can fit into the subcache, the performance is close to 7 Mflops. The speed decreases to around 5.5 Mflops when the matrix cannot fit into the subcache, but still can be accommodated in the local cache. Note, however, that when the matrix is so big that access to Group:0 cache through Search Engine:0 is needed, the performance degrades significantly and there is no clear stable performance level as can be observed in the other two segments. This is largely due to the high Group:0 cache latency and the contention for the Search Engine, which is used by all processors on the machine. Therefore, the access time of Group:0 cache is less uniform as compared to that of the subcache and local cache.

Figure 6. Speed Variation of Uniprocessor Processing on KSR-1 (speed versus the order of the matrices, with segments where the data fits in subcache, local cache, and Group:0 cache).

To take the difference of single-processor speeds for different problem sizes into consideration, we have to use the generalized speedup to measure the performance of multiple processors on the KSR-1. As can be seen from the definition of Eq. (6), the generalized speedup is defined as the ratio of the parallel speed to the asymptotic sequential speed, where the parallel speed is based on a scaled problem. In our numerical tests, the parallel problem was scaled in a memory-bounded fashion as the number of processors increases. The initial problem was selected based on the asymptotic speed (5.5 Mflops from Fig. 6) and then scaled proportionally according to the number of processors, i.e. with p processors, the problem is scaled to a size that will fill M·p Mbytes of memory, where M is the memory required by the unscaled problem.

Figure 7 shows the comparison of the traditional scaled speedup and the generalized speedup. For the traditional scaled speedup, the scaled problem is solved on both one and p processors, and the value of the speedup is calculated as the ratio of the time on one processor to that on p processors. For the generalized speedup, the scaled problem is solved only on multiple processors, not on a single processor. The value of the speedup is calculated using Eq. (6), where the asymptotic speed is used for the sequential speed. Figure 7 clearly shows that the generalized speedup gives a much more reasonable performance measurement on the KSR-1 than does the traditional scaled speedup. With the traditional scaled speedup, the speedup is above 20 with only 10 processors. This excellent superlinear speedup is a result of the severely degraded single-processor speed, rather than of perfect scalability of the machine and the algorithm.

Figure 7. Comparison of Generalized and Traditional Speedup on KSR-1 (ideal, generalized, and traditional speedup versus the number of processors).

Finally, Table 1 gives the measured isospeed scalability (see Eq. (12)) of solving the regularized least squares problem on a KSR-1 computer. The speed to be maintained on different numbers of processors is 3.25 Mflops, which is 60% of the asymptotic speed of 5.5 Mflops. The size of the 2n x n matrix is increased as the number of processors increases. It starts at n = 27 on one processor and increases to n = 2773 on 56 processors. One may notice that ψ(2,4) > ψ(1,2) in Table 1, which means that the machine-algorithm pair scales better from 2 processors to 4 processors than it does from one processor to two processors. This can be explained by the fact that on one processor, the matrix is small enough that all data can be accommodated in the subcache. Once all the data is loaded into the subcache, the whole computation process does not need data from local cache and Group:0 cache. Therefore, the data access time on one processor is significantly shorter than that on two processors, which involves subcache, local cache and Group:0 cache to pass messages. As a result, a significant increase in the work is necessary in the case of two processors to offset the extra data access time involving different memory hierarchies. This is the major reason for the low ψ(1,2) value. When the number of processors increases from 2 to 4, the data access pattern is the same for both cases, with subcache, local cache and Group:0 cache all involved, so that the work does not need to be increased significantly to offset the extra communication cost when going from 2 processors to 4 processors.

It is interesting to notice that, while the scalability of the RLSP-KSR1 combination is relatively low, the data in Table 1 have a decreasing pattern similar to the measured and computed scalability of the Burg-nCUBE, SLALOM-nCUBE, Burg-MasPar and SLALOM-MasPar combinations [12]. The scalabilities are all decreasing along columns and have some irregular behavior at ψ(1,2) and ψ(2,4).

Interested readers may wonder how the measured scalability is related to the measured generalized speedup given in Fig. 7. While Fig. 7 demonstrates a nearly linear generalized speedup, the corresponding scalability given in Table 1 is far from ideal (the ideal scalability would be unity). The low scalability is expected. Recall that the scaled speedup given in Fig. 7 is memory-bounded speedup [6]. That is, when the number of processors is doubled, the usage of memory is also doubled.
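The asymptotic speed used above (5.5 Mflops) is the stabilized uniprocessor speed observed while the data still fit in local memory. The hedged sketch below shows one way such a plateau can be extracted from measurements shaped like Fig. 6; the sample readings are invented for illustration.

    # Illustrative selection of the asymptotic uniprocessor speed from
    # (problem size, measured Mflops) pairs; the sample data are made up.

    def asymptotic_speed(measurements, window=3, tolerance=0.05):
        # Scan increasing problem sizes for runs of `window` consecutive
        # measurements that agree to within `tolerance`, and return the average
        # of the last such plateau.  For data shaped like Fig. 6 this is the
        # stabilized local-memory speed, i.e. the asymptotic speed of Section 3.
        speeds = [s for _, s in sorted(measurements)]
        plateau = None
        for k in range(len(speeds) - window + 1):
            chunk = speeds[k:k + window]
            if max(chunk) - min(chunk) <= tolerance * max(chunk):
                plateau = sum(chunk) / window
        return plateau

    samples = [(128, 7.0), (256, 6.9), (512, 5.6), (768, 5.5), (1024, 5.5),
               (1600, 4.5), (2048, 2.7)]        # hypothetical Mflops readings
    print(asymptotic_speed(samples))             # about 5.5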

As a result, the number of elements in the matrix is increased by a factor of 2. Corollary 2 shows that if work increases linearly with the number of processors, then unitary memory-bounded speedup will lead to ideal scalability. For the regularized least squares application, however, the work is a cubic function of the matrix size n. When the memory usage is doubled, the number of floating point operations is increased by a factor of eight. If a perfect generalized speedup is achieved from p to p' = 2p, the average speed at p and p' should be the same. By Eq. (12) we have

  ψ(p, p') = 2p / (8p) = 1/4.

With the measured speedup being a little lower than unitary as shown in Fig. 7, a scalability of less than 0.25 is expected. Table 1 confirms this relation, except at ψ(2,4) for the reason pointed out earlier. The scalability in the last column is noticeably lower than in the other columns. It is because when 56 nodes are involved in computations, communication has to pass through ring:1, which slows down the communication significantly.

Table 1. Measured Scalability ψ(N, N') of the RLSP-KSR1 combination.

Computation-intensive applications have often been used to achieve high flops. The RLSP application is a computation-intensive application. Table 1 shows that isospeed scalability does not give credit for computation-intensive applications. The computation-intensive applications may achieve a high speed on multiple processors, but the initial speed is also high. The isospeed scalability measures the ability to maintain the speed, rather than to achieve a particular speed.

The implementation was conducted on a KSR-1 shared virtual memory machine. The theoretical and analytical results given in Section 2 and Section 3, however, are general and can be applied on different parallel platforms. For instance, for Intel Paragon parallel computers, where virtual memory is supported to swap data in and out from memory to disk, we expect that inefficient sequential processing will cause similar superlinear (traditional) speedup as demonstrated on the KSR-1. For distributed-memory machines which do not support virtual memory, such as the CM-5, traditional speedup has another drawback. Due to the memory constraint, scaled problems often cannot be solved on a single processor. Therefore, scaled speedup is unmeasurable. Defining asymptotic speed similarly as given in Section 3, the generalized speedup can be applied to this kind of distributed-memory machine to measure scalable computations. Generalized speedup is defined as parallel speed over sequential speed. Given a reasonable initial sequential speed, it can be used on any parallel platform to measure the performance of scalable computations.

5 Conclusion

Since the scaled-up principle was proposed in 1988 by Gustafson and other researchers at Sandia National Laboratory [21], the principle has been widely used in performance measurement of parallel algorithms and architectures. One difficulty of measuring scaled speedup is that very large problems have to be solved on a uniprocessor, which is very inefficient if virtual memory is supported, or is impossible otherwise. To overcome this shortcoming, generalized speedup was proposed [1]. Generalized speedup is defined as parallel speed over sequential speed and does not require solving large problems on a uniprocessor. The study [1] emphasized the fixed-time generalized speedup, sizeup. To meet the need of the emerging shared virtual memory machines, the generalized speedup, particularly its implementation issues, has been carefully studied in the current research. It has been shown that traditional speedup is a special case of generalized speedup, and, on the other hand, generalized speedup is a reform of traditional speedup. The main difference between generalized speedup and traditional speedup is how to define the uniprocessor efficiency. When the uniprocessor speed is fixed, these two speedups are the same. Extending these results to scalability study, we have found that the difference between isospeed scalability [12] and isoefficiency scalability [13] is also due to the uniprocessor efficiency. When the uniprocessor speed is independent of the problem size, these two proposed scalabilities are the same. As part of the performance study, we have shown that an algorithm-machine combination achieves a perfect scalability if and only if it achieves a perfect speedup. An interesting relation between fixed-time and memory-bounded speedups is revealed. Seven causes of superlinear speedup are also listed. A scientific application has been implemented on a Kendall Square KSR-1 shared virtual memory machine. Experimental results show that uniprocessor efficiency is an important issue for virtual memory machines, and that the asymptotic speed provides a reasonable way to define the uniprocessor efficiency. The results in this paper on shared virtual memory can be extended to general parallel computers. Since uniprocessor efficiency is directly related to parallel execution time, scalability, and benchmark evaluations, the range of applicability of the uniprocessor efficiency study is wider than speedups. The uniprocessor efficiency might be explored further in a number of contexts.

Acknowledgement

The authors are grateful to the Cornell Theory Center for providing access to its KSR-1 parallel computer, and to the referees for their helpful comments on the revision of this paper.

References

[1] X.-H. Sun and J. Gustafson, "Toward a better parallel performance metric," Parallel Computing, vol. 17, pp. 1093-1109, Dec.
[2] K. Hwang, Advanced Computer Architecture: Parallelism, Scalability, Programmability. McGraw-Hill Book Co.
[3] J. Ortega and R. Voigt, "Solution of partial differential equations on vector and parallel computers," SIAM Review, pp. 149-240, June.
[4] G. Amdahl, "Validity of the single-processor approach to achieving large scale computing capabilities," in Proc. AFIPS Conf., pp. 483-485.
[5] J. Gustafson, "Reevaluating Amdahl's law," Communications of the ACM, vol. 31, pp. 532-533, May.
[6] X.-H. Sun and L. Ni, "Scalable problems and memory-bounded speedup," J. of Parallel and Distributed Computing, vol. 19, pp. 27-37, Sept.
[7] D. Helmbold and C. McDowell, "Modeling speedup(n) greater than n," IEEE Trans. on Parallel and Distributed Sys., pp. 250-256, Apr.
[8] D. Parkinson, "Parallel efficiency can be greater than unity," Parallel Computing, vol. 3, pp. 261-262.
[9] D. Nicol, "Inflated speedups in parallel simulations via malloc()," International Journal on Simulation, vol. 2, pp. 413-426, Dec.
[10] X.-H. Sun and J. Zhu, "Performance prediction of scalable computing: A case study," in Proc. of the 28th Hawaii International Conference on System Sciences, pp. 456-465, Jan.
[11] J. Gustafson, D. Rover, S. Elbert, and M. Carter, "The design of a scalable, fixed-time computer benchmark," J. of Parallel and Distributed Computing, vol. 12, no. 4, pp. 388-401.
[12] X.-H. Sun and D. Rover, "Scalability of parallel algorithm-machine combinations," IEEE Transactions on Parallel and Distributed Systems, pp. 599-613, June.
[13] A. Y. Grama, A. Gupta, and V. Kumar, "Isoefficiency: Measuring the scalability of parallel algorithms and architectures," IEEE Parallel & Distributed Technology, vol. 1, pp. 12-21, Aug.
[14] Kendall Square Research, "KSR parallel programming." Waltham, USA.
[15] C. Leiserson, "Fat-trees: Universal networks for hardware-efficient supercomputing," IEEE Transactions on Computers, vol. 34, no. 10, pp. 892-901.
[16] Kendall Square Research, "KSR technical summary." Waltham, USA.
[17] A. N. Tikhonov and V. Arsenin, Solution of Ill-posed Problems. John Wiley and Sons.
[18] Y. M. Chen, J. P. Zhu, W. H. Chen, and M. L. Wasserman, "GPST inversion algorithm for history matching in 3-d 2-phase simulators," in IMACS Trans. on Scientific Computing I, pp. 369-374.
[19] J. Dongarra, I. S. Duff, D. C. Sorensen, and H. A. van der Vorst, Solving Linear Systems on Vector and Shared Memory Computers. Philadelphia: SIAM.
[20] A. Pothen and P. Raghavan, "Distributed orthogonal factorization: Givens and Householder algorithms," SIAM J. of Sci. and Stat. Computing, vol. 10, pp. 1113-1135.
[21] J. Gustafson, G. Montry, and R. Benner, "Development of parallel methods for a 1024-processor hypercube," SIAM J. of Sci. and Stat. Computing, vol. 9, pp. 609-638, July.


More information

SLAM Summer School 2006 Practical 2: SLAM using Monocular Vision

SLAM Summer School 2006 Practical 2: SLAM using Monocular Vision SLAM Summer School 2006 Practcal 2: SLAM usng Monocular Vson Javer Cvera, Unversty of Zaragoza Andrew J. Davson, Imperal College London J.M.M Montel, Unversty of Zaragoza. josemar@unzar.es, jcvera@unzar.es,

More information

Hermite Splines in Lie Groups as Products of Geodesics

Hermite Splines in Lie Groups as Products of Geodesics Hermte Splnes n Le Groups as Products of Geodescs Ethan Eade Updated May 28, 2017 1 Introducton 1.1 Goal Ths document defnes a curve n the Le group G parametrzed by tme and by structural parameters n the

More information

Type-2 Fuzzy Non-uniform Rational B-spline Model with Type-2 Fuzzy Data

Type-2 Fuzzy Non-uniform Rational B-spline Model with Type-2 Fuzzy Data Malaysan Journal of Mathematcal Scences 11(S) Aprl : 35 46 (2017) Specal Issue: The 2nd Internatonal Conference and Workshop on Mathematcal Analyss (ICWOMA 2016) MALAYSIAN JOURNAL OF MATHEMATICAL SCIENCES

More information

Support Vector Machines

Support Vector Machines /9/207 MIST.6060 Busness Intellgence and Data Mnng What are Support Vector Machnes? Support Vector Machnes Support Vector Machnes (SVMs) are supervsed learnng technques that analyze data and recognze patterns.

More information

Content Based Image Retrieval Using 2-D Discrete Wavelet with Texture Feature with Different Classifiers

Content Based Image Retrieval Using 2-D Discrete Wavelet with Texture Feature with Different Classifiers IOSR Journal of Electroncs and Communcaton Engneerng (IOSR-JECE) e-issn: 78-834,p- ISSN: 78-8735.Volume 9, Issue, Ver. IV (Mar - Apr. 04), PP 0-07 Content Based Image Retreval Usng -D Dscrete Wavelet wth

More information

Computer models of motion: Iterative calculations

Computer models of motion: Iterative calculations Computer models o moton: Iteratve calculatons OBJECTIVES In ths actvty you wll learn how to: Create 3D box objects Update the poston o an object teratvely (repeatedly) to anmate ts moton Update the momentum

More information

Complex Numbers. Now we also saw that if a and b were both positive then ab = a b. For a second let s forget that restriction and do the following.

Complex Numbers. Now we also saw that if a and b were both positive then ab = a b. For a second let s forget that restriction and do the following. Complex Numbers The last topc n ths secton s not really related to most of what we ve done n ths chapter, although t s somewhat related to the radcals secton as we wll see. We also won t need the materal

More information

A New Approach For the Ranking of Fuzzy Sets With Different Heights

A New Approach For the Ranking of Fuzzy Sets With Different Heights New pproach For the ankng of Fuzzy Sets Wth Dfferent Heghts Pushpnder Sngh School of Mathematcs Computer pplcatons Thapar Unversty, Patala-7 00 Inda pushpndersnl@gmalcom STCT ankng of fuzzy sets plays

More information

ELEC 377 Operating Systems. Week 6 Class 3

ELEC 377 Operating Systems. Week 6 Class 3 ELEC 377 Operatng Systems Week 6 Class 3 Last Class Memory Management Memory Pagng Pagng Structure ELEC 377 Operatng Systems Today Pagng Szes Vrtual Memory Concept Demand Pagng ELEC 377 Operatng Systems

More information

Channel 0. Channel 1 Channel 2. Channel 3 Channel 4. Channel 5 Channel 6 Channel 7

Channel 0. Channel 1 Channel 2. Channel 3 Channel 4. Channel 5 Channel 6 Channel 7 Optmzed Regonal Cachng for On-Demand Data Delvery Derek L. Eager Mchael C. Ferrs Mary K. Vernon Unversty of Saskatchewan Unversty of Wsconsn Madson Saskatoon, SK Canada S7N 5A9 Madson, WI 5376 eager@cs.usask.ca

More information

A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data

A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data A Fast Content-Based Multmeda Retreval Technque Usng Compressed Data Borko Furht and Pornvt Saksobhavvat NSF Multmeda Laboratory Florda Atlantc Unversty, Boca Raton, Florda 3343 ABSTRACT In ths paper,

More information

Maintaining temporal validity of real-time data on non-continuously executing resources

Maintaining temporal validity of real-time data on non-continuously executing resources Mantanng temporal valdty of real-tme data on non-contnuously executng resources Tan Ba, Hong Lu and Juan Yang Hunan Insttute of Scence and Technology, College of Computer Scence, 44, Yueyang, Chna Wuhan

More information

Solitary and Traveling Wave Solutions to a Model. of Long Range Diffusion Involving Flux with. Stability Analysis

Solitary and Traveling Wave Solutions to a Model. of Long Range Diffusion Involving Flux with. Stability Analysis Internatonal Mathematcal Forum, Vol. 6,, no. 7, 8 Soltary and Travelng Wave Solutons to a Model of Long Range ffuson Involvng Flux wth Stablty Analyss Manar A. Al-Qudah Math epartment, Rabgh Faculty of

More information

Preconditioning Parallel Sparse Iterative Solvers for Circuit Simulation

Preconditioning Parallel Sparse Iterative Solvers for Circuit Simulation Precondtonng Parallel Sparse Iteratve Solvers for Crcut Smulaton A. Basermann, U. Jaekel, and K. Hachya 1 Introducton One mportant mathematcal problem n smulaton of large electrcal crcuts s the soluton

More information

For instance, ; the five basic number-sets are increasingly more n A B & B A A = B (1)

For instance, ; the five basic number-sets are increasingly more n A B & B A A = B (1) Secton 1.2 Subsets and the Boolean operatons on sets If every element of the set A s an element of the set B, we say that A s a subset of B, or that A s contaned n B, or that B contans A, and we wrte A

More information

Data Representation in Digital Design, a Single Conversion Equation and a Formal Languages Approach

Data Representation in Digital Design, a Single Conversion Equation and a Formal Languages Approach Data Representaton n Dgtal Desgn, a Sngle Converson Equaton and a Formal Languages Approach Hassan Farhat Unversty of Nebraska at Omaha Abstract- In the study of data representaton n dgtal desgn and computer

More information

Assembler. Building a Modern Computer From First Principles.

Assembler. Building a Modern Computer From First Principles. Assembler Buldng a Modern Computer From Frst Prncples www.nand2tetrs.org Elements of Computng Systems, Nsan & Schocken, MIT Press, www.nand2tetrs.org, Chapter 6: Assembler slde Where we are at: Human Thought

More information

The Research of Support Vector Machine in Agricultural Data Classification

The Research of Support Vector Machine in Agricultural Data Classification The Research of Support Vector Machne n Agrcultural Data Classfcaton Le Sh, Qguo Duan, Xnmng Ma, Me Weng College of Informaton and Management Scence, HeNan Agrcultural Unversty, Zhengzhou 45000 Chna Zhengzhou

More information

Learning the Kernel Parameters in Kernel Minimum Distance Classifier

Learning the Kernel Parameters in Kernel Minimum Distance Classifier Learnng the Kernel Parameters n Kernel Mnmum Dstance Classfer Daoqang Zhang 1,, Songcan Chen and Zh-Hua Zhou 1* 1 Natonal Laboratory for Novel Software Technology Nanjng Unversty, Nanjng 193, Chna Department

More information

An Entropy-Based Approach to Integrated Information Needs Assessment

An Entropy-Based Approach to Integrated Information Needs Assessment Dstrbuton Statement A: Approved for publc release; dstrbuton s unlmted. An Entropy-Based Approach to ntegrated nformaton Needs Assessment June 8, 2004 Wllam J. Farrell Lockheed Martn Advanced Technology

More information

Video Proxy System for a Large-scale VOD System (DINA)

Video Proxy System for a Large-scale VOD System (DINA) Vdeo Proxy System for a Large-scale VOD System (DINA) KWUN-CHUNG CHAN #, KWOK-WAI CHEUNG *# #Department of Informaton Engneerng *Centre of Innovaton and Technology The Chnese Unversty of Hong Kong SHATIN,

More information

Efficient Distributed File System (EDFS)

Efficient Distributed File System (EDFS) Effcent Dstrbuted Fle System (EDFS) (Sem-Centralzed) Debessay(Debsh) Fesehaye, Rahul Malk & Klara Naherstedt Unversty of Illnos-Urbana Champagn Contents Problem Statement, Related Work, EDFS Desgn Rate

More information

Scheduling Remote Access to Scientific Instruments in Cyberinfrastructure for Education and Research

Scheduling Remote Access to Scientific Instruments in Cyberinfrastructure for Education and Research Schedulng Remote Access to Scentfc Instruments n Cybernfrastructure for Educaton and Research Je Yn 1, Junwe Cao 2,3,*, Yuexuan Wang 4, Lanchen Lu 1,3 and Cheng Wu 1,3 1 Natonal CIMS Engneerng and Research

More information

Optimizing Document Scoring for Query Retrieval

Optimizing Document Scoring for Query Retrieval Optmzng Document Scorng for Query Retreval Brent Ellwen baellwe@cs.stanford.edu Abstract The goal of ths project was to automate the process of tunng a document query engne. Specfcally, I used machne learnng

More information

2x x l. Module 3: Element Properties Lecture 4: Lagrange and Serendipity Elements

2x x l. Module 3: Element Properties Lecture 4: Lagrange and Serendipity Elements Module 3: Element Propertes Lecture : Lagrange and Serendpty Elements 5 In last lecture note, the nterpolaton functons are derved on the bass of assumed polynomal from Pascal s trangle for the fled varable.

More information

Evaluation of an Enhanced Scheme for High-level Nested Network Mobility

Evaluation of an Enhanced Scheme for High-level Nested Network Mobility IJCSNS Internatonal Journal of Computer Scence and Network Securty, VOL.15 No.10, October 2015 1 Evaluaton of an Enhanced Scheme for Hgh-level Nested Network Moblty Mohammed Babker Al Mohammed, Asha Hassan.

More information

Feature Reduction and Selection

Feature Reduction and Selection Feature Reducton and Selecton Dr. Shuang LIANG School of Software Engneerng TongJ Unversty Fall, 2012 Today s Topcs Introducton Problems of Dmensonalty Feature Reducton Statstc methods Prncpal Components

More information

Solving two-person zero-sum game by Matlab

Solving two-person zero-sum game by Matlab Appled Mechancs and Materals Onlne: 2011-02-02 ISSN: 1662-7482, Vols. 50-51, pp 262-265 do:10.4028/www.scentfc.net/amm.50-51.262 2011 Trans Tech Publcatons, Swtzerland Solvng two-person zero-sum game by

More information

Concurrent Apriori Data Mining Algorithms

Concurrent Apriori Data Mining Algorithms Concurrent Apror Data Mnng Algorthms Vassl Halatchev Department of Electrcal Engneerng and Computer Scence York Unversty, Toronto October 8, 2015 Outlne Why t s mportant Introducton to Assocaton Rule Mnng

More information

Simulation Based Analysis of FAST TCP using OMNET++

Simulation Based Analysis of FAST TCP using OMNET++ Smulaton Based Analyss of FAST TCP usng OMNET++ Umar ul Hassan 04030038@lums.edu.pk Md Term Report CS678 Topcs n Internet Research Sprng, 2006 Introducton Internet traffc s doublng roughly every 3 months

More information

An Accurate Evaluation of Integrals in Convex and Non convex Polygonal Domain by Twelve Node Quadrilateral Finite Element Method

An Accurate Evaluation of Integrals in Convex and Non convex Polygonal Domain by Twelve Node Quadrilateral Finite Element Method Internatonal Journal of Computatonal and Appled Mathematcs. ISSN 89-4966 Volume, Number (07), pp. 33-4 Research Inda Publcatons http://www.rpublcaton.com An Accurate Evaluaton of Integrals n Convex and

More information

Programming in Fortran 90 : 2017/2018

Programming in Fortran 90 : 2017/2018 Programmng n Fortran 90 : 2017/2018 Programmng n Fortran 90 : 2017/2018 Exercse 1 : Evaluaton of functon dependng on nput Wrte a program who evaluate the functon f (x,y) for any two user specfed values

More information

Private Information Retrieval (PIR)

Private Information Retrieval (PIR) 2 Levente Buttyán Problem formulaton Alce wants to obtan nformaton from a database, but she does not want the database to learn whch nformaton she wanted e.g., Alce s an nvestor queryng a stock-market

More information

Sequential search. Building Java Programs Chapter 13. Sequential search. Sequential search

Sequential search. Building Java Programs Chapter 13. Sequential search. Sequential search Sequental search Buldng Java Programs Chapter 13 Searchng and Sortng sequental search: Locates a target value n an array/lst by examnng each element from start to fnsh. How many elements wll t need to

More information

Today s Outline. Sorting: The Big Picture. Why Sort? Selection Sort: Idea. Insertion Sort: Idea. Sorting Chapter 7 in Weiss.

Today s Outline. Sorting: The Big Picture. Why Sort? Selection Sort: Idea. Insertion Sort: Idea. Sorting Chapter 7 in Weiss. Today s Outlne Sortng Chapter 7 n Wess CSE 26 Data Structures Ruth Anderson Announcements Wrtten Homework #6 due Frday 2/26 at the begnnng of lecture Proect Code due Mon March 1 by 11pm Today s Topcs:

More information

Quality Improvement Algorithm for Tetrahedral Mesh Based on Optimal Delaunay Triangulation

Quality Improvement Algorithm for Tetrahedral Mesh Based on Optimal Delaunay Triangulation Intellgent Informaton Management, 013, 5, 191-195 Publshed Onlne November 013 (http://www.scrp.org/journal/m) http://dx.do.org/10.36/m.013.5601 Qualty Improvement Algorthm for Tetrahedral Mesh Based on

More information

Determining the Optimal Bandwidth Based on Multi-criterion Fusion

Determining the Optimal Bandwidth Based on Multi-criterion Fusion Proceedngs of 01 4th Internatonal Conference on Machne Learnng and Computng IPCSIT vol. 5 (01) (01) IACSIT Press, Sngapore Determnng the Optmal Bandwdth Based on Mult-crteron Fuson Ha-L Lang 1+, Xan-Mn

More information

Speedup of Type-1 Fuzzy Logic Systems on Graphics Processing Units Using CUDA

Speedup of Type-1 Fuzzy Logic Systems on Graphics Processing Units Using CUDA Speedup of Type-1 Fuzzy Logc Systems on Graphcs Processng Unts Usng CUDA Durlabh Chauhan 1, Satvr Sngh 2, Sarabjeet Sngh 3 and Vjay Kumar Banga 4 1,2 Department of Electroncs & Communcaton Engneerng, SBS

More information

Fast Computation of Shortest Path for Visiting Segments in the Plane

Fast Computation of Shortest Path for Visiting Segments in the Plane Send Orders for Reprnts to reprnts@benthamscence.ae 4 The Open Cybernetcs & Systemcs Journal, 04, 8, 4-9 Open Access Fast Computaton of Shortest Path for Vstng Segments n the Plane Ljuan Wang,, Bo Jang

More information

A SYSTOLIC APPROACH TO LOOP PARTITIONING AND MAPPING INTO FIXED SIZE DISTRIBUTED MEMORY ARCHITECTURES

A SYSTOLIC APPROACH TO LOOP PARTITIONING AND MAPPING INTO FIXED SIZE DISTRIBUTED MEMORY ARCHITECTURES A SYSOLIC APPROACH O LOOP PARIIONING AND MAPPING INO FIXED SIZE DISRIBUED MEMORY ARCHIECURES Ioanns Drosts, Nektaros Kozrs, George Papakonstantnou and Panayots sanakas Natonal echncal Unversty of Athens

More information

A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS

A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS Proceedngs of the Wnter Smulaton Conference M E Kuhl, N M Steger, F B Armstrong, and J A Jones, eds A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS Mark W Brantley Chun-Hung

More information

Chapter 1. Introduction

Chapter 1. Introduction Chapter 1 Introducton 1.1 Parallel Processng There s a contnual demand for greater computatonal speed from a computer system than s currently possble (.e. sequental systems). Areas need great computatonal

More information

Intra-procedural Inference of Static Types for Java Bytecode 1

Intra-procedural Inference of Static Types for Java Bytecode 1 McGll Unversty School of Computer Scence Sable Research Group Intra-procedural Inference of Statc Types for Java Bytecode 1 Sable Techncal Report No. 5 Etenne Gagnon Laure Hendren October 14, 1998 w w

More information

Some Advanced SPC Tools 1. Cumulative Sum Control (Cusum) Chart For the data shown in Table 9-1, the x chart can be generated.

Some Advanced SPC Tools 1. Cumulative Sum Control (Cusum) Chart For the data shown in Table 9-1, the x chart can be generated. Some Advanced SP Tools 1. umulatve Sum ontrol (usum) hart For the data shown n Table 9-1, the x chart can be generated. However, the shft taken place at sample #21 s not apparent. 92 For ths set samples,

More information

A Facet Generation Procedure. for solving 0/1 integer programs

A Facet Generation Procedure. for solving 0/1 integer programs A Facet Generaton Procedure for solvng 0/ nteger programs by Gyana R. Parja IBM Corporaton, Poughkeepse, NY 260 Radu Gaddov Emery Worldwde Arlnes, Vandala, Oho 45377 and Wlbert E. Wlhelm Teas A&M Unversty,

More information

Chapter 1. Comparison of an O(N ) and an O(N log N ) N -body solver. Abstract

Chapter 1. Comparison of an O(N ) and an O(N log N ) N -body solver. Abstract Chapter 1 Comparson of an O(N ) and an O(N log N ) N -body solver Gavn J. Prngle Abstract In ths paper we compare the performance characterstcs of two 3-dmensonal herarchcal N-body solvers an O(N) and

More information

Constructing Minimum Connected Dominating Set: Algorithmic approach

Constructing Minimum Connected Dominating Set: Algorithmic approach Constructng Mnmum Connected Domnatng Set: Algorthmc approach G.N. Puroht and Usha Sharma Centre for Mathematcal Scences, Banasthal Unversty, Rajasthan 304022 usha.sharma94@yahoo.com Abstract: Connected

More information

Query Clustering Using a Hybrid Query Similarity Measure

Query Clustering Using a Hybrid Query Similarity Measure Query clusterng usng a hybrd query smlarty measure Fu. L., Goh, D.H., & Foo, S. (2004). WSEAS Transacton on Computers, 3(3), 700-705. Query Clusterng Usng a Hybrd Query Smlarty Measure Ln Fu, Don Hoe-Lan

More information

Range images. Range image registration. Examples of sampling patterns. Range images and range surfaces

Range images. Range image registration. Examples of sampling patterns. Range images and range surfaces Range mages For many structured lght scanners, the range data forms a hghly regular pattern known as a range mage. he samplng pattern s determned by the specfc scanner. Range mage regstraton 1 Examples

More information

X- Chart Using ANOM Approach

X- Chart Using ANOM Approach ISSN 1684-8403 Journal of Statstcs Volume 17, 010, pp. 3-3 Abstract X- Chart Usng ANOM Approach Gullapall Chakravarth 1 and Chaluvad Venkateswara Rao Control lmts for ndvdual measurements (X) chart are

More information

Positive Semi-definite Programming Localization in Wireless Sensor Networks

Positive Semi-definite Programming Localization in Wireless Sensor Networks Postve Sem-defnte Programmng Localzaton n Wreless Sensor etworks Shengdong Xe 1,, Jn Wang, Aqun Hu 1, Yunl Gu, Jang Xu, 1 School of Informaton Scence and Engneerng, Southeast Unversty, 10096, anjng Computer

More information

The Shortest Path of Touring Lines given in the Plane

The Shortest Path of Touring Lines given in the Plane Send Orders for Reprnts to reprnts@benthamscence.ae 262 The Open Cybernetcs & Systemcs Journal, 2015, 9, 262-267 The Shortest Path of Tourng Lnes gven n the Plane Open Access Ljuan Wang 1,2, Dandan He

More information

S1 Note. Basis functions.

S1 Note. Basis functions. S1 Note. Bass functons. Contents Types of bass functons...1 The Fourer bass...2 B-splne bass...3 Power and type I error rates wth dfferent numbers of bass functons...4 Table S1. Smulaton results of type

More information

Load-Balanced Anycast Routing

Load-Balanced Anycast Routing Load-Balanced Anycast Routng Chng-Yu Ln, Jung-Hua Lo, and Sy-Yen Kuo Department of Electrcal Engneerng atonal Tawan Unversty, Tape, Tawan sykuo@cc.ee.ntu.edu.tw Abstract For fault-tolerance and load-balance

More information

A mathematical programming approach to the analysis, design and scheduling of offshore oilfields

A mathematical programming approach to the analysis, design and scheduling of offshore oilfields 17 th European Symposum on Computer Aded Process Engneerng ESCAPE17 V. Plesu and P.S. Agach (Edtors) 2007 Elsever B.V. All rghts reserved. 1 A mathematcal programmng approach to the analyss, desgn and

More information

Communication-Minimal Partitioning and Data Alignment for Af"ne Nested Loops

Communication-Minimal Partitioning and Data Alignment for Afne Nested Loops Communcaton-Mnmal Parttonng and Data Algnment for Af"ne Nested Loops HYUK-JAE LEE 1 AND JOSÉ A. B. FORTES 2 1 Department of Computer Scence, Lousana Tech Unversty, Ruston, LA 71272, USA 2 School of Electrcal

More information

Loop Transformations for Parallelism & Locality. Review. Scalar Expansion. Scalar Expansion: Motivation

Loop Transformations for Parallelism & Locality. Review. Scalar Expansion. Scalar Expansion: Motivation Loop Transformatons for Parallelsm & Localty Last week Data dependences and loops Loop transformatons Parallelzaton Loop nterchange Today Scalar expanson for removng false dependences Loop nterchange Loop

More information

Finite Element Analysis of Rubber Sealing Ring Resilience Behavior Qu Jia 1,a, Chen Geng 1,b and Yang Yuwei 2,c

Finite Element Analysis of Rubber Sealing Ring Resilience Behavior Qu Jia 1,a, Chen Geng 1,b and Yang Yuwei 2,c Advanced Materals Research Onlne: 03-06-3 ISSN: 66-8985, Vol. 705, pp 40-44 do:0.408/www.scentfc.net/amr.705.40 03 Trans Tech Publcatons, Swtzerland Fnte Element Analyss of Rubber Sealng Rng Reslence Behavor

More information