Shared Virtual Memory and Generalized Speedup

Xian-He Sun, ICASE, Mail Stop 132C, NASA Langley Research Center, Hampton, VA
Jianping Zhu, NSF Engineering Research Center, Dept. of Math. and Stat., Mississippi State University, Mississippi State, MS

Abstract

Generalized speedup is defined as parallel speed over sequential speed. In this paper the generalized speedup and its relation with other existing performance metrics, such as traditional speedup, efficiency, scalability, etc., are carefully studied. In terms of the introduced asymptotic speed, we show that the difference between the generalized speedup and the traditional speedup lies in the definition of the efficiency of uniprocessor processing, which is a very important issue in shared virtual memory machines. A scientific application has been implemented on a KSR-1 parallel computer. Experimental and theoretical results show that the generalized speedup is distinct from the traditional speedup and provides a more reasonable measurement. In the study of different speedups, various causes of superlinear speedup are also presented.

This research was supported by the National Aeronautics and Space Administration under NASA contract NAS while the first author was in residence at the Institute for Computer Applications in Science and Engineering (ICASE), NASA Langley Research Center, Hampton, VA.

1 Introduction

In recent years parallel processing has enjoyed unprecedented attention from researchers, government agencies, and industries. This attention is mainly due to the fact that, with the current circuit technology, parallel processing seems to be the only remaining way to achieve higher performance. However, while various parallel computers and algorithms have been developed, their performance evaluation is still elusive. In fact, the more advanced the hardware and software, the more difficult it is to evaluate the parallel performance. In this paper we target recent development of shared virtual memory machines and revisit the generalized speedup [17] performance metric.

Distributed-memory parallel computers dominate today's parallel computing arena. These machines, such as the Kendall Square KSR-1, Intel Paragon, and TMC CM-5, have successfully delivered high performance computing power for solving certain of the so-called "grand-challenge" problems. From the viewpoint of processes, there are two basic process synchronization and communication models. One is the shared-memory model in which processes communicate through shared variables. The other is the message-passing model in which processes communicate through explicit message passing. The shared-memory model provides a sequential program paradigm. With shared virtual address space, the shared-memory model supports shared virtual memory, but requires sophisticated hardware and system support. An example of a distributed-memory machine which supports shared virtual address space is the Kendall Square KSR-1. Traditionally, the message-passing model is bounded by the local memory of the processing processors. With recent technology advancement, the message-passing model has extended the ability to support shared virtual memory.

Shared virtual memory simplifies the software development and porting process by enabling even extremely large programs to run on a single processor before being partitioned and distributed across multiple processors.
However, the memory access of the shared virtual memory is non-uniform [8]. The access time of local memory and remote memory is different. Running a large program on a small number of processors is possible but could be very inefficient. The inefficient sequential processing will lead to a misleadingly high performance in terms of speedup or efficiency.

Generalized speedup, defined as parallel speed over sequential speed, is a new performance metric proposed in [17]. In this paper, we revisit generalized speedup and address the measurement issues. Through both theoretical proofs and experimental results, we show that generalized speedup provides a more reasonable measurement than traditional speedup. In the process of studying generalized speedup, the relations between the generalized speedup and many other metrics, such as efficiency, scaled speedup, and scalability, are also studied. Various reasons for superlinearity in different speedups are also discussed. Results show that the main difference between the traditional speedup and the generalized speedup is how to evaluate the efficiency of the sequential processing on a single processor.

The paper is organized as follows. In Section 2 we study traditional speedup, including the scaled speedup concept, and introduce some terminology. Analysis shows that the traditional speedup, fixed-size or scaled-size, may achieve superlinearity on shared virtual memory machines. Furthermore, with the traditional speedup metric, the slower the remote memory access is, the larger the speedup. Generalized speedup is studied in Section 3. The term asymptotic speed is introduced for the measurement of generalized speedup. Analysis shows the differences and the similarities between the generalized speedup and the traditional speedup. Efficiency and scalability issues are also discussed. Experimental results of a production application on a Kendall Square KSR-1 parallel computer are given in Section 4. Section 5 contains a summary.

2 The Traditional Speedup

One of the best accepted and the most frequently used performance metrics in parallel processing is speedup. It measures the parallel processing gain over sequential processing and is defined as sequential execution time over parallel execution time. Parallel algorithms often exploit parallelism by sacrificing mathematical efficiency. To measure the true parallel processing gain, the sequential execution time should be based on a commonly used sequential algorithm. To distinguish it from other interpretations of speedup, the speedup measured with a commonly used sequential algorithm has been called absolute speedup [14]. Absolute speedup is an important metric, especially when new parallel algorithms are introduced. Another widely used interpretation is the relative speedup [14], which uses the uniprocessor execution time of the parallel algorithm as the sequential time. There are several reasons to use the relative speedup. First, the performance of an algorithm varies with the number of processors. Relative speedup measures the variation.
Second, relative speedup avoids the difficulty of choosing the practical sequential algorithm, implementing the sequential algorithm, and matching the implementation/programming skill between the sequential algorithm and the parallel algorithm. Also, when problem size is fixed, the time ratio of the chosen sequential algorithm and the uniprocessor execution of the parallel algorithm is fixed. Therefore, the relative speedup is proportional to the absolute speedup. Relative speedup is the speedup commonly used in performance study. The well known Amdahl's law [1] and Gustafson's scaled speedup [4] are both based on relative speedup. In this study we will focus on relative speedup and reserve the terms traditional speedup and speedup for relative speedup. The concepts and results of this study can be extended to absolute speedup.

The absolute speedup and the relative speedup are distinguished by the sequential algorithm. After a sequential algorithm is chosen, from the problem size point of view, speedup can be further divided into the fixed-size speedup and the scaled speedup. Fixed-size speedup emphasizes how much execution time can be reduced with parallel processing. Amdahl's law is based on the fixed-size speedup. With one parameter, the sequential processing ratio, Amdahl's law gives the limitation of the fixed-size speedup. The scaled speedup is concentrated on exploring the computational power of parallel computers for solving otherwise intractable large problems. Depending on the scaling restrictions of the problem size, the scaled speedup can be classified as the fixed-time speedup and the memory-bounded speedup [18]. When p processors are used, fixed-time speedup scales problem size to meet the fixed execution time. Then the scaled problem is also solved on a uniprocessor to get the speedup. Corresponding to Amdahl's law, Gustafson has given a simple fixed-time speedup formula [5]. The memory-bounded speedup [18] is another practically used scaled speedup. It is defined in a similar way to the fixed-time speedup. The difference is that in memory-bounded speedup the problem size is scaled based on the available memory, while in fixed-time speedup the problem size is scaled up to meet the fixed execution time. A detailed study of the memory-bounded speedup can be found in [18].

Speedup can also be classified based on the achieved performance. Let $p$ and $S_p$ be the number of processors and the speedup with $p$ processors. The following terms were used in [7].

Definition 1
Super-linear speedup: $\lim_{p \to \infty} S_p / p = \infty$.
Linear super-unitary speedup: $p < S_p < cp$ for some constant $c > 1$.
Unitary speedup: $S_p = p$.
Linear sub-unitary speedup: $\epsilon p < S_p < p$ for some positive constant $\epsilon < 1$.
Sub-linear speedup: $\lim_{p \to \infty} S_p / p = 0$.

We say a speedup is a superlinear speedup if it is either super-linear or linear super-unitary. It is debatable if any machine-algorithm pair can achieve "truly" superlinear speedup. Four possible causes of superlinear speedup given in [7] are listed in Fig. 1.

1. cache size increased in parallel processing
2. overhead reduced in parallel processing
3. latency hidden in parallel processing
4. randomized algorithms

Figure 1. Causes of Superlinear Speedup: part 1

Cause 2 in Fig. 1 can be considered theoretically [15], but no measured superlinear speedup has ever been attributed to it. Cause 3 does not exist for relative speedup since both the sequential and parallel execution use the same algorithm. Cause 1 is unlikely to apply to scaled speedup, since when problem size scales up, by memory or by time constraint, the cache hit ratio is unlikely to increase. Two other causes of superlinear relative speedup and scaled speedup are listed in Fig. 2.

5. mathematical inefficiency of the serial algorithm
6. higher memory access latency in the sequential processing

Figure 2. Causes of Superlinear Speedup: part 2

Since parallel algorithms are often mathematically inefficient, cause 5 is a likely source of superlinear relative speedup. A good example of superlinear speedup based on cause 5 can be found in [13]. With the virtual memory and shared virtual memory architecture, cause 6 can lead to an extremely high speedup, especially for scaled speedup where an extremely large problem has to be run on a single processor. Figure 7 shows a measured superlinear speedup on a KSR-1 machine. The measured superlinear speedup is due to the inherent deficiency of the traditional speedup metric. To analyze the deficiency of the traditional speedup, we need to introduce the following definition.

Definition 2 The cost of parallelism $i$ is the ratio of the total number of processor cycles consumed in order to perform one unit operation of work when $i$ processors are active to the machine clock rate.

The sequential execution time can be written in terms of work:

$$\text{Sequential execution time} = \text{Amount of work} \times \frac{\text{Processor cycles per unit of work}}{\text{Machine clock rate}}. \qquad (1)$$

The ratio on the right hand side of Eq. (1), processor cycles per unit of work over machine clock rate, is the cost of sequential processing. Work can be defined as arithmetic operations, instructions, transitions, or whatever is needed to complete the application. In scientific computing the number of floating-point operations (FLOPS) is commonly used to measure work. In general, work may be of different types, and units of different operations may require different numbers of instruction cycles to finish.
(For example, the times consumed by one division and one multiplication may be different depending on the underlying machine, and operation and memory reference ratio may be different for different computations.) The influence of work type on the performance is one of the topics studied in [17]. In this paper, we study the influence of inefficient memory access on the performance. We assume that there is only one work type and that any increase in the number of processor cycles is due to inefficient memory access.

In a shared virtual memory environment, the memory available depends on the system size. Let $W_i$ be the amount of work executed when $i$ processors are active, and let $W = \sum_{i=1}^{p} W_i$ represent the total work. The cost of parallelism $i$ in a $p$ processor system, denoted as $c_p(i, W)$, is the elapsed time for one unit operation of work when $i$ processors are active. Then, $W_i \, c_p(i, W)$ gives the accumulated elapsed time where $i$ processors are active. $c_p(i, W)$ contains both computation time and remote memory access time. The uniprocessor execution time can be represented in terms of uniprocessor cost,

$$t(1) = \sum_{i=1}^{p} W_i \, c_p(s, W),$$

where $c_p(s, W)$ is the cost of sequential processing on a parallel system with $p$ processors. It is different from $c_p(1, W)$, which is the cost of the sequential portion of the parallel processing. Parallel execution time can be represented in terms of parallel cost,

$$t(p) = \sum_{i=1}^{p} \frac{W_i}{i} \, c_p(i, W).$$

The traditional speedup is defined as

$$S_p = \frac{t(1)}{t(p)} = \frac{\sum_{i=1}^{p} W_i \, c_p(s, W)}{\sum_{i=1}^{p} \frac{W_i}{i} \, c_p(i, W)}. \qquad (2)$$

If $c_p(i, W) = c_p(p, W)$ for $1 \le i < p$, then

$$S_p = \frac{c_p(s, W)}{c_p(p, W)} \cdot \frac{W}{\sum_{i=1}^{p} \frac{W_i}{i}}. \qquad (3)$$

The first ratio of Eq. (3) is the cost ratio, which gives the influence of memory access delay. The second ratio,

$$\frac{W}{\sum_{i=1}^{p} \frac{W_i}{i}}, \qquad (4)$$

is the simple analytic model based on degree of parallelism [18]. It assumes that memory access time is constant as problem size and system size vary. The cost ratio distinguishes the different performance analysis methods with or without consideration of the memory influence. In general, cost ratio depends on memory miss ratio, page replacement policy, data reference pattern, etc. For a simple case, if we assume there is no remote access in parallel processing and the remote access ratio of the sequential processing is $(p-1)/p$, then

$$\frac{c_p(s, W)}{c_p(p, W)} = \frac{1}{p} + \frac{p-1}{p} \cdot \frac{\text{time per remote access}}{\text{time per local access}}. \qquad (5)$$

Equation (5) approximately equals the time per remote access over the time per local access. Since the remote memory access is much slower than the local memory access under the current technology, the speedup given by Eq. (3) could be considerably larger than the simple analytic model (4). In fact, the slower the remote access is, the larger the difference. For the KSR-1, the time ratio of remote and local access is about 7.5 (see Section 4). Therefore, for $p = 32$, the cost ratio is 7.3. For any $W / \sum_{i=1}^{p} \frac{W_i}{i} > 0.14p$, under the assumed remote access ratio, we will have a superlinear speedup.

3 The Generalized Speedup

While parallel computers are designed for solving large problems, a single processor of a parallel computer is not designed to solve a very large problem. A uniprocessor does not have the computing power that the parallel system has. While solving a small problem is inappropriate on a parallel system, solving a large problem on a single processor is not appropriate either. To create a useful comparison, we need a metric that can vary problem sizes for uniprocessor and multiple processors. Generalized speedup [17] is one such metric.

$$\text{Generalized Speedup} = \frac{\text{Parallel Speed}}{\text{Sequential Speed}}. \qquad (6)$$

Speed is defined as the quotient of work and elapsed time. Parallel speed might be based on scaled parallel work. Sequential speed might be based on the unscaled uniprocessor work. By definition, generalized speedup measures the speed improvement of parallel processing over sequential processing. In contrast, the traditional speedup (2) measures time reduction of parallel processing. If the problem size (work) for both parallel and sequential processing is the same, the generalized speedup is the same as the traditional speedup. From this point of view, the traditional speedup is a special case of the generalized speedup. For this and for historical reasons, we sometimes call the traditional speedup the speedup, and call the speedup given in Eq. (6) the generalized speedup.
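The cost ratio of Eq. (5) and the definition in Eq. (6) can be checked numerically. The following is a minimal Python sketch under the assumptions stated in the text (no remote access in parallel processing, a remote access fraction of (p-1)/p in sequential processing). The function names and the sample work and time values are hypothetical and chosen only for illustration.

```python
def cost_ratio(p, remote_over_local):
    """Cost ratio of Eq. (5).

    Assumes no remote access in parallel processing and a remote access
    fraction of (p - 1) / p in sequential processing.
    """
    return 1.0 / p + (p - 1) / p * remote_over_local


def speed(work, elapsed_time):
    """Speed is the quotient of work and elapsed time."""
    return work / elapsed_time


def generalized_speedup(par_work, par_time, seq_work, seq_time):
    """Eq. (6): parallel speed over sequential speed.

    The parallel and sequential runs may use different amounts of work.
    """
    return speed(par_work, par_time) / speed(seq_work, seq_time)


# KSR-1 figures quoted in the text: p = 32, remote/local time ratio 7.5.
print(round(cost_ratio(32, 7.5), 1))  # 7.3

# Hypothetical run with equal work on both sides: Eq. (6) reduces to
# sequential time over parallel time, the traditional speedup.
print(generalized_speedup(1e9, 10.0, 1e9, 200.0))
```

With equal work on both sides the second call returns the plain time ratio, which is the sense in which the traditional speedup is a special case of the generalized speedup.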
Like the traditional speedup, the generalized speedup can also be further divided into fixed-size, fixed-time, and memory-bounded speedup. Unlike the traditional speedup, for the generalized speedup, the scaled problem is solved only on multiple processors. The fixed-time generalized speedup is sizeup [17]. The fixed-time benchmark SLALOM [6] is based on sizeup.

If memory access time is fixed, one might always assume that the uniprocessor cost $c_p(s, W)$ will be stabilized after some initial decrease (due to initialization, loop overhead, etc.), assuming the memory is large enough. When cache and remote memory access are considered, cost will increase when a slower memory has to be accessed. Figure 3 depicts the typical cost changing pattern.

[Figure 3. Cost Variation Pattern. Cost (sequential execution time) versus problem size, with regions labeled: fits in cache, fits in main memory, fits in remote memory, insufficient memory.]

From Eq. (1), we can see that uniprocessor speed is the reciprocal of uniprocessor cost. When the cost reaches its lowest value, the speed reaches its highest value. The uniprocessor speed corresponding to the stabilized main memory cost is called the asymptotic speed (of uniprocessor). Asymptotic speed represents the performance of the sequential processing with efficient memory access. The asymptotic speed is the appropriate sequential speed for Eq. (6). For memory-bounded speedup, the appropriate memory bound is the largest problem size which can maintain the asymptotic speed. After choosing the asymptotic speed as the sequential speed, the corresponding asymptotic cost has only local access and is independent of the problem size. We use $c(s, W_0)$ to denote the corresponding asymptotic cost, where $W_0$ is a problem size which achieves the asymptotic speed. If there is no remote access in parallel processing, as assumed in Section 2, then $c(s, W_0) / c_p(p, W) = 1$. By Eq. (3), the corresponding speedup equals the simple speedup which does not consider the influence of memory access time. In general, parallel work $W$ is not the same as $W_0$. So we have

$$\text{Generalized Speedup} = \frac{W \Big/ \sum_{i=1}^{p} \frac{W_i}{i} \, c_p(i, W)}{1 / c(s, W_0)} = \frac{W \, c(s, W_0)}{\sum_{i=1}^{p} \frac{W_i}{i} \, c_p(i, W)}. \qquad (7)$$

Equation (7) is another form of the generalized speedup. It is a quotient of sequential and parallel time, as is traditional speedup (2). The difference is that, in Eq. (7), the sequential time is based on the asymptotic speed. When remote memory is needed for sequential processing, $c(s, W_0)$ is smaller than $c_p(s, W)$. Therefore, the generalized speedup gives a smaller speedup than traditional speedup.

Parallel efficiency is defined as

$$\text{Efficiency} = \frac{\text{speedup}}{\text{number of processors}}. \qquad (8)$$
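Equations (2), (7), and (8) can be compared on a small work profile. The sketch below is illustrative only: the work distribution, the cost values, and all function names are hypothetical, and the costs are in arbitrary time units per unit of work.

```python
def parallel_time(work_by_dop, cost_by_dop):
    """t(p) = sum_i (W_i / i) * c_p(i, W).

    Both dictionaries are keyed by the degree of parallelism i.
    """
    return sum(w / i * cost_by_dop[i] for i, w in work_by_dop.items())


def traditional_speedup(work_by_dop, cost_by_dop, measured_seq_cost):
    """Eq. (2): sequential time uses the measured cost c_p(s, W)."""
    total_work = sum(work_by_dop.values())
    return total_work * measured_seq_cost / parallel_time(work_by_dop, cost_by_dop)


def generalized_speedup_from_costs(work_by_dop, cost_by_dop, asymptotic_cost):
    """Eq. (7): sequential time uses the asymptotic cost c(s, W_0)."""
    total_work = sum(work_by_dop.values())
    return total_work * asymptotic_cost / parallel_time(work_by_dop, cost_by_dop)


def efficiency(speedup, p):
    """Eq. (8): speedup over number of processors."""
    return speedup / p


# Hypothetical profile on p = 4 processors: 10 units of sequential work,
# 90 units of fully parallel work, local-access cost 1.0 everywhere.
p = 4
work = {1: 10.0, 4: 90.0}
cost = {1: 1.0, 4: 1.0}

# The measured uniprocessor cost is assumed to be 3.0 because of remote
# memory access; the asymptotic (local-access only) cost is 1.0.
s_trad = traditional_speedup(work, cost, measured_seq_cost=3.0)
s_gen = generalized_speedup_from_costs(work, cost, asymptotic_cost=1.0)
print(s_trad, efficiency(s_trad, p))
print(s_gen, efficiency(s_gen, p))
```

In this made-up case the traditional speedup exceeds p while the generalized speedup stays below it; the two differ exactly by the ratio of measured to asymptotic sequential cost.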

The generalized efficiency can be defined similarly as

$$\text{Generalized Efficiency} = \frac{\text{generalized speedup}}{\text{number of processors}}. \qquad (9)$$

By definition,

$$\text{Efficiency} = \frac{W \, c_p(s, W)}{p \sum_{i=1}^{p} \frac{W_i}{i} \, c_p(i, W)} \qquad (10)$$

and

$$\text{Generalized Efficiency} = \frac{W \, c(s, W_0)}{p \sum_{i=1}^{p} \frac{W_i}{i} \, c_p(i, W)}. \qquad (11)$$

Equations (10) and (11) show the difference between the two efficiencies. The traditional efficiency assumes that the measured sequential processing achieves one hundred percent efficiency. The generalized efficiency assumes that the sequential processing based on the asymptotic cost achieves one hundred percent efficiency. Traditional speedup compares parallel processing with the measured sequential processing. Generalized speedup compares parallel processing with the sequential processing based on the asymptotic cost. From this point of view, generalized speedup is a reform of traditional speedup. The following propositions are direct results of Eq. (7).

Proposition 1 If $c_p(s, W)$ is independent of problem size, traditional speedup is the same as generalized speedup.

Proposition 2 If the parallel work, $W$, achieves the asymptotic speed, that is $W = W_0$, then the fixed-size traditional speedup is the same as the fixed-size generalized speedup.

By Proposition 1, if the simple analytic model (4) is used to analyze performance, there is no difference between the traditional and the generalized speedup. If the problem size is larger than the suggested initial problem size, then the single processor speedup $S_1$ may not be equal to one. $S_1$ measures the sequential inefficiency due to the difference in memory access.

The generalized speedup is also closely related to the scalability study. Isospeed scalability has been proposed recently in [19]. The isospeed scalability measures the ability of an algorithm-machine combination to maintain the average (unit) speed, where the average speed is defined as the speed over the number of processors. When the system size is increased, the problem size is scaled up accordingly to maintain the average speed.
If the average speed can be maintained, we say the algorithm-machine combination is scalable and the scalability is

$$\psi(p, p') = \frac{p' \, W}{p \, W'}, \qquad (12)$$

where $W'$ is the amount of work needed to maintain the average speed when the system size has been changed from $p$ to $p'$, and $W$ is the problem size solved when $p$ processors were used. By definition,

$$\text{Average Speed} = \frac{W}{p \sum_{i=1}^{p} \frac{W_i}{i} \, c_p(i, W)}.$$

Since the sequential cost is fixed in Eq. (11), fixing average speed is equivalent to fixing generalized efficiency. Therefore the isospeed scalability can be seen as the iso-generalized-efficiency scalability. When the memory influence is not considered, i.e. $c_p(s, W)$ is independent of the problem size, the iso-generalized-efficiency will be the same as the iso-traditional-efficiency. In this case, the isospeed scalability is the same as the isoefficiency scalability proposed by Kumar [11, 8].

Proposition 3 If the sequential cost $c_p(s, W)$ is independent of problem size or if the simple analysis model (4) is used for speedup, the isoefficiency and isospeed scalability are equivalent to each other.

The following theorem gives the relation between the scalability and the fixed-time speedup.

Theorem 1 Scalability (12) equals one if and only if the fixed-time generalized speedup is unitary.

Proof: Let $c(s, W_0)$, $c_p(i, W)$, $W$, and $W_i$ be as defined in Eq. (7). If scalability (12) equals 1, let $W'$, $p'$ be as defined in Eq. (12) and define $W'_i$ similarly as $W_i$. We have

$$\frac{W}{p} = \frac{W'}{p'} \qquad (13)$$

for any number of processors $p$ and $p'$. By the definition of generalized speedup,

$$GS_p = \frac{W \, c(s, W_0)}{\sum_{i=1}^{p} \frac{W_i}{i} \, c_p(i, W)}.$$

With some arithmetic manipulation, we have

$$\frac{W}{p} = \frac{GS_p}{p} \cdot \frac{\sum_{i=1}^{p} \frac{W_i}{i} \, c_p(i, W)}{c(s, W_0)}.$$

Similarly, we have

$$\frac{W'}{p'} = \frac{GS_{p'}}{p'} \cdot \frac{\sum_{i=1}^{p'} \frac{W'_i}{i} \, c_{p'}(i, W')}{c(s, W_0)}.$$

By Eq. (13) and the above two equations,

$$\frac{GS_p}{p} \cdot \frac{\sum_{i=1}^{p} \frac{W_i}{i} \, c_p(i, W)}{c(s, W_0)} = \frac{GS_{p'}}{p'} \cdot \frac{\sum_{i=1}^{p'} \frac{W'_i}{i} \, c_{p'}(i, W')}{c(s, W_0)}.$$

For fixed-time speedup,

$$\sum_{i=1}^{p} \frac{W_i}{i} \, c_p(i, W) = \sum_{i=1}^{p'} \frac{W'_i}{i} \, c_{p'}(i, W').$$

Thus,

$$\frac{GS_p}{p} = \frac{GS_{p'}}{p'}.$$

For $p = 1$,

$$GS_{p'} = p' \cdot GS_1. \qquad (14)$$

Equation (14) is the corresponding unitary speedup when $GS_1$ is not equal to one. If the work $W$ equals $W_0$, then $GS_1 = 1$ and Eq. (14) becomes

$$GS_{p'} = p',$$

which is the unitary speedup defined in Definition 1.

If the fixed-time generalized speedup is unitary, then for any number of processors, $p$ and $p'$, and the corresponding problem sizes, $W$ and $W'$, where $W'$ is the scaled problem size under the fixed-time constraint, we have

$$\frac{W \, c(s, W_0)}{\sum_{i=1}^{p} \frac{W_i}{i} \, c_p(i, W)} = p$$

and

$$\frac{W' \, c(s, W_0)}{\sum_{i=1}^{p'} \frac{W'_i}{i} \, c_{p'}(i, W')} = p'.$$

Therefore,

$$\frac{W}{p \sum_{i=1}^{p} \frac{W_i}{i} \, c_p(i, W)} = \frac{W'}{p' \sum_{i=1}^{p'} \frac{W'_i}{i} \, c_{p'}(i, W')}.$$

The average speed is maintained. Also, since

$$\sum_{i=1}^{p} \frac{W_i}{i} \, c_p(i, W) = \sum_{i=1}^{p'} \frac{W'_i}{i} \, c_{p'}(i, W'),$$

we have the equality

$$\frac{W}{p} = \frac{W'}{p'}.$$

The scalability (12) equals one. $\Box$

The following theorem gives the relation between memory-bounded speedup and fixed-time speedup. The theorem is for generalized speedup. However, based on Proposition 1, the result is true for traditional speedup when uniprocessor cost is fixed or the simple analysis model is used.

Theorem 2 If problem size increases proportionally to the number of processors in memory-bounded scaleup, then memory-bounded generalized speedup is unitary if and only if fixed-time generalized speedup is unitary.

Proof: Let $c(s, W_0)$, $c_p(i, W)$, $W$, and $W_i$ be as defined in Theorem 1. Let $W'$, $W^*$ be the scaled problem size of fixed-time and memory-bounded scaleup respectively, and let $W'_i$ and $W^*_i$ be defined accordingly. If memory-bounded speedup is unitary, we have

$$\frac{W \, c(s, W_0)}{\sum_{i=1}^{p} \frac{W_i}{i} \, c_p(i, W)} = p$$

and

$$\frac{W^* \, c(s, W_0)}{\sum_{i=1}^{p'} \frac{W^*_i}{i} \, c_{p'}(i, W^*)} = p'.$$

Combining the two equations, we have the equation

$$\frac{W}{p \sum_{i=1}^{p} \frac{W_i}{i} \, c_p(i, W)} = \frac{W^*}{p' \sum_{i=1}^{p'} \frac{W^*_i}{i} \, c_{p'}(i, W^*)}. \qquad (15)$$

By assumption, $W^*$ is proportional to the number of processors available,

$$W^* = \frac{p'}{p} W. \qquad (16)$$

Substituting Eq. (16) into Eq. (15), we get the fixed-time equality:

$$\sum_{i=1}^{p} \frac{W_i}{i} \, c_p(i, W) = \sum_{i=1}^{p'} \frac{W^*_i}{i} \, c_{p'}(i, W^*). \qquad (17)$$

That is, $W^* = W'$, and the fixed-time generalized speedup is unitary.

If fixed-time speedup is unitary, then, following similar deductions as used for Eq. (15), we have

$$\frac{W}{p \sum_{i=1}^{p} \frac{W_i}{i} \, c_p(i, W)} = \frac{W'}{p' \sum_{i=1}^{p'} \frac{W'_i}{i} \, c_{p'}(i, W')}. \qquad (18)$$

Applying the fixed-time equality Eq. (17) to Eq. (18), we have the reduced equation

$$W' = \frac{p'}{p} W. \qquad (19)$$

With the assumption Eq. (16), Eq. (19) leads to

$$W' = W^*,$$

and memory-bounded generalized speedup is unitary. $\Box$

The following corollary is a direct result of Theorem 1 and Theorem 2.

Corollary 1 If work increases proportionally with the number of processors, then scalability (12) equals one if and only if the memory-bounded generalized speedup is unitary.

Finally, to complete our discussion on the superlinear speedup, there is a new cause of superlinearity for generalized speedup. The new source of superlinear speedup is called profile shifting [6], and is due to the problem size difference between sequential and parallel processing. An application may contain different work types. As problem size increases, some work types may increase faster than the others. When the work types with lower costs increase faster, superlinear speedup may occur. A superlinear speedup due to profile shifting was studied in [6].

7. profile shifting

Figure 4. Causes of Superlinear Speedup: part 3

4 Experimental Results

In this section, we discuss the timing results for solving an application problem on KSR-1 parallel computers. We first give brief descriptions of the architecture and the application problem, and then present the timing results and analyses.

4.1 The Machine

The machine to be discussed here can be viewed as a combination of (or a compromise between) the distributed and shared memory parallel architectures. Their hybrid is called the Shared Virtual Memory architecture. A representative of this category is the new KSR-1 parallel computer from Kendall Square Research. It has distributed physical memory, which makes the system scalable to a large number of processors, and a shared address space, which provides users a shared-memory-like programming environment.

Figure 5 shows the architecture of the KSR-1 parallel computer [9]. Each processor on the KSR-1 has 32 Mbytes of local memory. The CPU is a super-scalar processor with a peak performance of 40 Mflops in double precision. Processors are organized into different rings. The local ring (ring:0) can connect up to 32 processors, and a higher level ring of rings (ring:1) can contain up to 34 local rings with a maximum of 1088 processors. If a non-local data element is needed, the local search engine (SE:0) will search the processors in the local ring (ring:0).
If the search engine SE:0 cannot locate the data element within the local ring, the request will be passed to the search engine at the next level (SE:1) to locate the data.
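The level-by-level search described above can be sketched as a lookup over an ordered list of memory levels. This is only an illustrative model, not KSR system code: the names `KSR1_LEVELS`, `locate`, and `mean_latency` are hypothetical, and the cycle counts (2, 20, 150, and 570) are the KSR-1 latencies quoted in this section, treated here as parameters of the sketch.

```python
# (level name, access latency in processor cycles), nearest level first.
# The cycle counts are assumptions of this sketch taken from the KSR-1
# figures quoted in the text.
KSR1_LEVELS = (
    ("subcache", 2),
    ("local cache", 20),
    ("Group:0 cache", 150),   # reached through Search Engine:0
    ("Group:1 cache", 570),   # reached through Search Engine:1
)


def locate(item, contents, levels=KSR1_LEVELS):
    """Return (level name, latency) of the nearest level holding `item`.

    `contents` maps a level name to the collection of items resident there.
    Levels are searched in order, mirroring the escalation from the local
    memory to SE:0 and then to SE:1.
    """
    for name, cycles in levels:
        if item in contents.get(name, ()):
            return name, cycles
    raise LookupError(f"{item!r} not found at any level")


def mean_latency(access_fractions, levels=KSR1_LEVELS):
    """Average latency in cycles for a given mix of accesses per level."""
    latency = dict(levels)
    return sum(frac * latency[name] for name, frac in access_fractions.items())


contents = {"local cache": {"x"}, "Group:0 cache": {"y"}}
print(locate("x", contents))
print(locate("y", contents))

# A sequential run with 1/32 local and 31/32 ring:0 accesses, relative to
# purely local access; this reproduces the form of the cost ratio in Eq. (5).
mix = {"local cache": 1 / 32, "Group:0 cache": 31 / 32}
print(mean_latency(mix) / 20)
```

The last line ties the hierarchy back to Section 2: with a remote to local latency ratio of 150/20 = 7.5 and p = 32, the relative average latency is about 7.3.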

[Figure 5. Configuration of KSR-1 parallel computers. A ring:1 connects up to 34 ring:0s; each ring:0 connects up to 32 processors. P: processor; M: 32 Mbytes of local memory.]

The search is performed automatically by a hierarchy of search engines connected in a fat-tree-like structure [9, 12]. The memory hierarchy of the KSR-1 is shown in Fig. 6. Each processor has 512 Kbytes of fast subcache, which is similar to the normal cache on other parallel computers. This subcache is divided into two equal parts: an instruction subcache and a data subcache. The 32 Mbytes of local memory on each processor is called a local cache. A local ring (ring:0) with up to 32 processors can have 1 Gbyte total of local cache, which is called Group:0 cache. Access to the Group:0 cache is provided by Search Engine:0. Finally, a higher level ring of rings (ring:1) connects up to 34 local rings with 34 Gbytes of total local cache, which is called Group:1 cache. Access to the Group:1 cache is provided by Search Engine:1. The entire memory hierarchy is called ALLCACHE memory by Kendall Square Research. Access by a processor to the ALLCACHE memory system is accomplished by going through different Search Engines as shown in Fig. 6. The latencies for different memory locations [10] are: 2 cycles for subcache, 20 cycles for local cache, 150 cycles for Group:0 cache, and 570 cycles for Group:1 cache.

[Figure 6. Memory hierarchy of KSR-1. Processor, 512 KB subcache, 32 MB local cache, Search Engine:0, 1 GB Group:0 cache, Search Engine:1, 34 GB Group:1 cache.]

4.2 The Application

Least squares problems are frequently encountered in scientific and engineering applications. The major work of solving least squares problems is to solve the normal equation

$$A^T A x = A^T b \qquad (20)$$

by orthogonal factorization schemes (Householder transformations and Givens rotations). Efficient Householder algorithms have been discussed in [3] for shared memory supercomputers, and in [16] for distributed memory parallel computers. In many cases, for instance the inverse problem of partial differential equations [2], the normal equation system resulting from the discretization is too ill-conditioned to be solved directly. Tikhonov's regularization method [2] is frequently used in this case to increase numerical stability. The key step in this process is to introduce a regularization factor $\alpha > 0$. Instead of solving (20) directly, we solve the following system for $x$:

$$(A^T A + \alpha I) x = A^T b. \qquad (21)$$

Eq. (21) can also be written as

$$(A^T, \sqrt{\alpha} I) \begin{pmatrix} A \\ \sqrt{\alpha} I \end{pmatrix} x = (A^T, \sqrt{\alpha} I) \begin{pmatrix} b \\ 0 \end{pmatrix} \qquad (22)$$

or

$$B^T B x = B^T \begin{pmatrix} b \\ 0 \end{pmatrix}, \qquad (23)$$

so that the major task is to carry out the QR factorization for matrix $B$, which has the structure

$$B = \begin{pmatrix} a^{(1)}_{11} & a^{(1)}_{12} & \cdots & a^{(1)}_{1n} \\ \vdots & \vdots & & \vdots \\ a^{(1)}_{m1} & a^{(1)}_{m2} & \cdots & a^{(1)}_{mn} \\ \sqrt{\alpha} & & & \\ & \sqrt{\alpha} & & \\ & & \ddots & \\ & & & \sqrt{\alpha} \end{pmatrix}, \qquad (24)$$

where we usually have $m \ge n$ with $m$ of the same order as $n$. Matrix $B$ is neither a complete full matrix nor a sparse matrix. The upper part is full and the lower part is sparse (in diagonal form). Because of the special structure in (24), not all elements in the matrix are affected in a particular transformation step. In the first step, all elements within the frame in matrix (24), that is, its first $m+1$ rows, will be affected. In each new step, the frame in (24) will shift downwards one row with the leftmost column out of the game. Therefore, at the $i$th step, the submatrix $B_i$ affected in the transformation has the form

$$B_i = \begin{pmatrix} a^{(i)}_{ii} & \cdots & a^{(i)}_{in} \\ \vdots & & \vdots \\ a^{(i)}_{m+i-1,i} & \cdots & a^{(i)}_{m+i-1,n} \\ \sqrt{\alpha} & & \end{pmatrix}. \qquad (25)$$

If the columns of matrix $B_i$ of (25) are denoted by $b^i_j$, i.e.

$$B_i = [b^i_i, b^i_{i+1}, \ldots, b^i_n], \qquad (26)$$

then the Householder transformation can be described as:

Householder Transformation

Initialize matrix B
for i = 1, n
    1: $\alpha_i = -\mathrm{sgn}(a^{(i)}_{ii}) \, (b^{iT}_i b^i_i)^{1/2}$
    2: $w_i = b^i_i - \alpha_i e_1$
    3: $\beta_j = w_i^T b^i_j / (\alpha_i^2 - \alpha_i a^{(i)}_{ii})$,   j = i+1, ..., n
    4: $b^i_j = b^i_j - \beta_j w_i$,   j = i+1, ..., n
end for

The calculation of the $\beta_j$'s and the updating of the $b^i_j$'s can be done in parallel for different index $j$.

4.3 Timing Results

The numerical experiments reported here were conducted on the KSR-1 parallel computer installed at the Cornell Theory Center. There are 128 processors altogether on the machine. During the period when our experiments were performed, however, the computer was configured as two stand-alone machines with 64 processors each. Therefore, the numerical results were obtained using fewer than 64 processors.

Figure 7 shows the traditional fixed-size speedup curves obtained by solving the regularized least squares problem with different matrix sizes n. The matrix is of dimensions $2n \times n$. We can see clearly that as the matrix size n increases, the speedup is getting better and better. For the case when n = 2048, the speedup is 76 on 56 processors. Although it is well known that on most parallel computers the speedup improves as the problem size increases, what is shown in Fig. 7 is certainly too good to be a reasonable measurement of the real performance of the KSR-1.

The problem with the traditional speedup is that it is defined as the ratio of the sequential time to the parallel time used for solving the same fixed-size problem. The complex memory hierarchy on the KSR-1 makes the computational speed of a single processor highly dependent on the problem size. When the problem is so big that not all data of the matrix can be put in the local memory (32 Mbytes) of the single computing processor, part of the data must be put in the local memory of other processors on the system. These data are accessed by the computing processor through Search Engine:0. As a result, the computational speed on a single processor slows down significantly due to the high latency of Group:0 cache. The sustained computational speed on a single processor is 5.5 Mflops, 4.5 Mflops and 2.7 Mflops for problem sizes 1024, 1600 and 2048 respectively.
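The effect of these single-processor speeds on the fixed-size speedup can be estimated with a short calculation. For a fixed problem size, the traditional speedup is the parallel speed over the measured sequential speed, so dividing it by the ratio of asymptotic to measured sequential speed gives the speedup relative to the asymptotic speed. The sketch below is a back-of-the-envelope conversion, not a result reported here; the function names are hypothetical, and the asymptotic speed of 5.5 Mflops is the local-cache speed of a single KSR-1 processor.

```python
# Sustained single-processor speeds in Mflops for each matrix size n,
# as quoted in the text.
MEASURED_SEQ_SPEED = {1024: 5.5, 1600: 4.5, 2048: 2.7}

# Asymptotic speed: data fits in the 32 Mbyte local memory of one processor.
ASYMPTOTIC_SPEED = 5.5


def inflation_factor(n):
    """How much the traditional fixed-size speedup is inflated for size n."""
    return ASYMPTOTIC_SPEED / MEASURED_SEQ_SPEED[n]


def speedup_vs_asymptotic(traditional_speedup, n):
    """Fixed-size speedup measured against the asymptotic sequential speed."""
    return traditional_speedup / inflation_factor(n)


for n in sorted(MEASURED_SEQ_SPEED):
    print(n, round(inflation_factor(n), 2))

# The measured traditional speedup of 76 on 56 processors for n = 2048
# corresponds to a sublinear value once the degraded uniprocessor speed
# is replaced by the asymptotic speed.
print(round(speedup_vs_asymptotic(76, 2048), 1))
```

Under these assumptions the superlinear value of 76 on 56 processors drops to roughly 37, which is below the number of processors.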
On the other hand, with multiple processors, most of the data needed are in the local memory of each processor, so the computational speed suffers less from the high Group:0 cache latency. Therefore, the excellent speedups shown in Fig. 7 are the result of significant uniprocessor performance degradation when a large problem is solved on a single processor.

[Figure 7. Fixed-size (traditional) speedup on the KSR-1: speedup versus number of processors, showing the ideal speedup and curves for n = 1024, n = 1600 and n = 2048.]

Figure 8 shows the measured single-processor speed as a function of problem size n. The Householder Transformation algorithm given before was implemented in KSR Fortran. The algorithm has a numerical complexity of $W = 2n^3 + 8.5n^2 + 26.5n$, and the speed is calculated using $s = W/t$, where t is the CPU time used to finish the computation. As can be seen from Fig. 8, the three segments represent significantly different speeds for different matrix sizes. When the whole matrix fits into the subcache, the performance is close to 7 Mflops. The speed decreases to around 5.5 Mflops when the matrix no longer fits into the subcache but can still be accommodated in the local cache. When the matrix is so big that access to the Group:0 cache through Search Engine:0 is needed, however, the performance degrades significantly and there is no clear stable performance level of the kind observed in the other two segments. This is largely due to the high Group:0 cache latency and the contention for the Search Engine, which is used by all processors on the machine. Therefore, the access time of the Group:0 cache is less uniform than that of the subcache and local cache.

[Figure 8. Speed variation of uniprocessor processing on the KSR-1: speed versus order of the matrices, with segments labeled Subcache, All Cache and Remote Memory.]

To take the difference of single-processor speeds for different problem sizes into consideration, we have to use the generalized speedup to measure the performance of multiple processors on the KSR-1. As can be seen from the definition in Eq. (6), the generalized speedup is defined as the ratio of the parallel speed to the asymptotic sequential speed, where the parallel speed is based on a scaled problem. In our numerical tests, the parallel problem was scaled in a memory-bounded fashion as the number of processors increases. The initial problem was selected based on the asymptotic speed (5.5 Mflops from Fig. 8) and then scaled proportionally according to the number of processors, i.e. with p processors, the problem is scaled to a size that will fill $M \times p$ Mbytes of memory, where M is the memory required by the unscaled problem.

Figure 9 compares the traditional scaled speedup and the generalized speedup. For the traditional scaled speedup, the scaled problem is solved on both one and p processors, and the value of the speedup is calculated as the ratio of the time on one processor to that on p processors. For the generalized speedup, by contrast, the scaled problem is solved only on multiple processors, not on a single processor. The value of the speedup is calculated using Eq. (6), where the asymptotic speed is used for the sequential speed. Fig. 9 clearly shows that the generalized speedup gives a much more reasonable performance measurement on the KSR-1 than does the traditional scaled speedup. With the traditional scaled speedup, the speedup is above 20 with only 10 processors. This excellent superlinear speedup is a result of the severely degraded single-processor speed, rather than the perfect scalability of the machine and the algorithm.

[Figure 9. Comparison of generalized and traditional speedup on the KSR-1: speedup versus number of processors, with curves for ideal, generalized and traditional speedup.]

5 Conclusion

Since the scaled-up principle was proposed in 1988 by Gustafson and other researchers at Sandia National Laboratory [5], the principle has been widely used in performance measurement of parallel algorithms and architectures. One difficulty of measuring scaled speedup is that very large problems have to be solved on a uniprocessor, which is very inefficient if virtual memory is supported, or is impossible otherwise. To overcome this shortcoming, generalized speedup was proposed and studied by Gustafson and Sun [17]. Generalized speedup is defined as parallel speed over sequential speed and does not require solving large problems on a uniprocessor. The study [17] emphasized the fixed-time generalized speedup, sizeup. To meet the need of the emerging shared virtual memory machines, the generalized speedup, particularly implementation issues, has been carefully studied in the current research. It has been shown that traditional speedup is a special case of generalized speedup, and, on the other hand, generalized speedup is a reform of traditional speedup. The main difference between generalized speedup and traditional speedup is how to define the uniprocessor efficiency. When uniprocessor speed is fixed, these two speedups are the same. Extending these results to the study of scalability, we have found that the difference between isospeed scalability [19] and isoefficiency scalability [11] is also due to the uniprocessor efficiency. When the uniprocessor speed is independent of the problem size, these two proposed scalabilities are the same. As part of the performance study, we have shown that an algorithm-machine combination achieves a perfect scalability if and only if it achieves a perfect speedup. Seven causes of superlinear speedup are also listed.

A scientific application has been implemented on a Kendall Square KSR-1 shared virtual memory machine. Experimental results show that uniprocessor efficiency is an important issue for virtual memory machines, and that the asymptotic speed provides a reasonable way to define the uniprocessor efficiency.

The results in this paper on shared virtual memory can be extended to general parallel computers. Since uniprocessor efficiency is directly related to parallel execution time, scalability, and benchmark evaluations, the range of applicability of the uniprocessor efficiency study is wider than speedups. The uniprocessor efficiency might be explored further in a number of contexts.

Acknowledgement

The authors are grateful to the Cornell Theory Center for providing access to its KSR-1 parallel computer.

References

[1] Amdahl, G. Validity of the single-processor approach to achieving large scale computing capabilities. In Proc. AFIPS Conf. (1967), pp. 483-485.
[2] Chen, Y. M., Zhu, J. P., Chen, W. H., and Wasserman, M. L. GPST inversion algorithm for history matching in 3-d 2-phase simulators. In IMACS Trans. on Scientific Computing I (1989), pp. 369-374.
[3] Dongarra, J., Duff, I. S., Sorensen, D. C., and van der Vorst, H. A. Solving Linear Systems on Vector and Shared Memory Computers. SIAM, Philadelphia.
[4] Gustafson, J. Reevaluating Amdahl's law. Communications of the ACM 31 (May 1988), 532-533.
[5] Gustafson, J., Montry, G., and Benner, R. Development of parallel methods for a 1024-processor hypercube. SIAM J. of Sci. and Stat. Computing 9, 4 (July 1988), 609-638.
[6] Gustafson, J., Rover, D., Elbert, S., and Carter, M. The design of a scalable, fixed-time computer benchmark. J. of Parallel and Distributed Computing 12, 4 (1991), 388-401.
[7] Helmbold, D., and McDowell, C. Modeling speedup(n) greater than n. In Proc. of the 1989 Int'l Conf. on Parallel Processing, Vol. III (1989), pp. 219-225.
[8] Hwang, K. Advanced Computer Architecture: Parallelism, Scalability, Programmability. McGraw-Hill Book Co.
[9] Kendall Square Research. KSR parallel programming. Waltham, USA.
[10] Kendall Square Research. KSR technical summary. Waltham, USA.
[11] Kumar, V., and Gupta, A. Analysis of scalability of parallel algorithms and architectures: A survey. In Proc. of 1991 Int'l Conf. on Supercomputing (June 1991), pp. 396-405.
[12] Leiserson, C. Fat-trees: Universal networks for hardware-efficient supercomputing. IEEE Transactions on Computing 34, 10 (1985), 892-901.
[13] Nicol, D. Inflated speedups in parallel simulations via malloc(). International Journal on Simulation 2 (Dec. 1992), 413-426.
[14] Ortega, J., and Voigt, R. Solution of partial differential equations on vector and parallel computers. SIAM Review (June 1985), 149-240.
[15] Parkinson, D. Parallel efficiency can be greater than unity. Parallel Computing 3 (1986), 261-262.
[16] Pothen, A., and Raghavan, P. Distributed orthogonal factorization: Givens and Householder algorithms. SIAM J. of Sci. and Stat. Computing 10 (1989), 1113-1135.
[17] Sun, X.-H., and Gustafson, J. Toward a better parallel performance metric. Parallel Computing 17 (Dec. 1991), 1093-1119.
[18] Sun, X.-H., and Ni, L. Scalable problems and memory-bounded speedup. J. of Parallel and Distributed Computing 19 (Sept. 1993), 27-37.
[19] Sun, X.-H., and Rover, D. Scalability of parallel algorithm-machine combinations. IEEE Transactions on Parallel and Distributed Systems (1994). To appear.
[20] Tikhonov, A. N., and Arsenin, V. Solution of Ill-posed Problems. John Wiley and Sons.


More information

Machine Learning: Algorithms and Applications

Machine Learning: Algorithms and Applications 14/05/1 Machne Learnng: Algorthms and Applcatons Florano Zn Free Unversty of Bozen-Bolzano Faculty of Computer Scence Academc Year 011-01 Lecture 10: 14 May 01 Unsupervsed Learnng cont Sldes courtesy of

More information

Network Coding as a Dynamical System

Network Coding as a Dynamical System Network Codng as a Dynamcal System Narayan B. Mandayam IEEE Dstngushed Lecture (jont work wth Dan Zhang and a Su) Department of Electrcal and Computer Engneerng Rutgers Unversty Outlne. Introducton 2.

More information

Positive Semi-definite Programming Localization in Wireless Sensor Networks

Positive Semi-definite Programming Localization in Wireless Sensor Networks Postve Sem-defnte Programmng Localzaton n Wreless Sensor etworks Shengdong Xe 1,, Jn Wang, Aqun Hu 1, Yunl Gu, Jang Xu, 1 School of Informaton Scence and Engneerng, Southeast Unversty, 10096, anjng Computer

More information

with `ook-ahead for Broadcast WDM Networks TR May 14, 1996 Abstract

with `ook-ahead for Broadcast WDM Networks TR May 14, 1996 Abstract HPeR-`: A Hgh Performance Reservaton Protocol wth `ook-ahead for Broadcast WDM Networks Vjay Svaraman George N. Rouskas TR-96-06 May 14, 1996 Abstract We consder the problem of coordnatng access to the

More information

A SYSTOLIC APPROACH TO LOOP PARTITIONING AND MAPPING INTO FIXED SIZE DISTRIBUTED MEMORY ARCHITECTURES

A SYSTOLIC APPROACH TO LOOP PARTITIONING AND MAPPING INTO FIXED SIZE DISTRIBUTED MEMORY ARCHITECTURES A SYSOLIC APPROACH O LOOP PARIIONING AND MAPPING INO FIXED SIZE DISRIBUED MEMORY ARCHIECURES Ioanns Drosts, Nektaros Kozrs, George Papakonstantnou and Panayots sanakas Natonal echncal Unversty of Athens

More information

Kent State University CS 4/ Design and Analysis of Algorithms. Dept. of Math & Computer Science LECT-16. Dynamic Programming

Kent State University CS 4/ Design and Analysis of Algorithms. Dept. of Math & Computer Science LECT-16. Dynamic Programming CS 4/560 Desgn and Analyss of Algorthms Kent State Unversty Dept. of Math & Computer Scence LECT-6 Dynamc Programmng 2 Dynamc Programmng Dynamc Programmng, lke the dvde-and-conquer method, solves problems

More information

X- Chart Using ANOM Approach

X- Chart Using ANOM Approach ISSN 1684-8403 Journal of Statstcs Volume 17, 010, pp. 3-3 Abstract X- Chart Usng ANOM Approach Gullapall Chakravarth 1 and Chaluvad Venkateswara Rao Control lmts for ndvdual measurements (X) chart are

More information

Finite Element Analysis of Rubber Sealing Ring Resilience Behavior Qu Jia 1,a, Chen Geng 1,b and Yang Yuwei 2,c

Finite Element Analysis of Rubber Sealing Ring Resilience Behavior Qu Jia 1,a, Chen Geng 1,b and Yang Yuwei 2,c Advanced Materals Research Onlne: 03-06-3 ISSN: 66-8985, Vol. 705, pp 40-44 do:0.408/www.scentfc.net/amr.705.40 03 Trans Tech Publcatons, Swtzerland Fnte Element Analyss of Rubber Sealng Rng Reslence Behavor

More information

Chapter 1. Comparison of an O(N ) and an O(N log N ) N -body solver. Abstract

Chapter 1. Comparison of an O(N ) and an O(N log N ) N -body solver. Abstract Chapter 1 Comparson of an O(N ) and an O(N log N ) N -body solver Gavn J. Prngle Abstract In ths paper we compare the performance characterstcs of two 3-dmensonal herarchcal N-body solvers an O(N) and

More information

High-Boost Mesh Filtering for 3-D Shape Enhancement

High-Boost Mesh Filtering for 3-D Shape Enhancement Hgh-Boost Mesh Flterng for 3-D Shape Enhancement Hrokazu Yagou Λ Alexander Belyaev y Damng We z Λ y z ; ; Shape Modelng Laboratory, Unversty of Azu, Azu-Wakamatsu 965-8580 Japan y Computer Graphcs Group,

More information

Loop Transformations for Parallelism & Locality. Review. Scalar Expansion. Scalar Expansion: Motivation

Loop Transformations for Parallelism & Locality. Review. Scalar Expansion. Scalar Expansion: Motivation Loop Transformatons for Parallelsm & Localty Last week Data dependences and loops Loop transformatons Parallelzaton Loop nterchange Today Scalar expanson for removng false dependences Loop nterchange Loop

More information

Range images. Range image registration. Examples of sampling patterns. Range images and range surfaces

Range images. Range image registration. Examples of sampling patterns. Range images and range surfaces Range mages For many structured lght scanners, the range data forms a hghly regular pattern known as a range mage. he samplng pattern s determned by the specfc scanner. Range mage regstraton 1 Examples

More information

Query Clustering Using a Hybrid Query Similarity Measure

Query Clustering Using a Hybrid Query Similarity Measure Query clusterng usng a hybrd query smlarty measure Fu. L., Goh, D.H., & Foo, S. (2004). WSEAS Transacton on Computers, 3(3), 700-705. Query Clusterng Usng a Hybrd Query Smlarty Measure Ln Fu, Don Hoe-Lan

More information

Load-Balanced Anycast Routing

Load-Balanced Anycast Routing Load-Balanced Anycast Routng Chng-Yu Ln, Jung-Hua Lo, and Sy-Yen Kuo Department of Electrcal Engneerng atonal Tawan Unversty, Tape, Tawan sykuo@cc.ee.ntu.edu.tw Abstract For fault-tolerance and load-balance

More information

Private Information Retrieval (PIR)

Private Information Retrieval (PIR) 2 Levente Buttyán Problem formulaton Alce wants to obtan nformaton from a database, but she does not want the database to learn whch nformaton she wanted e.g., Alce s an nvestor queryng a stock-market

More information

An Accurate Evaluation of Integrals in Convex and Non convex Polygonal Domain by Twelve Node Quadrilateral Finite Element Method

An Accurate Evaluation of Integrals in Convex and Non convex Polygonal Domain by Twelve Node Quadrilateral Finite Element Method Internatonal Journal of Computatonal and Appled Mathematcs. ISSN 89-4966 Volume, Number (07), pp. 33-4 Research Inda Publcatons http://www.rpublcaton.com An Accurate Evaluaton of Integrals n Convex and

More information

A NOTE ON FUZZY CLOSURE OF A FUZZY SET

A NOTE ON FUZZY CLOSURE OF A FUZZY SET (JPMNT) Journal of Process Management New Technologes, Internatonal A NOTE ON FUZZY CLOSURE OF A FUZZY SET Bhmraj Basumatary Department of Mathematcal Scences, Bodoland Unversty, Kokrajhar, Assam, Inda,

More information

Newton-Raphson division module via truncated multipliers

Newton-Raphson division module via truncated multipliers Newton-Raphson dvson module va truncated multplers Alexandar Tzakov Department of Electrcal and Computer Engneerng Illnos Insttute of Technology Chcago,IL 60616, USA Abstract Reducton n area and power

More information

Speedup of Type-1 Fuzzy Logic Systems on Graphics Processing Units Using CUDA

Speedup of Type-1 Fuzzy Logic Systems on Graphics Processing Units Using CUDA Speedup of Type-1 Fuzzy Logc Systems on Graphcs Processng Unts Usng CUDA Durlabh Chauhan 1, Satvr Sngh 2, Sarabjeet Sngh 3 and Vjay Kumar Banga 4 1,2 Department of Electroncs & Communcaton Engneerng, SBS

More information

Simulation Based Analysis of FAST TCP using OMNET++

Simulation Based Analysis of FAST TCP using OMNET++ Smulaton Based Analyss of FAST TCP usng OMNET++ Umar ul Hassan 04030038@lums.edu.pk Md Term Report CS678 Topcs n Internet Research Sprng, 2006 Introducton Internet traffc s doublng roughly every 3 months

More information

Optimizing Document Scoring for Query Retrieval

Optimizing Document Scoring for Query Retrieval Optmzng Document Scorng for Query Retreval Brent Ellwen baellwe@cs.stanford.edu Abstract The goal of ths project was to automate the process of tunng a document query engne. Specfcally, I used machne learnng

More information

Scheduling Remote Access to Scientific Instruments in Cyberinfrastructure for Education and Research

Scheduling Remote Access to Scientific Instruments in Cyberinfrastructure for Education and Research Schedulng Remote Access to Scentfc Instruments n Cybernfrastructure for Educaton and Research Je Yn 1, Junwe Cao 2,3,*, Yuexuan Wang 4, Lanchen Lu 1,3 and Cheng Wu 1,3 1 Natonal CIMS Engneerng and Research

More information

2x x l. Module 3: Element Properties Lecture 4: Lagrange and Serendipity Elements

2x x l. Module 3: Element Properties Lecture 4: Lagrange and Serendipity Elements Module 3: Element Propertes Lecture : Lagrange and Serendpty Elements 5 In last lecture note, the nterpolaton functons are derved on the bass of assumed polynomal from Pascal s trangle for the fled varable.

More information

Quality Improvement Algorithm for Tetrahedral Mesh Based on Optimal Delaunay Triangulation

Quality Improvement Algorithm for Tetrahedral Mesh Based on Optimal Delaunay Triangulation Intellgent Informaton Management, 013, 5, 191-195 Publshed Onlne November 013 (http://www.scrp.org/journal/m) http://dx.do.org/10.36/m.013.5601 Qualty Improvement Algorthm for Tetrahedral Mesh Based on

More information

Proper Choice of Data Used for the Estimation of Datum Transformation Parameters

Proper Choice of Data Used for the Estimation of Datum Transformation Parameters Proper Choce of Data Used for the Estmaton of Datum Transformaton Parameters Hakan S. KUTOGLU, Turkey Key words: Coordnate systems; transformaton; estmaton, relablty. SUMMARY Advances n technologes and

More information

Some Advanced SPC Tools 1. Cumulative Sum Control (Cusum) Chart For the data shown in Table 9-1, the x chart can be generated.

Some Advanced SPC Tools 1. Cumulative Sum Control (Cusum) Chart For the data shown in Table 9-1, the x chart can be generated. Some Advanced SP Tools 1. umulatve Sum ontrol (usum) hart For the data shown n Table 9-1, the x chart can be generated. However, the shft taken place at sample #21 s not apparent. 92 For ths set samples,

More information

Intra-procedural Inference of Static Types for Java Bytecode 1

Intra-procedural Inference of Static Types for Java Bytecode 1 McGll Unversty School of Computer Scence Sable Research Group Intra-procedural Inference of Statc Types for Java Bytecode 1 Sable Techncal Report No. 5 Etenne Gagnon Laure Hendren October 14, 1998 w w

More information