A Fast Parallel Reed-Solomon Decoder On a Reconfigurable Architecture

Size: px
Start display at page:

Download "A Fast Parallel Reed-Solomon Decoder On a Reconfigurable Architecture"

Transcription

1 A Fast Parallel Reed-Solomon Decoder On a Reconfgurable Archtecture Arezou Kooh, Nader Bagherzadeh, Chengz Pan EECS Department EECS Department EECS Department Unversty of Calforna Irvne Unversty of Calforna, Irvne Unversty of Calforna, Irvne akooh@ece.uc.edu Nader@ece.uc.edu panc@uc.edu ABSTRACT Ths paper presents a software mplementaton of a very fast parallel Reed-Solomon decoder on the second generaton of MorphoSys reconfgurable computaton platform, whch s targetng on streamed applcatons such as multmeda and DSP. Numerous modfcatons of the frst-generaton of the archtecture have made a scalable computaton and communcaton ntensve archtecture capable of extractng parallelsms of fne gran n nstructon level. Many algorthms and the whole Dgtal Vdeo Broadcastng base-band recever as well, have been mapped onto the second archtecture wth mpressng performance. The mappng of a Reed-Solomon decoder proposed n ths paper hghly parallelzes all of ts sub-algorthms, ncludng Syndrome Computaton, Berlekamp Algorthm, Chen Search, and Error Value Computaton, n a SIMD fashon. The mappng s tested on a cycle-accurate smulator, Mulate, and the performance s encouragngly better than other archtectures. The decodng speed of the RS (255,239,16) decoder usng two dfferent methods of GF multplcaton can be 1.319Gbps and 2.534Gbps, respectvely. Furthermore, snce there s no functonalty specfcally talored to Reed-Solomon decoder, the result has demonstrated the capablty of MorphoSys archtecture to extractng Instructon Level Parallelsm from streamed applcatons. Categores and Subject Descrptors C.1.2 [Computer System Organzaton]: Processor Archtectures, Multple Data Stream Archtectures (Multprocessors), Sngle-nstructon-stream, Multple-Data - Stream Processors General Terms Algorthms, Performance, Desgn, Expermentaton Keywords: Reconfgurable Archtecture, SIMD Processor, Reed_Solomon codes, Berlekamp Algorthm, Chen Search 1. INTRODUCTION Toward a comng bllon-transstor era, today s computaton platform desgn has already foreseen the end of the road for conventonal mcro-archtectures [4], and numerous new Permsson to make dgtal or hard copes of all or part of ths work for personal or classroom use s granted wthout fee provded that copes are not made or dstrbuted for proft or commercal advantage and that copes bear ths notce and the full ctaton on the frst page. To copy otherwse, or republsh, to post on servers or to redstrbute to lsts, requres pror specfc permsson and/or a fee. CODES+ISSS 03, October 1 3, 2003, Newport Beach, Calforna, USA. Copyrght 2003 ACM /03/0010 $5.00. approaches have arsen above the horzon, such as EPIC (Itanum 2) [5], RAW [6], Imagne [7], and VIRAM [8], etc. Many of them target on stream applcatons, whch have already been consumng more than 90% of total computng cycles nowadays [7]. The bggest challenge of archtecture desgn s the scalablty, only wth whch can one follow up the step of Moore s Law. The dffculty of scalablty s mposed by slower decrease of wre transmsson delay than that of transstor swtchng delay. Ths dscrepancy requres a new phlosophy on desgn of scalar operand network [9] and memory herarchy. In ths paper, we wll ntroduce the 2 nd -generaton MorphoSys reconfgurable archtecture called M2, a computaton and communcaton ntensve platform capable of extractng fne gran parallelsms at the nstructon level. Many algorthms and the whole Dgtal Vdeo Broadcastng base-band recever as well, have been mapped onto M2 wth mpressng performance. The mappng of a Reed-Solomon decoder s proposed n ths paper. RS codes are powerful block codes wdely used as an error correcton method n the areas such as dgtal communcaton, dgtal dsc error correcton, etc. Recently, concatenated codes made up of convolutonal codes followed by RS codes have been proved as an effcent way of error correcton n wreless data communcaton systems. We present M2 Archtecture at the frst secton of the paper, ncludng the archtecture dfferent parts, M2 mplementaton and ts programmng model. Secton 3 ntroduces the RS decodng Algorthms and the mappng procedure on the M2. The secton also ncludes the M2 archtecture feasblty for dfferent parts of the Algorthms. In secton 4 we compare the result of the RS (255,239,16) decoder wth the exstng benchmarks ncludng the TI64x and dfferent Ascs. The decodng speed of the RS (255,239,16) decoder usng two dfferent methods of GF multplcaton can be 1.319Gbps and 2.534Gbps, respectvely 2. MORPHOSYS RECONFIGURABLE ARCHITECTURE 2.1. Introducton of MorphoSys MorphoSys s a reconfgurable computaton platform targetng computaton ntensve data parallel applcatons, ncludng streamed applcatons. M1 [1-2], the frst prototype of MorphoSys, has been used as a platform for many applcatons such as multmeda, wreless communcaton, sgnal processng, and computer graphcs. M2, the 2 nd generaton of MorphoSys, follows the basc concepts of MorphoSys; however, t s redesgned n both scalar operand network and memory herarchy, thus greatly enhanced n performance. Feedbacks from numerous 59

2 kernel and system mappngs ponted out M1 s bottlenecks, whch have been revsted n M2. M2 archtecture conssts of three man subsystems: a core processor called TnyRISC, an array of 64 Reconfgurable Cells (RCs) organzed n SIMD fashon, and a specal data movement unt called Frame Buffer (FB). The programmng model s smple. TnyRISC takes charge of the whole Data Control Flow (DCF); RC array and Frame Buffer are only trggered by TnyRISC and executng on ther own confguratons (called context) contnuously for a gven number of cycles specfed by TnyRISC. The man dfference between M1 and M2 s descrbed n [3]. Fgure 1 shows the MorphoSys dagram. Man Memory can be ether on-chp or off-chp wthout sgnfcant dfference of the connecton nterface, as long as Man Memory s also composed of 64 banks M2 mplementaton The followng table-1 gves out the characterstcs of M2 and compares t wth the other processors. Results of frst order smulaton usng Synopss tools show that the crtcal-path delay s about 1.8~2.3ns. Hence, 450 MHz s used n our smulaton. Other data are ether based on M1 mplementaton and M2 postsynthess (current status), or projected as our desgn am, whch are chosen no more aggressve than commercal processors. Though ndependently desgned, M2 combnes the structural advantages of three mportant archtecture parameters: the overall structure of host-processor/slave-computaton-fabrc, the controlled sze of dstrbuted memory wthn each AlU clusters and the powerful sequental code, whch s hard to be parallelzed. Second, the modest sze of dstrbuted memory enables more AlUs to be ntegrated n the array and, more mportantly, reduces the latency of scalar operand network, whch s essental to extract to adopt wde range of optmzaton technques, both coarse-gran and fne-gran, and avods usage of hardware-specfc languages as StreamC. Man Memory Frame Buffer DMA TnyRISC RC Array Context Memory Fgure 1. MorphoSys Dagram Table 1. Comparson of M2 and other archtectures* the latency s analyzed n 5-tuple <send occupancy, send latency, network hop latency, receve latency, receve occupancy> [9]. VIRAM Imagne RAW M2 (wth/wthout on-chp memory) Parallelsm Capablty Scalar Operand Network Memory Herarchy Parallelsm Vector SIMD MIMD SIMD model Peak 16-bt OPS 6.4G 23.7G 3.6G (32-bt) 28.8G Clock Speed 200MHz 296MHz 225MHz 450MHz Chp area 15*18mm 2 12*12mm 2 18*18mm 2 8*8mm 2 /16*16mm 2 # Transstors 120M 21M 122M 20M/120M Power 2W(averag e) 4W 25W 4W(peak MAC) Technology 0.18µm 0.15µm 0.15µm 0.13µm # Network 8(Banks) nodes Total bandwdth 51.2Gbps 75.8Gbps 922Gbps 922Gbps Latency * <0,1,1,1,0> <0,1,1,1,0> <0,1,1~6,1,0> <0,0,1~2,0,0> Full permutaton No Yes Yes Yes 1 st level sze 64Kb(VRF 96Kb(LRF) 16.4Kb(RF) 16.4Kb(RF) ) 2 nd level sze 104Mb(DR 1Mb(SRF) 16Mb(Local 2Mb(Local Mem.) AM) Mem.) 3 rd level sze Off-chp Off-chp Off-chp 16Mb/off-chp 1 st level 205Gbps 758Gbps 230Gbps 1382Gbps bandwdth 2 nd level 51.2Gbps 829Gbps ** 334Gbps 461Gbps bandwdth 3 rd level N/A 12.4Gbps 200Gbps 115Gbps/56.7Gbps bandwdth 60

3 2.3. Programmng Model One potental drawback of M2 s the SIMD model of RC array. However, mappng experence on MorhpoSys, VIRAM, Imagne, and other SIMD extenson of general-purpose processor lke Pentum MMX, all show that SIMD s suffcent for streamed multmeda applcatons, not mentonng the much smaller code sze than that of MIMD model. For example, mappng of DVB-T system on M2 does not need MIMD generalty. Although SPMD feature mght make mappng of MPEG-2 decoder easer, other subsystems are applyng mostly affne array access and small porton of non-affne but statc access nvolvng random communcaton, whch can be done effcently by the scalar operand network. Snce SIMD programmng has long been a matured feld n general, t s reasonable to suppose that an effcent C compler for streamed applcatons s hghly possble, wth some prevous experence already [10-11].Moreover, parallel programmng n M2 can adopt doman decomposton or functon decomposton, the former more straghtforward, whle Imagne has to stck wth cumbersome functon decomposton often experencng load-unbalanced problem. The followng fgure llustrates a real code segment for The Berelekamp algorthm operaton whch also shows programmng model of MorphoSys. Sequental functons Data and Computaton Parallel (RC Array) functons CONTROL + DATA CONTROL FLOW DATA PATH TnyRISC assembly sub ldfb $1, $15,$15,3 $2, 8, 511 Ldrcex $4, 1, 0, 1, 0, 41 ldfb $2,0,0,256 ldl $10,511 Ldrcex $5, 1, 1, 42 add $2,0,0,256 ldctxt $1, 0, 80 delay4: add $4, $4, 64 ldw sub $10,$10,4 $1, 0, 0, 0, add + $5, cbcast $5, 64 0, 0, 0, 0 jal nop 0, 2, 0, 1, ldrcex sbcb $4, 1, 0, 2, 1, 1, 0, lu brle $0,$10,delay4 ldrcex $5, 1, 0, 1, 0, 44 5 dbcbc $1, 1, 4, 64 addw nop add $4, $4, 64 0, 0, 0, 1, stfb $6, 1, 0, 16 wfb 0, 0, 1, 32 Frame Buffer DMAC TnyRISC RC Array RC context set 0: 0,1 ADD KEEP R0 I H1 I ; > R6 R0; set 1: 1,1 SUB CLOAD!0x004 H0 R0 > R6 I I >2 R0; ; set 2: 2,1 ADD ADD R0 E H3 r0 LSL > R6 2 >0 R0; WE; set 3: 3,1 SUB ADD H2 XQ R0 r0 > LSL R6 2 R0; >0; set 4: 4,1 ADD SUB R0 XQ H5 r0 > LSL R6 2 R0; >0; set 5: 5,1 SUB SUB H4 E R0 r0 LSL > R6 2 >0 R0; WE; 6: ADD R0 H7 > R6 R0; set 6,1 CLOAD!0x004 I I >2 ; 7: SUB H6 R0 > R6 R0; set 7,1 KEEP I I ; Context Memory Fgure 2- Programmng Model n MorphoSys 2.4. Scalablty There s a msunderstandng n many places, that scalablty s equal to even dstrbuton of resources. Actually, ths s not true n large scale. For example, as the tle number of RAW processor grows, the scalar operand network latency ncreases, and more severely, traffc load per tle ncreases, whch wll fnally destroy the scalablty. In order to avod ths load ncrease, one would expect to keep the atomc sze of 8X8 basc array and nstead use an ncremental level of nter-array communcaton network accommodatng ths ncremental load. Ths leads to a herarchcal scalng behavor, n each level of whch there are both dstrbuted and centralzed recourses. The real bllon-transstor archtecture comng around year 2007 can be made up of 8 M2 together wth the counterpart of TnyRISC and operand network at a hgh level.3 Mappng of Reed-Solomon Decodng Algorthm 3. MAPPING OF REED-SOLOMON DECODING ALGORITHM 3.1 Reed-Solomon Decodng Algorthm [14] In the RS (n,k) code, n s the block length, k s the transmtted code word length and n-k s the mnmum dstance over GF ( 2 n ). Decoder s capable of correctng ( n k) t = error n 2 the receved code word. RS decoder ntroduced n ths paper s developed for the SIMD reconfgurable processor. It s necessary to fnd a RS decodng algorthm, whch makes the parallel processng possble. Let s consder v(x) s the transmtted code word and r(x) s the receved code word. Then the error added by channel s e( x) = r( x) v( x) (1) Consderng ν errors are occurred n the transmtted code word the syndromes are computed as follows: S = v j n υ j j j j k j ( α ) = r( α ) + e( α ) = ek ( α ) = e X l l k= 0 Eq (3) shows the key equaton to fnd the error locatons and error magntudes. 2t ( x) S( x) = Γ( x) mod x l= 1 (2) Λ (3) ( x) Λ s the error locaton polynomal and ts roots are the nverses of the error locatons. It can be presented as follows: Λ ν υ ( ) = 1+ 1x ν x = ( 1 xα l ) x (4) l= 1 After calculatng the error locaton polynomal roots the error value correspondng to each error locaton can be easly calculated usng the Eq (5) e l ( ) ( ) l Γ ( ) ( α ) l l Λ ( α ) = α (5) The Berlekamp algorthm [14] s used to fnd the error locaton polynomal for the SIMD mplementaton of ths decoder GF Multplcaton and Addton Insde Each RC GF multplcaton s the basc operaton should be consdered n the mplementaton of RS decoder. There are two ways of mplementng GF multplcaton nsde each RC. The frst method s usng look up tables for convertng the vector representaton to the power representaton. Beng able to change the power representaton to ts vector, we can consder GF elements n the whole decodng procedure as ther vector representatons. The GF multplcaton wll be smply the addton of the two powers. After addton, Mod 255 of the result should be computed. We avod ths tme consumng computaton by storng two look up tables contnuously nstead of one for convertng the power to vector. the second look up table wll be the vector correspondng for the power of 255 to 512. Ths method works f we just have the multplcaton of two GF elements and wll convert the result to ts vector presentaton soon after that. Ths s the case n 61

4 mplementng the GF Multplcaton. Total applcaton s manly conssts of GF polynomal computaton whch s the GF addtons after each GF multplcaton. In order to make each RC the computatonal cell for the GF addton and multplcaton, we are takng advantage of the local memory nsde each RC. These memores are specally mplemented for storng look up tables, whch are extremely, used n the data streamed applcatons. The second method comes from the feasblty of mplementng the GF multpler hardware nsde each RC usng the fne gran block. Usng ths way there s no need to know about the power presentaton of GF elements and all the computaton wll go on usng the vector presentaton. Usng the GF multpler nstead of the look up tables wll speed up total applcaton, as now we are able to multply two GF elements and get the result n one clock cycle. B0-B7 B8-B15 B16-B23 B24-B31 B32-B39 B40-B47 B48-B55 B56-B63 D0 D1 D2 D3 D4 D5 D6 D7 Forney s method stage and wll be used to calculate the error values Syndrome Computaton The frst step s the Syndrome calculaton out of the receved code word. Receved code word s saved n the frame buffer and wll be transmtted to each RC.Each RC computes two Syndromes n parallel wth the other RCs of the row. Ths scheme s generalzed dependng on the number of Syndromes should be computed. Ths has been made easy snce there s one bank of the Frame buffer correspondng to each RC. So we can easly scale the Syndrome numbers transmttng the data from frame buffer s bank to each RC. S 1 S 8 1 T 1 1? r 2 == 0 S 2 S 10 β3 S 3 S 11 β3 Syndrom Computaton S 4 S 12 Chen Search β4 S 5 S 13 β5 S 64 S Berlekamp Algorthm 2 3 S 4 5 S 46 T 2 T 3 ST 12 4 T 5 ST 12 6 β6 1 S 7 S 15 7 T 7 β7 S 8 S 16 8 T 8 β8 Fgure 3. context broadcast to RC columns from the context memores of each column Forney's Method 3.3. Developng the Parallel Algorthm of the RS Decoder 1 δ 1 2 δ 2 3 δ 3 S 4 Sδ δ 5 S 46 Sδ δ 7 8 δ 8 In order to acheve the hghest performance of the decoder mplemented on the Morphosys, We should develop the ntroduced decodng algorthm for the SIMD processor. Fg 3 shows the parallelsm exploted at the data level for the RS (255, 239,16). Each row of RC has been reconfgured for decodng one block of the data. 8 blocks of data wll be decoded n parallel wth each other. Ths wll decrease the throughput to 1/8 of the total tme needed for decodng 8 blocks n parallel. Fg 4 shows the task-flow dagram of the decoder. Each RC holds one coeffcent of the error locaton polynomal, error magntude polynomal and error correcton polynomal. Each RC s the basc operatonal block for GF addton and multplcaton. In the fgure S Stands for the dfferent syndromes computed durng the syndrome computaton. and T s are representng the error locaton and error correcton polynomals coeffcents respectvely whch are computed durng the Berelekamp algorthm. In the Chen search part β stands for the GF (255) element, whch should be substtuted nto the error locaton polynomal, to fnd the root of the error locaton polynomal. δ presents a coeffcent of the error magntude polynomal whch s computed durng the Fgure 4. Task dagram of the decodng procedure n the tme doman Berlekamp Algorthm Berlekamp Algorthm s consstng of three man parts. Dscrepancy computaton, calculaton of the new error locaton polynomal and error correcton polynomal. These calculatons are bascally made of addton of two polynomals together or multplyng one of them by the GF element. Addton has been made possble by storng the coeffcents of the same order n the same RC. Addton of the polynomals wll be addng of these two coeffcents together n parallel n all RCs. In order to compute the dscrepancy we need to take advantage of the data movement, between RCs. Ths data movement s possble n the same clock cycle as the multplcaton happens usng that element from the other RC. Ths s depcted n Fg 5.Takng advantage of the data movement between RCs s lke ncreasng the regster fle of each RC to the ones from the RCs of the same row and same column. Ths s the strong tool to enhance the level of parallelsm of the algorthm. Each RC can use the results of the other computaton, as t has been stored n ts regster fle. 62

5 Chen Search Chen search can be done n parallel n the 7 RCs of the row. The frst RC wll check the result for the possble error detectons. Each RC s responsble for tryng 33 GF( 2 8 ) elements. The whole search has been done n parallel n the 7 RCs. FB wll be used as a secondary memory here to store the elements of the GF ( 2 8 ). Each bank wll keep the necessary elements and ther power, snce we need up to 8 power of each element to reduce the computaton tme. In fact memory herarchy s one of the mportant feature of the Morphosys. We can take advantage of the dfferent parameter storage n the dfferent levels of the memory Error Value computaton Error value calculaton wll be performed sequentally for the dfferent error locatons. Ths way we take advantage of parallel GF addton and multplcaton n dfferent RCs. Fg 6 shows dfferent terms calculaton of the error values nsde each RC. s the dscrepancy calculated durng the Berlekamp algorthm part and S 1 S 2 s are the error locaton coeffcents. S 3 S S 7 s 6 S 5 S 4 S 3 S 2 1 S CONCLUSION AND COMPARISON MorphoSys as a reconfgurable archtecture provdes a very flexble platform for mplementng dfferent DSP applcatons. Ths s perfectly demonstrated through the RS decoder mplementaton wth dfferent error correctng length. Beng able to reconfgure each RC for the dfferent parts of the wreless communcaton recever wll provde the best functonalty of each RC for the whole applcaton. TI DSPs have the same approach to perform GF multplcaton. In comparson Morphosys takes advantage of the hgher data parallelsm obtaned through the SIMD characterstcs of t. DSP C6400 provdes the hardware support for performng the Galos Feld multples. In the absence of hardware to effectvely perform Galos feld math, prevous DSP mplementatons made use of logarthms to perform multplcaton. Ths has decreased performance to ¼ of the present one wth hardware GF multplcaton. Many dfferent Reed-Solomon ASIC Archtectures are proposed n the lterature [12-13]. These blocks are mostly specfed and optmzed for the RS decoder. They can be consdered as a sngle chp RS decoder. In comparson Morphosys can be vewed as the programmable archtecture optmzed for the whole recever Morphosys outperforms Ascs, whch have the smaller sze. We have smulated the result on the Morphosys hardware smulator called Mulate. The result s gven for the RS decoder of (255,239,16) usng two dfferent methods of GF multplcaton. The decodng speed s 1.319Gbps and 2.534Gbps, respectvely. Fg 7 shows the bt rate of the decoder and ts comparson wth the dfferent Ascs and Also TI64x 2.534G bps S 5 s 6 S 7 824M 517M 320M 1.319G Fgure 5. Data movement between RCs n order to calculate dfferent terms of the Dscrepancy n the dfferent RC Coc5128 Amphon TI-C64x Xlnx Decoder Morphosys1 Morphosys2 Fgure7. Bt rate comparson of two decoders mplemented on Morphosys wth the other Ascs and TIC64x k 6 7 k 8 Fgure 6. Dfferent terms calculaton of the error values 63

6 References: [1] H. Sngh, Lee, Lu, Bagherzadeh, Kurdah, MorphoSys: An Integrated Reconfgurable System for Data-Parallel and Computaton Intensve Applcatons, IEEE Transactons on Computers, vol. 49, No. 5, pp , May [2] Lee, Sngh, Lu, Bagherzadeh, Kurdah, Desgn and Implementaton of the MorphoSys Reconfgurable Computng Processor, Journal of VLSI Sgnal Processng Systems for Sgnal, Image, and Vdeo Technology, vol. 24, No. 2-3, Kluwer Academc Publshers, pp , Mar [3] Pan, Kamalzad, Kooh, Bagherzadeh, Desgn and Analyss of a Programmable Sngle-Chp Archtecture for DVB-T Base-Band Recever, To Appear n DATE [4] Agarwal, Hrshkesh, Keckler, Burger, Clock rate versus IPC: the end of the road for conventonal mcroarchtectures, Computer Archtecture, Proceedngs of the 27th Internatonal Symposum on, 2000 Page(s): [5] Schlansker, Rau, EPIC: Explctly Parallel Instructon Computng, Computer, Volume: 33 Issue: 2, Feb 2000 Page(s): [6] Taylor, et.al. The Raw mcroprocessor: a computatonal fabrc for software crcuts and general-purpose programs, Mcro, IEEE, Volume: 22 Issue: 2, Mar/Apr 2002, Page(s): [7] Rxner, et.al. A bandwdth-effcent archtecture for meda processng, Mcroarchtecture, MICRO-31. Proceedngs. 31st Annual ACM/IEEE Internatonal Symposum on, 30 Nov-2 Dec 1998 [8] Kozyraks, Patterson, Vector vs. superscalar and VLIW archtectures for embedded multmeda benchmarks, Mcroarchtecture, (MICRO-35). Proceedngs. 35th Annual IEEE/ACM Internatonal Symposum on, 2002 Page(s): [9] Taylor, Lee, Amarasnghe, Agarwal, Scalar Operand Networks: On-chp Interconnect for ILP n Parttoned Archtectures, HPCA 2003, February [10] Venkataraman, Najjar, Kurdah, Bagherzadeh, Bohm, A Compler Framework for Mappng Applcatons to a Coarsegraned Reconfgurable Computer Archtecture, CASES 2001, Atlanta, GA, November [11] Maestre, Kurdah, Bagherzadeh, Sngh, Hermda, Fernandez, Kernel schedulng n reconfgurable computng, DATE [12] Implementaton of hgh speed Reed-Solomon decoder. [Conference Paper] 42nd Mdwest Symposum on Crcuts and Systems (Cat. No.99CH36356). IEEE. Part vol. 2, 2000, pp vol. 2. Pscataway, NJ, USA. [13] Martna M, Masera G, Pccnn G, Vacca F, Zambon M. VLSI Reed Solomon decoder archtecture for networked multmeda applcatons. [Conference Paper] Proceedngs 14th Annual IEEE Internatonal ASIC/SOC Conference (IEEE Cat. No.01TH8558). IEEE. 2001, pp Pscataway, NJ,USASystems II-Analog & Dgtal Sgnal Processng, vol.47, no.11, Nov. 2000, pp Publsher: IEEE, USA. [14] Shu LIN/ Danel J Costello Error Control Codng Fundamentals and applcatons [15] [16] [17] 64

Parallelism for Nested Loops with Non-uniform and Flow Dependences

Parallelism for Nested Loops with Non-uniform and Flow Dependences Parallelsm for Nested Loops wth Non-unform and Flow Dependences Sam-Jn Jeong Dept. of Informaton & Communcaton Engneerng, Cheonan Unversty, 5, Anseo-dong, Cheonan, Chungnam, 330-80, Korea. seong@cheonan.ac.kr

More information

Simulation Based Analysis of FAST TCP using OMNET++

Simulation Based Analysis of FAST TCP using OMNET++ Smulaton Based Analyss of FAST TCP usng OMNET++ Umar ul Hassan 04030038@lums.edu.pk Md Term Report CS678 Topcs n Internet Research Sprng, 2006 Introducton Internet traffc s doublng roughly every 3 months

More information

Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson / 2003 Elsevier

Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson / 2003 Elsevier Some materal adapted from Mohamed Youns, UMBC CMSC 611 Spr 2003 course sldes Some materal adapted from Hennessy & Patterson / 2003 Elsever Scence Performance = 1 Executon tme Speedup = Performance (B)

More information

A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data

A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data A Fast Content-Based Multmeda Retreval Technque Usng Compressed Data Borko Furht and Pornvt Saksobhavvat NSF Multmeda Laboratory Florda Atlantc Unversty, Boca Raton, Florda 3343 ABSTRACT In ths paper,

More information

Configuration Management in Multi-Context Reconfigurable Systems for Simultaneous Performance and Power Optimizations*

Configuration Management in Multi-Context Reconfigurable Systems for Simultaneous Performance and Power Optimizations* Confguraton Management n Mult-Context Reconfgurable Systems for Smultaneous Performance and Power Optmzatons* Rafael Maestre, Mlagros Fernandez Departamento de Arqutectura de Computadores y Automátca Unversdad

More information

Outline. Digital Systems. C.2: Gates, Truth Tables and Logic Equations. Truth Tables. Logic Gates 9/8/2011

Outline. Digital Systems. C.2: Gates, Truth Tables and Logic Equations. Truth Tables. Logic Gates 9/8/2011 9/8/2 2 Outlne Appendx C: The Bascs of Logc Desgn TDT4255 Computer Desgn Case Study: TDT4255 Communcaton Module Lecture 2 Magnus Jahre 3 4 Dgtal Systems C.2: Gates, Truth Tables and Logc Equatons All sgnals

More information

A Binarization Algorithm specialized on Document Images and Photos

A Binarization Algorithm specialized on Document Images and Photos A Bnarzaton Algorthm specalzed on Document mages and Photos Ergna Kavalleratou Dept. of nformaton and Communcaton Systems Engneerng Unversty of the Aegean kavalleratou@aegean.gr Abstract n ths paper, a

More information

Assignment # 2. Farrukh Jabeen Algorithms 510 Assignment #2 Due Date: June 15, 2009.

Assignment # 2. Farrukh Jabeen Algorithms 510 Assignment #2 Due Date: June 15, 2009. Farrukh Jabeen Algorthms 51 Assgnment #2 Due Date: June 15, 29. Assgnment # 2 Chapter 3 Dscrete Fourer Transforms Implement the FFT for the DFT. Descrbed n sectons 3.1 and 3.2. Delverables: 1. Concse descrpton

More information

Video Proxy System for a Large-scale VOD System (DINA)

Video Proxy System for a Large-scale VOD System (DINA) Vdeo Proxy System for a Large-scale VOD System (DINA) KWUN-CHUNG CHAN #, KWOK-WAI CHEUNG *# #Department of Informaton Engneerng *Centre of Innovaton and Technology The Chnese Unversty of Hong Kong SHATIN,

More information

Compiler Design. Spring Register Allocation. Sample Exercises and Solutions. Prof. Pedro C. Diniz

Compiler Design. Spring Register Allocation. Sample Exercises and Solutions. Prof. Pedro C. Diniz Compler Desgn Sprng 2014 Regster Allocaton Sample Exercses and Solutons Prof. Pedro C. Dnz USC / Informaton Scences Insttute 4676 Admralty Way, Sute 1001 Marna del Rey, Calforna 90292 pedro@s.edu Regster

More information

A Fast Visual Tracking Algorithm Based on Circle Pixels Matching

A Fast Visual Tracking Algorithm Based on Circle Pixels Matching A Fast Vsual Trackng Algorthm Based on Crcle Pxels Matchng Zhqang Hou hou_zhq@sohu.com Chongzhao Han czhan@mal.xjtu.edu.cn Ln Zheng Abstract: A fast vsual trackng algorthm based on crcle pxels matchng

More information

An Optimal Algorithm for Prufer Codes *

An Optimal Algorithm for Prufer Codes * J. Software Engneerng & Applcatons, 2009, 2: 111-115 do:10.4236/jsea.2009.22016 Publshed Onlne July 2009 (www.scrp.org/journal/jsea) An Optmal Algorthm for Prufer Codes * Xaodong Wang 1, 2, Le Wang 3,

More information

Parallel matrix-vector multiplication

Parallel matrix-vector multiplication Appendx A Parallel matrx-vector multplcaton The reduced transton matrx of the three-dmensonal cage model for gel electrophoress, descrbed n secton 3.2, becomes excessvely large for polymer lengths more

More information

Floating-Point Division Algorithms for an x86 Microprocessor with a Rectangular Multiplier

Floating-Point Division Algorithms for an x86 Microprocessor with a Rectangular Multiplier Floatng-Pont Dvson Algorthms for an x86 Mcroprocessor wth a Rectangular Multpler Mchael J. Schulte Dmtr Tan Carl E. Lemonds Unversty of Wsconsn Advanced Mcro Devces Advanced Mcro Devces Schulte@engr.wsc.edu

More information

Concurrent Apriori Data Mining Algorithms

Concurrent Apriori Data Mining Algorithms Concurrent Apror Data Mnng Algorthms Vassl Halatchev Department of Electrcal Engneerng and Computer Scence York Unversty, Toronto October 8, 2015 Outlne Why t s mportant Introducton to Assocaton Rule Mnng

More information

Cluster Analysis of Electrical Behavior

Cluster Analysis of Electrical Behavior Journal of Computer and Communcatons, 205, 3, 88-93 Publshed Onlne May 205 n ScRes. http://www.scrp.org/ournal/cc http://dx.do.org/0.4236/cc.205.350 Cluster Analyss of Electrcal Behavor Ln Lu Ln Lu, School

More information

The Codesign Challenge

The Codesign Challenge ECE 4530 Codesgn Challenge Fall 2007 Hardware/Software Codesgn The Codesgn Challenge Objectves In the codesgn challenge, your task s to accelerate a gven software reference mplementaton as fast as possble.

More information

Mallathahally, Bangalore, India 1 2

Mallathahally, Bangalore, India 1 2 7 IMPLEMENTATION OF HIGH PERFORMANCE BINARY SQUARER PRADEEP M C, RAMESH S, Department of Electroncs and Communcaton Engneerng, Dr. Ambedkar Insttute of Technology, Mallathahally, Bangalore, Inda pradeepmc@gmal.com,

More information

RADIX-10 PARALLEL DECIMAL MULTIPLIER

RADIX-10 PARALLEL DECIMAL MULTIPLIER RADIX-10 PARALLEL DECIMAL MULTIPLIER 1 MRUNALINI E. INGLE & 2 TEJASWINI PANSE 1&2 Electroncs Engneerng, Yeshwantrao Chavan College of Engneerng, Nagpur, Inda E-mal : mrunalngle@gmal.com, tejaswn.deshmukh@gmal.com

More information

Convolutional interleaver for unequal error protection of turbo codes

Convolutional interleaver for unequal error protection of turbo codes Convolutonal nterleaver for unequal error protecton of turbo codes Sna Vaf, Tadeusz Wysock, Ian Burnett Unversty of Wollongong, SW 2522, Australa E-mal:{sv39,wysock,an_burnett}@uow.edu.au Abstract: Ths

More information

Motivation. EE 457 Unit 4. Throughput vs. Latency. Performance Depends on View Point?! Computer System Performance. An individual user wants to:

Motivation. EE 457 Unit 4. Throughput vs. Latency. Performance Depends on View Point?! Computer System Performance. An individual user wants to: 4.1 4.2 Motvaton EE 457 Unt 4 Computer System Performance An ndvdual user wants to: Mnmze sngle program executon tme A datacenter owner wants to: Maxmze number of Mnmze ( ) http://e-tellgentnternetmarketng.com/webste/frustrated-computer-user-2/

More information

Improvement of Spatial Resolution Using BlockMatching Based Motion Estimation and Frame. Integration

Improvement of Spatial Resolution Using BlockMatching Based Motion Estimation and Frame. Integration Improvement of Spatal Resoluton Usng BlockMatchng Based Moton Estmaton and Frame Integraton Danya Suga and Takayuk Hamamoto Graduate School of Engneerng, Tokyo Unversty of Scence, 6-3-1, Nuku, Katsuska-ku,

More information

Collaboratively Regularized Nearest Points for Set Based Recognition

Collaboratively Regularized Nearest Points for Set Based Recognition Academc Center for Computng and Meda Studes, Kyoto Unversty Collaboratvely Regularzed Nearest Ponts for Set Based Recognton Yang Wu, Mchhko Mnoh, Masayuk Mukunok Kyoto Unversty 9/1/013 BMVC 013 @ Brstol,

More information

Efficient Distributed File System (EDFS)

Efficient Distributed File System (EDFS) Effcent Dstrbuted Fle System (EDFS) (Sem-Centralzed) Debessay(Debsh) Fesehaye, Rahul Malk & Klara Naherstedt Unversty of Illnos-Urbana Champagn Contents Problem Statement, Related Work, EDFS Desgn Rate

More information

Analysis of Collaborative Distributed Admission Control in x Networks

Analysis of Collaborative Distributed Admission Control in x Networks 1 Analyss of Collaboratve Dstrbuted Admsson Control n 82.11x Networks Thnh Nguyen, Member, IEEE, Ken Nguyen, Member, IEEE, Lnha He, Member, IEEE, Abstract Wth the recent surge of wreless home networks,

More information

A mathematical programming approach to the analysis, design and scheduling of offshore oilfields

A mathematical programming approach to the analysis, design and scheduling of offshore oilfields 17 th European Symposum on Computer Aded Process Engneerng ESCAPE17 V. Plesu and P.S. Agach (Edtors) 2007 Elsever B.V. All rghts reserved. 1 A mathematcal programmng approach to the analyss, desgn and

More information

The Greedy Method. Outline and Reading. Change Money Problem. Greedy Algorithms. Applications of the Greedy Strategy. The Greedy Method Technique

The Greedy Method. Outline and Reading. Change Money Problem. Greedy Algorithms. Applications of the Greedy Strategy. The Greedy Method Technique //00 :0 AM Outlne and Readng The Greedy Method The Greedy Method Technque (secton.) Fractonal Knapsack Problem (secton..) Task Schedulng (secton..) Mnmum Spannng Trees (secton.) Change Money Problem Greedy

More information

CMPS 10 Introduction to Computer Science Lecture Notes

CMPS 10 Introduction to Computer Science Lecture Notes CPS 0 Introducton to Computer Scence Lecture Notes Chapter : Algorthm Desgn How should we present algorthms? Natural languages lke Englsh, Spansh, or French whch are rch n nterpretaton and meanng are not

More information

User Authentication Based On Behavioral Mouse Dynamics Biometrics

User Authentication Based On Behavioral Mouse Dynamics Biometrics User Authentcaton Based On Behavoral Mouse Dynamcs Bometrcs Chee-Hyung Yoon Danel Donghyun Km Department of Computer Scence Department of Computer Scence Stanford Unversty Stanford Unversty Stanford, CA

More information

Load Balancing for Hex-Cell Interconnection Network

Load Balancing for Hex-Cell Interconnection Network Int. J. Communcatons, Network and System Scences,,, - Publshed Onlne Aprl n ScRes. http://www.scrp.org/journal/jcns http://dx.do.org/./jcns.. Load Balancng for Hex-Cell Interconnecton Network Saher Manaseer,

More information

Memory Modeling in ESL-RTL Equivalence Checking

Memory Modeling in ESL-RTL Equivalence Checking 11.4 Memory Modelng n ESL-RTL Equvalence Checkng Alfred Koelbl 2025 NW Cornelus Pass Rd. Hllsboro, OR 97124 koelbl@synopsys.com Jerry R. Burch 2025 NW Cornelus Pass Rd. Hllsboro, OR 97124 burch@synopsys.com

More information

Parallel Inverse Halftoning by Look-Up Table (LUT) Partitioning

Parallel Inverse Halftoning by Look-Up Table (LUT) Partitioning Parallel Inverse Halftonng by Look-Up Table (LUT) Parttonng Umar F. Sddq and Sadq M. Sat umar@ccse.kfupm.edu.sa, sadq@kfupm.edu.sa KFUPM Box: Department of Computer Engneerng, Kng Fahd Unversty of Petroleum

More information

Learning the Kernel Parameters in Kernel Minimum Distance Classifier

Learning the Kernel Parameters in Kernel Minimum Distance Classifier Learnng the Kernel Parameters n Kernel Mnmum Dstance Classfer Daoqang Zhang 1,, Songcan Chen and Zh-Hua Zhou 1* 1 Natonal Laboratory for Novel Software Technology Nanjng Unversty, Nanjng 193, Chna Department

More information

Array transposition in CUDA shared memory

Array transposition in CUDA shared memory Array transposton n CUDA shared memory Mke Gles February 19, 2014 Abstract Ths short note s nspred by some code wrtten by Jeremy Appleyard for the transposton of data through shared memory. I had some

More information

Storage Binding in RTL synthesis

Storage Binding in RTL synthesis Storage Bndng n RTL synthess Pe Zhang Danel D. Gajsk Techncal Report ICS-0-37 August 0th, 200 Center for Embedded Computer Systems Department of Informaton and Computer Scence Unersty of Calforna, Irne

More information

Efficient Content Distribution in Wireless P2P Networks

Efficient Content Distribution in Wireless P2P Networks Effcent Content Dstrbuton n Wreless P2P Networs Qong Sun, Vctor O. K. L, and Ka-Cheong Leung Department of Electrcal and Electronc Engneerng The Unversty of Hong Kong Pofulam Road, Hong Kong, Chna {oansun,

More information

Virtual Memory. Background. No. 10. Virtual Memory: concept. Logical Memory Space (review) Demand Paging(1) Virtual Memory

Virtual Memory. Background. No. 10. Virtual Memory: concept. Logical Memory Space (review) Demand Paging(1) Virtual Memory Background EECS. Operatng System Fundamentals No. Vrtual Memory Prof. Hu Jang Department of Electrcal Engneerng and Computer Scence, York Unversty Memory-management methods normally requres the entre process

More information

ELEC 377 Operating Systems. Week 6 Class 3

ELEC 377 Operating Systems. Week 6 Class 3 ELEC 377 Operatng Systems Week 6 Class 3 Last Class Memory Management Memory Pagng Pagng Structure ELEC 377 Operatng Systems Today Pagng Szes Vrtual Memory Concept Demand Pagng ELEC 377 Operatng Systems

More information

Hermite Splines in Lie Groups as Products of Geodesics

Hermite Splines in Lie Groups as Products of Geodesics Hermte Splnes n Le Groups as Products of Geodescs Ethan Eade Updated May 28, 2017 1 Introducton 1.1 Goal Ths document defnes a curve n the Le group G parametrzed by tme and by structural parameters n the

More information

Evaluation of an Enhanced Scheme for High-level Nested Network Mobility

Evaluation of an Enhanced Scheme for High-level Nested Network Mobility IJCSNS Internatonal Journal of Computer Scence and Network Securty, VOL.15 No.10, October 2015 1 Evaluaton of an Enhanced Scheme for Hgh-level Nested Network Moblty Mohammed Babker Al Mohammed, Asha Hassan.

More information

FPGA Based Fixed Width 4 4, 6 6, 8 8 and Bit Multipliers using Spartan-3AN

FPGA Based Fixed Width 4 4, 6 6, 8 8 and Bit Multipliers using Spartan-3AN IJCSNS Internatonal Journal of Computer Scence and Network Securty, VOL.11 No.2, February 211 61 FPGA Based Fxed Wdth 4 4, 6 6, 8 8 and 12 12-Bt Multplers usng Spartan-3AN Muhammad H. Ras and Mohamed H.

More information

Reducing Frame Rate for Object Tracking

Reducing Frame Rate for Object Tracking Reducng Frame Rate for Object Trackng Pavel Korshunov 1 and We Tsang Oo 2 1 Natonal Unversty of Sngapore, Sngapore 11977, pavelkor@comp.nus.edu.sg 2 Natonal Unversty of Sngapore, Sngapore 11977, oowt@comp.nus.edu.sg

More information

Enhanced AMBTC for Image Compression using Block Classification and Interpolation

Enhanced AMBTC for Image Compression using Block Classification and Interpolation Internatonal Journal of Computer Applcatons (0975 8887) Volume 5 No.0, August 0 Enhanced AMBTC for Image Compresson usng Block Classfcaton and Interpolaton S. Vmala Dept. of Comp. Scence Mother Teresa

More information

Mathematics 256 a course in differential equations for engineering students

Mathematics 256 a course in differential equations for engineering students Mathematcs 56 a course n dfferental equatons for engneerng students Chapter 5. More effcent methods of numercal soluton Euler s method s qute neffcent. Because the error s essentally proportonal to the

More information

Content Based Image Retrieval Using 2-D Discrete Wavelet with Texture Feature with Different Classifiers

Content Based Image Retrieval Using 2-D Discrete Wavelet with Texture Feature with Different Classifiers IOSR Journal of Electroncs and Communcaton Engneerng (IOSR-JECE) e-issn: 78-834,p- ISSN: 78-8735.Volume 9, Issue, Ver. IV (Mar - Apr. 04), PP 0-07 Content Based Image Retreval Usng -D Dscrete Wavelet wth

More information

Conditional Speculative Decimal Addition*

Conditional Speculative Decimal Addition* Condtonal Speculatve Decmal Addton Alvaro Vazquez and Elsardo Antelo Dep. of Electronc and Computer Engneerng Unv. of Santago de Compostela, Span Ths work was supported n part by Xunta de Galca under grant

More information

Complex Numbers. Now we also saw that if a and b were both positive then ab = a b. For a second let s forget that restriction and do the following.

Complex Numbers. Now we also saw that if a and b were both positive then ab = a b. For a second let s forget that restriction and do the following. Complex Numbers The last topc n ths secton s not really related to most of what we ve done n ths chapter, although t s somewhat related to the radcals secton as we wll see. We also won t need the materal

More information

Private Information Retrieval (PIR)

Private Information Retrieval (PIR) 2 Levente Buttyán Problem formulaton Alce wants to obtan nformaton from a database, but she does not want the database to learn whch nformaton she wanted e.g., Alce s an nvestor queryng a stock-market

More information

Efficient Broadcast Disks Program Construction in Asymmetric Communication Environments

Efficient Broadcast Disks Program Construction in Asymmetric Communication Environments Effcent Broadcast Dsks Program Constructon n Asymmetrc Communcaton Envronments Eleftheros Takas, Stefanos Ougaroglou, Petros copoltds Department of Informatcs, Arstotle Unversty of Thessalonk Box 888,

More information

AADL : about scheduling analysis

AADL : about scheduling analysis AADL : about schedulng analyss Schedulng analyss, what s t? Embedded real-tme crtcal systems have temporal constrants to meet (e.g. deadlne). Many systems are bult wth operatng systems provdng multtaskng

More information

Skew Angle Estimation and Correction of Hand Written, Textual and Large areas of Non-Textual Document Images: A Novel Approach

Skew Angle Estimation and Correction of Hand Written, Textual and Large areas of Non-Textual Document Images: A Novel Approach Angle Estmaton and Correcton of Hand Wrtten, Textual and Large areas of Non-Textual Document Images: A Novel Approach D.R.Ramesh Babu Pyush M Kumat Mahesh D Dhannawat PES Insttute of Technology Research

More information

THE low-density parity-check (LDPC) code is getting

THE low-density parity-check (LDPC) code is getting Implementng the NASA Deep Space LDPC Codes for Defense Applcatons Wley H. Zhao, Jeffrey P. Long 1 Abstract Selected codes from, and extended from, the NASA s deep space low-densty party-check (LDPC) codes

More information

Area Efficient Self Timed Adders For Low Power Applications in VLSI

Area Efficient Self Timed Adders For Low Power Applications in VLSI ISSN(Onlne): 2319-8753 ISSN (Prnt) :2347-6710 Internatonal Journal of Innovatve Research n Scence, Engneerng and Technology (An ISO 3297: 2007 Certfed Organzaton) Area Effcent Self Tmed Adders For Low

More information

Lobachevsky State University of Nizhni Novgorod. Polyhedron. Quick Start Guide

Lobachevsky State University of Nizhni Novgorod. Polyhedron. Quick Start Guide Lobachevsky State Unversty of Nzhn Novgorod Polyhedron Quck Start Gude Nzhn Novgorod 2016 Contents Specfcaton of Polyhedron software... 3 Theoretcal background... 4 1. Interface of Polyhedron... 6 1.1.

More information

Lecture 5: Multilayer Perceptrons

Lecture 5: Multilayer Perceptrons Lecture 5: Multlayer Perceptrons Roger Grosse 1 Introducton So far, we ve only talked about lnear models: lnear regresson and lnear bnary classfers. We noted that there are functons that can t be represented

More information

Hybrid Non-Blind Color Image Watermarking

Hybrid Non-Blind Color Image Watermarking Hybrd Non-Blnd Color Image Watermarkng Ms C.N.Sujatha 1, Dr. P. Satyanarayana 2 1 Assocate Professor, Dept. of ECE, SNIST, Yamnampet, Ghatkesar Hyderabad-501301, Telangana 2 Professor, Dept. of ECE, AITS,

More information

Module Management Tool in Software Development Organizations

Module Management Tool in Software Development Organizations Journal of Computer Scence (5): 8-, 7 ISSN 59-66 7 Scence Publcatons Management Tool n Software Development Organzatons Ahmad A. Al-Rababah and Mohammad A. Al-Rababah Faculty of IT, Al-Ahlyyah Amman Unversty,

More information

Real-Time Guarantees. Traffic Characteristics. Flow Control

Real-Time Guarantees. Traffic Characteristics. Flow Control Real-Tme Guarantees Requrements on RT communcaton protocols: delay (response s) small jtter small throughput hgh error detecton at recever (and sender) small error detecton latency no thrashng under peak

More information

An Application of the Dulmage-Mendelsohn Decomposition to Sparse Null Space Bases of Full Row Rank Matrices

An Application of the Dulmage-Mendelsohn Decomposition to Sparse Null Space Bases of Full Row Rank Matrices Internatonal Mathematcal Forum, Vol 7, 2012, no 52, 2549-2554 An Applcaton of the Dulmage-Mendelsohn Decomposton to Sparse Null Space Bases of Full Row Rank Matrces Mostafa Khorramzadeh Department of Mathematcal

More information

High level vs Low Level. What is a Computer Program? What does gcc do for you? Program = Instructions + Data. Basic Computer Organization

High level vs Low Level. What is a Computer Program? What does gcc do for you? Program = Instructions + Data. Basic Computer Organization What s a Computer Program? Descrpton of algorthms and data structures to acheve a specfc ojectve Could e done n any language, even a natural language lke Englsh Programmng language: A Standard notaton

More information

Related-Mode Attacks on CTR Encryption Mode

Related-Mode Attacks on CTR Encryption Mode Internatonal Journal of Network Securty, Vol.4, No.3, PP.282 287, May 2007 282 Related-Mode Attacks on CTR Encrypton Mode Dayn Wang, Dongda Ln, and Wenlng Wu (Correspondng author: Dayn Wang) Key Laboratory

More information

X- Chart Using ANOM Approach

X- Chart Using ANOM Approach ISSN 1684-8403 Journal of Statstcs Volume 17, 010, pp. 3-3 Abstract X- Chart Usng ANOM Approach Gullapall Chakravarth 1 and Chaluvad Venkateswara Rao Control lmts for ndvdual measurements (X) chart are

More information

Load-Balanced Anycast Routing

Load-Balanced Anycast Routing Load-Balanced Anycast Routng Chng-Yu Ln, Jung-Hua Lo, and Sy-Yen Kuo Department of Electrcal Engneerng atonal Tawan Unversty, Tape, Tawan sykuo@cc.ee.ntu.edu.tw Abstract For fault-tolerance and load-balance

More information

EFFICIENT SYNCHRONOUS PARALLEL DISCRETE EVENT SIMULATION

EFFICIENT SYNCHRONOUS PARALLEL DISCRETE EVENT SIMULATION EFFICIENT SYNCHRONOUS PARALLEL DISCRETE EVENT SIMULATION WITH THE ARMEN ARCHITECTURE C. Beaumont, B. Potter J.M. Flloque LIBr I.U.T. de Brest and LIBr Unversté de Bretagne Occdentale Télécom Bretagne BP

More information

S1 Note. Basis functions.

S1 Note. Basis functions. S1 Note. Bass functons. Contents Types of bass functons...1 The Fourer bass...2 B-splne bass...3 Power and type I error rates wth dfferent numbers of bass functons...4 Table S1. Smulaton results of type

More information

DESIGNING TRANSMISSION SCHEDULES FOR WIRELESS AD HOC NETWORKS TO MAXIMIZE NETWORK THROUGHPUT

DESIGNING TRANSMISSION SCHEDULES FOR WIRELESS AD HOC NETWORKS TO MAXIMIZE NETWORK THROUGHPUT DESIGNING TRANSMISSION SCHEDULES FOR WIRELESS AD HOC NETWORKS TO MAXIMIZE NETWORK THROUGHPUT Bran J. Wolf, Joseph L. Hammond, and Harlan B. Russell Dept. of Electrcal and Computer Engneerng, Clemson Unversty,

More information

Analysis of Continuous Beams in General

Analysis of Continuous Beams in General Analyss of Contnuous Beams n General Contnuous beams consdered here are prsmatc, rgdly connected to each beam segment and supported at varous ponts along the beam. onts are selected at ponts of support,

More information

Using Delayed Addition Techniques to Accelerate Integer and Floating-Point Calculations in Configurable Hardware

Using Delayed Addition Techniques to Accelerate Integer and Floating-Point Calculations in Configurable Hardware Draft submtted for publcaton. Please do not dstrbute Usng Delayed Addton echnques to Accelerate Integer and Floatng-Pont Calculatons n Confgurable Hardware Zhen Luo, Nonmember and Margaret Martonos, Member,

More information

CACHE MEMORY DESIGN FOR INTERNET PROCESSORS

CACHE MEMORY DESIGN FOR INTERNET PROCESSORS CACHE MEMORY DESIGN FOR INTERNET PROCESSORS WE EVALUATE A SERIES OF THREE PROGRESSIVELY MORE AGGRESSIVE ROUTING-TABLE CACHE DESIGNS AND DEMONSTRATE THAT THE INCORPORATION OF HARDWARE CACHES INTO INTERNET

More information

Vectorization in the Polyhedral Model

Vectorization in the Polyhedral Model Vectorzaton n the Polyhedral Model Lous-Noël Pouchet pouchet@cse.oho-state.edu Dept. of Computer Scence and Engneerng, the Oho State Unversty October 200 888. Introducton: Overvew Vectorzaton: Detecton

More information

Explicit Formulas and Efficient Algorithm for Moment Computation of Coupled RC Trees with Lumped and Distributed Elements

Explicit Formulas and Efficient Algorithm for Moment Computation of Coupled RC Trees with Lumped and Distributed Elements Explct Formulas and Effcent Algorthm for Moment Computaton of Coupled RC Trees wth Lumped and Dstrbuted Elements Qngan Yu and Ernest S.Kuh Electroncs Research Lab. Unv. of Calforna at Berkeley Berkeley

More information

SLAM Summer School 2006 Practical 2: SLAM using Monocular Vision

SLAM Summer School 2006 Practical 2: SLAM using Monocular Vision SLAM Summer School 2006 Practcal 2: SLAM usng Monocular Vson Javer Cvera, Unversty of Zaragoza Andrew J. Davson, Imperal College London J.M.M Montel, Unversty of Zaragoza. josemar@unzar.es, jcvera@unzar.es,

More information

Adaptive Network Resource Management in IEEE Wireless Random Access MAC

Adaptive Network Resource Management in IEEE Wireless Random Access MAC Adaptve Network Resource Management n IEEE 802.11 Wreless Random Access MAC Hao Wang, Changcheng Huang, James Yan Department of Systems and Computer Engneerng Carleton Unversty, Ottawa, ON, Canada Abstract

More information

A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS

A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS Proceedngs of the Wnter Smulaton Conference M E Kuhl, N M Steger, F B Armstrong, and J A Jones, eds A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS Mark W Brantley Chun-Hung

More information

Constructing Minimum Connected Dominating Set: Algorithmic approach

Constructing Minimum Connected Dominating Set: Algorithmic approach Constructng Mnmum Connected Domnatng Set: Algorthmc approach G.N. Puroht and Usha Sharma Centre for Mathematcal Scences, Banasthal Unversty, Rajasthan 304022 usha.sharma94@yahoo.com Abstract: Connected

More information

Evaluation of Parallel Processing Systems through Queuing Model

Evaluation of Parallel Processing Systems through Queuing Model ISSN 2278-309 Vkas Shnde, Internatonal Journal of Advanced Volume Trends 4, n Computer No.2, March Scence - and Aprl Engneerng, 205 4(2), March - Aprl 205, 36-43 Internatonal Journal of Advanced Trends

More information

Improving Low Density Parity Check Codes Over the Erasure Channel. The Nelder Mead Downhill Simplex Method. Scott Stransky

Improving Low Density Parity Check Codes Over the Erasure Channel. The Nelder Mead Downhill Simplex Method. Scott Stransky Improvng Low Densty Party Check Codes Over the Erasure Channel The Nelder Mead Downhll Smplex Method Scott Stransky Programmng n conjuncton wth: Bors Cukalovc 18.413 Fnal Project Sprng 2004 Page 1 Abstract

More information

Loop Transformations, Dependences, and Parallelization

Loop Transformations, Dependences, and Parallelization Loop Transformatons, Dependences, and Parallelzaton Announcements Mdterm s Frday from 3-4:15 n ths room Today Semester long project Data dependence recap Parallelsm and storage tradeoff Scalar expanson

More information

Overview. Basic Setup [9] Motivation and Tasks. Modularization 2008/2/20 IMPROVED COVERAGE CONTROL USING ONLY LOCAL INFORMATION

Overview. Basic Setup [9] Motivation and Tasks. Modularization 2008/2/20 IMPROVED COVERAGE CONTROL USING ONLY LOCAL INFORMATION Overvew 2 IMPROVED COVERAGE CONTROL USING ONLY LOCAL INFORMATION Introducton Mult- Smulator MASIM Theoretcal Work and Smulaton Results Concluson Jay Wagenpfel, Adran Trachte Motvaton and Tasks Basc Setup

More information

CHAPTER 4 PARALLEL PREFIX ADDER

CHAPTER 4 PARALLEL PREFIX ADDER 93 CHAPTER 4 PARALLEL PREFIX ADDER 4.1 INTRODUCTION VLSI Integer adders fnd applcatons n Arthmetc and Logc Unts (ALUs), mcroprocessors and memory addressng unts. Speed of the adder often decdes the mnmum

More information

A RECONFIGURABLE ARCHITECTURE FOR MULTI-GIGABIT SPEED CONTENT-BASED ROUTING. James Moscola, Young H. Cho, John W. Lockwood

A RECONFIGURABLE ARCHITECTURE FOR MULTI-GIGABIT SPEED CONTENT-BASED ROUTING. James Moscola, Young H. Cho, John W. Lockwood A RECONFIGURABLE ARCHITECTURE FOR MULTI-GIGABIT SPEED CONTENT-BASED ROUTING James Moscola, Young H. Cho, John W. Lockwood Dept. of Computer Scence and Engneerng Washngton Unversty, St. Lous, MO {jmm5,

More information

Advanced Computer Networks

Advanced Computer Networks Char of Network Archtectures and Servces Department of Informatcs Techncal Unversty of Munch Note: Durng the attendance check a stcker contanng a unque QR code wll be put on ths exam. Ths QR code contans

More information

Chapter 6 Programmng the fnte element method Inow turn to the man subject of ths book: The mplementaton of the fnte element algorthm n computer programs. In order to make my dscusson as straghtforward

More information

A SCALABLE DIGITAL ARCHITECTURE OF A KOHONEN NEURAL NETWORK

A SCALABLE DIGITAL ARCHITECTURE OF A KOHONEN NEURAL NETWORK A SCALABLE DIGITAL ARCHITECTURE OF A KOHONEN NEURAL NETWORK Andrés E. Valenca, Jorge A. Peña and Maurco Vanegas. Unversdad Pontfca Bolvarana, Medellín, Colomba andrez_valenca@yahoo.com, jorge.pena@alar.ch,

More information

On Reconfiguration-Oriented Approximate Adder Design and Its Application

On Reconfiguration-Oriented Approximate Adder Design and Its Application On Reconfguraton-Orented Approxmate Adder Desgn and Its Applcaton Rong Ye, Tng Wang, Feng Yuan, Rakesh Kumar and Qang Xu Chk RElable Computng Laboratory (CRE) Department of Computer Scence & Engneerng

More information

Steps for Computing the Dissimilarity, Entropy, Herfindahl-Hirschman and. Accessibility (Gravity with Competition) Indices

Steps for Computing the Dissimilarity, Entropy, Herfindahl-Hirschman and. Accessibility (Gravity with Competition) Indices Steps for Computng the Dssmlarty, Entropy, Herfndahl-Hrschman and Accessblty (Gravty wth Competton) Indces I. Dssmlarty Index Measurement: The followng formula can be used to measure the evenness between

More information

Newton-Raphson division module via truncated multipliers

Newton-Raphson division module via truncated multipliers Newton-Raphson dvson module va truncated multplers Alexandar Tzakov Department of Electrcal and Computer Engneerng Illnos Insttute of Technology Chcago,IL 60616, USA Abstract Reducton n area and power

More information

R s s f. m y s. SPH3UW Unit 7.3 Spherical Concave Mirrors Page 1 of 12. Notes

R s s f. m y s. SPH3UW Unit 7.3 Spherical Concave Mirrors Page 1 of 12. Notes SPH3UW Unt 7.3 Sphercal Concave Mrrors Page 1 of 1 Notes Physcs Tool box Concave Mrror If the reflectng surface takes place on the nner surface of the sphercal shape so that the centre of the mrror bulges

More information

Assembler. Building a Modern Computer From First Principles.

Assembler. Building a Modern Computer From First Principles. Assembler Buldng a Modern Computer From Frst Prncples www.nand2tetrs.org Elements of Computng Systems, Nsan & Schocken, MIT Press, www.nand2tetrs.org, Chapter 6: Assembler slde Where we are at: Human Thought

More information

2 optmal per-pxel estmate () whch we had proposed for non-scalable vdeo codng [5] [6]. The extended s shown to accurately account for both temporal an

2 optmal per-pxel estmate () whch we had proposed for non-scalable vdeo codng [5] [6]. The extended s shown to accurately account for both temporal an Scalable Vdeo Codng wth Robust Mode Selecton Ru Zhang, Shankar L. Regunathan and Kenneth Rose Department of Electrcal and Computer Engneerng Unversty of Calforna Santa Barbara, CA 906 Abstract We propose

More information

Chapter 1. Introduction

Chapter 1. Introduction Chapter 1 Introducton 1.1 Parallel Processng There s a contnual demand for greater computatonal speed from a computer system than s currently possble (.e. sequental systems). Areas need great computatonal

More information

Dynamic Bandwidth Provisioning with Fairness and Revenue Considerations for Broadband Wireless Communication

Dynamic Bandwidth Provisioning with Fairness and Revenue Considerations for Broadband Wireless Communication Ths full text paper was peer revewed at the drecton of IEEE Communcatons Socety subject matter experts for publcaton n the ICC 008 proceedngs. Dynamc Bandwdth Provsonng wth Farness and Revenue Consderatons

More information

OPTIMAL CONFIGURATION FOR NODES IN MIXED CELLULAR AND MOBILE AD HOC NETWORK FOR INET

OPTIMAL CONFIGURATION FOR NODES IN MIXED CELLULAR AND MOBILE AD HOC NETWORK FOR INET OPTIMAL CONFIGURATION FOR NODE IN MIED CELLULAR AND MOBILE AD HOC NETWORK FOR INET Olusola Babalola D.E. Department of Electrcal and Computer Engneerng Morgan tate Unversty Dr. Rchard Dean Faculty Advsor

More information

[33]. As we have seen there are different algorithms for compressing the speech. The

[33]. As we have seen there are different algorithms for compressing the speech. The 49 5. LD-CELP SPEECH CODER 5.1 INTRODUCTION Speech compresson s one of the mportant doman n dgtal communcaton [33]. As we have seen there are dfferent algorthms for compressng the speech. The mportant

More information

VRT012 User s guide V0.1. Address: Žirmūnų g. 27, Vilnius LT-09105, Phone: (370-5) , Fax: (370-5) ,

VRT012 User s guide V0.1. Address: Žirmūnų g. 27, Vilnius LT-09105, Phone: (370-5) , Fax: (370-5) , VRT012 User s gude V0.1 Thank you for purchasng our product. We hope ths user-frendly devce wll be helpful n realsng your deas and brngng comfort to your lfe. Please take few mnutes to read ths manual

More information

A New Memory Reduced Radix-4 CORDIC Processor For FFT Operation

A New Memory Reduced Radix-4 CORDIC Processor For FFT Operation IOSR Journal of VLSI and Sgnal Processng (IOSR-JVSP) Volume, Issue 5 (May. Jun. 013), PP 09-16 e-issn: 319 400, p-issn No. : 319 4197 www.osrjournals.org A New Memory Reduced Radx-4 CORDIC Processor For

More information

Maintaining temporal validity of real-time data on non-continuously executing resources

Maintaining temporal validity of real-time data on non-continuously executing resources Mantanng temporal valdty of real-tme data on non-contnuously executng resources Tan Ba, Hong Lu and Juan Yang Hunan Insttute of Scence and Technology, College of Computer Scence, 44, Yueyang, Chna Wuhan

More information

FPGA Implementation of CORDIC Algorithms for Sine and Cosine Generator

FPGA Implementation of CORDIC Algorithms for Sine and Cosine Generator The 5th Internatonal Conference on Electrcal Engneerng and Informatcs 25 August -, 25, Bal, Indonesa FPGA Implementaton of CORDIC Algorthms for Sne and Cosne Generator Antonus P. Renardy, Nur Ahmad, Ashbr

More information

NAG Fortran Library Chapter Introduction. G10 Smoothing in Statistics

NAG Fortran Library Chapter Introduction. G10 Smoothing in Statistics Introducton G10 NAG Fortran Lbrary Chapter Introducton G10 Smoothng n Statstcs Contents 1 Scope of the Chapter... 2 2 Background to the Problems... 2 2.1 Smoothng Methods... 2 2.2 Smoothng Splnes and Regresson

More information

Synchronous Distributed Wireless Network Emulator for High-Speed Mobility: Implementation and Evaluation

Synchronous Distributed Wireless Network Emulator for High-Speed Mobility: Implementation and Evaluation Synchronous Dstrbuted Wreless Network Emulator for Hgh-Speed Moblty: Implementaton and Evaluaton Mnoru Kozum, Tomoch Ebata Yokohama Research Laboratory, Htach, Ltd., 292 Yoshda-cho, Totsuka-ku, Yokohama,

More information