Toward Realtime Side Information Decoding On Multi-Core Processors


MITSUBISHI ELECTRIC RESEARCH LABORATORIES

Svetislav Momcilovic, Yige Wang, Shantanu Rane, Anthony Vetro

TR21-1, December 2010

Abstract

Most distributed source coding schemes involve the application of a channel code to the signal and transmission of the resulting syndromes. For low-complexity encoding with superior compression performance, graph-based channel codes such as LDPC codes are used to generate the syndromes. The encoder performs simple XOR operations, while the decoder uses belief propagation (BP) decoding to recover the signal of interest using the syndromes and some correlated side information. We consider parallelization of BP decoding on general-purpose multi-core CPUs. The motivation is to make BP decoding fast enough for realtime applications. We consider three different BP decoding algorithms: Sum-Product BP, Min-Sum BP and Algorithm E. The speedup obtained by parallelizing these algorithms is examined along with the tradeoff against decoding performance. Parallelization is achieved by dividing the received syndrome vectors among different cores, and by using vector operations to simultaneously process multiple check nodes in each core. While Min-Sum BP has intermediate decoding complexity, a vectorized version of Min-Sum BP performs nearly as fast as the much simpler Algorithm E with significantly fewer decoding errors. Our experiments indicate that, for the best compromise between speed and performance, the decoder should use Min-Sum BP when the side information is of good quality and Sum-Product BP otherwise.

Multimedia Signal Processing Workshop

This work may not be copied or reproduced in whole or in part for any commercial purpose. Permission to copy in whole or in part without payment of fee is granted for nonprofit educational and research purposes provided that all such whole or partial copies include the following: a notice that such copying is by permission of Mitsubishi Electric Research Laboratories, Inc.; an acknowledgment of the authors and individual contributions to the work; and all applicable portions of the copyright notice. Copying, reproduction, or republishing for any other purpose shall require a license with payment of fee to Mitsubishi Electric Research Laboratories, Inc. All rights reserved.

Copyright © Mitsubishi Electric Research Laboratories, Inc., Broadway, Cambridge, Massachusetts 02139


Svetislav Momcilovic, Yige Wang, Shantanu Rane, Anthony Vetro
Mitsubishi Electric Research Laboratories (MERL), Cambridge, MA 02140, USA.

I. INTRODUCTION

Distributed source coding is an attractive option for sensor networks and surveillance systems in which image or video is acquired and compressed using low-cost hardware. This method of compression involves encoding the acquired signal conditioned on some statistically correlated side information at the decoder.
For example, in the case of video signals, the side information for the current video frame can be furnished by a motion compensated version of the previous decoded frame. Distributed source coding draws on information-theoretic results on lossless coding of correlated sources [1] and rate-distortion tradeoffs for encoding of correlated sources [2]. Recent years have seen a revival in distributed source coding, in particular distributed video coding [3], which has exploited graph-based channel codes.

Nearly all implementations of distributed compression systems involve appropriately quantizing the signal of interest, and then extracting parity or syndrome symbols from it by applying a channel code. The syndromes constitute the compressed bit stream, which is transmitted to the decoding station where they are combined with the side information and fed to a channel decoder. The channel decoder essentially treats the side information as an error-prone version of the signal of interest and uses the received syndromes to correct the errors, thereby recovering the signal of interest. This process, encompassing the side information decoding as well as the distortion introduced by quantization, is referred to as Wyner-Ziv coding.

In theory, any channel code can be used in this way. In practice, however, graph-based channel codes such as Low-Density Parity Check (LDPC) codes or Turbo Codes are preferred over hard-decision algebraic schemes such as Reed-Solomon codes or BCH codes, owing to their very low encoding complexity and the availability of soft-decision decoding algorithms that achieve better channel coding performance. LDPC encoding, for example, involves simple XOR operations, while LDPC decoding involves running a Belief Propagation (BP) algorithm to recover the signal of interest. An example of a distributed video coding system using an LDPC code is shown in Fig. 1. BP decoding is much more complex than the operations at the encoder.

Footnote: S. Momcilovic is with INESC-ID, TU Lisbon, Portugal. This work was carried out when he was an intern at MERL. © 2010 IEEE
In this paper, we consider BP decoding on general-purpose multi-core CPUs. The motivation is to make BP decoding fast enough for realtime decoding of time-sensitive signals such as surveillance videos. This is a practical requirement that has remained relatively unexplored in the literature; the emphasis has been on pure compression performance. In a detailed evaluation of the DISCOVER codec [4] on a general-purpose dual-core machine, the authors report that, to achieve high reconstruction quality even for QCIF-sized video frames, side information decoding of a single frame required 4-8 seconds depending upon the video content. By exploiting parallelization on multi-core CPUs used in consumer-level computers, we take a step toward realtime decoding of Wyner-Ziv coded signals. The techniques in this paper also apply to the recent parallel implementations of BP on Graphics Processing Units (GPUs) based on NVIDIA's CUDA parallel computing architecture [5], [6], [7].

LDPC codes were invented by Gallager in the 1960s [8], but were ignored because of the limited processing capabilities at the time. They were rediscovered by MacKay and Neal [9] and have since received increased attention due to their near-Shannon-limit error performance. For distributed source coding, a class of rate-adaptive codes called LDPC Accumulate (LDPCA) codes [10] has become very popular. An LDPCA code is an LDPC code concatenated with an accumulator. The accumulator allows syndromes to be transmitted incrementally until decoding succeeds. For any set of accumulated syndromes, the decoding procedure is the same as that of a conventional LDPC code. In this paper, we focus on the LDPC decoding, keeping in mind that LDPCA decoding would need repeated invocations of LDPC decoding.

The remainder of the paper is organized as follows. Section II describes the three LDPC decoding algorithms evaluated in this paper. In particular, the calculations performed at the check nodes and variable nodes are described. Section III describes how parallelization is achieved by dividing decoding tasks among multiple processor cores and by incorporating vector instructions within each core. In Section IV, the speedup obtained via parallelization of the three BP decoding algorithms is discussed along with the tradeoff in decoding performance.

[Figure 1: block diagram. Low-complexity encoder: input frame, DCT, quantization and bitplane extraction, syndromes. Side information decoder: previous decoded frames, motion compensation, DCT, quantization and bitplane extraction to form the side information, then bitplane combination, IDCT and reconstruction of the decoded frame.]
Fig. 1. The main components of a distributed video coding system. Decoding of only one bitplane is shown.

II. LDPC DECODING ALGORITHMS

An $(N, K)$ LDPC code is defined as the null space of a sparse parity check matrix $H_{M \times N}$, where $N$ is the code length, $K$ is the code dimension, and $M \geq N - K$. The rate of the LDPC code is $R = K/N$. An LDPC code can also be represented by a bipartite Tanner graph with two types of nodes: variable nodes and check nodes. Each row in $H$ corresponds to a check node and each column corresponds to a variable node; the $i$-th check node is connected to the $j$-th variable node if and only if $H(i, j) = 1$.

Assume that the vectors being encoded are binary. For image and video applications, non-binary vectors constructed from blocks of pixels are converted into binary and the individual bitplanes are provided as inputs to the LDPC encoder, as shown in Fig. 1. Encoding consists of calculating the syndromes according to $s = Hc$, where $c$ is the input binary sequence or bitplane and $s$ is the syndrome vector. If $R < 1$, the syndrome vector $s$ represents a compressed encoded version of the input vector $c$, and is transmitted to the decoder.
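The encoding step $s = Hc$ can be illustrated with a minimal sketch. The 3 x 6 matrix `H` and the bitplane `c` below are hypothetical toy values chosen for illustration, not one of the codes used in the paper; the arithmetic is over GF(2), so each syndrome bit is simply an XOR of selected input bits.

```python
def syndrome(H, c):
    """Return s = H c over GF(2).

    Each syndrome bit is the XOR of the bits of c selected by one row of H.
    """
    s = []
    for row in H:
        bit = 0
        for h, cj in zip(row, c):
            bit ^= h & cj
        s.append(bit)
    return s


# Hypothetical toy parity check matrix: M = 3 check nodes, N = 6 variable nodes.
H = [
    [1, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [1, 0, 1, 0, 0, 1],
]
c = [1, 0, 1, 1, 0, 0]   # hypothetical input bitplane
s = syndrome(H, c)       # 3 syndrome bits represent 6 input bits
```

In this toy case the encoder sends 3 syndrome bits in place of 6 source bits; a practical code would use a much larger sparse matrix stored as adjacency lists rather than dense rows.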
To initialize side information decoding, the variable nodes in the LDPC code graph are populated with a hypothesis about the bits to be recovered. This hypothesis is obtained using a side information vector $v$ which is correlated with the vector $c$ that is to be recovered. In the simplest case, a starting hypothesis for $c$ is the vector $v$ itself. In general, the starting hypothesis expresses the likelihood that the bit $v_i$ in the $i$-th variable node has value 0 or 1. The check nodes are associated with the bits from the received syndrome vector $s$. Each check node specifies a constraint equation satisfied by all the variable nodes connected to it. In BP decoding, messages are passed between the variable nodes and check nodes with the aim of enforcing these constraints. The messages propagate the beliefs at a given node to the other nodes connected to it. After a few iterations of message passing, the variable nodes should satisfy the constraints imposed by the check bits, in which case the decoding is deemed to be successful.

BP decoding can be realized via a fully parallel flooding-type scheduling, a fully serial shuffled-type scheduling [11], or a partially parallel group-shuffled approach [11]. Here, we focus on the fully parallel scheme, in which the flow of operations is as indicated in Fig. 2. We consider three decoding algorithms: Sum-Product BP, Min-Sum BP and Algorithm E. The decoding operations for each of these three algorithms are detailed below.

As a setup step, using binary phase shift keying (BPSK), the sequence to be encoded, i.e., $c$, is mapped into $x$ according to $x = 1 - 2c$. The destination observes a side information sequence $y$, where $y = 1 - 2v \in \{-1, 1\}$. Denote the set of variable nodes connected to check node $j$ by $\mathcal{N}(j) = \{k : H_{jk} = 1\}$ and the set of check nodes connected to variable node $k$ by $\mathcal{M}(k) = \{j : H_{jk} = 1\}$. Denote by $\mathcal{N}(j) \setminus k$ the set $\mathcal{N}(j)$ with variable node $k$ excluded, and by $\mathcal{M}(k) \setminus j$ the set $\mathcal{M}(k)$ with check node $j$ excluded.
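The following notation is used for the $i$-th iteration of message passing:

$u_{mn}^{(i)}$: message from check node $m$ to variable node $n$
$v_{mn}^{(i)}$: message from variable node $n$ to check node $m$
$v_{n}^{(i)}$: belief of variable node $n$
$u_{n}$: message from the side-information channel for variable node $n$

A. Sum-Product BP Decoding Operations

1) Initialization: Set $i = 1$, and the maximum number of iterations to $I_{MAX}$. For $1 \leq n \leq N$, set
$$u_n = \ln \frac{P(x_n = 1 \mid y_n)}{P(x_n = -1 \mid y_n)}.$$
For each $m, n$, set $v_{mn}^{(0)} = u_n$.

2) Iterative decoding:
(a) Perform check node calculations, i.e., for $1 \leq m \leq M$ and each $n \in \mathcal{N}(m)$,
$$u_{mn}^{(i)} = 2 \tanh^{-1}\left( \prod_{n' \in \mathcal{N}(m) \setminus n} \tanh\left( \frac{v_{mn'}^{(i-1)}}{2} \right) \right) \tag{1}$$
For details about the derivation of the above formula, the reader is referred to [12].
(b) Perform variable node calculations, i.e., for $1 \leq n \leq N$ and each $m \in \mathcal{M}(n)$,
$$v_{mn}^{(i)} = u_n + \sum_{m' \in \mathcal{M}(n) \setminus m} u_{m'n}^{(i)} \tag{2}$$

3) Hard decision and stopping criterion test: Set
$$v_n^{(i)} = u_n + \sum_{m \in \mathcal{M}(n)} u_{mn}^{(i)}.$$
Create $\hat{c}^{(i)} = [\hat{c}_n^{(i)}]$ such that $\hat{c}_n^{(i)} = 1$ if $v_n^{(i)} < 0$, and $\hat{c}_n^{(i)} = 0$ otherwise. If $H\hat{c}^{(i)} = s$ or if the number of iterations has reached $I_{MAX}$, stop decoding and go to Step 4. Otherwise set $i := i + 1$ and go to Step 2.

4) Output $\hat{c}^{(i)}$ as the decoded codeword.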
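The per-node calculations in the steps above can be sketched for a single check node and a single variable node as follows. This is only an illustration under stated assumptions: the function names are hypothetical, `channel_llr` assumes a binary symmetric channel with crossover probability `p` between source and side information (the model adopted later in the experiments), and the clamp before `atanh` is a numerical safeguard that is not part of Eq. (1).

```python
import math


def channel_llr(y_n, p):
    """Initialization message u_n = ln P(x_n=+1|y_n) / P(x_n=-1|y_n).

    Assumes a BSC with crossover probability p and y_n in {-1, +1}.
    """
    return y_n * math.log((1.0 - p) / p)


def check_node_update(incoming):
    """Eq. (1) for one check node.

    incoming[k] is the message from the k-th neighbouring variable node;
    the k-th output uses every incoming message except the k-th.
    """
    out = []
    for k in range(len(incoming)):
        prod = 1.0
        for j, v in enumerate(incoming):
            if j != k:
                prod *= math.tanh(v / 2.0)
        # Numerical safeguard (an addition): keep atanh finite near +/-1.
        prod = max(min(prod, 1.0 - 1e-12), -1.0 + 1e-12)
        out.append(2.0 * math.atanh(prod))
    return out


def variable_node_update(u_n, incoming):
    """Eq. (2) for one variable node, plus the belief used in Step 3.

    Subtracting one incoming message from the total equals summing the others.
    """
    belief = u_n + sum(incoming)
    return [belief - u for u in incoming], belief


def hard_decision(belief):
    """Step 3: decide bit 1 when the belief is negative, else 0."""
    return 1 if belief < 0 else 0


u = check_node_update([2.0, -1.0, 3.0])
v_out, belief = variable_node_update(0.5, [1.0, -2.0, 0.25])
```

A full decoder would loop these updates over all edges of the Tanner graph and test $H\hat{c}^{(i)} = s$ after each iteration, as in Steps 2 to 4.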

[Figure 2: flow chart. For each side information and syndrome vector pair $t$: set the iteration counter $i = 1$, perform check node operations, then variable node operations, and repeat until the check equations are satisfied or the maximum number of iterations is reached; continue until all vectors are decoded. Annotations: to parallelize, divide syndrome vectors among cores; to parallelize, use vector instructions within a core.]
Fig. 2. A flow diagram containing a high-level summary of the sequence of operations performed in BP decoding.

B. Min-Sum BP Decoding Operations

The Min-Sum algorithm [13] is a simplified version of Sum-Product BP. All decoding steps are the same as those in Sum-Product BP except the check node update in (1), which is now approximated by
$$u_{mn}^{(i)} = \prod_{n' \in \mathcal{N}(m) \setminus n} \mathrm{sgn}\left(v_{mn'}^{(i-1)}\right) \cdot \min_{n' \in \mathcal{N}(m) \setminus n} \left| v_{mn'}^{(i-1)} \right| \tag{3}$$
As decoding primarily involves additions and comparisons, Min-Sum BP is less complex than Sum-Product BP.

C. Algorithm E Decoding Operations

Algorithm E was proposed and analyzed in [14], [15]. It quantizes all the messages in Sum-Product BP to $-1$, $0$, or $+1$ and can be carried out as follows:

1) Initialization: Set $i = 1$ and the maximum number of iterations to $I_{MAX}$. For each $m, n$, set $v_{mn}^{(0)} = y_n$.

2) Iterative decoding:
(a) Perform check node calculations as follows: for $1 \leq m \leq M$ and each $n \in \mathcal{N}(m)$,
$$u_{mn}^{(i)} = \prod_{n' \in \mathcal{N}(m) \setminus n} v_{mn'}^{(i-1)}$$
(b) Perform variable node calculations as follows: for $1 \leq n \leq N$ and each $m \in \mathcal{M}(n)$,
$$v_{mn}^{(i)} = \mathrm{sgn}\left( w^{(i)} y_n + \sum_{m' \in \mathcal{M}(n) \setminus m} u_{m'n}^{(i)} \right)$$
where $\mathrm{sgn}(x)$ takes values $-1$, $0$ or $+1$ for $x < 0$, $x = 0$, and $x > 0$ respectively, and $w^{(i)}$ is a weight chosen to optimize performance. For example, in [15], $w^{(1)} = 2$ and $w^{(i)} = 1$ for $i \geq 2$ is found to optimize the decoding performance for a regular (3, 6) LDPC code.

3) Then, for the stopping criterion test, evaluate the variable node beliefs as
$$v_n^{(i)} = \mathrm{sgn}\left( w^{(i)} y_n + \sum_{m \in \mathcal{M}(n)} u_{mn}^{(i)} \right)$$
and proceed as in the Sum-Product BP algorithm.

In terms of processing time per iteration, Algorithm E is faster than Sum-Product BP and Min-Sum BP owing to its simpler decoding operations.
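To make the difference between the two simplified algorithms concrete, the sketch below implements the Min-Sum check node update of Eq. (3) and the Algorithm E check and variable node updates for a single node. The function names are hypothetical and the code handles one node at a time for clarity; it is an illustration of the update rules, not the implementation evaluated in the paper.

```python
def sgn(x):
    """Return -1, 0 or +1 for x < 0, x == 0 and x > 0 respectively."""
    return (x > 0) - (x < 0)


def min_sum_check_update(incoming):
    """Eq. (3): sign product times minimum magnitude over the other edges."""
    out = []
    for k in range(len(incoming)):
        others = incoming[:k] + incoming[k + 1:]
        sign = 1
        for v in others:
            sign *= sgn(v)
        out.append(sign * min(abs(v) for v in others))
    return out


def alg_e_check_update(incoming):
    """Algorithm E check node: product of the other messages in {-1, 0, +1}."""
    out = []
    for k in range(len(incoming)):
        prod = 1
        for j, v in enumerate(incoming):
            if j != k:
                prod *= v
        out.append(prod)
    return out


def alg_e_variable_update(y_n, incoming, w):
    """Algorithm E variable node: weighted side information plus check messages.

    Returns the outgoing messages (each excluding its own edge) and the belief.
    """
    total = w * y_n + sum(incoming)
    return [sgn(total - u) for u in incoming], sgn(total)


ms = min_sum_check_update([2.0, -1.0, 3.0])
ec = alg_e_check_update([1, -1, 0])
ev, eb = alg_e_variable_update(1, [-1, -1, 1], w=2)
```

Only comparisons, sign flips and small-integer additions appear here, which is why these two algorithms are the natural candidates for the vector instructions discussed in Section III-B.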
In the high signal-to-noise ratio (SNR) regime, where the side information is accurate and only a few bits are estimated in error, an even faster algorithm called Active-Set Algorithm E, or Fast Algorithm E, has been proposed [16]. The rationale is that when the SNR is large, most messages converge quickly. Thus, it is not necessary to update every variable/check node at each iteration. The decoder just checks whether the messages entering a node are different from their values in the previous iteration. If none of the messages has changed, the node is not updated, and the overall decoding time is reduced with no loss of performance.

III. IMPLEMENTATION ON MULTI-CORE CPUS

A. Processor-level Parallelization

To speed up the execution of BP decoding, two kinds of parallelism are used. At the level of the processor cores, a single program/multiple data (SPMD) approach is used. This approach is useful in scenarios where the same set of instructions is executed in multiple iterations on different data. As there are no data dependencies between the iterations, they can be executed independently and in any order on separate processor cores. The parallelization of such loops requires the creation of multiple threads and new thread contexts, by means of replication of the variables, e.g., counters, that will be private in each thread. All other variables are shared between the threads, and may be accessed concurrently by multiple threads. The creation of new thread contexts, and the simultaneous access of the same variables by different processor cores, results in a parallelization overhead. The speedup obtained via parallel execution needs to be large enough to overcome the effect of this overhead.

In BP decoding, as shown in Fig. 2, there are loops at three levels. The top-most loop is on the block level, where the decoding algorithm is repeated for each received $M$-length block of syndromes. With the SPMD approach, this loop can be executed in parallel by allocating the syndrome blocks to the multiple cores, as there are no data dependencies between blocks.
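The block-level division of work can be sketched as below. This is a structural illustration only: `decode_block` is a hypothetical stand-in that returns the side information unchanged instead of running BP, and a thread pool from the Python standard library is used in place of the OpenMP directives that the paper relies on. In CPython, threads do not run pure-Python code on several cores at once, so the sketch shows how blocks are handed out, not a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def decode_block(block):
    """Hypothetical stand-in for BP decoding of one block.

    A block is a (syndrome, side_info) pair; a real decoder would run the
    iterative algorithm of Section II here.
    """
    syndrome, side_info = block
    return list(side_info)


def decode_all(blocks, num_workers=4):
    """Distribute independent blocks among workers; results keep input order."""
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        return list(pool.map(decode_block, blocks))


blocks = [
    ([0, 1], [1, 0, 1, 1]),
    ([1, 1], [0, 0, 1, 0]),
    ([0, 0], [1, 1, 1, 1]),
]
decoded = decode_all(blocks, num_workers=2)
```

Because no block depends on another, each worker needs only private loop state, which matches the thread-private counters and shared read-only data described above.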
The mid-level loop is on the iteration level, where at each iteration, messages are exchanged between variable nodes and check nodes. The exchanged messages are different in each iteration and depend on the messages from the previous iteration. Therefore, the iterations cannot be executed in an arbitrary order, and this loop cannot be parallelized via the SPMD approach. The innermost level consists of two loops for computing the variable node messages and check node messages. In these loops, the same calculations are performed at each variable or check node, and with small modifications, these loops can be executed in parallel using the SPMD approach. The proposed scheme parallelizes the check node loop rather than the variable node loop, because check node processing was found experimentally to occupy a larger fraction of the processing time. Vector instructions are used in order to achieve parallelization of the check node calculations within a processor core. This is elaborated below.

[Figure 3: the array of messages, organized by check node and by the variable nodes connected to each check node, is transposed and reordered into 128-bit words. Padding entries are marked X.]
Fig. 3. Remapping the node indices to enable simultaneous processing of multiple check nodes. In this example, each small square is a 32-bit message.

B. Parallelization via Vector Instructions

Vector instructions allow each processor core to process several check nodes simultaneously, thereby reducing the processing time per iteration. In order to exploit vector instructions to the fullest, it is necessary that the calculation performed at each check node is simple and similar to the calculations performed at every other check node. Unfortunately, Sum-Product BP evaluates the $\tanh(\cdot)$ function via direct computation or a table lookup, for which there is no efficient implementation using vector instructions. Also, Fast Algorithm E processes a node only if the messages entering it have changed since the last iteration, and this non-uniformity makes it unsuitable for vector instructions. On the other hand, Min-Sum BP and Algorithm E can both use vector instructions because, in these algorithms, every check node is processed in nearly the same way as every other check node. One interesting observation, elaborated in Section IV, is that, by using vector instructions for Algorithm E, the speedup obtained is enough to rival the speed of Fast Algorithm E.

To apply vector instructions, it is necessary to remap the messages between the check nodes and variable nodes, as shown in Fig. 3. In particular, it is necessary to arrange the messages such that they occupy $W$-bit blocks, where $W$ is the size of the largest block on which additions and logical operations can be performed. Thus, for a given data type,
$$\text{No. of calculations in parallel} = \frac{W}{\text{sizeof(datatype)}}$$
Fig. 3 shows the data organization in memory for the case in which $W = 128$ and the messages are each 32 bits long, allowing a block of 4 check nodes to be processed in parallel.
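The lane count given by the formula above, and the idea of processing a group of check nodes in lockstep, can be sketched as follows. The layout used here, in which lane $j$ of every word belongs to the $j$-th check node of a group, is an assumption made for illustration and may differ from the exact arrangement in Fig. 3. The lists stand in for 128-bit registers, and `lanewise_min_abs` emulates what a packed minimum instruction would compute; it takes the minimum over all incoming messages as a building block, not the edge-excluding minimum of Eq. (3).

```python
def lanes(register_bits, message_bits):
    """Number of messages that fit in one vector register."""
    return register_bits // message_bits


def transpose_group(messages_per_check):
    """Regroup messages so that word k holds the k-th message of each check.

    messages_per_check[c][k] is the k-th incoming message of check node c;
    all check nodes in the group are assumed to have the same degree.
    """
    return [list(word) for word in zip(*messages_per_check)]


def lanewise_min_abs(words):
    """Emulate a packed min: one running minimum magnitude per lane."""
    result = [abs(x) for x in words[0]]
    for word in words[1:]:
        result = [min(r, abs(x)) for r, x in zip(result, word)]
    return result


group = [
    [1, -2, 3],     # check node 0
    [4, 5, -6],     # check node 1
    [-7, 8, 9],     # check node 2
    [10, 11, 12],   # check node 3
]
words = transpose_group(group)
mins = lanewise_min_abs(words)
```

With 32-bit messages, `lanes(128, 32)` gives the 4 check nodes per word shown in Fig. 3, while 8-bit messages would give 16.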
Each individual square contains a message from a variable node to the appropriate check node. The transposition ensures that messages related to a given check node are placed in successive locations in memory. Reordering as shown in the figure provides the most efficient way to process groups of 4 check nodes while inserting a minimum number of neutral messages. For the chosen word size of 32 bits, these neutral messages, marked X, are placed to ensure that the number of messages being processed is a multiple of 4. The value of these neutral messages is set to zero for additions and to one for multiplications so that they do not affect the calculations.

To implement parallel check node operations in the above fashion for Min-Sum BP, the SSE¹ vector instruction set from Intel is used. In Algorithm E, on the other hand, the check node calculations involve messages that take values $-1$, $0$ or $+1$, and they are implemented entirely using logical operations on char, i.e., 8-bit values. Thus, in the vector implementation of Algorithm E, 16 check nodes can be processed simultaneously.

IV. EXPERIMENTAL RESULTS

All experiments were conducted on an Intel Core 2 Quad CPU Q9650 running at 3 GHz with 4 GB of RAM. The OpenMP Application Program Interface Version 3.0 [17] was used to implement parallelization of the various BP decoding algorithms. This API provides C/C++ compiler directives and library routines to support shared-memory parallelism. The simulations were conducted on nine LDPC codes at various rates; we report results on the three largest codes in this paper. The parity check matrices are labeled $H_G$, $H_H$ and $H_I$, and their dimensions are shown in Table I. These codes are all derived from a single LDPCA code with 594 variable nodes. The code dimensions are motivated by a distributed video compression application². The syndrome vectors have to be decoded with the help of side information, e.g., the previous video frame or a motion compensated version of it.

TABLE I. DIMENSIONS OF LDPC MATRICES
Columns: Matrix, Size, #edges, Rate. Rows: $H_G$, $H_H$, $H_I$.
Video studies have shown that a Laplacian model is close to the observed dependency between the source bitplane and the side information bitplane. In this work, we are interested primarily in the speedup from parallelization per iteration of LDPC decoding, not in choosing the LDPC code with the smallest number of check nodes or the LDPC code that converges in the smallest number of iterations. Since the parallelization speedup per iteration is independent of the channel model used, we assume a much simpler Binary Symmetric Channel (BSC) model between the source bitplane and the side information bitplane. Thus, if the crossover probability of the BSC is too large, the LDPC code will not be able to recover the source bitplane from the side information bitplane even after the maximum number of iterations, $I_{MAX}$, is reached. In all our simulations, $I_{MAX}$ = 1.

First, consider the speedup obtained simply by dividing all the received syndrome vectors among the available processor cores. As shown in Fig. 4, the decoding time is the least for Algorithm E and the highest for Sum-Product BP. These results are for the parity check matrix $H_I$ with a low BSC crossover probability of .5. For each algorithm, the decoding times are averaged over 1 decodings, i.e., the code is run 1 times with the same syndrome vectors but with side information randomly perturbed according to a BSC. The bar labeled LUT refers to an implementation of Sum-Product BP in which the $\tanh(\cdot)$ function is read from a look-up table with 32-bit precision. The LUT variant runs faster than Sum-Product BP, which uses a C math function to compute $\tanh(\cdot)$, but has worse performance than Sum-Product BP, i.e., a larger number of uncorrected errors. The second bar graph plots the decoding speedup factor $S(y)$ while the third graph plots the normalized speedup $\bar{S}(y)$. These factors are given by:
$$S(y) = \frac{\text{Decoding time with 1 core}}{\text{Decoding time with } y \text{ cores}}, \qquad \bar{S}(y) = \frac{S(y)}{y}$$
where $y$ is the number of cores used in parallel. The results show that, as more cores are added, the normalized speedup reduces because of the parallelization overhead associated with replicating thread contexts, and the contention that occurs when two threads access the same portions of memory.

[Figure 4: three bar charts showing decoding time (ms), decoding speedup factor and normalized speedup for Sum-Product, LUT, MinSum and Alg E as the number of cores increases.]
Fig. 4. As new cores are added, the speed of BP decoding increases, but parallelization overhead prevents a normalized speedup of one per core.

Now, we describe the benefits of using vector instructions, which speed up the check node decoding operations within each core, as explained in Section III-B. Note that, for a fixed BP decoding algorithm, using vector instructions does not change the number of iterations needed for convergence; thus the coding performance of a BP decoding algorithm and its vectorized version are identical, and the latter version just executes faster per iteration. As explained in Section III-B, Min-Sum BP and Algorithm E can be profitably vectorized, as shown in Figs. 5(a), (b) for the code matrices $H_G$ and $H_H$.

¹ SSE = Streaming SIMD Extensions.
² Consider a video frame to which an 8 × 8 blockwise Discrete Cosine Transform (DCT) is applied; each DCT coefficient is separately quantized and the resulting bitplanes are input to the LDPC encoder. For a single LDPC code to be applied to a particular DCT coefficient, the number of variable nodes in the LDPC code must equal the number of 8 × 8 blocks in the frame, here 594. There is one LDPC code for every coded bitplane of each of the 64 DCT coefficients, and each of these codes transmits a syndrome vector to the decoder.
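As a small worked illustration of the speedup factors $S(y)$ and $\bar{S}(y)$ defined in this section, the sketch below evaluates them for a pair of timings. The timing values are hypothetical and chosen only to make the arithmetic easy to follow; they are not measurements from the paper.

```python
def speedup(time_one_core, time_y_cores):
    """S(y): single-core decoding time divided by the time with y cores."""
    return time_one_core / time_y_cores


def normalized_speedup(time_one_core, time_y_cores, y):
    """Normalized speedup: S(y) divided by the number of cores y."""
    return speedup(time_one_core, time_y_cores) / y


# Hypothetical timings in milliseconds, for illustration only.
t1 = 100.0   # one core
t4 = 32.0    # four cores
s4 = speedup(t1, t4)
n4 = normalized_speedup(t1, t4, 4)
```

A normalized speedup of 1 would mean perfect scaling; values below 1, as in this made-up case, reflect the overhead of thread creation and memory contention described above.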
Firstly, the coding performance of Sum-Product BP is the best, followed by Min-Sum BP, followed by Algorithm E. This is expected because the latter two algorithms are approximations of Sum-Product BP. Further, as the crossover probability of the BSC between the source and side information bitplanes increases, the probability of uncorrected errors increases until it plateaus at 1, which means that there are undetected or uncorrected errors in every decoded vector. As there are more check nodes in $H_H$ than in $H_G$, the plateau occurs at a higher crossover probability for the $H_H$ code. When the crossover probability increases, more BP iterations are needed to recover the encoded vector, so the decoding time increases until the number of iterations maxes out at 1. A fact that is not visible from these plots is that Sum-Product BP converges in the fewest iterations but each iteration consumes more time.

Secondly, recall that Fast Algorithm E executes fewer updates per iteration by first checking whether a node needs updating. The decoding time graphs show that, by using vector instructions in the plain Algorithm E, the decoding speed approaches and, in some cases, exceeds that of Fast Algorithm E. Note that, owing to the conditional checks, Fast Algorithm E is not suitable for implementation using vector instructions.

Thirdly, Min-Sum BP provides intermediate decoding performance and decoding speed between Sum-Product BP and Algorithm E. Interestingly, however, with vector instructions, the Min-Sum BP decoding time is nearly as small as that of Algorithm E while retaining its superior decoding performance. The reason is that check node operations consume 55-65% of the decoding time in Min-Sum BP, but only 35-45% of the decoding time in Algorithm E. Since vectorization reduces check node processing time, Min-Sum BP benefits more from vectorization than Algorithm E. We conclude that Vector Min-Sum BP is nearly always to be preferred over Algorithm E for side information decoding.
When the crossover probability is low, e.g., while decoding the more significant bitplanes of image pixels, Vector Min-Sum BP is to be preferred over Sum-Product BP because it gives the same performance in less time. However, when the crossover probability increases, e.g., while decoding the middle bitplanes of image pixels, Sum-Product BP gives significantly fewer decoding errors and must be preferred over Min-Sum BP even though it is slower.

[Figure 5: for each code, one plot of decoding time (ms) against BSC crossover probability for Sum-Product BP, MinSum BP, Vector MinSum BP, Fast Alg E and Vector Alg E, and one plot of residual block error probability against BSC crossover probability for Sum-Product BP, MinSum BP and Alg E. (a) Parity check matrix $H_G$, code rate .83. (b) Parity check matrix $H_H$, code rate .49.]
Fig. 5. A comparison of the speeds and performance of Sum-Product BP, Min-Sum BP and Algorithm E at various crossover probabilities.

V. DISCUSSION

To see the decoding time results in the context of a video viewing application, consider the following very rough calculation. Suppose that Min-Sum BP decoding is performed and we can tolerate error-prone decoded blocks with probability less than .1. From Fig. 5, this requirement is satisfied for BSC crossover probability .5 for the code $H_H$, for example. The decoding time for vector Min-Sum BP at this probability is 8.8 ms. With an 8 × 8 block DCT transform, there are 64 coefficients to be coded. However, not all bitplanes of each DCT coefficient are significant. At 40 dB quality in natural images, we found experimentally that, out of 64 × 8 = 512 bitplanes, it is necessary to code about 300 bitplanes. Thus, 300 Min-Sum BP decodings must be carried out per video frame. For the code $H_H$, this gives a total decoding time of 300 × 8.8 ms = 2.64 seconds. This implies that the decoding speed is 0.38 frames/s for standard definition video, or 1.52 frames/s for CIF video, or 6.06 frames/s for QCIF video on a general-purpose quad-core machine.

There are many simplifying assumptions made above. Firstly, different code matrices would be required for each bitplane. More reliable bitplanes would decode faster than $H_H$ and less reliable bitplanes would decode slower. Secondly, motion compensation is needed to generate good side information, and this incurs additional delay. Nevertheless, it is encouraging to see that realtime Wyner-Ziv decoding is within reach on multi-core CPUs and certainly on massively parallel GPUs. Our current work consists of combining parallelized BP decoding, parallelized motion compensation and improved side information decoding into a realtime distributed video decoder.

In addition to side information decoding, the benefits of parallelization and vector instructions reported herein are expected to be useful in many other applications that use BP decoding: disparity estimation in multiview images/video, traditional digital communications, and speech recognition, to name a few.

REFERENCES

[1] D. Slepian and J. K. Wolf, "Noiseless coding of correlated information sources," IEEE Trans. Information Theory, July 1973.
[2] A. D. Wyner and J. Ziv, "The rate-distortion function for source coding with side information at the decoder," IEEE Trans. Information Theory, vol. 22, pp. 1-10, Jan.
[3] B. Girod, A. Aaron, S. Rane, and D. Rebollo-Monedero, "Distributed video coding," Proceedings of the IEEE, Special Issue on Advances in Video Coding and Delivery, vol. 93, no. 1, Jan. 2005.
[4] X. Artigas, J. Ascenso, M. Dalai, S. Klomp, D. Kubasov, and M. Ouaret, "The DISCOVER codec: Architecture, techniques and evaluation," in Picture Coding Symposium, Lisbon, Portugal, Nov. 2007.
[5] S. Grauer-Gray, C. Kambhamettu, and K. Palaniappan, "GPU implementation of belief propagation using CUDA for cloud tracking and reconstruction," in 5th IAPR Workshop on Pattern Recognition in Remote Sensing (PRRS), Tampa, FL, Dec. 2008.
[6] S. Wang, S. Cheng, and Q. Wu, "A parallel decoding algorithm of LDPC codes using CUDA," in Proc. Asilomar Conference on Signals, Systems, and Computers, Pacific Grove, CA, Oct. 2008.
[7] A. D. Copeland, N. B. Chang, and S. Leung, "GPU accelerated decoding of high performance error correcting codes," in 2009 High Performance Embedded Computing (HPEC), Lexington, MA, Sept. 2009.
[8] R. G. Gallager, Low-Density Parity-Check Codes. M.I.T. Press.
[9] D. J. MacKay and R. M. Neal, "Near Shannon-limit performance of low density parity check codes," Electronics Letters, vol. 32.
[10] D. Varodayan, A. Aaron, and B. Girod, "Rate-adaptive codes for distributed source coding," EURASIP Signal Processing Journal, vol. 86, no. 11, Nov. 2006.
[11] J. Zhang and M. Fossorier, "Shuffled iterative decoding," IEEE Transactions on Communications, vol. 53, no. 2, 2005.
[12] F. R. Kschischang, B. J. Frey, and H. Loeliger, "Factor graphs and the sum-product algorithm," IEEE Trans. Information Theory, vol. 47, Feb. 2001.
[13] N. Wiberg, Codes and Decoding on General Graphs. Studies in Sci. and Technol., Dissertation no. 44, Linköping, Sweden.
[14] M. Mitzenmacher, "A note on low density parity check codes for erasures and errors," in SRC Tech. Note, COMPAQ.
[15] T. J. Richardson and R. Urbanke, "The capacity of low-density parity-check codes under message-passing decoding," IEEE Trans. Information Theory, vol. 47, Feb. 2001.
[16] Y. Wang, J. S. Yedidia, and S. C. Draper, "Multi-stage decoding of LDPC codes," in IEEE Int. Symp. Inform. Theory, June 2009.
[17] OpenMP Version 3.0 Application Program Interface. OpenMP Architecture Review Board, May 2008.


More information

Stone Images Retrieval Based on Color Histogram

Stone Images Retrieval Based on Color Histogram Stoe Images Retrieval Based o Color Histogram Qiag Zhao, Jie Yag, Jigyi Yag, Hogxig Liu School of Iformatio Egieerig, Wuha Uiversity of Techology Wuha, Chia Abstract Stoe images color features are chose

More information

Running Time. Analysis of Algorithms. Experimental Studies. Limitations of Experiments

Running Time. Analysis of Algorithms. Experimental Studies. Limitations of Experiments Ruig Time Aalysis of Algorithms Iput Algorithm Output A algorithm is a step-by-step procedure for solvig a problem i a fiite amout of time. Most algorithms trasform iput objects ito output objects. The

More information

Adaptive Resource Allocation for Electric Environmental Pollution through the Control Network

Adaptive Resource Allocation for Electric Environmental Pollution through the Control Network Available olie at www.sciecedirect.com Eergy Procedia 6 (202) 60 64 202 Iteratioal Coferece o Future Eergy, Eviromet, ad Materials Adaptive Resource Allocatio for Electric Evirometal Pollutio through the

More information

Pruning and Summarizing the Discovered Time Series Association Rules from Mechanical Sensor Data Qing YANG1,a,*, Shao-Yu WANG1,b, Ting-Ting ZHANG2,c

Pruning and Summarizing the Discovered Time Series Association Rules from Mechanical Sensor Data Qing YANG1,a,*, Shao-Yu WANG1,b, Ting-Ting ZHANG2,c Advaces i Egieerig Research (AER), volume 131 3rd Aual Iteratioal Coferece o Electroics, Electrical Egieerig ad Iformatio Sciece (EEEIS 2017) Pruig ad Summarizig the Discovered Time Series Associatio Rules

More information

Dynamic Programming and Curve Fitting Based Road Boundary Detection

Dynamic Programming and Curve Fitting Based Road Boundary Detection Dyamic Programmig ad Curve Fittig Based Road Boudary Detectio SHYAM PRASAD ADHIKARI, HYONGSUK KIM, Divisio of Electroics ad Iformatio Egieerig Chobuk Natioal Uiversity 664-4 Ga Deokji-Dog Jeoju-City Jeobuk

More information

VIDEO WATERMARKING IN 3D DCT DOMAIN

VIDEO WATERMARKING IN 3D DCT DOMAIN 14th Europea Sigal Processig Coferece (EUSIPCO 06), Florece, Italy, September 4-8, 06, copyright by EURASIP VIDEO WATERMARKING IN 3D DCT DOMAIN M. Carli, R. Mazzeo, ad A. Neri AE Departmet Uiversity of

More information