An FPGA Implementation of ( )-Regular Low-Density. Parity-Check Code Decoder

Size: px
Start display at page:

Download "An FPGA Implementation of ( )-Regular Low-Density. Parity-Check Code Decoder"

Transcription

1 An FPGA Imlementation of ( )-Regular Low-Density Parity-Check Code Decoder Tong Zhang ECSE Det., RPI Troy, NY tzhang@ecse.ri.edu Keshab K. Parhi ECE Det., Univ. of Minnesota Minneaolis, MN arhi@ece.umn.edu Abstract Because of their excellent error-correcting erformance, Low-Density Parity-Check (LDPC) codes have recently attracted a lot of attentions. In this work, we are interested in the ractical LDPC code decoder hardware imlementations. The direct fully arallel decoder imlementation usually incurs too high hardware comlexity for many real alications, thus artly arallel decoder design aroaches that can achieve aroriate trade-offs between hardware comlexity and decoding throughut are highly desirable. Alying a joint code and decoder design methodology, we develo a high-seed ( )-regular LDPC code artly arallel decoder architecture, based on which we imlement a -bit, rate- ( )-regular LDPC code decoder on Xilinx FPGA device. This artly arallel decoder suorts a maximum symbol throughut of Mbs and achieves BER at 2dB over AWGN channel while erforming maximum decoding iterations. Introduction In the ast few years, the recently rediscovered Low-Density Parity-Check (LDPC) codes [][2][3] have received a lot of attentions and have been widely considered as next-generation error-correcting codes for telecommunication and magnetic storage. Defined as the null sace of a very sarse arity check matrix, an LDPC code is tyically reresented by a biartite grah, usually called Tanner grah, in which one set of variable nodes corresonds to the set of codeword, another set of check nodes corresonds to the set A biartite grah is one in which the nodes can be artitioned into two sets, X and Y, so that the only edges of the grah are between the nodes in X and the nodes in Y.

2 of arity check constraints and each edge corresonds to a non-zero entry in the arity check matrix. An LDPC code is known as ( )-regular LDPC code if each variable node has the degree of and each check node has the degree of, or in its arity check matrix each column and each row have and non-zero entries, resectively. The code rate of a ( )-regular LDPC code is rovided that the arity check matrix has full rank. The construction of LDPC codes is tyically random. LDPC codes can be effectively decoded by the iterative belief-roagation (BP) algorithm [3] that, as illustrated in Fig., directly matches the Tanner grah: decoding messages are iteratively comuted on each variable node and check node and exchanged through the edges between the neighboring nodes. check nodes check to variable message variable to check message variable nodes Figure : Tanner grah reresentation of an LDPC code and the decoding messages flow. Recently, tremendous efforts have been devoted to analyze and imrove the LDPC codes error-correcting caability, see []-[], etc. Besides their owerful error-correcting caability, another imortant reason why LDPC codes attract so many attentions is that the iterative BP decoding algorithm is inherently fully arallel, thus a great otential decoding seed can be exected. The high-seed decoder hardware imlementation is obviously one of the most crucial issues determining the extent of LDPC alications in the real world. The most natural solution for the decoder architecture design is to directly instantiate the BP decoding algorithm to hardware: each variable node and check node are hysically assigned their own rocessors and all the rocessors are connected through an interconnection network reflecting the Tanner grah connectivity. By comletely exloiting the arallelism of the BP decoding algorithm, such fully arallel decoder can achieve very high decoding seed, e.g., a -bit, rate- LDPC code fully arallel decoder with the maximum symbol throughut of Gbs has been hysically imlemented using ASIC technology [2]. The main disadvantage of such fully arallel design is that with the increase of code length, tyically the LDPC code length is very large (at least several thousands), the incurred hardware comlexity will become more and more rohibitive for many ractical uroses, e.g., for K code length the ASIC decoder imlementation [2] consumes M gates. Moreover, as ointed out in [2], the routing over- 2

3 head for imlementing the entire interconnection network will become quite formidable due to the large code length and randomness of the Tanner grah. Thus high-seed artly arallel decoder design aroaches that achieve aroriate trade-offs between hardware comlexity and decoding throughut are highly desirable. For any given LDPC code, due to the randomness of its Tanner grah, it is nearly imossible to directly develo a high-seed artly arallel decoder architecture. To circumvent this difficulty, Boutillon et al. [3] roosed a decoder-first code design methodology: instead of trying to conceive the high-seed artly arallel decoder for any given random LDPC code, use an available high-seed artly arallel decoder to define a constrained random LDPC code. We may consider it as an alication of the well-known Think in the reverse direction methodology. Insired by the decoder-first code design methodology, we roosed a joint code and decoder design methodology in [] for ( )-regular LDPC code artly arallel decoder design. By jointly conceiving the code construction and artly arallel decoder architecture design, we resented a ( )-regular LDPC code artly arallel decoder structure in [], which not only defines very good ( )-regular LDPC codes but also could otentially achieve high-seed artly arallel decoding. In this aer, alying the joint code and decoder design methodology, we develo an elaborate ( )- regular LDPC code high-seed artly arallel decoder architecture based on which we imlement a - bit, rate- ( )-regular LDPC code decoder using Xilinx Virtex FPGA (Field Programmable Gate Array) device. In this work, we significantly modify the original decoder structure [] to imrove the decoding throughut and simlify the control logic design. To achieve good error-correcting caability, the LDPC code decoder architecture has to ossess randomness to some extent, which makes the FPGA imlementations more challenging since FPGA has fixed and regular hardware resources. We roose a novel scheme to realize the random connectivity by concatenating two routing networks, where all the random hardwire routings are localized and the overall routing comlexity is significantly reduced. Exloiting the good minimum distance roerty of LDPC codes, this decoder emloys arity check as the earlier decoding stoing criterion to achieve adative decoding for energy reduction. With the maximum 8 decoding iterations, this FPGA artly arallel decoder suorts a maximum Mbs symbol throughut and achieves BER at 2dB over AWGN channel. This aer begins with a brief descrition of the LDPC code decoding algorithm in Section 2. In section 3, we briefly describe the joint code and decoder design methodology for ( )-regular LDPC code artly arallel decoder design. In Section, we resent the detailed high-seed artly arallel decoder architecture design. Finally, an FPGA imlementation of a ( )-regular LDPC code artly arallel decoder is discussed in Section 5. 3

4 H 2 Decoding Algorithm Since the direct imlementation of BP algorithm will incur too high hardware comlexity due to the large number of multilications, we introduce some logarithmic quantities to convert these comlicated multilications into additions, which leads to the Log-BP algorithm [2][5]. Before the descrition of Log-BP decoding algorithm, we introduce some definitions as follows: Let denote the sarse arity check matrix of the LDPC code and denote the entry of at the osition ( ). We define the set of bits that articiate in arity check as, and the set of arity checks in which bit articiates as!" #$ %. We denote the set with bit excluded by '&, and the set ( with arity check excluded by )&*. Algorithm 2. Iterative Log-BP Decoding Algorithm Inut: The rior robabilities +-, / and +) Outut: Hard decision ; 8=> ; Procedure: < ; :9:9:9 ; / ,, :9:9:9 ;. Initialization: For each, comute the intrinsic (or channel) message?8/@badcfegh EJI N O:P Q, comute R $ TS7 VU3WX? O@YADC0Z []\ )^_ H ^ `\ )^_ H ^ba dcqef\:g\]s hu3wx? jlk i [ m?o#n m?o#o and for each KML 2. Iterative Decoding Horizontal (or check node comutation) ste: For each KMLqN N:P, comute r $ T@YADC%Z *[s\ 2t \ 2t a u wvyx{z0 }~X S VUNW R wv () where R vbxƒz% ~X PR $ v P. Vertical (or variable node comutation) ste: For each KML N O:P Q, comute R $ %TS7 VU3WX?O$ "O@BADC Z W[s\ )^_ ˆH ^ (2) \ )^_ ˆH ^ a where?o$ 5T?N>[ v xw ] }D~y r WvX. For each, udate the seudo-osterior log-likelihood ratio (LLR) Š2 as: Š Œ?N [ Ž $xw ] }D~ r (3)

5 Decision ste: (a) Perform hard decision on Š :9:9:9 Šf=> to obtain < ; ; :9:9:9 8=> ; such that 8T ; if Š and ; if Š ; (b) If 9-; <, then algorithm terminates, else go to Horizontal ste until the re-set maximum number of iterations have occurred. We call R $ r and r in the above algorithm as extrinsic messages, where R is delivered from variable node to check node and $ is delivered from check node to variable node. It is clear that each decoding iteration can be erformed in fully arallel by hysically maing each check node to one individual Check Node rocessing Unit (CNU) and each variable node to one individual Variable Node rocessing Unit (VNU). Moreover, by delivering the hard decision ; from each VNU to its neighboring < can be easily erformed by all the CNUs. Thanks to the good minimum distance CNUs, the arity check 98; roerty of LDPC code, such adative decoding scheme can effectively reduce the average energy consumtion of the decoder without erformance degradation. In the artly arallel decoding, the oerations of a certain number of check nodes or variable nodes are time-multilexed, or folded [], to a single CNU or VNU. For an LDPC code with check nodes and variable nodes, if its artly arallel decoder contains E CNUs and E VNUs, we denote = factor and = as VNU folding factor. as CNU folding 3 Joint Code and Decoder Design In this section we briefly describe the joint ( )-regular LDPC code and decoder design methodology []. It is well known that the BP (or Log-BP) decoding algorithm works well if the underlying Tanner grah is -cycle free and does not contain too many short cycles. Thus the motivation of this joint design aroach is to construct an LDPC code that not only fits to a high-seed artly arallel decoder but also has the average cycle length as large as ossible in its -cycle free Tanner grah. This joint design rocess is outlined as follows and the corresonding schematic flow diagram is shown in Fig. 2.. Exlicitly construct two matrices, and, in such a way that ; LDPC code whose Tanner grah has the girth 2 of 2; defines a ( )-regular 2 Girth is the length of a shortest cycle in a grah. 5

6 #!!!!!!!!! 2. Develo a artly arallel decoder that is configured by a set of constrained random arameters and defines a ( )-regular LDPC code ensemble, in which each code is a sub-code of and has the arity check matrix ; ; 3. Select a good ( )-regular LDPC code from the code ensemble based on the criteria of large Tanner grah average cycle length and comuter simulations. Tyically the arity check matrix of the selected code has only few redundant checks, so we may assume the code rate is always. High Seed Partly Parallel Decoder Constrained Random Parameters random inut H 3 Construction of H= H H 2 deterministic inut H (3,k) regular LDPC code ensemble defined by H= H H 3 Selected Code Construction of ; by 9 : The structure of ; Figure 2: Joint design flow diagram. submatrices. Each block matrix J in is an J in is obtained by a cyclic shift of an is shown in Fig. 3, where both and are 9 identity matrix. Let identity matrix and each block matrix denote the right cyclic shift oerator where 25 reresents right cyclic shifting matrix by columns, then J 2 7 where K 7)9 A and reresents the identity matrix, e.g., let, and %, we have 7d9A A, then "! F 7 Notice that in both and each row contains s and each column contains a single. Thus, the variable nodes and 9 check matrix ; defines a ( )-regular LDPC code with 9 nodes. Let $ denote the Tanner grah of, we have the following theorem regarding to the girth of $ :

7 L H = H H 2 = I, 0 P, I 2, P2, 0 0 I,2 I k, P,2 Pk, I 2,2 0 0 P2,2 0 0 I k,2 Pk,2 I,k 0 I 2,k P,k P2,k 0 0 I k,k Pk,k L k L k Figure 3: Structure of ; Theorem 3. If can not be factored as 9 there is at least one 2-cycle assing each check node., where N=L k 2 L :9:9:9, then the girth of $ is 2 and Partly Parallel Decoder: Based on the secific structure of ;, a rincial ( )-regular LDPC code artly arallel decoder structure was resented in []. This decoder is configured by a set of constrained random arameters and defines a ( )-regular LDPC code ensemble. Each code in this ensemble is essentially constructed by inserting extra 9 check nodes to the high-girth ( )-regular LDPC code under the constraint secified by the decoder. Therefore, it is reasonable to exect that the codes in this ensemble more likely do not contain too many short cycles and we may easily select a good code from it. For real alications, we can select a good code from this code ensemble as follows: first in the code ensemble find several codes with relatively high average cycle lengths, then select the one leading to the best result in the comuter simulations. The rincial artly arallel decoder structure resented in [] has the following roerties: It contains memory banks, each one consists of several RAMs to store all the decoding messages associated with variable nodes; Each memory bank associates with one address generator that is configured by one element in a constrained random integer set. It contains a configurable random-like one-dimensional shuffle network with the routing comlexity scaled by It contains 9 0 ; VNUs and CNUs so that the VNU and CNU folding factors are 9, resectively; 7 and

8 Each iteration comletes in clock cycles in which only CNUs work in the first clock cycles and both CNUs and VNUs work in the last clock cycles. Over all the ossible and, this decoder defines a ( has the arity check matrix ;, where the submatrix is jointly secified by and. )-regular LDPC code ensemble in which each code Partly Parallel Decoder Architecture In this work, alying the joint code and decoder design methodology, we develo a high-seed ( )-regular LDPC code artly arallel decoder architecture based on which a -bit, rate- ( )-regular LDPC code artly arallel decoder has been imlemented using Xilinx Virtex FPGA device. Comared with the structure resented in [], this artly arallel decoder architecture has the following distinct characteristics: It emloys a novel concatenated configurable random two-dimensional shuffle network imlementation scheme to realize the random-like connectivity with low routing overhead, which is esecially desirable for FPGA imlementations; To imrove the decoding throughut, both the VNU folding factor and CNU folding factor are, instead of and in the structure resented in []; To simlify the control logic design and reduce the memory bandwidth requirement, this decoder comletes each decoding iteration in clock cycles in which CNUs and VNUs work in the and clock cycles, alternatively. Following the joint design methodology, we have that this decoder should define a ( code ensemble in which each code has 9 variable nodes and )-regular LDPC 9 check nodes, and as illustrated in Fig., the arity check matrix of each code has the form where and have the exlicit structures as shown in Fig. 3 and the random-like is secified by certain configuration arameters of the decoder. To facilitate the descrition of the decoder architecture, we introduce some definitions as follows: we denote the submatrix consisting of the consecutive columns in that go through the block matrix J as J ƒ~, in which from left to right each column is labeled as e J {~ with increasing from to, as shown in Fig.. We label the variable node corresonding to column e J ƒ~ " J {~ as and the " J {~ variable nodes for W :9:9:9 constitute a variable node grou VG J. Finally, we arrange the 9 check nodes corresonding to all the 9 rows of submatrix into check node grou CG. 8

9 left most column h L (k,2) right most column h (k,2) L H = H H 2 H 3 = I, P, I k, Pk, I,2 P,2 I k,2 Pk,2 I,k P,k I P k,k k,k L k L k L k (k,2) H 2 L k Figure : The arity check matrix. PE, PE 2, PE k,k active during variable node rocessing VNU RAMs, VNU RAMs 2, VNU RAMs k,k π (regular & fixed) π 2 (regular & fixed) π (random like & 3 configurable) active during check node rocessing CNU, CNU CNU,k 2, CNU 2,k CNU 3, CNU 3,k Figure 5: The rincial ( )-regular LDPC code artly arallel decoder structure. 9

10 Fig. 5 shows the rincial structure of this artly arallel decoder. It mainly contains PE Blocks PE J for s, three bi-directional shuffle networks, and andq9 CNUs. Each PE J contains one memory bank RAMs J that stores all the decoding messages, including the intrinsic and extrinsic messages and hard decisions, associated with all the variable nodes in the variable node grou VG J, and contains one VNU to erform the variable node comutations for these variable nodes. Each bi-directional shuffle network 2 realizes the extrinsic message exchange between all the 9 variable nodes and the 9 check nodes in CG. The CNU for :9:9:9 erform the check node comutations for all the 9 check nodes in CG. This decoder comletes each decoding iteration in clock cycles, and during the and cycles, it works in check node rocessing mode and variable node rocessing mode, resectively. In the check node rocessing mode, the decoder not only erforms the comutations of all the check nodes but also comletes the extrinsic message exchange between neighboring nodes. In variable node rocessing mode, the decoder only erforms the comutations of all the variable nodes. clock The intrinsic and extrinsic messages are all quantized to bits and the iterative decoding dataaths of this artly arallel decoder are illustrated in Fig., in which the dataaths in check node rocessing and variable node rocessing are reresented by solid lines and dash dot lines, resectively. As shown in Fig., each PE Block PE J contains five RAM blocks: EXT RAM for % EXT RAM has memory locations and the location with the address (, INT RAM and DEC RAM. Each ) contains the J ƒ~ extrinsic messages exchanged between the variable node in VG J and its neighboring check node in CG " J {~. The INT RAM and DEC RAM store the intrinsic message and hard decision associated with node at the memory location with the address ( ). As we will see later, such decoding messages storage strategy could greatly simlify the control logic for generating the memory access address. For the urose of simlicity, in Fig. we do not show the dataath from INT RAM to EXT RAM s for extrinsic message initialization, which can be easily realized in clock cycles before the decoder enters the iterative decoding rocess.. Check node rocessing During the check node rocessing, the decoder erforms the comutations of all the check nodes and realizes the extrinsic message exchange between all the neighboring nodes. At the beginning of check node rocessing, in each PE J the memory location with address in EXT RAM contains -bit hybrid data that consists of -bit hard decision and 5-bit variable-to-check extrinsic message associated with the variable node " J {~ in 0

11 CNU,j bits 5 bits π (regular & fixed) h () x,y PE x,y { h (i) x,y} 8 bits INT_RAM CNU 2,j CNU 3,j bits 5 bits bits 5 bits π 2 (regular & fixed) (random like & π 3 configurable) (2) h x,y (3) h x,y {β (i) x,y} 5 bits 8 bits EXT_RAM_ EXT_RAM_2 EXT_RAM_3 { h (i) x,y} {β (i) x,y} 5 bits 5 bits VNU bit DEC_RAM Figure : Iterative decoding dataaths. VG J. Each clock cycle this decoder erforms the read-shuffle-modify-unshuffle-write oerations to convert one variable-to-check extrinsic message in each EXT RAM to its check-to-variable counterart. As illustrated in Fig., we may outline the dataath loo in check node rocessing as follows:. Read: One -bit hybrid data e } B~ J is read from each EXT RAM in each PE J ; 2. Shuffle: Each hybrid data e } B~ J goes through the shuffle network8 and arrives CNU ; 3. Modify: Each CNU erforms the arity check on the inut hard decision bits and generates the outut 5-bit check-to-variable extrinsic messages r } B~ J based on the inut 5-bit variable-to-check extrinsic messages;. Unshuffle: Send each check-to-variable extrinsic message r } B~ J back to the PE Block via the same ath as its variable-to-check counterart; 5. Write: Write each r } B~ J to the same memory location in EXT RAM as its variable-to-check counterart. All the CNUs deliver the arity check results to a central control block that will, at the end of check node rocessing, determine whether all the arity check equations secified by the arity check matrix have been satisfied, if yes, the decoding for current code frame will terminate. To achieve higher decoding throughut, we imlement the read-shuffle-modify-unshuffle-write loo oeration by five-stage ielining as shown in Fig. 7, where CNU is -stage ielined. To make this ielining scheme feasible, we realize each bi-directional I/O connection in the three shuffle networks by two distinct

12 Read CNU (st half) CNU CNU (2nd half) bits bits 5 bits 5 bits Shuffle Unshuffle Write Figure 7: Five-stage ielining of the check node rocessing dataath. sets of wires with oosite directions, which means that the hybrid data from PE Blocks to CNUs and the check-to-variable extrinsic messages from CNUs to PE Blocks are carried on distinct sets of wires. Comared with sharing one set of wires in time-multilexed fashion, this aroach has higher wire routing overhead but obviates the logic gate overhead due to the realization of time-multilex and, more imortantly, make it feasible to directly ieline the dataath loo for higher } B~ decoding throughut. In this decoder, one } B~ address generator AG node rocessing, AG J associates with one EXT RAM in each PE J. In the check J generates the address for reading hybrid data and, due to the five-stage ielining of dataath loo, the address for writing back the check-to-variable message is obtained via delaying the read address by five clock cycles. It is clear that the connectivity among all the variable nodes and check nodes, or the entire arity check matrix, realized by this decoder is jointly secified by all the address generators and, the connectivity among all the variable nodes and the the three shuffle networks. Moreover, for check nodes in CG } B~ is comletely determined by AG J and2. Following the joint design methodology, we imlement all the address generators and the three shuffle networks as follows... Imlementations of AG J ~ and The bi-directional shuffle network and AG J ~ realize the connectivity among all the variable nodes and all " J {~ the check nodes in CG as secified by the fixed submatrix. Recall that node corresonds to the column e " J {~ " J {~ as illustrated in Fig. and the extrinsic messages associated with node is always stored at address. Exloiting the exlicit structure of, we easily obtain the imlementation schemes for AG J ~ and as follows: Each AG J ~ is realized as -bit binary counter that is cleared to zero at the beginning of check node rocessing; The bi-directional shuffle network connects the PE J with the same -index to the same CNU. 2

13 I I..2 Imlementations of AG ~ J and The bi-directional shuffle network and AG ~ J realize the connectivity among all the variable nodes and all the check nodes in CG as secified by the fixed matrix. Similarly, exloiting the extrinsic messages storage strategy and the exlicit structure of, we imlement AG J ~ and as follows: Each AG J ~ is realized as -bit binary counter that only counts u to the value and is loaded with the value of K 7d9 A at the beginning of check node rocessing; The bi-directional shuffle network connects the PE J with the same -index to the same CNU. Notice that the counter load value for each AG J ~ directly comes from the construction of each block matrix J in as described in Section Imlementations of AG ~ J and The bi-directional shuffle network ~ and AG J jointly define the connectivity between all the variable nodes and all the check nodes in CG, which is reresented by as illustrated in Fig.. In the above, we show that by exloiting the secific structures of Y~ and and the extrinsic messages storage strategy, we can directly obtain the imlementations of each AG are not easy because of the following requirements on : J and for W ~. However, the imlementations of AG J and. The Tanner grah corresonding to the arity check matrix should be -cycle free; 2. To make be random to some extent, should be random-like; ~ As roosed in [], to simlify the design rocess, we searately conceive AG J and ~ that the imlementations of AG J and accomlish the above and ~ Imlementations of AG J ~ : We imlement each AG J as requirement, resectively. in such a way -bit binary counter that counts u to the value and is initialized with a constant value J at the beginning of check node rocessing. Each J is selected in random under the following two constraints:. Given, J 2. Given, J, L :9:9:9 f ; K q: 9 A, f K L :9:9:9 f. It can be roved that the above two constraints on J are sufficient to make the entire arity check matrix always corresond to a -cycle free Tanner grah no matter how we imlement. 3

14 Imlementation of : Since each AG ~ J is realized as a counter, the attern of shuffle network can not be fixed otherwise the shuffle attern of will regularly reeat in the, which means that will always contain very regular connectivity atterns no matter how random-like the attern of itself is. Thus we should make be configurable to some extent. In this work, we roose the following concatenated configurable random shuffle network imlementation scheme for. π 3 Inut Data from PE Blocks a, a,k a a k, k,k r=0... L ROM R s bit (r) s (r) bit k a, b, b,k ak, b k, (r) ψ ψ (r) k (R or Id) (R or Id) k a,k a k,k b k,k b, b k, r=0... L s bit (c) ψ (c) (C or Id) ROM C s (c) ψk (C or Id) k (c) c, b, c, c k, b k, bit k c k, Outut Data to CNU 3,j s c, c,k c c k, k,k Stage I: Intra Row Shuffle Stage II: Intra Column Shuffle Figure 8: Forward ath of. Fig. 8 shows the forward ath (from PE J to CNU ) of the bi-directional shuffle network. In each clock cycle, it realizes the data shuffle from J to J by two concatenated stages: intra-row shuffle and intra-column shuffle. Firstly, the J data block, where each J comes from PE J ~, asses an intra-row shuffle network array in which each shuffle network shuffles the inut data J to J ~ for. Each is configured by -bit control signal S K~ leading to the fixed random ermutation if S ~, or to the identity ermutation (Id) otherwise. The reason why we use the Id attern instead of another random shuffle attern is to minimize the routing overhead, and our simulations suggest that there is no gain on the errorcorrecting erformance by using another random shuffle attern instead of Id attern. The -bit configuration word K~ changes every clock cycle and all the -bit control words are stored in ROM R. Next, the J # ~ data block goes through an intra-column shuffle network array in which each shuffles the J to J for ~. Similarly, each is configured by -bit control signal S ~ ermutation if S ~, or to the identity ermutation (Id) otherwise. The -bit configuration word S ~ changes every clock cycle and all the leading to the fixed random -bit control words are stored in ROM C. As the outut of forward ath, the J with the same -index are delivered to the same CNU K~. To realize the bi-directional shuffle, we only need to imlement each configurable shuffle network ~ and as bi-directional so that can

15 ; unshuffle the data backward from CNU to PE J along the same route as the forward ath on distinct sets of wires. Notice that, due to the ielining on the dataath loo, the backward ath control signals are obtained via delaying the forward ath control signals by three clock cycles. To make the connectivity realized by be random-like and change each clock cycle, we only need to randomly generate the control word S ~ and S ~ for each clock cycle and the fixed shuffle atterns of each and. Since most modern FPGA devices have multile metal layers, the imlementations of the two shuffle arrays can be overlaed from the bird s-eye view. Therefore, the above concatenated imlementation scheme will confine all the routing wires to small area (in one row or one column), which will significantly reduce the ossibility of routing congestion and reduce the routing overhead..2 Variable node rocessing Comared with the above check node rocessing, the oerations erformed in the variable node rocessing is quite simle since the decoder only needs to carry out all the variable node comutations. Notice that at the beginning of variable node rocessing, the three 5-bit check-to-variable extrinsic messages associated with each variable node J ƒ~ are stored at the address of the three EXT RAM in PE J. The 5-bit intrinsic J ƒ~ is also stored at the address of INT RAM in PE J. In message associated with variable node each clock cycle, this decoder erforms the read-modify-write oerations to convert the three check-to-variable extrinsic messages associated with the same variable node to three hybrid data consisting of variable-to-check extrinsic messages and hard decisions. As shown in Fig., we may outline the dataath loo in variable node rocessing as follows:. Read: In each PE J, three 5-bit check-to-variable extrinsic messages r B~ J and one 5-bit intrinsic messages? J associated with the same variable node are read from the three EXT RAM and INT RAM at the same address; 2. Modify: Based on the inut check-to-variable extrinsic messages and intrinsic message, each VNU generates the -bit hard decision ; J and three -bit hybrid data e J B~ ; 3. Write: Each e J } B~ is written back to the same memory location as its check-to-variable counterart and J is written to DEC RAM. The forward ath from memory to VNU and backward ath from VNU to memory are imlemented by distinct sets of wires and the entire read-modify-write dataath loo is ielined by three-stage ielining as 5

16 illustrated in Fig. 9. VNU Read 5 bits VNU (st half) VNU (2nd half) bits Write bit Figure 9: Three-stage ielining of the variable node rocessing dataath. Since all the extrinsic and intrinsic messages associated with the same variable node are stored at the same address in different RAM blocks, we can use only one binary counter to generate all the read address. Due to the ielining of the dataath, the write address is obtained via delaying the read address by three clock cycles..3 CNU and VNU Architectures Each CNU carries out the oerations of one check node, including the arity check and comutation of checkto-variable extrinsic messages. Fig. 0 shows the CNU architecture for check node with the degree of. Each inut } B~ is a -bit hybrid data consisting of -bit hard decision and 5-bit variable-to-check extrinsic message. The arity check is erformed by XORing all the six -bit hard decisions. Each -bit variable-to-check extrinsic messages is reresented by sign-magnitude format with a sign bit and four magnitude bits. The architecture for comuting the check-to-variable extrinsic messages is directly obtained from () in Algorithm 2.. The function *T@YADC is realized by the LUT (Look-U Table) that is imlemented as a combinational logic block in FPGA. Each outut 5-bit check-to-variable extrinsic message } B~ is also reresented by signmagnitude format. Each VNU generates the hard decision and all the variable-to-check extrinsic messages associated with one variable node. Fig. shows the VNU architecture for variable node with the degree of 3. With the inut -bit intrinsic message and three -bit check-to-variable extrinsic messages } B~ associated with the same variable node, VNU generates three 5-bit variable-to-check extrinsic messages and -bit hard decision according to (2) and (3) in Algorithm 2., resectively. To enable each CNU receive the hard decisions to erform arity check as described in the above, the hard decision is combined with each 5-bit variable-to-check extrinsic message to form -bit hybrid data Y~ as shown in Fig.. Since each inut check-to-variable extrinsic message } B~ is reresented by sign-magnitude format, we need to convert it to two s comlement format before erforming the additions. Before going through the LUT that each data is converted back the

17 x () x (2) (3) () (5) x () x x x ieline LUT LUT LUT LUT LUT LUT () (2) (3) () (5) () y y y y y y Parity check result Figure 0: Architecture for check node rocessor unit (CNU) with 5. sign-magnitude format.. Data Inut/Outut This artly arallel decoder works simultaneously on three consecutive code frames in two-stage ielining mode: while one frame is being iteratively decoded, the next frame is loaded into the decoder and the hard decisions of the revious frame are read out from the decoder. Thus each INT RAM contains two RAM blocks to store the intrinsic messages of both current and next frames. Similarly, each DEC RAM contains two RAM blocks to store the hard decisions of both current and revious frames. The design scheme for intrinsic message inut and hard decision outut is heavily deendent on the floor lanning of the PE Blocks. To minimize the routing overhead, we develo a square-shaed floor lanning for PE Blocks as illustrated in Fig. 2 and the corresonding data inut/outut scheme is described in the following:. Intrinsic Data Inut: The intrinsic messages of next frame is loaded symbol er clock cycle. As shown in Fig. 2, the memory location of each inut intrinsic data is determined by the inut load address that has the width bits in bits secify which PE Block (or which bits locate the memory location in the selected INT RAM) is being accessed and the 7

18 ieline S to T: Sign Magnitude to Two s Comlement z 5 T to S: Two s Comlement to Sign Magnitude Hard decision y () 5 S to T T to S 7 LUT x () y (2) 5 S to T T to S 7 LUT x (2) y (3) 5 S to T T to S 7 LUT x (3) Figure : Architecture for variable node rocessor unit (VNU) with. Load Address Intrinsic Data log 2 k 2 Binary decoder log 2 L + log 2 k 2 log 2 L 5 k 2 2 k k PE Block Select PE, PE,2 PE,k 2 k k PE 2, PE 2,2 PE 2,k Read Address log 2 L k 2 Decoding Outut PE k, PE k,2 2 k PE k,k k Figure 2: Data Inut/Outut structure. 8

19 INT RAM. As shown in Fig. 2, the rimary intrinsic data and load address inut directly connect to the PE Blocks PE for, and from each PE J the intrinsic data and load address are delivered to the adjacent PE Block PE in ielined fashion. 2. Decoded Data Outut: The decoded data (or hard decisions) of the revious frame is read out in -bit read address inut directly connects ielined fashion. As shown in Fig. 2, the to the PE Blocks PE J for, and from each PE J the read address are delivered to the adjacent block PE J in ielined fashion. Based on its inut read address, each PE Block oututs -bit hard decision er clock cycle. Therefore, as illustrated in Fig. 2, the width of ielined decoded data bus increases by after going through one PE Block, and at the rightmost side, we obtain decoded outut that are combined together as the -bit rimary decoded data outut. -bit 5 FPGA Imlementation Alying the above decoder architecture, we imlemented a ( )-regular LDPC code artly arallel decoder for using Xilinx Virtex-E XCV200E device with the ackage FG5. The corresonding LDPC code length is 9 9 and code rate is. We obtain the constrained random arameter set for imlementing ~ and each AG J as follows: first generate a large number of arameter sets from which we find few sets leading to relatively high Tanner grah average cycle length, then we select one set leading to the best erformance based on comuter simulations. The target XCV200E FPGA device contains 8 large on-chi block RAMs, each one is a fully synchronous dual-ort K-bit RAM. In this decoder imlementation, we configure each dual-ort K-bit RAM as two indeendent single-ort -bit RAM blocks so that each EXT RAM can be realized by one single-ort -bit RAM block. Since each INT RAM contains two RAM blocks for storing the intrinsic messages of both current and next code frames, we use two single-ort -bit RAM blocks to imlement one INT RAM. Due to the relatively small memory size requirement, the DEC RAM is realized by distributed RAM that rovides shallow RAM structures imlemented in CLBs. Since this decoder contains PE Blocks, each one incororates one INT RAM and three EXT RAM s, we totally utilize single-ort -bit RAM blocks (or dual-ort K-bit RAM blocks). We manually configured the lacement of each PE Block according to the floor lanning scheme as shown in Fig. 2 in Section.. Notice that such lacement scheme exactly matches the structure of the configurable shuffle network as described in Section..3, thus the routing overhead for imlementing the is also minimized in this FPGA imlementation. 9

20 From the architecture descrition in Section, we know that, during each clock cycle in the iterative decoding, this decoder need to erform both read and write oerations on each single-ort RAM block EXT RAM. Therefore, suose the rimary clock frequency is, we must generate a clock signal as the RAM control signal to achieve read-and-write oeration in one clock cycle. This 2x clock signal is generated using the Delay-Locked Loo (DLL) in XCV200E. To facilitate the entire imlementation rocess, we extensively utilized the highly otimized Xilinx IP cores to instantiate many function blocks, i.e., all the RAM blocks, all the counters for generating addresses, and the ROMs used to store the control signals for shuffle network. Moreover, all the adders in CNUs and VNUs are imlemented by rile-carry adder that is exactly suitable for Xilinx FPGA imlementations thanks to the on-chi dedicated fast arithmetic carry chain. Table : FPGA resources utilization statistics. Resource Number Utilization rate Resource Number Utilization rate Slices,792 % Slices Registers 0,05 9% inut LUTs 5,933 3% Bonded IOBs 8 8% Block RAMs 90 8% DLLs 2% This decoder was described in the VHDL hardware descrition language (HDL) and SYNOPSYS FPGA Exress was used to synthesize the VHDL imlementation. We used the Xilinx Develoment System tool suite to lace and route the synthesized imlementation for the target XCV200E device with the seed otion -7. Table shows the hardware resource utilization statistics. Notice that 7% of the total utilized slices, or 89 slices, were used for imlementing all the CNUs and VNUs. Fig. 3 shows the laced and routed design in which the lacement of all the PE Blocks are constrained based on the on-chi RAM block locations. Based on the results reorted by the Xilinx static timing analysis tool, the maximum decoder clock frequency can be 5 MHz. If this decoder erforms S decoding iterations for each code frame, the total clock cycle number for decoding one frame will be DS$9 [ where the extra clock cycles is due to the initialization rocess, and the maximum symbol decoding throughut will be W9 9 " DS9 [ * *9 " DSd[ 7 Mbs. Here we set S and obtain the maximum symbol decoding throughut as 5 Mbs. Fig. shows the corresonding erformance over AWGN channel with S, including the BER (Bit Error Rate), FER (Frame Error Rate) and the average iteration numbers. 20

21 Figure 3: The laced and routed decoder imlementation. BER/FER BER FER Average number of iterations E b /N 0 (db) E b /N 0 (db) Figure : Simulation results on BER (Bit Error Rate), FER (Frame Error Rate) and the average iteration numbers. 2

22 Conclusion Due to the unique characteristics of LDPC codes, we believe that jointly conceiving the code construction and artly arallel decoder design should be a key for ractical high-seed LDPC coding system imlementations. In this work, alying a joint design methodology, we develoed a ( )-regular LDPC code high-seed artly arallel decoder architecture design and imlemented a -bit, rate- ( )-regular LDPC code decoder on the Xilinx XCV200E FPGA device. The detailed decoder architecture and floor lanning scheme have been resented and a concatenated configurable random shuffle network imlementation is roosed to minimize the routing overhead for the random-like shuffle network realization. With the maximum 8 decoding iterations, this decoder can achieve u to 5 Mbs symbol decoding throughut and the BER at 2dB over AWGN channel. Moreover, exloiting the good minimum distance roerty of LDPC code, this decoder uses arity check after each iteration as earlier stoing criterion to effectively reduce the average energy consumtion. References [] R. G. Gallager, Low-density arity-check codes, IRE Transactions on Information Theory, vol. IT-8,. 2 28, Jan. 92. [2] R. G. Gallager, Low-Density Parity-Check Codes, M.I.T Press, 93. available at htt://justice.mit.edu/eole/gallager.html. [3] D. J. C. MacKay, Good error-correcting codes based on very sarse matrices, IEEE Transactions on Information Theory, vol. 5, , Mar [] M. C. Davey and D. J. C. MacKay, Low-density arity check codes over, IEEE Commun. Letters, vol. 2,. 5 7, June 998. [5] M. Luby, M. Shokrolloahi, M. Mizenmacher, and D. Sielman, Imroved low-density arity-check codes using irregular grahs and belief roagation, in Proc. 998 IEEE Intl. Sym. Inform. Theory,. 7, Cambridge, MA, Aug availabel at htt://htt.icsi.berkeley.edu/ luby/. [] T. Richardson and R. Urbanke, The caacity of low-density arity-check codes under message-assing decoding, IEEE Transactions on Information Theory, vol. 7, , Feb [7] T. Richardson, A. Shokrollahi, and R. Urbanke, Design of caacity-aroaching irregular low-density arity-check codes, IEEE Transactions on Information Theory, vol. 7,. 9 37, Feb [8] S.-Y. Chung, T. J. Richardson, and R. L. Urbanke, Analysis of sum-roduct decoding of low-density arity-check codes using a Gaussian aroximation, IEEE Transactions on Information Theory, vol. 7, , Feb [9] M. G. Luby, M. Mitzenmacher, M. A. Shokrollahi, and D. A. Sielman, Imroved low-density arity-check codes using irregular grahs, IEEE Transactions on Information Theory, vol. 7, , Feb [0] S. Chung, G. D. Forney, T. Richardson, and R. Urbanke, On the design of low-density arity-check codes within db of the shannon limit, IEEE Communications Letters, vol. 5,. 58 0, Feb

23 [] G. Miller and D. Burshtein, Bounds on the maximum-likelihood decoding error robability of low-density aritycheck codes, IEEE Transactions on Information Theory, vol. 7, , Nov [2] A. J. Blanksby and C. J. Howland, A 90-mW -Gb/s 02-b, rate-/2 low-density arity-check code decoder, IEEE Journal of Solid-State Circuits, vol. 37,. 0 2, March [3] E. Boutillon, J. Castura, and F. R. Kschischang, Decoder-first code design, in Proceedings of the 2nd International Symosium on Turbo Codes and Related Toics,. 59 2, Brest, France, Set available at htt://lester.univ-ubs.fr:8080/ boutillon/ublications.html. [] T. Zhang and K. K. Parhi, VLSI imlementation-oriented ( )-regular low-density arity-check codes,. 25 3, IEEE Worksho on Signal Processing Systems (SiPS), Set available at htt:// [5] M. Chiani, A. Conti, and A. Ventura, Evaluation of low-density arity-check codes over block fading channels, in 2000 IEEE International Conference on Communications, , [] K. K. Parhi, VLSI Digital Signal Processing Systems: Design and Imlementation, John Wiley & Sons,

An FPGA Implementation of (3, 6)-Regular Low-Density Parity-Check Code Decoder

An FPGA Implementation of (3, 6)-Regular Low-Density Parity-Check Code Decoder EURASIP Journal on Applied Signal Processing 2003:, 30 42 c 2003 Hindawi Publishing Corporation An FPGA Implementation of (3, )-Regular Low-Density Parity-Check Code Decoder Tong Zhang Department of Electrical,

More information

AUTOMATIC GENERATION OF HIGH THROUGHPUT ENERGY EFFICIENT STREAMING ARCHITECTURES FOR ARBITRARY FIXED PERMUTATIONS. Ren Chen and Viktor K.

AUTOMATIC GENERATION OF HIGH THROUGHPUT ENERGY EFFICIENT STREAMING ARCHITECTURES FOR ARBITRARY FIXED PERMUTATIONS. Ren Chen and Viktor K. inuts er clock cycle Streaming ermutation oututs er clock cycle AUTOMATIC GENERATION OF HIGH THROUGHPUT ENERGY EFFICIENT STREAMING ARCHITECTURES FOR ARBITRARY FIXED PERMUTATIONS Ren Chen and Viktor K.

More information

A CLASS OF STRUCTURED LDPC CODES WITH LARGE GIRTH

A CLASS OF STRUCTURED LDPC CODES WITH LARGE GIRTH A CLASS OF STRUCTURED LDPC CODES WITH LARGE GIRTH Jin Lu, José M. F. Moura, and Urs Niesen Deartment of Electrical and Comuter Engineering Carnegie Mellon University, Pittsburgh, PA 15213 jinlu, moura@ece.cmu.edu

More information

Construction of Irregular QC-LDPC Codes in Near-Earth Communications

Construction of Irregular QC-LDPC Codes in Near-Earth Communications Journal of Communications Vol. 9, No. 7, July 24 Construction of Irregular QC-LDPC Codes in Near-Earth Communications Hui Zhao, Xiaoxiao Bao, Liang Qin, Ruyan Wang, and Hong Zhang Chongqing University

More information

Shuigeng Zhou. May 18, 2016 School of Computer Science Fudan University

Shuigeng Zhou. May 18, 2016 School of Computer Science Fudan University Query Processing Shuigeng Zhou May 18, 2016 School of Comuter Science Fudan University Overview Outline Measures of Query Cost Selection Oeration Sorting Join Oeration Other Oerations Evaluation of Exressions

More information

AN ANALYTICAL MODEL DESCRIBING THE RELATIONSHIPS BETWEEN LOGIC ARCHITECTURE AND FPGA DENSITY

AN ANALYTICAL MODEL DESCRIBING THE RELATIONSHIPS BETWEEN LOGIC ARCHITECTURE AND FPGA DENSITY AN ANALYTICAL MODEL DESCRIBING THE RELATIONSHIPS BETWEEN LOGIC ARCHITECTURE AND FPGA DENSITY Andrew Lam 1, Steven J.E. Wilton 1, Phili Leong 2, Wayne Luk 3 1 Elec. and Com. Engineering 2 Comuter Science

More information

SPITFIRE: Scalable Parallel Algorithms for Test Set Partitioned Fault Simulation

SPITFIRE: Scalable Parallel Algorithms for Test Set Partitioned Fault Simulation To aear in IEEE VLSI Test Symosium, 1997 SITFIRE: Scalable arallel Algorithms for Test Set artitioned Fault Simulation Dili Krishnaswamy y Elizabeth M. Rudnick y Janak H. atel y rithviraj Banerjee z y

More information

An Efficient Coding Method for Coding Region-of-Interest Locations in AVS2

An Efficient Coding Method for Coding Region-of-Interest Locations in AVS2 An Efficient Coding Method for Coding Region-of-Interest Locations in AVS2 Mingliang Chen 1, Weiyao Lin 1*, Xiaozhen Zheng 2 1 Deartment of Electronic Engineering, Shanghai Jiao Tong University, China

More information

An Efficient VLSI Architecture for Adaptive Rank Order Filter for Image Noise Removal

An Efficient VLSI Architecture for Adaptive Rank Order Filter for Image Noise Removal International Journal of Information and Electronics Engineering, Vol. 1, No. 1, July 011 An Efficient VLSI Architecture for Adative Rank Order Filter for Image Noise Removal M. C Hanumantharaju, M. Ravishankar,

More information

An Efficient Video Program Delivery algorithm in Tree Networks*

An Efficient Video Program Delivery algorithm in Tree Networks* 3rd International Symosium on Parallel Architectures, Algorithms and Programming An Efficient Video Program Delivery algorithm in Tree Networks* Fenghang Yin 1 Hong Shen 1,2,** 1 Deartment of Comuter Science,

More information

Collective communication: theory, practice, and experience

Collective communication: theory, practice, and experience CONCURRENCY AND COMPUTATION: PRACTICE AND EXPERIENCE Concurrency Comutat.: Pract. Exer. 2007; 19:1749 1783 Published online 5 July 2007 in Wiley InterScience (www.interscience.wiley.com)..1206 Collective

More information

Collective Communication: Theory, Practice, and Experience. FLAME Working Note #22

Collective Communication: Theory, Practice, and Experience. FLAME Working Note #22 Collective Communication: Theory, Practice, and Exerience FLAME Working Note # Ernie Chan Marcel Heimlich Avi Purkayastha Robert van de Geijn Setember, 6 Abstract We discuss the design and high-erformance

More information

COMP Parallel Computing. BSP (1) Bulk-Synchronous Processing Model

COMP Parallel Computing. BSP (1) Bulk-Synchronous Processing Model COMP 6 - Parallel Comuting Lecture 6 November, 8 Bulk-Synchronous essing Model Models of arallel comutation Shared-memory model Imlicit communication algorithm design and analysis relatively simle but

More information

A Study of Protocols for Low-Latency Video Transport over the Internet

A Study of Protocols for Low-Latency Video Transport over the Internet A Study of Protocols for Low-Latency Video Transort over the Internet Ciro A. Noronha, Ph.D. Cobalt Digital Santa Clara, CA ciro.noronha@cobaltdigital.com Juliana W. Noronha University of California, Davis

More information

The VEGA Moderately Parallel MIMD, Moderately Parallel SIMD, Architecture for High Performance Array Signal Processing

The VEGA Moderately Parallel MIMD, Moderately Parallel SIMD, Architecture for High Performance Array Signal Processing The VEGA Moderately Parallel MIMD, Moderately Parallel SIMD, Architecture for High Performance Array Signal Processing Mikael Taveniku 2,3, Anders Åhlander 1,3, Magnus Jonsson 1 and Bertil Svensson 1,2

More information

Learning Motion Patterns in Crowded Scenes Using Motion Flow Field

Learning Motion Patterns in Crowded Scenes Using Motion Flow Field Learning Motion Patterns in Crowded Scenes Using Motion Flow Field Min Hu, Saad Ali and Mubarak Shah Comuter Vision Lab, University of Central Florida {mhu,sali,shah}@eecs.ucf.edu Abstract Learning tyical

More information

Extracting Optimal Paths from Roadmaps for Motion Planning

Extracting Optimal Paths from Roadmaps for Motion Planning Extracting Otimal Paths from Roadmas for Motion Planning Jinsuck Kim Roger A. Pearce Nancy M. Amato Deartment of Comuter Science Texas A&M University College Station, TX 843 jinsuckk,ra231,amato @cs.tamu.edu

More information

Multicast in Wormhole-Switched Torus Networks using Edge-Disjoint Spanning Trees 1

Multicast in Wormhole-Switched Torus Networks using Edge-Disjoint Spanning Trees 1 Multicast in Wormhole-Switched Torus Networks using Edge-Disjoint Sanning Trees 1 Honge Wang y and Douglas M. Blough z y Myricom Inc., 325 N. Santa Anita Ave., Arcadia, CA 916, z School of Electrical and

More information

J. Parallel Distrib. Comput.

J. Parallel Distrib. Comput. J. Parallel Distrib. Comut. 71 (2011) 288 301 Contents lists available at ScienceDirect J. Parallel Distrib. Comut. journal homeage: www.elsevier.com/locate/jdc Quality of security adatation in arallel

More information

RTL Fast Convolution using the Mersenne Number Transform

RTL Fast Convolution using the Mersenne Number Transform RTL Fast Convolution using the Mersenne Number Transform Oscar N. Bria and Horacio A. Villagarcía o.bria@ieee.org CeTAD - - Argentina Abstract VHDL is a versatile high level language for the secification

More information

A Novel Iris Segmentation Method for Hand-Held Capture Device

A Novel Iris Segmentation Method for Hand-Held Capture Device A Novel Iris Segmentation Method for Hand-Held Cature Device XiaoFu He and PengFei Shi Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Shanghai 200030, China {xfhe,

More information

Equality-Based Translation Validator for LLVM

Equality-Based Translation Validator for LLVM Equality-Based Translation Validator for LLVM Michael Ste, Ross Tate, and Sorin Lerner University of California, San Diego {mste,rtate,lerner@cs.ucsd.edu Abstract. We udated our Peggy tool, reviously resented

More information

A Reconfigurable Architecture for Quad MAC VLIW DSP

A Reconfigurable Architecture for Quad MAC VLIW DSP A Reconfigurable Architecture for Quad MAC VLIW DSP Sangwook Kim, Sungchul Yoon, Jaeseuk Oh, Sungho Kang Det. of Electrical & Electronic Engineering, Yonsei University 132 Shinchon-Dong, Seodaemoon-Gu,

More information

Hardware-Accelerated Formal Verification

Hardware-Accelerated Formal Verification Hardare-Accelerated Formal Verification Hiroaki Yoshida, Satoshi Morishita 3 Masahiro Fujita,. VLSI Design and Education Center (VDEC), University of Tokyo. CREST, Jaan Science and Technology Agency 3.

More information

Sensitivity Analysis for an Optimal Routing Policy in an Ad Hoc Wireless Network

Sensitivity Analysis for an Optimal Routing Policy in an Ad Hoc Wireless Network 1 Sensitivity Analysis for an Otimal Routing Policy in an Ad Hoc Wireless Network Tara Javidi and Demosthenis Teneketzis Deartment of Electrical Engineering and Comuter Science University of Michigan Ann

More information

Efficient Processing of Top-k Dominating Queries on Multi-Dimensional Data

Efficient Processing of Top-k Dominating Queries on Multi-Dimensional Data Efficient Processing of To-k Dominating Queries on Multi-Dimensional Data Man Lung Yiu Deartment of Comuter Science Aalborg University DK-922 Aalborg, Denmark mly@cs.aau.dk Nikos Mamoulis Deartment of

More information

Leak Detection Modeling and Simulation for Oil Pipeline with Artificial Intelligence Method

Leak Detection Modeling and Simulation for Oil Pipeline with Artificial Intelligence Method ITB J. Eng. Sci. Vol. 39 B, No. 1, 007, 1-19 1 Leak Detection Modeling and Simulation for Oil Pieline with Artificial Intelligence Method Pudjo Sukarno 1, Kuntjoro Adji Sidarto, Amoranto Trisnobudi 3,

More information

Lecture 18. Today, we will discuss developing algorithms for a basic model for parallel computing the Parallel Random Access Machine (PRAM) model.

Lecture 18. Today, we will discuss developing algorithms for a basic model for parallel computing the Parallel Random Access Machine (PRAM) model. U.C. Berkeley CS273: Parallel and Distributed Theory Lecture 18 Professor Satish Rao Lecturer: Satish Rao Last revised Scribe so far: Satish Rao (following revious lecture notes quite closely. Lecture

More information

Semi-Markov Process based Model for Performance Analysis of Wireless LANs

Semi-Markov Process based Model for Performance Analysis of Wireless LANs Semi-Markov Process based Model for Performance Analysis of Wireless LANs Murali Krishna Kadiyala, Diti Shikha, Ravi Pendse, and Neeraj Jaggi Deartment of Electrical Engineering and Comuter Science Wichita

More information

Directed File Transfer Scheduling

Directed File Transfer Scheduling Directed File Transfer Scheduling Weizhen Mao Deartment of Comuter Science The College of William and Mary Williamsburg, Virginia 387-8795 wm@cs.wm.edu Abstract The file transfer scheduling roblem was

More information

Source Coding and express these numbers in a binary system using M log

Source Coding and express these numbers in a binary system using M log Source Coding 30.1 Source Coding Introduction We have studied how to transmit digital bits over a radio channel. We also saw ways that we could code those bits to achieve error correction. Bandwidth is

More information

IMS Network Deployment Cost Optimization Based on Flow-Based Traffic Model

IMS Network Deployment Cost Optimization Based on Flow-Based Traffic Model IMS Network Deloyment Cost Otimization Based on Flow-Based Traffic Model Jie Xiao, Changcheng Huang and James Yan Deartment of Systems and Comuter Engineering, Carleton University, Ottawa, Canada {jiexiao,

More information

CMSC 425: Lecture 16 Motion Planning: Basic Concepts

CMSC 425: Lecture 16 Motion Planning: Basic Concepts : Lecture 16 Motion lanning: Basic Concets eading: Today s material comes from various sources, including AI Game rogramming Wisdom 2 by S. abin and lanning Algorithms by S. M. LaValle (Chats. 4 and 5).

More information

A New and Efficient Algorithm-Based Fault Tolerance Scheme for A Million Way Parallelism

A New and Efficient Algorithm-Based Fault Tolerance Scheme for A Million Way Parallelism A New and Efficient Algorithm-Based Fault Tolerance Scheme for A Million Way Parallelism Erlin Yao, Mingyu Chen, Rui Wang, Wenli Zhang, Guangming Tan Key Laboratory of Comuter System and Architecture Institute

More information

A power efficient flit-admission scheme for wormhole-switched networks on chip

A power efficient flit-admission scheme for wormhole-switched networks on chip A ower efficient flit-admission scheme for wormhole-switched networks on chi Zhonghai Lu, Li Tong, Bei Yin and Axel Jantsch Laboratory of Electronics and Comuter Systems Royal Institute of Technology,

More information

Space-efficient Region Filling in Raster Graphics

Space-efficient Region Filling in Raster Graphics "The Visual Comuter: An International Journal of Comuter Grahics" (submitted July 13, 1992; revised December 7, 1992; acceted in Aril 16, 1993) Sace-efficient Region Filling in Raster Grahics Dominik Henrich

More information

An empirical analysis of loopy belief propagation in three topologies: grids, small-world networks and random graphs

An empirical analysis of loopy belief propagation in three topologies: grids, small-world networks and random graphs An emirical analysis of looy belief roagation in three toologies: grids, small-world networks and random grahs R. Santana, A. Mendiburu and J. A. Lozano Intelligent Systems Grou Deartment of Comuter Science

More information

RECENTLY, low-density parity-check (LDPC) codes have

RECENTLY, low-density parity-check (LDPC) codes have 892 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I: REGULAR PAPERS, VOL. 53, NO. 4, APRIL 2006 Code Construction and FPGA Implementation of a Low-Error-Floor Multi-Rate Low-Density Parity-Check Code Decoder

More information

A Model-Adaptable MOSFET Parameter Extraction System

A Model-Adaptable MOSFET Parameter Extraction System A Model-Adatable MOSFET Parameter Extraction System Masaki Kondo Hidetoshi Onodera Keikichi Tamaru Deartment of Electronics Faculty of Engineering, Kyoto University Kyoto 66-1, JAPAN Tel: +81-7-73-313

More information

PREDICTING LINKS IN LARGE COAUTHORSHIP NETWORKS

PREDICTING LINKS IN LARGE COAUTHORSHIP NETWORKS PREDICTING LINKS IN LARGE COAUTHORSHIP NETWORKS Kevin Miller, Vivian Lin, and Rui Zhang Grou ID: 5 1. INTRODUCTION The roblem we are trying to solve is redicting future links or recovering missing links

More information

Layered Switching for Networks on Chip

Layered Switching for Networks on Chip Layered Switching for Networks on Chi Zhonghai Lu zhonghai@kth.se Ming Liu mingliu@kth.se Royal Institute of Technology, Sweden Axel Jantsch axel@kth.se ABSTRACT We resent and evaluate a novel switching

More information

Interactive Image Segmentation

Interactive Image Segmentation Interactive Image Segmentation Fahim Mannan (260 266 294) Abstract This reort resents the roject work done based on Boykov and Jolly s interactive grah cuts based N-D image segmentation algorithm([1]).

More information

LOW-DENSITY PARITY-CHECK (LDPC) codes [1] can

LOW-DENSITY PARITY-CHECK (LDPC) codes [1] can 208 IEEE TRANSACTIONS ON MAGNETICS, VOL 42, NO 2, FEBRUARY 2006 Structured LDPC Codes for High-Density Recording: Large Girth and Low Error Floor J Lu and J M F Moura Department of Electrical and Computer

More information

Complexity Issues on Designing Tridiagonal Solvers on 2-Dimensional Mesh Interconnection Networks

Complexity Issues on Designing Tridiagonal Solvers on 2-Dimensional Mesh Interconnection Networks Journal of Comuting and Information Technology - CIT 8, 2000, 1, 1 12 1 Comlexity Issues on Designing Tridiagonal Solvers on 2-Dimensional Mesh Interconnection Networks Eunice E. Santos Deartment of Electrical

More information

Randomized algorithms: Two examples and Yao s Minimax Principle

Randomized algorithms: Two examples and Yao s Minimax Principle Randomized algorithms: Two examles and Yao s Minimax Princile Maximum Satisfiability Consider the roblem Maximum Satisfiability (MAX-SAT). Bring your knowledge u-to-date on the Satisfiability roblem. Maximum

More information

Efficient Parallel Hierarchical Clustering

Efficient Parallel Hierarchical Clustering Efficient Parallel Hierarchical Clustering Manoranjan Dash 1,SimonaPetrutiu, and Peter Scheuermann 1 Deartment of Information Systems, School of Comuter Engineering, Nanyang Technological University, Singaore

More information

Mitigating the Impact of Decompression Latency in L1 Compressed Data Caches via Prefetching

Mitigating the Impact of Decompression Latency in L1 Compressed Data Caches via Prefetching Mitigating the Imact of Decomression Latency in L1 Comressed Data Caches via Prefetching by Sean Rea A thesis resented to Lakehead University in artial fulfillment of the requirement for the degree of

More information

Overlapped Scheduling for Folded LDPC Decoding Based on Matrix Permutation

Overlapped Scheduling for Folded LDPC Decoding Based on Matrix Permutation Overlapped Scheduling for Folded LDPC Decoding Based on Matrix Permutation In-Cheol Park and Se-Hyeon Kang Department of Electrical Engineering and Computer Science, KAIST {icpark, shkang}@ics.kaist.ac.kr

More information

Introduction to Parallel Algorithms

Introduction to Parallel Algorithms CS 1762 Fall, 2011 1 Introduction to Parallel Algorithms Introduction to Parallel Algorithms ECE 1762 Algorithms and Data Structures Fall Semester, 2011 1 Preliminaries Since the early 1990s, there has

More information

Lecture 8: Orthogonal Range Searching

Lecture 8: Orthogonal Range Searching CPS234 Comutational Geometry Setember 22nd, 2005 Lecture 8: Orthogonal Range Searching Lecturer: Pankaj K. Agarwal Scribe: Mason F. Matthews 8.1 Range Searching The general roblem of range searching is

More information

HIGH-THROUGHPUT MULTI-RATE LDPC DECODER BASED ON ARCHITECTURE-ORIENTED PARITY CHECK MATRICES

HIGH-THROUGHPUT MULTI-RATE LDPC DECODER BASED ON ARCHITECTURE-ORIENTED PARITY CHECK MATRICES HIGH-THROUGHPUT MULTI-RATE LDPC DECODER BASED ON ARCHITECTURE-ORIENTED PARITY CHECK MATRICES Predrag Radosavljevic, Alexandre de Baynast, Marjan Karkooti, Joseph R. Cavallaro ECE Department, Rice University

More information

Stereo Disparity Estimation in Moment Space

Stereo Disparity Estimation in Moment Space Stereo Disarity Estimation in oment Sace Angeline Pang Faculty of Information Technology, ultimedia University, 63 Cyberjaya, alaysia. angeline.ang@mmu.edu.my R. ukundan Deartment of Comuter Science, University

More information

RST(0) RST(1) RST(2) RST(3) RST(4) RST(5) P4 RSR(0) RSR(1) RSR(2) RSR(3) RSR(4) RSR(5) Processor 1X2 Switch 2X1 Switch

RST(0) RST(1) RST(2) RST(3) RST(4) RST(5) P4 RSR(0) RSR(1) RSR(2) RSR(3) RSR(4) RSR(5) Processor 1X2 Switch 2X1 Switch Sub-logarithmic Deterministic Selection on Arrays with a Recongurable Otical Bus 1 Yijie Han Electronic Data Systems, Inc. 750 Tower Dr. CPS, Mail Sto 7121 Troy, MI 48098 Yi Pan Deartment of Comuter Science

More information

CASCH - a Scheduling Algorithm for "High Level"-Synthesis

CASCH - a Scheduling Algorithm for High Level-Synthesis CASCH a Scheduling Algorithm for "High Level"Synthesis P. Gutberlet H. Krämer W. Rosenstiel Comuter Science Research Center at the University of Karlsruhe (FZI) HaidundNeuStr. 1014, 7500 Karlsruhe, F.R.G.

More information

Cooperative Diversity Combined with Beamforming

Cooperative Diversity Combined with Beamforming Cooerative Diversity Combined with Beamforming Hyunwoo Choi and Jae Hong Lee School of Electrical Engineering and INMC Seoul National University Seoul, Korea Email: {novo28, jhlee}@snu.ac.kr Abstract In

More information

Submission. Verifying Properties Using Sequential ATPG

Submission. Verifying Properties Using Sequential ATPG Verifying Proerties Using Sequential ATPG Jacob A. Abraham and Vivekananda M. Vedula Comuter Engineering Research Center The University of Texas at Austin Austin, TX 78712 jaa, vivek @cerc.utexas.edu Daniel

More information

Parallel Construction of Multidimensional Binary Search Trees. Ibraheem Al-furaih, Srinivas Aluru, Sanjay Goil Sanjay Ranka

Parallel Construction of Multidimensional Binary Search Trees. Ibraheem Al-furaih, Srinivas Aluru, Sanjay Goil Sanjay Ranka Parallel Construction of Multidimensional Binary Search Trees Ibraheem Al-furaih, Srinivas Aluru, Sanjay Goil Sanjay Ranka School of CIS and School of CISE Northeast Parallel Architectures Center Syracuse

More information

split split (a) (b) split split (c) (d)

split split (a) (b) split split (c) (d) International Journal of Foundations of Comuter Science c World Scientic Publishing Comany ON COST-OPTIMAL MERGE OF TWO INTRANSITIVE SORTED SEQUENCES JIE WU Deartment of Comuter Science and Engineering

More information

A Parallel Algorithm for Constructing Obstacle-Avoiding Rectilinear Steiner Minimal Trees on Multi-Core Systems

A Parallel Algorithm for Constructing Obstacle-Avoiding Rectilinear Steiner Minimal Trees on Multi-Core Systems A Parallel Algorithm for Constructing Obstacle-Avoiding Rectilinear Steiner Minimal Trees on Multi-Core Systems Cheng-Yuan Chang and I-Lun Tseng Deartment of Comuter Science and Engineering Yuan Ze University,

More information

The Scalability and Performance of Common Vector Solution to Generalized Label Continuity Constraint in Hybrid Optical/Packet Networks

The Scalability and Performance of Common Vector Solution to Generalized Label Continuity Constraint in Hybrid Optical/Packet Networks The Scalability and Performance of Common Vector Solution to Generalized abel Continuity Constraint in Hybrid Otical/Pacet etwors Shujia Gong and Ban Jabbari {sgong, bjabbari}@gmuedu George Mason University

More information

FIELD-programmable gate arrays (FPGAs) are quickly

FIELD-programmable gate arrays (FPGAs) are quickly This article has been acceted for inclusion in a future issue of this journal Content is final as resented, with the excetion of agination IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS

More information

A GPU Heterogeneous Cluster Scheduling Model for Preventing Temperature Heat Island

A GPU Heterogeneous Cluster Scheduling Model for Preventing Temperature Heat Island A GPU Heterogeneous Cluster Scheduling Model for Preventing Temerature Heat Island Yun-Peng CAO 1,2,a and Hai-Feng WANG 1,2 1 School of Information Science and Engineering, Linyi University, Linyi Shandong,

More information

Design Trade-offs in Customized On-chip Crossbar Schedulers

Design Trade-offs in Customized On-chip Crossbar Schedulers J Sign Process Syst () 8:9 8 DOI.7/s-8--x Design Trade-offs in Customized On-chi Crossbar Schedulers Jae Young Hur Stehan Wong Todor Stefanov Received: October 7 / Revised: June 8 / cceted: ugust 8 / Published

More information

AN INTEGER LINEAR MODEL FOR GENERAL ARC ROUTING PROBLEMS

AN INTEGER LINEAR MODEL FOR GENERAL ARC ROUTING PROBLEMS AN INTEGER LINEAR MODEL FOR GENERAL ARC ROUTING PROBLEMS Philie LACOMME, Christian PRINS, Wahiba RAMDANE-CHERIF Université de Technologie de Troyes, Laboratoire d Otimisation des Systèmes Industriels (LOSI)

More information

Texture Mapping with Vector Graphics: A Nested Mipmapping Solution

Texture Mapping with Vector Graphics: A Nested Mipmapping Solution Texture Maing with Vector Grahics: A Nested Mimaing Solution Wei Zhang Yonggao Yang Song Xing Det. of Comuter Science Det. of Comuter Science Det. of Information Systems Prairie View A&M University Prairie

More information

A 2D Random Walk Mobility Model for Location Management Studies in Wireless Networks Abstract: I. Introduction

A 2D Random Walk Mobility Model for Location Management Studies in Wireless Networks Abstract: I. Introduction A D Random Walk Mobility Model for Location Management Studies in Wireless Networks Kuo Hsing Chiang, RMIT University, Melbourne, Australia Nirmala Shenoy, Information Technology Deartment, RIT, Rochester,

More information

OMNI: An Efficient Overlay Multicast. Infrastructure for Real-time Applications

OMNI: An Efficient Overlay Multicast. Infrastructure for Real-time Applications OMNI: An Efficient Overlay Multicast Infrastructure for Real-time Alications Suman Banerjee, Christoher Kommareddy, Koushik Kar, Bobby Bhattacharjee, Samir Khuller Abstract We consider an overlay architecture

More information

Randomized Selection on the Hypercube 1

Randomized Selection on the Hypercube 1 Randomized Selection on the Hyercube 1 Sanguthevar Rajasekaran Det. of Com. and Info. Science and Engg. University of Florida Gainesville, FL 32611 ABSTRACT In this aer we resent randomized algorithms

More information

Autonomic Physical Database Design - From Indexing to Multidimensional Clustering

Autonomic Physical Database Design - From Indexing to Multidimensional Clustering Autonomic Physical Database Design - From Indexing to Multidimensional Clustering Stehan Baumann, Kai-Uwe Sattler Databases and Information Systems Grou Technische Universität Ilmenau, Ilmenau, Germany

More information

CS649 Sensor Networks IP Track Lecture 6: Graphical Models

CS649 Sensor Networks IP Track Lecture 6: Graphical Models CS649 Sensor Networks IP Track Lecture 6: Grahical Models I-Jeng Wang htt://hinrg.cs.jhu.edu/wsn06/ Sring 2006 CS 649 1 Sring 2006 CS 649 2 Grahical Models Grahical Model: grahical reresentation of joint

More information

Cross products. p 2 p. p p1 p2. p 1. Line segments The convex combination of two distinct points p1 ( x1, such that for some real number with 0 1,

Cross products. p 2 p. p p1 p2. p 1. Line segments The convex combination of two distinct points p1 ( x1, such that for some real number with 0 1, CHAPTER 33 Comutational Geometry Is the branch of comuter science that studies algorithms for solving geometric roblems. Has alications in many fields, including comuter grahics robotics, VLSI design comuter

More information

Improved heuristics for the single machine scheduling problem with linear early and quadratic tardy penalties

Improved heuristics for the single machine scheduling problem with linear early and quadratic tardy penalties Imroved heuristics for the single machine scheduling roblem with linear early and quadratic tardy enalties Jorge M. S. Valente* LIAAD INESC Porto LA, Faculdade de Economia, Universidade do Porto Postal

More information

EE678 Application Presentation Content Based Image Retrieval Using Wavelets

EE678 Application Presentation Content Based Image Retrieval Using Wavelets EE678 Alication Presentation Content Based Image Retrieval Using Wavelets Grou Members: Megha Pandey megha@ee. iitb.ac.in 02d07006 Gaurav Boob gb@ee.iitb.ac.in 02d07008 Abstract: We focus here on an effective

More information

Single character type identification

Single character type identification Single character tye identification Yefeng Zheng*, Changsong Liu, Xiaoqing Ding Deartment of Electronic Engineering, Tsinghua University Beijing 100084, P.R. China ABSTRACT Different character recognition

More information

Lecture 2: Fixed-Radius Near Neighbors and Geometric Basics

Lecture 2: Fixed-Radius Near Neighbors and Geometric Basics structure arises in many alications of geometry. The dual structure, called a Delaunay triangulation also has many interesting roerties. Figure 3: Voronoi diagram and Delaunay triangulation. Search: Geometric

More information

Tracking Multiple Targets Using a Particle Filter Representation of the Joint Multitarget Probability Density

Tracking Multiple Targets Using a Particle Filter Representation of the Joint Multitarget Probability Density Tracing Multile Targets Using a Particle Filter Reresentation of the Joint Multitarget Probability Density Chris Kreucher, Keith Kastella, Alfred Hero This wor was suorted by United States Air Force contract

More information

Source-to-Source Code Generation Based on Pattern Matching and Dynamic Programming

Source-to-Source Code Generation Based on Pattern Matching and Dynamic Programming Source-to-Source Code Generation Based on Pattern Matching and Dynamic Programming Weimin Chen, Volker Turau TR-93-047 August, 1993 Abstract This aer introduces a new technique for source-to-source code

More information

Detection of Occluded Face Image using Mean Based Weight Matrix and Support Vector Machine

Detection of Occluded Face Image using Mean Based Weight Matrix and Support Vector Machine Journal of Comuter Science 8 (7): 1184-1190, 2012 ISSN 1549-3636 2012 Science Publications Detection of Occluded Face Image using Mean Based Weight Matrix and Suort Vector Machine 1 G. Nirmala Priya and

More information

Performance and Area Tradeoffs in Space-Qualified FPGA-Based Time-of-Flight Systems

Performance and Area Tradeoffs in Space-Qualified FPGA-Based Time-of-Flight Systems Performance and Area Tradeoffs in Sace-Qualified FPGA-Based Time-of-Flight Systems Whitney Hsiong 1, Steve Huntzicker 1, Kevin King 1, Austin Lee 1, Chen Lim 1, Jason Wang 1, Sarah Harris, Ph.D. 1, Jörg-Micha

More information

A Metaheuristic Scheduler for Time Division Multiplexed Network-on-Chip

A Metaheuristic Scheduler for Time Division Multiplexed Network-on-Chip Downloaded from orbit.dtu.dk on: Jan 25, 2019 A Metaheuristic Scheduler for Time Division Multilexed Network-on-Chi Sørensen, Rasmus Bo; Sarsø, Jens; Pedersen, Mark Ruvald; Højgaard, Jasur Publication

More information

RESEARCH ARTICLE. Simple Memory Machine Models for GPUs

RESEARCH ARTICLE. Simple Memory Machine Models for GPUs The International Journal of Parallel, Emergent and Distributed Systems Vol. 00, No. 00, Month 2011, 1 22 RESEARCH ARTICLE Simle Memory Machine Models for GPUs Koji Nakano a a Deartment of Information

More information

Using Permuted States and Validated Simulation to Analyze Conflict Rates in Optimistic Replication

Using Permuted States and Validated Simulation to Analyze Conflict Rates in Optimistic Replication Using Permuted States and Validated Simulation to Analyze Conflict Rates in Otimistic Relication An-I A. Wang Comuter Science Deartment Florida State University Geoff H. Kuenning Comuter Science Deartment

More information

Brief Contributions. A Geometric Theorem for Network Design 1 INTRODUCTION

Brief Contributions. A Geometric Theorem for Network Design 1 INTRODUCTION IEEE TRANSACTIONS ON COMPUTERS, VOL. 53, NO., APRIL 00 83 Brief Contributions A Geometric Theorem for Network Design Massimo Franceschetti, Member, IEEE, Matthew Cook, and Jehoshua Bruck, Fellow, IEEE

More information

Near-Optimal Routing Lookups with Bounded Worst Case Performance

Near-Optimal Routing Lookups with Bounded Worst Case Performance Near-Otimal Routing Lookus with Bounded Worst Case Performance Pankaj Guta Balaji Prabhakar Stehen Boyd Deartments of Electrical Engineering and Comuter Science Stanford University CA 9430 ankaj@stanfordedu

More information

On the construction of Tanner graphs

On the construction of Tanner graphs On the construction of Tanner graphs Jesús Martínez Mateo Universidad Politécnica de Madrid Outline Introduction Low-density parity-check (LDPC) codes LDPC decoding Belief propagation based algorithms

More information

Statistical Detection for Network Flooding Attacks

Statistical Detection for Network Flooding Attacks Statistical Detection for Network Flooding Attacks C. S. Chao, Y. S. Chen, and A.C. Liu Det. of Information Engineering, Feng Chia Univ., Taiwan 407, OC. Email: cschao@fcu.edu.tw Abstract In order to meet

More information

Using Standard AADL for COMPASS

Using Standard AADL for COMPASS Using Standard AADL for COMPASS (noll@cs.rwth-aachen.de) AADL Standards Meeting Aachen, Germany; July 5 8, 06 Overview Introduction SLIM Language Udates COMPASS Develoment Roadma Fault Injections Parametric

More information

10. Parallel Methods for Data Sorting

10. Parallel Methods for Data Sorting 10. Parallel Methods for Data Sorting 10. Parallel Methods for Data Sorting... 1 10.1. Parallelizing Princiles... 10.. Scaling Parallel Comutations... 10.3. Bubble Sort...3 10.3.1. Sequential Algorithm...3

More information

Parallel Algorithms for the Summed Area Table on the Asynchronous Hierarchical Memory Machine, with GPU implementations

Parallel Algorithms for the Summed Area Table on the Asynchronous Hierarchical Memory Machine, with GPU implementations Parallel Algorithms for the Summed Area Table on the Asynchronous Hierarchical Memory Machine, ith GPU imlementations Akihiko Kasagi, Koji Nakano, and Yasuaki Ito Deartment of Information Engineering Hiroshima

More information

GEOMETRIC CONSTRAINT SOLVING IN < 2 AND < 3. Department of Computer Sciences, Purdue University. and PAMELA J. VERMEER

GEOMETRIC CONSTRAINT SOLVING IN < 2 AND < 3. Department of Computer Sciences, Purdue University. and PAMELA J. VERMEER GEOMETRIC CONSTRAINT SOLVING IN < AND < 3 CHRISTOPH M. HOFFMANN Deartment of Comuter Sciences, Purdue University West Lafayette, Indiana 47907-1398, USA and PAMELA J. VERMEER Deartment of Comuter Sciences,

More information

On combining chase-2 and sum-product algorithms for LDPC codes

On combining chase-2 and sum-product algorithms for LDPC codes University of Wollongong Research Online Faculty of Engineering and Information Sciences - Papers: Part A Faculty of Engineering and Information Sciences 2012 On combining chase-2 and sum-product algorithms

More information

A DEA-bases Approach for Multi-objective Design of Attribute Acceptance Sampling Plans

A DEA-bases Approach for Multi-objective Design of Attribute Acceptance Sampling Plans Available online at htt://ijdea.srbiau.ac.ir Int. J. Data Enveloment Analysis (ISSN 2345-458X) Vol.5, No.2, Year 2017 Article ID IJDEA-00422, 12 ages Research Article International Journal of Data Enveloment

More information

Distributed Estimation from Relative Measurements in Sensor Networks

Distributed Estimation from Relative Measurements in Sensor Networks Distributed Estimation from Relative Measurements in Sensor Networks #Prabir Barooah and João P. Hesanha Abstract We consider the roblem of estimating vectorvalued variables from noisy relative measurements.

More information

A joint multi-layer planning algorithm for IP over flexible optical networks

A joint multi-layer planning algorithm for IP over flexible optical networks A joint multi-layer lanning algorithm for IP over flexible otical networks Vasileios Gkamas, Konstantinos Christodoulooulos, Emmanouel Varvarigos Comuter Engineering and Informatics Deartment, University

More information

New Message-Passing Decoding Algorithm of LDPC Codes by Partitioning Check Nodes 1

New Message-Passing Decoding Algorithm of LDPC Codes by Partitioning Check Nodes 1 New Message-Passing Decoding Algorithm of LDPC Codes by Partitioning Check Nodes 1 Sunghwan Kim* O, Min-Ho Jang*, Jong-Seon No*, Song-Nam Hong, and Dong-Joon Shin *School of Electrical Engineering and

More information

Generic ILP-Based Approaches for Time-Multiplexed FPGA Partitioning

Generic ILP-Based Approaches for Time-Multiplexed FPGA Partitioning 1266 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 2, NO. 1, OCTOBER 21 Generic ILP-Based Aroaches for Time-Multilexed FPGA Partitioning Guang-Ming Wu, Jai-Ming Lin,

More information

Non-Strict Independence-Based Program Parallelization Using Sharing and Freeness Information

Non-Strict Independence-Based Program Parallelization Using Sharing and Freeness Information Non-Strict Indeendence-Based Program Parallelization Using Sharing and Freeness Information Daniel Cabeza Gras 1 and Manuel V. Hermenegildo 1,2 Abstract The current ubiuity of multi-core rocessors has

More information

arxiv: v1 [cs.dc] 13 Nov 2018

arxiv: v1 [cs.dc] 13 Nov 2018 Task Grah Transformations for Latency Tolerance arxiv:1811.05077v1 [cs.dc] 13 Nov 2018 Victor Eijkhout November 14, 2018 Abstract The Integrative Model for Parallelism (IMP) derives a task grah from a

More information

Argo Programming Guide

Argo Programming Guide Argo Programming Guide Evangelia Kasaaki, asmus Bo Sørensen February 9, 2015 Coyright 2014 Technical University of Denmark This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International

More information

Simple Memory Machine Models for GPUs

Simple Memory Machine Models for GPUs 2012 IEEE 2012 26th IEEE International 26th International Parallel Parallel and Distributed and Distributed Processing Processing Symosium Symosium Workshos Workshos & PhD Forum Simle Memory Machine Models

More information