DESIGN AND ANALYSIS OF LDPC DECODERS FOR SOFTWARE DEFINED RADIO


Sangwon Seo, Trevor Mudge
Advanced Computer Architecture Laboratory, University of Michigan at Ann Arbor
{swseo, tnm}@umich.edu

Yuming Zhu, Chaitali Chakrabarti
Department of Electrical Engineering, Arizona State University
{yuming, chaitali}@asu.edu

ABSTRACT

Low Density Parity Check (LDPC) codes are among the most promising error correction codes and are being adopted by many wireless standards. This paper presents a case study of a scalable LDPC decoder that supports multiple code rates and multiple block sizes on a software defined radio (SDR) platform. Since technology scaling alone is not sufficient for current SDR architectures to meet the requirements of next-generation wireless standards, this paper presents three techniques to improve throughput: the use of datapath accelerators, the addition of memory units, and the addition of a few assembly instructions. The proposed LDPC decoder implementation achieves 30.4 Mbps decoding throughput for the N=2304, R=5/6 LDPC code outlined in the IEEE 802.16e standard.

Index Terms: LDPC, min-sum iterative decoding, SDR, SODA

1. INTRODUCTION

Low density parity check (LDPC) codes have excellent error correction performance that approaches the Shannon capacity limit [1], [2]. As a result, they have been adopted in many current and next-generation wireless protocols such as DVB-S2 and the IEEE 802.16e standard (WiMAX). Decoders for LDPC codes have high throughput requirements and have been successfully implemented using ASICs and FPGAs [3]. However, the emergence of a wide variety of rapidly changing wireless protocols makes custom hardware for these decoders relatively time-consuming and expensive to develop. Software Defined Radio (SDR) is a programmable hardware platform capable of supporting software implementations of the physical layers of wireless communication protocols [4].
This paper presents a case study of an LDPC decoder implementation that supports multiple code rates and multiple block sizes on an SDR platform, SODA (Signal-processing On-Demand Architecture). SODA is a multiprocessor architecture in which each processor is equipped with a 32-wide SIMD (Single Instruction Multiple Data) pipeline, a scalar pipeline and scratchpad memories. When the LDPC matrix is represented by structured submatrices, the data-level parallelism can be handled efficiently by the SIMD pipeline. However, the current SODA architecture cannot meet the high decoding throughput and scalability requirements (multiple block sizes and multiple code rates) of the IEEE 802.16e standard. In this paper we present the use of datapath accelerators, the addition of memory units and the addition of a few assembly instructions to address these throughput and scalability requirements. The proposed LDPC decoder implementation achieves 30.4 Mbps decoding throughput for the N=2304, R=5/6 LDPC code outlined in the IEEE 802.16e standard.

The rest of the paper is organized as follows. Section 2 gives a brief overview of LDPC codes. Section 3 introduces SODA, the SIMD-based high-performance DSP processor for SDR, and the mapping of the LDPC decoder onto SODA. Section 4 describes the LDPC accelerators, the memory controller/buffer organization and the assembly support required for a high-throughput, scalable LDPC decoder implementation. Section 5 presents a memory and throughput analysis of the augmented architecture. Section 6 concludes the paper.

2. LDPC BASICS

2.1. Introduction

An LDPC code is a linear block code whose codewords satisfy a set of linear parity-check constraints [1]. These constraints are typically defined by an m-by-N parity-check matrix H, whose m rows specify the m constraints (the number of parity checks), and N is the length of a codeword. H is also characterized by W_r and W_c, which represent the number of 1s in the rows and columns, respectively. An LDPC code can be represented by a bipartite graph, which consists of two types of nodes, Variable Nodes (VN) and Check Nodes (CN). Check node i is connected to variable node j whenever h_ij of H is non-zero. Fig. 1 shows the matrix H and the corresponding bipartite graph of a simple LDPC code. Theoretically, the LDPC decoding process finishes when all parity-check equations are satisfied. In practice, a predefined number of iterations (NUM), chosen based on the SNR, is generally used.

Fig. 1. LDPC matrix H and the corresponding bipartite graph

2.2. LDPC Decoding Process

LDPC codes are decoded iteratively using a message passing algorithm [1]. This algorithm exchanges belief information among the variable nodes and check nodes that are connected by edges in the bipartite graph. Let I_n be the intrinsic information from the received signal, L_n the reliability information for variable node n, L_{n,m} the information conveyed from variable node n to check node m, and E_{n,m} the extrinsic information generated in check node m and passed to variable node n. The belief information is updated iteratively in two phases. In the first phase, the variable nodes send their belief information, L_{n,m}, to the check nodes connected to them; in the second phase, the check nodes send the updated belief information (new E_{n,m}) to the variable nodes connected to them for updating L_n (see Fig. 1). The iteration steps are summarized in Algorithm 1.

Algorithm 1: Min-sum LDPC Decoding Algorithm
1. Initialization: E_{n,m} = 0, L_n = I_n
2. VN to CN: L_{n,m} = L_n - E_{n,m}
3. Update E_{n,m}: E_{n,m}^new = f(L_{n',m} : n' ∈ N(m)\n)
4. Update L_n: L_n^new = L_{n,m} + E_{n,m}^new
5. Repeat steps 2, 3 and 4 for NUM iterations
6. Make a decision on bit n based on the corresponding L_n value

Here, N(m) is the set of variable nodes connected to check node m in the bipartite graph, and N(m)\n is that set excluding variable node n. Similarly, M(n) is the set of check nodes connected to variable node n. The decoding algorithms differ in how the function f in Step 3 of Algorithm 1 is evaluated.

There are three options for the LDPC iterative decoding algorithm: Belief Propagation (BP), λ-min and min-sum [5]. Although the BP and λ-min algorithms have better error correction performance than the min-sum algorithm, they require a look-up table of hyperbolic function values, which needs additional memory space. The min-sum algorithm is selected here because of the limited memory size and its simple computation patterns. In the min-sum algorithm, f is:

$$E_{n,m}^{new} = -\left(\prod_{n' \in N(m)\setminus n} \operatorname{sign}(L_{n',m})\right) \cdot \min_{n' \in N(m)\setminus n} \left|L_{n',m}\right|$$

As can be seen, the operations in the min-sum LDPC decoding algorithm are limited to addition, subtraction and finding a minimum value, all of which can be supported by our SDR architecture described in Section 3.

2.3. LDPC Matrix Partition

An LDPC matrix H with randomly distributed 1s results in complex data routing, which is a major challenge for building a high-performance, low-power LDPC decoder. [3] and [6] show that introducing some structural regularity into the matrix does not degrade its error correction performance. Moreover, the regularity enables partially parallel implementations of LDPC decoders and has been utilized in the IEEE 802.16e standard. Fig. 2 shows the partitioning of H into z-by-z cyclic identity submatrices. Here, I_x represents a cyclic identity matrix whose rows are shifted cyclically to the right by x positions. This structure reduces the routing overhead and is exploited in our architecture. Fig. 2 also shows how the z-by-z submatrices along a row can be grouped to form a block row. So, in essence, the H matrix can also be partitioned into m/z block rows, each of size z-by-N.

Fig. 2. Partitioning of H into z-by-z cyclic identity matrices

3. SDR ARCHITECTURE

In this section, we present the SIMD-based SDR architecture, SODA [4]. This architecture was initially designed to support wireless protocols such as WCDMA and IEEE 802.11a.

3.1. SODA Overview

The SODA multiprocessor architecture is shown in Fig. 3.
It consists of multiple data processing elements (PEs), one control processor and a global scratchpad memory, all of which are connected through a shared bus. Each SODA PE consists of five major components:
1) a 32-way, 16-bit datapath SIMD pipeline for supporting vector operations. Each datapath includes one 16-bit ALU with a multiplier and a 16-entry register file with 2 read ports and 1 write port. Intra-processor data movements are supported through a SIMD Shuffle Network (SSN);
2) a 16-bit datapath scalar pipeline for sequential operations. The scalar pipeline executes in lock-step with the SIMD pipeline; SIMD-to-scalar and scalar-to-SIMD operations exchange data between the two pipelines;
3) two local scratchpad memories, one for the SIMD pipeline and one for the scalar pipeline;
4) an AGU (Address-Generation-Unit) pipeline that provides the addresses for local memory accesses; and
5) a programmable DMA (Direct-Memory-Access) unit that transfers data between scratchpad memories and interfaces with the outside system (inter-processor data transfer).
The SIMD pipeline, the scalar pipeline and the AGU pipeline execute in a VLIW-style lock-step manner, controlled by one program counter (PC) [4].

Fig. 3. SDR architecture: SODA [4]

3.2. LDPC on SODA

The min-sum LDPC decoding algorithm (Algorithm 1) is mapped onto SODA in the following way. Step 2 of Algorithm 1 is applied to the non-zero z-by-z submatrices. However, because Step 3 uses the L_{n,m} values related to check node m, the SIMD pipeline loads values of type L_n and aligns the data in check node order using the SSN before executing Step 2. The shuffled L_{n,m} values for all non-zero z-by-z submatrices in one z-by-N block row are calculated in the SIMD datapath. After that, the SIMD-to-scalar unit is used to find the minimum E_{n,m}^new among the W_r values of L_{n,m} for the same check node m. Next, E_{n,m}^new and the corresponding sign indicator are used to update an L_n value (Step 4). This procedure implies that some SIMD slices execute additions and others execute subtractions, depending on the sign values, a feature that is supported by predicated instructions in SODA. After the L_n values are updated, the data is inversely shuffled and stored in variable node order. This process is repeated for every z-by-N block row in every iteration.

4. SCALABLE LDPC IMPLEMENTATION

In this section, we study a scalable LDPC decoder implementation for block size N, code rate R=k/N, and a (W_c, W_r)-LDPC code as specified by the IEEE 802.16e standard on a SODA PE. We describe the enhancements that had to be made in terms of accelerators, memory units and new assembly instructions to support multiple code rates and multiple block sizes. Fig. 4 shows the modified SIMD pipeline; the additional units are shown as shaded blocks.

Fig. 4. Modified SIMD pipeline in a SODA PE

4.1. LDPC Accelerator

In order to meet the high decoding throughput requirements, we introduce an LDPC accelerator in every SIMD slice, as shown in Fig. 4. There are only two possible magnitudes of E_{n,m}^new for check node m in Step 3 of Algorithm 1 (selected from the W_r values of type L_{n,m}): the minimum E_m1 and the second minimum E_m2. Each LDPC accelerator speeds up finding these minimum values using two compare/store units with two W_r-bit special registers, a selection register P_m and a sign register S_m, as can be seen in Fig. 5. The operation of the LDPC accelerator is summarized below; i is the position of the current input among the W_r values of L_{n,m}, and "<=" denotes a register update, so every right-hand side uses the old register values.

The Algorithm of the LDPC Accelerator

    if (|L_{n,m}| <= E_m1) {            // operations in Cmp&Store 1
        E_m1 <= |L_{n,m}|;
        E_m2 <= E_m1;
        if (|L_{n,m}| < E_m1) P_m = 1 << i;
        else P_m = 0;
    } else if (|L_{n,m}| < E_m2) {       // operations in Cmp&Store 2
        E_m2 <= |L_{n,m}|;
    }
    S_m = (S_m | sign(L_{n,m})) << 1;    // record the sign bit of L_{n,m}

E_m1, E_m2, P_m and S_m are extracted using a flush signal, and these values are used to compute E_{m,n} with the following operation (Steps 7 and 14 of Algorithm 2), where sgn_i is the sign term for input i obtained from S_m:

    if (P_m[i] == 1) E_{m,n}[i] = sgn_i * E_m2;   // input i holds the minimum, so it receives the second minimum
    else             E_{m,n}[i] = sgn_i * E_m1;
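A minimal Python sketch of Algorithm 1 whose check-node update (Step 3) uses the two-minimum bookkeeping of the LDPC accelerator. It is an illustration under stated assumptions, not the SODA implementation: the function names are hypothetical; LLRs follow the common convention that a positive value favors bit 0, so the check-node update is written without the leading minus sign of the min-sum formula; bit i of S_m is set directly instead of modelling the hardware shift; and sgn_i is computed as the product of the signs of the other inputs. L_n is updated check node by check node, as in Step 4.

```python
import math


def accelerator_scan(l_nm):
    """Compare/store logic for one check node over its W_r inputs L_{n,m}.

    Returns (E_m1, E_m2, P_m, S_m); P_m and S_m are W_r-bit masks.
    """
    e_m1, e_m2 = math.inf, math.inf
    p_m, s_m = 0, 0
    for i, x in enumerate(l_nm):
        mag = abs(x)
        if mag <= e_m1:                            # Cmp&Store 1: new or tied minimum
            p_m = (1 << i) if mag < e_m1 else 0    # tie: both minima equal, no selection needed
            e_m1, e_m2 = mag, e_m1                 # old minimum becomes the second minimum
        elif mag < e_m2:                           # Cmp&Store 2: new second minimum
            e_m2 = mag
        if x < 0:
            s_m |= 1 << i                          # bit i of S_m: 1 if input i is negative
    return e_m1, e_m2, p_m, s_m


def accelerator_extract(e_m1, e_m2, p_m, s_m, w_r):
    """Rebuild the W_r extrinsic values E_{n,m} from the flushed registers."""
    parity = bin(s_m).count("1") & 1               # 1 if an odd number of inputs are negative
    out = []
    for i in range(w_r):
        mag = e_m2 if (p_m >> i) & 1 else e_m1     # exclude input i's own magnitude
        negative = parity ^ ((s_m >> i) & 1)       # product of the other inputs' signs
        out.append(-mag if negative else mag)
    return out


def min_sum_decode(H, intrinsic, num_iters):
    """Algorithm 1, updating L_n after each check node (Step 4); returns hard bits."""
    rows = [[j for j, h in enumerate(row) if h] for row in H]   # N(m) for each check node
    E = [[0.0] * len(nbrs) for nbrs in rows]                      # Step 1: E_{n,m} = 0
    L = [float(v) for v in intrinsic]                             # Step 1: L_n = I_n
    for _ in range(num_iters):                                    # Step 5
        for m, nbrs in enumerate(rows):
            l_nm = [L[n] - E[m][k] for k, n in enumerate(nbrs)]   # Step 2
            e_new = accelerator_extract(*accelerator_scan(l_nm), len(l_nm))  # Step 3
            for k, n in enumerate(nbrs):
                L[n] = l_nm[k] + e_new[k]                          # Step 4
            E[m] = e_new
    return [0 if v >= 0 else 1 for v in L]                         # Step 6


# Small (7,4) parity-check matrix; codeword 0001111 with bit 6 received weakly wrong.
H = [[1, 1, 0, 1, 1, 0, 0],
     [1, 0, 1, 1, 0, 1, 0],
     [0, 1, 1, 1, 0, 0, 1]]
llr = [2.0, 2.0, 2.0, -2.0, -2.0, -2.0, 0.5]
print(min_sum_decode(H, llr, 5))  # [0, 0, 0, 1, 1, 1, 1]
```

Keeping only E_m1, E_m2, P_m and S_m per check node, instead of W_r separate E_{n,m} values, is what lets the accelerator and the buffers stay small; the extraction step reconstructs every E_{n,m} on demand.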

Fig. 5. LDPC accelerator

4.2. Memory Units

A major challenge in decoding LDPC codes is the large number of data alignment operations required for every z-by-z permutation matrix: z values of type L_n need to be shuffled so that they are correctly aligned for check node processing. If z is at most the SIMD width (32), the data alignment can be executed in one clock cycle using the SSN. However, the IEEE 802.16e standard uses different z values (24, 28, 32, ..., 96) for different block sizes [7]. If z is larger than the SIMD width, many clock cycles are required for each data alignment operation when the SSN is used, which degrades the LDPC decoding throughput.

To solve the alignment issue, we propose a memory controller and buffer organization (instead of using the shuffle network), as shown in Fig. 4. BUF1 and BUF2 contain aligned L_n and L_update values (L_update is described in Section 4.3), respectively; BUF3 contains E_m1 and E_m2; and BUF4 contains P_m and S_m.

The memory controller handles the movement of L_n data between the memory and BUF1. Since the z-by-z permutation matrices in the LDPC codes of the IEEE 802.16e standard are circularly right-shifted identity matrices, each permutation matrix can be defined by a single right-shift amount s. The alignment operation can then be achieved by the two memory copy operations described below. If the shift amount is s and the start memory address is m_start, the memory controller first copies MEM[m_start + s ... m_start + z - 1] to BUF1, and then copies MEM[m_start ... m_start + s - 1] to BUF1. This is shown in Fig. 6 for an example where s = 5 and m_start = 0. This is done for all W_r non-zero submatrices in a z-by-N block row. At the end of this process, BUF1 contains W_r groups of aligned L_n data (see Fig. 6). In a similar way, the memory controller fills BUF2 with L_update data using another shift amount, ((s - s_update) mod z), described in Section 4.3. Note that the width of BUF1 and BUF2 is z.

Fig. 6. Data alignment in buffers

4.3. Modified Decoding Algorithm

Algorithm 2 shows the LDPC decoding algorithm on the modified SODA architecture. The L_n and L_update values are aligned and stored in BUF1 and BUF2 (Steps 1 and 2 of Algorithm 2). The aligned values of L_n and L_update (Step 5), along with E_m1, E_m2 (Step 4), P_m and S_m (Step 6) of the first row of the first group (see, for example, Group 1 in Fig. 6) are fed to the ALU unit and the LDPC accelerator in each SIMD slice. These values are updated in Steps 7, 8 and 9 of Algorithm 2. The process is repeated for the first row of the next group (see, for example, Group 2 in Fig. 6), and so on. After the first rows of all W_r groups have been processed (Step 10), the updated values of E_m1, E_m2, P_m and S_m are stored in their respective buffers (Steps 11, 12). The updated values are then used to compute E_{m,n}^new and L_update (Steps 15, 16) for the first row of each of the W_r groups (Step 17). The process is repeated for the second row of each W_r group, and so on (Step 18). This schedule results in high decoding throughput: it reduces the number of data switches and also speeds up finding the minimum values in the min-sum decoding algorithm. After all the data for one z-by-N block row has been processed, the data for the next z-by-N block row is loaded into BUF1 and BUF2, and the process repeats once per z-by-N block row, i.e., (1 - R)N/z times.

Algorithm 2: LDPC decoding algorithm in the modified SODA
1. load aligned L_n to BUF1
2. load aligned L_update to BUF2
3. load W_r for the current z-by-N block row
4. load E_m1, E_m2 from BUF3
5. load L_n, L_update from BUF1, BUF2
6. load P_m, S_m from BUF4
7. compute E^curr_{m,n} using E_m1, E_m2, P_m and S_m
8. update L_{n,m} = L_n + L_update - E^curr_{m,n}
9. update E_m1, E_m2, P_m and S_m using L_{n,m}
10. repeat step 5 to step 9 W_r times
11. store updated E_m1, E_m2 (E^new_m1, E^new_m2) in BUF3
12. store updated P_m, S_m (P^new_m, S^new_m) in BUF4
13. load L_update from BUF2 again
14. compute E^new_{m,n} using E^new_m1, E^new_m2, P^new_m and S^new_m

15. update L_update += E^new_{m,n}
16. store updated L_update in MEM
17. repeat step 12 to step 16 W_r times
18. repeat step 4 to step 17 z times
19. repeat step 1 to step 18 (1 - R)N/z times
20. repeat step 1 to step 19 NUM times

In order to reduce the memory needed for storing L_{n,m}, we introduce the parameter L_update, which is (-E_{n,m} + E_{n,m}^new). The memory space is reduced by a factor of m/z by keeping one L_update value for each check node instead of storing all L_{n,m} values for every (n, m) combination. Since the updated L_update values are processed in check node order, an inverse alignment operation would be required to store the data in variable node order in memory, and after L_update is stored back in memory, the data would be realigned with a different shift amount for the next z-by-N block row computation. However, these two alignment operations can be reduced to one by using another shift amount, s_update: instead of performing the inverse alignment, L_update is stored with the current shift amount s_update, and then, in the next iteration, the memory controller uses ((s - s_update) mod z) as the shift amount to align L_update.

4.4. Assembly Support

New assembly instructions are required for the proposed architecture to improve the decoding throughput. Steps 1 and 2 of Algorithm 2 are independent and can be executed in parallel; they are combined to form the instruction ldpc_mem2buf. Similarly, Steps 5 and 6 of Algorithm 2 can be executed in parallel and are combined to form the instruction ldbufs. Steps 8 and 9 of Algorithm 2 can be executed in a pipelined manner through the ALU unit and the LDPC accelerator unit; we combine these two operations into a macro-operation instruction, ldpc_in. To implement Steps 11 and 12 of Algorithm 2, the new instruction ldpc_out.(v/p) is introduced to flush E_m1, E_m2, P_m and S_m from the LDPC accelerators and store them in BUF3 and BUF4. The new assembly instructions are listed below.

The New Assembly Instructions
1. ldpc_mem2buf Addr[Mem], Addr[BUF1], Addr[BUF2], S1, S2
   : send a control signal to the memory controller
   : the controller loads L_n, L_update from memory and aligns the data with shift amounts (S1, S2) in BUF1 and BUF2
2. ldbuf3 V3, V4, Addr[BUF3]
   : load V3 = E_m1, V4 = E_m2 from BUF3
3. ldbufs V1, V2, P1, P2, Addr[BUF1], Addr[BUF2], Addr[BUF4]
   : load V1 = L_n, V2 = L_update, P1 = P_m, P2 = S_m from BUF1, BUF2, BUF4
4. ldpc_in V1, V6
   : 1) calculate L_{n,m} with V1 = L_n and V6 = L_update - E^curr_{m,n}
   : 2) update E_m1, E_m2, P_m, S_m in the LDPC accelerators with L_{n,m}
5. ldpc_out.v V7, V8, Addr[BUF3]
   : extract V7 = E_m1, V8 = E_m2 from the LDPC accelerators and store them in BUF3
6. ldpc_out.p P3, P4, Addr[BUF4]
   : extract P3 = P_m, P4 = S_m from the LDPC accelerators and store them in BUF4

The overhead of adding these new instructions is the increased instruction bit width and instruction decoder complexity.

4.5. Scalability Issues

The proposed architecture supports the different values of z and W_r corresponding to the different code sizes and code rates mandated by the IEEE 802.16e standard. The memory configuration described in Section 4.2 handles the more difficult case, in which z is larger than the SIMD width. A larger z results in more computations, so a wider SIMD datapath would help achieve higher decoding throughput. The penalty is larger area, in terms of both datapath and memory, and higher power. The parameter W_r affects the decoding throughput (the loop counts in Algorithm 2). Since it also affects the buffer sizes and the P_m, S_m registers in the LDPC accelerators, the architecture has to be designed for the largest value of W_r.

5. ANALYSIS

In this section, we study the required memory and buffer sizes, and analyze the improvement in decoding throughput due to the memory organization, the datapath accelerators and the assembly instruction support.

5.1. Memory Size Analysis

The LDPC decoding process consists of computationally simple operations and many memory operations. As a result, if the memory is not organized properly, the SIMD pipeline is likely to stall waiting for data to arrive.
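The memory organization above relies on two alignment tricks from Sections 4.2 and 4.3: the two-copy alignment that replaces the shuffle network, and the realignment of L_update by ((s - s_update) mod z). A short Python sketch can check both. It models memory as a Python list of values; the function names are hypothetical, and the comparison with I_s assumes that row r of the right-shifted identity I_s has its 1 in column (r + s) mod z.

```python
def align(mem, m_start, z, s):
    """Two memory copies that align one z-element segment for shift amount s."""
    buf = mem[m_start + s : m_start + z]      # first copy: MEM[m_start+s .. m_start+z-1]
    buf += mem[m_start : m_start + s]         # second copy: MEM[m_start .. m_start+s-1]
    return buf


def shifted_identity(z, x):
    """I_x: z-by-z identity matrix with rows cyclically shifted right by x."""
    return [[1 if c == (r + x) % z else 0 for c in range(z)] for r in range(z)]


def mat_vec(M, v):
    """Plain matrix-vector product over lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in M]


z, s, s_update = 8, 5, 3
mem = [100 + k for k in range(z)]             # one z-element segment at m_start = 0

aligned = align(mem, 0, z, s)
# Check node r of the block row needs variable node (r + s) mod z, i.e. I_s times the segment.
assert aligned == mat_vec(shifted_identity(z, s), mem)

# L_update left in the order of an earlier shift s_update ...
stored = align(mem, 0, z, s_update)
# ... reaches the order needed for shift s with one shift of (s - s_update) mod z.
assert align(stored, 0, z, (s - s_update) % z) == aligned
print(aligned)  # [105, 106, 107, 100, 101, 102, 103, 104]
```

Because each copy moves a contiguous run of memory, the cost does not depend on whether z is larger than the SIMD width, which is the point of replacing the SSN-based shuffle.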
In a typical implementation, there are four main values that must be stored: L_n, L_{n,m}, E_{n,m}, and the shuffle information. For the N=2304, R=5/6 LDPC code outlined in the IEEE 802.16e standard, a brute-force decoding method needs 3456 KB for storing the L_{n,m} and E_{n,m} values. Even if only the non-zero elements are considered, the storage still requires 30 KB (15 KB + 15 KB), which is still a large memory space for an SDR platform. Therefore, a new scheme to reduce memory space is needed. There is no way to reduce the storage of L_n, because this data is used to decide the final decoded bit values. However, the storage for L_{n,m} and E_{n,m} can be significantly reduced.
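The brute-force and non-zero-only storage figures above can be reproduced with a few lines of Python. The sketch assumes 16-bit (2-byte) values, consistent with SODA's 16-bit datapath, W_r = 20 (the value listed with Table 1 for this code), and 1 KB = 1024 bytes; these are assumptions for illustration, and the function name is hypothetical.

```python
from fractions import Fraction


def ldpc_message_storage_kb(n, rate, w_r, bytes_per_value=2):
    """Storage for L_{n,m} and E_{n,m}: dense m-by-N arrays vs. non-zero entries only."""
    m = int(n * (1 - rate))                   # number of parity checks (rows of H)
    dense = 2 * m * n * bytes_per_value       # both arrays stored as full m-by-N matrices
    per_array = m * w_r * bytes_per_value     # one value per non-zero entry (edge) of H
    return dense / 1024, per_array / 1024


dense_kb, sparse_kb = ldpc_message_storage_kb(2304, Fraction(5, 6), w_r=20)
print(dense_kb, sparse_kb)  # 3456.0 15.0
```

Two arrays of 15 KB each give the 30 KB quoted for the non-zero-only case; Fraction keeps (1 - R)N exact so that m comes out as exactly 384.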

To reduce the E_{n,m} storage size, we exploit the fact that there are only two possible magnitudes of E_{n,m}^new for check node m: E_m1 and E_m2. This two-minimum method reduces the required memory space by a factor of W_r/2. For the case mentioned above, the storage requirement for the E_{n,m} values is reduced to 1.5 KB. Also, instead of storing all L_{n,m} values, we store L_update values, reducing the storage by a factor of m/z (= 4) to 3.75 KB.

Table 1. Memory/buffer requirements for the N=2304, R=5/6 LDPC code in the IEEE 802.16e standard

Storage                 Size (B)          Ex. (KB)
MEM: L_n, L_update      4N                9
BUF1: L_n               2zW_r             3.75
BUF2: L_update          2zW_r             3.75
BUF3: E_m1, E_m2        4(1 - R)N         1.5
BUF4: P_m, S_m          2W_r(1 - R)N      0.94

Table 1 summarizes the memory and buffer requirements for block size N, code rate R = k/N, and a (W_c, W_r)-LDPC code. The column Ex. lists the requirements for the N=2304, R=5/6 LDPC code (IEEE 802.16e standard) when the SIMD width is 32, W_r = 20 and z = 96.

5.2. Throughput Analysis

The datapath accelerators, the memory units and the new instructions all help increase the decoding throughput. For the N=2304, R=5/6 LDPC code in the IEEE 802.16e standard with NUM=10, the clock cycle reductions achieved by each enhancement are shown in Table 2. The number in parentheses is the number of cycles in the original SODA implementation.

Table 2. Cycle reductions due to enhancements

Enhancement            Red. (cycles)     % red.
LDPC Accelerators      5760 (40000)      14.4
Memory Units           6912 (40000)      17.3
New Instructions       4608 (40000)      11.5

The proposed SODA is implemented in 0.18 µm technology and clocked at 400 MHz. Using the proposed enhancements, the LDPC decoding throughput for the N=2304, R=5/6 LDPC code is boosted from 18.3 Mbps to 30.4 Mbps. With technology scaling, the decoding throughput is expected to increase to around 62.2 Mbps in 90 nm technology. The area and power overhead in the datapath and memory is small; for instance, the memory controller and LDPC accelerators add only 5.37% to the area of the original design. However, the complexity of adding CISC-type instructions requires careful evaluation.

6. CONCLUSION

In this paper, we presented a software-hardware co-design case study of an LDPC decoder for SDR. We first provided an overview of LDPC codes and then showed how LDPC decoding can be performed on the SDR architecture. Next, we showed how datapath accelerators, memory buffers and additional instructions can be used to improve the decoding throughput. We implemented a scalable LDPC decoder for the IEEE 802.16e standard. Our results show that we can achieve 30.4 Mbps decoding throughput for the N=2304, R=5/6 LDPC code.

7. ACKNOWLEDGEMENT

This research is supported in part by ARM Ltd., NSF CSR-EHS, NSF ITR and The Korea Foundation for Advanced Studies.

8. REFERENCES

[1] Gallager, "Low-density parity-check codes," IRE Transactions on Information Theory, vol. IT-8, no. 1, January.
[2] D. J. C. MacKay and R. M. Neal, "Near Shannon limit performance of low-density parity-check codes," Electronics Letters, vol. 32, August.
[3] M. M. Mansour and N. R. Shanbhag, "High-throughput LDPC decoders," IEEE Transactions on VLSI Systems, vol. 11, no. 6, December.
[4] Y. Lin et al., "SODA: A low-power architecture for software radio," Proceedings of the 33rd Annual International Symposium on Computer Architecture (ISCA).
[5] F. Guilloud, E. Boutillon and J. L. Danger, "λ-min decoding algorithm of regular and irregular LDPC codes," 3rd International Symposium on Turbo Codes & Related Topics, September.
[6] D. E. Hocevar, "A reduced complexity decoder architecture via layered decoding of LDPC codes," IEEE Workshop on Signal Processing Systems.
[7] IEEE Std 802.16e-2005, February 2006.


More information

Multiprocessors. HPC Prof. Robert van Engelen

Multiprocessors. HPC Prof. Robert van Engelen Multiprocessors Prof. Robert va Egele Overview The PMS model Shared memory multiprocessors Basic shared memory systems SMP, Multicore, ad COMA Distributed memory multicomputers MPP systems Network topologies

More information

CMSC22200 Computer Architecture Lecture 9: Out-of-Order, SIMD, VLIW. Prof. Yanjing Li University of Chicago

CMSC22200 Computer Architecture Lecture 9: Out-of-Order, SIMD, VLIW. Prof. Yanjing Li University of Chicago CMSC22200 Computer Architecture Lecture 9: Out-of-Order, SIMD, VLIW Prof. Yajig Li Uiversity of Chicago Admiistrative Stuff Lab2 due toight Exam I: covers lectures 1-9 Ope book, ope otes, close device

More information

CIS 121 Data Structures and Algorithms with Java Spring Stacks, Queues, and Heaps Monday, February 18 / Tuesday, February 19

CIS 121 Data Structures and Algorithms with Java Spring Stacks, Queues, and Heaps Monday, February 18 / Tuesday, February 19 CIS Data Structures ad Algorithms with Java Sprig 09 Stacks, Queues, ad Heaps Moday, February 8 / Tuesday, February 9 Stacks ad Queues Recall the stack ad queue ADTs (abstract data types from lecture.

More information

Morgan Kaufmann Publishers 26 February, COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 5

Morgan Kaufmann Publishers 26 February, COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 5 Morga Kaufma Publishers 26 February, 28 COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Iterface 5 th Editio Chapter 5 Set-Associative Cache Architecture Performace Summary Whe CPU performace icreases:

More information

Pattern Recognition Systems Lab 1 Least Mean Squares

Pattern Recognition Systems Lab 1 Least Mean Squares Patter Recogitio Systems Lab 1 Least Mea Squares 1. Objectives This laboratory work itroduces the OpeCV-based framework used throughout the course. I this assigmet a lie is fitted to a set of poits usig

More information

An Efficient Algorithm for Graph Bisection of Triangularizations

An Efficient Algorithm for Graph Bisection of Triangularizations A Efficiet Algorithm for Graph Bisectio of Triagularizatios Gerold Jäger Departmet of Computer Sciece Washigto Uiversity Campus Box 1045 Oe Brookigs Drive St. Louis, Missouri 63130-4899, USA jaegerg@cse.wustl.edu

More information

EE123 Digital Signal Processing

EE123 Digital Signal Processing Last Time EE Digital Sigal Processig Lecture 7 Block Covolutio, Overlap ad Add, FFT Discrete Fourier Trasform Properties of the Liear covolutio through circular Today Liear covolutio with Overlap ad add

More information

A Note on Least-norm Solution of Global WireWarping

A Note on Least-norm Solution of Global WireWarping A Note o Least-orm Solutio of Global WireWarpig Charlie C. L. Wag Departmet of Mechaical ad Automatio Egieerig The Chiese Uiversity of Hog Kog Shati, N.T., Hog Kog E-mail: cwag@mae.cuhk.edu.hk Abstract

More information

. Written in factored form it is easy to see that the roots are 2, 2, i,

. Written in factored form it is easy to see that the roots are 2, 2, i, CMPS A Itroductio to Programmig Programmig Assigmet 4 I this assigmet you will write a java program that determies the real roots of a polyomial that lie withi a specified rage. Recall that the roots (or

More information

Lecture 2. RTL Design Methodology. Transition from Pseudocode & Interface to a Corresponding Block Diagram

Lecture 2. RTL Design Methodology. Transition from Pseudocode & Interface to a Corresponding Block Diagram Lecture 2 RTL Desig Methodology Trasitio from Pseudocode & Iterface to a Correspodig Block Diagram Structure of a Typical Digital Data Iputs Datapath (Executio Uit) Data Outputs System Cotrol Sigals Status

More information

EE University of Minnesota. Midterm Exam #1. Prof. Matthew O'Keefe TA: Eric Seppanen. Department of Electrical and Computer Engineering

EE University of Minnesota. Midterm Exam #1. Prof. Matthew O'Keefe TA: Eric Seppanen. Department of Electrical and Computer Engineering EE 4363 1 Uiversity of Miesota Midterm Exam #1 Prof. Matthew O'Keefe TA: Eric Seppae Departmet of Electrical ad Computer Egieerig Uiversity of Miesota Twi Cities Campus EE 4363 Itroductio to Microprocessors

More information

Project 2.5 Improved Euler Implementation

Project 2.5 Improved Euler Implementation Project 2.5 Improved Euler Implemetatio Figure 2.5.10 i the text lists TI-85 ad BASIC programs implemetig the improved Euler method to approximate the solutio of the iitial value problem dy dx = x+ y,

More information

Master Informatics Eng. 2017/18. A.J.Proença. Memory Hierarchy. (most slides are borrowed) AJProença, Advanced Architectures, MiEI, UMinho, 2017/18 1

Master Informatics Eng. 2017/18. A.J.Proença. Memory Hierarchy. (most slides are borrowed) AJProença, Advanced Architectures, MiEI, UMinho, 2017/18 1 Advaced Architectures Master Iformatics Eg. 2017/18 A.J.Proeça Memory Hierarchy (most slides are borrowed) AJProeça, Advaced Architectures, MiEI, UMiho, 2017/18 1 Itroductio Programmers wat ulimited amouts

More information

Hash Tables. Presentation for use with the textbook Algorithm Design and Applications, by M. T. Goodrich and R. Tamassia, Wiley, 2015.

Hash Tables. Presentation for use with the textbook Algorithm Design and Applications, by M. T. Goodrich and R. Tamassia, Wiley, 2015. Presetatio for use with the textbook Algorithm Desig ad Applicatios, by M. T. Goodrich ad R. Tamassia, Wiley, 2015 Hash Tables xkcd. http://xkcd.com/221/. Radom Number. Used with permissio uder Creative

More information

An Efficient Algorithm for Graph Bisection of Triangularizations

An Efficient Algorithm for Graph Bisection of Triangularizations Applied Mathematical Scieces, Vol. 1, 2007, o. 25, 1203-1215 A Efficiet Algorithm for Graph Bisectio of Triagularizatios Gerold Jäger Departmet of Computer Sciece Washigto Uiversity Campus Box 1045, Oe

More information

Cluster Analysis. Andrew Kusiak Intelligent Systems Laboratory

Cluster Analysis. Andrew Kusiak Intelligent Systems Laboratory Cluster Aalysis Adrew Kusiak Itelliget Systems Laboratory 2139 Seamas Ceter The Uiversity of Iowa Iowa City, Iowa 52242-1527 adrew-kusiak@uiowa.edu http://www.icae.uiowa.edu/~akusiak Two geeric modes of

More information

GPUMP: a Multiple-Precision Integer Library for GPUs

GPUMP: a Multiple-Precision Integer Library for GPUs GPUMP: a Multiple-Precisio Iteger Library for GPUs Kaiyog Zhao ad Xiaowe Chu Departmet of Computer Sciece, Hog Kog Baptist Uiversity Hog Kog, P. R. Chia Email: {kyzhao, chxw}@comp.hkbu.edu.hk Abstract

More information

Course Site: Copyright 2012, Elsevier Inc. All rights reserved.

Course Site:   Copyright 2012, Elsevier Inc. All rights reserved. Course Site: http://cc.sjtu.edu.c/g2s/site/aca.html 1 Computer Architecture A Quatitative Approach, Fifth Editio Chapter 2 Memory Hierarchy Desig 2 Outlie Memory Hierarchy Cache Desig Basic Cache Optimizatios

More information

Python Programming: An Introduction to Computer Science

Python Programming: An Introduction to Computer Science Pytho Programmig: A Itroductio to Computer Sciece Chapter 6 Defiig Fuctios Pytho Programmig, 2/e 1 Objectives To uderstad why programmers divide programs up ito sets of cooperatig fuctios. To be able to

More information

K-NET bus. When several turrets are connected to the K-Bus, the structure of the system is as showns

K-NET bus. When several turrets are connected to the K-Bus, the structure of the system is as showns K-NET bus The K-Net bus is based o the SPI bus but it allows to addressig may differet turrets like the I 2 C bus. The K-Net is 6 a wires bus (4 for SPI wires ad 2 additioal wires for request ad ackowledge

More information

CIS 121 Data Structures and Algorithms with Java Spring Stacks and Queues Monday, February 12 / Tuesday, February 13

CIS 121 Data Structures and Algorithms with Java Spring Stacks and Queues Monday, February 12 / Tuesday, February 13 CIS Data Structures ad Algorithms with Java Sprig 08 Stacks ad Queues Moday, February / Tuesday, February Learig Goals Durig this lab, you will: Review stacks ad queues. Lear amortized ruig time aalysis

More information

APPLICATION NOTE PACE1750AE BUILT-IN FUNCTIONS

APPLICATION NOTE PACE1750AE BUILT-IN FUNCTIONS APPLICATION NOTE PACE175AE BUILT-IN UNCTIONS About This Note This applicatio brief is iteded to explai ad demostrate the use of the special fuctios that are built ito the PACE175AE processor. These powerful

More information

Evaluation of Distributed and Replicated HLR for Location Management in PCS Network

Evaluation of Distributed and Replicated HLR for Location Management in PCS Network JOURNAL OF INFORMATION SCIENCE AND ENGINEERING 9, 85-0 (2003) Evaluatio of Distributed ad Replicated HLR for Locatio Maagemet i PCS Network Departmet of Computer Sciece ad Iformatio Egieerig Natioal Chiao

More information

Data diverse software fault tolerance techniques

Data diverse software fault tolerance techniques Data diverse software fault tolerace techiques Complemets desig diversity by compesatig for desig diversity s s limitatios Ivolves obtaiig a related set of poits i the program data space, executig the

More information

Heaps. Presentation for use with the textbook Algorithm Design and Applications, by M. T. Goodrich and R. Tamassia, Wiley, 2015

Heaps. Presentation for use with the textbook Algorithm Design and Applications, by M. T. Goodrich and R. Tamassia, Wiley, 2015 Presetatio for use with the textbook Algorithm Desig ad Applicatios, by M. T. Goodrich ad R. Tamassia, Wiley, 201 Heaps 201 Goodrich ad Tamassia xkcd. http://xkcd.com/83/. Tree. Used with permissio uder

More information

The Magma Database file formats

The Magma Database file formats The Magma Database file formats Adrew Gaylard, Bret Pikey, ad Mart-Mari Breedt Johaesburg, South Africa 15th May 2006 1 Summary Magma is a ope-source object database created by Chris Muller, of Kasas City,

More information

1 Graph Sparsfication

1 Graph Sparsfication CME 305: Discrete Mathematics ad Algorithms 1 Graph Sparsficatio I this sectio we discuss the approximatio of a graph G(V, E) by a sparse graph H(V, F ) o the same vertex set. I particular, we cosider

More information

Morgan Kaufmann Publishers 26 February, COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 5.

Morgan Kaufmann Publishers 26 February, COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 5. Morga Kaufma Publishers 26 February, 208 COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Iterface 5 th Editio Chapter 5 Virtual Memory Review: The Memory Hierarchy Take advatage of the priciple

More information

ALU Augmentation for MPEG-4 Repetitive Padding

ALU Augmentation for MPEG-4 Repetitive Padding ALU Augmetatio for MPEG-4 Repetitive Paddig Georgi Kuzmaov Stamatis Vassiliadis Computer Egieerig Lab, Electrical Egieerig Departmet, Faculty of formatio Techology ad Systems, Delft Uiversity of Techology,

More information

Evaluation scheme for Tracking in AMI

Evaluation scheme for Tracking in AMI A M I C o m m u i c a t i o A U G M E N T E D M U L T I - P A R T Y I N T E R A C T I O N http://www.amiproject.org/ Evaluatio scheme for Trackig i AMI S. Schreiber a D. Gatica-Perez b AMI WP4 Trackig:

More information

Descriptive Statistics Summary Lists

Descriptive Statistics Summary Lists Chapter 209 Descriptive Statistics Summary Lists Itroductio This procedure is used to summarize cotiuous data. Large volumes of such data may be easily summarized i statistical lists of meas, couts, stadard

More information

CS 683: Advanced Design and Analysis of Algorithms

CS 683: Advanced Design and Analysis of Algorithms CS 683: Advaced Desig ad Aalysis of Algorithms Lecture 6, February 1, 2008 Lecturer: Joh Hopcroft Scribes: Shaomei Wu, Etha Feldma February 7, 2008 1 Threshold for k CNF Satisfiability I the previous lecture,

More information

Parallel Polygon Approximation Algorithm Targeted at Reconfigurable Multi-Ring Hardware

Parallel Polygon Approximation Algorithm Targeted at Reconfigurable Multi-Ring Hardware Parallel Polygo Approximatio Algorithm Targeted at Recofigurable Multi-Rig Hardware M. Arif Wai* ad Hamid R. Arabia** *Califoria State Uiversity Bakersfield, Califoria, USA **Uiversity of Georgia, Georgia,

More information

CMSC Computer Architecture Lecture 3: ISA and Introduction to Microarchitecture. Prof. Yanjing Li University of Chicago

CMSC Computer Architecture Lecture 3: ISA and Introduction to Microarchitecture. Prof. Yanjing Li University of Chicago CMSC 22200 Computer Architecture Lecture 3: ISA ad Itroductio to Microarchitecture Prof. Yajig Li Uiversity of Chicago Lecture Outlie ISA uarch (hardware implemetatio of a ISA) Logic desig basics Sigle-cycle

More information

Fundamentals of. Chapter 1. Microprocessor and Microcontroller. Dr. Farid Farahmand. Updated: Tuesday, January 16, 2018

Fundamentals of. Chapter 1. Microprocessor and Microcontroller. Dr. Farid Farahmand. Updated: Tuesday, January 16, 2018 Fudametals of Chapter 1 Microprocessor ad Microcotroller Dr. Farid Farahmad Updated: Tuesday, Jauary 16, 2018 Evolutio First came trasistors Itegrated circuits SSI (Small-Scale Itegratio) to ULSI Very

More information

Data Structures Week #9. Sorting

Data Structures Week #9. Sorting Data Structures Week #9 Sortig Outlie Motivatio Types of Sortig Elemetary (O( 2 )) Sortig Techiques Other (O(*log())) Sortig Techiques 21.Aralık.2010 Boraha Tümer, Ph.D. 2 Sortig 21.Aralık.2010 Boraha

More information

Computer Systems - HS

Computer Systems - HS What have we leared so far? Computer Systems High Level ENGG1203 2d Semester, 2017-18 Applicatios Sigals Systems & Cotrol Systems Computer & Embedded Systems Digital Logic Combiatioal Logic Sequetial Logic

More information

CMSC Computer Architecture Lecture 11: More Caches. Prof. Yanjing Li University of Chicago

CMSC Computer Architecture Lecture 11: More Caches. Prof. Yanjing Li University of Chicago CMSC 22200 Computer Architecture Lecture 11: More Caches Prof. Yajig Li Uiversity of Chicago Lecture Outlie Caches 2 Review Memory hierarchy Cache basics Locality priciples Spatial ad temporal How to access

More information

The Simeck Family of Lightweight Block Ciphers

The Simeck Family of Lightweight Block Ciphers The Simeck Family of Lightweight Block Ciphers Gagqiag Yag, Bo Zhu, Valeti Suder, Mark D. Aagaard, ad Guag Gog Electrical ad Computer Egieerig, Uiversity of Waterloo Sept 5, 205 Yag, Zhu, Suder, Aagaard,

More information

Probability of collisions in Soft Input Decryption

Probability of collisions in Soft Input Decryption Issue 1, Volume 1, 007 1 Probability of collisios i Soft Iput Decryptio Nataša Živić, Christoph Rulad Abstract I this work, probability of collisio i Soft Iput Decryptio has bee aalyzed ad calculated.

More information

CS2410 Computer Architecture. Flynn s Taxonomy

CS2410 Computer Architecture. Flynn s Taxonomy CS2410 Computer Architecture Dept. of Computer Sciece Uiversity of Pittsburgh http://www.cs.pitt.edu/~melhem/courses/2410p/idex.html 1 Fly s Taxoomy SISD Sigle istructio stream Sigle data stream (SIMD)

More information

Module Instantiation. Finite State Machines. Two Types of FSMs. Finite State Machines. Given submodule mux32two: Instantiation of mux32two

Module Instantiation. Finite State Machines. Two Types of FSMs. Finite State Machines. Given submodule mux32two: Instantiation of mux32two Give submodule mux32two: 2-to- MUX module mux32two (iput [3:] i,i, iput sel, output [3:] out); Module Istatiatio Fiite Machies esig methodology for sequetial logic -- idetify distict s -- create trasitio

More information

6.854J / J Advanced Algorithms Fall 2008

6.854J / J Advanced Algorithms Fall 2008 MIT OpeCourseWare http://ocw.mit.edu 6.854J / 18.415J Advaced Algorithms Fall 2008 For iformatio about citig these materials or our Terms of Use, visit: http://ocw.mit.edu/terms. 18.415/6.854 Advaced Algorithms

More information

The Penta-S: A Scalable Crossbar Network for Distributed Shared Memory Multiprocessor Systems

The Penta-S: A Scalable Crossbar Network for Distributed Shared Memory Multiprocessor Systems The Peta-S: A Scalable Crossbar Network for Distributed Shared Memory Multiprocessor Systems Abdulkarim Ayyad Departmet of Computer Egieerig, Al-Quds Uiversity, Jerusalem, P.O. Box 20002 Tel: 02-2797024,

More information

COSC 1P03. Ch 7 Recursion. Introduction to Data Structures 8.1

COSC 1P03. Ch 7 Recursion. Introduction to Data Structures 8.1 COSC 1P03 Ch 7 Recursio Itroductio to Data Structures 8.1 COSC 1P03 Recursio Recursio I Mathematics factorial Fiboacci umbers defie ifiite set with fiite defiitio I Computer Sciece sytax rules fiite defiitio,

More information

An Efficient Implementation Method of Fractal Image Compression on Dynamically Reconfigurable Architecture

An Efficient Implementation Method of Fractal Image Compression on Dynamically Reconfigurable Architecture A Efficiet Implemetatio Method of Fractal Image Compressio o Dyamically Recofigurable Architecture Hidehisa Nagao, Akihiro Matsuura, ad Akira Nagoya NTT Commuicatio Sciece Laboratories 2-4 Hikaridai, Seika-cho,

More information

3D Model Retrieval Method Based on Sample Prediction

3D Model Retrieval Method Based on Sample Prediction 20 Iteratioal Coferece o Computer Commuicatio ad Maagemet Proc.of CSIT vol.5 (20) (20) IACSIT Press, Sigapore 3D Model Retrieval Method Based o Sample Predictio Qigche Zhag, Ya Tag* School of Computer

More information

Fast Fourier Transform (FFT) Algorithms

Fast Fourier Transform (FFT) Algorithms Fast Fourier Trasform FFT Algorithms Relatio to the z-trasform elsewhere, ozero, z x z X x [ ] 2 ~ elsewhere,, ~ e j x X x x π j e z z X X π 2 ~ The DFS X represets evely spaced samples of the z- trasform

More information

Improving Template Based Spike Detection

Improving Template Based Spike Detection Improvig Template Based Spike Detectio Kirk Smith, Member - IEEE Portlad State Uiversity petra@ee.pdx.edu Abstract Template matchig algorithms like SSE, Covolutio ad Maximum Likelihood are well kow for

More information

Joint Message-Passing Symbol-Decoding of LDPC Coded Signals over Partial-Response Channels

Joint Message-Passing Symbol-Decoding of LDPC Coded Signals over Partial-Response Channels Joit Message-Passig Symbol-Decodig of LDPC Coded Sigals over Partial-Respose Chaels Rathakumar Radhakrisha ad ae Vasić Departmet of Electrical ad Computer Egieerig Uiversity of Arizoa, Tucso, AZ-8572 Email:

More information

Ontology-based Decision Support System with Analytic Hierarchy Process for Tour Package Selection

Ontology-based Decision Support System with Analytic Hierarchy Process for Tour Package Selection 2017 Asia-Pacific Egieerig ad Techology Coferece (APETC 2017) ISBN: 978-1-60595-443-1 Otology-based Decisio Support System with Aalytic Hierarchy Process for Tour Pacage Selectio Tie-We Sug, Chia-Jug Lee,

More information

A SOFTWARE MODEL FOR THE MULTILAYER PERCEPTRON

A SOFTWARE MODEL FOR THE MULTILAYER PERCEPTRON A SOFTWARE MODEL FOR THE MULTILAYER PERCEPTRON Roberto Lopez ad Eugeio Oñate Iteratioal Ceter for Numerical Methods i Egieerig (CIMNE) Edificio C1, Gra Capitá s/, 08034 Barceloa, Spai ABSTRACT I this work

More information

Lecture 5. Counting Sort / Radix Sort

Lecture 5. Counting Sort / Radix Sort Lecture 5. Coutig Sort / Radix Sort T. H. Corme, C. E. Leiserso ad R. L. Rivest Itroductio to Algorithms, 3rd Editio, MIT Press, 2009 Sugkyukwa Uiversity Hyuseug Choo choo@skku.edu Copyright 2000-2018

More information

Lecture 18. Optimization in n dimensions

Lecture 18. Optimization in n dimensions Lecture 8 Optimizatio i dimesios Itroductio We ow cosider the problem of miimizig a sigle scalar fuctio of variables, f x, where x=[ x, x,, x ]T. The D case ca be visualized as fidig the lowest poit of

More information

9.1. Sequences and Series. Sequences. What you should learn. Why you should learn it. Definition of Sequence

9.1. Sequences and Series. Sequences. What you should learn. Why you should learn it. Definition of Sequence _9.qxd // : AM Page Chapter 9 Sequeces, Series, ad Probability 9. Sequeces ad Series What you should lear Use sequece otatio to write the terms of sequeces. Use factorial otatio. Use summatio otatio to

More information

Image Segmentation EEE 508

Image Segmentation EEE 508 Image Segmetatio Objective: to determie (etract) object boudaries. It is a process of partitioig a image ito distict regios by groupig together eighborig piels based o some predefied similarity criterio.

More information

IS-IS in Detail. ISP Workshops

IS-IS in Detail. ISP Workshops IS-IS i Detail ISP Workshops These materials are licesed uder the Creative Commos Attributio-NoCommercial 4.0 Iteratioal licese (http://creativecommos.org/liceses/by-c/4.0/) Last updated 27 th November

More information

A REDUCED-COMPLEXITY LDPC DECODING ALGORITHM WITH CHEBYSHEV POLYNOMIAL FITTING

A REDUCED-COMPLEXITY LDPC DECODING ALGORITHM WITH CHEBYSHEV POLYNOMIAL FITTING Joural of Theoretical ad Applied Iformatio Techology st March. Vol. 49 No. 5 - JATIT & LLS. All rights reserved. ISSN: 99-8645 www.jatit.org E-ISSN: 87-95 A REDUCED-COMPLEXITY LDPC DECODING ALGORITHM WITH

More information

Chapter 3. Floating Point Arithmetic

Chapter 3. Floating Point Arithmetic COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Iterface 5 th Editio Chapter 3 Floatig Poit Arithmetic Review - Multiplicatio 0 1 1 0 = 6 multiplicad 32-bit ALU shift product right multiplier add

More information

Goals of the Lecture UML Implementation Diagrams

Goals of the Lecture UML Implementation Diagrams Goals of the Lecture UML Implemetatio Diagrams Object-Orieted Aalysis ad Desig - Fall 1998 Preset UML Diagrams useful for implemetatio Provide examples Next Lecture Ð A variety of topics o mappig from

More information

High-Speed Computation of the Kleene Star in Max-Plus Algebra Using a Cell Broadband Engine

High-Speed Computation of the Kleene Star in Max-Plus Algebra Using a Cell Broadband Engine Proceedigs of the 9th WSEAS Iteratioal Coferece o APPLICATIONS of COMPUTER ENGINEERING High-Speed Computatio of the Kleee Star i Max-Plus Algebra Usig a Cell Broadbad Egie HIROYUKI GOTO ad TAKAHIRO ICHIGE

More information

Computer Graphics Hardware An Overview

Computer Graphics Hardware An Overview Computer Graphics Hardware A Overview Graphics System Moitor Iput devices CPU/Memory GPU Raster Graphics System Raster: A array of picture elemets Based o raster-sca TV techology The scree (ad a picture)

More information

A New Morphological 3D Shape Decomposition: Grayscale Interframe Interpolation Method

A New Morphological 3D Shape Decomposition: Grayscale Interframe Interpolation Method A ew Morphological 3D Shape Decompositio: Grayscale Iterframe Iterpolatio Method D.. Vizireau Politehica Uiversity Bucharest, Romaia ae@comm.pub.ro R. M. Udrea Politehica Uiversity Bucharest, Romaia mihea@comm.pub.ro

More information

Efficient Hardware Design for Implementation of Matrix Multiplication by using PPI-SO

Efficient Hardware Design for Implementation of Matrix Multiplication by using PPI-SO Efficiet Hardware Desig for Implemetatio of Matrix Multiplicatio by usig PPI-SO Shivagi Tiwari, Niti Meea Dept. of EC, IES College of Techology, Bhopal, Idia Assistat Professor, Dept. of EC, IES College

More information

The isoperimetric problem on the hypercube

The isoperimetric problem on the hypercube The isoperimetric problem o the hypercube Prepared by: Steve Butler November 2, 2005 1 The isoperimetric problem We will cosider the -dimesioal hypercube Q Recall that the hypercube Q is a graph whose

More information

Page 1. Why Care About the Memory Hierarchy? Memory. DRAMs over Time. Virtual Memory!

Page 1. Why Care About the Memory Hierarchy? Memory. DRAMs over Time. Virtual Memory! Why Care About the Memory Hierarchy? Memory Virtual Memory -DRAM Memory Gap (latecy) Reasos: Multi process systems (abstractio & memory protectio) Solutio: Tables (holdig per process traslatios) Fast traslatio

More information

Behavioral Modeling in Verilog

Behavioral Modeling in Verilog Behavioral Modelig i Verilog COE 202 Digital Logic Desig Dr. Muhamed Mudawar Kig Fahd Uiversity of Petroleum ad Mierals Presetatio Outlie Itroductio to Dataflow ad Behavioral Modelig Verilog Operators

More information

Isn t It Time You Got Faster, Quicker?

Isn t It Time You Got Faster, Quicker? Is t It Time You Got Faster, Quicker? AltiVec Techology At-a-Glace OVERVIEW Motorola s advaced AltiVec techology is desiged to eable host processors compatible with the PowerPC istructio-set architecture

More information

Hardware Design and Performance Estimation of The 128-bit Block Cipher CRYPTON

Hardware Design and Performance Estimation of The 128-bit Block Cipher CRYPTON Hardware Desig ad Performace Estimatio of The 128-bit Block Cipher CRYPTON Eujog Hog, Jai-Hoo Chug, ad Chae Hoo Lim Iformatio ad Commuicatios Research Ceter Future Systems, Ic. 372-2 Yagjae-Dog, Seocho-Ku,

More information

Introduction. Nature-Inspired Computing. Terminology. Problem Types. Constraint Satisfaction Problems - CSP. Free Optimization Problem - FOP

Introduction. Nature-Inspired Computing. Terminology. Problem Types. Constraint Satisfaction Problems - CSP. Free Optimization Problem - FOP Nature-Ispired Computig Hadlig Costraits Dr. Şima Uyar September 2006 Itroductio may practical problems are costraied ot all combiatios of variable values represet valid solutios feasible solutios ifeasible

More information

Algorithms for Disk Covering Problems with the Most Points

Algorithms for Disk Covering Problems with the Most Points Algorithms for Disk Coverig Problems with the Most Poits Bi Xiao Departmet of Computig Hog Kog Polytechic Uiversity Hug Hom, Kowloo, Hog Kog csbxiao@comp.polyu.edu.hk Qigfeg Zhuge, Yi He, Zili Shao, Edwi

More information