Improved Convolutional Coding and Decoding of IEEE802.11n Based on General Purpose Processors


2013 8th International Conference on Communications and Networking in China (CHINACOM)

Yanuo Xu, Kai Niu, Zhiqiang He, Jiaru Lin
Key Lab of Universal Wireless Communications, Ministry of Education, Beijing University of Posts and Telecommunications, Beijing 100876, China

Abstract

In this paper, the convolutional coding and decoding of IEEE 802.11n are improved on general purpose processor (GPP) software defined radio (SDR) platforms. The prototype makes extensive use of features of contemporary processor architectures to accelerate signal processing and satisfy the real-time protocol requirements of IEEE 802.11n, including large low-latency caches to store lookup tables and single instruction multiple data (SIMD) units on GPPs. In the prototype, the Viterbi decoder employs a parallel structure and a trace-back decoding algorithm to improve performance. The simulation results show that the prototype can satisfy the performance and real-time requirements of IEEE 802.11n. Given the rapid development of GPPs, the data processing capacity of our prototype will improve further.

Keywords: Convolutional code; 802.11n; GPP; Viterbi decoder; SIMD

I. INTRODUCTION

With the rapid development of wireless local area network (WLAN) technology, IEEE 802.11n has become a mainstream wireless LAN standard. In a wireless LAN, the transmission characteristics of the actual channel are often poor: channel noise causes errors at the receiver and reduces the reliability of data transmission. To solve this problem, channel codes are applied to enhance the reliability of information transmission. IEEE 802.11n adopts a convolutional code, a type of code that is widely used in practical communication systems. Convolutional codes can obtain a high coding gain with a simple encoding structure.
To decode a convolutional code, the Viterbi decoding algorithm was first proposed in 1967 [1], and it has been proven to achieve maximum-likelihood decoding. With a short constraint length, good decoding performance can be obtained at quite low complexity, and the hardware structure of the Viterbi decoding algorithm is easy to implement. Thus, it is one of the best decoding algorithms for convolutional codes. The implementation of convolutional coding and Viterbi decoding is a key technology in IEEE 802.11n.

Many existing SDR platforms are based on either programmable hardware such as field programmable gate arrays (FPGAs) or embedded digital signal processors (DSPs). Such hardware platforms can meet the processing and timing requirements of modern high-speed wireless protocols, but programming FPGAs and specialized DSPs is difficult. Developers have to learn how to program each particular embedded architecture, often without the support of a rich environment of programming and debugging tools.

Meanwhile, GPP technology provides another approach to signal processing. According to Moore's law, the capability and integration of a microprocessor double every 18 months. Recently, single-core performance has approached a limit imposed by the physical size of semiconductor-based microelectronics: although manufacturing technology keeps improving, feature sizes have reached 32 nm to 45 nm and can hardly be reduced further. In this situation, the trend is to make full use of the features of the widely adopted multi-core architectures in existing GPPs.

Sora [2] is a fully programmable software radio platform on commodity PC architectures. Sora achieves performance equivalent to IEEE 802.11a/b/g. We take Sora as the baseline for comparison with our prototype. In this paper, convolutional coding and decoding of IEEE 802.11n are implemented on GPP SDR platforms.

II. ARCHITECTURE COMPARED WITH SORA

This section briefly describes the advantages of our architecture and its differences from Sora. Sora uses the Intel Core 2 microarchitecture, whereas we use the newer Intel Sandy Bridge architecture. We mainly analyze the three aspects (instruction set, cache, and multi-core and multi-threading) that our implementation uses to accelerate the signal processing of IEEE 802.11n.

A. Instruction set

General purpose processor platforms provide many SIMD instruction sets to optimize data processing. SIMD computations (see Figure 1) were introduced to the architecture with MMX technology, which allows SIMD computations to be performed on packed byte, word, and double-word integers. In our scheme the integers are held in a set of eight 128-bit registers, the XMM registers introduced with SSE (the MMX registers themselves are 64 bits wide). Figure 1 shows a typical SIMD floating-point computation. OP in Figure 1 stands for the operation performed on Xi and Yi (i = 1, 2, 3, 4). Two sets of four packed floating-point data elements are operated on in parallel, with the same operation being performed on each corresponding pair of data elements.
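The lane-wise behavior described for Figure 1 can be mimicked in plain Python. This is only an illustrative sketch, not the paper's SSE code: `simd_apply` and `lanes` are hypothetical helper names, and a Python list stands in for one packed register.

```python
import operator

REGISTER_BITS = 128  # width of one SSE (XMM) register


def lanes(element_bits, register_bits=REGISTER_BITS):
    """Number of packed elements of the given width that fit in one register."""
    return register_bits // element_bits


def simd_apply(op, x, y):
    """Apply the same operation to each corresponding pair (Xi, Yi)."""
    if len(x) != len(y):
        raise ValueError("both operands must have the same number of lanes")
    return [op(xi, yi) for xi, yi in zip(x, y)]


# Four packed single-precision values per operand, as in Figure 1.
x = [1.0, 2.0, 3.0, 4.0]
y = [10.0, 20.0, 30.0, 40.0]
packed_sum = simd_apply(operator.add, x, y)  # OP = add
packed_min = simd_apply(min, x, y)           # OP = min
```

With 32-bit elements a 128-bit register holds `lanes(32)` = 4 values; with 8-bit elements it holds `lanes(8)` = 16, which is the lane count the decoder relies on later in the paper.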

Figure 1. Typical SIMD operation

Streaming SIMD Extensions (SSE) is provided in Intel architectures. SSE is a SIMD instruction set extension to the x86 architecture, designed by Intel and introduced in 1999 in the Pentium III series processors as a reply to AMD's 3DNow!. Compared with the platform used by Sora, the Sandy Bridge architecture supports the SSE4.2 and AVX instruction sets. The AVX instruction set increases the width of the SIMD register file from 128 bits to 256 bits. Therefore, instruction efficiency and the room for optimization are better than on the platform used by Sora.

B. Cache

The cache structure of Sandy Bridge is similar to that of Nehalem, whereas the cache structure of Nehalem differs considerably from that of Intel Core 2. The Core 2 cache has two levels, and L2 is shared by two cores to reduce coherency traffic. The Nehalem cache has three levels: L1 and L2 are relatively small and private to each core, and L3 is very large and shared by all cores.

A lookup table (LUT) is an optimization method that trades space complexity for time complexity. The method rests on a one-to-one correspondence between an input value and its result. Suppose we have a module whose input is a symbol stream and whose output is the result of a calculation on each symbol. No matter how complicated the calculation is, the results for all possible inputs can be computed once and stored, so that each later calculation becomes a table lookup. Because access to a cache is much faster than access to main memory, we size the LUTs so that they fit in the core's cache, which lowers the access latency. The cache structure of Nehalem can also support and optimize unaligned accesses, and our architecture has a large cache memory with three cache levels.

C. Multi-core and multi-threading

Because the processing capability of a single core is limited, multi-core technology is used more and more. Unlike the Core 2 architecture, Nehalem and Sandy Bridge use simultaneous multithreading (SMT). SMT is 2-way, which means each core can handle two threads simultaneously. For multi-threaded tasks, the delay of a single thread can thus be hidden. SMT improves performance more effectively with the larger cache and larger memory in Sandy Bridge.

It is worth noting that Sora uses two cores to complete the whole processing, but each unit can only be processed on one core. To satisfy the requirements of IEEE 802.11n, we must use several cores to carry out Viterbi decoding in parallel. In our implementation, we use the multi-core API provided by Windows.

III. IMPLEMENTATION

We have implemented the convolutional coding and decoding structures of IEEE 802.11n on a GPP platform. This section describes how to optimize the convolutional encoder and decoder on the GPP platform so that they satisfy the real-time requirements. The convolutional encoder uses SIMD instructions and lookup tables to accelerate data processing. The Viterbi decoder mainly uses SIMD instructions and multiple cores to accelerate decoding.

A. Convolutional encoder design

The convolutional code is a well-known forward error correction (FEC) code, denoted by (n, k, L), where k is the number of input information bits, n is the number of output bits, and L is the constraint length, so the code rate is R = k/n. The k input bits are encoded to n output bits. After encoding, the n output bits depend not only on the k input information bits but also on the previous L - 1 information bits. In the IEEE standard [3], the convolutional encoder is defined by the generator polynomials g0 = 133 and g1 = 171 (octal), and the code rate is R = 1/2. The encoder is shown in Figure 2.

Figure 2. Convolutional encoder in IEEE std [3]

The generator polynomial corresponding to output A is:

$g_0(D) = 1 + D^2 + D^3 + D^5 + D^6$

The generator polynomial corresponding to output B is:

$g_1(D) = 1 + D + D^2 + D^3 + D^6$

The convolutional encoder uses the SIMD instruction set and lookup tables to accelerate signal processing. Given the input bits and the state of the registers, we can calculate the output bits exactly.
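The direct calculation of the output bits from the input bit and the register state can be sketched as follows. This is a minimal bit-serial illustration, not the paper's optimized code; the function names and the choice to keep the newest bit in the most significant position of the register are assumptions made for the example.

```python
G0 = 0o133  # generator for output A: 1 + D^2 + D^3 + D^5 + D^6
G1 = 0o171  # generator for output B: 1 + D + D^2 + D^3 + D^6
K = 7       # constraint length, so the register state has K - 1 = 6 bits


def parity(x):
    """XOR of all bits of x."""
    return bin(x).count("1") & 1


def conv_encode(bits, state=0):
    """Encode a bit sequence at rate 1/2; returns (A/B interleaved bits, final state)."""
    out = []
    for b in bits:
        reg = (b << (K - 1)) | state  # current bit followed by the 6 previous bits
        out.append(parity(reg & G0))  # output A
        out.append(parity(reg & G1))  # output B
        state = reg >> 1              # shift: the oldest bit drops out
    return out, state


# A single 1 followed by zeros reads out the generator coefficients.
impulse_out, impulse_state = conv_encode([1, 0, 0, 0, 0, 0, 0])
```

Feeding an impulse is a quick sanity check: the A outputs are the coefficients of g0 (1, 0, 1, 1, 0, 1, 1) and the B outputs are those of g1 (1, 1, 1, 1, 0, 0, 1).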
This calculation can be avoided by using a LUT to accelerate signal processing. Figure 3 shows the encoder LUT data structure. With 64 encoder states, one input byte can be encoded into 16 output bits in a single lookup. The input ranges from 0x00 to 0xFF (one byte). For each entry, the LUT must store the 16 output bits and one value for the state of the registers after processing. Therefore, the LUT holds a total of $2^8 \times 64 \times (16 + 1) = 278528$ numbers.
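The byte-wise table can be sketched as below. This is an illustration under stated assumptions rather than the paper's data layout: each entry is stored as a Python tuple (16-bit output word, next state), bits within a byte are taken most significant first, and all function names are hypothetical.

```python
G0, G1, K = 0o133, 0o171, 7
NUM_STATES = 1 << (K - 1)  # 64 register states


def parity(x):
    return bin(x).count("1") & 1


def encode_bit(bit, state):
    """One encoder step: returns (output A, output B, next state)."""
    reg = (bit << (K - 1)) | state
    return parity(reg & G0), parity(reg & G1), reg >> 1


def encode_byte_bitwise(byte, state):
    """Encode 8 input bits (MSB first, an assumption) into a 16-bit word."""
    word = 0
    for i in range(7, -1, -1):
        a, b, state = encode_bit((byte >> i) & 1, state)
        word = (word << 2) | (a << 1) | b
    return word, state


def build_lut():
    """lut[state][byte] = (16 output bits, state after the byte)."""
    return [[encode_byte_bitwise(byte, state) for byte in range(256)]
            for state in range(NUM_STATES)]


def encode_with_lut(data, lut, state=0):
    """Encode a byte string with one table lookup per input byte."""
    words = []
    for byte in data:
        word, state = lut[state][byte]
        words.append(word)
    return words, state


def encode_reference(data, state=0):
    """Bit-serial reference used to cross-check the table-driven path."""
    words = []
    for byte in data:
        word, state = encode_byte_bitwise(byte, state)
        words.append(word)
    return words, state


LUT = build_lut()
data = bytes([0x00, 0xFF, 0xA5, 0x3C, 0x80])
lut_words, lut_state = encode_with_lut(data, LUT)
ref_words, ref_state = encode_reference(data)
```

The table has 64 x 256 entries; counting 16 output bits plus one state value per entry gives the 278528 numbers quoted above. The inner loop of `encode_with_lut` does no arithmetic on the generators at all, which is the point of the space-for-time trade.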

Figure 3. Convolutional encoder LUT data structure (axes: input byte 0x00 to 0xFF and register state S0 to S63; entry width 16+1)

B. Viterbi decoder design

The Viterbi decoder can be divided into three functional units: a branch metric unit, an add-compare-select (ACS) unit, and a trace-back unit. Figure 4 shows the structure of the Viterbi decoder. In the Viterbi decoder, all data are represented with 8 bits, so one 128-bit SIMD instruction can handle 16 data operations simultaneously.

Figure 4. The structure of Viterbi decoder

1) Branch metric unit

This unit calculates the branch metrics of all states at each state transition time. We use the received data to calculate each branch metric. Using the SIMD instruction set, we can carry out 16 branch metric calculations at a time.

2) Add-compare-select unit

This unit is the most important unit of the Viterbi decoder. First, add the state metric from the previous time step to the branch metric of each path that reaches the current state, generating the candidate state metrics. Second, compare the two candidate metrics and select the path with the minimum state metric as the survivor path. Finally, save the state metric of the survivor path for the next ACS operation. For each state, there are two branch paths, so one ACS operation consists of two add operations and one compare operation, as shown in Figure 6. In Figure 6, 16 ACS operations are processed in parallel using SIMD instructions. For 8-bit fixed-point data, the state metrics must be normalized to avoid overflow.

Figure 6. ACS processing using SIMD instructions

3) Trace-back unit

The key to the trace-back algorithm is finding the survivor decoding path. The survivor decoding path is stored as a linked list, in which each node represents one state of the state transition diagram. Once we know the state at a certain moment, we can follow this linked list backwards and find the previous states of the state transition diagram.
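The three units can be put together in a small scalar sketch. It is not the paper's SIMD implementation: it assumes hard-decision inputs with Hamming-distance branch metrics, stores the survivor links as one predecessor array per time step rather than a linked list, normalizes by subtracting the smallest metric, and uses hypothetical function names throughout.

```python
G0, G1, K = 0o133, 0o171, 7
NUM_STATES = 1 << (K - 1)
INF = 10 ** 9  # marks states that are not yet reachable


def parity(x):
    return bin(x).count("1") & 1


def step(bit, state):
    """One encoder transition: returns (output A, output B, next state)."""
    reg = (bit << (K - 1)) | state
    return parity(reg & G0), parity(reg & G1), reg >> 1


def encode(bits):
    state, out = 0, []
    for b in bits:
        a, c, state = step(b, state)
        out += [a, c]
    return out


def viterbi_decode(received):
    """Hard-decision decoding; returns (decoded bits, largest normalized metric)."""
    metrics = [0] + [INF] * (NUM_STATES - 1)  # the encoder starts in state 0
    history, peak = [], 0
    for t in range(len(received) // 2):
        ra, rb = received[2 * t], received[2 * t + 1]
        new, prev = [INF] * NUM_STATES, [0] * NUM_STATES
        for s in range(NUM_STATES):
            if metrics[s] >= INF:
                continue
            for bit in (0, 1):
                a, b, ns = step(bit, s)
                bm = (a != ra) + (b != rb)  # branch metric unit
                cand = metrics[s] + bm      # add
                if cand < new[ns]:          # compare and select
                    new[ns], prev[ns] = cand, s
        low = min(new)                      # normalization against overflow
        metrics = [m - low if m < INF else INF for m in new]
        peak = max(peak, max(m for m in metrics if m < INF))
        history.append(prev)
    state = metrics.index(min(metrics))     # trace back from the best end state
    bits = []
    for prev in reversed(history):
        bits.append(state >> (K - 2))       # the newest input bit is the state's top bit
        state = prev[state]
    bits.reverse()
    return bits, peak


message = [(i * 7 + 3) % 5 % 2 for i in range(40)]
coded = encode(message + [0] * (K - 1))     # six zero tail bits flush the register
noisy = list(coded)
noisy[10] ^= 1                              # two isolated channel errors
noisy[50] ^= 1
decoded, peak = viterbi_decode(noisy)
```

In this run the two flipped bits are corrected, and the normalized metrics stay far below 256, which is why 8-bit storage is enough once normalization is applied. A SIMD version would evaluate 16 of the inner add-compare-select steps per instruction instead of looping over the states.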
Following the survivor path back in this way yields the decoded bits.

Figure 5. (2,1,7) Convolutional code trellis diagram

Figure 5 shows a quarter of the convolutional code trellis diagram. On the left is the state before the transition; on the right is the state after it. The dashed lines illustrate the state transitions when the input bit is 0, and the solid lines those when the input bit is 1. The binary numbers show the output bits of the convolutional encoder for the corresponding state transition.

IV. SIMULATIONS

This section analyzes the simulation results for performance and throughput. The test environment of the GPP platform is shown in Table 1. The CPU is an Intel Core i7-2600K, which has 4 cores and 8 threads. The Intel C++ Compiler, also known as ICC or ICL, is a group of C and C++ compilers from Intel Corporation available for Apple Mac OS X, Linux and Microsoft Windows, and it supports the SSE instructions.

TABLE 1 TEST CONDITIONS
CPU: Intel Core i7-2600K
Architecture: Sandy Bridge
L2: 4*256KB
L3: 8MB
Version of SSE Instruction Set: SSE4.2, AVX
Operation System: Windows 7
Software: Microsoft Visual Studio 2010
Compiler: Intel ICC

A. Performance

In the IEEE standard [3], the encoder output bits are punctured to four rates (1/2, 2/3, 3/4, 5/6) on request. We test all four cases with block length 1040 (the number of data bits per OFDM symbol) over an AWGN channel with BPSK modulation. The bit error rate (BER) performances are shown in Figure 7.

Figure 7. BER performances of four rates (BER versus Eb/N0 (dB); curves: R=1/2 float point, and R=1/2, R=2/3, R=3/4, R=5/6 8-bit fix point)

In Figure 7, the dashed black curve is the theoretical performance of the R=1/2 floating-point convolutional code [4]. From the simulation results, we conclude that the fixed-point performance is very close to the floating-point performance. The BER performance improves as the code rate decreases.

B. Real-time

1) Throughput

The maximum throughput at 20 MHz bandwidth in IEEE 802.11n is 260 Mbps, and the number of data bits per OFDM symbol is 1040. Under this condition, the throughput results of our implementation are shown in Table 2.

TABLE 2 THROUGHPUT RESULTS (columns: Algorithm, Throughput (Mbps), Delay (us); rows: Conv. Encoder and Viterbi, with one core and with four cores; surviving entries: 75.2, 13., 3.4)

As shown in Table 2, our implementation can satisfy the maximum throughput requirement at 20 MHz bandwidth when using four cores. Because of thread delays and data transfer between cores, the throughput gain from multiple cores is not linear in the number of cores. Nevertheless, we can increase the throughput by increasing the number of cores. For example, we can use 8 cores to satisfy the throughput requirement at 40 MHz bandwidth.

In our implementation, the Viterbi decoder is the most computationally intensive component. Figure 8 shows the core utilization our implementation needs to meet the real-time requirements of the Viterbi decoder for the 32 MCSs at 20 MHz bandwidth. MCS 0-7 have one spatial stream; MCS 8-15 have two spatial streams; MCS 16-23 have three spatial streams; MCS 24-31 have four spatial streams. We test the processing capability of the CPU when using 1, 2, 3 or 4 cores. From the required throughput of each MCS, we calculate the core utilization of each MCS.

Figure 8. Core Utilization of 32 MCSs (Core Utilization versus MCSs of 20MHz bandwidth)

At the receiver, a higher throughput requires higher core utilization because of the increased computational load on the Viterbi decoder. We can see that one core of a contemporary multi-core CPU can comfortably support MCS 0-7. Because of the multi-core call delay, the core utilization is not exactly linear in the number of spatial streams. As the number of cores increases, the processing capability of each core decreases.

2) Compared with Sora

To facilitate the comparison, we used the same test conditions as Sora. It is worth noting that the CPU frequency used by Sora is 2.66 GHz, while the CPU frequency of our implementation is 3.4 GHz. For the same computation, our implementation will therefore have shorter delays than Sora.

TABLE 3 THE COMPARISON BETWEEN SORA IMPLEMENTATIONS AND OUR IMPLEMENTATIONS (columns: Algorithm; Configuration; Computation Required (M cycles/sec) for Sora Impl. (one core) and Our Impl. (one core); rows: Conv. Encoder and Viterbi, each at 24Mbps, 1/2; 48Mbps, 2/3; 54Mbps, 3/4)

Table 3 lists the computation required by the Sora implementations and by our implementations. Our Viterbi decoder needs fewer computing resources than Sora's. Our convolutional encoder does not perform very well under low-throughput conditions, but under high-throughput conditions it performs better than Sora's. In IEEE 802.11n, we mainly process and optimize high-throughput data streams.

The encoder also needs very little delay compared with the decoder. Therefore, the processing delay of the encoder is acceptable.

Compared with the Sora GPP platform, our platform has a higher CPU frequency, a newer architecture and newer SIMD instructions. As GPP platforms evolve, our implementation can handle larger amounts of data processing.

V. CONCLUSION

This paper describes how to implement the convolutional coding and decoding of IEEE 802.11n on a GPP platform. According to the simulation results, SIMD instructions and LUTs accelerate signal processing enough to satisfy the real-time requirements of IEEE 802.11n. GPP technology has many advantages over traditional FPGAs and DSPs. The rapid development of CPUs and the optimization of CPU architectures can greatly improve code execution efficiency with less program optimization. GPP platforms also have lower hardware cost and shorter code development and test cycles. Therefore, GPP technology has considerable room for development.

ACKNOWLEDGMENT

This work was supported by the National Basic Research Program of China (973 Program) (No. 2009CB320401), the National Natural Science Foundation of China (No ), the National Science and Technology Major Project of China (No.2012ZX and 2013ZX ).

REFERENCES

[1] A. J. Viterbi, "Error bounds for convolutional codes and an asymptotically optimum decoding algorithm," IEEE Trans. Inform. Theory, 1967, IT-13(2).
[2] Kun Tan, Jiansong Zhang, Ji Fang, He Liu, Yusheng Ye, Shen Wang, Yongguang Zhang, Haitao Wu, Wei Wang, Geoffrey M. Voelker, "Sora: High Performance Software Radio Using General Purpose Multicore Processors," Microsoft Research Asia, Beijing, China; Tsinghua University, Beijing, China; Beijing Jiaotong University, Beijing, China; UCSD, La Jolla, USA.
[3] IEEE Std, Part 11: Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specifications.
[4] Error Correction Coding: Mathematical Methods and Algorithms. Wiley.
[5] G. Feygin, P. Gulak, "Architectural tradeoffs for survivor sequence memory management in Viterbi decoders," IEEE Transactions on Communications, vol. 41, issue 3, March.
[6] A. J. Viterbi, "Convolutional codes and their performance in communication systems," IEEE Trans. Commun., vol. COM-19, Oct.
[7] P. K. Singh and S. Jayasimha, "A low-complexity, reduced-power Viterbi algorithm," Proc. 12th International Conf. on VLSI Design, Goa, India, pp. 61-66, Jan.
[8] D. A. El-Dib, M. I. Elmasry, "Modified register-exchange Viterbi decoder for low-power wireless communications," IEEE Transactions on Circuits and Systems, vol. 51, issue 2, Feb.
[9] F. Chan and D. Haccoun, "Adaptive Viterbi decoding of convolutional codes over memoryless channels," IEEE Transactions on Communications, vol. 45, no. 11, Nov.
[10] B. Pandita and S. K. Roy, "Design and Implementation of a Viterbi Decoder Using FPGAs," in Proceedings of IEEE International Conference on VLSI Design, pp. 611-614, Jan.


More information

Related Work The Concept of the Signaling. In the mobile communication system, in addition to transmit the necessary user information (usually voice

Related Work The Concept of the Signaling. In the mobile communication system, in addition to transmit the necessary user information (usually voice International Conference on Information Science and Computer Applications (ISCA 2013) The Research and Design of Personalization preferences Based on Signaling analysis ZhiQiang Wei 1,a, YiYan Zhang 1,b,

More information

Flexible wireless communication architectures

Flexible wireless communication architectures Flexible wireless communication architectures Sridhar Rajagopal Department of Electrical and Computer Engineering Rice University, Houston TX Faculty Candidate Seminar Southern Methodist University April

More information

CSCI 402: Computer Architectures. Parallel Processors (2) Fengguang Song Department of Computer & Information Science IUPUI.

CSCI 402: Computer Architectures. Parallel Processors (2) Fengguang Song Department of Computer & Information Science IUPUI. CSCI 402: Computer Architectures Parallel Processors (2) Fengguang Song Department of Computer & Information Science IUPUI 6.6 - End Today s Contents GPU Cluster and its network topology The Roofline performance

More information

CS 590: High Performance Computing. Parallel Computer Architectures. Lab 1 Starts Today. Already posted on Canvas (under Assignment) Let s look at it

CS 590: High Performance Computing. Parallel Computer Architectures. Lab 1 Starts Today. Already posted on Canvas (under Assignment) Let s look at it Lab 1 Starts Today Already posted on Canvas (under Assignment) Let s look at it CS 590: High Performance Computing Parallel Computer Architectures Fengguang Song Department of Computer Science IUPUI 1

More information

TDT Coarse-Grained Multithreading. Review on ILP. Multi-threaded execution. Contents. Fine-Grained Multithreading

TDT Coarse-Grained Multithreading. Review on ILP. Multi-threaded execution. Contents. Fine-Grained Multithreading Review on ILP TDT 4260 Chap 5 TLP & Hierarchy What is ILP? Let the compiler find the ILP Advantages? Disadvantages? Let the HW find the ILP Advantages? Disadvantages? Contents Multi-threading Chap 3.5

More information

XPU A Programmable FPGA Accelerator for Diverse Workloads

XPU A Programmable FPGA Accelerator for Diverse Workloads XPU A Programmable FPGA Accelerator for Diverse Workloads Jian Ouyang, 1 (ouyangjian@baidu.com) Ephrem Wu, 2 Jing Wang, 1 Yupeng Li, 1 Hanlin Xie 1 1 Baidu, Inc. 2 Xilinx Outlines Background - FPGA for

More information

Advance CPU Design. MMX technology. Computer Architectures. Tien-Fu Chen. National Chung Cheng Univ. ! Basic concepts

Advance CPU Design. MMX technology. Computer Architectures. Tien-Fu Chen. National Chung Cheng Univ. ! Basic concepts Computer Architectures Advance CPU Design Tien-Fu Chen National Chung Cheng Univ. Adv CPU-0 MMX technology! Basic concepts " small native data types " compute-intensive operations " a lot of inherent parallelism

More information

A Frame Aggregation Scheduler for IEEE n

A Frame Aggregation Scheduler for IEEE n A Frame Aggregation Scheduler for IEEE 802.11n Selvam T AU-KBC Research Centre MIT campus of Anna University Chennai, India selvam@au-kbc.org Srikanth S AU-KBC Research Centre MIT Campus of Anna University

More information

MAC level Throughput comparison: ax vs ac

MAC level Throughput comparison: ax vs ac MAC level Throughput comparison: 82.11ax vs. 82.11ac arxiv:183.1189v1 [cs.ni] 27 Mar 218 Oran Sharon Department of Computer Science Netanya Academic College 1 University St. Netanya, 42365 Israel Robert

More information

All MSEE students are required to take the following two core courses: Linear systems Probability and Random Processes

All MSEE students are required to take the following two core courses: Linear systems Probability and Random Processes MSEE Curriculum All MSEE students are required to take the following two core courses: 3531-571 Linear systems 3531-507 Probability and Random Processes The course requirements for students majoring in

More information

Fundamentals of Quantitative Design and Analysis

Fundamentals of Quantitative Design and Analysis Fundamentals of Quantitative Design and Analysis Dr. Jiang Li Adapted from the slides provided by the authors Computer Technology Performance improvements: Improvements in semiconductor technology Feature

More information

IMPROVING ENERGY EFFICIENCY THROUGH PARALLELIZATION AND VECTORIZATION ON INTEL R CORE TM

IMPROVING ENERGY EFFICIENCY THROUGH PARALLELIZATION AND VECTORIZATION ON INTEL R CORE TM IMPROVING ENERGY EFFICIENCY THROUGH PARALLELIZATION AND VECTORIZATION ON INTEL R CORE TM I5 AND I7 PROCESSORS Juan M. Cebrián 1 Lasse Natvig 1 Jan Christian Meyer 2 1 Depart. of Computer and Information

More information

Microarchitecture Overview. Performance

Microarchitecture Overview. Performance Microarchitecture Overview Prof. Scott Rixner Duncan Hall 3028 rixner@rice.edu January 15, 2007 Performance 4 Make operations faster Process improvements Circuit improvements Use more transistors to make

More information

Convolutional Code Optimization for Various Constraint Lengths using PSO

Convolutional Code Optimization for Various Constraint Lengths using PSO International Journal of Electronics and Communication Engineering. ISSN 0974-2166 Volume 5, Number 2 (2012), pp. 151-157 International Research Publication House http://www.irphouse.com Convolutional

More information

{ rizwan.rasheed, aawatif.menouni eurecom.fr,

{ rizwan.rasheed, aawatif.menouni eurecom.fr, Reconfigurable Viterbi Decoder for Mobile Platform Rizwan RASHEED, Mobile Communications Department, Institut Eurecom, Sophia Antipolis, France Aawatif MENOUNI HAYAR, Mobile Communications Department,

More information

Cache Justification for Digital Signal Processors

Cache Justification for Digital Signal Processors Cache Justification for Digital Signal Processors by Michael J. Lee December 3, 1999 Cache Justification for Digital Signal Processors By Michael J. Lee Abstract Caches are commonly used on general-purpose

More information

Advanced Parallel Programming I

Advanced Parallel Programming I Advanced Parallel Programming I Alexander Leutgeb, RISC Software GmbH RISC Software GmbH Johannes Kepler University Linz 2016 22.09.2016 1 Levels of Parallelism RISC Software GmbH Johannes Kepler University

More information

This Unit: Putting It All Together. CIS 371 Computer Organization and Design. Sources. What is Computer Architecture?

This Unit: Putting It All Together. CIS 371 Computer Organization and Design. Sources. What is Computer Architecture? This Unit: Putting It All Together CIS 371 Computer Organization and Design Unit 15: Putting It All Together: Anatomy of the XBox 360 Game Console Application OS Compiler Firmware CPU I/O Memory Digital

More information

Study on the Key Technology of the Mobile Video Display System in the Client

Study on the Key Technology of the Mobile Video Display System in the Client 2011 International Conference on Information Management and Engineering (ICIME 2011) IPCSIT vol. 52 (2012) (2012) IACSIT Press, Singapore DOI: 10.7763/IPCSIT.2012.V52.76 Study on the Key Technology of

More information

Dynamic Power Control MAC Protocol in Mobile Adhoc Networks

Dynamic Power Control MAC Protocol in Mobile Adhoc Networks Dynamic Power Control MAC Protocol in Mobile Adhoc Networks Anita Yadav Y N Singh, SMIEEE R R Singh Computer Science and Engineering Electrical Engineering Computer Science and Engineering Department Department

More information

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 1. Computer Abstractions and Technology

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 1. Computer Abstractions and Technology COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 1 Computer Abstractions and Technology The Computer Revolution Progress in computer technology Underpinned by Moore

More information

802.11n and g Performance Comparison in Office Size for FTP Transmission

802.11n and g Performance Comparison in Office Size for FTP Transmission 802.11n and 802.11g Performance Comparison in Office Size for FTP Transmission Group 6 Chase Wen 301094042 ywa56@sfu.ca Yuheng Lin 301114176 yuhengl@sfu.ca Roadmap Introduction WiFi and IEEE 802.11 standards

More information

Benchmarking Multithreaded, Multicore and Reconfigurable Processors

Benchmarking Multithreaded, Multicore and Reconfigurable Processors Insight, Analysis, and Advice on Signal Processing Technology Benchmarking Multithreaded, Multicore and Reconfigurable Processors Berkeley Design Technology, Inc. 2107 Dwight Way, Second Floor Berkeley,

More information

FPOC: A Channel Assignment Strategy Using Four Partially Overlapping Channels in WMNs

FPOC: A Channel Assignment Strategy Using Four Partially Overlapping Channels in WMNs FPOC: A Channel Assignment Strategy Using Four Partially Overlapping Channels in WMNs Yung-Chang Lin Cheng-Han Lin Wen-Shyang Hwang Ce-Kuen Shieh yaya80306@hotmail.com jhlin5@cc.kuas.edu.tw wshwang@cc.kuas.edu.tw

More information

ECE 571 Advanced Microprocessor-Based Design Lecture 4

ECE 571 Advanced Microprocessor-Based Design Lecture 4 ECE 571 Advanced Microprocessor-Based Design Lecture 4 Vince Weaver http://www.eece.maine.edu/~vweaver vincent.weaver@maine.edu 28 January 2016 Homework #1 was due Announcements Homework #2 will be posted

More information

Ferre, PL., Doufexi, A., Chung How, J. T. H., Nix, AR., & Bull, D. (2003). Link adaptation for video transmission over COFDM based WLANs.

Ferre, PL., Doufexi, A., Chung How, J. T. H., Nix, AR., & Bull, D. (2003). Link adaptation for video transmission over COFDM based WLANs. Ferre, PL., Doufexi, A., Chung How, J. T. H., Nix, AR., & Bull, D. (2003). Link adaptation for video transmission over COFDM based WLANs. Peer reviewed version Link to publication record in Explore Bristol

More information

MICROPROCESSOR TECHNOLOGY

MICROPROCESSOR TECHNOLOGY MICROPROCESSOR TECHNOLOGY Assis. Prof. Hossam El-Din Moustafa Lecture 20 Ch.10 Intel Core Duo Processor Architecture 2-Jun-15 1 Chapter Objectives Understand the concept of dual core technology. Look inside

More information

Parallel Computing for Detecting Processes for Distorted Signal

Parallel Computing for Detecting Processes for Distorted Signal Volume 65.17, March 213 Parallel Computing for Detecting Processes for Distorted Signal Sarkout N. Abdulla, PhD. Assist. Prof. Baghdad University Iraq Zainab T. Alisa, PhD. Baghdad University Iraq, IEEE

More information

OpenCL Vectorising Features. Andreas Beckmann

OpenCL Vectorising Features. Andreas Beckmann Mitglied der Helmholtz-Gemeinschaft OpenCL Vectorising Features Andreas Beckmann Levels of Vectorisation vector units, SIMD devices width, instructions SMX, SP cores Cus, PEs vector operations within kernels

More information

Implementation of reduced memory Viterbi Decoder using Verilog HDL

Implementation of reduced memory Viterbi Decoder using Verilog HDL IOSR Journal of Electronics and Communication Engineering (IOSR-JECE) e-issn: 2278-2834,p- ISSN: 2278-8735.Volume 8, Issue 4 (Nov. - Dec. 2013), PP 73-79 Implementation of reduced memory Viterbi Decoder

More information

Design of Large-scale Wire-speed Multicast Switching Fabric Based on Distributive Lattice

Design of Large-scale Wire-speed Multicast Switching Fabric Based on Distributive Lattice Design of Large-scale Wire-speed Multicast Switching Fabric Based on Distributive Lattice 1 CUI Kai, 2 LI Ke-dan, 1 CHEN Fu-xing, 1 ZHU Zhi-pu, 1 ZHU Yue-sheng 1. Shenzhen Eng. Lab of Converged Networks

More information

An Efficient Bandwidth Estimation Schemes used in Wireless Mesh Networks

An Efficient Bandwidth Estimation Schemes used in Wireless Mesh Networks An Efficient Bandwidth Estimation Schemes used in Wireless Mesh Networks First Author A.Sandeep Kumar Narasaraopeta Engineering College, Andhra Pradesh, India. Second Author Dr S.N.Tirumala Rao (Ph.d)

More information

CO403 Advanced Microprocessors IS860 - High Performance Computing for Security. Basavaraj Talawar,

CO403 Advanced Microprocessors IS860 - High Performance Computing for Security. Basavaraj Talawar, CO403 Advanced Microprocessors IS860 - High Performance Computing for Security Basavaraj Talawar, basavaraj@nitk.edu.in Course Syllabus Technology Trends: Transistor Theory. Moore's Law. Delay, Power,

More information

Intel Advisor XE. Vectorization Optimization. Optimization Notice

Intel Advisor XE. Vectorization Optimization. Optimization Notice Intel Advisor XE Vectorization Optimization 1 Performance is a Proven Game Changer It is driving disruptive change in multiple industries Protecting buildings from extreme events Sophisticated mechanics

More information

RECENTLY, researches on gigabit wireless personal area

RECENTLY, researches on gigabit wireless personal area 146 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II: EXPRESS BRIEFS, VOL. 55, NO. 2, FEBRUARY 2008 An Indexed-Scaling Pipelined FFT Processor for OFDM-Based WPAN Applications Yuan Chen, Student Member, IEEE,

More information

Computer Systems. Binary Representation. Binary Representation. Logical Computation: Boolean Algebra

Computer Systems. Binary Representation. Binary Representation. Logical Computation: Boolean Algebra Binary Representation Computer Systems Information is represented as a sequence of binary digits: Bits What the actual bits represent depends on the context: Seminar 3 Numerical value (integer, floating

More information

The design and implementation of TPC encoder and decoder

The design and implementation of TPC encoder and decoder Journal of Physics: Conference Series PAPER OPEN ACCESS The design and implementation of TPC encoder and decoder To cite this article: L J Xiang et al 016 J. Phys.: Conf. Ser. 679 0103 Related content

More information

This Unit: Putting It All Together. CIS 501 Computer Architecture. What is Computer Architecture? Sources

This Unit: Putting It All Together. CIS 501 Computer Architecture. What is Computer Architecture? Sources This Unit: Putting It All Together CIS 501 Computer Architecture Unit 12: Putting It All Together: Anatomy of the XBox 360 Game Console Application OS Compiler Firmware CPU I/O Memory Digital Circuits

More information

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 1. Computer Abstractions and Technology

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 1. Computer Abstractions and Technology COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 1 Computer Abstractions and Technology Classes of Computers Personal computers General purpose, variety of software

More information

The Implementation and Analysis of Important Symmetric Ciphers on Stream Processor

The Implementation and Analysis of Important Symmetric Ciphers on Stream Processor 2009 International Conference on Computer Engineering and Applications IPCSIT vol.2 (2011) (2011) IACSIT Press, Singapore The Implementation and Analysis of Important Symmetric Ciphers on Stream Processor

More information

Accelerating String Matching Algorithms on Multicore Processors Cheng-Hung Lin

Accelerating String Matching Algorithms on Multicore Processors Cheng-Hung Lin Accelerating String Matching Algorithms on Multicore Processors Cheng-Hung Lin Department of Electrical Engineering, National Taiwan Normal University, Taipei, Taiwan Abstract String matching is the most

More information

Computer Performance Evaluation and Benchmarking. EE 382M Dr. Lizy Kurian John

Computer Performance Evaluation and Benchmarking. EE 382M Dr. Lizy Kurian John Computer Performance Evaluation and Benchmarking EE 382M Dr. Lizy Kurian John Evolution of Single-Chip Transistor Count 10K- 100K Clock Frequency 0.2-2MHz Microprocessors 1970 s 1980 s 1990 s 2010s 100K-1M

More information

Characterization of Native Signal Processing Extensions

Characterization of Native Signal Processing Extensions Characterization of Native Signal Processing Extensions Jason Law Department of Electrical and Computer Engineering University of Texas at Austin Austin, TX 78712 jlaw@mail.utexas.edu Abstract Soon if

More information

Fundamentals of Computers Design

Fundamentals of Computers Design Computer Architecture J. Daniel Garcia Computer Architecture Group. Universidad Carlos III de Madrid Last update: September 8, 2014 Computer Architecture ARCOS Group. 1/45 Introduction 1 Introduction 2

More information

Simplify System Complexity

Simplify System Complexity 1 2 Simplify System Complexity With the new high-performance CompactRIO controller Arun Veeramani Senior Program Manager National Instruments NI CompactRIO The Worlds Only Software Designed Controller

More information

Several Common Compiler Strategies. Instruction scheduling Loop unrolling Static Branch Prediction Software Pipelining

Several Common Compiler Strategies. Instruction scheduling Loop unrolling Static Branch Prediction Software Pipelining Several Common Compiler Strategies Instruction scheduling Loop unrolling Static Branch Prediction Software Pipelining Basic Instruction Scheduling Reschedule the order of the instructions to reduce the

More information

Reconstruction Improvements on Compressive Sensing

Reconstruction Improvements on Compressive Sensing SCITECH Volume 6, Issue 2 RESEARCH ORGANISATION November 21, 2017 Journal of Information Sciences and Computing Technologies www.scitecresearch.com/journals Reconstruction Improvements on Compressive Sensing

More information

Inside Intel Core Microarchitecture

Inside Intel Core Microarchitecture White Paper Inside Intel Core Microarchitecture Setting New Standards for Energy-Efficient Performance Ofri Wechsler Intel Fellow, Mobility Group Director, Mobility Microprocessor Architecture Intel Corporation

More information

Survey on OFDMA based MAC Protocols for the Next Generation WLAN

Survey on OFDMA based MAC Protocols for the Next Generation WLAN Survey on OFDMA based MAC Protocols for the Next Generation WLAN Bo Li, Qiao Qu, Zhongjiang Yan, and Mao Yang School of Electronics and Information Northwestern Polytechnical University, Xi an, China Email:

More information

This Unit: Putting It All Together. CIS 371 Computer Organization and Design. What is Computer Architecture? Sources

This Unit: Putting It All Together. CIS 371 Computer Organization and Design. What is Computer Architecture? Sources This Unit: Putting It All Together CIS 371 Computer Organization and Design Unit 15: Putting It All Together: Anatomy of the XBox 360 Game Console Application OS Compiler Firmware CPU I/O Memory Digital

More information

Unit 11: Putting it All Together: Anatomy of the XBox 360 Game Console

Unit 11: Putting it All Together: Anatomy of the XBox 360 Game Console Computer Architecture Unit 11: Putting it All Together: Anatomy of the XBox 360 Game Console Slides originally developed by Milo Martin & Amir Roth at University of Pennsylvania! Computer Architecture

More information

Next Generation Technology from Intel Intel Pentium 4 Processor

Next Generation Technology from Intel Intel Pentium 4 Processor Next Generation Technology from Intel Intel Pentium 4 Processor 1 The Intel Pentium 4 Processor Platform Intel s highest performance processor for desktop PCs Targeted at consumer enthusiasts and business

More information

WLAN TRENDS. Dong Wang. Prof. Dr. Eduard Heindl 05/27/2009. E-Business Technologies

WLAN TRENDS. Dong Wang. Prof. Dr. Eduard Heindl 05/27/2009. E-Business Technologies WLAN TRENDS Dong Wang 232495 Prof. Dr. Eduard Heindl E-Business Technologies 05/27/2009 1 Declaration I, Dong Wang, hereby declare that this paper is my own work and all the related work cites are shown

More information

A Spherical Placement and Migration Scheme for a STT-RAM Based Hybrid Cache in 3D chip Multi-processors

A Spherical Placement and Migration Scheme for a STT-RAM Based Hybrid Cache in 3D chip Multi-processors , July 4-6, 2018, London, U.K. A Spherical Placement and Migration Scheme for a STT-RAM Based Hybrid in 3D chip Multi-processors Lei Wang, Fen Ge, Hao Lu, Ning Wu, Ying Zhang, and Fang Zhou Abstract As

More information

Parallel Computing Platforms

Parallel Computing Platforms Parallel Computing Platforms Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu SSE3054: Multicore Systems, Spring 2017, Jinkyu Jeong (jinkyu@skku.edu)

More information

A Low Power Asynchronous FPGA with Autonomous Fine Grain Power Gating and LEDR Encoding

A Low Power Asynchronous FPGA with Autonomous Fine Grain Power Gating and LEDR Encoding A Low Power Asynchronous FPGA with Autonomous Fine Grain Power Gating and LEDR Encoding N.Rajagopala krishnan, k.sivasuparamanyan, G.Ramadoss Abstract Field Programmable Gate Arrays (FPGAs) are widely

More information

SIMD Instructions outside and inside Oracle 12c. Laurent Léturgez 2016

SIMD Instructions outside and inside Oracle 12c. Laurent Léturgez 2016 SIMD Instructions outside and inside Oracle 2c Laurent Léturgez 206 Whoami Oracle Consultant since 200 Former developer (C, Java, perl, PL/SQL) Owner@Premiseo: Data Management on Premise and in the Cloud

More information

Lowering the Error Floors of Irregular High-Rate LDPC Codes by Graph Conditioning

Lowering the Error Floors of Irregular High-Rate LDPC Codes by Graph Conditioning Lowering the Error Floors of Irregular High- LDPC Codes by Graph Conditioning Wen-Yen Weng, Aditya Ramamoorthy and Richard D. Wesel Electrical Engineering Department, UCLA, Los Angeles, CA, 90095-594.

More information

Lecture 1: Introduction

Lecture 1: Introduction Contemporary Computer Architecture Instruction set architecture Lecture 1: Introduction CprE 581 Computer Systems Architecture, Fall 2016 Reading: Textbook, Ch. 1.1-1.7 Microarchitecture; examples: Pipeline

More information