Energy-efficient Reconfigurable FEC Processor for Multi-standard Wireless Communication Systems

JOURNAL OF SEMICONDUCTOR TECHNOLOGY AND SCIENCE, VOL.17, NO.3, JUNE, 2017

Meng Li 1, Liesbet Van der Perre 2, Wim van Thillo 1, and Youngjoo Lee 3,*

Abstract: In this paper, we describe hardware/software (HW/SW) co-optimizations for reconfigurable application-specific instruction-set processors (ASIPs). Based on our previous very long instruction word (VLIW) ASIP, the proposed framework realizes various forward error-correction (FEC) algorithms for wireless communication systems. To enhance the energy efficiency, we introduce several new design methodologies, including high-radix algorithms, task-level out-of-order execution, and intensive resource allocation with loop-level rescheduling. A case study on radix-4 turbo decoding shows that the proposed techniques improve the energy efficiency by 3.7 times compared to the previous architecture.

Index Terms: Digital integrated circuits, error correction codes, programmable circuits, wireless communication

Manuscript received Mar. 21, 2016; accepted Dec. 14.
1 Interuniversity Microelectronics Center (IMEC) vzw, 3001 Leuven, Belgium
2 Department of Electrical Engineering, KU Leuven, 3001 Leuven, Belgium
3 Department of Electrical Engineering, POSTECH, 37673, Pohang, Korea
E-mail: yjlee.ims@gmail.com

I. INTRODUCTION

Over the last decades, numerous communication standards have been developed to improve the connectivity of mobile devices. Recent specifications must satisfy severe demands on data rate, reliability, and bandwidth efficiency. To increase data integrity, iterative forward error-correction (FEC) codes have been widely adopted because of their powerful error-correcting capabilities [1-3].
Because FEC standards differ in their parameters, it is challenging to design highly optimized decoder ASICs while meeting tight time-to-market (TTM) requirements [4-6]. Previous processor-based solutions provide the flexibility needed to reduce the TTM, but they normally use far more hardware resources than fixed ASICs, resulting in power-hungry implementations [7-11]. To provide a flexible decoder architecture with acceptable energy efficiency, this paper presents novel design frameworks for FEC application-specific instruction-set processors (ASIPs). In contrast to previous multi-standard approaches that develop unified hardware resources shared among different FEC specifications [12, 13], the proposed design procedures co-optimize the hardware architecture and the software kernels. More precisely, we propose novel methodologies at the algorithm, architecture, and firmware levels based on our previous flexible ASIP [8].

In the proposed method, we first investigate hardware-friendly FEC decoding algorithms. By relieving severe congestion on the register files (RFs), the proposed high-level instructions allow task-level out-of-order execution, which reduces the number of operating cycles. Because long-latency memory requests occur inside loops, the proposed loop-level rescheduling further enhances the decoding throughput by reordering instructions to eliminate waiting cycles. To show the impact of the proposed design methods, an optimized radix-4 LTE turbo decoder on the FlexFec is implemented as a prototype in a 40 nm low-power (LP) CMOS process. Compared to the previous non-optimized architecture, the prototype improves the area and energy efficiencies by 3.3 and 3.7 times, respectively.

The rest of this paper is organized as follows. Section II describes the background of this work. Section III presents our design frameworks. A case study on the radix-4 turbo decoder is described and compared to previous works in Section IV. Conclusions are drawn in Section V.

II. BACKGROUND

In this section, we describe our previous FlexFec platform, whose resource configuration can be changed at design time [8]. Fig. 1 shows a block diagram of the FlexFec, including multiple processing units associated with two RFs, a multi-step crossbar network (Xbar) controlled by the flexible address generation unit (AGU), and high-speed host interfaces. The decoding process is performed by the flipr core, an energy-efficient VLIW processor connected to a number of on-chip SRAMs, which are drawn as gray blocks. Note that the operation is programmed by loading the proper kernels into the program memory (PM), the data memory (DM), and the AGU. The received codeword is first moved into the background memories, denoted as BM1 and BM2, which realize a double-buffering scheme. In the flipr core, one scalar and five 96-way vector operations are processed simultaneously. Two on-chip memories, VM1 and VM2, are reserved for storing intermediate data and are accessed by single-cycle instructions. However, the BMs cannot be accessed in a single cycle, because the multi-step Xbar, which is inevitable in decoding iterative FEC codes, lies on their access path [4-8]. Based on its generalized instruction-set architecture (ISA), the FlexFec can support arbitrary LDPC, turbo, and Viterbi codes, which are the most popular FEC codes.

[Fig. 1. The reconfigurable FlexFec ASIP architecture [8]: flipr core (scalar fetch, scalar RF, scalar ALU, RF, ALU1, ALU2, VM1, VM2), reconfigurable AGU, Xbar, background memories BM1 and BM2, PM, DM, and host interface.]
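The double-buffering between BM1 and BM2 can be pictured as a ping-pong schedule: while the flipr core decodes the codeword held in one background memory, the host interface fills the other, and the roles swap every frame. The Python sketch below is a minimal illustration under stated assumptions, not the FlexFec control logic: `ping_pong_schedule` and its event tuples are hypothetical, one frame occupies one BM, and loading and decoding a frame each take one time step. It does not model the Xbar or the AGU.

```python
from typing import List, Tuple

# Hypothetical event record: (time_step, actor, frame_index, buffer_name)
Event = Tuple[int, str, int, str]

BUFFERS = ("BM1", "BM2")  # the two background memories named in the text


def ping_pong_schedule(n_frames: int) -> List[Event]:
    """Toy double-buffering timeline; assumes one time step per load and per decode.

    At step t the host writes frame t into BUFFERS[t % 2] while the core decodes
    frame t - 1 from the other buffer, so loading and decoding overlap in time.
    """
    events: List[Event] = []
    for t in range(n_frames + 1):
        if t < n_frames:
            events.append((t, "host_load", t, BUFFERS[t % 2]))
        if t >= 1:
            events.append((t, "core_decode", t - 1, BUFFERS[(t - 1) % 2]))
    return events


if __name__ == "__main__":
    schedule = ping_pong_schedule(3)
    for event in schedule:
        print(event)
    # 3 frames finish in 4 steps instead of the 6 needed without overlap
    print("steps:", max(e[0] for e in schedule) + 1)
```

With n frames the overlapped schedule needs n + 1 steps instead of 2n. The real gain depends on the actual load and decode times, which this sketch deliberately ignores.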
However, the throughput of the FlexFec cannot be increased simply by introducing more parallel operations, because its software kernels are written with single-cycle vector instructions that cause severe write congestion on the vector RF. Moreover, when the BM is frequently accessed through the Xbar, many waiting cycles are spent before the following data-dependent instructions can proceed. Hence, the decoding throughput of the previous ASIP is limited by a large portion of no-operation (NOP) instructions. To reach high throughput, ASIP-based solutions generally use multiple cores, which increases the decoding energy significantly [9-11]. As future wireless systems strongly demand high-throughput, energy-efficient flexible decoders, it is urgent to develop an advanced design framework that enhances the decoding throughput without increasing the energy consumption of each ASIP core.

III. PROPOSED OPTIMIZATIONS

Before defining the FEC ASIP architecture, it is necessary to select proper high-throughput algorithms that can be realized with simple hardware resources. Numerous studies have proposed attractive solutions for multi-standard FEC decoders [6-8, 12, 13]. Simplified FEC algorithms such as min-sum LDPC decoding and max-log-MAP turbo decoding are actively used, as both are built on similar max (or min) operations [4, 5]. Parallel structures for layered LDPC decoding and sliding-window turbo decoding have been combined into a flexible structure that shares the same SRAM buffers [6]. In addition, as in dedicated ASIC-based decoders, high-radix decoding algorithms have been adopted in recent ASIP-based flexible FEC decoders [10, 13]. Although high-radix algorithms are effective in reducing the size of on-chip memories, the throughput of each ASIP core cannot be enhanced drastically, owing to the increased number of cycles needed for the more complex computations.

To increase the decoding throughput of the flexible ASIP, we present several software-level optimizations that shorten the processing time of high-radix decoding algorithms.
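The observation that max-log-MAP turbo decoding is built on max operations, and that a high-radix algorithm merges consecutive trellis stages, can be illustrated with a small forward (alpha) recursion. The sketch assumes a hypothetical 4-state binary trellis with random integer branch metrics rather than the 8-state LTE trellis, and all function names are illustrative. A radix-2 step performs one add-compare-select (ACS) per trellis stage, whereas the radix-4 step first merges two stages of branch metrics and then performs a single 4-way ACS per state, giving the same state metrics with half as many recursion steps.

```python
import random
from typing import List

NEG_INF = float("-inf")  # marks a missing trellis branch
N_STATES = 4             # hypothetical 4-state shift-register trellis (memory 2)

Matrix = List[List[float]]


def next_state(state: int, bit: int) -> int:
    """Shift the input bit into a 2-bit state register."""
    return ((state << 1) | bit) % N_STATES


def random_branch_metrics(rng: random.Random) -> Matrix:
    """gamma[p][s]: integer branch metric of p -> s, or -inf if no branch exists."""
    gamma = [[NEG_INF] * N_STATES for _ in range(N_STATES)]
    for p in range(N_STATES):
        for bit in (0, 1):
            gamma[p][next_state(p, bit)] = rng.randint(-8, 8)
    return gamma


def acs(alpha: List[float], gamma: Matrix) -> List[float]:
    """Add-compare-select: alpha'[s] = max over p of (alpha[p] + gamma[p][s])."""
    return [max(alpha[p] + gamma[p][s] for p in range(N_STATES))
            for s in range(N_STATES)]


def merge_two_stages(g1: Matrix, g2: Matrix) -> Matrix:
    """Radix-4 branch metric p -> s over two stages (max-plus matrix product).

    With memory 2, exactly one intermediate state q links p and s, so the max
    reduces to the sum of the two radix-2 branch metrics.
    """
    return [[max(g1[p][q] + g2[q][s] for q in range(N_STATES))
             for s in range(N_STATES)] for p in range(N_STATES)]


def forward_radix2(alpha: List[float], gammas: List[Matrix]) -> List[float]:
    for gamma in gammas:  # one trellis stage per ACS step
        alpha = acs(alpha, gamma)
    return alpha


def forward_radix4(alpha: List[float], gammas: List[Matrix]) -> List[float]:
    assert len(gammas) % 2 == 0, "radix-4 consumes stages in pairs"
    for t in range(0, len(gammas), 2):  # two trellis stages per ACS step
        alpha = acs(alpha, merge_two_stages(gammas[t], gammas[t + 1]))
    return alpha


if __name__ == "__main__":
    rng = random.Random(1)
    gammas = [random_branch_metrics(rng) for _ in range(6)]
    alpha0 = [0, NEG_INF, NEG_INF, NEG_INF]  # encoder starts in state 0
    r2 = forward_radix2(alpha0, gammas)
    r4 = forward_radix4(alpha0, gammas)
    # the two recursions agree; radix-4 needs 3 ACS steps instead of 6
    print(r2, r4, r2 == r4)
```

Integer metrics keep the comparison exact; with floating-point metrics the two addition orders could differ in the last bit. The merged branch metrics are what a radix-4 ACS consumes, so the recursion takes fewer steps at the cost of wider compare-select operations.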

1. Task-level Out-of-order Execution

Conventionally, the ISA of a flexible ASIP contains simple instructions for basic vector operations, each completed in one cycle. As illustrated in Fig. 2(a), let several tasks be issued serially to an ALU. In the figure, the shaded circle denoted as Tx represents a single-cycle instruction of the x-th task that produces a write request on the vector RF. Dependencies between two instructions of the same task are denoted as dotted arrows, and dependencies between two tasks are represented as solid arrows. Because a task consists of mutually dependent single-cycle instructions, it generates continuous write accesses. Even if tasks are multi-issued to additional ALUs, the overall computing time is still bounded by the limited RF bandwidth for reading operands and storing intermediate results.

[Fig. 2. The processing sequences having data dependencies based on (a) the conventional in-order execution, (b) the proposed task-level out-of-order execution.]

To relieve this severe write congestion, we define a new ISA with dedicated high-level instructions. Conceptually, a high-level instruction is a multi-cycle instruction composed of several arithmetic vector operations. Because high-level instructions reduce the number of write requests, the demands on the RFs are naturally alleviated. Therefore, other RF-writing instructions can be allocated to the non-RF-writing cycles of a high-level instruction. Due to the serialized dependencies inside a task, however, it is hard to find instructions within the same task that can fill these non-RF-writing cycles. In our work, task-level out-of-order execution is proposed to issue, in parallel, subsequent tasks that are independent of the current task. As depicted in Fig. 2(b), for example, the first task includes a high-level instruction whose internal cycles are represented as squares, where only the shaded square makes a request on the RF. The instructions of the next independent task, T2, can then be performed earlier on the second ALU, accessing the RF without any congestion. As a result, the task-level out-of-order execution shortens the processing cycles.

2. Multi-level ALU Architecture

In general, previous pipelined ASIPs process all instructions sequentially using a generalized data-path [7]. Although the generalized data-path provides maximum flexibility, it requires many operating cycles due to in-order processing. To support the proposed task-level out-of-order execution effectively, we introduce the n-level ALU shown in Fig. 3, where each level performs a predefined vector operation in its processing unit (PUx). A vector instruction is always issued to the first level, PU1. In every cycle, each PU passes its instruction, together with the corresponding intermediate results, to the next PU until the instruction reaches its last cycle as defined by the ISA. A wide multiplexer selects the level whose instruction has completed, so that the vector RF is accessed once per cycle. Note that RF-writing paths are not required at every level. Depending on the new high-level instructions, the workload of each PU has to be carefully distributed to restrict the number of RF-writing paths while preserving the original critical-path delay as much as possible. To reduce the complexity further, the first level of the ALU can be merged with the previous ALU that performs simple single-cycle operations.

[Fig. 3. The proposed multi-level vector ALU architecture: an issued instruction enters PU1 (level 1) and proceeds through PU2 to PUn (levels 2 to n); an RF multiplexor selects the level that accesses the RF.]

Therefore, instead of using an additional ALU for parallel processing as in Fig. 2(b), the out-of-order execution can be implemented naturally in a single multi-level ALU, as shown in Fig. 4. While the multi-cycle instruction advances through the levels, the independent second task can be issued to the first PU of the proposed ALU. In summary, the multi-level architecture supports the proposed task-level out-of-order execution and significantly reduces both the number of processing cycles and the complexity. In addition, because adding ALUs to a VLIW architecture enlarges the PM, the proposed ALU, which avoids such extra ALUs, also reduces the memory of the ASIP.

[Fig. 4. The task-level out-of-order execution using the proposed multi-level vector ALU.]

3. Loop-level Rescheduling

In multi-standard FEC solutions, a flexible multi-step crossbar is normally used to support various interleaving patterns [7, 8, 11]. Hence, previous software suffers from the long latency of reading a codeword; for example, the previous FlexFec takes 5 cycles to access the BM [8]. In the proposed framework, we exploit the fact that the iterative decoding process normally contains numerous loops that perform identical tasks at each bit position. Fig. 5(a) illustrates the conventional processing of such a loop with long-latency load instructions. For simplicity, the single-cycle arithmetic vector instructions are denoted as Ax, without grouping them into tasks.
The 5-cycle load operations are denoted as triangular nodes, where the shaded node accesses the vector RF to store the codeword that has been read out. As in the previous figures, the dotted arrows show the data dependencies inside the loop. Note that the following operations must wait for the load instruction to complete because of the dependency, even though a dedicated unit performs the memory accesses in parallel. If there are multiple loads in the loop, the idle waiting time increases significantly, causing severe throughput degradation. In the proposed method, we reorganize the processing order inside the loop, as shown in Fig. 5(b). With the proposed rescheduling, the load operation reads the memory for the next iteration of the loop, so the arithmetic operations of the current iteration no longer wait for time-consuming memory accesses. More precisely, the instruction denoted as A4 becomes the first operation of the loop, and the load required by A4 is performed at the end of the loop to prepare the next iteration. As the proposed rescheduling is conceptually similar to the software pipelining technique [14], some additional cycles are needed for the prologue of the first iteration and the epilogue of the last iteration. By eliminating the waiting time, the number of processing cycles per loop iteration is reduced from 15 to 10, as exemplified in Fig. 5(b). In other words, the proposed rescheduling maximizes hardware utilization and remarkably reduces the processing time, yielding an energy-efficient FEC ASIP solution.

IV. IMPLEMENTATION RESULTS

To improve the throughput as well as the energy efficiency, we design a prototype ASIP-based flexible FEC decoder using the proposed design methods. In this section, radix-4 LTE turbo decoding on our ASIP is detailed as a case study.
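Before turning to the case study, the cycle savings of the task-level out-of-order execution and the loop-level rescheduling from Section III can be illustrated with a toy cycle model. This is a minimal sketch under stated assumptions, not the FlexFec pipeline: `schedule_tasks`, `loop_conventional`, and `loop_rescheduled` are hypothetical names; every instruction writes the vector RF once, in its final cycle, through a single write port; tasks are mutually independent; and loads run on a separate unit, so they never occupy the ALU. The loop functions are a plain software-pipelining model with 10 hypothetical single-cycle operations per iteration.

```python
from typing import List, Optional


def schedule_tasks(tasks: List[List[int]], lanes: int) -> int:
    """Toy cycle model of task issue with a single vector-RF write port.

    tasks: independent tasks, each a chain of dependent instructions given by
           latency in cycles (1 = basic vector op, >1 = high-level instruction).
    lanes: instructions that may be in flight at once (1 = in-order issue,
           2 = a second ALU or a second ALU level).
    """
    queues = [list(t) for t in tasks]
    running: List[Optional[List[int]]] = [None] * lanes  # [task_id, cycles_left]
    cycle = 0
    while any(queues) or any(op is not None for op in running):
        busy = {op[0] for op in running if op is not None}
        for lane in range(lanes):  # issue: lowest-numbered idle task first
            if running[lane] is None:
                for tid, queue in enumerate(queues):
                    if queue and tid not in busy:
                        running[lane] = [tid, queue.pop(0)]
                        busy.add(tid)
                        break
        port_free = True
        for lane in range(lanes):  # advance every in-flight instruction by one cycle
            op = running[lane]
            if op is None:
                continue
            if op[1] > 1:
                op[1] -= 1  # internal cycle with no RF write
            elif port_free:
                port_free = False  # final cycle wins the RF write port
                running[lane] = None
            # otherwise the write port is taken and the instruction stalls
        cycle += 1
    return cycle


def loop_conventional(iterations: int, n_ops: int, load_latency: int = 5) -> int:
    """Each iteration issues its load, then the ALU idles until the data arrive."""
    return iterations * (load_latency + n_ops)


def loop_rescheduled(iterations: int, n_ops: int, load_latency: int = 5) -> int:
    """Software-pipelined loop: iteration i prefetches the data of iteration i + 1."""
    data_ready = load_latency  # prologue: the first load is fully exposed
    cycle = 0
    for i in range(iterations):
        start = max(cycle, data_ready)  # stall only if the prefetch is late
        if i + 1 < iterations:
            data_ready = start + load_latency  # issued on the dedicated load unit
        cycle = start + n_ops
    return cycle


if __name__ == "__main__":
    basic = [[1] * 4, [1] * 4]      # two tasks of single-cycle instructions
    high_level = [[4, 4], [1] * 4]  # task 0 uses two 4-cycle high-level instructions
    print(schedule_tasks(basic, 1), schedule_tasks(basic, 2))            # 8 8
    print(schedule_tasks(high_level, 1), schedule_tasks(high_level, 2))  # 12 8
    print(loop_conventional(4, 10), loop_rescheduled(4, 10))             # 60 45
```

With only single-cycle instructions the second lane brings no gain, because every cycle already needs the single RF write port. Once task 0 uses high-level instructions, task 1 fills their non-writing cycles and the total drops from 12 to 8 cycles. The loop model gives 15 versus 10 cycles per steady-state iteration, but it does not reproduce the exact prologue and epilogue of Fig. 5(b).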
Based on the same design concepts, radix-4 LDPC and Viterbi decoders are also implemented on the prototype ASIP.

For the algorithm-level improvement of the turbo decoder, we first select the radix-4 decoding algorithm, which has been adopted in recent decoder ASICs [4, 6]. At this level, only the firmware is re-designed for radix-4 processing, without changing the original ASIP architecture. To shorten the decoding time further, we split the firmware into several tasks and define high-level instructions in four categories, i.e., branch metric calculation, forward recursion, backward recursion, and output generation [4]. In this case study, as shown in Table 1, 11 new high-level instructions are introduced, with processing latencies of up to 5 cycles. Note that all the high-level instructions are multi-cycle instructions. Therefore, as described in the previous section, the multi-level ALU supporting the task-level out-of-order execution can relax the intensive write requests on the vector RF, leading to energy-efficient decoding operations.

[Fig. 5. (a) The conventional loop processing with long-latency loads (15 cycles per loop iteration), (b) the processing sequence using the proposed loop-level rescheduling (10-cycle prologue, 10 cycles per loop iteration, and 5-cycle epilogue).]

[Table 1. High-level instructions for radix-4 turbo decoding, organized by category (branch metric, forward recursion, backward recursion, output generation) and processing latency (2, 3, 4, and 5 cycles). The instructions include metric calculation, codeword collection, reliability generation, ACS tree, activation, extrinsic value calculation, and LLR calculation. ACS: addition, comparison and selection; LLR: log-likelihood ratio.]

Fig. 6 illustrates the processing steps of the backward recursion in the turbo decoding algorithm based on the proposed high-level instructions. Compared to the conventional serialized operations, the proposed task-level out-of-order technique reduces the latency of the backward recursion by 20%.

[Fig. 6. Cycle reductions and throughput improvements: number of processing cycles and decoding throughput (Mb/s) without optimizations and after applying the radix-4 decoding algorithm, task-level out-of-order execution, and loop-level optimization.]

Due to the inherent iterations of the turbo decoding process, loop-level rescheduling is additionally applied to the forward and backward recursions, which involve long-latency load operations, as depicted in Fig. 6. By breaking the data dependencies inside the loop, a number of instructions can be processed in parallel with the loads, minimizing the overall processing cycles.

Fig. 7 depicts how the proposed optimizations reduce the number of cycles for decoding a 6144-bit LTE turbo code on the prototype ASIPs, with the maximum number of turbo iterations set to six in all cases. By reducing the processing cycles at each design step, the proposed schemes reduce the total number of required cycles by a factor of 6.35 compared to radix-2 turbo decoding on the previous FlexFec ASIP [8]. To investigate the impact of the proposed work, all the architectures are designed to operate at 450 MHz in a 40 nm CMOS process. As shown in Fig. 7, the throughput increases gradually as the proposed schemes are applied. Note that the fully optimized ASIP-based turbo decoder achieves a throughput of 315 Mb/s, which can support LTE systems up to category 5 [1].

[Fig. 7. Backward recursion example in radix-4 turbo decoding on the proposed flexible processor.]

Fig. 8 shows how the proposed design methods improve the area and energy efficiencies of the turbo decoders, where the efficiencies are defined as follows (lower values are better):

$$\text{Area efficiency } (\mu\mathrm{m}^2\cdot\mathrm{s/b}) = \frac{\text{Area } (\mathrm{mm}^2)}{\text{Throughput } (\mathrm{Mb/s})} \tag{1}$$

$$\text{Energy efficiency } (\mathrm{nJ/b}) = \frac{\text{Decoding power } (\mathrm{mW})}{\text{Throughput } (\mathrm{Mb/s})} \tag{2}$$

As shown in Fig. 8, applying the radix-4 algorithm makes the decoder more cost-effective in terms of area and energy consumption. As the straightforward radix-4 firmware cannot meet the high-throughput demands, as shown in Fig. 7, the task-level out-of-order execution enhances the throughput by utilizing hardware resources in parallel. The multi-level ALU compensates for the area overhead by reducing the PM size, and the loop-level optimization finally improves the energy efficiency by changing the order of operations in a loop. As a result, the proposed work reduces the area-efficiency and energy-efficiency figures for LTE turbo decoding by 3.3 and 3.7 times, respectively.

[Fig. 8. Area and energy efficiencies: area efficiency (μm²·s/b) and energy efficiency (nJ/b) without optimizations and after applying the radix-4 decoding algorithm, task-level out-of-order execution, multi-level ALU architecture, and loop-level rescheduling.]

The implementation results of the prototype FEC ASIP are summarized and compared to previous works in Table 2. For a fair comparison, we normalize all the efficiencies to 40 nm CMOS with a reference supply voltage of 0.9 V. In addition, the maximum numbers of iterations for turbo and LDPC decoding are set to six and ten, respectively. Note that the proposed ASIP can support arbitrary LDPC, turbo, and Viterbi codes.

[Table 2. Implementation results of ASIP-based FEC decoders: process (nm), voltage (V), area (mm²), frequency (MHz), supported standards (LTE, WiFi, WiMAX), and, for turbo, LDPC, and Viterbi decoding, the throughput (Mb/s), normalized area efficiency (μm²·s/b), and normalized energy efficiency (nJ/b); compared designs: this work, [7], [8], [9], and [10].]

$$\text{Norm. area eff.} = \left(\frac{40}{\text{Process}}\right)^{2} \times \text{Area efficiency}$$

$$\text{Norm. energy eff.} = \left(\frac{40}{\text{Process}}\right)\left(\frac{0.9}{\text{Voltage}}\right)^{2} \times \text{Energy efficiency}$$

Owing to the proposed optimizations, the prototype ASIP offers an attractive multi-standard FEC decoder. For LTE turbo decoding, for example, the proposed ASIP achieves the highest decoding throughput among existing ASIP-based turbo decoders while providing similar area and energy efficiencies.

V. CONCLUSION

In this paper, we have presented several design schemes to enhance the energy efficiency of a multi-standard FEC ASIP. By introducing advanced methods at the algorithm, firmware, and hardware-structure levels, the proposed work reduces the number of processing cycles. The case study on radix-4 turbo decoding shows that the proposed framework achieves sufficient decoding throughput for recent wireless systems while remarkably lowering the area-efficiency and energy-efficiency figures (μm²·s/b and nJ/b).

ACKNOWLEDGMENTS

This work was supported by the National Research Foundation (NRF) of Korea grants funded by the Korea government (MSIP) (2016R1C1B ).

REFERENCES

[1] Multiplexing and Channel Coding, 3GPP TS, Rev., Jun.
[2] IEEE Standard for Local and Metropolitan Area Networks, Part 16: Air Interface for Fixed Broadband Wireless Access Systems, IEEE Std 802.16e-2005.
[3] IEEE Standard for Information Technology, Telecommunications and Information Exchange between Local and Metropolitan Area Networks, Specific Requirements, Part 11: Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specifications, IEEE Std 802.11n-2009.
[4] W. Byun, H. Kim, and J.-H. Kim, "High throughput radix-4 SISO decoding architecture with reduced memory requirement," J. Semicond. Technol. Sci., vol. 14, no. 4, Aug. 2014.
[5] Y.-M. Jung, C.-H. Chung, Y.-H. Jung, and J.-S. Kim, "7.7 Gbps encoder design for IEEE 802.11ac QC-LDPC codes," J. Semicond. Technol. Sci., vol. 14, no. 4, Aug. 2014.
[6] C. Condo, M. Martina, and G. Masera, "VLSI implementation of a multi-mode turbo/LDPC decoder architecture," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 60, no. 6, June.
[7] Z. Wu and D. Liu, "Flexible multistandard FEC processor design with ASIP methodology," in Proc. IEEE Int. Conf. Application-specific Systems, Architectures and Processors (ASAP), 2014.
[8] F. Naessens et al., "A … mm² … mW reconfigurable LDPC and turbo encoder and decoder for 802.11n, 802.16e and 3GPP-LTE," in Proc. IEEE Symp. VLSI Circuits, 2010.
[9] B. Noethen et al., "A 105GOPS 36mm² heterogeneous SDR MPSoC with energy-aware dynamic scheduling and iterative detection-decoding for 4G in 65nm CMOS," in Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC), 2014.
[10] P. Murugappa, R. Al-Khayat, A. Baghdadi, and M. Jezequel, "A flexible high throughput multi-ASIP architecture for LDPC and turbo decoding," in Proc. Design, Automation and Test in Europe Conf. Exhib. (DATE), 2012.
[11] Z. Wu, D. Liu, Z. Yang, Q. Wang, and W. Zhou, "FPGA implementation of a multi-algorithm parallel FEC for SDR platforms," in Proc. IEEE Int. Conf. Field Programmable Logic and Applications (FPL), 2014.
[12] S. Kunze, E. Matus, G. Fettweis, and T. Kobori, "Combining LDPC, turbo and Viterbi decoders: Benefits and cost," in Proc. Int. Workshop on Signal Process. Syst. (SiPS), 2011.
[13] J. Dion, M. Hamon, P. Penard, M. Arzel, and M. Jezequel, "Multi-standard trellis-based FEC decoder," in Proc. IEEE Conf. Design and Architectures for Signal and Image Processing (DASIP), 2012.
[14] J. L. Hennessy and D. A. Patterson, Computer Architecture: A Quantitative Approach. San Mateo, CA, USA: Morgan Kaufmann, 2011.

Meng Li received the Ph.D. degree in electrical engineering from Telecom Bretagne, France. She joined the Green Radio Group of IMEC in 2012, where she is now a senior research engineer. Her research interests cover high-speed and low-power digital circuit design for essential components in wireless baseband, especially the design of decoders for error-control codes.

Liesbet Van der Perre received the M.Sc. degree in electrical engineering from KU Leuven, Belgium. The research for her thesis was completed at the Ecole Nationale Superieure de Telecommunications in Paris. She graduated summa cum laude with a Ph.D. degree in electrical engineering from the same university. After finishing her Ph.D. on propagation modeling at the Telecommunications group of KU Leuven, Belgium, Dr. Van der Perre joined the wireless group of IMEC in 1997. She took up responsibilities as system architect, project leader, program manager, and scientific and program director, with a focus on energy efficiency in broadband communication. Currently, she is a professor in the Electrical Engineering Department of KU Leuven. She is an author or co-author of over 300 scientific publications. She was appointed honorary doctor at the Faculty of Engineering LTH, Lund University.

Wim Van Thillo received his master's degree in electrical engineering and his undergraduate degree in business economics from the Katholieke Universiteit Leuven, Belgium. He obtained a Ph.D. degree from the same university based on his research in IMEC's wireless communications group. In 2008, he was a visiting researcher at UC Berkeley's Connectivity Lab. From 2012 to 2014, he led IMEC's 79 GHz radar research program. Since January 2015, he has been responsible for IMEC's R&D in cellular and WiFi transceivers, 60 GHz communications, 79 GHz radar, and 140 GHz sensors.

Youngjoo Lee received the B.S., M.S., and Ph.D. degrees in electrical engineering from the Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Korea, in 2008, 2010, and 2014, respectively. Since February 2017, he has been an Assistant Professor with the Department of Electrical Engineering, POSTECH, Pohang, Korea. Prior to joining POSTECH, he was with the Interuniversity Microelectronics Center (IMEC), Leuven, Belgium, from May 2014 to February 2015, where he researched reconfigurable SoC platforms for software-defined radio systems. From March 2015 to February 2017, he was with the faculty of the Department of Electronic Engineering, Kwangwoon University, Seoul, Korea. His current research interests include algorithms and architectures for embedded processors, intelligent transportation systems, advanced error-correction codes, and mixed-signal circuit design.


More information

EFFICIENT RECURSIVE IMPLEMENTATION OF A QUADRATIC PERMUTATION POLYNOMIAL INTERLEAVER FOR LONG TERM EVOLUTION SYSTEMS

EFFICIENT RECURSIVE IMPLEMENTATION OF A QUADRATIC PERMUTATION POLYNOMIAL INTERLEAVER FOR LONG TERM EVOLUTION SYSTEMS Rev. Roum. Sci. Techn. Électrotechn. et Énerg. Vol. 61, 1, pp. 53 57, Bucarest, 016 Électronique et transmission de l information EFFICIENT RECURSIVE IMPLEMENTATION OF A QUADRATIC PERMUTATION POLYNOMIAL

More information

Gated-Demultiplexer Tree Buffer for Low Power Using Clock Tree Based Gated Driver

Gated-Demultiplexer Tree Buffer for Low Power Using Clock Tree Based Gated Driver Gated-Demultiplexer Tree Buffer for Low Power Using Clock Tree Based Gated Driver E.Kanniga 1, N. Imocha Singh 2,K.Selva Rama Rathnam 3 Professor Department of Electronics and Telecommunication, Bharath

More information

Design of Low-Power and Low-Latency 256-Radix Crossbar Switch Using Hyper-X Network Topology

Design of Low-Power and Low-Latency 256-Radix Crossbar Switch Using Hyper-X Network Topology JOURNAL OF SEMICONDUCTOR TECHNOLOGY AND SCIENCE, VOL.15, NO.1, FEBRUARY, 2015 http://dx.doi.org/10.5573/jsts.2015.15.1.077 Design of Low-Power and Low-Latency 256-Radix Crossbar Switch Using Hyper-X Network

More information

FAST FOURIER TRANSFORM (FFT) and inverse fast

FAST FOURIER TRANSFORM (FFT) and inverse fast IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 39, NO. 11, NOVEMBER 2004 2005 A Dynamic Scaling FFT Processor for DVB-T Applications Yu-Wei Lin, Hsuan-Yu Liu, and Chen-Yi Lee Abstract This paper presents an

More information

A New List Decoding Algorithm for Short-Length TBCCs With CRC

A New List Decoding Algorithm for Short-Length TBCCs With CRC Received May 15, 2018, accepted June 11, 2018, date of publication June 14, 2018, date of current version July 12, 2018. Digital Object Identifier 10.1109/ACCESS.2018.2847348 A New List Decoding Algorithm

More information

Low Complexity Architecture for Max* Operator of Log-MAP Turbo Decoder

Low Complexity Architecture for Max* Operator of Log-MAP Turbo Decoder International Journal of Current Engineering and Technology E-ISSN 2277 4106, P-ISSN 2347 5161 2015 INPRESSCO, All Rights Reserved Available at http://inpressco.com/category/ijcet Research Article Low

More information

A VLSI Architecture for H.264/AVC Variable Block Size Motion Estimation

A VLSI Architecture for H.264/AVC Variable Block Size Motion Estimation Journal of Automation and Control Engineering Vol. 3, No. 1, February 20 A VLSI Architecture for H.264/AVC Variable Block Size Motion Estimation Dam. Minh Tung and Tran. Le Thang Dong Center of Electrical

More information

Twiddle Factor Transformation for Pipelined FFT Processing

Twiddle Factor Transformation for Pipelined FFT Processing Twiddle Factor Transformation for Pipelined FFT Processing In-Cheol Park, WonHee Son, and Ji-Hoon Kim School of EECS, Korea Advanced Institute of Science and Technology, Daejeon, Korea icpark@ee.kaist.ac.kr,

More information

Real-time and smooth scalable video streaming system with bitstream extractor intellectual property implementation

Real-time and smooth scalable video streaming system with bitstream extractor intellectual property implementation LETTER IEICE Electronics Express, Vol.11, No.5, 1 6 Real-time and smooth scalable video streaming system with bitstream extractor intellectual property implementation Liang-Hung Wang 1a), Yi-Mao Hsiao

More information

A CORDIC Algorithm with Improved Rotation Strategy for Embedded Applications

A CORDIC Algorithm with Improved Rotation Strategy for Embedded Applications A CORDIC Algorithm with Improved Rotation Strategy for Embedded Applications Kui-Ting Chen Research Center of Information, Production and Systems, Waseda University, Fukuoka, Japan Email: nore@aoni.waseda.jp

More information

The Serial Commutator FFT

The Serial Commutator FFT The Serial Commutator FFT Mario Garrido Gálvez, Shen-Jui Huang, Sau-Gee Chen and Oscar Gustafsson Journal Article N.B.: When citing this work, cite the original article. 2016 IEEE. Personal use of this

More information

High Performance Memory Read Using Cross-Coupled Pull-up Circuitry

High Performance Memory Read Using Cross-Coupled Pull-up Circuitry High Performance Memory Read Using Cross-Coupled Pull-up Circuitry Katie Blomster and José G. Delgado-Frias School of Electrical Engineering and Computer Science Washington State University Pullman, WA

More information

An Area-Efficient BIRA With 1-D Spare Segments

An Area-Efficient BIRA With 1-D Spare Segments 206 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 26, NO. 1, JANUARY 2018 An Area-Efficient BIRA With 1-D Spare Segments Donghyun Kim, Hayoung Lee, and Sungho Kang Abstract The

More information

A Reconfigurable Crossbar Switch with Adaptive Bandwidth Control for Networks-on

A Reconfigurable Crossbar Switch with Adaptive Bandwidth Control for Networks-on A Reconfigurable Crossbar Switch with Adaptive Bandwidth Control for Networks-on on-chip Donghyun Kim, Kangmin Lee, Se-joong Lee and Hoi-Jun Yoo Semiconductor System Laboratory, Dept. of EECS, Korea Advanced

More information

LLR-based Successive-Cancellation List Decoder for Polar Codes with Multi-bit Decision

LLR-based Successive-Cancellation List Decoder for Polar Codes with Multi-bit Decision > REPLACE THIS LINE WITH YOUR PAPER IDENTIFICATION NUMBER (DOUBLE-CLIC HERE TO EDIT < LLR-based Successive-Cancellation List Decoder for Polar Codes with Multi-bit Decision Bo Yuan and eshab. Parhi, Fellow,

More information

DUE to the high computational complexity and real-time

DUE to the high computational complexity and real-time IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 15, NO. 3, MARCH 2005 445 A Memory-Efficient Realization of Cyclic Convolution and Its Application to Discrete Cosine Transform Hun-Chen

More information

Programmable Turbo Decoder Supporting Multiple Third-Generation Wireless Standards

Programmable Turbo Decoder Supporting Multiple Third-Generation Wireless Standards Programmable Turbo Decoder Supporting Multiple Third-eneration Wireless Standards Myoung-Cheol Shin and In-Cheol Park Department of Electrical Engineering and Computer Science, KAIST Yuseong-gu Daejeon,

More information

Three DIMENSIONAL-CHIPS

Three DIMENSIONAL-CHIPS IOSR Journal of Electronics and Communication Engineering (IOSR-JECE) ISSN: 2278-2834, ISBN: 2278-8735. Volume 3, Issue 4 (Sep-Oct. 2012), PP 22-27 Three DIMENSIONAL-CHIPS 1 Kumar.Keshamoni, 2 Mr. M. Harikrishna

More information

Delay efficient mux approach for finding the first two minimum values

Delay efficient mux approach for finding the first two minimum values Delay efficient mux approach for finding the first two minimum values Nakka Sivaraju¹ S Suman² ¹PG scholar, ECE Department, CEC, AP, India ²Assistant Professor, ECE Department, CEC, AP, India Abstract

More information

Adding C Programmability to Data Path Design

Adding C Programmability to Data Path Design Adding C Programmability to Data Path Design Gert Goossens Sr. Director R&D, Synopsys May 6, 2015 1 Smart Products Drive SoC Developments Feature-Rich Multi-Sensing Multi-Output Wirelessly Connected Always-On

More information

EFFICIENT PARALLEL MEMORY ORGANIZATION FOR TURBO DECODERS

EFFICIENT PARALLEL MEMORY ORGANIZATION FOR TURBO DECODERS In Proceedings of the European Signal Processing Conference, pages 831-83, Poznan, Poland, September 27. EFFICIENT PARALLEL MEMORY ORGANIZATION FOR TURBO DECODERS Perttu Salmela, Ruirui Gu*, Shuvra S.

More information

INTERNATIONAL JOURNAL OF PROFESSIONAL ENGINEERING STUDIES Volume 9 /Issue 3 / OCT 2017

INTERNATIONAL JOURNAL OF PROFESSIONAL ENGINEERING STUDIES Volume 9 /Issue 3 / OCT 2017 Design of Low Power Adder in ALU Using Flexible Charge Recycling Dynamic Circuit Pallavi Mamidala 1 K. Anil kumar 2 mamidalapallavi@gmail.com 1 anilkumar10436@gmail.com 2 1 Assistant Professor, Dept of

More information

Baseline V IRAM Trimedia. Cycles ( x 1000 ) N

Baseline V IRAM Trimedia. Cycles ( x 1000 ) N CS 252 COMPUTER ARCHITECTURE MAY 2000 An Investigation of the QR Decomposition Algorithm on Parallel Architectures Vito Dai and Brian Limketkai Abstract This paper presents an implementation of a QR decomposition

More information

A 50Mvertices/s Graphics Processor with Fixed-Point Programmable Vertex Shader for Mobile Applications

A 50Mvertices/s Graphics Processor with Fixed-Point Programmable Vertex Shader for Mobile Applications A 50Mvertices/s Graphics Processor with Fixed-Point Programmable Vertex Shader for Mobile Applications Ju-Ho Sohn, Jeong-Ho Woo, Min-Wuk Lee, Hye-Jung Kim, Ramchan Woo, Hoi-Jun Yoo Semiconductor System

More information

MPSOC 2011 BEAUNE, FRANCE

MPSOC 2011 BEAUNE, FRANCE MPSOC 2011 BEAUNE, FRANCE BOADRES: A SCALABLE BASEBAND PROCESSOR TEMPLATE FOR Gbps RADIOS VICE PRESIDENT, CHAIRMAN OF THE TECHNOLOGY OFFICE PROFESSOR AT THE KATHOLIEKE UNIVERSITEIT LEUVEN STATUS SDR BASEBAND

More information

Efficient Implementation of Single Error Correction and Double Error Detection Code with Check Bit Precomputation

Efficient Implementation of Single Error Correction and Double Error Detection Code with Check Bit Precomputation http://dx.doi.org/10.5573/jsts.2012.12.4.418 JOURNAL OF SEMICONDUCTOR TECHNOLOGY AND SCIENCE, VOL.12, NO.4, DECEMBER, 2012 Efficient Implementation of Single Error Correction and Double Error Detection

More information

Low-Power Adaptive Viterbi Decoder for TCM Using T-Algorithm

Low-Power Adaptive Viterbi Decoder for TCM Using T-Algorithm International Journal of Scientific and Research Publications, Volume 3, Issue 8, August 2013 1 Low-Power Adaptive Viterbi Decoder for TCM Using T-Algorithm MUCHHUMARRI SANTHI LATHA*, Smt. D.LALITHA KUMARI**

More information

Non-recursive complexity reduction encoding scheme for performance enhancement of polar codes

Non-recursive complexity reduction encoding scheme for performance enhancement of polar codes Non-recursive complexity reduction encoding scheme for performance enhancement of polar codes 1 Prakash K M, 2 Dr. G S Sunitha 1 Assistant Professor, Dept. of E&C, Bapuji Institute of Engineering and Technology,

More information

IEEE Proof Web Version

IEEE Proof Web Version IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS 1 From Parallelism Levels to a Multi-ASIP Architecture for Turbo Decoding Olivier Muller, Member, IEEE, Amer Baghdadi, and Michel Jézéquel,

More information

DYNAMIC CIRCUIT TECHNIQUE FOR LOW- POWER MICROPROCESSORS Kuruva Hanumantha Rao 1 (M.tech)

DYNAMIC CIRCUIT TECHNIQUE FOR LOW- POWER MICROPROCESSORS Kuruva Hanumantha Rao 1 (M.tech) DYNAMIC CIRCUIT TECHNIQUE FOR LOW- POWER MICROPROCESSORS Kuruva Hanumantha Rao 1 (M.tech) K.Prasad Babu 2 M.tech (Ph.d) hanumanthurao19@gmail.com 1 kprasadbabuece433@gmail.com 2 1 PG scholar, VLSI, St.JOHNS

More information

A Network Storage LSI Suitable for Home Network

A Network Storage LSI Suitable for Home Network 258 HAN-KYU LIM et al : A NETWORK STORAGE LSI SUITABLE FOR HOME NETWORK A Network Storage LSI Suitable for Home Network Han-Kyu Lim*, Ji-Ho Han**, and Deog-Kyoon Jeong*** Abstract Storage over (SoE) is

More information

AN FFT PROCESSOR BASED ON 16-POINT MODULE

AN FFT PROCESSOR BASED ON 16-POINT MODULE AN FFT PROCESSOR BASED ON 6-POINT MODULE Weidong Li, Mark Vesterbacka and Lars Wanhammar Electronics Systems, Dept. of EE., Linköping University SE-58 8 LINKÖPING, SWEDEN E-mail: {weidongl, markv, larsw}@isy.liu.se,

More information

Quantized Iterative Message Passing Decoders with Low Error Floor for LDPC Codes

Quantized Iterative Message Passing Decoders with Low Error Floor for LDPC Codes Quantized Iterative Message Passing Decoders with Low Error Floor for LDPC Codes Xiaojie Zhang and Paul H. Siegel University of California, San Diego 1. Introduction Low-density parity-check (LDPC) codes

More information

Low Power and Memory Efficient FFT Architecture Using Modified CORDIC Algorithm

Low Power and Memory Efficient FFT Architecture Using Modified CORDIC Algorithm Low Power and Memory Efficient FFT Architecture Using Modified CORDIC Algorithm 1 A.Malashri, 2 C.Paramasivam 1 PG Student, Department of Electronics and Communication K S Rangasamy College Of Technology,

More information

High Speed Special Function Unit for Graphics Processing Unit

High Speed Special Function Unit for Graphics Processing Unit High Speed Special Function Unit for Graphics Processing Unit Abd-Elrahman G. Qoutb 1, Abdullah M. El-Gunidy 1, Mohammed F. Tolba 1, and Magdy A. El-Moursy 2 1 Electrical Engineering Department, Fayoum

More information

Parallel-computing approach for FFT implementation on digital signal processor (DSP)

Parallel-computing approach for FFT implementation on digital signal processor (DSP) Parallel-computing approach for FFT implementation on digital signal processor (DSP) Yi-Pin Hsu and Shin-Yu Lin Abstract An efficient parallel form in digital signal processor can improve the algorithm

More information

/$ IEEE

/$ IEEE IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I: REGULAR PAPERS, VOL. 56, NO. 5, MAY 2009 1005 Low-Power Memory-Reduced Traceback MAP Decoding for Double-Binary Convolutional Turbo Decoder Cheng-Hung Lin,

More information

High-performance and Low-power Consumption Vector Processor for LTE Baseband LSI

High-performance and Low-power Consumption Vector Processor for LTE Baseband LSI High-performance and Low-power Consumption Vector Processor for LTE Baseband LSI Yi Ge Mitsuru Tomono Makiko Ito Yoshio Hirose Recently, the transmission rate for handheld devices has been increasing by

More information

A Ripple Carry Adder based Low Power Architecture of LMS Adaptive Filter

A Ripple Carry Adder based Low Power Architecture of LMS Adaptive Filter A Ripple Carry Adder based Low Power Architecture of LMS Adaptive Filter A.S. Sneka Priyaa PG Scholar Government College of Technology Coimbatore ABSTRACT The Least Mean Square Adaptive Filter is frequently

More information

6T- SRAM for Low Power Consumption. Professor, Dept. of ExTC, PRMIT &R, Badnera, Amravati, Maharashtra, India 1

6T- SRAM for Low Power Consumption. Professor, Dept. of ExTC, PRMIT &R, Badnera, Amravati, Maharashtra, India 1 6T- SRAM for Low Power Consumption Mrs. J.N.Ingole 1, Ms.P.A.Mirge 2 Professor, Dept. of ExTC, PRMIT &R, Badnera, Amravati, Maharashtra, India 1 PG Student [Digital Electronics], Dept. of ExTC, PRMIT&R,

More information

LOW-DENSITY parity-check (LDPC) codes, which are defined

LOW-DENSITY parity-check (LDPC) codes, which are defined 734 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II: EXPRESS BRIEFS, VOL. 56, NO. 9, SEPTEMBER 2009 Design of a Multimode QC-LDPC Decoder Based on Shift-Routing Network Chih-Hao Liu, Chien-Ching Lin, Shau-Wei

More information

A Low Power 32 Bit CMOS ROM Using a Novel ATD Circuit

A Low Power 32 Bit CMOS ROM Using a Novel ATD Circuit International Journal of Electrical and Computer Engineering (IJECE) Vol. 3, No. 4, August 2013, pp. 509~515 ISSN: 2088-8708 509 A Low Power 32 Bit CMOS ROM Using a Novel ATD Circuit Sidhant Kukrety*,

More information

Efficient VLSI Huffman encoder implementation and its application in high rate serial data encoding

Efficient VLSI Huffman encoder implementation and its application in high rate serial data encoding LETTER IEICE Electronics Express, Vol.14, No.21, 1 11 Efficient VLSI Huffman encoder implementation and its application in high rate serial data encoding Rongshan Wei a) and Xingang Zhang College of Physics

More information

Overlapped Scheduling for Folded LDPC Decoding Based on Matrix Permutation

Overlapped Scheduling for Folded LDPC Decoding Based on Matrix Permutation Overlapped Scheduling for Folded LDPC Decoding Based on Matrix Permutation In-Cheol Park and Se-Hyeon Kang Department of Electrical Engineering and Computer Science, KAIST {icpark, shkang}@ics.kaist.ac.kr

More information

A New MIMO Detector Architecture Based on A Forward-Backward Trellis Algorithm

A New MIMO Detector Architecture Based on A Forward-Backward Trellis Algorithm A New MIMO etector Architecture Based on A Forward-Backward Trellis Algorithm Yang Sun and Joseph R Cavallaro epartment of Electrical and Computer Engineering Rice University, Houston, TX 775 Email: {ysun,

More information

Software Defined Modems for The Internet of Things. Dr. John Haine, IP Operations Manager

Software Defined Modems for The Internet of Things. Dr. John Haine, IP Operations Manager Software Defined Modems for The Internet of Things Dr. John Haine, IP Operations Manager www.cognovo.com What things? 20 billion connected devices Manufactured for global markets Low cost Lifetimes from

More information

Development of Dependable Wireless System and Device

Development of Dependable Wireless System and Device December 6, 2013 JST International Symposium on Dependable VLSI Systems 2013 Development of Dependable Wireless System and Device Research Director: Kazuo Tsubouchi, Tohoku University Members: Akira Matsuzawa,

More information

High-Performance VLSI Architecture of H.264/AVC CAVLD by Parallel Run_before Estimation Algorithm *

High-Performance VLSI Architecture of H.264/AVC CAVLD by Parallel Run_before Estimation Algorithm * JOURNAL OF INFORMATION SCIENCE AND ENGINEERING 29, 595-605 (2013) High-Performance VLSI Architecture of H.264/AVC CAVLD by Parallel Run_before Estimation Algorithm * JONGWOO BAE 1 AND JINSOO CHO 2,+ 1

More information

A MULTIBANK MEMORY-BASED VLSI ARCHITECTURE OF DIGITAL VIDEO BROADCASTING SYMBOL DEINTERLEAVER

A MULTIBANK MEMORY-BASED VLSI ARCHITECTURE OF DIGITAL VIDEO BROADCASTING SYMBOL DEINTERLEAVER A MULTIBANK MEMORY-BASED VLSI ARCHITECTURE OF DIGITAL VIDEO BROADCASTING SYMBOL DEINTERLEAVER D.HARI BABU 1, B.NEELIMA DEVI 2 1,2 Noble college of engineering and technology, Nadergul, Hyderabad, Abstract-

More information

Design of Hierarchical Crossconnect WDM Networks Employing a Two-Stage Multiplexing Scheme of Waveband and Wavelength

Design of Hierarchical Crossconnect WDM Networks Employing a Two-Stage Multiplexing Scheme of Waveband and Wavelength 166 IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, VOL. 20, NO. 1, JANUARY 2002 Design of Hierarchical Crossconnect WDM Networks Employing a Two-Stage Multiplexing Scheme of Waveband and Wavelength

More information

Configuration latency constraint and frame duration /13/$ IEEE. a) Configuration latency constraint

Configuration latency constraint and frame duration /13/$ IEEE. a) Configuration latency constraint An efficient on-chip configuration infrastructure for a flexible multi-asip turbo decoder architecture Vianney Lapotre, Michael Hübner, Guy Gogniat, Purushotham Murugappa, Amer Baghdadi and Jean-Philippe

More information

Efficient Majority Logic Fault Detector/Corrector Using Euclidean Geometry Low Density Parity Check (EG-LDPC) Codes

Efficient Majority Logic Fault Detector/Corrector Using Euclidean Geometry Low Density Parity Check (EG-LDPC) Codes Efficient Majority Logic Fault Detector/Corrector Using Euclidean Geometry Low Density Parity Check (EG-LDPC) Codes 1 U.Rahila Begum, 2 V. Padmajothi 1 PG Student, 2 Assistant Professor 1 Department Of

More information

/$ IEEE

/$ IEEE IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 14, NO. 10, OCTOBER 2006 1147 Transactions Briefs Highly-Parallel Decoding Architectures for Convolutional Turbo Codes Zhiyong He,

More information

Processor Architectures At A Glance: M.I.T. Raw vs. UC Davis AsAP

Processor Architectures At A Glance: M.I.T. Raw vs. UC Davis AsAP Processor Architectures At A Glance: M.I.T. Raw vs. UC Davis AsAP Presenter: Course: EEC 289Q: Reconfigurable Computing Course Instructor: Professor Soheil Ghiasi Outline Overview of M.I.T. Raw processor

More information

ENERGY-EFFICIENT VLSI REALIZATION OF BINARY64 DIVISION WITH REDUNDANT NUMBER SYSTEMS 1 AVANIGADDA. NAGA SANDHYA RANI

ENERGY-EFFICIENT VLSI REALIZATION OF BINARY64 DIVISION WITH REDUNDANT NUMBER SYSTEMS 1 AVANIGADDA. NAGA SANDHYA RANI ENERGY-EFFICIENT VLSI REALIZATION OF BINARY64 DIVISION WITH REDUNDANT NUMBER SYSTEMS 1 AVANIGADDA. NAGA SANDHYA RANI 2 BALA KRISHNA.KONDA M.Tech, Assistant Professor 1,2 Eluru College Of Engineering And

More information

A Modified DRR-Based Non-real-time Service Scheduling Scheme in Wireless Metropolitan Networks

A Modified DRR-Based Non-real-time Service Scheduling Scheme in Wireless Metropolitan Networks A Modified DRR-Based Non-real-time Service Scheduling Scheme in Wireless Metropolitan Networks Han-Sheng Chuang 1, Liang-Teh Lee 1 and Chen-Feng Wu 2 1 Department of Computer Science and Engineering, Tatung

More information

Multi-Megabit Channel Decoder

Multi-Megabit Channel Decoder MPSoC 13 July 15-19, 2013 Otsu, Japan Multi-Gigabit Channel Decoders Ten Years After Norbert Wehn wehn@eit.uni-kl.de Multi-Megabit Channel Decoder MPSoC 03 N. Wehn UMTS standard: 2 Mbit/s throughput requirements

More information

INTERNATIONAL JOURNAL OF PROFESSIONAL ENGINEERING STUDIES Volume VII /Issue 2 / OCT 2016

INTERNATIONAL JOURNAL OF PROFESSIONAL ENGINEERING STUDIES Volume VII /Issue 2 / OCT 2016 NEW VLSI ARCHITECTURE FOR EXPLOITING CARRY- SAVE ARITHMETIC USING VERILOG HDL B.Anusha 1 Ch.Ramesh 2 shivajeehul@gmail.com 1 chintala12271@rediffmail.com 2 1 PG Scholar, Dept of ECE, Ganapathy Engineering

More information

WITH the development of the semiconductor technology,

WITH the development of the semiconductor technology, Dual-Link Hierarchical Cluster-Based Interconnect Architecture for 3D Network on Chip Guang Sun, Yong Li, Yuanyuan Zhang, Shijun Lin, Li Su, Depeng Jin and Lieguang zeng Abstract Network on Chip (NoC)

More information

Enhanced Reconfigurable Viterbi Decoder with NoC for OFDM Block of a Wireless Standard

Enhanced Reconfigurable Viterbi Decoder with NoC for OFDM Block of a Wireless Standard Enhanced Reconfigurable Viterbi Decoder with NoC for OFDM Block of a Wireless Standard Dr. D. Ganeshkumar 1, C.R. Suganya devi 2, Dr. V. Parimala 3 1 Professor and Head, Department of Electronics and Communication

More information

Near Optimal Repair Rate Built-in Redundancy Analysis with Very Small Hardware Overhead

Near Optimal Repair Rate Built-in Redundancy Analysis with Very Small Hardware Overhead Near Optimal Repair Rate Built-in Redundancy Analysis with Very Small Hardware Overhead Woosung Lee, Keewon Cho, Jooyoung Kim, and Sungho Kang Department of Electrical & Electronic Engineering, Yonsei

More information

Grid Middleware for Realizing Autonomous Resource Sharing: Grid Service Platform

Grid Middleware for Realizing Autonomous Resource Sharing: Grid Service Platform Grid Middleware for Realizing Autonomous Resource Sharing: Grid Service Platform V Soichi Shigeta V Haruyasu Ueda V Nobutaka Imamura (Manuscript received April 19, 27) These days, many enterprises are

More information

A High-Speed FPGA Implementation of an RSD-Based ECC Processor

A High-Speed FPGA Implementation of an RSD-Based ECC Processor RESEARCH ARTICLE International Journal of Engineering and Techniques - Volume 4 Issue 1, Jan Feb 2018 A High-Speed FPGA Implementation of an RSD-Based ECC Processor 1 K Durga Prasad, 2 M.Suresh kumar 1

More information

Using FPGA for Computer Architecture/Organization Education

Using FPGA for Computer Architecture/Organization Education IEEE Computer Society Technical Committee on Computer Architecture Newsletter, Jun. 1996, pp.31 35 Using FPGA for Computer Architecture/Organization Education Yamin Li and Wanming Chu Computer Architecture

More information

Design and Analysis of Kogge-Stone and Han-Carlson Adders in 130nm CMOS Technology

Design and Analysis of Kogge-Stone and Han-Carlson Adders in 130nm CMOS Technology Design and Analysis of Kogge-Stone and Han-Carlson Adders in 130nm CMOS Technology Senthil Ganesh R & R. Kalaimathi 1 Assistant Professor, Electronics and Communication Engineering, Info Institute of Engineering,

More information

ISSN: [Garade* et al., 6(1): January, 2017] Impact Factor: 4.116

ISSN: [Garade* et al., 6(1): January, 2017] Impact Factor: 4.116 IJESRT INTERNATIONAL JOURNAL OF ENGINEERING SCIENCES & RESEARCH TECHNOLOGY FULLY REUSED VLSI ARCHITECTURE OF DSRC ENCODERS USING SOLS TECHNIQUE Supriya Shivaji Garade*, Prof. P. R. Badadapure * Department

More information

ISSN Vol.05,Issue.09, September-2017, Pages:

ISSN Vol.05,Issue.09, September-2017, Pages: WWW.IJITECH.ORG ISSN 2321-8665 Vol.05,Issue.09, September-2017, Pages:1693-1697 AJJAM PUSHPA 1, C. H. RAMA MOHAN 2 1 PG Scholar, Dept of ECE(DECS), Shirdi Sai Institute of Science and Technology, Anantapuramu,

More information

PERFORMANCE ANALYSIS OF HIGH EFFICIENCY LOW DENSITY PARITY-CHECK CODE DECODER FOR LOW POWER APPLICATIONS

PERFORMANCE ANALYSIS OF HIGH EFFICIENCY LOW DENSITY PARITY-CHECK CODE DECODER FOR LOW POWER APPLICATIONS American Journal of Applied Sciences 11 (4): 558-563, 2014 ISSN: 1546-9239 2014 Science Publication doi:10.3844/ajassp.2014.558.563 Published Online 11 (4) 2014 (http://www.thescipub.com/ajas.toc) PERFORMANCE

More information

High performance, power-efficient DSPs based on the TI C64x

High performance, power-efficient DSPs based on the TI C64x High performance, power-efficient DSPs based on the TI C64x Sridhar Rajagopal, Joseph R. Cavallaro, Scott Rixner Rice University {sridhar,cavallar,rixner}@rice.edu RICE UNIVERSITY Recent (2003) Research

More information

Concurrent Testing with RF

Concurrent Testing with RF Concurrent Testing with RF Jeff Brenner Verigy US EK Tan Verigy Singapore go/semi March 2010 1 Introduction Integration of multiple functional cores can be accomplished through the development of either

More information

Optimization of Task Scheduling and Memory Partitioning for Multiprocessor System on Chip

Optimization of Task Scheduling and Memory Partitioning for Multiprocessor System on Chip Optimization of Task Scheduling and Memory Partitioning for Multiprocessor System on Chip 1 Mythili.R, 2 Mugilan.D 1 PG Student, Department of Electronics and Communication K S Rangasamy College Of Technology,

More information

ERROR correcting codes are used to increase the bandwidth

ERROR correcting codes are used to increase the bandwidth 404 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 37, NO. 3, MARCH 2002 A 690-mW 1-Gb/s 1024-b, Rate-1/2 Low-Density Parity-Check Code Decoder Andrew J. Blanksby and Chris J. Howland Abstract A 1024-b, rate-1/2,

More information

Massively Parallel Computing on Silicon: SIMD Implementations. V.M.. Brea Univ. of Santiago de Compostela Spain

Massively Parallel Computing on Silicon: SIMD Implementations. V.M.. Brea Univ. of Santiago de Compostela Spain Massively Parallel Computing on Silicon: SIMD Implementations V.M.. Brea Univ. of Santiago de Compostela Spain GOAL Give an overview on the state-of of-the- art of Digital on-chip CMOS SIMD Solutions,

More information

On combining chase-2 and sum-product algorithms for LDPC codes

On combining chase-2 and sum-product algorithms for LDPC codes University of Wollongong Research Online Faculty of Engineering and Information Sciences - Papers: Part A Faculty of Engineering and Information Sciences 2012 On combining chase-2 and sum-product algorithms

More information

Implementation of Reduce the Area- Power Efficient Fixed-Point LMS Adaptive Filter with Low Adaptation-Delay

Implementation of Reduce the Area- Power Efficient Fixed-Point LMS Adaptive Filter with Low Adaptation-Delay Implementation of Reduce the Area- Power Efficient Fixed-Point LMS Adaptive Filter with Low Adaptation-Delay A.Sakthivel 1, A.Lalithakumar 2, T.Kowsalya 3 PG Scholar [VLSI], Muthayammal Engineering College,

More information

Area And Power Efficient LMS Adaptive Filter With Low Adaptation Delay

Area And Power Efficient LMS Adaptive Filter With Low Adaptation Delay e-issn: 2349-9745 p-issn: 2393-8161 Scientific Journal Impact Factor (SJIF): 1.711 International Journal of Modern Trends in Engineering and Research www.ijmter.com Area And Power Efficient LMS Adaptive

More information

Design of Viterbi Decoder for Noisy Channel on FPGA Ms. M.B.Mulik, Prof. U.L.Bombale, Prof. P.C.Bhaskar

Design of Viterbi Decoder for Noisy Channel on FPGA Ms. M.B.Mulik, Prof. U.L.Bombale, Prof. P.C.Bhaskar International Journal of Scientific & Engineering Research Volume 2, Issue 6, June-2011 1 Design of Viterbi Decoder for Noisy Channel on FPGA Ms. M.B.Mulik, Prof. U.L.Bombale, Prof. P.C.Bhaskar Abstract

More information

Synthetic Benchmark Generator for the MOLEN Processor

Synthetic Benchmark Generator for the MOLEN Processor Synthetic Benchmark Generator for the MOLEN Processor Stephan Wong, Guanzhou Luo, and Sorin Cotofana Computer Engineering Laboratory, Electrical Engineering Department, Delft University of Technology,

More information

A Comparison of Two Algorithms Involving Montgomery Modular Multiplication

A Comparison of Two Algorithms Involving Montgomery Modular Multiplication ISSN (Online) : 2319-8753 ISSN (Print) : 2347-6710 International Journal of Innovative Research in Science, Engineering and Technology An ISO 3297: 2007 Certified Organization Volume 6, Special Issue 5,

More information

A High-Throughput Processor for Cryptographic Hash Functions

A High-Throughput Processor for Cryptographic Hash Functions A High-Throughput Processor for Cryptographic Hash Functions Yuanhong Huo and Dake Liu Beijing Institute of Technology, Beijing 100081, China Email: {hyh, dake@bit.edu.cn Abstract This paper presents a

More information

EFFICIENT FAILURE PROCESSING ARCHITECTURE IN REGULAR EXPRESSION PROCESSOR

EFFICIENT FAILURE PROCESSING ARCHITECTURE IN REGULAR EXPRESSION PROCESSOR EFFICIENT FAILURE PROCESSING ARCHITECTURE IN REGULAR EXPRESSION PROCESSOR ABSTRACT SangKyun Yun Department of Computer and Telecom. Engineering, Yonsei University, Wonju, Korea skyun@yonsei.ac.kr Regular

More information

Analysis of Different Multiplication Algorithms & FPGA Implementation

Analysis of Different Multiplication Algorithms & FPGA Implementation IOSR Journal of VLSI and Signal Processing (IOSR-JVSP) Volume 4, Issue 2, Ver. I (Mar-Apr. 2014), PP 29-35 e-issn: 2319 4200, p-issn No. : 2319 4197 Analysis of Different Multiplication Algorithms & FPGA

More information

FPGA IMPLEMENTATION OF FLOATING POINT ADDER AND MULTIPLIER UNDER ROUND TO NEAREST

FPGA IMPLEMENTATION OF FLOATING POINT ADDER AND MULTIPLIER UNDER ROUND TO NEAREST FPGA IMPLEMENTATION OF FLOATING POINT ADDER AND MULTIPLIER UNDER ROUND TO NEAREST SAKTHIVEL Assistant Professor, Department of ECE, Coimbatore Institute of Engineering and Technology Abstract- FPGA is

More information