Energy-efficient Reconfigurable FEC Processor for Multi-standard Wireless Communication Systems
JOURNAL OF SEMICONDUCTOR TECHNOLOGY AND SCIENCE, VOL.17, NO.3, JUNE, 2017 ISSN(Print) ISSN(Online)

Meng Li 1, Liesbet Van der Perre 2, Wim van Thillo 1, and Youngjoo Lee 3,*

Abstract In this paper, we describe HW/SW co-optimizations for reconfigurable application-specific instruction-set processors (ASIPs). Based on our previous very long instruction word (VLIW) ASIP, the proposed framework realizes various forward error-correction (FEC) algorithms for wireless communication systems. To enhance the energy efficiency, we introduce several design methodologies, including high-radix algorithms, task-level out-of-order execution, and intensive resource allocation with loop-level rescheduling. A case study on radix-4 turbo decoding shows that the proposed techniques improve the energy efficiency by 3.7 times compared to the previous architecture.

Index Terms Digital integrated circuits, error correction codes, programmable circuits, wireless communication

Manuscript received Mar. 21, 2016; accepted Dec. 14,
1 Interuniversity Microelectronics Center (IMEC) vzw, 3001 Leuven, Belgium
2 Department of Electrical Engineering, KU Leuven, 3001 Leuven, Belgium
3 Department of Electrical Engineering, POSTECH, 37673, Pohang, Korea
yjlee.ims@gmail.com

I. INTRODUCTION

Over the last decades, numerous communication standards have been developed to improve the connectivity of mobile devices. Recent specifications are required to satisfy severe demands on data rate, reliability, and bandwidth efficiency. To increase data integrity, iterative forward error-correction (FEC) codes have been widely adopted because of their powerful error-correcting capabilities [1-3].
Because FEC standards differ in their parameters, it is quite challenging to design highly optimized decoder ASICs while meeting tough time-to-market (TTM) requirements [4-6]. Previous processor-based solutions may provide the flexibility to reduce the TTM; however, they normally use many more hardware resources than fixed ASICs, resulting in power-hungry realizations [7-11]. To provide a flexible decoder architecture with acceptable energy efficiency, this paper presents novel design frameworks for FEC application-specific instruction-set processors (ASIPs). In contrast to previous multi-standard approaches, which develop unified hardware resources shared among different FEC specifications [12, 13], the proposed design procedure co-optimizes the hardware architecture and the software kernels. More precisely, we propose novel methodologies at the algorithm, architecture, and firmware levels, based on our previous flexible ASIP [8]. In the proposed method, we first investigate hardware-friendly FEC decoding algorithms. By relaxing the severe congestion on the register files (RFs), the proposed high-level instructions allow task-level out-of-order execution, which reduces the number of operating cycles. Considering the long-latency memory requests inside a loop, the proposed loop-level rescheduling further enhances the decoding throughput by reordering instructions to eliminate waiting cycles. To show the impact of the proposed design methods, an optimized radix-4 LTE turbo decoder on the FlexFec is implemented as a prototype in a 40 nm low-power (LP) CMOS process. Compared to the previous non-optimized
architecture, the prototype improves the area and energy efficiencies by 3.3 and 3.7 times, respectively. The rest of this paper is organized as follows. Section II describes the background of this work. Section III presents our design frameworks. A case study on the radix-4 turbo decoder is described and compared to previous works in Section IV. Conclusions are drawn in Section V.

II. BACKGROUNDS

In this section, we describe our previous FlexFec platform, whose resource configuration is changed at design time [8]. Fig. 1 shows a block diagram of the FlexFec, which includes multiple processing units associated with two RFs, a multi-step crossbar network (Xbar) controlled by the flexible address generation unit (AGU), and high-speed host interfaces.

Fig. 1. The reconfigurable FlexFec ASIP architecture [8]. (Blocks shown: flipr core with vector RF, VM1, VM2, Xbar, ALU1, ALU2, scalar ALU, scalar RF, and scalar fetch; reconfigurable AGU; background memories BM1 and BM2; PM; DM; host interface.)

The decoding process is performed by the flipr core, an energy-efficient VLIW processor connected to a number of on-chip SRAMs, which are drawn as gray blocks. Note that the operation is programmable by loading the proper kernels into the program memory (PM), the data memory (DM), and the AGU. The received codeword is first moved into the background memories, denoted as BM1 and BM2, which realize a double-buffering scheme. In the flipr core, one scalar operation and five 96-way vector operations are processed simultaneously. Two on-chip memories, VM1 and VM2, are reserved for storing intermediate data with single-cycle instructions. The BM, however, cannot be accessed in a single cycle, because the multi-step Xbar is unavoidable when decoding iterative FEC codes [4-8]. Based on the generalized instruction-set architecture (ISA), the FlexFec can support arbitrary LDPC, turbo, and Viterbi codes, which are the most popular FEC codes.
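The double-buffering role of BM1 and BM2 can be illustrated with a minimal sketch. It is a purely sequential software model, assuming that the host fills one buffer while the core decodes from the other; the function name `decode_stream` and the `decode` callback are hypothetical and do not come from the FlexFec firmware.

```python
from typing import Callable, List, Optional, Sequence


def decode_stream(frames: Sequence[List[int]],
                  decode: Callable[[List[int]], int]) -> List[int]:
    """Ping-pong between two background buffers (a model of BM1/BM2).

    While the frame in the active buffer is decoded, the next frame is
    written into the other buffer. In hardware the two steps overlap in
    time; here they simply run one after the other.
    """
    bm: List[Optional[List[int]]] = [None, None]  # bm[0] ~ BM1, bm[1] ~ BM2
    results: List[int] = []
    if not frames:
        return results
    bm[0] = frames[0]                      # initial fill before decoding starts
    for k in range(len(frames)):
        active = k % 2                     # buffer read by the core this round
        if k + 1 < len(frames):
            bm[1 - active] = frames[k + 1]  # host fills the idle buffer
        results.append(decode(bm[active]))
    return results


# Toy "decoder" that just sums the bits of each frame.
print(decode_stream([[1, 0], [0, 1], [1, 1]], sum))
```

The point of the scheme is that the core never has to wait for a frame transfer as long as filling one buffer takes no longer than decoding the other.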
However, the throughput of the FlexFec cannot easily be increased by introducing more parallel operations, because the vector RF suffers from severe write congestion when the software kernels are written with single-cycle vector instructions. Moreover, if the BM is frequently accessed through the Xbar, the following instructions that depend on the loaded data must spend a number of waiting cycles. Hence, the decoding throughput of the previous ASIP is limited by the large portion of no-operation (NOP) instructions. To reach a high throughput, ASIP-based solutions generally use multiple cores, which increases the decoding energy significantly [9-11]. As high-throughput, energy-efficient flexible decoders are strongly demanded for future wireless systems, an advanced design framework is needed that enhances the decoding throughput without increasing the energy consumption of each ASIP core.

III. PROPOSED OPTIMIZATIONS

Before defining the FEC ASIP architecture, it is necessary to select proper high-throughput algorithms that can be realized with simple hardware resources. Numerous studies have presented attractive solutions for multi-standard FEC decoders [6-8, 12, 13]. Simplified FEC algorithms such as min-sum LDPC decoding and max-log-MAP turbo decoding are actively used, as both are conceptually based on similar max (or min) operations [4, 5]. Parallel structures for layered LDPC decoders and sliding-window-based turbo decoders have been combined into a flexible structure by sharing the same SRAM buffers [6]. In addition, as in dedicated ASIC-based decoders, high-radix decoding algorithms have been adopted in recent ASIP-based flexible FEC decoders [10, 13]. Although high-radix algorithms are effective in reducing the size of the on-chip memories, the throughput of each ASIP core cannot be enhanced drastically because of the increased number of cycles needed for the complex computations.
In order to increase the decoding throughput of the flexible ASIP, we present several software-level optimizations that actually shorten the processing time of high-radix decoding algorithms.
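The shared max/min structure of the simplified algorithms mentioned above can be made concrete with a small sketch. It uses the textbook definitions of the max-log-MAP approximation and the min-sum check-node update and is not taken from the FlexFec kernels; the function names are hypothetical.

```python
import math
from typing import List


def max_star(a: float, b: float) -> float:
    """Exact Jacobian logarithm: log(exp(a) + exp(b))."""
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))


def max_log(a: float, b: float) -> float:
    """Max-log-MAP approximation: drop the correction term, keep the max."""
    return max(a, b)


def min_sum_check_node(llrs: List[float]) -> List[float]:
    """Min-sum check-node update.

    For each edge, the outgoing message is the product of the signs and the
    minimum magnitude of all *other* incoming LLRs (needs at least two inputs).
    """
    out = []
    for i in range(len(llrs)):
        others = llrs[:i] + llrs[i + 1:]
        sign = 1.0
        for v in others:
            if v < 0:
                sign = -sign
        out.append(sign * min(abs(v) for v in others))
    return out


print(max_log(1.0, 3.0))                    # 3.0
print(min_sum_check_node([2.0, -0.5, 3.0]))  # [-0.5, 2.0, -0.5]
```

Both kernels reduce to compare-and-select operations on vectors of soft values, which is why a single vector data-path can serve turbo and LDPC decoding.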
Fig. 2. The processing sequences having data dependencies based on (a) the conventional in-order execution, (b) the proposed task-level out-of-order execution.

1. Task-level Out-of-order Execution

Conventionally, the ISA of a flexible ASIP contains simple instructions for basic vector operations that complete in one cycle. As illustrated in Fig. 2(a), let several independent tasks be issued serially to an ALU. In the figure, a shaded circle labeled Tx represents a single-cycle instruction of the x-th task, which produces a write request to the vector RF. Within a task, the dependencies between two instructions are drawn as dotted arrows. There may also be dependencies between two tasks, which are drawn as solid arrows in Fig. 2(a). Note that a task consists of single-cycle instructions that depend on each other, so it causes continuous write accesses. Even though several tasks can be issued at once by adding ALUs, the overall computing time is still bounded by the limited bandwidth of the RFs for reading operands and storing intermediate results. To solve this severe write congestion, we define a new ISA that uses dedicated high-level instructions. Conceptually, a high-level instruction is a multi-cycle instruction composed of several arithmetic vector operations. The demand on the RFs is naturally alleviated, as the high-level instructions reduce the number of write requests. Therefore, other RF-writing instructions can be allocated to the non-RF-writing cycles of a high-level instruction.
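The effect of high-level instructions on RF write traffic can be checked with a small bookkeeping sketch. It assumes a vector RF with a single write port and instructions that write only in their last cycle; the instruction counts are illustrative and are not read from Fig. 2, and the helper names are hypothetical.

```python
from typing import List


def write_cycles(start: int, latencies: List[int]) -> List[int]:
    """Cycles in which a serial chain of instructions writes the vector RF.

    Each instruction occupies `latency` cycles and writes the RF only in
    its last cycle (assumption of this sketch).
    """
    cycles, t = [], start
    for lat in latencies:
        t += lat
        cycles.append(t - 1)
    return cycles


def conflicts(a: List[int], b: List[int]) -> List[int]:
    """Cycles in which both chains request the single RF write port."""
    return sorted(set(a) & set(b))


# Task 1 as eight single-cycle instructions vs. one 8-cycle high-level one.
t1_simple = write_cycles(0, [1] * 8)   # writes in cycles 0..7
t1_high = write_cycles(0, [8])         # writes only in cycle 7
# An independent task 2 of seven single-cycle instructions issued in parallel.
t2 = write_cycles(0, [1] * 7)          # writes in cycles 0..6

print(conflicts(t1_simple, t2))  # every cycle of task 2 collides
print(conflicts(t1_high, t2))    # no collision: task 2 uses the free slots
```

With single-cycle instructions, the second task cannot run in parallel without colliding on the write port; once the first task is folded into one multi-cycle instruction, its non-writing cycles become free write slots for the second task.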
Because of the serialized dependencies inside a task, however, it is hard to find instructions that can fill the non-RF-writing cycles. In our work, task-level out-of-order execution is proposed to issue, in parallel, following tasks that are independent of the current task. As depicted in Fig. 2(b), for example, the first task includes a high-level instruction whose internal cycles are drawn as squares, where only the shaded node makes a request to the RF. The instructions of the next independent task, i.e., T2, can be executed earlier on the second ALU, accessing the RF without any congestion. As a result, the number of processing cycles is reduced by the task-level out-of-order execution.

2. Multi-level ALU Architecture

In general, previous pipelined ASIPs process all instructions sequentially on a generalized data-path [7]. Although the generalized data-path provides the maximum level of flexibility, it requires many operating cycles because of the in-order processing. To support the proposed task-level out-of-order execution effectively, as shown in Fig. 3, we introduce the n-level ALU, where each level performs a pre-defined vector operation in the corresponding processing unit (PUx). Note that a vector instruction is issued to the first level, i.e., PU1. In every cycle, each PU passes its instruction to the next PU together with the proper intermediate results, until the instruction reaches the last cycle defined by the ISA. Controlled by a wide multiplexer, the vector RF is written once per cycle by selecting the level whose instruction has completed. Note that it is unnecessary to provide an RF-writing path for every level. According to the new high-level instructions, the workload of each PU has to be carefully distributed for restricting the number of RF-writing paths,
preserving the original critical delay as much as possible. To reduce the complexity further, the first level of the ALU can be merged with the previous ALU, which performs a simple operation in one cycle. Instead of using an additional ALU for parallel processing as in Fig. 2(b), the out-of-order execution can therefore be implemented naturally in a single multi-level ALU, as shown in Fig. 4. While the multi-cycle instruction advances from level to level, the independent second task can be issued to the first PU of the proposed ALU. In summary, the multi-level architecture supports the proposed task-level out-of-order execution and significantly reduces both the processing cycles and the complexity. In addition, because additional ALUs in a VLIW architecture increase the size of the PM, the proposed ALU also reduces the memory requirement of the ASIP.

Fig. 3. The proposed multi-level vector ALU architecture. (Blocks shown: issued instruction, PU1 to PUn at Level-1 to Level-n, multiplexor, RF.)

Fig. 4. The task-level out-of-order execution using the proposed multi-level vector ALU.

3. Loop-level Rescheduling

In multi-standard FEC solutions, a flexible multi-step crossbar is normally used to support various interleaving patterns [7, 8, 11]. Hence, the previous software suffers from the long latency of reading a codeword. For example, the previous FlexFec takes 5 cycles to access the BM [8]. In the proposed framework, we exploit the fact that the iterative decoding process normally contains numerous loops that perform identical tasks at each bit position. Fig. 5(a) illustrates the conventional processing scenario of a loop that uses long-latency load instructions. For simplicity, the single-cycle arithmetic vector instructions are denoted as Ax without grouping them into tasks.
The 5-cycle load operation is drawn as triangular nodes, where the shaded node accesses the vector RF to store the codeword that has been read out. As in the previous figures, the dotted arrows show the data dependencies inside the loop. Note that the following operations have to wait for the completion of the load instruction because of the dependency, although a dedicated unit performs the memory accesses in parallel. Moreover, if there are multiple loads in the loop, the idle waiting time increases significantly and severely degrades the throughput. In the proposed method, we reorganize the processing order inside the loop as shown in Fig. 5(b). With the proposed rescheduling, the load operation reads the memory for the next iteration of the loop, and the arithmetic operations of the current iteration no longer suffer from the time-consuming memory accesses. More precisely, the instruction denoted as A4 becomes the first operation of the loop, and the load required by A4 is performed at the end of the loop to prepare the next iteration. As the proposed rescheduling is conceptually similar to the software pipelining technique [14], some additional cycles are needed for the prologue before the first iteration and the epilogue after the last iteration. By eliminating the waiting time, the number of processing cycles in a loop is reduced from 15 to 10, as exemplified in Fig. 5(b). In other words, the proposed rescheduling maximizes the hardware utilization and remarkably reduces the processing time, leading to an energy-efficient FEC ASIP solution.

IV. IMPLEMENTATION RESULTS

To improve the throughput as well as the energy efficiency, we design a prototype ASIP-based flexible FEC decoder using the proposed design methods. In this section, radix-4 LTE turbo decoding on our ASIP is detailed as a case study.
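The loop-level rescheduling of Section III.3 can be sketched in software as a hand-written software pipeline. In this minimal sketch, `mem` stands in for the BM and `f` for the arithmetic part of the loop body; both names are hypothetical. The cycle model uses the figures quoted for Fig. 5 (15 cycles per conventional iteration, 10 per rescheduled iteration, 10 for the prologue, 5 for the epilogue), and the way these terms are combined for n iterations is an assumption of this sketch, not a formula given in the paper.

```python
from typing import Callable, List, Sequence


def conventional(mem: Sequence[int], f: Callable[[int], int]) -> List[int]:
    """Load, then compute: every iteration waits for its own load."""
    out = []
    for i in range(len(mem)):
        x = mem[i]            # long-latency load; dependent ops stall here
        out.append(f(x))
    return out


def rescheduled(mem: Sequence[int], f: Callable[[int], int]) -> List[int]:
    """Compute first, then load the operand of the *next* iteration."""
    out: List[int] = []
    if not mem:
        return out
    x = mem[0]                # prologue: load for the first iteration
    for i in range(len(mem) - 1):
        out.append(f(x))      # arithmetic on data that is already loaded
        x = mem[i + 1]        # load for the next iteration, issued at loop end
    out.append(f(x))          # epilogue: last arithmetic, no further load
    return out


def cycles_conventional(n: int, loop: int = 15) -> int:
    return n * loop


def cycles_rescheduled(n: int, prologue: int = 10, loop: int = 10,
                       epilogue: int = 5) -> int:
    # Assumption: prologue and epilogue together stand in for one iteration.
    return prologue + (n - 1) * loop + epilogue


data = [3, 1, 4, 1, 5]
assert conventional(data, lambda v: 2 * v) == rescheduled(data, lambda v: 2 * v)
print(cycles_conventional(100), cycles_rescheduled(100))  # 1500 1005
```

Both loops produce the same results; only the position of the load changes. Under the stated assumption the cycle count approaches two thirds of the conventional count for long loops, which matches the reduction from 15 to 10 cycles per iteration.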
Based on the same design concepts, the radix-4 LDPC and Viterbi decoders are also implemented on the prototype ASIP. For the algorithm-level improvement of a turbo
decoder, we first select the radix-4 decoding algorithm, which has been adopted in recent decoder ASICs [4, 6]. Without changing the original ASIP architecture, the firmware is redesigned for radix-4 processing at this level.

Fig. 5. (a) The conventional loop processing with long-latency loads, (b) the processing sequence using the proposed loop-level rescheduling. (Panel (a): 5-cycle load instructions, 15 cycles in a loop, ALU operations A1 to A8. Panel (b): 10 cycles for prologue, 10 cycles in a loop, 5 cycles for epilogue.)

Table 1. High-level instructions for Radix-4 Turbo Decoding
Columns: Branch metric, Forward recursion, Backward recursion, Output generation.
Processing latency 2 cycles: Metric calculation; Reliability generation; Reliability generation; Reliability generation.
Processing latency 3 cycles: ACS tree; ACS tree; ACS tree.
Processing latency 4 cycles and 5 cycles: Codeword collection activation; activation; Extrinsic value calculation activation; LLR calculation activation.
ACS: Addition, comparison and selection. LLR: Log-likelihood ratio.

To shorten the decoding time further, we split the firmware into several tasks and define high-level instructions in four categories, i.e., branch metric calculation, forward recursion, backward recursion, and output generation [4]. In this case study, as shown in Table 1, 11 high-level instructions are newly introduced, each taking up to 5 processing cycles. Note that all the high-level instructions are multi-cycle instructions. Therefore, as shown in the previous section, the multi-level ALU supporting the task-level out-of-order execution can relax the intensive write requests on the vector RF, leading to energy-efficient decoding operations. Fig.
6 illustrates the processing steps of the backward recursion in the turbo decoding algorithm based on the proposed high-level instructions. Compared to the conventional serialized operations, the proposed task-level out-of-order technique reduces the latency of the backward recursion by 20%.

Fig. 6. Cycle reductions and throughput improvements. (Axes: number of processing cycles and decoding throughput (Mb/s). Design steps: without optimizations, radix-4 decoding algorithm, task-level out-of-order execution, loop-level optimization.)

Because the turbo decoding process is inherently iterative, the loop-level rescheduling is additionally applied to the forward and backward recursions, which are associated with long-latency load operations as depicted in Fig. 6. By breaking the data dependencies inside the loop, a number of instructions can be processed in parallel with the load, which minimizes the overall number of processing cycles. Fig. 7 depicts how the proposed optimizations reduce the number of cycles for decoding a 6144b LTE turbo code on the prototype ASIPs. The maximum number of turbo iterations is set to six in all cases. By reducing the processing cycles in each design step, the proposed schemes shorten the total number of required cycles by 6.35 times compared to radix-2 turbo decoding on the previous FlexFec ASIP [8]. To investigate the impact of the proposed work, all the architectures are designed for a clock frequency of 450 MHz in a 40 nm CMOS process. As shown in Fig. 7, the
throughput is gradually increased by the proposed schemes. Note that the fully optimized ASIP-based turbo decoder achieves a throughput of 315 Mb/s, which can support up to category 5 of LTE systems [1].

Fig. 7. Backward recursion example in radix-4 turbo decoding on the proposed flexible processor.

Fig. 8 shows how the proposed design methods improve the area and energy efficiencies of turbo decoders, where the efficiencies are defined as follows:

$$\text{Area efficiency}\ (\mu\text{m}^2\cdot\text{s/b}) = \frac{\text{Area}\ (\text{mm}^2)}{\text{Throughput}\ (\text{Mb/s})} \qquad (1)$$

$$\text{Energy efficiency}\ (\text{nJ/b}) = \frac{\text{Decoding power}\ (\text{mW})}{\text{Throughput}\ (\text{Mb/s})} \qquad (2)$$

Fig. 8. Area and energy efficiencies. (Axes: area efficiency (μm²·s/b) and energy efficiency (nJ/b). Design steps: without optimizations, radix-4 decoding algorithm, task-level out-of-order execution, multi-level ALU architecture, loop-level rescheduling.)

By applying the radix-4 algorithm, as shown in Fig. 8, the decoder becomes cost-effective in terms of area and energy consumption. As the straightforward radix-4 firmware cannot meet the high-throughput demands, as shown in Fig. 7, the task-level out-of-order execution enhances the throughput by utilizing the hardware resources in parallel. The multi-level ALU compensates for the area overhead by reducing the PM size, and the loop-level optimization finally improves the energy efficiency by changing the order of operations in a loop. With the definitions in (1) and (2), lower values are better; the proposed work lowers the area and energy efficiency metrics for LTE turbo decoding by 3.3 and 3.7 times, respectively. The implementation results of the prototype FEC ASIP are summarized and compared to previous works in Table 2. For a fair comparison, we normalize all the efficiencies to a 40 nm CMOS process with a reference voltage of 0.9 V. In addition, the maximum numbers of iterations for the turbo and LDPC decoding scenarios are set to six and ten, respectively. Note that the proposed ASIP can support arbitrary LDPC, turbo, and Viterbi codes.

Table 2. Implementation results of ASIP-based FEC decoders
Columns: This work, [7], [8], [9], [10].
Rows: Process (nm); Voltage (V); Area (mm²); Frequency (MHz); Standards (LTE, WiFi, WiMAX); and, for each of turbo, LDPC, and Viterbi decoding: Throughput (Mb/s), Norm. area eff. (μm²·s/b), Norm. energy eff. (nJ/b).
Norm. area eff. = (40/Process)² × Area efficiency
Norm. energy eff. = (40/Process) × (0.9/Voltage)² × Energy efficiency

Owing to the proposed optimizations, the prototype ASIP offers an attractive multi-standard FEC decoder. In the case of turbo decoding for LTE systems, for example, the proposed ASIP-based work achieves the highest decoding throughput among the existing ASIP-based turbo decoders, while providing the similar area and
energy efficiencies.

V. CONCLUSION

In this paper, we have presented several design schemes to enhance the energy efficiency of the multi-standard FEC ASIP. By introducing advanced methods at the algorithm, firmware, and hardware levels, the proposed work reduces the number of processing cycles. The case study on radix-4 turbo decoding shows that the proposed framework achieves a sufficient decoding throughput for recent wireless systems, while remarkably improving the area and energy efficiencies.

ACKNOWLEDGMENTS

This work was supported by the National Research Foundation (NRF) of Korea grants funded by the Korea government (MSIP) (2016R1C1B ).

REFERENCES

[1] Multiplexing and Channel Coding, 3GPP TS , Rev , Jun
[2] IEEE Standard for Local and metropolitan area networks, Part 16: Air Interface for Fixed Broadband Wireless Access Systems, IEEE Std e-2005,
[3] IEEE Standard for Information Technology Telecommunications and Information Exchange between Local and Metropolitan Area Networks Specific Requirements Part 11: Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specifications, IEEE Std n-2009,
[4] W. Byun, H. Kim, and J.-H. Kim, "High throughput radix-4 SISO decoding architecture with reduced memory requirement," J. Semicond. Technol. Sci., vol. 14, no. 4, pp , Aug
[5] Y.-M. Jung, C.-H. Chung, Y.-H. Jung, and J.-S. Kim, "7.7 Gbps encoder design for IEEE ac QC-LDPC codes," J. Semicond. Technol. Sci., vol. 14, no. 4, pp , Aug
[6] C. Condo, M. Martina, and G. Masera, "VLSI implementation of a multi-mode turbo/LDPC decoder architecture," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 60, no. 6, pp , June
[7] Z. Wu and D. Liu, "Flexible multistandard FEC processor design with ASIP methodology," in Proc. IEEE Int. Conf. Application-specific Systems, Architectures and Processors (ASAP), 2014, pp
[8] F.
Naessens et al., "A mm mw reconfigurable LDPC and turbo encoder and decoder for n, e and 3GPP-LTE," in Proc. IEEE Symp. VLSI Circuits, 2010, pp
[9] B. Noethen et al., "A 105GOPS 36mm² heterogeneous SDR MPSoC with energy-aware dynamic scheduling and iterative detection-decoding for 4G in 65nm CMOS," in Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC), 2014, pp
[10] P. Murugappa, R. Al-Khayat, A. Baghdadi, and M. Jezequel, "A flexible high throughput multi-ASIP architecture for LDPC and turbo decoding," in Proc. Design, Automation Test in Europe Conf. Exhib. (DATE), 2012, pp
[11] Z. Wu, D. Liu, Z. Yang, Q. Wang, and W. Zhou, "FPGA implementation of a multi-algorithm parallel FEC for SDR platforms," in Proc. IEEE Int. Conf. Field Programmable Logic and Applications (FPL), 2014, pp
[12] S. Kunze, E. Matus, G. Fettweis, and T. Kobori, "Combining LDPC, turbo and Viterbi decoders: Benefits and cost," in Proc. Int. Workshop on Signal Process. Syst. (SiPS), 2011, pp
[13] J. Dion, M. Hamon, P. Penard, M. Arzel, and M. Jezequel, "Multi-standard trellis-based FEC decoder," in Proc. IEEE Conf. Design and Architectures for Signal and Image Processing (DASIP), 2012, pp
[14] J. L. Hennessy and D. A. Patterson, Computer Architecture: A Quantitative Approach, San Mateo, CA, USA: Morgan Kaufmann, 2011.
Meng Li received a Ph.D. degree in electrical engineering from Telecom Bretagne, France, in . She joined the Green Radio Group of IMEC in 2012 and is now a senior research engineer. Her research interests cover high-speed and low-power digital circuit design for essential components in wireless baseband, especially the design of decoders for error control codes.

Liesbet Van der Perre received the M.Sc. degree in electrical engineering from the KU Leuven, Belgium, in . The research for her thesis was completed at the Ecole Nationale Superieure de Telecommunications in Paris. She graduated summa cum laude with a Ph.D. degree in electrical engineering from the same university in . After finishing her Ph.D. on propagation modeling at the Telecommunications group of the KU Leuven, Belgium, Dr. Van der Perre joined IMEC in 1997 in the wireless group. She took up responsibilities as system architect, project leader, program manager, and scientific and program director, with a focus on energy efficiency in broadband communication, till . Currently, she is a professor in the electrical engineering department of the KU Leuven. She is an author and co-author of over 300 scientific publications. She was appointed honorary doctor at the faculty of engineering LTH, Lund University, in .

Wim Van Thillo received his master's degree in electrical engineering and his undergraduate degree in business economics from the Katholieke Universiteit Leuven, Belgium. He obtained a PhD degree from the same university based on his research in IMEC's wireless communications group. In 2008, he was a visiting researcher at UC Berkeley's Connectivity Lab. From 2012 to 2014 he led IMEC's 79 GHz radar research program. Since January 2015, Wim has been responsible for IMEC's R&D in cellular and WiFi transceivers, 60 GHz communications, 79 GHz radar, and 140 GHz sensors.

Youngjoo Lee received the B.S., M.S., and Ph.D. degrees in electrical engineering from Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Korea, in 2008, 2010, and 2014, respectively. Since February 2017, he has been an Assistant Professor in the Department of Electrical Engineering, POSTECH, Pohang, Korea. Prior to joining POSTECH, he was with Interuniversity Microelectronics Center (IMEC), Leuven, Belgium, from May 2014 to February 2015, where he researched reconfigurable SoC platforms for software-defined radio systems. From March 2015 to February 2017, he was with the Faculty of the Department of Electronic Engineering, Kwangwoon University, Seoul, Korea. His current research interests include algorithms and architectures for embedded processors, intelligent transportation systems, advanced error-correction codes, and mixed-signal circuit designs.
THE turbo code is one of the most attractive forward error
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II: EXPRESS BRIEFS, VOL. 63, NO. 2, FEBRUARY 2016 211 Memory-Reduced Turbo Decoding Architecture Using NII Metric Compression Youngjoo Lee, Member, IEEE, Meng
More information/$ IEEE
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II: EXPRESS BRIEFS, VOL. 56, NO. 1, JANUARY 2009 81 Bit-Level Extrinsic Information Exchange Method for Double-Binary Turbo Codes Ji-Hoon Kim, Student Member,
More informationISSCC 2003 / SESSION 8 / COMMUNICATIONS SIGNAL PROCESSING / PAPER 8.7
ISSCC 2003 / SESSION 8 / COMMUNICATIONS SIGNAL PROCESSING / PAPER 8.7 8.7 A Programmable Turbo Decoder for Multiple 3G Wireless Standards Myoung-Cheol Shin, In-Cheol Park KAIST, Daejeon, Republic of Korea
More informationHigh Throughput Radix-4 SISO Decoding Architecture with Reduced Memory Requirement
JOURNAL OF SEMICONDUCTOR TECHNOLOGY AND SCIENCE, VOL.14, NO.4, AUGUST, 2014 http://dx.doi.org/10.5573/jsts.2014.14.4.407 High Throughput Radix-4 SISO Decoding Architecture with Reduced Memory Requirement
More informationMemory-Reduced Turbo Decoding Architecture Using NII Metric Compression
Memory-Reduced Turbo Decoding Architecture Using NII Metric Compression Syed kareem saheb, Research scholar, Dept. of ECE, ANU, GUNTUR,A.P, INDIA. E-mail:sd_kareem@yahoo.com A. Srihitha PG student dept.
More informationBER Guaranteed Optimization and Implementation of Parallel Turbo Decoding on GPU
2013 8th International Conference on Communications and Networking in China (CHINACOM) BER Guaranteed Optimization and Implementation of Parallel Turbo Decoding on GPU Xiang Chen 1,2, Ji Zhu, Ziyu Wen,
More informationA Software LDPC Decoder Implemented on a Many-Core Array of Programmable Processors
A Software LDPC Decoder Implemented on a Many-Core Array of Programmable Processors Brent Bohnenstiehl and Bevan Baas Department of Electrical and Computer Engineering University of California, Davis {bvbohnen,
More informationStopping-free dynamic configuration of a multi-asip turbo decoder
2013 16th Euromicro Conference on Digital System Design Stopping-free dynamic configuration of a multi-asip turbo decoder Vianney Lapotre, Purushotham Murugappa, Guy Gogniat, Amer Baghdadi, Michael Hübner
More informationRECENTLY, researches on gigabit wireless personal area
146 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II: EXPRESS BRIEFS, VOL. 55, NO. 2, FEBRUARY 2008 An Indexed-Scaling Pipelined FFT Processor for OFDM-Based WPAN Applications Yuan Chen, Student Member, IEEE,
More informationA Scalable Multi-Core ASIP Virtual Platform For Standard-Compliant Trellis Decoding
A Scalable Multi-Core ASIP Virtual Platform For Standard-Compliant Trellis Decoding Matthias Jung, Christian Brehm, Norbert Wehn Microelectronic Systems Design Research Group University of Kaiserslautern,
More informationHigh Speed Downlink Packet Access efficient turbo decoder architecture: 3GPP Advanced Turbo Decoder
I J C T A, 9(24), 2016, pp. 291-298 International Science Press High Speed Downlink Packet Access efficient turbo decoder architecture: 3GPP Advanced Turbo Decoder Parvathy M.*, Ganesan R.*,** and Tefera
More informationLinköping University Post Print. epuma: a novel embedded parallel DSP platform for predictable computing
Linköping University Post Print epuma: a novel embedded parallel DSP platform for predictable computing Jian Wang, Joar Sohl, Olof Kraigher and Dake Liu N.B.: When citing this work, cite the original article.
More informationASIP LDPC DESIGN FOR AD AND AC
ASIP LDPC DESIGN FOR 802.11AD AND 802.11AC MENG LI CSI DEPARTMENT 3/NOV/2014 GDR-ISIS @ TELECOM BRETAGNE BREST FRANCE OUTLINES 1. Introduction of IMEC and CSI department 2. ASIP design flow 3. Template
More informationTHE orthogonal frequency-division multiplex (OFDM)
26 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II: EXPRESS BRIEFS, VOL. 57, NO. 1, JANUARY 2010 A Generalized Mixed-Radix Algorithm for Memory-Based FFT Processors Chen-Fong Hsiao, Yuan Chen, Member, IEEE,
More informationA 256-Radix Crossbar Switch Using Mux-Matrix-Mux Folded-Clos Topology
http://dx.doi.org/10.5573/jsts.2014.14.6.760 JOURNAL OF SEMICONDUCTOR TECHNOLOGY AND SCIENCE, VOL.14, NO.6, DECEMBER, 2014 A 256-Radix Crossbar Switch Using Mux-Matrix-Mux Folded-Clos Topology Sung-Joon Lee
More informationA scalable, fixed-shuffling, parallel FFT butterfly processing architecture for SDR environment
LETTER IEICE Electronics Express, Vol.11, No.2, 1 9 A scalable, fixed-shuffling, parallel FFT butterfly processing architecture for SDR environment Ting Chen a), Hengzhu Liu, and Botao Zhang College of
More informationFlexible wireless communication architectures
Flexible wireless communication architectures Sridhar Rajagopal Department of Electrical and Computer Engineering Rice University, Houston TX Faculty Candidate Seminar Southern Methodist University April
More informationEFFICIENT RECURSIVE IMPLEMENTATION OF A QUADRATIC PERMUTATION POLYNOMIAL INTERLEAVER FOR LONG TERM EVOLUTION SYSTEMS
Rev. Roum. Sci. Techn. Électrotechn. et Énerg. Vol. 61, 1, pp. 53–57, Bucarest, 2016 Électronique et transmission de l'information EFFICIENT RECURSIVE IMPLEMENTATION OF A QUADRATIC PERMUTATION POLYNOMIAL
More informationGated-Demultiplexer Tree Buffer for Low Power Using Clock Tree Based Gated Driver
Gated-Demultiplexer Tree Buffer for Low Power Using Clock Tree Based Gated Driver E.Kanniga 1, N. Imocha Singh 2,K.Selva Rama Rathnam 3 Professor Department of Electronics and Telecommunication, Bharath
More informationDesign of Low-Power and Low-Latency 256-Radix Crossbar Switch Using Hyper-X Network Topology
JOURNAL OF SEMICONDUCTOR TECHNOLOGY AND SCIENCE, VOL.15, NO.1, FEBRUARY, 2015 http://dx.doi.org/10.5573/jsts.2015.15.1.077 Design of Low-Power and Low-Latency 256-Radix Crossbar Switch Using Hyper-X Network
More informationFAST FOURIER TRANSFORM (FFT) and inverse fast
IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 39, NO. 11, NOVEMBER 2004 2005 A Dynamic Scaling FFT Processor for DVB-T Applications Yu-Wei Lin, Hsuan-Yu Liu, and Chen-Yi Lee Abstract This paper presents an
More informationA New List Decoding Algorithm for Short-Length TBCCs With CRC
Received May 15, 2018, accepted June 11, 2018, date of publication June 14, 2018, date of current version July 12, 2018. Digital Object Identifier 10.1109/ACCESS.2018.2847348 A New List Decoding Algorithm
More informationLow Complexity Architecture for Max* Operator of Log-MAP Turbo Decoder
International Journal of Current Engineering and Technology E-ISSN 2277 4106, P-ISSN 2347 5161 2015 INPRESSCO, All Rights Reserved Available at http://inpressco.com/category/ijcet Research Article Low
More informationA VLSI Architecture for H.264/AVC Variable Block Size Motion Estimation
Journal of Automation and Control Engineering Vol. 3, No. 1, February 20 A VLSI Architecture for H.264/AVC Variable Block Size Motion Estimation Dam. Minh Tung and Tran. Le Thang Dong Center of Electrical
More informationTwiddle Factor Transformation for Pipelined FFT Processing
Twiddle Factor Transformation for Pipelined FFT Processing In-Cheol Park, WonHee Son, and Ji-Hoon Kim School of EECS, Korea Advanced Institute of Science and Technology, Daejeon, Korea icpark@ee.kaist.ac.kr,
More informationReal-time and smooth scalable video streaming system with bitstream extractor intellectual property implementation
LETTER IEICE Electronics Express, Vol.11, No.5, 1 6 Real-time and smooth scalable video streaming system with bitstream extractor intellectual property implementation Liang-Hung Wang 1a), Yi-Mao Hsiao
More informationA CORDIC Algorithm with Improved Rotation Strategy for Embedded Applications
A CORDIC Algorithm with Improved Rotation Strategy for Embedded Applications Kui-Ting Chen Research Center of Information, Production and Systems, Waseda University, Fukuoka, Japan Email: nore@aoni.waseda.jp
More informationThe Serial Commutator FFT
The Serial Commutator FFT Mario Garrido Gálvez, Shen-Jui Huang, Sau-Gee Chen and Oscar Gustafsson Journal Article N.B.: When citing this work, cite the original article. 2016 IEEE. Personal use of this
More informationHigh Performance Memory Read Using Cross-Coupled Pull-up Circuitry
High Performance Memory Read Using Cross-Coupled Pull-up Circuitry Katie Blomster and José G. Delgado-Frias School of Electrical Engineering and Computer Science Washington State University Pullman, WA
More informationAn Area-Efficient BIRA With 1-D Spare Segments
206 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 26, NO. 1, JANUARY 2018 An Area-Efficient BIRA With 1-D Spare Segments Donghyun Kim, Hayoung Lee, and Sungho Kang Abstract The
More informationA Reconfigurable Crossbar Switch with Adaptive Bandwidth Control for Networks-on
A Reconfigurable Crossbar Switch with Adaptive Bandwidth Control for Networks-on-Chip Donghyun Kim, Kangmin Lee, Se-joong Lee and Hoi-Jun Yoo Semiconductor System Laboratory, Dept. of EECS, Korea Advanced
More informationLLR-based Successive-Cancellation List Decoder for Polar Codes with Multi-bit Decision
LLR-based Successive-Cancellation List Decoder for Polar Codes with Multi-bit Decision Bo Yuan and Keshab K. Parhi, Fellow,
More informationDUE to the high computational complexity and real-time
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 15, NO. 3, MARCH 2005 445 A Memory-Efficient Realization of Cyclic Convolution and Its Application to Discrete Cosine Transform Hun-Chen
More informationProgrammable Turbo Decoder Supporting Multiple Third-Generation Wireless Standards
Programmable Turbo Decoder Supporting Multiple Third-Generation Wireless Standards Myoung-Cheol Shin and In-Cheol Park Department of Electrical Engineering and Computer Science, KAIST Yuseong-gu Daejeon,
More informationThree DIMENSIONAL-CHIPS
IOSR Journal of Electronics and Communication Engineering (IOSR-JECE) ISSN: 2278-2834, ISBN: 2278-8735. Volume 3, Issue 4 (Sep-Oct. 2012), PP 22-27 Three DIMENSIONAL-CHIPS 1 Kumar.Keshamoni, 2 Mr. M. Harikrishna
More informationDelay efficient mux approach for finding the first two minimum values
Delay efficient mux approach for finding the first two minimum values Nakka Sivaraju¹ S Suman² ¹PG scholar, ECE Department, CEC, AP, India ²Assistant Professor, ECE Department, CEC, AP, India Abstract
More informationAdding C Programmability to Data Path Design
Adding C Programmability to Data Path Design Gert Goossens Sr. Director R&D, Synopsys May 6, 2015 1 Smart Products Drive SoC Developments Feature-Rich Multi-Sensing Multi-Output Wirelessly Connected Always-On
More informationEFFICIENT PARALLEL MEMORY ORGANIZATION FOR TURBO DECODERS
In Proceedings of the European Signal Processing Conference, pages 831-83, Poznan, Poland, September 2007. EFFICIENT PARALLEL MEMORY ORGANIZATION FOR TURBO DECODERS Perttu Salmela, Ruirui Gu*, Shuvra S.
More informationINTERNATIONAL JOURNAL OF PROFESSIONAL ENGINEERING STUDIES Volume 9 /Issue 3 / OCT 2017
Design of Low Power Adder in ALU Using Flexible Charge Recycling Dynamic Circuit Pallavi Mamidala 1 K. Anil kumar 2 mamidalapallavi@gmail.com 1 anilkumar10436@gmail.com 2 1 Assistant Professor, Dept of
More informationBaseline V IRAM Trimedia. Cycles ( x 1000 ) N
CS 252 COMPUTER ARCHITECTURE MAY 2000 An Investigation of the QR Decomposition Algorithm on Parallel Architectures Vito Dai and Brian Limketkai Abstract This paper presents an implementation of a QR decomposition
More informationA 50Mvertices/s Graphics Processor with Fixed-Point Programmable Vertex Shader for Mobile Applications
A 50Mvertices/s Graphics Processor with Fixed-Point Programmable Vertex Shader for Mobile Applications Ju-Ho Sohn, Jeong-Ho Woo, Min-Wuk Lee, Hye-Jung Kim, Ramchan Woo, Hoi-Jun Yoo Semiconductor System
More informationMPSOC 2011 BEAUNE, FRANCE
MPSOC 2011 BEAUNE, FRANCE BOADRES: A SCALABLE BASEBAND PROCESSOR TEMPLATE FOR Gbps RADIOS VICE PRESIDENT, CHAIRMAN OF THE TECHNOLOGY OFFICE PROFESSOR AT THE KATHOLIEKE UNIVERSITEIT LEUVEN STATUS SDR BASEBAND
More informationEfficient Implementation of Single Error Correction and Double Error Detection Code with Check Bit Precomputation
http://dx.doi.org/10.5573/jsts.2012.12.4.418 JOURNAL OF SEMICONDUCTOR TECHNOLOGY AND SCIENCE, VOL.12, NO.4, DECEMBER, 2012 Efficient Implementation of Single Error Correction and Double Error Detection
More informationLow-Power Adaptive Viterbi Decoder for TCM Using T-Algorithm
International Journal of Scientific and Research Publications, Volume 3, Issue 8, August 2013 1 Low-Power Adaptive Viterbi Decoder for TCM Using T-Algorithm MUCHHUMARRI SANTHI LATHA*, Smt. D.LALITHA KUMARI**
More informationNon-recursive complexity reduction encoding scheme for performance enhancement of polar codes
Non-recursive complexity reduction encoding scheme for performance enhancement of polar codes 1 Prakash K M, 2 Dr. G S Sunitha 1 Assistant Professor, Dept. of E&C, Bapuji Institute of Engineering and Technology,
More informationIEEE Proof Web Version
IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS 1 From Parallelism Levels to a Multi-ASIP Architecture for Turbo Decoding Olivier Muller, Member, IEEE, Amer Baghdadi, and Michel Jézéquel,
More informationDYNAMIC CIRCUIT TECHNIQUE FOR LOW- POWER MICROPROCESSORS Kuruva Hanumantha Rao 1 (M.tech)
DYNAMIC CIRCUIT TECHNIQUE FOR LOW- POWER MICROPROCESSORS Kuruva Hanumantha Rao 1 (M.tech) K.Prasad Babu 2 M.tech (Ph.d) hanumanthurao19@gmail.com 1 kprasadbabuece433@gmail.com 2 1 PG scholar, VLSI, St.JOHNS
More informationA Network Storage LSI Suitable for Home Network
258 HAN-KYU LIM et al : A NETWORK STORAGE LSI SUITABLE FOR HOME NETWORK A Network Storage LSI Suitable for Home Network Han-Kyu Lim*, Ji-Ho Han**, and Deog-Kyoon Jeong*** Abstract Storage over Ethernet (SoE) is
More informationAN FFT PROCESSOR BASED ON 16-POINT MODULE
AN FFT PROCESSOR BASED ON 16-POINT MODULE Weidong Li, Mark Vesterbacka and Lars Wanhammar Electronics Systems, Dept. of EE., Linköping University SE-581 83 LINKÖPING, SWEDEN E-mail: {weidongl, markv, larsw}@isy.liu.se,
More informationQuantized Iterative Message Passing Decoders with Low Error Floor for LDPC Codes
Quantized Iterative Message Passing Decoders with Low Error Floor for LDPC Codes Xiaojie Zhang and Paul H. Siegel University of California, San Diego 1. Introduction Low-density parity-check (LDPC) codes
More informationLow Power and Memory Efficient FFT Architecture Using Modified CORDIC Algorithm
Low Power and Memory Efficient FFT Architecture Using Modified CORDIC Algorithm 1 A.Malashri, 2 C.Paramasivam 1 PG Student, Department of Electronics and Communication K S Rangasamy College Of Technology,
More informationHigh Speed Special Function Unit for Graphics Processing Unit
High Speed Special Function Unit for Graphics Processing Unit Abd-Elrahman G. Qoutb 1, Abdullah M. El-Gunidy 1, Mohammed F. Tolba 1, and Magdy A. El-Moursy 2 1 Electrical Engineering Department, Fayoum
More informationParallel-computing approach for FFT implementation on digital signal processor (DSP)
Parallel-computing approach for FFT implementation on digital signal processor (DSP) Yi-Pin Hsu and Shin-Yu Lin Abstract An efficient parallel form in digital signal processor can improve the algorithm
More information/$ IEEE
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I: REGULAR PAPERS, VOL. 56, NO. 5, MAY 2009 1005 Low-Power Memory-Reduced Traceback MAP Decoding for Double-Binary Convolutional Turbo Decoder Cheng-Hung Lin,
More informationHigh-performance and Low-power Consumption Vector Processor for LTE Baseband LSI
High-performance and Low-power Consumption Vector Processor for LTE Baseband LSI Yi Ge Mitsuru Tomono Makiko Ito Yoshio Hirose Recently, the transmission rate for handheld devices has been increasing by
More informationA Ripple Carry Adder based Low Power Architecture of LMS Adaptive Filter
A Ripple Carry Adder based Low Power Architecture of LMS Adaptive Filter A.S. Sneka Priyaa PG Scholar Government College of Technology Coimbatore ABSTRACT The Least Mean Square Adaptive Filter is frequently
More information6T- SRAM for Low Power Consumption. Professor, Dept. of ExTC, PRMIT &R, Badnera, Amravati, Maharashtra, India 1
6T- SRAM for Low Power Consumption Mrs. J.N.Ingole 1, Ms.P.A.Mirge 2 Professor, Dept. of ExTC, PRMIT &R, Badnera, Amravati, Maharashtra, India 1 PG Student [Digital Electronics], Dept. of ExTC, PRMIT&R,
More informationLOW-DENSITY parity-check (LDPC) codes, which are defined
734 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II: EXPRESS BRIEFS, VOL. 56, NO. 9, SEPTEMBER 2009 Design of a Multimode QC-LDPC Decoder Based on Shift-Routing Network Chih-Hao Liu, Chien-Ching Lin, Shau-Wei
More informationA Low Power 32 Bit CMOS ROM Using a Novel ATD Circuit
International Journal of Electrical and Computer Engineering (IJECE) Vol. 3, No. 4, August 2013, pp. 509~515 ISSN: 2088-8708 509 A Low Power 32 Bit CMOS ROM Using a Novel ATD Circuit Sidhant Kukrety*,
More informationEfficient VLSI Huffman encoder implementation and its application in high rate serial data encoding
LETTER IEICE Electronics Express, Vol.14, No.21, 1 11 Efficient VLSI Huffman encoder implementation and its application in high rate serial data encoding Rongshan Wei a) and Xingang Zhang College of Physics
More informationOverlapped Scheduling for Folded LDPC Decoding Based on Matrix Permutation
Overlapped Scheduling for Folded LDPC Decoding Based on Matrix Permutation In-Cheol Park and Se-Hyeon Kang Department of Electrical Engineering and Computer Science, KAIST {icpark, shkang}@ics.kaist.ac.kr
More informationA New MIMO Detector Architecture Based on A Forward-Backward Trellis Algorithm
A New MIMO Detector Architecture Based on A Forward-Backward Trellis Algorithm Yang Sun and Joseph R Cavallaro Department of Electrical and Computer Engineering Rice University, Houston, TX 775 Email: {ysun,
More informationSoftware Defined Modems for The Internet of Things. Dr. John Haine, IP Operations Manager
Software Defined Modems for The Internet of Things Dr. John Haine, IP Operations Manager www.cognovo.com What things? 20 billion connected devices Manufactured for global markets Low cost Lifetimes from
More informationDevelopment of Dependable Wireless System and Device
December 6, 2013 JST International Symposium on Dependable VLSI Systems 2013 Development of Dependable Wireless System and Device Research Director: Kazuo Tsubouchi, Tohoku University Members: Akira Matsuzawa,
More informationHigh-Performance VLSI Architecture of H.264/AVC CAVLD by Parallel Run_before Estimation Algorithm *
JOURNAL OF INFORMATION SCIENCE AND ENGINEERING 29, 595-605 (2013) High-Performance VLSI Architecture of H.264/AVC CAVLD by Parallel Run_before Estimation Algorithm * JONGWOO BAE 1 AND JINSOO CHO 2,+ 1
More informationA MULTIBANK MEMORY-BASED VLSI ARCHITECTURE OF DIGITAL VIDEO BROADCASTING SYMBOL DEINTERLEAVER
A MULTIBANK MEMORY-BASED VLSI ARCHITECTURE OF DIGITAL VIDEO BROADCASTING SYMBOL DEINTERLEAVER D.HARI BABU 1, B.NEELIMA DEVI 2 1,2 Noble college of engineering and technology, Nadergul, Hyderabad, Abstract-
More informationDesign of Hierarchical Crossconnect WDM Networks Employing a Two-Stage Multiplexing Scheme of Waveband and Wavelength
166 IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, VOL. 20, NO. 1, JANUARY 2002 Design of Hierarchical Crossconnect WDM Networks Employing a Two-Stage Multiplexing Scheme of Waveband and Wavelength
More informationConfiguration latency constraint and frame duration /13/$ IEEE. a) Configuration latency constraint
An efficient on-chip configuration infrastructure for a flexible multi-asip turbo decoder architecture Vianney Lapotre, Michael Hübner, Guy Gogniat, Purushotham Murugappa, Amer Baghdadi and Jean-Philippe
More informationEfficient Majority Logic Fault Detector/Corrector Using Euclidean Geometry Low Density Parity Check (EG-LDPC) Codes
Efficient Majority Logic Fault Detector/Corrector Using Euclidean Geometry Low Density Parity Check (EG-LDPC) Codes 1 U.Rahila Begum, 2 V. Padmajothi 1 PG Student, 2 Assistant Professor 1 Department Of
More information/$ IEEE
IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 14, NO. 10, OCTOBER 2006 1147 Transactions Briefs Highly-Parallel Decoding Architectures for Convolutional Turbo Codes Zhiyong He,
More informationProcessor Architectures At A Glance: M.I.T. Raw vs. UC Davis AsAP
Processor Architectures At A Glance: M.I.T. Raw vs. UC Davis AsAP Presenter: Course: EEC 289Q: Reconfigurable Computing Course Instructor: Professor Soheil Ghiasi Outline Overview of M.I.T. Raw processor
More informationENERGY-EFFICIENT VLSI REALIZATION OF BINARY64 DIVISION WITH REDUNDANT NUMBER SYSTEMS 1 AVANIGADDA. NAGA SANDHYA RANI
ENERGY-EFFICIENT VLSI REALIZATION OF BINARY64 DIVISION WITH REDUNDANT NUMBER SYSTEMS 1 AVANIGADDA. NAGA SANDHYA RANI 2 BALA KRISHNA.KONDA M.Tech, Assistant Professor 1,2 Eluru College Of Engineering And
More informationA Modified DRR-Based Non-real-time Service Scheduling Scheme in Wireless Metropolitan Networks
A Modified DRR-Based Non-real-time Service Scheduling Scheme in Wireless Metropolitan Networks Han-Sheng Chuang 1, Liang-Teh Lee 1 and Chen-Feng Wu 2 1 Department of Computer Science and Engineering, Tatung
More informationMulti-Megabit Channel Decoder
MPSoC 13 July 15-19, 2013 Otsu, Japan Multi-Gigabit Channel Decoders Ten Years After Norbert Wehn wehn@eit.uni-kl.de Multi-Megabit Channel Decoder MPSoC 03 N. Wehn UMTS standard: 2 Mbit/s throughput requirements
More informationINTERNATIONAL JOURNAL OF PROFESSIONAL ENGINEERING STUDIES Volume VII /Issue 2 / OCT 2016
NEW VLSI ARCHITECTURE FOR EXPLOITING CARRY- SAVE ARITHMETIC USING VERILOG HDL B.Anusha 1 Ch.Ramesh 2 shivajeehul@gmail.com 1 chintala12271@rediffmail.com 2 1 PG Scholar, Dept of ECE, Ganapathy Engineering
More informationWITH the development of the semiconductor technology,
Dual-Link Hierarchical Cluster-Based Interconnect Architecture for 3D Network on Chip Guang Sun, Yong Li, Yuanyuan Zhang, Shijun Lin, Li Su, Depeng Jin and Lieguang Zeng Abstract Network on Chip (NoC)
More informationEnhanced Reconfigurable Viterbi Decoder with NoC for OFDM Block of a Wireless Standard
Enhanced Reconfigurable Viterbi Decoder with NoC for OFDM Block of a Wireless Standard Dr. D. Ganeshkumar 1, C.R. Suganya devi 2, Dr. V. Parimala 3 1 Professor and Head, Department of Electronics and Communication
More informationNear Optimal Repair Rate Built-in Redundancy Analysis with Very Small Hardware Overhead
Near Optimal Repair Rate Built-in Redundancy Analysis with Very Small Hardware Overhead Woosung Lee, Keewon Cho, Jooyoung Kim, and Sungho Kang Department of Electrical & Electronic Engineering, Yonsei
More informationGrid Middleware for Realizing Autonomous Resource Sharing: Grid Service Platform
Grid Middleware for Realizing Autonomous Resource Sharing: Grid Service Platform V Soichi Shigeta V Haruyasu Ueda V Nobutaka Imamura (Manuscript received April 19, 2007) These days, many enterprises are
More informationA High-Speed FPGA Implementation of an RSD-Based ECC Processor
RESEARCH ARTICLE International Journal of Engineering and Techniques - Volume 4 Issue 1, Jan Feb 2018 A High-Speed FPGA Implementation of an RSD-Based ECC Processor 1 K Durga Prasad, 2 M.Suresh kumar 1
More informationUsing FPGA for Computer Architecture/Organization Education
IEEE Computer Society Technical Committee on Computer Architecture Newsletter, Jun. 1996, pp.31 35 Using FPGA for Computer Architecture/Organization Education Yamin Li and Wanming Chu Computer Architecture
More informationDesign and Analysis of Kogge-Stone and Han-Carlson Adders in 130nm CMOS Technology
Design and Analysis of Kogge-Stone and Han-Carlson Adders in 130nm CMOS Technology Senthil Ganesh R & R. Kalaimathi 1 Assistant Professor, Electronics and Communication Engineering, Info Institute of Engineering,
More informationISSN: [Garade* et al., 6(1): January, 2017] Impact Factor: 4.116
IJESRT INTERNATIONAL JOURNAL OF ENGINEERING SCIENCES & RESEARCH TECHNOLOGY FULLY REUSED VLSI ARCHITECTURE OF DSRC ENCODERS USING SOLS TECHNIQUE Supriya Shivaji Garade*, Prof. P. R. Badadapure * Department
More informationISSN Vol.05,Issue.09, September-2017, Pages:
WWW.IJITECH.ORG ISSN 2321-8665 Vol.05,Issue.09, September-2017, Pages:1693-1697 AJJAM PUSHPA 1, C. H. RAMA MOHAN 2 1 PG Scholar, Dept of ECE(DECS), Shirdi Sai Institute of Science and Technology, Anantapuramu,
More informationPERFORMANCE ANALYSIS OF HIGH EFFICIENCY LOW DENSITY PARITY-CHECK CODE DECODER FOR LOW POWER APPLICATIONS
American Journal of Applied Sciences 11 (4): 558-563, 2014 ISSN: 1546-9239 2014 Science Publication doi:10.3844/ajassp.2014.558.563 Published Online 11 (4) 2014 (http://www.thescipub.com/ajas.toc) PERFORMANCE
More informationHigh performance, power-efficient DSPs based on the TI C64x
High performance, power-efficient DSPs based on the TI C64x Sridhar Rajagopal, Joseph R. Cavallaro, Scott Rixner Rice University {sridhar,cavallar,rixner}@rice.edu RICE UNIVERSITY Recent (2003) Research
More informationConcurrent Testing with RF
Concurrent Testing with RF Jeff Brenner Verigy US EK Tan Verigy Singapore go/semi March 2010 1 Introduction Integration of multiple functional cores can be accomplished through the development of either
More informationOptimization of Task Scheduling and Memory Partitioning for Multiprocessor System on Chip
Optimization of Task Scheduling and Memory Partitioning for Multiprocessor System on Chip 1 Mythili.R, 2 Mugilan.D 1 PG Student, Department of Electronics and Communication K S Rangasamy College Of Technology,
More informationERROR correcting codes are used to increase the bandwidth
404 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 37, NO. 3, MARCH 2002 A 690-mW 1-Gb/s 1024-b, Rate-1/2 Low-Density Parity-Check Code Decoder Andrew J. Blanksby and Chris J. Howland Abstract A 1024-b, rate-1/2,
More informationMassively Parallel Computing on Silicon: SIMD Implementations. V.M.. Brea Univ. of Santiago de Compostela Spain
Massively Parallel Computing on Silicon: SIMD Implementations V.M. Brea Univ. of Santiago de Compostela Spain GOAL Give an overview on the state-of-the-art of Digital on-chip CMOS SIMD Solutions,
More informationOn combining chase-2 and sum-product algorithms for LDPC codes
University of Wollongong Research Online Faculty of Engineering and Information Sciences - Papers: Part A Faculty of Engineering and Information Sciences 2012 On combining chase-2 and sum-product algorithms
More informationImplementation of Reduce the Area- Power Efficient Fixed-Point LMS Adaptive Filter with Low Adaptation-Delay
Implementation of Reduce the Area- Power Efficient Fixed-Point LMS Adaptive Filter with Low Adaptation-Delay A.Sakthivel 1, A.Lalithakumar 2, T.Kowsalya 3 PG Scholar [VLSI], Muthayammal Engineering College,
More informationArea And Power Efficient LMS Adaptive Filter With Low Adaptation Delay
e-issn: 2349-9745 p-issn: 2393-8161 Scientific Journal Impact Factor (SJIF): 1.711 International Journal of Modern Trends in Engineering and Research www.ijmter.com Area And Power Efficient LMS Adaptive
More informationDesign of Viterbi Decoder for Noisy Channel on FPGA Ms. M.B.Mulik, Prof. U.L.Bombale, Prof. P.C.Bhaskar
International Journal of Scientific & Engineering Research Volume 2, Issue 6, June-2011 1 Design of Viterbi Decoder for Noisy Channel on FPGA Ms. M.B.Mulik, Prof. U.L.Bombale, Prof. P.C.Bhaskar Abstract
More informationSynthetic Benchmark Generator for the MOLEN Processor
Synthetic Benchmark Generator for the MOLEN Processor Stephan Wong, Guanzhou Luo, and Sorin Cotofana Computer Engineering Laboratory, Electrical Engineering Department, Delft University of Technology,
More informationA Comparison of Two Algorithms Involving Montgomery Modular Multiplication
ISSN (Online) : 2319-8753 ISSN (Print) : 2347-6710 International Journal of Innovative Research in Science, Engineering and Technology An ISO 3297: 2007 Certified Organization Volume 6, Special Issue 5,
More informationA High-Throughput Processor for Cryptographic Hash Functions
A High-Throughput Processor for Cryptographic Hash Functions Yuanhong Huo and Dake Liu Beijing Institute of Technology, Beijing 100081, China Email: {hyh, dake}@bit.edu.cn Abstract This paper presents a
More informationEFFICIENT FAILURE PROCESSING ARCHITECTURE IN REGULAR EXPRESSION PROCESSOR
EFFICIENT FAILURE PROCESSING ARCHITECTURE IN REGULAR EXPRESSION PROCESSOR ABSTRACT SangKyun Yun Department of Computer and Telecom. Engineering, Yonsei University, Wonju, Korea skyun@yonsei.ac.kr Regular
More informationAnalysis of Different Multiplication Algorithms & FPGA Implementation
IOSR Journal of VLSI and Signal Processing (IOSR-JVSP) Volume 4, Issue 2, Ver. I (Mar-Apr. 2014), PP 29-35 e-issn: 2319 4200, p-issn No. : 2319 4197 Analysis of Different Multiplication Algorithms & FPGA
More informationFPGA IMPLEMENTATION OF FLOATING POINT ADDER AND MULTIPLIER UNDER ROUND TO NEAREST
FPGA IMPLEMENTATION OF FLOATING POINT ADDER AND MULTIPLIER UNDER ROUND TO NEAREST SAKTHIVEL Assistant Professor, Department of ECE, Coimbatore Institute of Engineering and Technology Abstract- FPGA is
More information