Flexible wireless communication architectures
1 Flexible wireless communication architectures
Sridhar Rajagopal
Department of Electrical and Computer Engineering, Rice University, Houston TX
Faculty Candidate Seminar, Southern Methodist University, April 23, 2003
This work has been supported in part by NSF, Nokia and Texas Instruments.
2 Future wireless devices demand flexibility
Wireless cellular, wireless LAN, Bluetooth/home networks
Multiple algorithms and environments supported in the same device
High data rate mobile devices with multimedia
Flexible algorithms: multiple antennas, complex signal processing
Flexible architectures: high performance (Mbps), low power (mW)
Fast design with structured exploration
3 Flexibility needed in different layers
Application layer: Puppeteer project at Rice
Network layer
MAC layer
Physical layer: flexible algorithms, mapping, flexible architectures
Analog RF
4 Research vision: Attain flexibility
Design me
Algorithms: flexibility to support a variety of sophisticated algorithms
Architectures: flexibility that adapts hardware to algorithms
Fast, structured design exploration
5 Contributions: Algorithms
Multi-user channel estimation [Jnl. of VLSI Sig. Proc. 02, ASAP 00]: matrix inversions; numerical techniques (conjugate-gradient descent) for complexity reduction
Multi-user detection [ISCAS 01]: block-based computation to streaming computations; pipelining, lower memory requirements
Parallel, fixed-point, streaming VLSI implementations [IEEE Trans. Wireless Comm. 02]
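As an illustration of how an iterative method can stand in for an explicit matrix inversion, here is a minimal sketch of the textbook conjugate-gradient iteration for a real, symmetric positive-definite system A x = b. It is not the estimator from the cited papers: the function name cg_solve, the size cap CG_MAX_N and the real-valued data are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

#define CG_MAX_N 16  /* hypothetical size cap for this sketch */

/* Solve A x = b for a symmetric positive-definite n x n matrix A (row-major)
 * with the conjugate-gradient method. x holds the initial guess on entry and
 * the solution estimate on return. Returns the number of iterations run. */
int cg_solve(size_t n, const double *A, const double *b, double *x,
             int max_iter, double tol)
{
    double r[CG_MAX_N], p[CG_MAX_N], Ap[CG_MAX_N];
    double rs_old = 0.0;
    int k = 0;

    assert(n > 0 && n <= CG_MAX_N);

    /* Initial residual r = b - A x and first search direction p = r. */
    for (size_t i = 0; i < n; i++) {
        double ax = 0.0;
        for (size_t j = 0; j < n; j++)
            ax += A[i * n + j] * x[j];
        r[i] = b[i] - ax;
        p[i] = r[i];
        rs_old += r[i] * r[i];
    }

    /* Iterate until the squared residual norm drops below tol^2. */
    while (k < max_iter && rs_old > tol * tol) {
        double pAp = 0.0, rs_new = 0.0;

        for (size_t i = 0; i < n; i++) {
            Ap[i] = 0.0;
            for (size_t j = 0; j < n; j++)
                Ap[i] += A[i * n + j] * p[j];
            pAp += p[i] * Ap[i];
        }

        double alpha = rs_old / pAp;      /* step length along p */
        for (size_t i = 0; i < n; i++) {
            x[i] += alpha * p[i];
            r[i] -= alpha * Ap[i];
            rs_new += r[i] * r[i];
        }

        double beta = rs_new / rs_old;    /* keeps directions A-conjugate */
        for (size_t i = 0; i < n; i++)
            p[i] = r[i] + beta * p[i];

        rs_old = rs_new;
        k++;
    }
    return k;
}
```

Each iteration costs one matrix-vector product, so stopping after a few iterations trades accuracy for a lower operation count than a full inversion.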
6 Contributions: Architectures
Heterogeneous DSP-FPGA system designs [ICSPAT 00]
Computer arithmetic [Symp. on Comp. Arith. 01]: dynamic truncation in ASICs using on-line arithmetic with Most Significant Digit First computation
[Ph.D. Thesis] Scalable Wireless Application-specific Processors (SWAPs): rapid, structured architectures with flexibility-performance tradeoffs
7 Scalable Wireless Application-specific Processors
Family of flexible programmable processors
Clusters of ALUs
High performance by supporting 100s of ALUs
Can provide customization for various algorithms
Adapts ("swaps") architecture dynamically for power
[Figure: scale ALUs; scale clusters]
8 Rapid, structured design for SWAPs
[Figure: flow diagram with boxes labeled: low complexity, parallel, fixed point algorithms; ASIC design; DSP design; architecture exploration; SWAPs]
9 Research vision summary
Provide a structured framework to rapidly explore flexible, high performance, low power architectures (SWAPs)
Efficient algorithm design for mapping to SWAPs
Understanding of algorithms, DSPs and ASICs used
Flexibility-performance trade-offs
Inter-disciplinary research: wireless communications, VLSI signal processing, computer architecture, computer arithmetic, circuits, CAD, compilers
10 Talk Outline
Research vision
SWAPs: Background
Algorithm design for SWAPs
Architecture design for SWAPs
Current and Future Research Goals
11 SWAPs borrow from DSPs
DSPs use instruction level parallelism (ILP) and subword parallelism (MMX)
Not enough ALUs for GOPs of computation: need 100s
TI C6x has 8 ALUs
Why not more ALUs?
Cannot support more registers (area, ports)
Difficult to find ILP as ALUs increase
[Figure labels: 1 ALU; RF = register file]
12 SWAPs borrow from ASICs
Exploit data parallelism (DP)
Available in many wireless algorithms
This is what ASICs do!
    int i, a[n], b[n], sum[n];         // 32 bits
    short int c[n], d[n], diff[n];     // 16 bits, packed
    for (i = 0; i < 1024; i++) {
        sum[i] = a[i] + b[i];          // independent across i: data parallel
        diff[i] = c[i] - d[i];         // 16-bit operands: subword parallel
    }
[Figure labels: Subword, ILP, DP]
13 SWAPs borrow from stream processors
Kernels (computation) and streams (communication)
Use local data in clusters providing GOPs support
Imagine stream processor at Stanford [Rixner 01]
[Figure: kernels and streams. Input data: received signal. Kernels: correlator, channel estimation, matched filter, interference cancellation, Viterbi decoding. Output data: decoded bits.]
Scott Rixner. Stream Processor Architecture, Kluwer Academic Publishers: Boston, MA, 2001.
14 SWAPs are multi-cluster DSPs
[Figure labels: DSP (1 cluster) with internal memory, ILP; SWAP with memory as a Stream Register File (SRF), ILP and DP]
SWAPs adapt clusters to DP
Identical clusters, same operations
Power-down unused FUs, clusters
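The "identical clusters, same operations" idea can be sketched in plain C by emulating the clusters one after another: every cluster runs the same kernel on its own strided slice of the data. The names cluster_kernel and dp_add and the strided assignment of elements to clusters are assumptions for illustration; on the real hardware the clusters would run concurrently under one instruction stream.

```c
#include <assert.h>
#include <stddef.h>

/* Work done by one cluster: every cluster executes this same code,
 * but only touches elements cluster, cluster + C, cluster + 2C, ... */
static void cluster_kernel(size_t cluster, size_t num_clusters, size_t n,
                           const int *a, const int *b, int *sum)
{
    for (size_t i = cluster; i < n; i += num_clusters)
        sum[i] = a[i] + b[i];
}

/* Sequential emulation of num_clusters identical clusters working on
 * one data-parallel loop of n independent iterations. */
void dp_add(size_t num_clusters, size_t n,
            const int *a, const int *b, int *sum)
{
    assert(num_clusters > 0);
    for (size_t c = 0; c < num_clusters; c++)
        cluster_kernel(c, num_clusters, n, a, b, sum);
}
```

Because no iteration depends on another, the result is the same for any number of clusters; only the number of elements each cluster handles changes.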
15 Arithmetic clusters in SWAPs
[Figure: arithmetic cluster. Labels: from/to SRF; distributed register files (supports more ALUs); cross point; scratchpad (indexed accesses); comm. unit; intercluster network]
16 Talk Outline
Research vision
SWAPs: Background
Algorithm design for SWAPs
Architecture design for SWAPs
Current and Future Research Goals
17 SWAPs: Physical layer algorithms
[Figure: receiver chain. Antenna; RF front-end; baseband processing (channel estimation, detection, decoding); higher (MAC/network/OS) layers]
Complex signal processing algorithms with GOPs of computation
18 SWAP mapping example: Viterbi decoding
Multiple antenna systems (MIMO systems): complexity exponential with transmit x receive antennas
Estimation: linear MMSE, blind, conjugate gradient
Detection: FFT, (blind) interference cancellation
Decoding: Viterbi, Turbo, LDPC
And joint schemes
SWAP flexibility lets you use the best algorithms for the situation
Example for concept demonstration: Viterbi decoding
19 Parallel Viterbi Decoding for SWAPs
[Figure: detected bits, ACS unit, traceback unit, decoded bits]
Add-Compare-Select (ACS): trellis interconnect, computations
Parallelism depends on constraint length (#states)
Traceback: searching
Conventional traceback is sequential (no DP) with dynamic branching
Difficult to implement in a parallel architecture
Use Register Exchange (RE) as the parallel solution
20 Parallel Viterbi needs re-ordering for SWAPs
[Figure: 16 trellis states X(0) to X(15), shown for regular ACS and for ACS in SWAPs. The DP vector appears in natural order X(0), X(1), ..., X(15) and in even/odd re-ordered form X(0), X(2), ..., X(14), X(1), X(3), ..., X(15).]
Exploiting Viterbi DP in SWAPs:
Use RE instead of regular traceback
Re-order ACS, RE
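To make the ACS and register-exchange steps concrete, here is a minimal hard-decision sketch for a small rate 1/2 code. Everything specific in it is an assumption chosen for brevity: constraint length K = 3 with generators 7 and 5 (octal), Hamming branch metrics, survivor registers held in 64-bit words, and the function names. It does not reproduce the state re-ordering used to map ACS onto SWAP clusters.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_STATES 4          /* K = 3 gives 2^(K-1) = 4 states */
#define BIG_METRIC 1000000u   /* marks states that cannot be the start */

/* Two coded bits (packed as g0 << 1 | g1) for input u leaving state s.
 * State s holds the two previous inputs: bit 1 = newest, bit 0 = oldest. */
static unsigned branch_output(unsigned s, unsigned u)
{
    unsigned s1 = s >> 1, s0 = s & 1u;
    unsigned g0 = u ^ s1 ^ s0;   /* generator 7 (octal) */
    unsigned g1 = u ^ s0;        /* generator 5 (octal) */
    return (g0 << 1) | g1;
}

/* Encode n bits (each 0 or 1); writes one 2-bit symbol per input bit. */
void conv_encode(const uint8_t *bits, size_t n, uint8_t *out)
{
    unsigned s = 0;
    for (size_t t = 0; t < n; t++) {
        out[t] = (uint8_t)branch_output(s, bits[t]);
        s = ((unsigned)bits[t] << 1) | (s >> 1);
    }
}

/* Viterbi decode of n received 2-bit symbols (n <= 64) using register
 * exchange: each state carries its whole survivor bit sequence, so no
 * traceback pass is needed. Assumes the encoder starts and ends in state 0. */
void viterbi_re_decode(const uint8_t *rx, size_t n, uint8_t *decoded)
{
    unsigned pm[NUM_STATES] = {0, BIG_METRIC, BIG_METRIC, BIG_METRIC};
    uint64_t reg[NUM_STATES] = {0, 0, 0, 0};

    assert(n <= 64);
    for (size_t t = 0; t < n; t++) {
        unsigned new_pm[NUM_STATES];
        uint64_t new_reg[NUM_STATES];

        /* The loop body is identical for every state: this is the DP. */
        for (unsigned ns = 0; ns < NUM_STATES; ns++) {
            unsigned u = ns >> 1;            /* input bit that leads to ns */
            unsigned best_s = 0, best_m = 0;
            for (unsigned b = 0; b < 2; b++) {
                unsigned s = ((ns & 1u) << 1) | b;          /* predecessor */
                unsigned diff = branch_output(s, u) ^ (rx[t] & 3u);
                unsigned m = pm[s] + (diff & 1u) + (diff >> 1);  /* add */
                if (b == 0 || m < best_m) {         /* compare and select */
                    best_m = m;
                    best_s = s;
                }
            }
            new_pm[ns] = best_m;
            new_reg[ns] = (reg[best_s] << 1) | u;   /* register exchange */
        }
        for (unsigned s = 0; s < NUM_STATES; s++) {
            pm[s] = new_pm[s];
            reg[s] = new_reg[s];
        }
    }
    /* The survivor register of the final state 0 is the decoded sequence. */
    for (size_t t = 0; t < n; t++)
        decoded[t] = (uint8_t)((reg[0] >> (n - 1 - t)) & 1u);
}
```

The copy of a whole survivor register per state per step is what replaces the sequential, branch-heavy traceback: it costs more data movement but has the same regular, per-state structure as ACS.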
21 Talk Outline
Research vision
SWAPs: Background
Algorithm design for SWAPs
Architecture design for SWAPs
Current and Future Research Goals
22 SWAP architecture design
More clusters are better than more ALUs per cluster (if #clusters > 2)
1. Decide how many clusters: exploit DP
2. Decide what to put within each cluster: maximize ILP with high functional unit efficiency; search the design space with the explore tool
Time-power-area characterization
[Figure labels: ILP, DP]
23 Design a SWAP cluster: Explore
[Figure: auto-exploration of adders and multipliers for ACS. Instruction count plotted against #adders and #multipliers; each point is labeled (adder util%, multiplier util%).]
24 Explore tool benefits
Instruction count vs. ALU efficiency
What goes inside each cluster
Design customized application-specific units
Better performance with increased ALU utilization
Explore multiple algorithms: turn off functional units not in use for a given kernel
Vdd-gating, clock gating techniques
25 Example for SWAP architecture design
Explore Algorithm 1: 3 adders, 3 multipliers, 32 clusters
Explore Algorithm 2: 4 adders, 1 multiplier, 64 clusters
Explore Algorithm 3: 2 adders, 2 multipliers, 64 clusters
Explore Algorithm 4: 2 adders, 2 multipliers, 16 clusters
Chosen architecture: 4 adders, 3 multipliers, 64 clusters
[Figure labels: DP, ILP]
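The chosen architecture on this slide equals the per-resource maximum over the four explored algorithms. A minimal sketch of that selection rule follows; the type swap_config and the function choose_architecture are hypothetical names, and taking the element-wise maximum is a reading of the slide's numbers rather than a documented rule of the explore tool.

```c
#include <assert.h>
#include <stddef.h>

/* Resource needs of one algorithm as reported by exploration. */
typedef struct {
    int adders;       /* adders per cluster */
    int multipliers;  /* multipliers per cluster */
    int clusters;     /* number of clusters */
} swap_config;

/* Pick an architecture that covers every algorithm: the maximum of
 * each resource over all per-algorithm requirements. */
swap_config choose_architecture(const swap_config *reqs, size_t count)
{
    assert(count > 0);
    swap_config chosen = reqs[0];
    for (size_t i = 1; i < count; i++) {
        if (reqs[i].adders > chosen.adders)
            chosen.adders = reqs[i].adders;
        if (reqs[i].multipliers > chosen.multipliers)
            chosen.multipliers = reqs[i].multipliers;
        if (reqs[i].clusters > chosen.clusters)
            chosen.clusters = reqs[i].clusters;
    }
    return chosen;
}
```

Fed the four requirement sets from the slide, the function returns 4 adders, 3 multipliers and 64 clusters; any algorithm that needs less leaves units idle, which is where turning off ALUs and clusters comes in.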
26 SWAP flexibility provides power savings
Multiple algorithms: different ALU, cluster requirements
Turning off ALUs (add, mul compiler options): use the right #ALUs from the explore tool
Turning off clusters:
Data is spread across the SRF of all clusters
A cluster only has access to its own SRF
The next kernel may need data from the SRF of other clusters
Reconfiguration support needs to be provided
27 SWAPs provide cluster reconfiguration
[Figure: SRF connected to the clusters through a mux-demux network (MDX) with latches and stream buffers]
Additional latency (few cycles) due to microcontroller stalls: minimal loss in performance
28 Cluster reconfiguration for Viterbi
Packet 1: constraint length 7 (16 clusters)
Packet 2: constraint length 9 (64 clusters)
Packet 3: constraint length 5 (4 clusters)
[Figure labels: DP; can be turned OFF]
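The cluster counts on this slide are consistent with a trellis of 2^(K-1) states spread at four states per cluster (64, 256 and 16 states for K = 7, 9 and 5). The sketch below encodes that relationship; the ratio of four is inferred from the slide's numbers and the function name viterbi_clusters is hypothetical.

```c
#include <assert.h>

/* Number of clusters to keep powered for a Viterbi decoder with
 * constraint length k, given how many trellis states each cluster
 * handles and how many clusters the machine has. */
unsigned viterbi_clusters(unsigned k, unsigned states_per_cluster,
                          unsigned max_clusters)
{
    assert(k >= 1 && k <= 31);
    assert(states_per_cluster > 0 && max_clusters > 0);

    unsigned states = 1u << (k - 1);   /* 2^(K-1) trellis states */
    unsigned clusters =
        (states + states_per_cluster - 1) / states_per_cluster;  /* round up */

    /* With fewer clusters than needed, each one must take more states. */
    if (clusters > max_clusters)
        clusters = max_clusters;
    return clusters;
}
```

With states_per_cluster = 4 and max_clusters = 64 this gives 16, 64 and 4 clusters for K = 7, 9 and 5, so the remaining clusters can be turned off for the shorter constraint lengths.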
29 SWAPs provide flexibility at negligible overhead
[Figure: execution time (cycles) of cluster kernels (computation) and memory accesses for 64-bit, rate 1/2 packets: Packet 1 (K = 7), Packet 2 (K = 9), Packet 3 (K = 5). Legend: kernels (computation), memory accesses, no data.]
30 SWAP exploration for Viterbi decoding
[Figure: frequency needed to attain real-time (in MHz) vs. number of clusters for K = 9, K = 7 and K = 5. Curves: DSP; different SWAPs (without reconfiguration); same SWAP (with reconfiguration). Max DP is marked.]
Ideal C64x (w/o co-proc) needs ~200 MHz for real-time
31 SWAPs: Salient features
1-2 orders of magnitude better than a DSP
Any constraint length: 10 MHz at 128 Kbps
Same code for all constraint lengths: no need to re-compile or load another code as long as the parallelism/cluster ratio is constant
Power savings due to dynamic cluster scaling
32 Expected SWAP power consumption
Power model based on [Khailany 03]
64 clusters and 1 multiplier per cluster: 0.13 micron, 1.2 V
Peak active power: ~9 mW at 1 MHz (DSP ~1 mW)
Area: ~53.7 mm^2
10 MHz, 128 Kbps with reconfiguration:
Viterbi       Clusters used   Peak power
K = 9         64              ~90 mW
K = 7         16              ~28.57 mW
K = 5         4               ~13.8 mW
overhead      0               ~8.1 mW
DSP, K = 9    1               ~200 mW
[Figure: power (in mW) vs. active clusters (max 64)]
Brucek Khailany et al. Exploring the VLSI Scalability of Stream Processors, Proceedings of the Ninth Symposium on High Performance Computer Architecture, February 8-12, 2003.
33 Multiuser Estimation-Detection-Decoding
Real-time target: 128 Kbps per user
[Figure: frequency needed to attain real-time (in MHz) vs. number of clusters, with a DSP for comparison. Labels: 32-user base-station; mobile; fading scenarios fast, medium, slow.]
Ideal C64x (w/o co-proc) needs ~15 GHz for real-time
34 Expected SWAP power: base-station
32-user base-station with 3 multipliers per cluster and 64 clusters: 0.13 micron, 1.2 V
Peak active power: ~18.19 mW at 1 MHz (increased number of multipliers)
Area: ~93.4 mm^2
Total peak base-station power consumption: ~18.19 W at 1 GHz for 32 users at 128 Kbps/user
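The power figures on the slides scale the peak active power quoted at 1 MHz by the clock frequency: ~9 mW per MHz at 10 MHz gives ~90 mW for the 64-cluster Viterbi case, and ~18.19 mW per MHz at 1 GHz gives ~18.19 W for the base-station. A minimal sketch of that arithmetic, assuming power grows linearly with frequency at a fixed supply voltage (the function name peak_power_mw is hypothetical):

```c
#include <assert.h>

/* Peak active power in mW, assuming it scales linearly with clock
 * frequency at a fixed supply voltage and fixed set of active units. */
double peak_power_mw(double mw_per_mhz, double freq_mhz)
{
    assert(mw_per_mhz >= 0.0 && freq_mhz >= 0.0);
    return mw_per_mhz * freq_mhz;
}
```

For example, peak_power_mw(9.0, 10.0) corresponds to the ~90 mW entry for K = 9, and peak_power_mw(18.19, 1000.0) to the ~18.19 W base-station total. The model ignores leakage and the savings from turning off clusters, which the per-constraint-length table accounts for separately.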
35 Talk Outline
Research vision
SWAPs: Background
Algorithm design for SWAPs
Architecture design for SWAPs
Current and Future Research Goals
36 Current research: Flexibility vs. performance
SWAPs: 128 Kbps at ~ mW for Viterbi
Borrow DP from ASICs!
Suitable for base-stations: flexibility more important than power
Suitable for mobile devices: power constraints are tighter; can be customized for further power savings
Handset SWAPs (H-SWAPs): borrow task pipelining from ASICs!
Application-specific units and specialized comm. network
37 Handset SWAPs: H-SWAPs
Trade data parallelism for task pipelining
[Figure: SWAPs (max. clusters and reconfigure) with an SRF and full DP; a SWAPlet (limit clusters) with limited DP; H-SWAPs (collection of customized SWAPlets), each with limited DP]
38 Sample points in architecture exploration
Programmable solutions with increased customization:
DSPs (1 cluster): ILP, subword
SWAPs (multiple clusters): ILP, subword, DP
H-SWAPs (optimized for handsets): ILP, subword, DP, task pipelining, custom ALUs
Performance, power benefits (with decreasing flexibility)
39 Future: Efficient algorithms and mapping
[Figure: receiver blocks: channel estimator, non-coherent STC, coherent STC, channel equalizer, MRC, detector, demodulator, decoder, beamforming, multipath channel, turbo equalizer]
Multiple antenna systems with 1-2 orders-of-magnitude higher complexity
40 Future research: Architectures
Generalized and structured framework and tools
Joint algorithm-architecture exploration
Area-time-power-flexibility tradeoffs
Potential applications: embedded systems
Image and video processing: cameras (variety of compression algorithms)
Biomedical applications: hearing aids (DSP running on body heat)
Sensor networks: compression of data before transmission
Quote: Gene Frantz, TI Fellow
41 SWAPs: Flexibility, Performance, Power
Need flexibility in future wireless devices: algorithms and architectures
Rapid exploration for Scalable Wireless Application-specific Processors: structured approach with flexibility-performance trade-offs
SWAPs: flexibility, high performance and low power
Exploit data parallelism like ASICs
1-2 orders of magnitude better performance than DSPs
Turn off unused clusters and unused ALUs for low power
High performance, power-efficient DSPs based on the TI C64x
High performance, power-efficient DSPs based on the TI C64x Sridhar Rajagopal, Joseph R. Cavallaro, Scott Rixner Rice University {sridhar,cavallar,rixner}@rice.edu RICE UNIVERSITY Recent (2003) Research
More informationDesign space exploration for real-time embedded stream processors
Design space exploration for real-time embedded stream processors Sridhar Rajagopal, Joseph R. Cavallaro, and Scott Rixner Department of Electrical and Computer Engineering Rice University sridhar, cavallar,
More informationReconfigurable stream processors for wireless base-stations
Reconfigurable stream processors for wireless base-stations ridhar Rajagopal (sridhar@rice.edu) eptember 9, 2003, 3:25 AM Abstract The need to support evolving standards, rapid prototyping and fast time-to-market
More informationReconfigurable Stream Processors for Wireless Base-stations
Reconfigurable Stream Processors for Wireless Base-stations Sridhar Rajagopal Scott Rixner Joseph R. avallaro Rice University {sridhar,rixner,cavallar}@rice.edu Abstract This paper presents the design
More informationDATA-PARALLEL DIGITAL SIGNAL PROCESSORS: ALGORITHM MAPPING, ARCHITECTURE SCALING AND WORKLOAD ADAPTATION
DATA-PARALLEL DIGITAL SIGNAL PROCESSORS: ALGORITHM MAPPING, ARCHITECTURE SCALING AND WORKLOAD ADAPTATION Sridhar Rajagopal Thesis: Doctor of Philosophy Electrical and Computer Engineering Rice University,
More informationDesigning Scalable Wireless Application-specific Processors
Designing calable Wireless Application-specific Processors ridhar Rajagopal (sridhar@rice.edu) eptember 1, 2003, 10:52 PM Abstract This paper presents a structured way of designing and exploring scalable,
More informationReconfigurable VLSI Communication Processor Architectures
Reconfigurable VLSI Communication Processor Architectures Joseph R. Cavallaro Center for Multimedia Communication www.cmc.rice.edu Department of Electrical and Computer Engineering Rice University, Houston
More informationImproving Power Efficiency in Stream Processors Through Dynamic Cluster Reconfiguration
Improving Power Efficiency in Stream Processors Through Dynamic luster Reconfiguration Sridhar Rajagopal WiQuest ommunications Allen, T 75 sridhar.rajagopal@wiquest.com Scott Rixner and Joseph R. avallaro
More informationImproving Power Efficiency in Stream Processors Through Dynamic Reconfiguration
Improving Power Efficiency in Stream Processors Through Dynamic Reconfiguration June 5, 24 Abstract Stream processors support hundreds of functional units in a programmable architecture by clustering those
More informationA PROGRAMMABLE COMMUNICATIONS PROCESSOR FOR THIRD GENERATION WIRELESS COMMUNICATION SYSTEMS
A PROGRAMMABLE COMMUNICATIONS PROCESSOR FOR THIRD GENERATION WIRELESS COMMUNICATION SYSTEMS Sridhar Rajagopal and Joseph R. Cavallaro Rice University Center for Multimedia Communication Department of Electrical
More informationMicroprocessor Extensions for Wireless Communications
Microprocessor Extensions for Wireless Communications Sridhar Rajagopal and Joseph R. Cavallaro DRAFT REPORT Rice University Center for Multimedia Communication Department of Electrical and Computer Engineering
More informationReconfigurable Architectures for Wireless Systems: Design Exploration and Integration Challenges
Reconfigurable Architectures for Wireless Systems: Design Exploration and Integration Challenges Joseph R. Cavallaro, Michael C. Brogioli, Alexandre de Baynast, and Predrag Radosavljevic, Center for Multimedia
More informationAll MSEE students are required to take the following two core courses: Linear systems Probability and Random Processes
MSEE Curriculum All MSEE students are required to take the following two core courses: 3531-571 Linear systems 3531-507 Probability and Random Processes The course requirements for students majoring in
More informationLeveraging Mobile GPUs for Flexible High-speed Wireless Communication
0 Leveraging Mobile GPUs for Flexible High-speed Wireless Communication Qi Zheng, Cao Gao, Trevor Mudge, Ronald Dreslinski *, Ann Arbor The 3 rd International Workshop on Parallelism in Mobile Platforms
More informationDesign space exploration for real-time embedded stream processors
Design space eploration for real-time embedded stream processors Sridhar Rajagopal, Joseph R. Cavallaro, and Scott Riner Department of Electrical and Computer Engineering Rice University sridhar, cavallar,
More informationA 167-processor Computational Array for Highly-Efficient DSP and Embedded Application Processing
A 167-processor Computational Array for Highly-Efficient DSP and Embedded Application Processing Dean Truong, Wayne Cheng, Tinoosh Mohsenin, Zhiyi Yu, Toney Jacobson, Gouri Landge, Michael Meeuwsen, Christine
More informationMPSOC 2011 BEAUNE, FRANCE
MPSOC 2011 BEAUNE, FRANCE BOADRES: A SCALABLE BASEBAND PROCESSOR TEMPLATE FOR Gbps RADIOS VICE PRESIDENT, CHAIRMAN OF THE TECHNOLOGY OFFICE PROFESSOR AT THE KATHOLIEKE UNIVERSITEIT LEUVEN STATUS SDR BASEBAND
More informationIMAGINE: Signal and Image Processing Using Streams
IMAGINE: Signal and Image Processing Using Streams Brucek Khailany William J. Dally, Scott Rixner, Ujval J. Kapasi, Peter Mattson, Jinyung Namkoong, John D. Owens, Brian Towles Concurrent VLSI Architecture
More informationReconfigurable Cell Array for DSP Applications
Outline econfigurable Cell Array for DSP Applications Chenxin Zhang Department of Electrical and Information Technology Lund University, Sweden econfigurable computing Coarse-grained reconfigurable cell
More informationA 167-processor 65 nm Computational Platform with Per-Processor Dynamic Supply Voltage and Dynamic Clock Frequency Scaling
A 167-processor 65 nm Computational Platform with Per-Processor Dynamic Supply Voltage and Dynamic Clock Frequency Scaling Dean Truong, Wayne Cheng, Tinoosh Mohsenin, Zhiyi Yu, Toney Jacobson, Gouri Landge,
More informationECE 747 Digital Signal Processing Architecture. DSP Implementation Architectures
ECE 747 Digital Signal Processing Architecture DSP Implementation Architectures Spring 2006 W. Rhett Davis NC State University W. Rhett Davis NC State University ECE 406 Spring 2006 Slide 1 My Goal Challenge
More informationAn introduction to DSP s. Examples of DSP applications Why a DSP? Characteristics of a DSP Architectures
An introduction to DSP s Examples of DSP applications Why a DSP? Characteristics of a DSP Architectures DSP example: mobile phone DSP example: mobile phone with video camera DSP: applications Why a DSP?
More informationAn Asynchronous Array of Simple Processors for DSP Applications
An Asynchronous Array of Simple Processors for DSP Applications Zhiyi Yu, Michael Meeuwsen, Ryan Apperson, Omar Sattari, Michael Lai, Jeremy Webb, Eric Work, Tinoosh Mohsenin, Mandeep Singh, Bevan Baas
More informationCrash Course in Wireless Video
Lifemote April 24, 2018 Ludwig Wittgenstein The context in which words are used, the intent with which they are uttered, determines their meaning. Successful communication is guessing which game the speaker
More informationVLSI Signal Processing
VLSI Signal Processing Programmable DSP Architectures Chih-Wei Liu VLSI Signal Processing Lab Department of Electronics Engineering National Chiao Tung University Outline DSP Arithmetic Stream Interface
More informationImplementation of a Dual-Mode SDR Smart Antenna Base Station Supporting WiBro and TDD HSDPA
Implementation of a Dual-Mode SDR Smart Antenna Base Station Supporting WiBro and TDD HSDPA Jongeun Kim, Sukhwan Mun, Taeyeol Oh,Yusuk Yun, Seungwon Choi 1 HY-SDR Research Center, Hanyang University, Seoul,
More informationLinköping University Post Print. epuma: a novel embedded parallel DSP platform for predictable computing
Linköping University Post Print epuma: a novel embedded parallel DSP platform for predictable computing Jian Wang, Joar Sohl, Olof Kraigher and Dake Liu N.B.: When citing this work, cite the original article.
More informationEmbedded Computation
Embedded Computation What is an Embedded Processor? Any device that includes a programmable computer, but is not itself a general-purpose computer [W. Wolf, 2000]. Commonly found in cell phones, automobiles,
More informationasoc: : A Scalable On-Chip Communication Architecture
asoc: : A Scalable On-Chip Communication Architecture Russell Tessier, Jian Liang,, Andrew Laffely,, and Wayne Burleson University of Massachusetts, Amherst Reconfigurable Computing Group Supported by
More informationOpenRadio. A programmable wireless dataplane. Manu Bansal Stanford University. Joint work with Jeff Mehlman, Sachin Katti, Phil Levis
OpenRadio A programmable wireless dataplane Manu Bansal Stanford University Joint work with Jeff Mehlman, Sachin Katti, Phil Levis HotSDN 12, August 13, 2012, Helsinki, Finland 2 Opening up the radio Why?
More informationBenchmarking Multithreaded, Multicore and Reconfigurable Processors
Insight, Analysis, and Advice on Signal Processing Technology Benchmarking Multithreaded, Multicore and Reconfigurable Processors Berkeley Design Technology, Inc. 2107 Dwight Way, Second Floor Berkeley,
More informationECE 637 Integrated VLSI Circuits. Introduction. Introduction EE141
ECE 637 Integrated VLSI Circuits Introduction EE141 1 Introduction Course Details Instructor Mohab Anis; manis@vlsi.uwaterloo.ca Text Digital Integrated Circuits, Jan Rabaey, Prentice Hall, 2 nd edition
More informationCAD for VLSI. Debdeep Mukhopadhyay IIT Madras
CAD for VLSI Debdeep Mukhopadhyay IIT Madras Tentative Syllabus Overall perspective of VLSI Design MOS switch and CMOS, MOS based logic design, the CMOS logic styles, Pass Transistors Introduction to Verilog
More informationLow-power Architecture. By: Jonathan Herbst Scott Duntley
Low-power Architecture By: Jonathan Herbst Scott Duntley Why low power? Has become necessary with new-age demands: o Increasing design complexity o Demands of and for portable equipment Communication Media
More informationData Parallel Architectures
EE392C: Advanced Topics in Computer Architecture Lecture #2 Chip Multiprocessors and Polymorphic Processors Thursday, April 3 rd, 2003 Data Parallel Architectures Lecture #2: Thursday, April 3 rd, 2003
More informationOverview. CSE372 Digital Systems Organization and Design Lab. Hardware CAD. Two Types of Chips
Overview CSE372 Digital Systems Organization and Design Lab Prof. Milo Martin Unit 5: Hardware Synthesis CAD (Computer Aided Design) Use computers to design computers Virtuous cycle Architectural-level,
More informationClassification of Semiconductor LSI
Classification of Semiconductor LSI 1. Logic LSI: ASIC: Application Specific LSI (you have to develop. HIGH COST!) For only mass production. ASSP: Application Specific Standard Product (you can buy. Low
More informationAN FFT PROCESSOR BASED ON 16-POINT MODULE
AN FFT PROCESSOR BASED ON 6-POINT MODULE Weidong Li, Mark Vesterbacka and Lars Wanhammar Electronics Systems, Dept. of EE., Linköping University SE-58 8 LINKÖPING, SWEDEN E-mail: {weidongl, markv, larsw}@isy.liu.se,
More informationThe Implementation and Analysis of Important Symmetric Ciphers on Stream Processor
2009 International Conference on Computer Engineering and Applications IPCSIT vol.2 (2011) (2011) IACSIT Press, Singapore The Implementation and Analysis of Important Symmetric Ciphers on Stream Processor
More informationVideo-Aware Wireless Networks (VAWN) Final Meeting January 23, 2014
Video-Aware Wireless Networks (VAWN) Final Meeting January 23, 2014 1/26 ! Real-time Video Transmission! Challenges and Opportunities! Lessons Learned for Real-time Video! Mitigating Losses in Scalable
More informationProcessor Applications. The Processor Design Space. World s Cellular Subscribers. Nov. 12, 1997 Bob Brodersen (http://infopad.eecs.berkeley.
Processor Applications CS 152 Computer Architecture and Engineering Introduction to Architectures for Digital Signal Processing Nov. 12, 1997 Bob Brodersen (http://infopad.eecs.berkeley.edu) 1 General
More informationDelay Time Analysis of Reconfigurable. Firewall Unit
Delay Time Analysis of Reconfigurable Unit Tomoaki SATO C&C Systems Center, Hirosaki University Hirosaki 036-8561 Japan Phichet MOUNGNOUL Faculty of Engineering, King Mongkut's Institute of Technology
More informationIMAGINE: MEDIA PROCESSING
IMAGINE: MEDIA PROCESSING WITH STREAMS THE POWER-EFFICIENT IMAGINE STREAM PROCESSOR ACHIEVES PERFORMANCE DENSITIES COMPARABLE TO THOSE OF SPECIAL-PURPOSE EMBEDDED PROCESSORS. EXECUTING PROGRAMS MAPPED
More informationSoftware Defined Modem A commercial platform for wireless handsets
Software Defined Modem A commercial platform for wireless handsets Charles F Sturman VP Marketing June 22 nd ~ 24 th Brussels charles.stuman@cognovo.com www.cognovo.com Agenda SDM Separating hardware from
More informationCommunication Processors
Communication Processors Sridhar Rajagopal WiQuest Communications, Inc. sridhar.rajagopal@wiquest.com and Joseph R. Cavallaro Department of Electrical and Computer Engineering Rice University cavallar@rice.edu
More informationAn Ultra Low-Power WOLA Filterbank Implementation in Deep Submicron Technology
An Ultra ow-power WOA Filterbank Implementation in Deep Submicron Technology R. Brennan, T. Schneider Dspfactory td 611 Kumpf Drive, Unit 2 Waterloo, Ontario, Canada N2V 1K8 Abstract The availability of
More informationGraph-based Framework for Flexible Baseband Function Splitting and Placement in C-RAN
Graph-based Framework for Flexible Baseband Function Splitting and Placement in C-RAN Group Meeting Presentation (Paper Review) J. Liu, Graph-based framework for flexible baseband function splitting and
More informationThe Imagine Stream Processor
The Imagine Stream Processor Ujval J. Kapasi, William J. Dally, Scott Rixner, John D. Owens, and Brucek Khailany Computer Systems Laboratory Computer Systems Laboratory Stanford University, Stanford, CA
More informationAbstract A SCALABLE, PARALLEL, AND RECONFIGURABLE DATAPATH ARCHITECTURE
A SCALABLE, PARALLEL, AND RECONFIGURABLE DATAPATH ARCHITECTURE Reiner W. Hartenstein, Rainer Kress, Helmut Reinig University of Kaiserslautern Erwin-Schrödinger-Straße, D-67663 Kaiserslautern, Germany
More informationMPSoC Design Space Exploration Framework
MPSoC Design Space Exploration Framework Gerd Ascheid RWTH Aachen University, Germany Outline Motivation: MPSoC requirements in wireless and multimedia MPSoC design space exploration framework Summary
More informationA Software LDPC Decoder Implemented on a Many-Core Array of Programmable Processors
A Software LDPC Decoder Implemented on a Many-Core Array of Programmable Processors Brent Bohnenstiehl and Bevan Baas Department of Electrical and Computer Engineering University of California, Davis {bvbohnen,
More informationEE482C, L1, Apr 4, 2002 Copyright (C) by William J. Dally, All Rights Reserved. Today s Class Meeting. EE482S Lecture 1 Stream Processor Architecture
1 Today s Class Meeting EE482S Lecture 1 Stream Processor Architecture April 4, 2002 William J Dally Computer Systems Laboratory Stanford University billd@cslstanfordedu What is EE482C? Material covered
More informationFixed Point Streaming Fft Processor For Ofdm
Fixed Point Streaming Fft Processor For Ofdm Sudhir Kumar Sa Rashmi Panda Aradhana Raju Abstract Fast Fourier Transform (FFT) processors are today one of the most important blocks in communication systems.
More informationArchitectural Support for Reducing Parallel Processing Overhead in an Embedded Multiprocessor
2010 IEEE/IFIP International Conference on Embedded and Ubiquitous Computing Architectural Support for Reducing Parallel Processing Overhead in an Embedded Multiprocessor Jian Wang, Joar Sohl and Dake
More informationLecture 41: Introduction to Reconfigurable Computing
inst.eecs.berkeley.edu/~cs61c CS61C : Machine Structures Lecture 41: Introduction to Reconfigurable Computing Michael Le, Sp07 Head TA April 30, 2007 Slides Courtesy of Hayden So, Sp06 CS61c Head TA Following
More informationStanford University Computer Systems Laboratory. Stream Scheduling. Ujval J. Kapasi, Peter Mattson, William J. Dally, John D. Owens, Brian Towles
Stanford University Concurrent VLSI Architecture Memo 122 Stanford University Computer Systems Laboratory Stream Scheduling Ujval J. Kapasi, Peter Mattson, William J. Dally, John D. Owens, Brian Towles
More informationAnySP: Anytime Anywhere Anyway Signal Processing
1 AnySP: Anytime Anywhere Anyway Signal Processing Mark Woh 1, Sangwon Seo 1, Scott Mahlke 1,Trevor Mudge 1, Chaitali Chakrabarti 2, Krisztian Flautner 3 University of Michigan ACAL 1 Arizona State University
More informationA Methodology for Energy Efficient FPGA Designs Using Malleable Algorithms
A Methodology for Energy Efficient FPGA Designs Using Malleable Algorithms Jingzhao Ou and Viktor K. Prasanna Department of Electrical Engineering, University of Southern California Los Angeles, California,
More informationEECS150 - Digital Design Lecture 09 - Parallelism
EECS150 - Digital Design Lecture 09 - Parallelism Feb 19, 2013 John Wawrzynek Spring 2013 EECS150 - Lec09-parallel Page 1 Parallelism Parallelism is the act of doing more than one thing at a time. Optimization