Flexible wireless communication architectures

1 Flexible wireless communication architectures Sridhar Rajagopal Department of Electrical and Computer Engineering Rice University, Houston TX Faculty Candidate Seminar Southern Methodist University April 23, 2003 RICE UNIVERSITY This work has been supported in part by NSF, Nokia and Texas Instruments

2 Future wireless devices demand flexibility. Wireless cellular, wireless LAN, Bluetooth/home networks: multiple algorithms and environments supported in the same device. High data rate mobile devices with multimedia. Flexible algorithms: multiple antennas, complex signal processing. Flexible architectures: high performance (Mbps), low power (mW). Fast design with structured exploration.

3 Flexibility needed in different layers. Application layer (Puppeteer project at Rice), network layer, MAC layer, physical layer; flexible algorithms, mapping, flexible architectures; analog RF.

4 Research vision: Attain flexibility. Design me Algorithms: flexibility to support a variety of sophisticated algorithms. Architectures: flexibility that adapts hardware to algorithms. Fast, structured design exploration.

5 Contributions: Algorithms. Multi-user channel estimation [Jnl. of VLSI Sig. Proc. '02, ASAP '00]: matrix inversions; numerical techniques (conjugate-gradient descent) for complexity reduction. Multi-user detection [ISCAS '01]: block-based computation to streaming computation; pipelining, lower memory requirements. Parallel, fixed-point, streaming VLSI implementations [IEEE Trans. Wireless Comm. '02].
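The conjugate-gradient idea named on this slide can be illustrated with a small sketch. The C code below is not the estimator from the cited papers; it is a generic conjugate-gradient solver for a small symmetric positive-definite system A x = b, showing how an explicit matrix inversion can be traded for a few matrix-vector products. The size `CG_N`, the names `cg_dot` and `cg_solve`, and the stopping rule are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define CG_N 2  /* illustrative system size (assumption) */

/* Dot product of two length-CG_N vectors. */
static double cg_dot(const double *u, const double *v) {
    double s = 0.0;
    for (size_t i = 0; i < CG_N; i++) s += u[i] * v[i];
    return s;
}

/* Solve A x = b for symmetric positive-definite A without forming A^-1.
 * x holds the initial guess on entry and the estimate on return.
 * Returns the number of iterations performed. */
static int cg_solve(double A[CG_N][CG_N], const double *b,
                    double *x, int max_iter, double tol) {
    double r[CG_N], p[CG_N], Ap[CG_N];
    assert(max_iter >= 0 && tol > 0.0);

    /* Initial residual r = b - A x and first search direction p = r. */
    for (size_t i = 0; i < CG_N; i++) {
        double ax = 0.0;
        for (size_t j = 0; j < CG_N; j++) ax += A[i][j] * x[j];
        r[i] = b[i] - ax;
        p[i] = r[i];
    }
    double rr = cg_dot(r, r);

    int it = 0;
    while (it < max_iter && rr > tol * tol) {
        for (size_t i = 0; i < CG_N; i++) {
            Ap[i] = 0.0;
            for (size_t j = 0; j < CG_N; j++) Ap[i] += A[i][j] * p[j];
        }
        double alpha = rr / cg_dot(p, Ap);   /* step length */
        for (size_t i = 0; i < CG_N; i++) {
            x[i] += alpha * p[i];
            r[i] -= alpha * Ap[i];
        }
        double rr_new = cg_dot(r, r);
        double beta = rr_new / rr;           /* direction update */
        for (size_t i = 0; i < CG_N; i++) p[i] = r[i] + beta * p[i];
        rr = rr_new;
        it++;
    }
    return it;
}
```

Each iteration costs one matrix-vector product, which is the kind of regular, streaming computation the slide contrasts with block matrix inversion.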

6 Contributions: Architectures. Heterogeneous DSP-FPGA system designs [ICSPAT '00]. Computer arithmetic [Symp. on Comp. Arith. '01]: dynamic truncation in ASICs using on-line arithmetic with most-significant-digit-first computation. [Ph.D. thesis] Scalable Wireless Application-specific Processors (SWAPs): rapid, structured architectures with flexibility-performance trade-offs.

7 Scalable Wireless Application-specific Processors. Family of flexible programmable processors. Clusters of ALUs. High performance by supporting 100s of ALUs. Can provide customization for various algorithms. Adapts ("swaps") architecture dynamically for power. Figure labels: scale ALUs, scale clusters.

8 Rapid, structured design for SWAPs. Low-complexity, parallel, fixed-point algorithms. Figure labels: ASIC design, DSP design, apply, architecture exploration, SWAPs.

9 Research vision summary. Provide a structured framework to rapidly explore flexible, high-performance, low-power architectures (SWAPs). Efficient algorithm design for mapping to SWAPs. Understanding of the algorithms, DSPs and ASICs used. Flexibility-performance trade-offs. Inter-disciplinary research: wireless communications, VLSI signal processing, computer architecture, computer arithmetic, circuits, CAD, compilers.

10 Talk outline: Research vision; SWAPs background; Algorithm design for SWAPs; Architecture design for SWAPs; Current and future research goals.

11 SWAPs borrow from DSPs. DSPs use instruction-level parallelism (ILP) and subword parallelism (MMX). Not enough ALUs for GOPs of computation: 100s are needed, but the TI C6x has 8 ALUs. Why not more ALUs? Cannot support more registers (area, ports), and it is difficult to find ILP as ALUs increase. Figure labels: ALU, RF (register file).

12 SWAPs borrow from ASICs. Exploit data parallelism (DP), which is available in many wireless algorithms. This is what ASICs do!

    int i, a[n], b[n], sum[n];        // 32 bits
    short int c[n], d[n], diff[n];    // 16 bits, packed
    for (i = 0; i < 1024; i++) {
        sum[i] = a[i] + b[i];
        diff[i] = c[i] - d[i];
    }

Figure labels: subword, ILP, DP.
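The "16 bits packed" comment on this slide refers to subword parallelism: two 16-bit values share one 32-bit word and are processed together. A minimal portable sketch of that idea follows, using plain integer operations rather than any DSP intrinsic; the names `packed_add16` and `packed_add16_array` are hypothetical, and addition is shown instead of the slide's subtraction to keep the bit manipulation short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Add two pairs of 16-bit values packed into 32-bit words.
 * Carries are kept from crossing the lane boundary, so each lane
 * wraps modulo 2^16 independently. */
static uint32_t packed_add16(uint32_t a, uint32_t b) {
    const uint32_t low15 = 0x7FFF7FFFu;  /* low 15 bits of each lane */
    const uint32_t top   = 0x80008000u;  /* top bit of each lane */
    uint32_t partial = (a & low15) + (b & low15);
    return partial ^ ((a ^ b) & top);
}

/* Data-parallel loop: each iteration performs two 16-bit additions. */
static void packed_add16_array(const uint32_t *a, const uint32_t *b,
                               uint32_t *out, size_t n_words) {
    assert(a != NULL && b != NULL && out != NULL);
    for (size_t i = 0; i < n_words; i++)
        out[i] = packed_add16(a[i], b[i]);
}
```

The loop body has no dependence between iterations, which is the data parallelism the slide points to; the packing inside each word is the subword parallelism.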

13 SWAPs borrow from stream processors. Kernels (computation) and streams (communication). Use local data in clusters, providing GOPs support. Imagine stream processor at Stanford [Rixner '01]. Figure (legend: input data, kernel, stream, output data): received signal, matched filter, interference cancellation, Viterbi decoding, decoded bits; correlator, channel estimation. Reference: Scott Rixner. Stream Processor Architecture, Kluwer Academic Publishers: Boston, MA, 2001.
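The kernel/stream split can be sketched in ordinary C, with a stream modeled as an array plus a length and each kernel as a function that consumes one stream and produces another. This is only an illustration of the programming model, not the Imagine tool chain or its languages; the type `stream_t` and both kernel functions are hypothetical, and the caller is assumed to supply output buffers that are large enough.

```c
#include <assert.h>
#include <stddef.h>

/* A stream is modeled here as a caller-owned buffer plus a length. */
typedef struct {
    int *data;
    size_t len;
} stream_t;

/* Kernel 1 (hypothetical matched filter): correlate each block of
 * code_len received chips with a +/-1 code, giving one soft value. */
static void kernel_matched_filter(const stream_t *in, const int *code,
                                  size_t code_len, stream_t *out) {
    assert(code_len > 0 && in->len % code_len == 0);
    out->len = in->len / code_len;
    for (size_t s = 0; s < out->len; s++) {
        int acc = 0;
        for (size_t c = 0; c < code_len; c++)
            acc += in->data[s * code_len + c] * code[c];
        out->data[s] = acc;
    }
}

/* Kernel 2 (hypothetical detector): hard decision on each soft value. */
static void kernel_hard_decision(const stream_t *in, stream_t *out) {
    out->len = in->len;
    for (size_t i = 0; i < in->len; i++)
        out->data[i] = (in->data[i] >= 0) ? 1 : 0;
}
```

Chaining the two calls passes the intermediate stream from one kernel to the next without the kernels knowing about each other, which mirrors the received signal to decoded bits chain in the figure.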

14 SWAPs are multi-cluster DSPs. Memory: Stream Register File (SRF). Figure: a DSP (1 cluster, internal memory) exploits ILP; SWAPs exploit ILP and DP. SWAPs adapt clusters to DP: identical clusters, same operations. Power down unused FUs and clusters.

15 Arithmetic clusters in SWAPs. Figure labels: from/to SRF, distributed register files (support more ALUs), SRF, cross point, scratchpad (indexed accesses), intercluster network, comm. unit.

16 Talk outline: Research vision; SWAPs background; Algorithm design for SWAPs; Architecture design for SWAPs; Current and future research goals.

17 SWAPs: Physical layer algorithms. Figure: antenna, RF front-end, baseband processing (detection, channel estimation, decoding), higher (MAC/network/OS) layers. Complex signal processing algorithms with GOPs of computation.

18 SWAP mapping example: Viterbi decoding. Multiple antenna systems (MIMO systems): complexity exponential with transmit x receive antennas. Estimation: linear MMSE, blind, conjugate gradient. Detection: FFT, (blind) interference cancellation. Decoding: Viterbi, Turbo, LDPC. Also joint schemes. SWAP flexibility lets you use the best algorithms for the situation. Example for concept demonstration: Viterbi decoding.

19 Parallel Viterbi decoding for SWAPs. Figure: detected bits, ACS unit, traceback unit, decoded bits. Add-Compare-Select (ACS): trellis interconnect, computations; parallelism depends on constraint length (#states). Traceback: searching; the conventional approach is sequential (no DP) with dynamic branching and is difficult to implement in a parallel architecture. Use Register Exchange (RE) as the parallel solution.
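To make the ACS and register-exchange steps concrete, here is a small hard-decision Viterbi decoder in C. It is a textbook sketch, not the SWAP kernel from the talk: the constraint length `K = 3`, the rate-1/2 generators 7 and 5 (octal), the state convention, and all function names are assumptions chosen to keep the example short. Each state carries its own survivor register, so the decoded bits are read directly from the best state's register with no traceback pass.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

#define K        3                  /* constraint length (illustrative) */
#define NSTATES  (1u << (K - 1))    /* 4 trellis states */

/* Encoder output (2 bits) for input u leaving state s.
 * State s holds the two previous inputs, newest in the MSB.
 * Generators 7 and 5 (octal) are an assumed textbook choice. */
static unsigned conv_out(unsigned s, unsigned u) {
    unsigned u1 = (s >> 1) & 1u, u2 = s & 1u;
    unsigned g0 = u ^ u1 ^ u2;      /* generator 111 */
    unsigned g1 = u ^ u2;           /* generator 101 */
    return (g0 << 1) | g1;
}

/* Hamming distance between two 2-bit symbols. */
static unsigned ham2(unsigned a, unsigned b) {
    unsigned x = a ^ b;
    return (x & 1u) + ((x >> 1) & 1u);
}

/* Decode n received 2-bit symbols with ACS plus register exchange.
 * Returns the n decoded bits, first bit most significant.
 * Assumes the encoder started in state 0. */
static uint32_t viterbi_re(const unsigned *rx, unsigned n) {
    unsigned pm[NSTATES], pm_new[NSTATES];
    uint32_t reg[NSTATES] = {0}, reg_new[NSTATES];
    assert(n > 0 && n <= 32);

    pm[0] = 0;
    for (unsigned s = 1; s < NSTATES; s++) pm[s] = UINT_MAX / 2;

    for (unsigned t = 0; t < n; t++) {
        /* Each next state is independent of the others: this is the DP. */
        for (unsigned ns = 0; ns < NSTATES; ns++) {
            unsigned u  = ns >> (K - 2);              /* input bit for ns */
            unsigned p0 = (ns << 1) & (NSTATES - 1);  /* two predecessors */
            unsigned p1 = p0 | 1u;
            unsigned m0 = pm[p0] + ham2(conv_out(p0, u), rx[t]);  /* add */
            unsigned m1 = pm[p1] + ham2(conv_out(p1, u), rx[t]);
            unsigned best = (m1 < m0) ? p1 : p0;      /* compare, select */
            pm_new[ns]  = (m1 < m0) ? m1 : m0;
            reg_new[ns] = (reg[best] << 1) | u;       /* register exchange */
        }
        for (unsigned s = 0; s < NSTATES; s++) {
            pm[s]  = pm_new[s];
            reg[s] = reg_new[s];
        }
    }

    /* No traceback: the survivor register of the best state is the answer. */
    unsigned best = 0;
    for (unsigned s = 1; s < NSTATES; s++)
        if (pm[s] < pm[best]) best = s;
    return reg[best];
}
```

The inner loop over `ns` has no branches that depend on earlier iterations, which is why register exchange maps onto identical clusters more easily than a sequential traceback.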

20 Parallel Viterbi needs re-ordering for SWAPs. Figure (regular ACS vs. ACS in SWAPs, DP vector): the 16 state metrics appear in natural order X(0), X(1), ..., X(15) and in re-ordered form X(0), X(2), ..., X(14), X(1), X(3), ..., X(15). Exploiting Viterbi DP in SWAPs: use RE instead of regular traceback; re-order ACS and RE.
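One way to read the even/odd ordering in this figure: in a shift-register trellis, the two predecessors of a new state j are old states 2j and 2j+1, so splitting the metric vector into even and odd halves lines both predecessors up at the same index j. The sketch below shows that re-ordering and the resulting element-wise ACS loop. It is an interpretation under that assumption, not the exact SWAP kernel; the function names and the separate branch-metric arrays are hypothetical, and only the lower half of the new states is computed (the upper half would repeat the loop with its own branch metrics).

```c
#include <assert.h>
#include <stddef.h>

/* Re-order x[0..n-1] as X(0) X(2) ... X(n-2) X(1) X(3) ... X(n-1). */
static void reorder_even_odd(const int *x, int *out, size_t n) {
    assert(n % 2 == 0);
    for (size_t j = 0; j < n / 2; j++) {
        out[j]         = x[2 * j];      /* predecessor 2j   */
        out[n / 2 + j] = x[2 * j + 1];  /* predecessor 2j+1 */
    }
}

/* After re-ordering, add-compare-select for new state j reads element j
 * of each half, so the loop is a plain element-wise vector operation. */
static void acs_vector(const int *reordered, const int *bm_even,
                       const int *bm_odd, int *new_metric, size_t half) {
    for (size_t j = 0; j < half; j++) {
        int a = reordered[j] + bm_even[j];          /* add */
        int b = reordered[half + j] + bm_odd[j];
        new_metric[j] = (a < b) ? a : b;            /* compare, select */
    }
}
```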

21 Talk outline: Research vision; SWAPs background; Algorithm design for SWAPs; Architecture design for SWAPs; Current and future research goals.

22 SWAP architecture design. More clusters are better than more ALUs per cluster (if #clusters > 2). 1. Decide how many clusters: exploit DP. 2. Decide what to put within each cluster: maximize ILP with high functional unit efficiency; search the design space with the explore tool. Time-power-area characterization. Figure labels: ILP, DP.

23 Design a SWAP cluster: Explore. Figure: auto-exploration of adders and multipliers for ACS; instruction count plotted against #adders and #multipliers, with each point labeled (adder util%, multiplier util%).

24 Explore tool benefits. Instruction count vs. ALU efficiency. What goes inside each cluster. Design customized application-specific units. Better performance with increased ALU utilization. Explore multiple algorithms: turn off functional units not in use for a given kernel (Vdd-gating, clock-gating techniques).

25 Example for SWAP architecture design. Explore Algorithm 1: 3 adders, 3 multipliers, 32 clusters. Explore Algorithm 2: 4 adders, 1 multiplier, 64 clusters. Explore Algorithm 3: 2 adders, 2 multipliers, 64 clusters. Explore Algorithm 4: 2 adders, 2 multipliers, 16 clusters. Chosen architecture: 4 adders, 3 multipliers, 64 clusters. Figure labels: DP, ILP.
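The chosen architecture on this slide equals the element-wise maximum of the four per-algorithm results. Whether the explore tool applies exactly this rule is an assumption; the sketch below only reproduces the slide's numbers. The type `swap_config_t` and the function `cover_all` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int adders;       /* adders per cluster */
    int multipliers;  /* multipliers per cluster */
    int clusters;     /* number of clusters */
} swap_config_t;

static int max_int(int a, int b) { return a > b ? a : b; }

/* Smallest configuration that covers every per-algorithm requirement. */
static swap_config_t cover_all(const swap_config_t *req, size_t n) {
    assert(req != NULL && n > 0);
    swap_config_t c = req[0];
    for (size_t i = 1; i < n; i++) {
        c.adders      = max_int(c.adders, req[i].adders);
        c.multipliers = max_int(c.multipliers, req[i].multipliers);
        c.clusters    = max_int(c.clusters, req[i].clusters);
    }
    return c;
}
```

With the four requirements from the slide as input, the result is 4 adders, 3 multipliers and 64 clusters; units an algorithm does not need are then candidates for being turned off while it runs.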

26 SWAP flexibility provides power savings. Multiple algorithms have different ALU and cluster requirements. Turning off ALUs (add, mul compiler options): use the right #ALUs from the explore tool. Turning off clusters: data is spread across the SRF of all clusters, a cluster only has access to its own SRF, and the next kernel may need data from the SRF of other clusters, so reconfiguration support needs to be provided.

27 SWAPs provide cluster reconfiguration. Figure: SRF, latches, MDX1 and MDX2 stages of a mux-demux network with stream buffers, clusters. Additional latency (a few cycles) due to microcontroller stalls: minimal loss in performance.

28 Cluster reconfiguration for Viterbi. Packet 1: constraint length 7 (16 clusters). Packet 2: constraint length 9 (64 clusters). Packet 3: constraint length 5 (4 clusters). Figure labels: DP; clusters not in use can be turned OFF.
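The three packet configurations on this slide are consistent with the 2^(K-1) trellis states being spread over the clusters at four states per cluster. That ratio is inferred from the slide's numbers and is not stated in the talk, and the cap at 64 clusters for longer constraint lengths is likewise an assumption. A minimal sketch, with hypothetical names:

```c
#include <assert.h>

#define MAX_CLUSTERS        64  /* largest configuration on the slide */
#define STATES_PER_CLUSTER   4  /* inferred from the slide's numbers */

/* Number of clusters kept active for a code of constraint length K. */
static int active_clusters(int K) {
    assert(K >= 3 && K <= 16);
    int states = 1 << (K - 1);
    int clusters = states / STATES_PER_CLUSTER;
    if (clusters > MAX_CLUSTERS) clusters = MAX_CLUSTERS;
    return clusters;
}
```

Keeping the states-per-cluster ratio fixed is what lets one kernel serve every constraint length; only the number of powered clusters changes from packet to packet.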

29 SWAPs provide flexibility at negligible overhead. Figure: execution time (cycles) timeline with rows for clusters (kernels, computation) and memory; rate 1/2 decoding of packet 1 (K = 7), packet 2 (K = 9) and packet 3 (K = 5). Other labels: 64-bit; no data memory accesses.

30 SWAP exploration for Viterbi decoding. Figure: frequency needed to attain real-time (in MHz) vs. number of clusters, for K = 9, K = 7 and K = 5; labels: DSP, different SWAPs (without reconfiguration), same SWAP (with reconfiguration), max DP. An ideal C64x (w/o co-processor) needs ~200 MHz for real-time.

31 SWAPs: Salient features. 1-2 orders of magnitude better than a DSP. Any constraint length: 10 MHz at 128 Kbps. Same code for all constraint lengths: no need to re-compile or load another code as long as the parallelism/cluster ratio is constant. Power savings due to dynamic cluster scaling.

32 Expected SWAP power consumption. Power model based on [Khailany '03]. 64 clusters and 1 multiplier per cluster: 0.13 micron, 1.2 V. Peak active power: ~9 mW at 1 MHz (DSP ~1 mW). Area: ~53.7 mm^2. At 10 MHz, 128 Kbps with reconfiguration:

    Viterbi       Clusters used   Peak power
    K = 9         64              ~90 mW
    K = 7         16              ~28.57 mW
    K = 5         4               ~13.8 mW
    overhead      0               ~8.1 mW
    DSP, K = 9    1               ~200 mW

Figure: power (in mW) vs. active clusters (max 64). Reference: Brucek Khailany et al., Exploring the VLSI Scalability of Stream Processors, Proceedings of the Ninth Symposium on High Performance Computer Architecture, February 8-12, 2003.

33 Multiuser estimation-detection-decoding. Real-time target: 128 Kbps per user. Figure: frequency needed to attain real-time (in MHz) vs. number of clusters; labels: DSP; fast, medium and slow fading scenarios; 32-user base-station; mobile. An ideal C64x (w/o co-processor) needs ~15 GHz for real-time.

34 Expected SWAP power: base-station. 32-user base-station with 3 multipliers (×) per cluster and 64 clusters: 0.13 micron, 1.2 V. Peak active power: ~18.19 mW at 1 MHz (more multipliers). Area: ~93.4 mm^2. Total peak base-station power consumption: ~18.19 W at 1 GHz for 32 users at 128 Kbps/user.
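The totals on the two power slides follow from scaling the per-MHz peak figure by the clock frequency: ~9 mW per MHz at 10 MHz gives ~90 mW for Viterbi, and ~18.19 mW per MHz at 1 GHz gives ~18.19 W for the base-station. A minimal sketch of that arithmetic, assuming peak power scales linearly with frequency at a fixed supply voltage; the function name is hypothetical.

```c
#include <assert.h>

/* Peak active power in mW, given a per-MHz figure and a clock in MHz. */
static double peak_power_mw(double mw_per_mhz, double freq_mhz) {
    assert(mw_per_mhz >= 0.0 && freq_mhz >= 0.0);
    return mw_per_mhz * freq_mhz;
}
```

For example, `peak_power_mw(18.19, 1000.0)` gives roughly 18190 mW, matching the ~18.19 W quoted for the 32-user base-station.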

35 Talk outline: Research vision; SWAPs background; Algorithm design for SWAPs; Architecture design for SWAPs; Current and future research goals.

36 Current research: Flexibility vs. performance. SWAPs: 128 Kbps at ~ mW for Viterbi. Borrow DP from ASICs! Suitable for base-stations: flexibility more important than power. Suitable for mobile devices: power constraints tighter; can be customized for further power savings. Handset SWAPs (H-SWAPs): borrow task pipelining from ASICs! Application-specific units and specialized comm. network.

37 Handset SWAPs: H-SWAPs. Trade data parallelism for task pipelining. Figure: SWAPs (SRF, DP: max. clusters and reconfigure); SWAPlet (limited DP: limit clusters); H-SWAPs (collection of customized SWAPlets, each with limited DP).

38 Sample points in architecture exploration. Programmable solutions with increased customization. DSPs (1 cluster): ILP, subword. SWAPs (multiple clusters): ILP, subword, DP. H-SWAPs (optimized for handsets): ILP, subword, DP, task pipelining, custom ALUs. Performance and power benefits (with decreasing flexibility).

39 Future: Efficient algorithms and mapping. Figure blocks: channel estimator, non-coherent STC, coherent STC, channel equalizer, MRC, detector, demodulator, decoder, beamforming, multipath channel, turbo equalizer. Multiple antenna systems with 1-2 orders of magnitude higher complexity.

40 Future research: Architectures. Generalized and structured framework and tools. Joint algorithm-architecture exploration. Area-time-power-flexibility trade-offs. Potential applications: embedded systems. Image and video processing: cameras, with a variety of compression algorithms. Biomedical applications: hearing aids, "DSP running on body heat" (quote: Gene Frantz, TI Fellow). Sensor networks: compression of data before transmission.

41 SWAPs: Flexibility, Performance, Power. Need flexibility in future wireless devices: algorithms and architectures. Rapid exploration for Scalable Wireless Application-specific Processors: structured approach with flexibility-performance trade-offs. SWAPs offer flexibility, high performance and low power: exploit data parallelism like ASICs; 1-2 orders of magnitude better performance than DSPs; turn off unused clusters and unused ALUs for low power.


More information

EECS150 - Digital Design Lecture 09 - Parallelism

EECS150 - Digital Design Lecture 09 - Parallelism EECS150 - Digital Design Lecture 09 - Parallelism Feb 19, 2013 John Wawrzynek Spring 2013 EECS150 - Lec09-parallel Page 1 Parallelism Parallelism is the act of doing more than one thing at a time. Optimization

More information

Abstract of the Book

Abstract of the Book Book Keywords IEEE 802.16, IEEE 802.16m, mobile WiMAX, 4G, IMT-Advanced, 3GPP LTE, 3GPP LTE-Advanced, Broadband Wireless, Wireless Communications, Cellular Systems, Network Architecture Abstract of the

More information

Lecture 20: High-level Synthesis (1)

Lecture 20: High-level Synthesis (1) Lecture 20: High-level Synthesis (1) Slides courtesy of Deming Chen Some slides are from Prof. S. Levitan of U. of Pittsburgh Outline High-level synthesis introduction High-level synthesis operations Scheduling

More information

Frequency Domain Acceleration of Convolutional Neural Networks on CPU-FPGA Shared Memory System

Frequency Domain Acceleration of Convolutional Neural Networks on CPU-FPGA Shared Memory System Frequency Domain Acceleration of Convolutional Neural Networks on CPU-FPGA Shared Memory System Chi Zhang, Viktor K Prasanna University of Southern California {zhan527, prasanna}@usc.edu fpga.usc.edu ACM

More information

High-performance and Low-power Consumption Vector Processor for LTE Baseband LSI

High-performance and Low-power Consumption Vector Processor for LTE Baseband LSI High-performance and Low-power Consumption Vector Processor for LTE Baseband LSI Yi Ge Mitsuru Tomono Makiko Ito Yoshio Hirose Recently, the transmission rate for handheld devices has been increasing by

More information

EECS 151/251A Fall 2017 Digital Design and Integrated Circuits. Instructor: John Wawrzynek and Nicholas Weaver. Lecture 14 EE141

EECS 151/251A Fall 2017 Digital Design and Integrated Circuits. Instructor: John Wawrzynek and Nicholas Weaver. Lecture 14 EE141 EECS 151/251A Fall 2017 Digital Design and Integrated Circuits Instructor: John Wawrzynek and Nicholas Weaver Lecture 14 EE141 Outline Parallelism EE141 2 Parallelism Parallelism is the act of doing more

More information

EE482S Lecture 1 Stream Processor Architecture

EE482S Lecture 1 Stream Processor Architecture EE482S Lecture 1 Stream Processor Architecture April 4, 2002 William J. Dally Computer Systems Laboratory Stanford University billd@csl.stanford.edu 1 Today s Class Meeting What is EE482C? Material covered

More information

Embedded Systems: Hardware Components (part I) Todor Stefanov

Embedded Systems: Hardware Components (part I) Todor Stefanov Embedded Systems: Hardware Components (part I) Todor Stefanov Leiden Embedded Research Center Leiden Institute of Advanced Computer Science Leiden University, The Netherlands Outline Generic Embedded System

More information

Embedded Systems. 7. System Components

Embedded Systems. 7. System Components Embedded Systems 7. System Components Lothar Thiele 7-1 Contents of Course 1. Embedded Systems Introduction 2. Software Introduction 7. System Components 10. Models 3. Real-Time Models 4. Periodic/Aperiodic

More information

Vertex Shader Design I

Vertex Shader Design I The following content is extracted from the paper shown in next page. If any wrong citation or reference missing, please contact ldvan@cs.nctu.edu.tw. I will correct the error asap. This course used only

More information

Coarse Grain Reconfigurable Arrays are Signal Processing Engines!

Coarse Grain Reconfigurable Arrays are Signal Processing Engines! Coarse Grain Reconfigurable Arrays are Signal Processing Engines! Advanced Topics in Telecommunications, Algorithms and Implementation Platforms for Wireless Communications, TLT-9707 Waqar Hussain Researcher

More information

Evaluating MMX Technology Using DSP and Multimedia Applications

Evaluating MMX Technology Using DSP and Multimedia Applications Evaluating MMX Technology Using DSP and Multimedia Applications Ravi Bhargava * Lizy K. John * Brian L. Evans Ramesh Radhakrishnan * November 22, 1999 The University of Texas at Austin Department of Electrical

More information

A Study on Systems Beyond IMT-2000 in Korea

A Study on Systems Beyond IMT-2000 in Korea A Study on Systems Beyond IMT-2000 in Korea May 28, 2002 Vice President Ki-Chul Han, Ph.D (kchan kchan@etri.re. @etri.re.kr kr) Mobile Telecommunication Research Laboratory Electronics and Telecommunciations

More information

Exploring Logic Block Granularity for Regular Fabrics

Exploring Logic Block Granularity for Regular Fabrics 1530-1591/04 $20.00 (c) 2004 IEEE Exploring Logic Block Granularity for Regular Fabrics A. Koorapaty, V. Kheterpal, P. Gopalakrishnan, M. Fu, L. Pileggi {aneeshk, vkheterp, pgopalak, mfu, pileggi}@ece.cmu.edu

More information

System-on-Chip Architecture for Mobile Applications. Sabyasachi Dey

System-on-Chip Architecture for Mobile Applications. Sabyasachi Dey System-on-Chip Architecture for Mobile Applications Sabyasachi Dey Email: sabyasachi.dey@gmail.com Agenda What is Mobile Application Platform Challenges Key Architecture Focus Areas Conclusion Mobile Revolution

More information

Two-level Reconfigurable Architecture for High-Performance Signal Processing

Two-level Reconfigurable Architecture for High-Performance Signal Processing International Conference on Engineering of Reconfigurable Systems and Algorithms, ERSA 04, pp. 177 183, Las Vegas, Nevada, June 2004. Two-level Reconfigurable Architecture for High-Performance Signal Processing

More information

Development of Dependable Wireless System and Device

Development of Dependable Wireless System and Device December 6, 2013 JST International Symposium on Dependable VLSI Systems 2013 Development of Dependable Wireless System and Device Research Director: Kazuo Tsubouchi, Tohoku University Members: Akira Matsuzawa,

More information

WIRELESS SENSOR NETWORK

WIRELESS SENSOR NETWORK 1 WIRELESS SENSOR NETWORK Dr. H. K. Verma Distinguished Professor (EEE) Sharda University, Greater Noida (Formerly: Deputy Director and Professor of Instrumentation Indian Institute of Technology Roorkee)

More information

Project Proposals. 1 Project 1: On-chip Support for ILP, DLP, and TLP in an Imagine-like Stream Processor

Project Proposals. 1 Project 1: On-chip Support for ILP, DLP, and TLP in an Imagine-like Stream Processor EE482C: Advanced Computer Organization Lecture #12 Stream Processor Architecture Stanford University Tuesday, 14 May 2002 Project Proposals Lecture #12: Tuesday, 14 May 2002 Lecturer: Students of the class

More information

COE 561 Digital System Design & Synthesis Introduction

COE 561 Digital System Design & Synthesis Introduction 1 COE 561 Digital System Design & Synthesis Introduction Dr. Aiman H. El-Maleh Computer Engineering Department King Fahd University of Petroleum & Minerals Outline Course Topics Microelectronics Design

More information

Title: Using low-power dual-port for inter processor communication in next generation mobile handsets

Title: Using low-power dual-port for inter processor communication in next generation mobile handsets Title: Using low-power dual-port for inter processor communication in next generation mobile handsets Abstract: The convergence of mobile phones and other consumer-driven devices such as PDAs, MP3 players,

More information

Upcoming Video Standards. Madhukar Budagavi, Ph.D. DSPS R&D Center, Dallas Texas Instruments Inc.

Upcoming Video Standards. Madhukar Budagavi, Ph.D. DSPS R&D Center, Dallas Texas Instruments Inc. Upcoming Video Standards Madhukar Budagavi, Ph.D. DSPS R&D Center, Dallas Texas Instruments Inc. Outline Brief history of Video Coding standards Scalable Video Coding (SVC) standard Multiview Video Coding

More information

{ rizwan.rasheed, aawatif.menouni eurecom.fr,

{ rizwan.rasheed, aawatif.menouni eurecom.fr, Reconfigurable Viterbi Decoder for Mobile Platform Rizwan RASHEED, Mobile Communications Department, Institut Eurecom, Sophia Antipolis, France Aawatif MENOUNI HAYAR, Mobile Communications Department,

More information

Organic Computing. Dr. rer. nat. Christophe Bobda Prof. Dr. Rolf Wanka Department of Computer Science 12 Hardware-Software-Co-Design

Organic Computing. Dr. rer. nat. Christophe Bobda Prof. Dr. Rolf Wanka Department of Computer Science 12 Hardware-Software-Co-Design Dr. rer. nat. Christophe Bobda Prof. Dr. Rolf Wanka Department of Computer Science 12 Hardware-Software-Co-Design 1 Reconfigurable Computing Platforms 2 The Von Neumann Computer Principle In 1945, the

More information

Towards 5G: Advancements from IoT to mmwave Communcations. Next Generation and Standards Princeton IEEE 5G Summit May 26, 2015

Towards 5G: Advancements from IoT to mmwave Communcations. Next Generation and Standards Princeton IEEE 5G Summit May 26, 2015 Towards 5G: Advancements from IoT to mmwave Communcations Next Generation and Standards Princeton IEEE 5G Summit May 26, 2015 5G requirements and challenges 1000x network capacity 10x higher data rate,

More information

LANCOM Techpaper IEEE n Indoor Performance

LANCOM Techpaper IEEE n Indoor Performance Introduction The standard IEEE 802.11n features a number of new mechanisms which significantly increase available bandwidths. The former wireless LAN standards based on 802.11a/g enable physical gross

More information

Massively Parallel Computing on Silicon: SIMD Implementations. V.M.. Brea Univ. of Santiago de Compostela Spain

Massively Parallel Computing on Silicon: SIMD Implementations. V.M.. Brea Univ. of Santiago de Compostela Spain Massively Parallel Computing on Silicon: SIMD Implementations V.M.. Brea Univ. of Santiago de Compostela Spain GOAL Give an overview on the state-of of-the- art of Digital on-chip CMOS SIMD Solutions,

More information

Exploiting ILP with SW Approaches. Aleksandar Milenković, Electrical and Computer Engineering University of Alabama in Huntsville

Exploiting ILP with SW Approaches. Aleksandar Milenković, Electrical and Computer Engineering University of Alabama in Huntsville Lecture : Exploiting ILP with SW Approaches Aleksandar Milenković, milenka@ece.uah.edu Electrical and Computer Engineering University of Alabama in Huntsville Outline Basic Pipeline Scheduling and Loop

More information

Systolic Arrays for Reconfigurable DSP Systems

Systolic Arrays for Reconfigurable DSP Systems Systolic Arrays for Reconfigurable DSP Systems Rajashree Talatule Department of Electronics and Telecommunication G.H.Raisoni Institute of Engineering & Technology Nagpur, India Contact no.-7709731725

More information

A Reconfigurable Crossbar Switch with Adaptive Bandwidth Control for Networks-on

A Reconfigurable Crossbar Switch with Adaptive Bandwidth Control for Networks-on A Reconfigurable Crossbar Switch with Adaptive Bandwidth Control for Networks-on on-chip Donghyun Kim, Kangmin Lee, Se-joong Lee and Hoi-Jun Yoo Semiconductor System Laboratory, Dept. of EECS, Korea Advanced

More information

Vector Architectures Vs. Superscalar and VLIW for Embedded Media Benchmarks

Vector Architectures Vs. Superscalar and VLIW for Embedded Media Benchmarks Vector Architectures Vs. Superscalar and VLIW for Embedded Media Benchmarks Christos Kozyrakis Stanford University David Patterson U.C. Berkeley http://csl.stanford.edu/~christos Motivation Ideal processor

More information

Session: Configurable Systems. Tailored SoC building using reconfigurable IP blocks

Session: Configurable Systems. Tailored SoC building using reconfigurable IP blocks IP 08 Session: Configurable Systems Tailored SoC building using reconfigurable IP blocks Lodewijk T. Smit, Gerard K. Rauwerda, Jochem H. Rutgers, Maciej Portalski and Reinier Kuipers Recore Systems www.recoresystems.com

More information

Current and Projected Digital Complexity of DMT VDSL

Current and Projected Digital Complexity of DMT VDSL June 1, 1999 1 Standards Project: T1E1.4:99-268 VDSL Title: Current and Projected Digital Complexity of DMT VDSL Source: Texas Instruments Author: C. S. Modlin J. S. Chow Texas Instruments 2043 Samaritan

More information

Lecture 5. Other Adder Issues

Lecture 5. Other Adder Issues Lecture 5 Other Adder Issues Mark Horowitz Computer Systems Laboratory Stanford University horowitz@stanford.edu Copyright 24 by Mark Horowitz with information from Brucek Khailany 1 Overview Reading There

More information

Benchmarking Processors for DSP Applications

Benchmarking Processors for DSP Applications Insight, Analysis, and Advice on Signal Processing Technology Benchmarking Processors for DSP Applications Berkeley Design Technology, Inc. 2107 Dwight Way, Second Floor Berkeley, California 94704 USA

More information

Designing Area and Performance Constrained SIMD/VLIW Image Processing Architectures

Designing Area and Performance Constrained SIMD/VLIW Image Processing Architectures Designing Area and Performance Constrained SIMD/VLIW Image Processing Architectures Hamed Fatemi 1,2, Henk Corporaal 2, Twan Basten 2, Richard Kleihorst 3,and Pieter Jonker 4 1 h.fatemi@tue.nl 2 Eindhoven

More information

Verilog for High Performance

Verilog for High Performance Verilog for High Performance Course Description This course provides all necessary theoretical and practical know-how to write synthesizable HDL code through Verilog standard language. The course goes

More information

FPGA IMPLEMENTATION OF FLOATING POINT ADDER AND MULTIPLIER UNDER ROUND TO NEAREST

FPGA IMPLEMENTATION OF FLOATING POINT ADDER AND MULTIPLIER UNDER ROUND TO NEAREST FPGA IMPLEMENTATION OF FLOATING POINT ADDER AND MULTIPLIER UNDER ROUND TO NEAREST SAKTHIVEL Assistant Professor, Department of ECE, Coimbatore Institute of Engineering and Technology Abstract- FPGA is

More information

A Streaming Multi-Threaded Model

A Streaming Multi-Threaded Model A Streaming Multi-Threaded Model Extended Abstract Eylon Caspi, André DeHon, John Wawrzynek September 30, 2001 Summary. We present SCORE, a multi-threaded model that relies on streams to expose thread

More information

XPU A Programmable FPGA Accelerator for Diverse Workloads

XPU A Programmable FPGA Accelerator for Diverse Workloads XPU A Programmable FPGA Accelerator for Diverse Workloads Jian Ouyang, 1 (ouyangjian@baidu.com) Ephrem Wu, 2 Jing Wang, 1 Yupeng Li, 1 Hanlin Xie 1 1 Baidu, Inc. 2 Xilinx Outlines Background - FPGA for

More information

AR-SMT: A Microarchitectural Approach to Fault Tolerance in Microprocessors

AR-SMT: A Microarchitectural Approach to Fault Tolerance in Microprocessors AR-SMT: A Microarchitectural Approach to Fault Tolerance in Microprocessors Computer Sciences Department University of Wisconsin Madison http://www.cs.wisc.edu/~ericro/ericro.html ericro@cs.wisc.edu High-Performance

More information

Advance CPU Design. MMX technology. Computer Architectures. Tien-Fu Chen. National Chung Cheng Univ. ! Basic concepts

Advance CPU Design. MMX technology. Computer Architectures. Tien-Fu Chen. National Chung Cheng Univ. ! Basic concepts Computer Architectures Advance CPU Design Tien-Fu Chen National Chung Cheng Univ. Adv CPU-0 MMX technology! Basic concepts " small native data types " compute-intensive operations " a lot of inherent parallelism

More information