Leveraging Mobile GPUs for Flexible High-speed Wireless Communication


1 Leveraging Mobile GPUs for Flexible High-speed Wireless Communication
Qi Zheng, Cao Gao, Trevor Mudge, Ronald Dreslinski; Ann Arbor
The 3rd International Workshop on Parallelism in Mobile Platforms (PRISM-3), Jun 14

2 Popular Mobile Device & Applications

3 Fast Technology Evolution
[Figure: chart with exponent-scaled axis ticks; labels and values not recoverable]

4 Support of Multiple Standards

5 Wireless Processor in Mobile Devices
- Hardware-accelerator-based processors
  - ASIC
  - DSP with many ASICs
  - FPGA customized as ASICs
- Pros
  - High performance
  - High energy efficiency
- Cons
  - Long time to market
  - Hard and costly to maintain compatibility
- The alternative: software-defined radio

6 Software-defined Radio (SDR)
- The wireless system is implemented entirely in software
  - General-purpose programming language
- Very short time to market
  - Only new software has to be implemented for a new technology
- Easy to maintain compatibility
  - Different programs for different standards
- Open questions
  - Performance?
  - Energy efficiency?

7 Processor for SDR: GPU
- High computing throughput
- High energy efficiency
- Exploits data-level parallelism (DLP) and thread-level parallelism (TLP)
- Good programmability

8 Graphics Processing Unit (GPU)
- High throughput
  - GOPS-level peak throughput
  - SIMT execution model
  - Makes use of abundant DLP and TLP
- Good programming support
  - CUDA/OpenCL
- Power: desktop GPU 100 W, mobile GPU ~1 W

9 Our Contribution
- Parallel implementations of LTE baseband kernels on a mobile GPGPU
- Performance of running LTE baseband
  - Meets the specification peak data rate for the physical layer kernels
  - Gets close to the 10 Mbps typical data rate for the Turbo decoder
  - Meets the 20 ms latency budget
- Power and energy efficiency of a mobile GPGPU
  - High power compared with application-specific processors
  - Still in an acceptable range
- Implications of GPU architecture to further reduce the power

10 Outline
- Motivation
- LTE baseband kernels
- Running LTE baseband on a mobile GPGPU
- Implications for future mobile GPGPUs
- Conclusion

11 LTE Baseband Downlink Receiver
- The downlink receiver has the most computation in a mobile device
[Block diagram: Antenna, RF, OFDM Demodulation, Channel Estimate, MIMO Detect, Modulation Demapper, Descrambling, Rate Match, Turbo Decoder. Legend: Radio, PHY layer kernel, MAC layer kernel]

12 Parallelism in Each Kernel
- The PHY layer kernels
  - Antenna-level
  - Symbol-level
  - Subcarrier-level
  - Algorithm-level
- The Turbo decoder
  - Trellis-level
  - Subblock-level
- For all kernels
  - Batch multiple packets

13 Outline
- Motivation
- LTE baseband kernels
- Running LTE baseband on a mobile GPGPU
  - Methodology
  - PHY layer
  - Turbo decoder
  - Power and energy efficiency
- Implications for future mobile GPGPUs
- Conclusion

14 Experimental Setup
- Jetson TK1 board
  - NVIDIA 4-Plus-1 quad-core ARM Cortex-A15 CPU
  - NVIDIA Kepler GK20a GPU
    - 192 CUDA cores
    - 852 MHz
    - 256 KB RF / 64 KB L1 memory / 128 KB L2 cache
  - 2 GB DDR3L RAM (shared by CPU and GPU)
- Wattsup.net power meter
  - 33% of the board power is consumed by the SoC

16 PHY Layer Throughput
[Figure: achieved throughput (Mbps) vs. the number of packets; series: 1x1+QPSK, 2x2+QPSK, 1x1+16QAM, 2x2+16QAM]
- Packet batching increases throughput
- Saturate at [remainder of the line is missing from the transcript]

17 Compare with Specification Peak
[Figure: throughput normalized to the specification peak vs. the number of packets; series: 1x1+QPSK, 2x2+QPSK, 1x1+16QAM, 2x2+16QAM]
- Fulfills all but one of the specification peak data rates
- Overall, 2 packets is a good batching size

18 Compare with 10 Mbps Typical Rate
[Figure: throughput normalized to the average/typical user rate vs. the number of packets; series: 1x1+QPSK, 2x2+QPSK, 1x1+16QAM, 2x2+16QAM]
- Fulfills all typical user data rates

19 PHY Layer Latency
[Figure: worst-case latency (ms) vs. the number of packets; series: 1x1+QPSK, 2x2+QPSK, 1x1+16QAM, 2x2+16QAM]
- Packet batching also increases latency; up to 8 packets can be batched within the 20 ms budget
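
The batching trade-off on this slide can be sketched as a small search: pick the largest batch size whose worst-case latency still fits the budget. This is a minimal illustration, not the authors' procedure. The function names and the linear latency model are hypothetical, and the model's numbers are placeholders rather than measurements from the slides.

```python
from typing import Callable, Iterable, Optional


def largest_batch_within_budget(
    latency_ms: Callable[[int], float],
    budget_ms: float,
    candidates: Iterable[int],
) -> Optional[int]:
    """Return the largest candidate batch size whose latency fits the budget.

    Returns None when no candidate fits.
    """
    feasible = [n for n in candidates if latency_ms(n) <= budget_ms]
    return max(feasible) if feasible else None


def toy_latency_ms(num_packets: int) -> float:
    """Hypothetical placeholder model, not measured data:
    a fixed cost plus a per-packet cost."""
    return 1.0 + 3.0 * num_packets


if __name__ == "__main__":
    # With the toy model, 4 packets take 13 ms and 8 packets take 25 ms,
    # so 4 is the largest candidate that fits a 20 ms budget.
    print(largest_batch_within_budget(toy_latency_ms, 20.0, [1, 2, 4, 8, 16]))
```

In practice the latency callable would be replaced by measured worst-case latencies per configuration, such as those plotted in the figure.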

21 Turbo Throughput
[Figure: Turbo decoding throughput (Mbps) vs. the number of packets; series: #Subblock=16, #Subblock=32, #Subblock=64]
- Throughput increases with the number of subblocks and the batching size
- 32 subblocks + 2 packets: 8.4 Mbps throughput, 4.8 ms latency
- Far below the peak rate, but close to the ~10 Mbps typical rate
*Qi Zheng et al., Parallelization Techniques for Implementing Trellis Algorithms on Graphics Processors, ISCAS 2013, Beijing

23 Energy Efficiency
[Figure: energy efficiency (Mb/J) vs. GPU frequency (MHz)]
- Higher frequency achieves better energy efficiency
- Run to finish
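
One way to see why running at a higher clock can improve Mb/J is a simple model in which throughput grows linearly with frequency while part of the power is a fixed static cost. The sketch below assumes exactly that. The model and every coefficient in it are hypothetical placeholders, not values from the slides; only the unit relation (Mb/s divided by W gives Mb/J) is fixed.

```python
def throughput_mbps(freq_mhz: float, mbps_per_mhz: float) -> float:
    """Throughput assumed proportional to the GPU clock."""
    return mbps_per_mhz * freq_mhz


def power_watts(freq_mhz: float, static_w: float, dynamic_w_per_mhz: float) -> float:
    """Assumed power model: constant static part plus a part linear in frequency
    (fixed supply voltage)."""
    return static_w + dynamic_w_per_mhz * freq_mhz


def energy_efficiency_mb_per_joule(
    freq_mhz: float,
    mbps_per_mhz: float,
    static_w: float,
    dynamic_w_per_mhz: float,
) -> float:
    """Mb/s divided by W (J/s) gives Mb/J."""
    return throughput_mbps(freq_mhz, mbps_per_mhz) / power_watts(
        freq_mhz, static_w, dynamic_w_per_mhz
    )


if __name__ == "__main__":
    # Hypothetical coefficients chosen only to show the trend.
    for f in (200.0, 400.0, 600.0, 852.0):
        eff = energy_efficiency_mb_per_joule(f, 0.1, 1.0, 0.002)
        print(f"{f:6.0f} MHz -> {eff:.2f} Mb/J")
```

Under these assumptions the static power is amortized over more bits as the clock rises, which is the "run to finish" argument. If the supply voltage also rises with frequency, dynamic power grows faster than linearly and the trend can flatten or reverse.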

24 Power Consumption
Processor         Standards
ASIC (LeoCore)    WiMAX
FPGA (MAGALI)     LTE
DSP (Tomahawk)    LTE, WiMAX
CPU (SoftLTE)     LTE
Mobile GPU        LTE
Details column: only "70MHz" survives in the transcript; its row is not recoverable.

26 Reduce Energy from Unused Registers
- The register file is underutilized when running the PHY layer
  - Due to the max_thread/sm and max_thread_block/sm limits
- Power-gate unused registers
[Figure: normalized RF utilization (0% to 100%) vs. the number of packets (1, 2, 4, 8, 16); series: 1x1+QPSK, 2x2+QPSK, 1x1+16QAM, 2x2+16QAM]
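
The effect of the two residency limits on register-file use can be sketched with a small occupancy calculation. The 256 KB register file comes from the Experimental Setup slide. The limits of 2048 resident threads and 16 resident blocks per multiprocessor are assumptions taken from NVIDIA's published figures for compute capability 3.x, and the register and block sizes in the example are hypothetical. The sketch ignores register allocation granularity and shared-memory limits.

```python
# 256 KB register file of 4-byte registers (from the setup slide).
REGISTERS_PER_SM = 256 * 1024 // 4
# Assumed Kepler (compute capability 3.x) residency limits.
MAX_THREADS_PER_SM = 2048
MAX_BLOCKS_PER_SM = 16


def rf_utilization(regs_per_thread: int, threads_per_block: int) -> float:
    """Fraction of the register file used by the resident thread blocks."""
    blocks_by_threads = MAX_THREADS_PER_SM // threads_per_block
    blocks_by_regs = REGISTERS_PER_SM // (regs_per_thread * threads_per_block)
    # The tightest of the three limits decides how many blocks are resident.
    resident_blocks = min(MAX_BLOCKS_PER_SM, blocks_by_threads, blocks_by_regs)
    used = resident_blocks * threads_per_block * regs_per_thread
    return used / REGISTERS_PER_SM


if __name__ == "__main__":
    # Hypothetical kernels: small blocks hit the block limit first,
    # leaving most registers idle.
    print(rf_utilization(20, 64))    # block limit binds
    print(rf_utilization(20, 128))   # thread and block limits bind together
    print(rf_utilization(32, 128))   # registers fully used
```

Whatever fraction is left unused by the resident blocks is the part of the register file that the slide proposes to power-gate.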

27 GPU Support for Trellis Algorithm
- Trellis algorithms
  - Turbo algorithm
  - Viterbi algorithm
  - Baum-Welch algorithm
- Application areas
  - Coding theory
  - Speech recognition
  - Data compression
- Architecture and ISA support
  - Similar to the AES instruction set in Intel CPUs

29 Conclusion
- Mobile GPUs can be used for wireless communication in a mobile device
  - Meet the specification peaks for 3 out of 4 PHY configurations
  - Meet the typical rate for all PHY configurations
- Turbo is close to the typical data rate
  - Needs special support for the Turbo decoder in a mobile GPU
- Energy is high compared with application-specific solutions
  - But in an acceptable range
  - Needs further optimization

30 Thanks! Any questions?

31 Backup Slides

32 Quick Subscription Growth
[Figure: world mobile-cellular subscriptions (millions)* by year]
*ITU report, The World in 2011: ICT Facts and Figures

33 Our Contribution
- Parallel implementations of LTE baseband kernels on a mobile GPGPU
- Throughputs and latency of different configurations
- Energy efficiency of a mobile GPGPU

34 Parallelism in PHY Layer Kernels
Kernel                  Parallelism                                                     Number of threads
OFDM Modulation (FFT)   Antenna-level, Symbol-level, Algorithm-level                    N_ant × N_sym × N_FFT
Channel Estimation      Antenna-level, Subcarrier-level                                 N_ant × N_sub
MIMO detector           Symbol-level, Subcarrier-level                                  N_sym × N_sub
Modulation demapper     Antenna-level, Symbol-level, Subcarrier-level, Algorithm-level  N_ant × N_sym × N_sub × N_Mod; N_ant × N_sym × N_sub × log2(N_Mod)
Descrambling            Antenna-level, Symbol-level, Subcarrier-level                   N_ant × N_sym × N_sub
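
The thread counts in the table above can be evaluated for a concrete configuration as follows. This is a sketch that reads each entry as a product of its factors, which is an assumption because the multiplication signs did not survive in the transcript. The function and key names are hypothetical, and the example parameters (2 antennas, 14 symbols, 600 subcarriers, 1024-point FFT, 16QAM) correspond to a 10 MHz LTE carrier chosen for illustration; the slides do not state which bandwidth was used.

```python
from typing import Dict


def phy_thread_counts(
    n_ant: int, n_sym: int, n_sub: int, n_fft: int, n_mod: int
) -> Dict[str, int]:
    """Thread counts per PHY kernel, read as products of the table's factors."""
    # log2 of the constellation size; n_mod is assumed to be a power of two.
    bits_per_symbol = n_mod.bit_length() - 1
    return {
        "fft": n_ant * n_sym * n_fft,
        "channel_estimation": n_ant * n_sub,
        "mimo_detector": n_sym * n_sub,
        "demapper_n_mod": n_ant * n_sym * n_sub * n_mod,
        "demapper_log2_n_mod": n_ant * n_sym * n_sub * bits_per_symbol,
        "descrambling": n_ant * n_sym * n_sub,
    }


if __name__ == "__main__":
    counts = phy_thread_counts(n_ant=2, n_sym=14, n_sub=600, n_fft=1024, n_mod=16)
    for kernel, threads in counts.items():
        print(f"{kernel:22s} {threads}")
```

Batching several packets multiplies each count by the batch size, which is how packet batching exposes more threads to the GPU.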

35 Parallelism in Turbo Decoder*
- Total number of threads: N_thread = N_packet × N_subblock × Thread_trellis
- Implementation performance tradeoff
Parallelism scheme   Throughput   Latency     Bit error rate
Packet-level         Better       Worse       No change
Subblock-level       Better       No change   Worse
Trellis-level        Better       No change   No change
Subblock+NII         Worse        No change   Better
Subblock+TS          Worse        No change   Better
*Qi Zheng et al., Parallelization Techniques for Implementing Trellis Algorithms on Graphics Processors, ISCAS 2013, Beijing
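
The thread-count formula on this slide is a plain product and can be checked with a short calculation. In the sketch below the function names are hypothetical. The value of 8 threads per trellis is an assumption based on the 8 states of the LTE turbo code trellis, and 6144 bits is the largest LTE code block size; neither number appears on the slide.

```python
def turbo_thread_count(n_packet: int, n_subblock: int, threads_per_trellis: int) -> int:
    """N_thread = N_packet * N_subblock * Thread_trellis."""
    return n_packet * n_subblock * threads_per_trellis


def subblock_length(code_block_bits: int, n_subblock: int) -> int:
    """Trellis stages handled by each subblock when a code block is split evenly."""
    if code_block_bits % n_subblock != 0:
        raise ValueError("code block does not split evenly into subblocks")
    return code_block_bits // n_subblock


if __name__ == "__main__":
    # 2 packets, 32 subblocks, and an assumed 8 threads per trellis.
    print(turbo_thread_count(2, 32, 8))
    # An assumed 6144-bit code block split into 32 subblocks.
    print(subblock_length(6144, 32))
```

More subblocks mean more threads but shorter subblocks, which matches the table's note that subblock-level parallelism improves throughput at the cost of bit error rate.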

36 Mobile GPU Architecture
- GPU on the NVIDIA Jetson Tegra K1 board
  - Kepler architecture, 192 CUDA cores, 852 MHz (as listed on the Experimental Setup slide)
  - 64 KB L1 memory / 128 KB L2 cache / shares 2 GB DDR3L with the CPU

37 Performance Metrics
- Collected stats for a variety of important events
- CUDA Event API
  - Throughput
  - Latency
- NVIDIA Profiler
  - Achieved occupancy
  - Memory utilization
  - Dynamic instruction count

38 GPU Occupancy
[Figure: achieved occupancy (0% to 100%) vs. the number of packets]
[Figure: achieved throughput (Mbps) vs. the number of packets; series: 1x1+QPSK, 2x2+QPSK, 1x1+16QAM, 2x2+16QAM]
- Packet batching increases throughput, which saturates at 8 packets due to full GPU utilization

39 Hotspot Analysis
[Figure: execution time breakdown (0% to 100%) for 1x1+QPSK, 2x2+QPSK, 1x1+16QAM, 2x2+16QAM; components: Rate Matching, Descrambling, Demodulation, MIMO Detection, Channel Estimation, FFT]
- Demodulation dominates due to its heavy computation

40 Hotspot Analysis -- #instructions
[Figure: share of executed instructions per kernel (Rate Matching, Descrambling, Demodulation, MIMO Detection, Channel Estimation, FFT); series: 1x1+QPSK, 2x2+QPSK, 1x1+16QAM, 2x2+16QAM]
- Demodulation dominates due to its heavy computation

41 Turbo Throughput
[Figure: Turbo decoding throughput (Mbps) vs. the number of packets; series: #Subblock=16, #Subblock=32, #Subblock=64]
- Throughput increases with the number of subblocks and packets, and starts saturating at 64 subblocks

42 Frequency Scaling -- PHY
[Figure: achieved throughput (Mbps) vs. GPU frequency (MHz); series: 1x1+QPSK, 2x2+QPSK, 1x1+16QAM, 2x2+16QAM]
- Throughput scales almost linearly with frequency

43 Frequency Scaling -- Turbo
[Figure: achieved throughput (Mbps) vs. GPU frequency (MHz)]
- Throughput scales almost linearly with frequency

Parallelization Techniques for Implementing Trellis Algorithms on Graphics Processors

Parallelization Techniques for Implementing Trellis Algorithms on Graphics Processors 1 Parallelization Techniques for Implementing Trellis Algorithms on Graphics Processors Qi Zheng*, Yajing Chen*, Ronald Dreslinski*, Chaitali Chakrabarti +, Achilleas Anastasopoulos*, Scott Mahlke*, Trevor

More information

OpenRadio. A programmable wireless dataplane. Manu Bansal Stanford University. Joint work with Jeff Mehlman, Sachin Katti, Phil Levis

OpenRadio. A programmable wireless dataplane. Manu Bansal Stanford University. Joint work with Jeff Mehlman, Sachin Katti, Phil Levis OpenRadio A programmable wireless dataplane Manu Bansal Stanford University Joint work with Jeff Mehlman, Sachin Katti, Phil Levis HotSDN 12, August 13, 2012, Helsinki, Finland 2 Opening up the radio Why?

More information

Flexible wireless communication architectures

Flexible wireless communication architectures Flexible wireless communication architectures Sridhar Rajagopal Department of Electrical and Computer Engineering Rice University, Houston TX Faculty Candidate Seminar Southern Methodist University April

More information

Software Defined Modem A commercial platform for wireless handsets

Software Defined Modem A commercial platform for wireless handsets Software Defined Modem A commercial platform for wireless handsets Charles F Sturman VP Marketing June 22 nd ~ 24 th Brussels charles.stuman@cognovo.com www.cognovo.com Agenda SDM Separating hardware from

More information

White Paper Using Cyclone III FPGAs for Emerging Wireless Applications

White Paper Using Cyclone III FPGAs for Emerging Wireless Applications White Paper Introduction Emerging wireless applications such as remote radio heads, pico/femto base stations, WiMAX customer premises equipment (CPE), and software defined radio (SDR) have stringent power

More information

Benchmarking Multithreaded, Multicore and Reconfigurable Processors

Benchmarking Multithreaded, Multicore and Reconfigurable Processors Insight, Analysis, and Advice on Signal Processing Technology Benchmarking Multithreaded, Multicore and Reconfigurable Processors Berkeley Design Technology, Inc. 2107 Dwight Way, Second Floor Berkeley,

More information

Compilation of Parametric Dataflow Applications for Software-Defined-Radio-Dedicated MPSoCs DREAM seminar

Compilation of Parametric Dataflow Applications for Software-Defined-Radio-Dedicated MPSoCs DREAM seminar Compilation of Parametric Dataflow Applications for Software-Defined-Radio-Dedicated MPSoCs DREAM seminar Mickaël Dardaillon Research Intern with NOKIA Technologies January 27th, 2015 2 / 33 What we know

More information

Benchmarking Processors for DSP Applications

Benchmarking Processors for DSP Applications Insight, Analysis, and Advice on Signal Processing Technology Benchmarking Processors for DSP Applications Berkeley Design Technology, Inc. 2107 Dwight Way, Second Floor Berkeley, California 94704 USA

More information

Manycore and GPU Channelisers. Seth Hall High Performance Computing Lab, AUT

Manycore and GPU Channelisers. Seth Hall High Performance Computing Lab, AUT Manycore and GPU Channelisers Seth Hall High Performance Computing Lab, AUT GPU Accelerated Computing GPU-accelerated computing is the use of a graphics processing unit (GPU) together with a CPU to accelerate

More information

Experiences Using Tegra K1 and X1 for Highly Energy Efficient Computing

Experiences Using Tegra K1 and X1 for Highly Energy Efficient Computing Experiences Using Tegra K1 and X1 for Highly Energy Efficient Computing Gaurav Mitra Andrew Haigh Luke Angove Anish Varghese Eric McCreath Alistair P. Rendell Research School of Computer Science Australian

More information

Simplifying FPGA Design for SDR with a Network on Chip Architecture

Simplifying FPGA Design for SDR with a Network on Chip Architecture Simplifying FPGA Design for SDR with a Network on Chip Architecture Matt Ettus Ettus Research GRCon13 Outline 1 Introduction 2 RF NoC 3 Status and Conclusions USRP FPGA Capability Gen

More information

Parallel HMMs. Parallel Implementation of Hidden Markov Models for Wireless Applications

Parallel HMMs. Parallel Implementation of Hidden Markov Models for Wireless Applications Parallel HMMs Parallel Implementation of Hidden Markov Models for Wireless Applications Authors Shawn Hymel (Wireless@VT, Virginia Tech) Ihsan Akbar (Harris Corporation) Jeffrey Reed (Wireless@VT, Virginia

More information

RFNoC : RF Network on Chip Martin Braun, Jonathon Pendlum GNU Radio Conference 2015

RFNoC : RF Network on Chip Martin Braun, Jonathon Pendlum GNU Radio Conference 2015 RFNoC : RF Network on Chip Martin Braun, Jonathon Pendlum GNU Radio Conference 2015 Outline Motivation Current situation Goal RFNoC Basic concepts Architecture overview Summary No Demo! See our booth,

More information

THE LEADER IN VISUAL COMPUTING

THE LEADER IN VISUAL COMPUTING MOBILE EMBEDDED THE LEADER IN VISUAL COMPUTING 2 TAKING OUR VISION TO REALITY HPC DESIGN and VISUALIZATION AUTO GAMING 3 BEST DEVELOPER EXPERIENCE Tools for Fast Development Debug and Performance Tuning

More information

BER Guaranteed Optimization and Implementation of Parallel Turbo Decoding on GPU

BER Guaranteed Optimization and Implementation of Parallel Turbo Decoding on GPU 2013 8th International Conference on Communications and Networking in China (CHINACOM) BER Guaranteed Optimization and Implementation of Parallel Turbo Decoding on GPU Xiang Chen 1,2, Ji Zhu, Ziyu Wen,

More information

Choosing a Processor: Benchmarks and Beyond (S043)

Choosing a Processor: Benchmarks and Beyond (S043) Insight, Analysis, and Advice on Signal Processing Technology Choosing a Processor: Benchmarks and Beyond (S043) Jeff Bier Berkeley Design Technology, Inc. Berkeley, California USA +1 (510) 665-1600 info@bdti.com

More information

Simulation, prototyping and verification of standards-based wireless communications

Simulation, prototyping and verification of standards-based wireless communications Simulation, prototyping and verification of standards-based wireless communications Colin McGuire, Neil MacEwen 2015 The MathWorks, Inc. 1 Real Time LTE Cell Scanner with MATLAB and Simulink 2 Real time

More information

SDA: Software-Defined Accelerator for Large- Scale DNN Systems

SDA: Software-Defined Accelerator for Large- Scale DNN Systems SDA: Software-Defined Accelerator for Large- Scale DNN Systems Jian Ouyang, 1 Shiding Lin, 1 Wei Qi, Yong Wang, Bo Yu, Song Jiang, 2 1 Baidu, Inc. 2 Wayne State University Introduction of Baidu A dominant

More information

The WINLAB Cognitive Radio Platform

The WINLAB Cognitive Radio Platform The WINLAB Cognitive Radio Platform IAB Meeting, Fall 2007 Rutgers, The State University of New Jersey Ivan Seskar Software Defined Radio/ Cognitive Radio Terminology Software Defined Radio (SDR) is any

More information

TR An Overview of NVIDIA Tegra K1 Architecture. Ang Li, Radu Serban, Dan Negrut

TR An Overview of NVIDIA Tegra K1 Architecture. Ang Li, Radu Serban, Dan Negrut TR-2014-17 An Overview of NVIDIA Tegra K1 Architecture Ang Li, Radu Serban, Dan Negrut November 20, 2014 Abstract This paperwork gives an overview of NVIDIA s Jetson TK1 Development Kit and its Tegra K1

More information

CEVA-X1 Lightweight Multi-Purpose Processor for IoT

CEVA-X1 Lightweight Multi-Purpose Processor for IoT CEVA-X1 Lightweight Multi-Purpose Processor for IoT 1 Cellular IoT for The Massive Internet of Things Narrowband LTE Technologies Days Battery Life Years LTE-Advanced LTE Cat-1 Cat-M1 Cat-NB1 >10Mbps Up

More information

Graph-based Framework for Flexible Baseband Function Splitting and Placement in C-RAN

Graph-based Framework for Flexible Baseband Function Splitting and Placement in C-RAN Graph-based Framework for Flexible Baseband Function Splitting and Placement in C-RAN Group Meeting Presentation (Paper Review) J. Liu, Graph-based framework for flexible baseband function splitting and

More information

Sirius: An Open End-to-End Voice and Vision Personal Assistant and Its Implications for Future Warehouse Scale Computers

Sirius: An Open End-to-End Voice and Vision Personal Assistant and Its Implications for Future Warehouse Scale Computers Sirius: An Open End-to-End Voice and Vision Personal Assistant and Its Implications for Future Warehouse Scale Computers Johann Hauswald, Michael A. Laurenzano, Yunqi Zhang, Cheng Li, Austin Rovinski,

More information

Reconfigurable VLSI Communication Processor Architectures

Reconfigurable VLSI Communication Processor Architectures Reconfigurable VLSI Communication Processor Architectures Joseph R. Cavallaro Center for Multimedia Communication www.cmc.rice.edu Department of Electrical and Computer Engineering Rice University, Houston

More information

Throughput-Optimized OpenCL-based FPGA Accelerator for Large-Scale Convolutional Neural Networks

Throughput-Optimized OpenCL-based FPGA Accelerator for Large-Scale Convolutional Neural Networks Throughput-Optimized OpenCL-based FPGA Accelerator for Large-Scale Convolutional Neural Networks Naveen Suda, Vikas Chandra *, Ganesh Dasika *, Abinash Mohanty, Yufei Ma, Sarma Vrudhula, Jae-sun Seo, Yu

More information

CSE 591/392: GPU Programming. Introduction. Klaus Mueller. Computer Science Department Stony Brook University

CSE 591/392: GPU Programming. Introduction. Klaus Mueller. Computer Science Department Stony Brook University CSE 591/392: GPU Programming Introduction Klaus Mueller Computer Science Department Stony Brook University First: A Big Word of Thanks! to the millions of computer game enthusiasts worldwide Who demand

More information

Versal: AI Engine & Programming Environment

Versal: AI Engine & Programming Environment Engineering Director, Xilinx Silicon Architecture Group Versal: Engine & Programming Environment Presented By Ambrose Finnerty Xilinx DSP Technical Marketing Manager October 16, 2018 MEMORY MEMORY MEMORY

More information

A Multi-Tiered Optimization Framework for Heterogeneous Computing

A Multi-Tiered Optimization Framework for Heterogeneous Computing A Multi-Tiered Optimization Framework for Heterogeneous Computing IEEE HPEC 2014 Alan George Professor of ECE University of Florida Herman Lam Assoc. Professor of ECE University of Florida Andrew Milluzzi

More information

SDA: Software-Defined Accelerator for Large- Scale DNN Systems

SDA: Software-Defined Accelerator for Large- Scale DNN Systems SDA: Software-Defined Accelerator for Large- Scale DNN Systems Jian Ouyang, 1 Shiding Lin, 1 Wei Qi, 1 Yong Wang, 1 Bo Yu, 1 Song Jiang, 2 1 Baidu, Inc. 2 Wayne State University Introduction of Baidu A

More information

High-performance and Low-power Consumption Vector Processor for LTE Baseband LSI

High-performance and Low-power Consumption Vector Processor for LTE Baseband LSI High-performance and Low-power Consumption Vector Processor for LTE Baseband LSI Yi Ge Mitsuru Tomono Makiko Ito Yoshio Hirose Recently, the transmission rate for handheld devices has been increasing by

More information

A System Solution for High-Performance, Low Power SDR

A System Solution for High-Performance, Low Power SDR A System Solution for High-Performance, Low Power SDR Yuan Lin 1, Hyunseok Lee 1, Yoav Harel 1, Mark Woh 1, Scott Mahlke 1, Trevor Mudge 1 and Krisztián Flautner 2 1 Advanced Computer Architecture Laboratory

More information

Are Polar Codes Practical?

Are Polar Codes Practical? Are Polar Codes Practical? Prof. Warren J. Gross McGill University, Montreal, QC. Coding: from Practice to Theory Berkeley, CA. Feb. 10 th, 2015. Achieving the Channel Capacity Successive cancellation

More information

CUDA. Matthew Joyner, Jeremy Williams

CUDA. Matthew Joyner, Jeremy Williams CUDA Matthew Joyner, Jeremy Williams Agenda What is CUDA? CUDA GPU Architecture CPU/GPU Communication Coding in CUDA Use cases of CUDA Comparison to OpenCL What is CUDA? What is CUDA? CUDA is a parallel

More information

A Software Development and Validation Framework for SDR Platforms

A Software Development and Validation Framework for SDR Platforms A Software Development and Validation Framework for SDR Platforms Jeroen.Declerck@imec.be Outline IMEC SDR Platform Problem Statement Framework (XMSF) Implementation XMSS server Graphical logger IMEC SDR

More information

Algorithm-Architecture Co- Design for Efficient SDR Signal Processing

Algorithm-Architecture Co- Design for Efficient SDR Signal Processing Algorithm-Architecture Co- Design for Efficient SDR Signal Processing Min Li, limin@imec.be Wireless Research, IMEC Introduction SDR Baseband Platforms Today are Usually Based on ILP + DLP + MP Massive

More information

PicoSDR goes GNU Radio. Tristan Martin Jan 2013

PicoSDR goes GNU Radio. Tristan Martin Jan 2013 Tristan Martin Jan 2013 Table of content Model Based Design tool for FPGA Development (MBDK) Model Based Design tool for host development (GNU Radio) PicoSDR : High End MIMO RF frontend for GNU Radio Radio420M

More information

Multimedia in Mobile Phones. Architectures and Trends Lund

Multimedia in Mobile Phones. Architectures and Trends Lund Multimedia in Mobile Phones Architectures and Trends Lund 091124 Presentation Henrik Ohlsson Contact: henrik.h.ohlsson@stericsson.com Working with multimedia hardware (graphics and displays) at ST- Ericsson

More information

A GPU Implementation of a Real-Time MIMO Detector

A GPU Implementation of a Real-Time MIMO Detector A GPU Implementation of a Real-Time MIMO Detector Michael Wu, Siddharth Gupta, Yang Sun, and Joseph R. Cavallaro Electrical and Computer Engineering Rice University Houston, Texas 77005 {mbw2, sgupta,

More information

OCP Engineering Workshop - Telco

OCP Engineering Workshop - Telco OCP Engineering Workshop - Telco Low Latency Mobile Edge Computing Trevor Hiatt Product Management, IDT IDT Company Overview Founded 1980 Workforce Approximately 1,800 employees Headquarters San Jose,

More information

IMPLEMENTATION OF MPI-BASED WIMAX BASE STATION SYSTEM FOR SDR

IMPLEMENTATION OF MPI-BASED WIMAX BASE STATION SYSTEM FOR SDR IMPLEMENTATION OF MPI-BASED WIMAX BASE STATION SYSTEM FOR SDR Hyohan Kim (HY-SDR Research Center, Hanyang Univ., Seoul, South Korea; hhkim@dsplab.hanyang.ac.kr); Chiyoung Ahn(HY-SDR Research center, Hanyang

More information

DRAF: A Low-Power DRAM-based Reconfigurable Acceleration Fabric

DRAF: A Low-Power DRAM-based Reconfigurable Acceleration Fabric DRAF: A Low-Power DRAM-based Reconfigurable Acceleration Fabric Mingyu Gao, Christina Delimitrou, Dimin Niu, Krishna Malladi, Hongzhong Zheng, Bob Brennan, Christos Kozyrakis ISCA June 22, 2016 FPGA-Based

More information

High performance, power-efficient DSPs based on the TI C64x

High performance, power-efficient DSPs based on the TI C64x High performance, power-efficient DSPs based on the TI C64x Sridhar Rajagopal, Joseph R. Cavallaro, Scott Rixner Rice University {sridhar,cavallar,rixner}@rice.edu RICE UNIVERSITY Recent (2003) Research

More information

Summary of WP5 Integration and Validation Second Year. FP7 ICT Objective 1.1 The Network of the Future

Summary of WP5 Integration and Validation Second Year. FP7 ICT Objective 1.1 The Network of the Future Summary of WP5 Integration and Validation Second Year FP7 ICT Objective 1.1 The Network of the Future 1 Outline WP5 Outlook Testbed 1 Testbed 2 Testbed 3 Road map 2 WP5 Outlook Year 1 Year 2 Year 3 Testbeds

More information

Improved Convolutional Coding and Decoding of IEEE802.11n Based on General Purpose Processors

Improved Convolutional Coding and Decoding of IEEE802.11n Based on General Purpose Processors 2013 th International Conference on Communications and Networking in China (CHINACOM) Improved Convolutional Coding and Decoding of IEEE02.11n Based on General Purpose Processors Yanuo Xu, Kai Niu, Zhiqiang

More information

Rhythm: Harnessing Data Parallel Hardware for Server Workloads

Rhythm: Harnessing Data Parallel Hardware for Server Workloads Rhythm: Harnessing Data Parallel Hardware for Server Workloads Sandeep R. Agrawal $ Valentin Pistol $ Jun Pang $ John Tran # David Tarjan # Alvin R. Lebeck $ $ Duke CS # NVIDIA Explosive Internet Growth

More information

The rcuda middleware and applications

The rcuda middleware and applications The rcuda middleware and applications Will my application work with rcuda? rcuda currently provides binary compatibility with CUDA 5.0, virtualizing the entire Runtime API except for the graphics functions,

More information

8/28/12. CSE 820 Graduate Computer Architecture. Richard Enbody. Dr. Enbody. 1 st Day 2

8/28/12. CSE 820 Graduate Computer Architecture. Richard Enbody. Dr. Enbody. 1 st Day 2 CSE 820 Graduate Computer Architecture Richard Enbody Dr. Enbody 1 st Day 2 1 Why Computer Architecture? Improve coding. Knowledge to make architectural choices. Ability to understand articles about architecture.

More information

Real Time MIMO and Cooperative MIMO Systems Testbed for latest Wireless Broadband Systems. B. Project Summary:

Real Time MIMO and Cooperative MIMO Systems Testbed for latest Wireless Broadband Systems. B. Project Summary: Real Time MIMO and Cooperative MIMO Systems Testbed for latest Wireless Broadband Systems B. Project Summary: 1. Project Objectives: Research on the MIMO detection algorithms and Cooperative-MIMO (C- MIMO)

More information

Elaborazione dati real-time su architetture embedded many-core e FPGA

Elaborazione dati real-time su architetture embedded many-core e FPGA Elaborazione dati real-time su architetture embedded many-core e FPGA DAVIDE ROSSI A L E S S A N D R O C A P O T O N D I G I U S E P P E T A G L I A V I N I A N D R E A M A R O N G I U C I R I - I C T

More information

Exploring Task Parallelism for Heterogeneous Systems Using Multicore Task Management API

Exploring Task Parallelism for Heterogeneous Systems Using Multicore Task Management API EuroPAR 2016 ROME Workshop Exploring Task Parallelism for Heterogeneous Systems Using Multicore Task Management API Suyang Zhu 1, Sunita Chandrasekaran 2, Peng Sun 1, Barbara Chapman 1, Marcus Winter 3,

More information

Frequency Domain Acceleration of Convolutional Neural Networks on CPU-FPGA Shared Memory System

Frequency Domain Acceleration of Convolutional Neural Networks on CPU-FPGA Shared Memory System Frequency Domain Acceleration of Convolutional Neural Networks on CPU-FPGA Shared Memory System Chi Zhang, Viktor K Prasanna University of Southern California {zhan527, prasanna}@usc.edu fpga.usc.edu ACM

More information

Deep Learning Accelerators

Deep Learning Accelerators Deep Learning Accelerators Abhishek Srivastava (as29) Samarth Kulshreshtha (samarth5) University of Illinois, Urbana-Champaign Submitted as a requirement for CS 433 graduate student project Outline Introduction

More information

Reconfigurable Multicore Server Processors for Low Power Operation

Reconfigurable Multicore Server Processors for Low Power Operation Reconfigurable Multicore Server Processors for Low Power Operation Ronald G. Dreslinski, David Fick, David Blaauw, Dennis Sylvester, Trevor Mudge University of Michigan, Advanced Computer Architecture

More information

An Evaluation of an Energy Efficient Many-Core SoC with Parallelized Face Detection

An Evaluation of an Energy Efficient Many-Core SoC with Parallelized Face Detection An Evaluation of an Energy Efficient Many-Core SoC with Parallelized Face Detection Hiroyuki Usui, Jun Tanabe, Toru Sano, Hui Xu, and Takashi Miyamori Toshiba Corporation, Kawasaki, Japan Copyright 2013,

More information

Exploiting CUDA Dynamic Parallelism for low power ARM based prototypes

Exploiting CUDA Dynamic Parallelism for low power ARM based prototypes www.bsc.es Exploiting CUDA Dynamic Parallelism for low power ARM based prototypes Vishal Mehta Engineer, Barcelona Supercomputing Center vishal.mehta@bsc.es BSC/UPC CUDA Centre of Excellence (CCOE) Training

More information
