Modeling a 4G LTE System in MATLAB

Part 2: Simulation acceleration
Houman Zarrinkoub, PhD
Signal Processing Product Manager, MathWorks
houmanz@mathworks.com
© 2011 The MathWorks, Inc.

Why simulation acceleration?
  From algorithm exploration to system design
    Size and complexity of models increase
    Time needed for a single simulation increases
    Number of test cases increases
    Test cases become larger
  Need to reduce simulation time during design
  Need to reduce time for large-scale testing during prototyping and verification

MATLAB is quite fast
  Optimized and widely used libraries
    BLAS: Basic Linear Algebra Subprograms (multithreaded)
    LAPACK: Linear Algebra Package
  JIT (Just-In-Time) acceleration
    On-the-fly multithreaded code generation for increased speed
  Built-in support for vector and matrix operations
  Parallel computing support to utilize additional cores
    Parallel Computing Toolbox
    MATLAB Distributed Computing Server
    GPU support

Simulation acceleration options in MATLAB
  System objects
  User's code
  Parallel computing
  MATLAB to C
  GPU processing
  Demo: commacceleration

Parallel Simulation Runs
  [Diagram: Task 1 to Task 4 distributed across four workers (toolboxes and blocksets); two time axes]
  Demo

Summary
  matlabpool: available workers
  No modification of algorithm
  Use parfor loop instead of for loop
  Parallel computation or simulation leads to further acceleration
  More cores = more speed
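The parfor point in the summary above can be sketched as follows. This is a minimal illustration, not the demo from the talk: uncoded BPSK over AWGN is an assumed stand-in for the LTE chain, and the names ebnoDb and numBits and their values are chosen only for this example. Note that matlabpool was replaced by parpool in later MATLAB releases; without a pool, parfor simply runs the iterations serially.

```matlab
% Sketch: independent Eb/No points simulated in parallel with parfor.
% Uncoded BPSK over AWGN is an assumed stand-in for the full LTE chain.
ebnoDb  = 0:2:8;          % Eb/No points in dB (illustrative values)
numBits = 1e5;            % bits simulated per point (illustrative)
ber     = zeros(size(ebnoDb));

parfor k = 1:numel(ebnoDb)
    bits  = rand(numBits, 1) > 0.5;             % equiprobable random bits
    tx    = 1 - 2*bits;                         % BPSK mapping: 0 -> +1, 1 -> -1
    sigma = sqrt(1 / (2 * 10^(ebnoDb(k)/10)));  % noise std for unit-energy bits
    rx    = tx + sigma * randn(numBits, 1);     % AWGN channel
    ber(k) = mean((rx < 0) ~= bits);            % hard decision and error rate
end

disp([ebnoDb(:) ber(:)]);                       % Eb/No (dB) and measured BER
```

Each iteration depends only on its own index, which is what lets the loop body stay unchanged when for becomes parfor.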

Simulation acceleration options in MATLAB
  System objects
  User's code
  Parallel computing
  MATLAB to C
  GPU processing

What is a Graphics Processing Unit (GPU)?
  Originally for graphics acceleration, now also used for scientific calculations
  Massively parallel array of integer and floating-point processors
    Typically hundreds of processors per card
    GPU cores complement CPU cores
  Dedicated high-speed memory

Why would you want to use a GPU?
  Speed up execution of computationally intensive simulations
  For example:
  [Figure: Performance: A\b with Double Precision]

Options for Targeting GPUs
  (ordered from greatest ease of use to greatest control)
  1) Use GPU with MATLAB built-in functions
  2) Execute MATLAB functions elementwise on the GPU
  3) Create kernels from existing CUDA code and PTX files

Data Transfer between MATLAB and GPU

    % Push data from CPU to GPU memory
    Agpu = gpuArray(A);

    % Bring results from GPU memory back to CPU
    B = gather(Bgpu);
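As a sketch of how the transfer calls combine with a built-in function, the example below solves a linear system with the backslash operator, the operation named in the earlier performance slide. The matrix, its size, and the try/catch fallback to the CPU are assumptions made for this illustration.

```matlab
% Sketch: solve A*x = b on the GPU when one is available, else on the CPU.
n = 1000;                          % problem size (illustrative)
A = rand(n) + n*eye(n);            % well-conditioned test matrix (assumed data)
b = rand(n, 1);

try
    useGpu = gpuDeviceCount > 0;   % needs Parallel Computing Toolbox
catch
    useGpu = false;                % toolbox missing or no usable GPU
end

if useGpu
    Agpu = gpuArray(A);            % push data from CPU to GPU memory
    bgpu = gpuArray(b);
    xgpu = Agpu \ bgpu;            % built-in mldivide runs on the GPU
    x    = gather(xgpu);           % bring the result back to CPU memory
else
    x = A \ b;                     % CPU fallback with identical syntax
end

relResidual = norm(A*x - b) / norm(b);
fprintf('GPU used: %d, relative residual: %.2e\n', useGpu, relResidual);
```

The solve itself is written the same way in both branches; only the location of the data decides where it runs.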

GPU Processing with Communications System Toolbox
  Alternative implementations of many System objects take advantage of GPU processing
  Use Parallel Computing Toolbox to execute many communications algorithms directly on the GPU
  GPU System objects
    comm.gpu.TurboDecoder
    comm.gpu.ViterbiDecoder
    comm.gpu.LDPCDecoder
    comm.gpu.PSKDemodulator
    comm.gpu.AWGNChannel
  Easy-to-use syntax
  Dramatically accelerate simulations

Example: Turbo Coding
  Impressive coding gain
  High computational complexity
  Bit-error rate performance as a function of number of iterations

    ... = comm.TurboDecoder(..., 'NumIterations', numIter, ...

Acceleration with GPU System objects

  Version              Elapsed time   Acceleration
  CPU                  8 hours        1.0
  1 GPU                40 minutes     12.0
  Cluster of 4 GPUs    11 minutes     43.0

  Same numerical results

  Code changes:
    CPU:  ... = comm.TurboDecoder('NumIterations', N, ...
    GPU:  ... = comm.gpu.TurboDecoder('NumIterations', N, ...
    CPU:  ... = comm.AWGNChannel(...
    GPU:  ... = comm.gpu.AWGNChannel(...

Key Operations in Turbo Coding Function: CPU vs. GPU, Version 1

CPU:

    % Turbo Encoder
    hTEnc = comm.TurboEncoder('TrellisStructure', poly2trellis(4, [13 15], 13), ...
        'InterleaverIndices', intrlvrIndices);
    % AWG Noise
    hAWGN = comm.AWGNChannel('NoiseMethod', 'Variance');
    % BER measurement
    hBER = comm.ErrorRate;
    % Turbo Decoder
    hTDec = comm.TurboDecoder('TrellisStructure', poly2trellis(4, [13 15], 13), ...
        'InterleaverIndices', intrlvrIndices, 'NumIterations', numIter);

    ber = zeros(3,1); % initialize BER output
    %% Processing loop
    % comm.ErrorRate returns [error rate; number of errors; number of bits]
    while (ber(2) < MaxNumErrs && ber(3) < MaxNumBits)
        data = randn(blkLength, 1) > 0.5;
        % Encode random data bits
        yEnc = step(hTEnc, data);
        % Modulate, add noise to real bipolar data
        modOut = 1 - 2*yEnc;
        rData = step(hAWGN, modOut);
        % Convert to log-likelihood ratios for decoding
        llrData = (-2/noiseVar).*rData;
        % Turbo decode
        decData = step(hTDec, llrData);
        % Calculate errors
        ber = step(hBER, data, decData);
    end

GPU: identical to the CPU listing except for the decoder constructor:

    % Turbo Decoder
    hTDec = comm.gpu.TurboDecoder('TrellisStructure', poly2trellis(4, [13 15], 13), ...
        'InterleaverIndices', intrlvrIndices, 'NumIterations', numIter);

Profile results in Turbo Coding Function: CPU vs. GPU, Version 1

  CPU time   GPU time   Statement
  <0.01      <0.01      hTEnc = comm.TurboEncoder(...)
  <0.01      <0.01      hAWGN = comm.AWGNChannel('NoiseMethod', 'Variance')
  <0.01      <0.01      hBER = comm.ErrorRate
  <0.01      0.02       hTDec = comm.TurboDecoder(...)   (GPU: comm.gpu.TurboDecoder(...))
  <0.01      <0.01      ber = zeros(3,1)
  0.30       0.28       data = randn(blkLength, 1) > 0.5
  2.33       2.38       yEnc = step(hTEnc, data)
  0.05       0.05       modOut = 1 - 2*yEnc
  1.50       1.45       rData = step(hAWGN, modOut)
  0.03       0.04       llrData = (-2/noiseVar).*rData
  330.54     98.18      decData = step(hTDec, llrData)
  0.17       0.17       ber = step(hBER, data, decData)

Key Operations in Turbo Coding Function: CPU vs. GPU, Version 2

CPU:

    % Turbo Encoder
    hTEnc = comm.TurboEncoder('TrellisStructure', poly2trellis(4, [13 15], 13), ...
        'InterleaverIndices', intrlvrIndices);
    % AWG Noise
    hAWGN = comm.AWGNChannel('NoiseMethod', 'Variance');
    % BER measurement
    hBER = comm.ErrorRate;
    % Turbo Decoder
    hTDec = comm.TurboDecoder('TrellisStructure', poly2trellis(4, [13 15], 13), ...
        'InterleaverIndices', intrlvrIndices, 'NumIterations', numIter);

    %% Processing loop
    % comm.ErrorRate returns [error rate; number of errors; number of bits]
    while (ber(2) < MaxNumErrs && ber(3) < MaxNumBits)
        data = randn(blkLength, 1) > 0.5;
        % Encode random data bits
        yEnc = step(hTEnc, data);
        % Modulate, add noise to real bipolar data
        modOut = 1 - 2*yEnc;
        rData = step(hAWGN, modOut);
        % Convert to log-likelihood ratios for decoding
        llrData = (-2/noiseVar).*rData;
        % Turbo decode
        decData = step(hTDec, llrData);
        % Calculate errors
        ber = step(hBER, data, decData);
    end

GPU:

    % Turbo Encoder
    hTEnc = comm.TurboEncoder('TrellisStructure', poly2trellis(4, [13 15], 13), ...
        'InterleaverIndices', intrlvrIndices);
    % AWG Noise
    hAWGN = comm.gpu.AWGNChannel('NoiseMethod', 'Variance');
    % BER measurement
    hBER = comm.ErrorRate;
    % Turbo Decoder - setup for multi-frame or multi-user processing
    numFrames = 30;
    hTDec = comm.gpu.TurboDecoder('TrellisStructure', poly2trellis(4, [13 15], 13), ...
        'InterleaverIndices', intrlvrIndices, 'NumIterations', numIter, ...
        'NumFrames', numFrames);

    %% Processing loop
    % comm.ErrorRate returns [error rate; number of errors; number of bits]
    while (ber(2) < MaxNumErrs && ber(3) < MaxNumBits)
        data = randn(numFrames*blkLength, 1) > 0.5;
        % Encode random data bits (multiframestep is a helper not shown on the slide)
        yEnc = gpuArray(multiframestep(hTEnc, data, numFrames));
        % Modulate, add noise to real bipolar data
        modOut = 1 - 2*yEnc;
        rData = step(hAWGN, modOut);
        % Convert to log-likelihood ratios for decoding
        llrData = (-2/noiseVar).*rData;
        % Turbo decode
        decData = step(hTDec, llrData);
        % Calculate errors
        ber = step(hBER, data, gather(decData));
    end
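The GPU listing above relies on a multi-frame helper that the slide does not show. The sketch below is a guess at what such a helper might do, not the demo's implementation: it assumes the frames are stacked end to end in one column vector, processes them one at a time, and stacks the outputs in the same order. The stand-in frameFcn (which just repeats its input) takes the place of a call such as step(hTEnc, frame), and all names and sizes here are hypothetical.

```matlab
% Hypothetical multi-frame processing sketch (not the demo's helper).
frameFcn  = @(frame) [frame; frame];   % stand-in for a per-frame encoder call
numFrames = 3;                          % illustrative frame count
blkLength = 4;                          % illustrative frame length
data      = (1:numFrames*blkLength)';   % frames stacked in one column vector

frames  = reshape(data, blkLength, numFrames);   % one frame per column
encoded = cell(1, numFrames);
for k = 1:numFrames
    encoded{k} = frameFcn(frames(:, k));         % process frames one at a time
end
yEnc = vertcat(encoded{:});                      % stack outputs frame by frame

disp(yEnc.');
```

Batching several frames into one vector lets a single GPU decoder call work on all of them at once, which matches the drop in decode time reported for Version 2.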

Profile results in Turbo Coding Function: CPU vs. GPU, Version 2

  CPU time   GPU time   Statement (GPU variant where it differs)
  <0.01      <0.01      hTEnc = comm.TurboEncoder(...)
  <0.01      0.03       hAWGN = comm.AWGNChannel('NoiseMethod', 'Variance')   (GPU: comm.gpu.AWGNChannel)
  <0.01      <0.01      hBER = comm.ErrorRate
  n/a        0.01       numFrames = 30   (GPU only)
  <0.01      0.01       hTDec = comm.TurboDecoder(...)   (GPU: comm.gpu.TurboDecoder(..., 'NumFrames', numFrames))
  0.30       0.22       data = randn(blkLength, 1) > 0.5   (GPU: randn(numFrames*blkLength, 1) > 0.5)
  2.33       2.45       yEnc = step(hTEnc, data)   (GPU: gpuArray(multiframestep(hTEnc, data, numFrames)))
  0.05       0.02       modOut = 1 - 2*yEnc
  1.50       0.31       rData = step(hAWGN, modOut)
  0.03       0.01       llrData = (-2/noiseVar).*rData
  330.54     20.89      decData = step(hTDec, llrData)
  0.17       0.09       ber = step(hBER, data, decData)   (GPU: gather(decData))

Things to note when targeting GPU
  Minimize data transfer between CPU and GPU.
  Using the GPU only makes sense if the data size is large.
  Some functions in MATLAB are optimized and can be faster than the GPU equivalent (e.g., FFT).
  Use arrayfun to explicitly specify elementwise operations.
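The first two notes above can be sketched as follows: the inputs cross to the GPU once, every intermediate result stays there, and only a scalar comes back. The signal chain mirrors the modulation and LLR steps from the turbo coding listings, but the data, sizes, and the try/catch fallback are assumptions made for this illustration.

```matlab
% Sketch: keep intermediate results on the GPU and transfer as little as possible.
numSamples = 1e6;                            % large arrays amortize transfer cost
noiseVar   = 0.5;                            % illustrative noise variance
bits       = rand(numSamples, 1) > 0.5;      % assumed test data
modOut     = 1 - 2*double(bits);             % bipolar mapping: 0 -> +1, 1 -> -1
noise      = sqrt(noiseVar) * randn(numSamples, 1);

try
    useGpu = gpuDeviceCount > 0;             % needs Parallel Computing Toolbox
catch
    useGpu = false;                          % toolbox missing or no usable GPU
end

if useGpu
    modOut = gpuArray(modOut);               % inputs cross to the GPU once
    noise  = gpuArray(noise);
end

% These steps run wherever the data lives; nothing is gathered in between.
rData   = modOut + noise;
llrData = (-2/noiseVar) .* rData;
hardDec = llrData > 0;                       % positive LLR decides bit 1 here
numErrs = sum(hardDec ~= bits);

if useGpu
    numErrs = gather(numErrs);               % only one scalar returns to the CPU
end
fprintf('GPU used: %d, bit error rate: %.4f\n', useGpu, numErrs/numSamples);
```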

Acceleration Strategies Applied in MATLAB

  Option                                                  Technology / Product
  1. Best practices in programming                        MATLAB, Toolboxes, System Toolboxes
       Vectorization & pre-allocation
       Environment tools (e.g., Profiler, Code Analyzer)
  2. Better algorithms                                    MATLAB, Toolboxes, System Toolboxes
       Ideal environment for algorithm exploration
       Rich set of functionality (e.g., System objects)
  3. More processors or cores                             Parallel Computing Toolbox,
       High-level parallel constructs                     MATLAB Distributed Computing Server
         (e.g., parfor, matlabpool)
       Utilize clusters, clouds, and grids
  4. Refactoring the implementation                       MATLAB, MATLAB Coder,
       Compiled code (MEX)                                Parallel Computing Toolbox
       GPUs, FPGA-in-the-Loop
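Option 1 in the table above, vectorization and pre-allocation, can be illustrated with a small timing sketch. The signal and its size are arbitrary choices for this example, and the measured times will vary by machine and release; growing an array in a loop is typically the slowest of the three.

```matlab
% Sketch: three ways to build the same vector.
n = 1e5;                         % illustrative size
t = (0:n-1)' / n;

% (a) Growing an array inside a loop forces repeated reallocation.
tic;
yGrow = [];
for k = 1:n
    yGrow(end+1, 1) = sin(2*pi*5*t(k))^2; %#ok<SAGROW>
end
tGrow = toc;

% (b) Pre-allocation fixes the size before the loop starts.
tic;
yPre = zeros(n, 1);
for k = 1:n
    yPre(k) = sin(2*pi*5*t(k))^2;
end
tPre = toc;

% (c) Vectorization replaces the loop with one array expression.
tic;
yVec = sin(2*pi*5*t).^2;
tVec = toc;

fprintf('grow: %.4f s, pre-allocated: %.4f s, vectorized: %.4f s\n', tGrow, tPre, tVec);
```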

Summary
  MATLAB is the ideal language for LTE modeling and simulation
  Communications System Toolbox extends the breadth of MATLAB modeling tools
  You can accelerate simulation with a variety of options in MATLAB
    Parallel computing, GPU processing, MATLAB to C
  Address implementation workflow gaps with
    Automatic MATLAB to C/C++ and HDL code generation
    Hardware-in-the-loop verification

Call to Action
  Attend the 3rd part of this seminar
  Direct path from system model to implementation
    C and HDL code generation
    Fixed-point modeling
    Radio-in-the-loop with USRP2

Thank You
Q & A