Generic Polyphase Filterbanks with CUDA

Size: px

Start display at page:

Download "Generic Polyphase Filterbanks with CUDA"

Denis Alexander
5 years ago
Views:

1 Generic Polyphase Filterbanks with CUDA Jan Krämer German Aerospace Center Communication and Navigation Satellite Networks Weßling Knowledge for Tomorrow

2 Slide 1 of 27 > Generic Polyphase Filterbanks with CUDA > Jan Krämer > Outline 1. Motivation 2. Short introduction to CUDA 3. PFBs and the Channelizer 4. Translation to CUDA 5. Results 6. Release plans and future changes

3 Slide 1 of 27 > Generic Polyphase Filterbanks with CUDA > Jan Krämer > Outline 1. Motivation 2. Short introduction to CUDA 3. PFBs and the Channelizer 4. Translation to CUDA 5. Results 6. Release plans and future changes

4 Slide 2 of 27 > Generic Polyphase Filterbanks with CUDA > Jan Krämer > Once upon a time in a space project Multicarrier scheme with 15/30/45 carrier

5 Slide 2 of 27 > Generic Polyphase Filterbanks with CUDA > Jan Krämer > Once upon a time in a space project Multicarrier scheme with 15/30/45 carrier So let s just use a PFB, right?

6 Slide 3 of 27 > Generic Polyphase Filterbanks with CUDA > Jan Krämer > Early Trouble 45 carrier means 45x the bandwidth Only % guardband available At least 3x oversampling needed Up to 1500 tap filters needed

www.dlr.de Slide 3 of 27 > Generic Polyphase Filterbanks with CUDA > Jan Krämer > 04.02.

7 Slide 3 of 27 > Generic Polyphase Filterbanks with CUDA > Jan Krämer > Early Trouble 45 carrier means 45x the bandwidth Only % guardband available At least 3x oversampling needed Up to 1500 tap filters needed

8 Slide 4 of 27 > Generic Polyphase Filterbanks with CUDA > Jan Krämer > Early Trouble CPU reference implementation 1000 taps 35dB rejection Originally 9x oversampling 2 Msamples/second achieved 4 Msamples/second needed

9 Slide 4 of 27 > Generic Polyphase Filterbanks with CUDA > Jan Krämer > Outline 1. Motivation 2. Short introduction to CUDA 3. PFBs and the Channelizer 4. Translation to CUDA 5. Results 6. Release plans and future changes

10 Slide 5 of 27 > Generic Polyphase Filterbanks with CUDA > Jan Krämer > What is CUDA NVidias framework for GPGPU Used mainly to accelerate scientific computing Uses the massive amount of available compute cores inside a GPU

11 Slide 6 of 27 > Generic Polyphase Filterbanks with CUDA > Jan Krämer > GPU Interior GPU consists of several Streaming Multiprocessors (SM) Each SM consists of numerous compute or CUDA cores Single-Instruction Multiple-Threads (SIMT) structure Several kinds of memory Global Memory (GDDR5 RAM) (slow) On-Chip (shared) Memory per SM (faster) Registers (blazingly fast)

12 Slide 7 of 27 > Generic Polyphase Filterbanks with CUDA > Jan Krämer > CUDA Interior Builds a (up to) 3 dimensional Grid The Gid contains the (up to) 3 dimensional Thread Blocks containing the threads Groups of 32 threads inside a Thread Block are grouped together Warp

13 Slide 8 of 27 > Generic Polyphase Filterbanks with CUDA > Jan Krämer > CUDA Interior

14 Slide 9 of 27 > Generic Polyphase Filterbanks with CUDA > Jan Krämer > Thread Execution Each Block has a unique ID inside the Grid Each thread has a unique global ID Thread Scheduler assigns each Thread Block to one SM and executed concurrently All threads in a Warp are executed concurrently inside the SM

15 Slide 10 of 27 > Generic Polyphase Filterbanks with CUDA > Jan Krämer > Performance Bottlenecks Uncoalesced loads from global memory Several cache-lines to be loaded

16 Slide 10 of 27 > Generic Polyphase Filterbanks with CUDA > Jan Krämer > Performance Bottlenecks Uncoalesced loads from global memory Several cache-lines to be loaded Bank conflicts when accessing shared memory

17 Slide 10 of 27 > Generic Polyphase Filterbanks with CUDA > Jan Krämer > Performance Bottlenecks Uncoalesced loads from global memory Several cache-lines to be loaded Bank conflicts when accessing shared memory Branching Which instruction should be executed?

18 Slide 10 of 27 > Generic Polyphase Filterbanks with CUDA > Jan Krämer > Outline 1. Motivation 2. Short introduction to CUDA 3. PFBs and the Channelizer 4. Translation to CUDA 5. Results 6. Release plans and future changes

19 Slide 11 of 27 > Generic Polyphase Filterbanks with CUDA > Jan Krämer > Why PFBs and Channelizers/Synthesizers? Used to reduce computational complexity for resampling filters Used to separate small bandwidth channels Used to generate multicarrier broadband signals

20 Slide 12 of 27 > Generic Polyphase Filterbanks with CUDA > Jan Krämer > Structure of a PFB Channelizer Extracting a channel with 1 N of the total bandwidth Mix Signal to Baseband Apply anti-alias filter Downsample the signal

21 Slide 13 of 27 > Generic Polyphase Filterbanks with CUDA > Jan Krämer > Structure of a PFB Channelizer Extracting a channel with 1 N of the total bandwidth Mix Signal to Baseband Apply anti-alias filter Downsample the signal N-phase PFB splits one-dimensional filter in its N different phase shares

22 Slide 14 of 27 > Generic Polyphase Filterbanks with CUDA > Jan Krämer > Structure of a PFB Channelizer Taps of the regular prototype filter

23 Slide 15 of 27 > Generic Polyphase Filterbanks with CUDA > Jan Krämer > Structure of a PFB Channelizer Taps of the regular prototype filter Split into 4 polyphase partitions

24 Slide 16 of 27 > Generic Polyphase Filterbanks with CUDA > Jan Krämer > Structure of a PFB Channelizer Taps of the regular prototype filter Split into 4 polyphase partitions Newly structured dataflow

www.dlr.de Slide 17 of 27 > Generic Polyphase Filterbanks with CUDA > Jan Krämer > 04.02.

25 Slide 17 of 27 > Generic Polyphase Filterbanks with CUDA > Jan Krämer > Structure of a PFB Channelizer Taps of the regular prototype filter Split into 4 polyphase partitions Newly structured dataflow FFT separates all the channels

26 Slide 18 of 27 > Generic Polyphase Filterbanks with CUDA > Jan Krämer > Structure of a PFB Channelizer Oversampling can be achieved by manipulating the input commutator and FFT input To synthesize several incoming channels just the reorder the operations

27 Slide 18 of 27 > Generic Polyphase Filterbanks with CUDA > Jan Krämer > Outline 1. Motivation 2. Short introduction to CUDA 3. PFBs and the Channelizer 4. Translation to CUDA 5. Results 6. Release plans and future changes

28 Slide 18 of 27 > Generic Polyphase Filterbanks with CUDA > Jan Krämer >

29 Slide 19 of 27 > Generic Polyphase Filterbanks with CUDA > Jan Krämer > Identifying necessary operations Channelizer consists of 4 operations Shuffle the input stream Polyphase filtering FFT Shuffle the output stream

30 Slide 20 of 27 > Generic Polyphase Filterbanks with CUDA > Jan Krämer > Input Shuffling Input Commutator implemented as matrix traversal Number of threads needs to accomodate to the filter history Grid dimension takes care of this Input buffer reads are coalesced Block x-dimension same size as polyphase partition Intermediate buffer writes are therfore not coalesced

31 Slide 21 of 27 > Generic Polyphase Filterbanks with CUDA > Jan Krämer > Filter Operations Block X dimension computes several input samples Block Y dimension computes oversampled output samples Grid X dimension represents polyphase partitions Grid Y dimension provide additional concurrency (due to block thread limits)

32 Slide 22 of 27 > Generic Polyphase Filterbanks with CUDA > Jan Krämer > Filter Operations Each threadblock transfers memory from global memory to shared memory Each sample is accessed several times shared memory offers faster memory transfers Register and shared memory spills are avoided

33 Slide 23 of 27 > Generic Polyphase Filterbanks with CUDA > Jan Krämer > FFT and Output Shuffling FFT is the CuFFT of CUDA Output shuffling implemented as double loop done on Host CPU (for now)

34 Slide 23 of 27 > Generic Polyphase Filterbanks with CUDA > Jan Krämer > Outline 1. Motivation 2. Short introduction to CUDA 3. PFBs and the Channelizer 4. Translation to CUDA 5. Results 6. Release plans and future changes

35 Slide 24 of 27 > Generic Polyphase Filterbanks with CUDA > Jan Krämer > Channel PFB 32 Channels No Oversampling 437 taps prototype filter

36 Slide 24 of 27 > Generic Polyphase Filterbanks with CUDA > Jan Krämer > Channel PFB

37 Slide 25 of 27 > Generic Polyphase Filterbanks with CUDA > Jan Krämer > Channel PFB 45 Channels 3x Oversampling 1501 taps prototype filter

38 Slide 25 of 27 > Generic Polyphase Filterbanks with CUDA > Jan Krämer > Channel PFB

39 Slide 25 of 27 > Generic Polyphase Filterbanks with CUDA > Jan Krämer > Outline 1. Motivation 2. Short introduction to CUDA 3. PFBs and the Channelizer 4. Translation to CUDA 5. Results 6. Release plans and future changes

40 Slide 26 of 27 > Generic Polyphase Filterbanks with CUDA > Jan Krämer > Release Plan Release Date: TBD Still some bureaucratic hurdles Still dependent on project code License: LGPL3 Platform Github (Group KN-SAN) Follow for release news

41 Slide 27 of 27 > Generic Polyphase Filterbanks with CUDA > Jan Krämer >

Coprocessor Accelerated Filterbank Extension Library

Coprocessor Accelerated Filterbank Extension Library Mummy, are we there yet Jan Krämer DLR Institute of Communication and Navigation (IKN) 04.02.2018 Overview Introduction Arbitrary Resampler Transition