GPU-Based Real-Time SAS Processing On-Board Autonomous Underwater Vehicles

GPU Technology Conference 2013
GPU-Based Real-Time SAS Processing On-Board Autonomous Underwater Vehicles
Jesús Ortiz, Advanced Robotics Department (ADVR), Istituto Italiano di Tecnologia (IIT)
Francesco Baralli, Center for Maritime Research and Experimentation (CMRE), NATO Science and Technology Organization (STO)
March 18-21, 2013, San Jose, California

CMRE@NATO STO Overview
NATO Science and Technology Organization, Center for Maritime Research and Experimentation
Within the framework of the STO, CMRE organizes and conducts scientific research and technology development, and delivers innovative and field-tested S&T solutions to address the defense and security needs of the Alliance. CMRE is customer funded; its customers comprise NATO bodies, NATO Nations and other parties, consistent with NATO policy. The mission of CMRE is centered on the maritime domain, but may extend into other domains to meet customers' demands. CMRE maintains a set of core technical capabilities, including underwater acoustics, sensors and signal processing, autonomy, ocean engineering, seagoing capability, and AUVs.

ADVR@IIT Overview
Istituto Italiano di Tecnologia, Advanced Robotics Department
An innovative and multidisciplinary approach to humanoid design and control, and to the development of novel robotic components and technologies:
- Hardware: mechanical/electrical design and fabrication, sensor systems, actuation development, etc.
- Software: control, computer software, human factors, etc.
Transition from traditional (hard-bodied) robots towards a new generation of hybrid systems based on biologically inspired concepts (muscle, bone, tendon, skin).
Research activities:
- core scientific/technological research aimed at providing the fundamental competencies needed to develop robotic and humanoid technology
- advanced research demonstrators that provide large, focused research projects integrating the core sciences

Autonomous Underwater Vehicle (AUV)
Basic:
- Perform underwater missions following a pre-programmed path
- No human operator in the control loop
- Carry different sets of environmental and/or imaging sensors
- Communicate with the surface station and with other agents
Advanced:
- Real-time data processing and analysis
- High level of autonomy: adaptive behavior
- Cooperative behavior among different AUVs

Range Sonar Imaging
Synthetic Aperture Sonar (SAS):
- Multiple pulses are used to create a large synthetic array (aperture)
- One pixel of the image is produced using information from multiple pulses
- Multistatic approach (one emitter, multiple receivers)
- The emitted ping is a chirp (broad frequency band)

Sonar on both sides (port & starboard), with two receiver arrays per side:
- Main array (36 elements)
- Bathymetric array (12 elements)
AUV motion axes: yaw (heading, North), heave (depth, vertical), roll, surge (longitudinal, along-track), pitch, sway (lateral, cross-track, range).

Synthetic Aperture Sonar (SAS)
(Figure: port-side and starboard-side SAS imagery, with range in the cross-track direction and the along-track direction given by the vehicle movement.)

SAS result

SAS processing
- Preprocessing (calibration & matching)
- Displaced Phase Center Antenna (DPCA)
- Navigation integration
- SAS imaging
- SAS interferometry / 3D images
- Automatic Target Recognition (ATR)
- Replanning

SAS processing
Timeline (4 Hz, i.e. 250 ms per ping): each ping (PING 1, PING 2, PING 3, ..., PING N) goes through Preprocessing, then DPCA, then motion estimation, then SAS imaging, and the resulting tiles are stacked into the output image.

Preprocessing
(Flowchart: the replica is read once on the CPU; its spectrum is computed both without shading (Replica Freq.) and with shading (Replica Shaded), and both spectra are copied to the device only once. For every ping, the raw data are read, copied to the device, calibrated and FFT'd on the GPU, multiplied by the two replica spectra, and inverse-FFT'd, producing the matched ping and the shaded ping. Legend: memory transfer, CPU operation, GPU operation, data in host, data in device, only once.)

Preprocessing - Calibration
Calibrate the acoustic data: adjust the phase of each receiver to compensate for wiring and reading delays.

$A_c = A_r \cdot C$
$A_{c,r} = A_{r,r} C_r - A_{r,i} C_i$
$A_{c,i} = A_{r,r} C_i + A_{r,i} C_r$

where:
$A_c$ is the calibrated acoustic data (complex)
$A_r$ is the raw acoustic data (complex)
$C$ is the calibration factor (complex, with modulus 1)
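As a concrete illustration, a minimal in-place calibration kernel could look like the sketch below. The array layout (one calibration factor per receiver channel), the names and the launch configuration are assumptions for illustration only; the slide states only that calibration is a per-sample complex multiply done in place.

// Hedged sketch: A_c = A_r * C, applied in place (one complex calibration
// factor per receiver channel). Layout and names are assumed, not the authors' code.
__global__ void calibrateKernel(float2 *ping,            // raw acoustic data, overwritten in place
                                const float2 *cal,       // per-receiver calibration (modulus 1)
                                int samplesPerChannel)
{
    int channel = blockIdx.y;                             // one receiver channel per block row
    int i = blockIdx.x * blockDim.x + threadIdx.x;        // sample index within the channel
    if (i >= samplesPerChannel) return;

    float2 a = ping[channel * samplesPerChannel + i];
    float2 c = cal[channel];

    float2 r;
    r.x = a.x * c.x - a.y * c.y;                          // real part: A_r,r*C_r - A_r,i*C_i
    r.y = a.x * c.y + a.y * c.x;                          // imaginary part: A_r,r*C_i + A_r,i*C_r
    ping[channel * samplesPerChannel + i] = r;            // in-place write keeps memory usage low
}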

Preprocessing - Matching
Match the acoustic data with the replica:
- Unshaded replica: for DPCA
- Shaded replica: for imaging

$A_{mu} = A_c \cdot R_u$
$A_{ms} = A_c \cdot R_s$

where:
$A_{mu}$ is the unshaded matched acoustic data (complex)
$A_{ms}$ is the shaded matched acoustic data (complex)
$R_u$ is the unshaded replica (complex)
$R_s$ is the shaded replica (complex)
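How the frequency-domain matching could be wired up on the GPU is sketched below with cuFFT (used in the talk for the GPU FFTs). Plan handling, buffer names, launch parameters and the missing error checks and normalization are assumptions; the spectrum multiply follows the slide's formula.

// Hedged sketch: match one calibrated ping against a precomputed replica spectrum.
#include <cufft.h>

__global__ void multiplySpectra(cufftComplex *ping, const cufftComplex *replica, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    cufftComplex a = ping[i], b = replica[i];
    cufftComplex r;
    r.x = a.x * b.x - a.y * b.y;                        // complex product, as on the slide
    r.y = a.x * b.y + a.y * b.x;
    ping[i] = r;
}

void matchPing(cufftHandle plan,                        // C2C plan of length n, created once
               cufftComplex *d_ping,                    // calibrated ping, already on the device
               const cufftComplex *d_replicaFreq,       // replica spectrum (shaded or unshaded)
               int n)
{
    cufftExecC2C(plan, d_ping, d_ping, CUFFT_FORWARD);  // ping -> frequency domain
    int threads = 256, blocks = (n + threads - 1) / threads;
    multiplySpectra<<<blocks, threads>>>(d_ping, d_replicaFreq, n);
    cufftExecC2C(plan, d_ping, d_ping, CUFFT_INVERSE);  // back to the time domain
    // Note: cuFFT's inverse transform is unnormalized, so a 1/n scaling would follow here.
}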

Preprocessing - Results
Calibration is done in place to reduce memory requirements. FFTs use FFTW on the CPU and cuFFT on the GPU.

CPU (1)     GPU (2)    Speedup
84.3 ms     3.7 ms     x23

(1) Running on an Intel Core i7-920 @ 2.67 GHz
(2) Running on an NVIDIA Tesla C1060

Phase Center Approximation
A transmitter Tx and a receiver Rx separated by a distance d are approximated by a single phase center (PCA) located midway between them, at d/2.

DPCA concept
Between ping (P) and ping (P+1) the array of $N$ elements (spacing $d$, total length $N \cdot d$) advances by $D = M \cdot d / 2$, leaving $K = (N - M)$ overlapped elements.
where:
$D$ is the AUV displacement
$d$ is the element separation
$N$ is the number of elements
$K = (N - M)$ is the number of overlapped elements
Reference: R. Heremans, A. Bellettini and M. Pinto, Milestone: Displaced Phase Center Array, September 2006, http://www.sic.rma.ac.be/~rhereman/milestones/dpca.pdf

DPCA calculation
- Find the best correlation between consecutive pings
- Use the INS for a first approximation of the surge
- The best correlation gives the surge correction
- The delay at the correlation peak gives the sway calculation
- Use the INS for the other variables (roll, pitch, yaw)
(Figure: time-samples vs. receiver elements for K, K+1 and K-1 overlapped elements.) A host-side sketch of this overlap search is given below.
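The sketch below makes the surge search concrete on the host side. The ±1-element search window around the INS prediction is taken from the figure; the correlation measure, names and data layout are illustrative assumptions rather than the authors' implementation.

// Hedged sketch: pick the element overlap (i.e. the surge) that best correlates
// two consecutive pings. Data layout: element-major arrays of complex samples.
#include <math.h>

// Hypothetical helper: normalized correlation over the K overlapping elements,
// pairing current element i with previous element N-K+i (as in the beamforming slide).
static float normalizedCorrelation(const float2 *prev, const float2 *curr,
                                   int elements, int samples, int K)
{
    float num = 0.0f, energyPrev = 0.0f, energyCurr = 0.0f;
    for (int i = 0; i < K; ++i) {
        const float2 *a = &curr[i * samples];
        const float2 *b = &prev[(elements - K + i) * samples];
        for (int s = 0; s < samples; ++s) {
            num        += a[s].x * b[s].x + a[s].y * b[s].y;   // real part of <a, conj(b)>
            energyPrev += b[s].x * b[s].x + b[s].y * b[s].y;
            energyCurr += a[s].x * a[s].x + a[s].y * a[s].y;
        }
    }
    float denom = sqrtf(energyPrev * energyCurr);
    return denom > 0.0f ? num / denom : 0.0f;
}

struct SurgeEstimate { int overlap; float correlation; };

// Try K-1, K and K+1 overlapped elements around the INS-based prediction.
static SurgeEstimate estimateSurge(const float2 *prevPing, const float2 *currPing,
                                   int elements, int samples, int overlapFromINS)
{
    SurgeEstimate best = { overlapFromINS, -1.0f };
    for (int K = overlapFromINS - 1; K <= overlapFromINS + 1; ++K) {
        float c = normalizedCorrelation(prevPing, currPing, elements, samples, K);
        if (c > best.correlation) { best.overlap = K; best.correlation = c; }
    }
    return best;   // the delay at the correlation peak then gives the sway
}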

Motion Estimation
INS information:
- Speed: surge estimation
- Accelerometer: pitch & roll
- Compass: yaw
- Depth: heave
DPCA:
- Surge correction
- Sway calculation

DPCA - Cells
Divide the acoustic data into cells; each cell covers a different range of time/distance.
(Figure: the AUV above the terrain, with the acoustic data divided into Cell 1 ... Cell 4 along the range direction.)

DPCA Processing
(Flowchart: for each cell, the previous and current matched pings are copied into the cell, normalized and FFT'd; the data are shifted and combined by the beamforming, then inverse-FFT'd and scaled (Fourier interpolation); the maxima are calculated on the GPU and copied back to the host, where together with the navigation they give the DPCA results: surge, sway, roll, pitch, yaw, correlation factor, transmitter and receiver position, etc. Legend: memory transfer, CPU operation, GPU operation, data in host, data in device.)

DPCA - Normalize
Calculate the modulus:

$\text{modulus} = \sqrt{\sum_{i=1}^{N_s} \left( R_i^2 + I_i^2 \right)}$

where:
$R_i$ is the real part of sample $i$
$I_i$ is the imaginary part of sample $i$
$N_s$ is the number of samples

Divide the values by the modulus:

$R_i \leftarrow R_i / \text{modulus}$, $I_i \leftarrow I_i / \text{modulus}$

DPCA - Normalize
Calculate the modulus contribution of each sample; each thread handles more than one sample:

$\text{modulus}_t^2 = \sum_{i = t \cdot N_{st} + 1}^{(t+1) \cdot N_{st}} \left( R_i^2 + I_i^2 \right)$

where:
$\text{modulus}_t^2$ is the partial modulus calculated by thread $t$
$N_{st}$ is the number of samples per thread

Save the partial modulus in shared memory.

DPCA - Normalize
Calculate the total modulus using a tree reduction in shared memory (the diagram on the slide shows eight partial values being folded pairwise: 8 -> 4 -> 2 -> 1):

__syncthreads();
if (threadIdx.x < 128) modulus[threadIdx.x] += modulus[threadIdx.x + 128];
__syncthreads();
if (threadIdx.x < 64) modulus[threadIdx.x] += modulus[threadIdx.x + 64];
__syncthreads();
...
__syncthreads();
if (threadIdx.x < 2) modulus[threadIdx.x] += modulus[threadIdx.x + 2];
__syncthreads();
if (threadIdx.x == 0) modulus[threadIdx.x] += modulus[threadIdx.x + 1];
__syncthreads();
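Putting the last three slides together, a self-contained version of the normalization kernel could look like the sketch below. One block per cell, 256 threads per block and a strided per-thread loop are assumptions for illustration (the slide assigns each thread a contiguous group of N_st samples instead).

// Hedged sketch of the whole normalization step: per-thread partial sums of
// squared moduli, a shared-memory tree reduction, then division by the modulus.
// Launch with one block per cell and NORM_THREADS threads per block.
#define NORM_THREADS 256

__global__ void normalizeCell(float2 *cell, int nSamples)
{
    __shared__ float modulus[NORM_THREADS];

    // Each thread accumulates the squared modulus of several samples
    // (strided here for simplicity; the slide uses contiguous per-thread ranges).
    float partial = 0.0f;
    for (int i = threadIdx.x; i < nSamples; i += NORM_THREADS) {
        float2 s = cell[i];
        partial += s.x * s.x + s.y * s.y;
    }
    modulus[threadIdx.x] = partial;
    __syncthreads();

    // Tree reduction in shared memory, as on the previous slide.
    for (int stride = NORM_THREADS / 2; stride > 0; stride /= 2) {
        if (threadIdx.x < stride)
            modulus[threadIdx.x] += modulus[threadIdx.x + stride];
        __syncthreads();
    }

    // Divide every sample by the total modulus (square root of the summed squares).
    float norm = sqrtf(modulus[0]);
    if (norm > 0.0f) {
        for (int i = threadIdx.x; i < nSamples; i += NORM_THREADS) {
            float2 s = cell[i];
            s.x /= norm;
            s.y /= norm;
            cell[i] = s;
        }
    }
}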

DPCA - Beamforming
Multiply the current and previous cells and apply a beamforming coefficient:

$\text{correlation}_i = A^{curr}_i \cdot A^{prev}_{N-K+i}$, with $i = 1 \ldots K$
$\text{bfarray}_i = \text{correlation}_i \cdot \text{bfcoefficient}_i$

where:
$A^{curr}_i$ is the current acoustic data
$A^{prev}_i$ is the previous acoustic data
$\text{correlation}_i$ is the correlation between the current and previous data
$\text{bfcoefficient}_i$ is the beamforming coefficient
$\text{bfarray}_i$ is the beamforming array
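A per-element kernel for this weighting could look like the sketch below. The complex product is taken exactly as written on the slide (whether one factor should be conjugated is not specified), the per-sample dimension of each element is omitted for brevity, and names and layout are assumptions.

// Hedged sketch: bfarray_i = (A_curr_i * A_prev_{N-K+i}) * bfcoefficient_i, i = 1..K.
__device__ __forceinline__ float2 cmul(float2 a, float2 b)
{
    return make_float2(a.x * b.x - a.y * b.y, a.x * b.y + a.y * b.x);
}

__global__ void dpcaBeamform(const float2 *currCell,       // current cell (N elements)
                             const float2 *prevCell,       // previous cell (N elements)
                             const float2 *bfCoefficient,  // beamforming coefficients (K)
                             float2 *bfArray,              // output beamforming array (K)
                             int N, int K)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;         // 0-based element index
    if (i >= K) return;

    float2 correlation = cmul(currCell[i], prevCell[N - K + i]);
    bfArray[i] = cmul(correlation, bfCoefficient[i]);
}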

DPCA - Maxima
Find the maximum and refine it with parabolic filtering. We need to store the two neighbors:
- Three points in total (left, maximum, right)
- Two values per point (complex values; the modulus alone is not enough)
- Only the position of the maximum
Each thread calculates a local maximum within a group of $N_{st}$ samples (number of samples per thread) and stores the modulus of the maximum and its position in shared memory. A sketch of the parabolic refinement is given below.
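For reference, a standard three-point parabolic fit on the correlation moduli is sketched below. In the talk this refinement runs on the CPU and keeps the complex values of the three points, so the actual computation may differ; this is only the textbook real-valued version.

// Hedged sketch: sub-sample peak position from three modulus values around the
// maximum (left, peak, right), using a parabolic fit. Runs on the host/CPU.
// Returns the refined position as peakIndex + offset, with offset in (-0.5, 0.5).
static double parabolicPeak(double left, double peak, double right, int peakIndex)
{
    double denom = left - 2.0 * peak + right;
    if (denom == 0.0)                  // flat neighborhood: keep the integer position
        return (double)peakIndex;
    double offset = 0.5 * (left - right) / denom;
    return (double)peakIndex + offset;
}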

DPCA - Maxima
Calculate the maximum using a tree reduction; the first thread then stores the result in global memory:

if (threadIdx.x < 128) {
    if (modulus[threadIdx.x] < modulus[threadIdx.x + 128]) {
        modulus[threadIdx.x] = modulus[threadIdx.x + 128];
        positions[threadIdx.x] = positions[threadIdx.x + 128];
    }
}
__syncthreads();
...
__syncthreads();
if (threadIdx.x < 2) {
    if (modulus[threadIdx.x] < modulus[threadIdx.x + 2]) {
        modulus[threadIdx.x] = modulus[threadIdx.x + 2];
        positions[threadIdx.x] = positions[threadIdx.x + 2];
    }
}
__syncthreads();

DPCA - Results
Parabolic filtering, motion calculation and navigation integration always run on the CPU. FFTs use FFTW on the CPU and cuFFT on the GPU.

CPU (1)     GPU (2)     Speedup
93.5 ms     18.4 ms     x5

(1) Running on an Intel Core i7-920 @ 2.67 GHz
(2) Running on an NVIDIA Tesla C1060

SAS Imaging Processing
(Flowchart: the weights are calculated on the CPU, and the weights and the shaded ping are copied to the device; for each cell, redundant element shading is applied, followed by beamforming, filtering, TX compensation, SAS compensation and stacking, using the DPCA results, into local tiles; the local tiles are copied back to the host and stacked into the global tiles. Legend: memory transfer, CPU operation, GPU operation, data in host, data in device, only once.)

Redundant Element Shading
Correct the redundancy of the elements; it depends on the surge.
Example: 8 elements and a displacement of 3 elements per ping. Each element position is covered by pings n-1, n and n+1 a variable number of times (3 3 2 3 3 2 3 3), so the corresponding weights are the reciprocals of those counts (0.333 0.333 0.5 0.333 0.333 0.5 0.333 0.333). A small sketch of this weight computation is given below.
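A brute-force way to derive those weights, assuming the counts simply reflect how many overlapping pings cover each element position (an interpretation of the example above, not the authors' formula):

// Hedged sketch: count how many pings cover each element position of the current
// ping when the array of N elements advances by M elements per ping, then use
// weight = 1/count. Reproduces the 8-element / displacement-3 example:
// counts 3 3 2 3 3 2 3 3 -> weights 0.333 0.333 0.500 0.333 0.333 0.500 0.333 0.333.
#include <stdio.h>

static void redundancyWeights(int N, int M, float *weights)
{
    for (int e = 0; e < N; ++e) {
        int pos = e;                       // absolute position of element e in the current ping
        int count = 0;
        // Check a window of neighboring pings (offset p, each advanced by M elements).
        for (int p = -N; p <= N; ++p) {
            int first = p * M;             // first element position covered by that ping
            if (pos >= first && pos < first + N)
                ++count;
        }
        weights[e] = 1.0f / (float)count;
    }
}

int main(void)
{
    float w[8];
    redundancyWeights(8, 3, w);
    for (int e = 0; e < 8; ++e)
        printf("%.3f ", w[e]);             // expected: 0.333 0.333 0.500 0.333 0.333 0.500 ...
    printf("\n");
    return 0;
}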

Redundant Element Shading
Multiply the acoustic data by the weight:

$A^w_j = A_j \cdot w_j$

where:
$A^w_j$ is the weighted acoustic data of element $j$
$A_j$ is the acoustic data of element $j$
$w_j$ is the weight for element $j$

The multiplication is done in place (the acoustic data are overwritten) to save memory.

SAS Imaging - Beamforming
Each cell has a minimum range and a maximum range; from these we calculate the minimum X and maximum X. With these values we have the bounding box of the cell in the (X, range) plane (the dark areas in the figure).

SAS Imaging - Beamforming
Number of blocks = cells in range; number of threads = cells in X.
- A variable number of cells in range is no big problem.
- A variable number of cells in X is an optimization problem. Options:
  1. Each thread calculates a fixed number of points (adjusted to the maximum expected size) and the number of threads varies: may give an odd number of threads.
  2. Variable number of points per thread and a fixed number of threads (optimized for occupancy): problematic for small sizes.
  3. Variable number of points per thread and variable number of threads: more adaptable to any size. The pairs of values are stored in a table (input: number of cells in X; output: number of points per thread and number of threads).

SAS Imaging - Beamforming
(Figure: blocks and threads distribution, with blocks mapped to the range direction and threads mapped to the X direction; each range row may use a different number of threads, and each thread covers several points in X.)

SAS Imaging - Beamforming
Range index (r) and position (R) of the block:

$r = r_{min} + j$, where $j = 0 \ldots N_b - 1$ and $N_b = r_{max} - r_{min}$
$R = r \cdot R_{res} + R_{min}$

where:
$r_{min}$ and $r_{max}$ are the minimum and maximum range index
$R_{min}$ and $R_{max}$ are the minimum and maximum range distance
$N_b$ is the number of blocks
$r$ is the range index
$R$ is the range distance
$R_{res}$ is the range resolution

SAS Imaging - Beamforming
Calculate $x_{min}$ and $x_{max}$ from the range and the beam angle.
X index of the thread:

$x = x_{min} + i + k \cdot N_t$, where $i = 0 \ldots N_t - 1$ and $k = 0 \ldots N_{pt} - 1$

Alternative calculation (slower):

$x = x_{min} + i \cdot N_{pt} + k$

where:
$x_{min}$ and $x_{max}$ are the minimum and maximum X index
$N_t$ is the number of threads
$N_{pt}$ is the number of points per thread
$x$ is the X index

Conversion between index and position is analogous to the range case, with $X$, $X_{min}$, $X_{max}$ and $X_{res}$.
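The difference between the two index calculations can be seen in a small sketch of the per-thread loop. The kernel and variable names are illustrative; a likely reason the first form is faster is that consecutive threads then touch adjacent tiles, which favors coalesced memory access.

// Hedged sketch: the two ways of assigning X indices to threads within one block.
// tileRow points to the output tiles of this block's range row; N_t = blockDim.x.
__global__ void xIndexingSketch(float2 *tileRow, int xMin, int xMax, int nPointsPerThread)
{
    int i  = threadIdx.x;        // thread index, 0 .. N_t - 1
    int Nt = blockDim.x;

    // Scheme used in the talk: x = xMin + i + k * N_t.
    // On iteration k, threads 0..N_t-1 access consecutive tiles -> coalesced.
    for (int k = 0; k < nPointsPerThread; ++k) {
        int x = xMin + i + k * Nt;
        if (x <= xMax)
            tileRow[x] = make_float2(0.0f, 0.0f);   // placeholder for the beamforming work
    }

    // Alternative (reported as slower): x = xMin + i * N_pt + k.
    // Each thread walks a contiguous chunk, so neighboring threads are N_pt apart.
    // for (int k = 0; k < nPointsPerThread; ++k) {
    //     int x = xMin + i * nPointsPerThread + k;
    //     ...
    // }
}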

SAS Imaging - Beamforming
With the position in range and X we can calculate the distance to the transmitter and to each receiver, using the AUV position estimate from the DPCA. The total distance travelled by the sound (transmitter, sea floor, receiver) gives us the sample index in the acoustic data.
(Figure: transmitter and receiver geometry in the (X, range) plane.)

SAS Imaging - Filter
Instead of taking one single sample, we use a filter similar to a sinc. The acoustic data and the filter are stored in 1D textures (float2 format).

distancetransmitter = calculate_distance_to_transmitter(postransmitter, postile)
tileacc = 0.0
for each receiver
    distancereceiver = calculate_distance_to_receiver(posreceiver, postile)
    pathlength = distancetransmitter + distancereceiver
    indexdata = calculate_index_data(pathlength)
    indexfilter = calculate_index_filter(...)
    tilesum = 0.0
    for filter size
        tilesum += filter(...) * acousticdata(...)
    next
    tileacc += tilesum * distancereceiver
next
tileacc *= distancetransmitter / nreceivers
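A device-side sketch of the inner interpolation is given below. Plain global-memory arrays stand in for the 1D float2 textures used in the talk, real-valued filter taps are assumed, and the path-length-to-sample-index conversion (via sound speed and sample rate) as well as the polyphase filter layout are illustrative guesses at what calculate_index_data and calculate_index_filter might do.

// Hedged sketch: instead of one sample, apply a small sinc-like filter around the
// (fractional) sample index derived from the acoustic path length.
__device__ float2 interpolateSample(const float2 *acousticData, int nSamples,
                                    const float *filterTaps,   // tapsPerSample phases x filterSize taps
                                    int filterSize, int tapsPerSample,
                                    float pathLength, float soundSpeed, float sampleRate)
{
    float exactIndex = pathLength / soundSpeed * sampleRate;   // fractional sample index
    int   baseIndex  = (int)floorf(exactIndex);
    float frac       = exactIndex - (float)baseIndex;

    // Pick the filter phase closest to the fractional delay.
    int filterPhase = (int)(frac * tapsPerSample + 0.5f);
    if (filterPhase >= tapsPerSample) {                        // rounded up to the next sample
        filterPhase = 0;
        baseIndex += 1;
    }

    float2 sum = make_float2(0.0f, 0.0f);
    for (int t = 0; t < filterSize; ++t) {
        int s = baseIndex + t - filterSize / 2;
        if (s < 0 || s >= nSamples) continue;                  // skip taps outside the data
        float w = filterTaps[filterPhase * filterSize + t];
        sum.x += w * acousticData[s].x;
        sum.y += w * acousticData[s].y;
    }
    return sum;
}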

SAS Imaging - Compensation
Multiply the accumulated value by the compensation factors:
- TX compensation: eliminate rippling effects (similar to an inverse sinc filter)
- SAS compensation: add smoothing in the beam
Stacking: accumulate the value in the tile.

SAS Imaging - Results
Performance depends on the resolution and the range. Three configurations:
A. Testing: short range (110-150 m), medium resolution (0.025 x 0.050), port side
B. Sea trials: full range (35-145 m), medium resolution (0.050 x 0.050), both sides
C. Scientific: full range (35-145 m), high resolution (0.025 x 0.015), port side

     CPU (1)       GPU (2)     Speedup
A    3663.8 ms     47.8 ms     x77
B    8614.2 ms     115.0 ms    x75
C    28588.2 ms    352.9 ms    x81

(1) Running on an Intel Core i7-920 @ 2.67 GHz
(2) Running on an NVIDIA Tesla C1060

Global performance

                      CPU (1)       GPU (2)     Speedup
Pre-processing        84.3 ms       3.7 ms      x23
DPCA                  93.5 ms       18.4 ms     x5
Imaging A             3663.8 ms     47.8 ms     x77
Imaging B             8614.2 ms     115.0 ms    x75
Imaging C             28588.2 ms    352.9 ms    x81
Pings per second A    0.26          14.3        x55
Pings per second B    0.11          6.3         x57
Pings per second C    0.035         2.7         x77

(1) Running on an Intel Core i7-920 @ 2.67 GHz
(2) Running on an NVIDIA Tesla C1060

Sea Trials
- Software tested on board an AUV: x86 processor with an NVIDIA GT240-equivalent graphics card
- 4 pings per second (250 ms between pings)
- Running in real time at medium resolution (configuration B)

Future work
- Upgrade the on-board GPU to run in real time at high resolution
- Further software improvements to increase performance
- Continue development of multi-GPU and multi-CPU versions for desktop computers (scientific research)
- Add further algorithms to the chain (interferometry: generate 3D images from the bathymetric and main arrays)
- Update algorithms? Adapt them to the GPU
- Currently testing the NVIDIA Carma platform: it's already running!

Thank you very much!