Digital Earth Routine on Tegra K1


1 Digital Earth Routine on Tegra K1: Aerosol Optical Depth Retrieval, Performance Comparison and Energy Efficiency

2 Energy matters!
A topic that affects us all. Reasons:
- Ecological
- Economical
- Practical
- Curiosity
My background:
- Many years of research in High Performance Computing at Fraunhofer SCAI, Germany
- Compiler development
- Remote sensing together with the Academy of Science, China

3 AOD Retrieval Method
- Research cooperation: Fraunhofer SCAI (Germany) and the Academy of Science (China)
- Aerosol Optical Depth (AOD) is a significant optical property of aerosols
- AOD is used for the atmospheric correction of remotely sensed surface features, for monitoring volcanic eruptions, forest fires and air quality in general, and for climate predictions from satellites
- Measurements at different wavelengths for each pixel on Earth (with a spatial resolution of e.g. 1 km) are stacked into a data cube and form the input for remote sensing algorithms
Image sources: sites.ieee.org, mx.nthu.edu.tw

4 AOD Retrieval Method: Input Data Collection
- Daily observations of the MODerate resolution Imaging Spectroradiometer (MODIS) on the NASA satellites Terra and Aqua (i = 1, 2)
- Three wavelengths from the visible spectrum (470, 550, 660 nm) are considered (j = 1, 2, 3)
- The satellites were placed into near-polar, sun-synchronous orbits at an altitude of 705 km
- They complement each other, as they observe the same Earth regions at different times of the day
Image source: sites.ieee.org
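The data cube described on these slides can be pictured as one small table per pixel, indexed by satellite i and wavelength j. The following is only a sketch of such a layout: the nesting order, the helper names `make_cube` and `pixel_vector`, and the reflectance values are assumptions made for illustration, not the layout used in the talk.

```python
# Hypothetical layout: cube[row][col][i][j] holds one measurement for
# satellite i (0 = Terra, 1 = Aqua) and wavelength j (470, 550, 660 nm).
WAVELENGTHS_NM = (470, 550, 660)
SATELLITES = ("Terra", "Aqua")


def make_cube(rows, cols, fill=0.0):
    """Create a rows x cols cube of per-pixel [satellite][wavelength] values."""
    return [[[[fill for _ in WAVELENGTHS_NM] for _ in SATELLITES]
             for _ in range(cols)] for _ in range(rows)]


def pixel_vector(cube, row, col):
    """Flatten one pixel into the 6-element vector a retrieval kernel would read."""
    return [value for per_satellite in cube[row][col] for value in per_satellite]


cube = make_cube(rows=2, cols=3)
cube[1][2][0][1] = 0.12  # made-up Terra value at 550 nm
cube[1][2][1][2] = 0.09  # made-up Aqua value at 660 nm
print(pixel_vector(cube, 1, 2))
```

Each pixel contributes 2 x 3 = 6 values, which is all the per-pixel retrieval needs to read.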

5 AOD Retrieval Method: Background
Consider the atmosphere as a turbid medium following the Lambert-Beer law.
Optical depth: $\tau = \tau_R + \tau_G + \tau_A$
The total optical thickness $\tau$ consists of:
- $\tau_R$: Rayleigh scattering, $\tau_R = 0.\ldots\,\lambda_j^{-4.085}$ ($\lambda_j$: wavelength of channel $j$)
- $\tau_G$: gas absorption, quasi constant (ozone, water, oxygen and others; taken from tables listing channel, wavelength (nm), transmissivity and gas optical thickness)
- $\tau_A$: absorption or (mainly) scattering by aerosols (Mie scattering); this is the AOD
Ångström's turbidity formula: $\tau_A(\lambda_j) = \beta_i\,\lambda_j^{-\alpha}$ ($\beta_i$: AOD at $\lambda_j = 1\,\mu\mathrm{m}$)
Example: cloud droplets. The particles are relatively large, so $\alpha$ is small and scattering is nearly constant over $\lambda_j$.
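Ångström's turbidity formula gives the aerosol optical depth at a wavelength as beta times the wavelength (in micrometres) raised to the power of minus alpha. A minimal sketch of that relation and of its inversion from two wavelengths follows; the function names and the values beta = 0.2, alpha = 1.3 are illustrative assumptions, not numbers from the talk.

```python
import math


def aerosol_optical_depth(beta, alpha, wavelength_um):
    """Angstrom's turbidity formula: tau_A = beta * lambda^(-alpha), lambda in micrometres."""
    return beta * wavelength_um ** (-alpha)


def angstrom_exponent(tau_1, tau_2, wavelength_1_um, wavelength_2_um):
    """Recover alpha from the optical depths observed at two wavelengths."""
    return -math.log(tau_1 / tau_2) / math.log(wavelength_1_um / wavelength_2_um)


beta, alpha = 0.2, 1.3  # illustrative values only
taus = {nm: aerosol_optical_depth(beta, alpha, nm / 1000.0) for nm in (470, 550, 660)}
for nm, tau in taus.items():
    print(f"{nm} nm: tau_A = {tau:.4f}")
print("alpha recovered:", angstrom_exponent(taus[470], taus[660], 0.47, 0.66))
```

At 1 micrometre the formula returns beta itself, which is why beta is read as the AOD at that wavelength. A small alpha (large particles such as cloud droplets) makes the three values nearly equal.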

6 AOD Retrieval Method: SRAP-MODIS Algorithm (Synergetic Retrieval of Aerosol Properties)
- Difference between top-of-atmosphere (TOA) and surface reflectance (atmospheric distortion)
- The ratio of two observations is constant for all wavelengths
- Estimate the parameters $\alpha$, $\beta_1$, $\beta_2$ with a quasi-Newton method
  - Approximation of the Jacobian matrix
  - The influence of the atmosphere decreases rapidly with increasing wavelength, hence the approximation by TOA values at large wavelengths, where this influence is minor
- Derive the AOD for the different wavelengths with Ångström's turbidity formula
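The slides do not spell out the SRAP equations, so they are not reproduced here. To illustrate only the quasi-Newton idea (Newton's iteration with the derivative replaced by an approximation built from previous iterates), here is a one-dimensional stand-in: the secant method applied to a toy equation that recovers an Ångström exponent from a ratio of optical depths. The equation, the starting points and the function name `secant_solve` are assumptions for this sketch.

```python
def secant_solve(g, x0, x1, tol=1e-10, max_iter=50):
    """Secant iteration: Newton's method with the derivative replaced by the
    slope through the last two iterates. Returns (root, iterations used)."""
    g0, g1 = g(x0), g(x1)
    for iteration in range(1, max_iter + 1):
        if g1 == g0:
            raise RuntimeError("flat secant slope, cannot continue")
        x2 = x1 - g1 * (x1 - x0) / (g1 - g0)
        x0, g0, x1, g1 = x1, g1, x2, g(x2)
        if abs(g1) < tol:
            return x1, iteration
    raise RuntimeError("secant iteration did not converge")


# Toy problem: the ratio of optical depths at 0.47 and 0.66 micrometres
# equals (0.47 / 0.66) ** (-alpha); solve for alpha given the ratio.
observed_ratio = (0.47 / 0.66) ** (-1.3)  # made-up observation


def residual(alpha):
    return (0.47 / 0.66) ** (-alpha) - observed_ratio


alpha_est, iterations = secant_solve(residual, 0.0, 2.0)
print(alpha_est, iterations)
```

The number of iterations depends on the data and the starting points, which is the property that matters for the load-balancing discussion on the following slides.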

7 AOD Retrieval Method
IMPORTANT for the parallelization of the retrieval method:
- The AOD calculation is independent for each pixel and can be performed based solely on the respective wavelength vector in the data cube
- A quasi-Newton method is run for each pixel to determine $\alpha$, $\beta_1$, $\beta_2$
- The rate of convergence may vary considerably between pixels, e.g. between
  - OL pixels (over-land)
  - OS pixels (over-sea)
  - Masked pixels (more about that later)
- Additionally, the control flow may follow different paths in the AOD kernel (diverging branches)
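Because each pixel depends only on its own wavelength vector, the per-pixel kernel can be mapped over the pixels in any order or in parallel without changing the result. The sketch below shows that property with a toy kernel; `retrieve_pixel`, the use of `None` for masked pixels and the input values are placeholders, not the actual AOD kernel.

```python
from concurrent.futures import ThreadPoolExecutor


def retrieve_pixel(vector):
    """Toy stand-in for the per-pixel kernel: reads only this pixel's own vector."""
    if vector is None:  # masked pixel: nothing to retrieve (a separate branch)
        return None
    return sum(vector) / len(vector)


# Made-up wavelength vectors; None marks a masked pixel.
pixels = [[0.10, 0.12, 0.09], None, [0.30, 0.28, 0.25], [0.05, 0.04, 0.03]]

sequential = [retrieve_pixel(v) for v in pixels]
with ThreadPoolExecutor(max_workers=2) as pool:
    parallel = list(pool.map(retrieve_pixel, pixels))

print(sequential == parallel)
```

The early return for masked pixels is also a small example of the diverging control flow mentioned on the slide.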

8 AOD Retrieval on multi-core processors
Shared-memory parallelization with OpenMP, static OpenMP scheduling
Problem: imbalance across the cores
- Reason on the one hand: quasi-Newton convergence (load imbalance)
- Reason on the other hand: varying pixel data may lead to different branches in the AOD kernel, e.g. cloud masking (branch divergence)
[Figure labels: power intake (watt), threads, second; series: static]

9 AOD Retrieval on multi-core processors
Shared-memory parallelization with OpenMP
Solution: adapted scheduling of the pixels to AOD threads
- Similar pixels show similar convergence
- Similar pixels lie near each other
- Instead of blocking the iterations/pixels statically in large chunks: small blocks, e.g. of size 1, OR dynamic scheduling
- As each kernel run is relatively work-intensive, the overhead this introduces is insignificant
- Cloud-masking dependencies
[Figure labels: power intake (watt), second; series: static, dynamic]
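The effect of the scheduling choice can be estimated with a small model that ignores scheduling overhead: every pixel has a cost (for instance its iteration count), and the runtime is the finish time of the busiest worker. The costs below are invented, and the three functions only mimic the behaviour of OpenMP's static, static with chunk size 1, and dynamic schedules; they do not call OpenMP.

```python
import heapq


def static_makespan(costs, workers):
    """Contiguous blocks, one per worker (like a static schedule with large chunks)."""
    block = -(-len(costs) // workers)  # ceiling division
    return max(sum(costs[start:start + block])
               for start in range(0, len(costs), block))


def static_cyclic_makespan(costs, workers):
    """Round-robin assignment of single pixels (like a static schedule with chunk size 1)."""
    return max(sum(costs[w::workers]) for w in range(workers))


def dynamic_makespan(costs, workers, chunk=1):
    """Each chunk goes to the worker that becomes idle first (like a dynamic schedule)."""
    finish_times = [0] * workers
    heapq.heapify(finish_times)
    for start in range(0, len(costs), chunk):
        earliest = heapq.heappop(finish_times)
        heapq.heappush(finish_times, earliest + sum(costs[start:start + chunk]))
    return max(finish_times)


# Made-up costs: similar pixels are neighbours (cheap, medium, expensive regions).
costs = [1] * 8 + [5] * 8 + [20] * 8
print(static_makespan(costs, 4),
      static_cyclic_makespan(costs, 4),
      dynamic_makespan(costs, 4))
```

With these numbers the blocked schedule leaves one worker with all the expensive pixels (120 cost units), while both alternatives reach the ideal 52 units, which is the total work of 208 divided by 4 workers.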

10 AOD Retrieval on GPUs
Similar to multi-core. Solution again: adapted scheduling of the pixels to AOD threads
- Similar pixels show similar convergence; similar pixels lie near each other
- Thread blocks: only little branch divergence by construction
- Not too many pixels per thread block: nearby pixels are similar and converge similarly
- Thread-block tuning: *NOT* "more is better"
  - Registers-per-block restrictions
  - The programmer can (and has to) optimize the parameters
- GPUs are very well suited for the retrieval kernel, but not necessarily for other parts of the workflow
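Why grouping similar pixels helps on a GPU can be sketched with a deliberately simplified lockstep model: threads that execute together finish only when the slowest of them finishes, so a group costs as much as its most expensive pixel. The iteration counts are invented and the group size of 4 is chosen only to keep the example small (NVIDIA warps contain 32 threads).

```python
def lockstep_cost(iterations, group_size):
    """Sum over groups of the slowest pixel in each group (simplified lockstep model)."""
    return sum(max(iterations[start:start + group_size])
               for start in range(0, len(iterations), group_size))


# Made-up per-pixel iteration counts.
interleaved = [2, 20, 2, 20, 2, 20, 2, 20]  # fast and slow pixels mixed in every group
clustered = sorted(interleaved)             # similar pixels placed next to each other

print(lockstep_cost(interleaved, 4), lockstep_cost(clustered, 4))
```

In this model the mixed ordering costs 40 units and the clustered ordering 22, although the total work is identical. Real kernels also have to respect register and occupancy limits, which this sketch ignores.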

11 AOD Retrieval on GPUs: speedup with increasing input size
[Figure series: Data Transfer, GPU overall, MC overall, GPU calc, MC calc]

12 AOD Retrieval on GPUs: comparison of CPU and GPU architecture
- CPU vs. GPU: problem-dependent (in part)
- Low latency vs. high throughput
- Lots of automatic mechanisms vs. (still) lots of manual tuning
- Optimization of thread blocks, register assignment, occupancy (e.g. registers vs. threads), memory accesses (shared-memory bank conflicts, global-memory coalescing), ...
[Figure labels: DRAM, ALU, Control Unit, Cache Level vs. L2]

13 AOD Workflow on HYBRID systems
More than the retrieval is needed.
[Figure labels: Multi-Core; Multi-Core or GPU; Multi-Core or GPU or HYBRID]

14 Why EMBEDDED?
EMBEDDED architectures are interesting in various fields of research:
- Energy plays a major role today
- Satellite on-board observations
- Automotive sector, e.g. high-performance embedded systems for in-vehicle applications
- "The convergence of HPC and embedded systems in our heterogeneous computing future" (Kaeli et al. 2011)
- The Exascale Challenge (Moore's Law) and future HPC systems
- Relatively cheap combination of multi-cores and GPUs today

15 AOD on EMBEDDED architectures: NVIDIA Jetson TK1
Jetson TK1: energy-efficient SoC for high performance under strong energy constraints

16 AOD Retrieval on MIXED EMBEDDED: Jetson TK1

17 AOD Retrieval Method: Jetson TK1 Runtime
[Figure labels: 1xSoC (CPU 1HPCore, CPU 4HPCores, GPU 1HPCore); 4xSoC (GPU 1HPCore); XeonWS (CPU 1T, CPU 4T, GPU)]

18 AOD Retrieval Method: Jetson TK1 Runtime (Scaling)
[Figure labels: 1xSoC, 2xSoC, 3xSoC, 4xSoC]

19 AOD Retrieval Method: Jetson TK1 Energy
[Figure labels: 1xSoC (CPU 1HPCore, CPU 4HPCores, GPU 1HPCore); 4xSoC (GPU 1HPCore); XeonWS (CPU 1T, CPU 4T, GPU)]

20 Publications
- 2015: Multi-Core Processors and Graphics Processing Unit Accelerators for Parallel Retrieval of Aerosol Optical Depth from Satellite Data: Implementation, Performance and Energy Efficiency. J. Liu, D. Feld, Y. Xue, J. Garcke and T. Soddemann. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
- 2016: Design of a Hybrid Parallel Workflow for Efficient Aerosol Optical Depth Retrieval from MODIS Satellite Data for Computers with Multi-core Processors and GPUs. J. Liu, D. Feld, Y. Xue, J. Garcke, T. Soddemann and P. Pan. International Journal of Digital Earth
- [t.b.a.]: Energy-Efficiency and Performance Comparison of Aerosol Optical Depth (AOD) Retrieval on Distributed Embedded SoC Architectures with Nvidia GPUs. D. Feld, E. Schricker, J. Liu, Y. Xue, J. Garcke and T. Soddemann. SCAI Book (Springer)
[Figure labels: Workstation1 (Core i7, 8T (HT), GTX460); 1xSoC (ARM Cortex A, 4T, Kepler "192"); Workstation32Xeon (8T (HT), GTX680); series: CPU, MC, GPU, HYBRID, DYNAMIC]

21 Energy matters!
Workflow as a tool
Restrictions (e.g. on-board missions):
- Power intake
- Energy consumption
- Runtime restriction (real-time)
- Minimize runtime (post-processing)
Goals influence each other
Extension of the methods, e.g.:
- Pixel sorting to reduce divergence (respecting dependencies)
- Further methods, other input data, other goals
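Power intake, energy consumption and runtime are linked: energy is the power drawn integrated over the runtime, so a slower but frugal device can still need less energy. The sketch below integrates a sampled power trace with the trapezoidal rule; the function name and both traces are made up and are not measurements from the talk.

```python
def energy_joules(times_s, power_w):
    """Integrate sampled power (watt) over time (seconds) with the trapezoidal rule."""
    if len(times_s) != len(power_w) or len(times_s) < 2:
        raise ValueError("need at least two matching samples")
    return sum(0.5 * (power_w[k] + power_w[k + 1]) * (times_s[k + 1] - times_s[k])
               for k in range(len(times_s) - 1))


# Made-up traces: a fast, power-hungry run and a slower, frugal one.
fast = energy_joules([0, 1, 2], [100.0, 100.0, 100.0])
slow = energy_joules([0, 2, 4, 6, 8], [10.0, 10.0, 10.0, 10.0, 10.0])
print(fast, slow)
```

In this invented case the slow run takes four times as long but uses less than half the energy, which is the kind of trade-off the goals on this slide describe.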

22 Thanks for your attention! Questions?
Image source: NASA Earth Observatory


Aeolus L2A optical properties products and assimilation in air quality models

Aeolus L2A optical properties products and assimilation in air quality models Aeolus L2A optical properties products and assimilation in air quality models Thomas Flament, Angela Benedetti, P. Martinet, E. Martins, L. El Amraoui, A. Dabas, P. Flamant Toulouse, 28 March 2017 Aladin,

More information

CSE 591: GPU Programming. Using CUDA in Practice. Klaus Mueller. Computer Science Department Stony Brook University

CSE 591: GPU Programming. Using CUDA in Practice. Klaus Mueller. Computer Science Department Stony Brook University CSE 591: GPU Programming Using CUDA in Practice Klaus Mueller Computer Science Department Stony Brook University Code examples from Shane Cook CUDA Programming Related to: score boarding load and store

More information

Modern Processor Architectures. L25: Modern Compiler Design

Modern Processor Architectures. L25: Modern Compiler Design Modern Processor Architectures L25: Modern Compiler Design The 1960s - 1970s Instructions took multiple cycles Only one instruction in flight at once Optimisation meant minimising the number of instructions

More information

GPGPUs in HPC. VILLE TIMONEN Åbo Akademi University CSC

GPGPUs in HPC. VILLE TIMONEN Åbo Akademi University CSC GPGPUs in HPC VILLE TIMONEN Åbo Akademi University 2.11.2010 @ CSC Content Background How do GPUs pull off higher throughput Typical architecture Current situation & the future GPGPU languages A tale of

More information

A Large-Scale Cross-Architecture Evaluation of Thread-Coarsening. Alberto Magni, Christophe Dubach, Michael O'Boyle

A Large-Scale Cross-Architecture Evaluation of Thread-Coarsening. Alberto Magni, Christophe Dubach, Michael O'Boyle A Large-Scale Cross-Architecture Evaluation of Thread-Coarsening Alberto Magni, Christophe Dubach, Michael O'Boyle Introduction Wide adoption of GPGPU for HPC Many GPU devices from many of vendors AMD

More information

Dispersion Polarization

Dispersion Polarization Dispersion Polarization Phys Phys 2435: 22: Chap. 33, 31, Pg 1 Dispersion New Topic Phys 2435: Chap. 33, Pg 2 The Visible Spectrum Remember that white light contains all the colors of the s p e c t r u

More information
