GPU Technology Conference 2015 Silicon Valley

Size: px
Start display at page:

Download "GPU Technology Conference 2015 Silicon Valley"

Transcription

1 GPU Technology Conference 2015 Silicon Valley Big Data in Real Time: An Approach to Predictive Analytics for Alpha Generation and Risk Management Yigal Jhirad and Blay Tarnoff March 19, 2015

2 Table of Contents I. State-Space Models: State Instantiation Portfolio and Risk Management: Big Data/Real Time Sensitivity II. Parallel Approach/Predictive Analytics Cluster Analysis: Coherence, Correlation, Cointegration Evolutionary Optimization III. Summary IV. Author Biographies DISCLAIMER: This presentation is for information purposes only. The presenter accepts no liability for the content of this presentation, or for the consequences of any actions taken on the basis of the information provided. Although the information in this presentation is considered to be accurate, this is not a representation that it is complete or should be relied upon as a sole resource, as the information contained herein is subject to change. 2

3 State-Space Models Portfolio/Risk Management Financials markets display discrete dynamic micro states and regimes Traditional valuation techniques may not be as effective, e.g. Options markets Implied Volatility vs. Jump Diffusion Regime Shifts Big Data: Time Series Tick Data, Interday, and Intraday Predictive Analytics: Identifying Structural Breaks Alpha Generation, Risk Management, Market Impact Parallel Processing: APL/CPU and CUDA/GPU physics based modeling exploit hardware efficiently Array Based Processing paradigm Matrix/Vector thought process is key APL is a programming language whose quantum data object is an array, which is fundamental to parallel processing and can leverage parallel processing across CPU s CUDA leverages GPU Hardware Application in Econometrics and Applied Mathematics Monte Carlo Simulations Fourier Analysis Principal Components Optimization Cluster Analysis Cointegration 3

4 Cluster Analysis: Correlation & Cointegration Cluster Analysis: A multivariate technique designed to identify relationships and cohesion Factor Analysis, Risk Model Development Correlation Analysis: Pairwise analysis of data across assets. Each pairwise comparison can be run in parallel. Use Correlation or Cointegration as primary input to cluster analysis Apply proprietary signal filter to remove selected data and reduce spurious correlations 4

5 Evolutionary Optimization Asymptotic Multi-Phase Optimization Identify a target portfolio of stocks that is trending consistently over consecutive periods using specialized, possibly time-sensitive, optimization algorithms Establish a portfolio of stocks that is performing in a cohesive way Identifying and Assessing factors driving outperformance Optimize a basket of factors to track target portfolio Look at factors such as Value vs. Growth, Large Cap vs. Small Cap Optimized Factor Attribution of Targeted Portfolio Relative Ranking Cash/Assets 1 Capex/Assets 2 Dividend Yield 3 Dividend Growth (1 and 5 Year) 4 Market Cap (High - Low) 5 Dividend Payout 6 ROIC 7 E/P 8 Indicative factor attribution of target portfolio Period: 2nd Half of

6 Cluster Analysis: Correlation Application example: Correlation function removing N/A values Correlation measures the direction and strength of a linear relationship between variables. The Pearson product moment correlation between two variables X and Y is calculated as: For N assets there are unique correlation pairs Given an N x M matrix A in which each row is a list of returns for a particular equity, return an N x N matrix R in which each element is the scalar result of correlating each row of A to every other Each element of A may be an N/A value When processing a pair of rows, the calculation must include neither each N/A value nor the corresponding element in the other row. This requires evaluating each pair separately. As a result the increased computational demand is more effectively implemented through a parallel processing solution. As the matrix size increases the benefits of parallel processing become more significant. 6

7 Cluster Analysis: Configuration Tesla K20c Hardware Constraints Compute capability: 3.5 Warp size: 32 Number of shared memory banks: 32 Coalescence capacity (4-byte words): 128 bytes = 32 contiguous words Max blocks / warps / threads per multiprocessor: 16 / 64 / 2048 Max registers per multiprocessor: 64K Max shared memory per multiprocessor: 48K Software Constraints/Response Run 32 independent correlations per block for output coalescence Read 32 prices/returns per correlation for input coalescence Registers used per thread: 33 => max 15 blocks per 4 warps per block Shared memory used per block: per warp => max 16 blocks per 4 warps per block Thread block configuration: 4 x 32 => 15 blocks / 60 warps / 1920 threads per SM 7

8 Cluster Analysis: Kernel Operation Each pair of rows produces a single correlation Input Output

9 Cluster Analysis: Kernel Correlation computation X X Y Y X X 2 Y Y 2 Two passes through prices/returns in global memory required First pass: Compute means of X and Y X, Y Second pass: Subtract means from X and Y X, Y Sum X Y, X X and Y Y X Y, X 2, Y 2 Multiply resultant X Y by reciprocal square roots of X 2 Y 2

10 Cluster Analysis: Kernel First Pass Computation of means (row 33, col 0 of block grid) Repeat: read next section of Prices/Returns into registers Ns necessary for NA handling: 0 where either Xs or Ys is NA; 1 where Xs and Ys are both valid

11 Cluster Analysis: Kernel First Pass Computation of means (row 33, col 0 of block grid) Repeat: sum values in registers into shared memory + + +

12 Cluster Analysis: Kernel First Pass Computation of means (row 33, col 0 of block grid) Sum rows to obtain totals for each row of Prices/Returns Save totals to shared memory Divide totals by N Assets

13 Cluster Analysis: Kernel Second Pass Computation of correlations Read section of Prices/ Returns into registers (reuse registers from first pass) Ns necessary for NA handling: 0 where either Xs or Ys is NA; 1 where Xs and Ys are both valid

14 Cluster Analysis: Kernel Second Pass Computation of correlations Subtract means of X and Y

15 Cluster Analysis: Kernel Second Pass Computation of correlations Multiply and sum (reuse X, Y, N from first pass) + +

16 Cluster Analysis: Kernel Second Pass Computation of correlations Multiply and sum (reuse X, Y, N from first pass) +

17 Cluster Analysis: Kernel Second Pass Computation of correlations Sum rows to obtain of terms for each row of Prices/Returns X X Y Y X X 2 Y Y 2

18 Cluster Analysis: Kernel Second Pass Computation of correlations X X Y Y X X 2 Y Y 2 Assets rsqrt(x2) rsqrt(y2) XY

19 Cluster Analysis: Kernel Output to global memory : row 33, col 0 of block grid Assets Assets Assets

20 Cluster Analysis: Big Data, Real Time Host/Global Memory Pinned-mapped memory eliminates transfer complexity, reduces overhead Kernel processing speed approximately 0.1 seconds for 586 assets 120 periods Input Output

21 Cluster Analysis: Big Data, Real Time Overview Data source constantly streaming tick data Simple routine periodically aggregates ticks into VWAPs or returns, producing an array of prices, one per ticker VWAPs/returns loaded into one column of input Prices/Returns matrix in host memory at every interval Streaming data source Routine to periodically aggregate ticks into VWAPs Aggregation by period enables arbitrary real time speed, limited by speed of GPU kernel algorithm G P U To target application

22 Summary State-Space model Instantiation using Intraday Time Series data for Predictive Analytics Our approach is based on Time Series Cluster Analysis and Evolutionary Optimization Identifying Structural Breaks, Patterns, and Signals Coherency and Group membership Applications in Portfolio Management for Alpha Generation and Risk Management 22

23

24

25

26

27 Author Biographies Yigal D. Jhirad, Senior Vice President, is Director of Quantitative and Derivatives Strategies and a Portfolio Manager for Cohen & Steers options and real assets strategies. Mr. Jhirad heads the firm s Investment Risk Committee. Prior to joining the firm in 2007, Mr. Jhirad was an Executive Director in the institutional equities division of Morgan Stanley, where he headed the company s portfolio and derivatives strategies effort. He was responsible for developing, implementing, and marketing quantitative and derivatives products to a broad array of institutional clients, including hedge funds, active and passive funds, pension funds and endowments. Mr. Jhirad holds a BS in Economics from the Wharton School of the University of Pennsylvania. He is a Financial Risk Manager (FRM) as Certified by the Global Association of Risk Professionals. yjhirad@yahoo.com Blay A. Tarnoff is a senior applications developer and database architect. He specializes in array programming and database design and development. He has developed equity and derivatives applications for program trading, proprietary trading, quantitative strategy, and risk management. Mr. Tarnoff holds an AB in Mathematics from Brown University. He is currently a consultant at Cohen & Steers and was previously at Morgan Stanley. blay@eblay.com 27

GTC 2018 Silicon Valley, California

GTC 2018 Silicon Valley, California GTC 2018 Silicon Valley, California Predictive Learning of Factor Based Strategies using Deep Neural Networks for Investment and Risk Management Yigal Jhirad and Blay Tarnoff March 27, 2018 GTC 2018: Table

More information

MSc Econometrics. VU Amsterdam School of Business and Economics. Academic year

MSc Econometrics. VU Amsterdam School of Business and Economics. Academic year MSc Econometrics VU Amsterdam School of Business and Economics Academic year 2018 2019 MSc Econometrics @ SBE VU Amsterdam prof. dr. Siem Jan Koopman (s.j.koopman@vu.nl) 2 of 27 MSc Econometrics @ SBE

More information

NVIDIA GTX200: TeraFLOPS Visual Computing. August 26, 2008 John Tynefield

NVIDIA GTX200: TeraFLOPS Visual Computing. August 26, 2008 John Tynefield NVIDIA GTX200: TeraFLOPS Visual Computing August 26, 2008 John Tynefield 2 Outline Execution Model Architecture Demo 3 Execution Model 4 Software Architecture Applications DX10 OpenGL OpenCL CUDA C Host

More information

NetAdvantage. User s Guide

NetAdvantage. User s Guide NetAdvantage User s Guide Welcome to NetAdvantage. This user guide will show you everything you need to know to access and utilize the wealth of information available from S&P NetAdvantage. This is an

More information

Block Lanczos-Montgomery method over large prime fields with GPU accelerated dense operations

Block Lanczos-Montgomery method over large prime fields with GPU accelerated dense operations Block Lanczos-Montgomery method over large prime fields with GPU accelerated dense operations Nikolai Zamarashkin and Dmitry Zheltkov INM RAS, Gubkina 8, Moscow, Russia {nikolai.zamarashkin,dmitry.zheltkov}@gmail.com

More information

TRADING WORKSTATION [NEW] ADDITIONAL FEATURE LIST

TRADING WORKSTATION [NEW] ADDITIONAL FEATURE LIST ABSTRACT The intend of this document is to provide the information regarding additional features incorporated in the latest release of New Trading Workstation. TRADING WORKSTATION [NEW] ADDITIONAL FEATURE

More information

Backtesting in the Cloud

Backtesting in the Cloud Backtesting in the Cloud A Scalable Market Data Optimization Model for Amazon s AWS Environment A Tick Data Custom Data Solutions Group Case Study Bob Fenster, Software Engineer and AWS Certified Solutions

More information

GPU Fundamentals Jeff Larkin November 14, 2016

GPU Fundamentals Jeff Larkin November 14, 2016 GPU Fundamentals Jeff Larkin , November 4, 206 Who Am I? 2002 B.S. Computer Science Furman University 2005 M.S. Computer Science UT Knoxville 2002 Graduate Teaching Assistant 2005 Graduate

More information

Parallel Computing: Parallel Architectures Jin, Hai

Parallel Computing: Parallel Architectures Jin, Hai Parallel Computing: Parallel Architectures Jin, Hai School of Computer Science and Technology Huazhong University of Science and Technology Peripherals Computer Central Processing Unit Main Memory Computer

More information

B. Tech. Project Second Stage Report on

B. Tech. Project Second Stage Report on B. Tech. Project Second Stage Report on GPU Based Active Contours Submitted by Sumit Shekhar (05007028) Under the guidance of Prof Subhasis Chaudhuri Table of Contents 1. Introduction... 1 1.1 Graphic

More information

High performance 2D Discrete Fourier Transform on Heterogeneous Platforms. Shrenik Lad, IIIT Hyderabad Advisor : Dr. Kishore Kothapalli

High performance 2D Discrete Fourier Transform on Heterogeneous Platforms. Shrenik Lad, IIIT Hyderabad Advisor : Dr. Kishore Kothapalli High performance 2D Discrete Fourier Transform on Heterogeneous Platforms Shrenik Lad, IIIT Hyderabad Advisor : Dr. Kishore Kothapalli Motivation Fourier Transform widely used in Physics, Astronomy, Engineering

More information

Optimization solutions for the segmented sum algorithmic function

Optimization solutions for the segmented sum algorithmic function Optimization solutions for the segmented sum algorithmic function ALEXANDRU PÎRJAN Department of Informatics, Statistics and Mathematics Romanian-American University 1B, Expozitiei Blvd., district 1, code

More information

Performance impact of dynamic parallelism on different clustering algorithms

Performance impact of dynamic parallelism on different clustering algorithms Performance impact of dynamic parallelism on different clustering algorithms Jeffrey DiMarco and Michela Taufer Computer and Information Sciences, University of Delaware E-mail: jdimarco@udel.edu, taufer@udel.edu

More information

New Chart Module in SaxoTrader and SaxoWebTrader PLATFORM RELEASE NOTES

New Chart Module in SaxoTrader and SaxoWebTrader PLATFORM RELEASE NOTES New Chart Module in SaxoTrader and SaxoWebTrader PLATFORM RELEASE NOTES Summary This document describes the features of the new chart module in the SaxoTrader and SaxoWebTrader platforms that will replace

More information

A. Risks of Specific Securities Utilized A. Disciplinary Information... 15

A. Risks of Specific Securities Utilized A. Disciplinary Information... 15 Page1 Page2 Page3 A. Information about Our Firm... 5 B. Types of Advisory Services Offered... 5 C. Money Management Process and Restrictions... 6 D. Participation in Wrap Fee Programs.... 6 E. Assets under

More information

CUDA Optimization with NVIDIA Nsight Visual Studio Edition 3.0. Julien Demouth, NVIDIA

CUDA Optimization with NVIDIA Nsight Visual Studio Edition 3.0. Julien Demouth, NVIDIA CUDA Optimization with NVIDIA Nsight Visual Studio Edition 3.0 Julien Demouth, NVIDIA What Will You Learn? An iterative method to optimize your GPU code A way to conduct that method with Nsight VSE APOD

More information

Parallel Programming Principle and Practice. Lecture 9 Introduction to GPGPUs and CUDA Programming Model

Parallel Programming Principle and Practice. Lecture 9 Introduction to GPGPUs and CUDA Programming Model Parallel Programming Principle and Practice Lecture 9 Introduction to GPGPUs and CUDA Programming Model Outline Introduction to GPGPUs and Cuda Programming Model The Cuda Thread Hierarchy / Memory Hierarchy

More information

Efficient Tridiagonal Solvers for ADI methods and Fluid Simulation

Efficient Tridiagonal Solvers for ADI methods and Fluid Simulation Efficient Tridiagonal Solvers for ADI methods and Fluid Simulation Nikolai Sakharnykh - NVIDIA San Jose Convention Center, San Jose, CA September 21, 2010 Introduction Tridiagonal solvers very popular

More information

E6895 Advanced Big Data Analytics Lecture 8: GPU Examples and GPU on ios devices

E6895 Advanced Big Data Analytics Lecture 8: GPU Examples and GPU on ios devices E6895 Advanced Big Data Analytics Lecture 8: GPU Examples and GPU on ios devices Ching-Yung Lin, Ph.D. Adjunct Professor, Dept. of Electrical Engineering and Computer Science IBM Chief Scientist, Graph

More information

2017 INVESTMENT MANAGEMENT CONFERENCE NEW YORK Big Data: Risks and Rewards for Investment Management

2017 INVESTMENT MANAGEMENT CONFERENCE NEW YORK Big Data: Risks and Rewards for Investment Management 2017 INVESTMENT MANAGEMENT CONFERENCE NEW YORK Big Data: Risks and Rewards for Investment Management Derek N. Steingarten, New York Julia B. Jacobson, Boston Barbara Bridges, VP of Legal & Compliance,

More information

Tesla Architecture, CUDA and Optimization Strategies

Tesla Architecture, CUDA and Optimization Strategies Tesla Architecture, CUDA and Optimization Strategies Lan Shi, Li Yi & Liyuan Zhang Hauptseminar: Multicore Architectures and Programming Page 1 Outline Tesla Architecture & CUDA CUDA Programming Optimization

More information

GRAPHICS PROCESSING UNITS

GRAPHICS PROCESSING UNITS GRAPHICS PROCESSING UNITS Slides by: Pedro Tomás Additional reading: Computer Architecture: A Quantitative Approach, 5th edition, Chapter 4, John L. Hennessy and David A. Patterson, Morgan Kaufmann, 2011

More information

On Massively Parallel Algorithms to Track One Path of a Polynomial Homotopy

On Massively Parallel Algorithms to Track One Path of a Polynomial Homotopy On Massively Parallel Algorithms to Track One Path of a Polynomial Homotopy Jan Verschelde joint with Genady Yoffe and Xiangcheng Yu University of Illinois at Chicago Department of Mathematics, Statistics,

More information

FRM TRAINING Our National Industrial Training Authority registration Number is NITA/TRN/1261.

FRM TRAINING Our National Industrial Training Authority registration Number is NITA/TRN/1261. FRM TRAINING The demand for financial risk managers has never been higher than now since the global financial crisis of 2007-2009. The interconnectedness of global financial system and the rapid evolution

More information

Bachelor of Applied Finance (Financial Planning)

Bachelor of Applied Finance (Financial Planning) Course information for Bachelor of Applied Finance (Financial Planning) Course Number HE20521 Location St George Ultimo Course Structure The structure below is the typical study pattern for a full time

More information

Click to edit Master title style

Click to edit Master title style Click Investment to edit Master Management subtitle for High Net Worth Individuals and Institutions www.cambriainvestments.com Disclaimer This presentation is for informational purposes and is not an offer

More information

Course information for

Course information for Course information for Bachelor of Applied Commerce Majoring in Financial Planning - HE20531 Majoring in Accounting - HE20532 Diploma of Applied Commerce - HE20515 Course Design Bachelor of Applied Commerce

More information

Double-Precision Matrix Multiply on CUDA

Double-Precision Matrix Multiply on CUDA Double-Precision Matrix Multiply on CUDA Parallel Computation (CSE 60), Assignment Andrew Conegliano (A5055) Matthias Springer (A995007) GID G--665 February, 0 Assumptions All matrices are square matrices

More information

A Global Look at IT Audit Best Practices

A Global Look at IT Audit Best Practices A Global Look at IT Audit Best Practices 2015 IT Audit Benchmarking Survey March 2015 Speakers Kevin McCreary is a Senior Manager in Protiviti s IT Risk practice. He has extensive IT audit and regulatory

More information

Equity Screening Manual

Equity Screening Manual Equity Screening Manual Equity Screening The following table lists Universal Screening s features and their corresponding page numbers in the manual: If you are looking at... Find out how to: See page

More information

Applications of Berkeley s Dwarfs on Nvidia GPUs

Applications of Berkeley s Dwarfs on Nvidia GPUs Applications of Berkeley s Dwarfs on Nvidia GPUs Seminar: Topics in High-Performance and Scientific Computing Team N2: Yang Zhang, Haiqing Wang 05.02.2015 Overview CUDA The Dwarfs Dynamic Programming Sparse

More information

Portland State University ECE 588/688. Graphics Processors

Portland State University ECE 588/688. Graphics Processors Portland State University ECE 588/688 Graphics Processors Copyright by Alaa Alameldeen 2018 Why Graphics Processors? Graphics programs have different characteristics from general purpose programs Highly

More information

Lecture 25 Nonlinear Programming. November 9, 2009

Lecture 25 Nonlinear Programming. November 9, 2009 Nonlinear Programming November 9, 2009 Outline Nonlinear Programming Another example of NLP problem What makes these problems complex Scalar Function Unconstrained Problem Local and global optima: definition,

More information

Accelerating Financial Applications on the GPU

Accelerating Financial Applications on the GPU Accelerating Financial Applications on the GPU Scott Grauer-Gray Robert Searles William Killian John Cavazos Department of Computer and Information Science University of Delaware Sixth Workshop on General

More information

Transcend Information, Inc.

Transcend Information, Inc. Transcend Information, Inc. Investor Conference Symbol: 2451 November 09,2018 Disclaimer The predictive and financial information mentioned in the present briefing as promulgated simultaneously is based

More information

Next Generation Directional MWD Tool Requirements for Improving Safety and Accuracy

Next Generation Directional MWD Tool Requirements for Improving Safety and Accuracy Next Generation Directional MWD Tool Requirements for Improving Safety and Accuracy DEA-164 Discussion and Proposal for a JIP Market/Technology Feasibility Study Son Pham ConocoPhillips (Sponsor) +1 281

More information

Chapter 04. Authors: John Hennessy & David Patterson. Copyright 2011, Elsevier Inc. All rights Reserved. 1

Chapter 04. Authors: John Hennessy & David Patterson. Copyright 2011, Elsevier Inc. All rights Reserved. 1 Chapter 04 Authors: John Hennessy & David Patterson Copyright 2011, Elsevier Inc. All rights Reserved. 1 Figure 4.1 Potential speedup via parallelism from MIMD, SIMD, and both MIMD and SIMD over time for

More information

Introduction to Programming

Introduction to Programming Introduction to Programming G. Bakalli March 8, 2017 G. Bakalli Introduction to Programming March 8, 2017 1 / 33 Outline 1 Programming in Finance 2 Types of Languages Interpreters Compilers 3 Programming

More information

A Many-Core Machine Model for Designing Algorithms with Minimum Parallelism Overheads

A Many-Core Machine Model for Designing Algorithms with Minimum Parallelism Overheads A Many-Core Machine Model for Designing Algorithms with Minimum Parallelism Overheads Sardar Anisul Haque Marc Moreno Maza Ning Xie University of Western Ontario, Canada IBM CASCON, November 4, 2014 ardar

More information

Optimisation Myths and Facts as Seen in Statistical Physics

Optimisation Myths and Facts as Seen in Statistical Physics Optimisation Myths and Facts as Seen in Statistical Physics Massimo Bernaschi Institute for Applied Computing National Research Council & Computer Science Department University La Sapienza Rome - ITALY

More information

Oracle and Tangosol Acquisition Announcement

Oracle and Tangosol Acquisition Announcement Oracle and Tangosol Acquisition Announcement March 23, 2007 The following is intended to outline our general product direction. It is intended for information purposes only, and may

More information

Altera SDK for OpenCL

Altera SDK for OpenCL Altera SDK for OpenCL A novel SDK that opens up the world of FPGAs to today s developers Altera Technology Roadshow 2013 Today s News Altera today announces its SDK for OpenCL Altera Joins Khronos Group

More information

Multi-Processors and GPU

Multi-Processors and GPU Multi-Processors and GPU Philipp Koehn 7 December 2016 Predicted CPU Clock Speed 1 Clock speed 1971: 740 khz, 2016: 28.7 GHz Source: Horowitz "The Singularity is Near" (2005) Actual CPU Clock Speed 2 Clock

More information

CUDA. Schedule API. Language extensions. nvcc. Function type qualifiers (1) CUDA compiler to handle the standard C extensions.

CUDA. Schedule API. Language extensions. nvcc. Function type qualifiers (1) CUDA compiler to handle the standard C extensions. Schedule CUDA Digging further into the programming manual Application Programming Interface (API) text only part, sorry Image utilities (simple CUDA examples) Performace considerations Matrix multiplication

More information

Investor conference 2018/Q3

Investor conference 2018/Q3 Investor conference 2018/Q3 Disclaimer The information presented and referred herein are based upon the information obtained internally and externally from our company. In light of the forward-looking

More information

Integers are whole numbers; they include negative whole numbers and zero. For example -7, 0, 18 are integers, 1.5 is not.

Integers are whole numbers; they include negative whole numbers and zero. For example -7, 0, 18 are integers, 1.5 is not. What is an INTEGER/NONINTEGER? Integers are whole numbers; they include negative whole numbers and zero. For example -7, 0, 18 are integers, 1.5 is not. What is a REAL/IMAGINARY number? A real number is

More information

CUDA Optimization: Memory Bandwidth Limited Kernels CUDA Webinar Tim C. Schroeder, HPC Developer Technology Engineer

CUDA Optimization: Memory Bandwidth Limited Kernels CUDA Webinar Tim C. Schroeder, HPC Developer Technology Engineer CUDA Optimization: Memory Bandwidth Limited Kernels CUDA Webinar Tim C. Schroeder, HPC Developer Technology Engineer Outline We ll be focussing on optimizing global memory throughput on Fermi-class GPUs

More information

TUNING CUDA APPLICATIONS FOR MAXWELL

TUNING CUDA APPLICATIONS FOR MAXWELL TUNING CUDA APPLICATIONS FOR MAXWELL DA-07173-001_v7.0 March 2015 Application Note TABLE OF CONTENTS Chapter 1. Maxwell Tuning Guide... 1 1.1. NVIDIA Maxwell Compute Architecture... 1 1.2. CUDA Best Practices...2

More information

COUNTRY PROFILE AUSTRALIA

COUNTRY PROFILE AUSTRALIA COUNTRY PROFILE Statistical tables Factor I: Economic Performance WORLD COMPETITIVENESS RANKING 2018 All data are available from the World Competitiveness Online. Visit our eshop 1 COMPETITIVENESS TRENDS

More information

COUNTRY PROFILE. Mexico

COUNTRY PROFILE. Mexico COUNTRY PROFILE Mexico Statistical tables Factor I: Economic Performance WORLD COMPETITIVENESS RANKING 2018 All data are available from the World Competitiveness Online. Visit our eshop 1 COMPETITIVENESS

More information

Adapting efinance Web Server Farms to Changing Market Demands

Adapting efinance Web Server Farms to Changing Market Demands Adapting efinance Web Farms to Changing Market Demands Ronald Moore/Achim Müller/Ralf Müller/Klaus Temmen IS Innovative Software AG Sandweg 94, 60316 Frankfurt am Main {ronald.moore amueller ralf.mueller

More information

Conference call October 26, :00 / Helsinki 08:00 / New York 1 Nokia 2016 Q3 2017

Conference call October 26, :00 / Helsinki 08:00 / New York 1 Nokia 2016 Q3 2017 Conference call October 26, 2017 15:00 / Helsinki 08:00 / New York 1 Nokia 2016 Q3 2017 Disclaimer It should be noted that Nokia and its business are exposed to various risks and uncertainties, and certain

More information

COUNTRY PROFILE. Croatia

COUNTRY PROFILE. Croatia COUNTRY PROFILE Croatia Statistical tables Factor I: Economic Performance WORLD COMPETITIVENESS RANKING 2018 All data are available from the World Competitiveness Online. Visit our eshop 1 COMPETITIVENESS

More information

Shared Memory. Table of Contents. Shared Memory Learning CUDA to Solve Scientific Problems. Objectives. Technical Issues Shared Memory.

Shared Memory. Table of Contents. Shared Memory Learning CUDA to Solve Scientific Problems. Objectives. Technical Issues Shared Memory. Table of Contents Shared Memory Learning CUDA to Solve Scientific Problems. 1 Objectives Miguel Cárdenas Montes Centro de Investigaciones Energéticas Medioambientales y Tecnológicas, Madrid, Spain miguel.cardenas@ciemat.es

More information

Guide for Transitioning From Fixed To Flexible Scenario Format

Guide for Transitioning From Fixed To Flexible Scenario Format Guide for Transitioning From Fixed To Flexible Scenario Format Updated December 23 2016 GGY AXIS 5001 Yonge Street Suite 1300 Toronto, ON M2N 6P6 Phone: 416-250-6777 Toll free: 1-877-GGY-AXIS Fax: 416-250-6776

More information

Vodafone Group Plc Michel Combes. Deutsche Bank European TMT Conference September 2011

Vodafone Group Plc Michel Combes. Deutsche Bank European TMT Conference September 2011 Vodafone Group Plc Michel Combes Deutsche Bank European TMT Conference September 2011 Disclaimer Information in the following presentation relating to the price at which relevant investments have been

More information

Warps and Reduction Algorithms

Warps and Reduction Algorithms Warps and Reduction Algorithms 1 more on Thread Execution block partitioning into warps single-instruction, multiple-thread, and divergence 2 Parallel Reduction Algorithms computing the sum or the maximum

More information

In-Memory Computing Brings Operational Intelligence to Business Challenges DR. WILLIAM L. BAIN SCALEOUT SOFTWARE, INC.

In-Memory Computing Brings Operational Intelligence to Business Challenges DR. WILLIAM L. BAIN SCALEOUT SOFTWARE, INC. In-Memory Computing Brings Operational Intelligence to Business Challenges DR. WILLIAM L. BAIN SCALEOUT SOFTWARE, INC. About the Speaker Dr. William Bain, Founder & CEO of ScaleOut Software: Email: wbain@scaleoutsoftware.com

More information

COUNTRY PROFILE. Ireland

COUNTRY PROFILE. Ireland COUNTRY PROFILE Ireland Statistical tables Factor I: Economic Performance WORLD COMPETITIVENESS RANKING 18 All data are available from the World Competitiveness Online. Visit our eshop 1 COMPETITIVENESS

More information

Task Graph. Name: Problem: Context: D B. C Antecedent. Task Graph

Task Graph. Name: Problem: Context: D B. C Antecedent. Task Graph Graph Name: Graph Note: The Graph pattern is a concurrent execution pattern and should not be confused with the Arbitrary Static Graph architectural pattern (1) which addresses the overall organization

More information

Part A: Time Series Application

Part A: Time Series Application MBA 555, MANAGERIAL ECONOMICS Computer-Assisted-Learning (CAL): Neuro-economic nonlinear Business Modeling Base Year: AY 2014 Dr. Gordon H. Dash, Jr. Submissions must be made by email: ghdash@uri.edu Due

More information

TUNING CUDA APPLICATIONS FOR MAXWELL

TUNING CUDA APPLICATIONS FOR MAXWELL TUNING CUDA APPLICATIONS FOR MAXWELL DA-07173-001_v6.5 August 2014 Application Note TABLE OF CONTENTS Chapter 1. Maxwell Tuning Guide... 1 1.1. NVIDIA Maxwell Compute Architecture... 1 1.2. CUDA Best Practices...2

More information

Financial Analytics Acceleration

Financial Analytics Acceleration Financial Analytics Acceleration Presented By Name GEORGI GAYDADJIEV Title Director of Maxeler IoT-Labs Date Dec 10, 2018 FPGA technology is getting traction among Datacenter providers and is expected

More information

Hayashi-Yoshida coefficient estimator implementnation in CUDA

Hayashi-Yoshida coefficient estimator implementnation in CUDA Technical University of Crete Department of Electrical & Computer Engineering Hayashi-Yoshida coefficient estimator implementnation in CUDA Spyros Fotopoulos Submitted in part fulfillment of the requirements

More information

Cross Teaching Parallelism and Ray Tracing: A Project based Approach to Teaching Applied Parallel Computing

Cross Teaching Parallelism and Ray Tracing: A Project based Approach to Teaching Applied Parallel Computing and Ray Tracing: A Project based Approach to Teaching Applied Parallel Computing Chris Lupo Computer Science Cal Poly Session 0311 GTC 2012 Slide 1 The Meta Data Cal Poly is medium sized, public polytechnic

More information

CUDA Memory Model. Monday, 21 February Some material David Kirk, NVIDIA and Wen-mei W. Hwu, (used with permission)

CUDA Memory Model. Monday, 21 February Some material David Kirk, NVIDIA and Wen-mei W. Hwu, (used with permission) CUDA Memory Model Some material David Kirk, NVIDIA and Wen-mei W. Hwu, 2007-2009 (used with permission) 1 G80 Implementation of CUDA Memories Each thread can: Grid Read/write per-thread registers Read/write

More information

Parallelization of Tau-Leap Coarse-Grained Monte Carlo Simulations on GPUs

Parallelization of Tau-Leap Coarse-Grained Monte Carlo Simulations on GPUs Parallelization of Tau-Leap Coarse-Grained Monte Carlo Simulations on GPUs Lifan Xu, Michela Taufer, Stuart Collins, Dionisios G. Vlachos Global Computing Lab University of Delaware Multiscale Modeling:

More information

GENETIC ALGORITHM BASED FPGA PLACEMENT ON GPU SUNDAR SRINIVASAN SENTHILKUMAR T. R.

GENETIC ALGORITHM BASED FPGA PLACEMENT ON GPU SUNDAR SRINIVASAN SENTHILKUMAR T. R. GENETIC ALGORITHM BASED FPGA PLACEMENT ON GPU SUNDAR SRINIVASAN SENTHILKUMAR T R FPGA PLACEMENT PROBLEM Input A technology mapped netlist of Configurable Logic Blocks (CLB) realizing a given circuit Output

More information

COUNTRY PROFILE. Taiwan

COUNTRY PROFILE. Taiwan COUNTRY PROFILE Taiwan Statistical tables Factor I: Economic Performance WORLD COMPETITIVENESS RANKING 2018 All data are available from the World Competitiveness Online. Visit our eshop 1 COMPETITIVENESS

More information

Parallel FFT Program Optimizations on Heterogeneous Computers

Parallel FFT Program Optimizations on Heterogeneous Computers Parallel FFT Program Optimizations on Heterogeneous Computers Shuo Chen, Xiaoming Li Department of Electrical and Computer Engineering University of Delaware, Newark, DE 19716 Outline Part I: A Hybrid

More information

Next Generation Directional MWD Tool Requirements for Improving Safety and Accuracy

Next Generation Directional MWD Tool Requirements for Improving Safety and Accuracy Next Generation Directional MWD Tool Requirements for Improving Safety and Accuracy Discussion and Proposal for a JIP Market/Technology Feasibility Study Keith Lynch Conoco Phillips (sponsor) 281-293-3746

More information

implementation using GPU architecture is implemented only from the viewpoint of frame level parallel encoding [6]. However, it is obvious that the mot

implementation using GPU architecture is implemented only from the viewpoint of frame level parallel encoding [6]. However, it is obvious that the mot Parallel Implementation Algorithm of Motion Estimation for GPU Applications by Tian Song 1,2*, Masashi Koshino 2, Yuya Matsunohana 2 and Takashi Shimamoto 1,2 Abstract The video coding standard H.264/AVC

More information

Unrolling parallel loops

Unrolling parallel loops Unrolling parallel loops Vasily Volkov UC Berkeley November 14, 2011 1 Today Very simple optimization technique Closely resembles loop unrolling Widely used in high performance codes 2 Mapping to GPU:

More information

Software-defined storage systems from Lenovo

Software-defined storage systems from Lenovo Intel Xeon processor Software-defined storage systems from Lenovo The IT environment of the future will be software-defined There is a broad consensus among IT experts that data centres do not currently

More information

Mobile World Congress Claudine Mangano Director, Global Communications Intel Corporation

Mobile World Congress Claudine Mangano Director, Global Communications Intel Corporation Mobile World Congress 2015 Claudine Mangano Director, Global Communications Intel Corporation Mobile World Congress 2015 Brian Krzanich Chief Executive Officer Intel Corporation 4.9B 2X CONNECTED CONNECTED

More information

CISI - International Introduction to Securities & Investment Study Support Training EUROPE MIDDLE EAST & NORTH AFRICA ASIA

CISI - International Introduction to Securities & Investment Study Support Training EUROPE MIDDLE EAST & NORTH AFRICA ASIA CISI - International Introduction to Securities & Investment Study Support Training About ISC & UIC Investment Studies Center (ISC) Contributing to the provision of promising national cadres, capable of

More information

Deploying, Managing and Reusing R Models in an Enterprise Environment

Deploying, Managing and Reusing R Models in an Enterprise Environment Deploying, Managing and Reusing R Models in an Enterprise Environment Making Data Science Accessible to a Wider Audience Lou Bajuk-Yorgan, Sr. Director, Product Management Streaming and Advanced Analytics

More information

Investor conference 2017 Hong Kong

Investor conference 2017 Hong Kong Investor conference 2017 Hong Kong Disclaimer The information presented and referred herein are based upon the information obtained internally and externally from our company. In light of the forward-looking

More information

COUNTRY PROFILE. Bulgaria

COUNTRY PROFILE. Bulgaria COUNTRY PROFILE Bulgaria Statistical tables Factor I: Economic Performance WORLD COMPETITIVENESS RANKING 2018 All data are available from the World Competitiveness Online. Visit our eshop 1 COMPETITIVENESS

More information

COUNTRY PROFILE. Italy

COUNTRY PROFILE. Italy COUNTRY PROFILE Italy Statistical tables Factor I: Economic Performance WORLD COMPETITIVENESS RANKING 2018 All data are available from the World Competitiveness Online. Visit our eshop 1 COMPETITIVENESS

More information

Standing Together for Financial Industry Resilience Quantum Dawn 3 After-Action Report. November 19, 2015

Standing Together for Financial Industry Resilience Quantum Dawn 3 After-Action Report. November 19, 2015 Standing Together for Financial Industry Resilience Quantum Dawn 3 After-Action Report November 19, 2015 Table of contents Background Exercise objectives Quantum Dawn 3 (QD3) cyberattack scenario QD3 results

More information

Sparse Linear Algebra in CUDA

Sparse Linear Algebra in CUDA Sparse Linear Algebra in CUDA HPC - Algorithms and Applications Alexander Pöppl Technical University of Munich Chair of Scientific Computing November 22 nd 2017 Table of Contents Homework - Worksheet 2

More information

ME964 High Performance Computing for Engineering Applications

ME964 High Performance Computing for Engineering Applications ME964 High Performance Computing for Engineering Applications Execution Scheduling in CUDA Revisiting Memory Issues in CUDA February 17, 2011 Dan Negrut, 2011 ME964 UW-Madison Computers are useless. They

More information

STATISTICS (STAT) Statistics (STAT) 1

STATISTICS (STAT) Statistics (STAT) 1 Statistics (STAT) 1 STATISTICS (STAT) STAT 2013 Elementary Statistics (A) Prerequisites: MATH 1483 or MATH 1513, each with a grade of "C" or better; or an acceptable placement score (see placement.okstate.edu).

More information

Fundamental CUDA Optimization. NVIDIA Corporation

Fundamental CUDA Optimization. NVIDIA Corporation Fundamental CUDA Optimization NVIDIA Corporation Outline Fermi/Kepler Architecture Kernel optimizations Launch configuration Global memory throughput Shared memory access Instruction throughput / control

More information

Fundamental CUDA Optimization. NVIDIA Corporation

Fundamental CUDA Optimization. NVIDIA Corporation Fundamental CUDA Optimization NVIDIA Corporation Outline! Fermi Architecture! Kernel optimizations! Launch configuration! Global memory throughput! Shared memory access! Instruction throughput / control

More information

Using. Research Wizard. Version 4.0. Copyright 2001, Zacks Investment Research, Inc.,

Using. Research Wizard. Version 4.0. Copyright 2001, Zacks Investment Research, Inc., Using Research Wizard Version 4.0 Copyright 2001, Zacks Investment Research, Inc., Contents Introduction 1 Research Wizard 4.0 Overview...1 Using Research Wizard...1 Guided Tour 2 Getting Started in Research

More information

Ending the Confusion About Software- Defined Networking: A Taxonomy

Ending the Confusion About Software- Defined Networking: A Taxonomy Ending the Confusion About Software- Defined Networking: A Taxonomy This taxonomy cuts through confusion generated by the flood of vendor SDN announcements. It presents a framework that network and server

More information

Guidance In Situ Pre-Approval Controlled Functions (PCFs): Confirmation of Due Diligence undertaken

Guidance In Situ Pre-Approval Controlled Functions (PCFs): Confirmation of Due Diligence undertaken 2012 Guidance In Situ Pre-Approval Controlled Functions (PCFs): Confirmation of Due Diligence undertaken Table of Contents Contents Table of Contents... 2 Section 1: Introduction... 3 Purpose of this Guidance...

More information

COUNTRY PROFILE. Colombia

COUNTRY PROFILE. Colombia COUNTRY PROFILE Colombia Statistical tables Factor I: Economic Performance WORLD COMPETITIVENESS RANKING 2018 All data are available from the World Competitiveness Online. Visit our eshop 1 COMPETITIVENESS

More information

Deep Neural Networks for Market Prediction on the Xeon Phi *

Deep Neural Networks for Market Prediction on the Xeon Phi * Deep Neural Networks for Market Prediction on the Xeon Phi * Matthew Dixon 1, Diego Klabjan 2 and * The support of Intel Corporation is gratefully acknowledged Jin Hoon Bang 3 1 Department of Finance,

More information

FactSet Quick Start Guide

FactSet Quick Start Guide FactSet Quick Start Guide Table of Contents FactSet Quick Start Guide... 1 FactSet Quick Start Guide... 3 Getting Started... 3 Inserting Components in Your Workspace... 4 Searching with FactSet... 5 Market

More information

Deutsche Bank Credit. Credit User guide.

Deutsche Bank Credit. Credit User guide. Deutsche Bank Credit Credit User guide http://autobahn.db.com 2 Autobahn is Deutsche Bank s award-winning electronic distribution service. Since 1996, Autobahn has been connecting clients to Deutsche Bank

More information

Introduction to GPU programming. Introduction to GPU programming p. 1/17

Introduction to GPU programming. Introduction to GPU programming p. 1/17 Introduction to GPU programming Introduction to GPU programming p. 1/17 Introduction to GPU programming p. 2/17 Overview GPUs & computing Principles of CUDA programming One good reference: David B. Kirk

More information

CUDA and GPU Performance Tuning Fundamentals: A hands-on introduction. Francesco Rossi University of Bologna and INFN

CUDA and GPU Performance Tuning Fundamentals: A hands-on introduction. Francesco Rossi University of Bologna and INFN CUDA and GPU Performance Tuning Fundamentals: A hands-on introduction Francesco Rossi University of Bologna and INFN * Using this terminology since you ve already heard of SIMD and SPMD at this school

More information

What is GPU? CS 590: High Performance Computing. GPU Architectures and CUDA Concepts/Terms

What is GPU? CS 590: High Performance Computing. GPU Architectures and CUDA Concepts/Terms CS 590: High Performance Computing GPU Architectures and CUDA Concepts/Terms Fengguang Song Department of Computer & Information Science IUPUI What is GPU? Conventional GPUs are used to generate 2D, 3D

More information

Concurrent execution of an analytical workload on a POWER8 server with K40 GPUs A Technology Demonstration

Concurrent execution of an analytical workload on a POWER8 server with K40 GPUs A Technology Demonstration Concurrent execution of an analytical workload on a POWER8 server with K40 GPUs A Technology Demonstration Sina Meraji sinamera@ca.ibm.com Berni Schiefer schiefer@ca.ibm.com Tuesday March 17th at 12:00

More information

Data Parallel Execution Model

Data Parallel Execution Model CS/EE 217 GPU Architecture and Parallel Programming Lecture 3: Kernel-Based Data Parallel Execution Model David Kirk/NVIDIA and Wen-mei Hwu, 2007-2013 Objective To understand the organization and scheduling

More information

Lessons learned from a simple application

Lessons learned from a simple application Computation to Core Mapping Lessons learned from a simple application A Simple Application Matrix Multiplication Used as an example throughout the course Goal for today: Show the concept of Computation-to-Core

More information

Intel Many Integrated Core (MIC) Architecture

Intel Many Integrated Core (MIC) Architecture Intel Many Integrated Core (MIC) Architecture Karl Solchenbach Director European Exascale Labs BMW2011, November 3, 2011 1 Notice and Disclaimers Notice: This document contains information on products

More information