Deep Neural Networks for Market Prediction on the Xeon Phi *


Deep Neural Networks for Market Prediction on the Xeon Phi*
Matthew Dixon (1), Diego Klabjan (2) and Jin Hoon Bang (3)
(1) Department of Finance, Stuart School of Business, Illinois Institute of Technology
(2) Department of Industrial Engineering and Management Sciences, Northwestern University
(3) Department of Computer Science, Northwestern University
* The support of Intel Corporation is gratefully acknowledged.

Overview
- Background on engineering automated trading decisions
- Early machine learning research and what's changed since then
- Overview of deep learning
- Training and backtesting on advanced computer architectures

Algo Trading Strategies
A strategy maps inputs (market data and configurations) through rules, i.e. mathematical operations, to an output: the trading decision.

Example: moving-average crossover
When the fast moving average crosses above the slow moving average, a buy signal is generated; when it crosses below, a sell signal is generated.
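A minimal C++ sketch of this crossover rule; the window lengths and function names are illustrative assumptions, not from the talk:

#include <cstddef>
#include <numeric>
#include <vector>

// Simple moving average of the `window` prices ending at index i.
double sma(const std::vector<double>& px, std::size_t i, std::size_t window) {
    return std::accumulate(px.begin() + (i + 1 - window),
                           px.begin() + (i + 1), 0.0) / window;
}

// Crossover rule: +1 (buy) when the fast average crosses above the slow
// one, -1 (sell) on the reverse cross, 0 (hold) otherwise.
int crossover_signal(const std::vector<double>& px, std::size_t i,
                     std::size_t fast = 5, std::size_t slow = 20) {
    if (i < slow) return 0;  // not enough history yet
    double f0 = sma(px, i - 1, fast), s0 = sma(px, i - 1, slow);
    double f1 = sma(px, i, fast),     s1 = sma(px, i, slow);
    if (f0 <= s0 && f1 > s1) return 1;
    if (f0 >= s0 && f1 < s1) return -1;
    return 0;
}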

Markets
- Futures markets: CME (50 liquid futures), other exchanges
- Equity markets
- FX markets: 10 major currency pairs, 30 alternative currency pairs
- Options markets

Strategy Universe
Each strategy configuration c yields a trading decision s(c), where 1 = buy, 0 = hold and -1 = sell, which is scored by a utility U(s(c)) based on P&L and drawdown.
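A small sketch of the mapping c -> s(c) -> U(s(c)), assuming a simple utility that rewards cumulative P&L and penalises maximum drawdown; the exact utility used in the talk is not specified here:

#include <algorithm>
#include <cstddef>
#include <vector>

// Utility U(s(c)) for a decision sequence s_t in {-1, 0, 1} applied to
// per-period returns: cumulative P&L minus the maximum drawdown.
// The penalty form is an assumption for illustration.
double utility(const std::vector<int>& s, const std::vector<double>& ret) {
    double equity = 0.0, peak = 0.0, max_dd = 0.0;
    for (std::size_t t = 0; t + 1 < ret.size() && t < s.size(); ++t) {
        equity += s[t] * ret[t + 1];               // position held over the next bar
        peak = std::max(peak, equity);
        max_dd = std::max(max_dd, peak - equity);  // track running drawdown
    }
    return equity - max_dd;
}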

Strategy Engineering
How can we engineer a strategy that produces buy/sell decisions?

Finance Research Literature
J. Chen, J. F. Diaz, and Y. F. Huang. High technology ETF forecasting: Application of Grey Relational Analysis and Artificial Neural Networks. Frontiers in Finance and Economics, 10(2):129-155, 2013.
S. Niaki and S. Hoseinzade. Forecasting S&P 500 index using artificial neural networks and design of experiments. Journal of Industrial Engineering International, 9(1):1, 2013.
M. T. Leung, H. Daouk, and A.-S. Chen. Forecasting stock indices: a comparison of classification and level estimation models. International Journal of Forecasting, 16(2):173-190, 2000.
A. N. Refenes, A. Zapranis, and G. Francis. Stock performance modeling using neural networks: A comparative study with regression models. Neural Networks, 7(2):375-388, 1994.
R. R. Trippi and D. DeSieno. Trading equity index futures with a neural network. The Journal of Portfolio Management, 19(1):27-33, 1992.

Deep Neural Networks Literature
A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pages 1097-1105, 2012.
R. Rojas. Neural Networks: A Systematic Introduction. Springer-Verlag New York, Inc., New York, NY, USA, 1996.

Classifiers
A classifier maps the features of a strategy configuration c, e.g. historic price returns at various lags, moving averages and oscillators, to a trading decision s(c): 1 = buy, 0 = hold, -1 = sell.

What's changed since the '90s?

Computer architecture transformation in half a century
[Photo: RAND computing facilities in the 1950s-60s]

Variety of Parallel Architectures
[Block diagrams of three architectures: a multi-core CPU (four cores, each with private L1/L2 cache, a shared L3 cache, memory controller and DRAM); a CPU-GPU system (host CPU with its own DRAM, DMA transfers to a GPU device with a thread execution control unit, per-core local stores and GPU DRAM); and a compute cluster (nodes connected by an interconnection network)]

Big Data <-> Big Compute
Deep neural networks sit at the intersection of big data and big compute.

Feature Engineering
Raw features: 5-minute mid-prices for 45 CME-listed commodity and FX futures over the last 15 years.
Engineered features:
- lagged price differences from 1 to 100
- moving price averages with window sizes from 5 to 100
- pair-wise correlations of returns
Labels: positive, neutral or negative market returns.
Pipeline: raw features -> engineered features -> normalized features -> labeled feature set.
The final feature set consists of 9,895 features and 50,000 consecutive observations.
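A sketch of how the lag and moving-average features and the label might be assembled for one instrument at one time index; the neutral band eps, the single-instrument scope and the omission of correlation features and normalization are simplifying assumptions:

#include <cstddef>
#include <vector>

struct Row {
    std::vector<double> features;
    int label;  // +1 positive, 0 neutral, -1 negative next-period return
};

// Build one observation at time t (requires t >= 100 and t + 1 < px.size()).
// eps defines the assumed neutral band around a zero return.
Row make_row(const std::vector<double>& px, std::size_t t, double eps) {
    Row r;
    for (std::size_t lag = 1; lag <= 100; ++lag)   // lagged price differences
        r.features.push_back(px[t] - px[t - lag]);
    for (std::size_t w = 5; w <= 100; ++w) {       // moving price averages
        double sum = 0.0;
        for (std::size_t k = 0; k < w; ++k) sum += px[t - k];
        r.features.push_back(sum / w);
    }
    double next_ret = px[t + 1] - px[t];
    r.label = next_ret > eps ? 1 : (next_ret < -eps ? -1 : 0);
    return r;
}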

Deep Learning Algorithm

Walk-forward optimization
The model is repeatedly trained on a rolling in-sample window and then evaluated on the subsequent out-of-sample window, sliding both windows forward through the history.
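A schematic of the walk-forward loop, assuming fixed-length rolling windows; the window lengths and callback interface are illustrative, not the authors' code:

#include <cstddef>

// Slide a fixed-length training window through the history; after each fit,
// evaluate on the block that immediately follows, then advance both windows.
template <class TrainFn, class TestFn>
void walk_forward(std::size_t n_obs, std::size_t train_len, std::size_t test_len,
                  TrainFn train, TestFn test) {
    for (std::size_t start = 0; start + train_len + test_len <= n_obs;
         start += test_len) {
        train(start, start + train_len);                        // in-sample fit
        test(start + train_len, start + train_len + test_len);  // out-of-sample
    }
}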

DNN Performance
Comparison of the classification accuracy of a classical ANN (with one hidden layer) and a DNN with four hidden layers. The in-sample and out-of-sample error rates are also compared to check for over-fitting.

DNN Performance

Implementation
In designing an algorithm for parallel efficiency on a shared-memory architecture, we followed three design goals (a dgemm sketch follows this list):
1. The algorithm must have good data locality properties.
2. The dimension of the matrix or for-loop being parallelized must be at least equal to the number of threads.
3. BLAS routines from the MKL should be used in preference to OpenMP parallel-for primitives.
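As an illustration of goal 3, the batched weight-gradient update for one layer can be expressed as a single MKL dgemm call rather than nested OpenMP loops. The row-major layout and variable names below are assumptions, not the authors' code:

#include <mkl.h>

// For a batch of B samples, accumulate the layer's weight gradient
// dW = delta^T * activ in one BLAS-3 call.
void weight_gradient(const double* delta,  // B x n_out back-propagated errors
                     const double* activ,  // B x n_in  previous-layer activations
                     double* dW,           // n_out x n_in gradient output
                     int B, int n_in, int n_out) {
    // dW = 1.0 * delta^T * activ + 0.0 * dW   (row-major storage)
    cblas_dgemm(CblasRowMajor, CblasTrans, CblasNoTrans,
                n_out, n_in, B,
                1.0, delta, n_out,
                activ, n_in,
                0.0, dW, n_in);
}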

Implementation

Performance on the Intel Xeon Phi
[Figure: speedup of the batched back-propagation algorithm on the Intel Xeon Phi as the number of cores is increased from 0 to 60, with curves labeled 240, 1000, 5000 and 10000 (batch sizes)]

Performance on the Intel Xeon Phi

Performance on the Intel Xeon Phi Speedup of the batched back-propagation algorithm on the Intel Xeon Phi relative to the baseline for various batch sizes.

Performance on the Intel Xeon Phi
[Figures: the GFLOPS of the weight matrix update (using dgemm) for each layer; variation of the GFLOPS of the first-layer weight matrix update with the batch size]
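For reference, a GFLOPS figure for a dgemm-based update can be derived from the standard 2*m*n*k flop count of a matrix multiply; a small helper with illustrative names:

#include <chrono>

// GFLOPS of an m x k by k x n dgemm that took `seconds` of wall time,
// using the conventional 2*m*n*k flop count (one multiply and one add
// per inner-product term).
double gemm_gflops(int m, int n, int k, double seconds) {
    return (2.0 * m * n * k) / seconds / 1e9;
}

// Example: time one weight update and report its throughput.
// auto t0 = std::chrono::steady_clock::now();
// weight_gradient(delta, activ, dW, B, n_in, n_out);
// auto dt = std::chrono::duration<double>(std::chrono::steady_clock::now() - t0);
// double gflops = gemm_gflops(n_out, n_in, B, dt.count());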

Summary
- DNNs exhibit less overfitting than ANNs.
- Previous studies applied ANNs to single instruments; we combine multiple instruments across multiple markets and engineer features from lagged prices and moving averages of prices and from correlations of returns.
- Training DNNs involves many parameters and is very computationally intensive; we addressed this by using the Intel Xeon Phi.