
Speeding Up the ARDL Estimation Command:
A Case Study in Efficient Programming in Stata and Mata

Sebastian Kripfganz (University of Exeter)
Daniel C. Schneider (Max Planck Institute for Demographic Research)

German Stata Users Group Meeting, June 23, 2017

Contents

- Efficient Coding
- Digression: A Tiny Bit of Asymptotic Notation
- The ARDL Model
- Optimal Lag Selection
- Incremental Code Improvements

Introduction

Long code execution times are more than a nuisance: they negatively affect the quality of research.

Strategies for speeding up execution:
- moving to a lower-level language
- parallelization
- writing efficient code

Efficient coding is often the best choice: moving to lower-level languages is tedious, and in many settings the speed improvements are larger than those obtainable through parallelization.

Introduction: Speed of Stata and Mata

C is the reference: it is compiled to machine instructions.

Post by Bill Gould (2014) on the Stata Forum:
- Stata (interpreted) code is 50-200 times slower than C.
- Mata compiled byte-code is 5-6 times slower than C.
- => Mata is roughly 10-40 times faster than Stata.

In real-world applications, Mata is only about 2 times slower than C, since Mata has built-in C routines based on very efficient code.

Introduction: Efficient Coding Strategies

Using common sense:
- An if-condition requires at least N comparisons. Use in-conditions instead, if possible (see the sketch below).
- Multiplying two 100x100 matrices requires about 2*100^3 = 2,000,000 arithmetic operations.

Using knowledge of your software (Stata, of course!). Examples:
- Mata: passing of arguments to functions
- efficient operators and functions (e.g. Mata's colon operator and its c-conformability)
- Read the Stata and Mata programming manuals.
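
As a small illustration of the if-versus-in point (a hypothetical example using the auto dataset, not taken from the slides): restricting by observation range avoids evaluating a condition for every observation.

    sysuse auto, clear
    * both commands summarize the first 50 observations, but the if-version
    * first evaluates the condition _n <= 50 for all 74 observations
    summarize price if _n <= 50
    summarize price in 1/50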

Introduction: Efficient Coding Strategies

Using knowledge of matrix algebra: translating mathematical formulas one-to-one into matrix-language expressions is oftentimes (very!) inefficient. Examples:

- Diagonal matrices D:
  - multiplication of a matrix by D: don't do it! In Mata, use the c-conformability of the colon operator (see [M-2] op_colon; illustrated in the sketch below).
  - inverse: flip the diagonal elements instead of calling a matrix solver / inverter function (O(n) vs. O(n^3)).
- Block-diagonal matrices:
  - multiplication: just multiply the diagonal blocks; the cost is only 1/s^2 of a full multiplication, where s is the number of diagonal blocks.
  - inverse: invert the individual blocks.
- Order of matrix multiplication / parenthesization:
  b = (X'X)^{-1} (X'y) is faster than b = ((X'X)^{-1} X') y;
  e.g. for k = 10, N = 10,000, the matrix multiplications are 11 times faster.
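
A minimal Mata sketch of two of these points (matrix sizes and data are arbitrary; this is an illustration, not the ardl code):

    mata:
        n = 1000
        A = rnormal(n, n, 0, 1)
        d = rnormal(n, 1, 0, 1)                   // diagonal elements of D

        // multiplying by a diagonal matrix: scale the rows with the colon
        // operator instead of forming D and paying for a full O(n^3) product
        B1 = diag(d) * A                          // slow: full matrix multiplication
        B2 = d :* A                               // fast: c-conformable row scaling
        assert(mreldif(B1, B2) < 1e-12)

        // order of multiplication for OLS coefficients
        N = 10000; k = 10
        X = rnormal(N, k, 0, 1)
        y = rnormal(N, 1, 0, 1)
        b1 = (invsym(cross(X, X)) * X') * y       // forms a k x N intermediate: costly
        b2 = invsym(cross(X, X)) * cross(X, y)    // (k x k)(k x 1): cheap
        assert(mreldif(b1, b2) < 1e-10)
    end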

Asymptotic Notation

Definition. An algorithm with input size n and running time T(n) is said to be Θ(g(n)) ("theta of g of n"), or to have an asymptotically tight bound g(n), if there exist positive real numbers c_1, c_2, n_0 > 0 such that

    c_1 g(n) ≤ T(n) ≤ c_2 g(n)   for all n ≥ n_0.

[Figure: T(n) = 0.8n^3 - 1000n^2 + 1000n + 10e9 is O(n^3), plotted together with 0.2 n^3 and 0.801 n^3 for input sizes n = 0 to 2000; y-axis: # of arithmetic operations.]

Asymptotic Notation

- O(g(n)) ("(big) oh of g of n"), as opposed to Θ(g(n)), is used here to denote only an upper bound. Notation differs in the literature.
- Technically, Θ(g(n)) and O(g(n)) are sets of functions, so we write e.g. T(n) ∈ O(g(n)).
- For matrix operations, g(n) is frequently n raised to some low integer power. Θ(n) is much better than Θ(n^2), which in turn is much better than Θ(n^3).
- (Square) matrix multiplication is Θ(n^3): each element of the new n x n matrix is a sum of n terms. Costly!
- Many types of matrix inversion, e.g. via the LU decomposition, are also Θ(n^3). Costly!
- Inner vector products are Θ(n).
- When T(n) is an i-th order polynomial, the leading term asymptotically dominates: T(n) ∈ O(n^i).
- Θ(a^n) is worse than Θ(n^a); Θ(lg n) is better than Θ(n).

ARDL: Model Setup

ARDL(p, q_1, ..., q_k): autoregressive distributed lag model. A popular, long-standing single-equation time-series model for continuous variables.

Linear model:

    y_t = c_0 + c_1 t + \sum_{i=1}^{p} \phi_i y_{t-i} + \sum_{i=0}^{q} \beta_i' x_{t-i} + u_t,   u_t ~ iid(0, \sigma^2)

(y_t, x_t')' can be purely I(0), purely I(1), or cointegrated. The model can therefore be used to test for cointegration (bounds testing procedure; Pesaran, Shin, and Smith, 2001). => The econometrics of ARDL can be complicated.

Installation: net install ardl, from(http://www.kripfganz.de/stata)

This talk: programming. For the statistics of ardl, see Kripfganz/Schneider (2016).

ARDL: Computational Considerations

Despite its complex statistical properties, estimating an ARDL model is just based on OLS! The computationally costly parts are:
- determination of the optimal lag orders (e.g. via AIC or BIC): treated at length in this talk
- simulation of test distributions for cointegration testing (PSS 2001; Narayan 2005): not covered in this talk

Optimal Lag Selection: The Problem

For k + 1 variables (indepvars + depvar) and maxlags lags for each variable: run a regression and calculate an information criterion (IC) for each possible lag combination, and select the model with the best IC value.

Example: 2 variables (v1 v2), maxlags = 2:

    regress v1 L(1/1).v1 L(0/0).v2
    regress v1 L(1/2).v1 L(0/0).v2
    regress v1 L(1/1).v1 L(0/1).v2
    regress v1 L(1/2).v1 L(0/1).v2
    regress v1 L(1/1).v1 L(0/2).v2
    regress v1 L(1/2).v1 L(0/2).v2

The number of regressions to run is exponential in k: maxlags * (maxlags + 1)^k.

    k+1   maxlags   # regressions
    3     4         100
    3     8         650
    4     8         5,800
    6     8         470,000
    8     8         38,000,000

Lag Selection: Preliminaries

Lag combination matrix for k = 3 variables and maxlags = 2:

    1 0 0
    1 0 1
    1 0 2
    1 1 0
    1 1 1
    1 1 2
    1 2 0
    1 2 1
    ...
    2 2 2

e.g. row 3, (1 0 2), corresponds to the regressors L.v1 L(0/0).v2 L(0/2).v3, i.e. v1_{t-1}, v2_t, v3_t, v3_{t-1}, v3_{t-2}.

This matrix is called lagcombs in the pseudo-code to follow.
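
The code that builds lagcombs is not shown in the transcription; a self-contained Mata sketch (hypothetical function name, ordering as in the matrix above) could look like this:

    mata:
        // enumerate all lag combinations for nvars variables (depvar in column 1,
        // lags 1..maxlags; regressors in the remaining columns, lags 0..maxlags)
        real matrix lagcombs(real scalar nvars, real scalar maxlags)
        {
            real matrix L
            real scalar r, j, n, nrows

            nrows = maxlags * (maxlags + 1)^(nvars - 1)
            L = J(nrows, nvars, 0)
            for (r = 1; r <= nrows; r++) {
                n = r - 1                       // decode the row index, mixed radix
                for (j = nvars; j >= 2; j--) {
                    L[r, j] = mod(n, maxlags + 1)
                    n = floor(n / (maxlags + 1))
                }
                L[r, 1] = n + 1                 // the depvar lag starts at 1, not 0
            }
            return(L)
        }

        lagcombs(3, 2)                          // reproduces the 18-row matrix above
    end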

Lag Selection: Naive Approach Using regress

Stata/Mata-like pseudocode:
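
The pseudocode listing itself is not part of the transcription. A rough sketch of the naive approach for the two-variable example above (hypothetical code; it ignores the need to hold the estimation sample fixed across lag orders):

    * assumes tsset time-series data containing the variables v1 and v2
    * naive lag selection: one call to -regress- per lag combination
    local best_aic = .
    forvalues p = 1/2 {
        forvalues q = 0/2 {
            quietly regress v1 L(1/`p').v1 L(0/`q').v2
            quietly estat ic
            matrix S = r(S)
            local aic = S[1, 5]             // column 5 of r(S) holds the AIC
            if (`aic' < `best_aic') {
                local best_aic = `aic'
                local best_p   = `p'
                local best_q   = `q'
            }
        }
    }
    display "best lag combination: p = `best_p', q = `best_q'"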

Lag Selection: Timings

Timings in seconds (2.5 GHz, single core) for N = 1,000:

    k+1   maxlags   # regressions   regress
    3     4         100             1.6
    3     8         650             12.5
    4     8         5,800           132
    6     8         470,000         14,000
    8     8         38,000,000      (13 days?)

Lag Selection: Mata I

Stata/Mata-like pseudocode:
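
This listing is also missing from the transcription. The essential idea of the first Mata version is to move the data into Mata once and run every OLS regression by hand; a sketch under those assumptions (hypothetical names, simulated data, AIC up to an additive constant), reusing lagcombs() from above:

    mata:
        // run the regression for one row of lagcombs and return its AIC.
        // data: T x nvars matrix, depvar in column 1; the estimation sample is
        // fixed to observations maxlags+1..T so all models use the same window.
        real scalar ols_aic(real matrix data, real rowvector combo, real scalar maxlags)
        {
            real matrix X
            real colvector y, b, e
            real scalar T, N, j, l

            T = rows(data)
            N = T - maxlags
            y = data[|maxlags+1, 1 \ T, 1|]
            X = J(N, 1, 1)                                   // constant
            for (l = 1; l <= combo[1]; l++)                  // lags 1..p of the depvar
                X = X, data[|maxlags+1-l, 1 \ T-l, 1|]
            for (j = 2; j <= cols(combo); j++)               // lags 0..q_j of regressor j
                for (l = 0; l <= combo[j]; l++)
                    X = X, data[|maxlags+1-l, j \ T-l, j|]
            b = invsym(cross(X, X)) * cross(X, y)
            e = y - X*b
            return(N * ln(cross(e, e)/N) + 2*cols(X))
        }

        data = rnormal(200, 3, 0, 1)                         // fake data: 3 variables
        L    = lagcombs(3, 2)
        aics = J(rows(L), 1, .)
        for (r = 1; r <= rows(L); r++) aics[r] = ols_aic(data, L[r, .], 2)
        i = .
        w = .
        minindex(aics, 1, i, w)
        L[i[1], .]                                           // best lag combination
    end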

Lag Selection: Mata II (no redundant calculations)
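
The Mata II listing is not reproduced in the transcription. One plausible reading of "no redundant calculations" is that the cross products are computed only once, for the largest model, and each lag combination then merely selects the relevant rows and columns. The identity behind that idea, as a hedged Mata sketch:

    mata:
        // cross products of a column subset of X are a row/column subset of the
        // full cross product, so X'X never has to be recomputed per combination
        X    = rnormal(100, 6, 0, 1)
        full = cross(X, X)                       // computed once, largest model
        idx  = (1, 3, 4)                         // columns used by one combination
        assert(mreldif(cross(X[., idx], X[., idx]), full[idx, idx]) < 1e-12)
    end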

Lag Selection: Mata III

A sticky point is the large number of matrix inversions, which are Θ(n^3). We will further improve matters by using results from linear algebra, and we will introduce and use pointer variables in the process.

The following puts forth a somewhat complicated algorithm that affects many parts of the loop. We could instead have focused on many smaller code optimizations, but both are not possible within the time window of this presentation.

Updating (X'X)^{-1} Using Partitioned Matrices

For A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}, with A, A_{11} and A_{22} square and invertible:

    A^{-1} = \begin{pmatrix} D & -D A_{12} A_{22}^{-1} \\ -A_{22}^{-1} A_{21} D & A_{22}^{-1} + A_{22}^{-1} A_{21} D A_{12} A_{22}^{-1} \end{pmatrix}

    with D = A_{11}^{-1} + A_{11}^{-1} A_{12} (A_{22} - A_{21} A_{11}^{-1} A_{12})^{-1} A_{21} A_{11}^{-1}.

Here: let X_v = (X, v) be X with one column v appended. The cross-product matrix becomes

    X_v' X_v = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix} = \begin{pmatrix} X'X & X'v \\ v'X & v'v \end{pmatrix}, with A_{21} = A_{12}'.

Task: calculate (X_v' X_v)^{-1} based on the known terms (X'X)^{-1}, X'v, and v'v.

Slight complication: a column sometimes has to be inserted into X, not just appended. This can be handled with permutation vectors (see [M-1] permutation).

Let's call this procedure PMAC (partitioned matrices / append column) to ease exposition.
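
A hedged Mata sketch of the PMAC step (hypothetical function name; it exploits the fact that A_{22} = v'v is a scalar here, so no O(n^3) inversion is needed), together with a check against a direct inversion:

    mata:
        // update (X'X)^-1 after appending one column v to X
        // Ainv = (X'X)^-1 (known), Xv = X'v, vv = v'v
        real matrix pmac(real matrix Ainv, real colvector Xv, real scalar vv)
        {
            real colvector u
            real rowvector bottom
            real matrix top
            real scalar s

            s = 1 / (vv - Xv' * Ainv * Xv)        // (A22 - A21 A11^-1 A12)^-1, a scalar
            u = Ainv * Xv                          // A11^-1 A12
            top    = (Ainv + s*(u*u')), -s*u       // updated first block row
            bottom = -s*u', s                      // updated second block row
            return(top \ bottom)
        }

        // quick check against a fresh O(n^3) inversion
        X  = rnormal(500, 4, 0, 1)
        v  = rnormal(500, 1, 0, 1)
        A1 = pmac(invsym(cross(X, X)), cross(X, v), cross(v, v))
        A2 = invsym(cross((X, v), (X, v)))
        assert(mreldif(A1, A2) < 1e-8)
    end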

Updating (X'X)^{-1} Using Partitioned Matrices

Problem: columns are sometimes deleted, not just added.

Lag combination matrix (maxlags = 2 for all variables):

    1 0 0
    1 0 1
    1 0 2
    1 1 0
    1 1 1
    1 1 2
    1 2 0
    1 2 1
    ...
    2 2 2

e.g. moving from row 3, (1 0 2), to row 4, (1 1 0), deletes two lags of the last regressor.

Solution: store matrices the algorithm can jump back to, using pointers.

Pointer Variables

A general and advanced programming concept, but the basics are easy to understand and apply.

Each variable has a name and a type. The name really is just a device to refer to a specific location in memory; every location in memory has a unique address. Since the type of the variable is known to Mata, Mata knows how big a memory range a variable name refers to and how to interpret the value (the bits stored there).

Think in these terms: each variable has an address and a value.

Pointer variables hold memory addresses of other variables. They can point to anything: scalars, matrices, pointers, objects, functions, ...

Pointer Variables

Pointers are often assigned to using &; they are dereferenced using *. Read:
    & : "the address of"
    * : "the thing pointed to by"

    mata:
    s = J(2,2,1)
    p = &s
    p                // outputs something like 0xcb3cb60
    *p               // outputs the 2x2 matrix of ones
    *p = J(2,2,-7)
    s                // now contains the matrix of -7s
    end

See [M-2] pointers for many more details.

What we need for our algorithm is an unknown number of pointers (one per variable).

Using Pointers for Updating (X'X)^{-1}

Lag combination matrix:

    1 0 0
    1 0 1
    1 0 2
    1 1 0
    1 1 1
    1 1 2
    1 2 0

3-element vector of pointers vec = (p1, p2, p3); each element points to a matrix. Then calculate (X'X)^{-1} for ...

    ... lags (1 0 0) by ordinary matrix inversion; store using p1
    ... lags (1 0 1) by PMAC using p1; store using p3
    ... lags (1 0 2) by PMAC using p3
    ... lags (1 1 0) by PMAC using p1; store using p2
    ... lags (1 1 1) by PMAC using p2; store using p3
    ... lags (1 1 2) by PMAC using p3
    ... lags (1 2 0) by PMAC using p2; store using p2
    ... and so forth.

Lag Selection: Mata III (update inverses)
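
The Mata III listing is likewise missing from the transcription. A hedged sketch of the pointer bookkeeping around the PMAC step (reusing the pmac() sketch from above; data and dimensions are purely illustrative):

    mata:
        // p[j] holds the address of an (X'X)^-1 the algorithm can jump back to
        X1 = rnormal(500, 2, 0, 1)                   // regressors of the base model
        v  = rnormal(500, 1, 0, 1)                   // column appended in the next step

        p = J(1, 3, NULL)                            // one pointer per variable

        A1 = invsym(cross(X1, X1))                   // lags (1 0 0): ordinary inversion
        p[1] = &A1                                   // "store using p1"

        A3 = pmac(*p[1], cross(X1, v), cross(v, v))  // lags (1 0 1): PMAC using p1
        p[3] = &A3                                   // "store using p3"

        (*p[3])[1, 1]                                // dereference and use as usual
    end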

Lag Selection: Timings

Timings in seconds (2.5 GHz, single core) for N = 1,000.
Mata 2: no redundancies. Mata 3: no redundancies + inverse updating.

    k+1   maxlags   # regressions   regress      Mata 1    Mata 2   Mata 3
    3     4         100             1.6          0.36      0.11     0.14
    3     8         650             12.5         1.33      0.09     0.13
    4     8         5,800           132          11.8      0.31     0.27
    6     8         470,000         14,000       1,400     53       37
    8     8         38,000,000      (13 days?)   146,000   6,500    3,200

Recap

In this talk, we have discussed:
- potential strategies for improving code performance
- basic asymptotic notation for the computing time of algorithms
- a quick look at the ARDL model
- optimal lag selection
- moving Stata code to Mata and optimizing the Mata code
- an advanced way of using linear algebra results to improve code performance
- pointer variables

We have tried to illustrate that mindful code creation can be superior to the brute-force methods of low-level programming languages and parallelization.

Thank you! Questions? Comments?

S.Kripfganz@exeter.ac.uk
schneider@demogr.mpg.de

References

Gould, William (2014, April 17): Using Mata Operators Efficiently [Msg 8]. Message posted to https://www.statalist.org/forums/forum/general-stata-discussion/mata/993-using-mata-operators-efficiently?p=1826#post1826

Kripfganz, S. and D.C. Schneider (2016): ardl: Stata Module to Estimate Autoregressive Distributed Lag Models. Presentation at the Stata Conference 2016, Chicago.

Narayan, P.K. (2005): The Saving and Investment Nexus for China: Evidence from Cointegration Tests. Applied Economics, 37 (17), 1979-1990.

Pesaran, M.H., Y. Shin, and R.J. Smith (2001): Bounds Testing Approaches to the Analysis of Level Relationships. Journal of Applied Econometrics, 16 (3), 289-326.