Computational Methods in Statistics with Applications: A Numerical Point of View. Large Data Sets. L. Eldén. March 2016


Computational Methods in Statistics with Applications: A Numerical Point of View. L. Eldén. March 2016. Large Data Sets.

IDA Machine Learning Seminars, September 17, 2014. Sequential Decision Making: Experiment Design, Big Data and Reinforcement Learning. Christos Dimitrakakis, Computer Science and Engineering, Chalmers. Abstract: Gone are the days when statisticians used to work with fixed, laboriously compiled and labelled datasets. Nowadays data collection is frequently active and therefore must be adaptive. This talk will give an overview of the field of sequential decision making and how it relates to experiment design, active learning and the general problem of reinforcement learning. The main technical problems encountered are how to plan and how to learn efficiently. The first problem requires efficient optimisation algorithms, while the second requires good models that are easy to update online and can deal with large amounts of data.

Large Data Sets
- Development of computers and computation
- Scientific research involves automated data collection
- Very large and highly inhomogeneous data sets
- Development of new statistical/computational methods for data analysis
- Necessary to use efficient and reliable numerical methods

Computational Statistics
- Built on the mathematical theory and methods of statistics
- Includes visualization, statistical computing, Monte Carlo methods, exploratory methods
- Development of computationally intensive methods for mining and visualization of large, non-homogeneous, multi-dimensional datasets to discover knowledge in the data
- Probability models; statements of confidence or of probability
- Model building and evaluation
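The slide lists Monte Carlo methods among the core tools of computational statistics. As a minimal self-contained sketch (not taken from the lecture; the function name mc_pi, the seed, and the sample size are my own choices), here is the textbook Monte Carlo estimate of π in Python:

```python
import random

def mc_pi(n, seed=0):
    """Estimate pi by sampling n points uniformly in the unit square
    and counting the fraction that land inside the quarter circle."""
    rng = random.Random(seed)
    inside = sum(1 for _ in range(n)
                 if rng.random() ** 2 + rng.random() ** 2 <= 1.0)
    return 4.0 * inside / n

est = mc_pi(100_000)
print(est)  # close to 3.14159; the error shrinks like 1/sqrt(n)
```

The 1/sqrt(n) convergence is exactly why such methods only pay off for problems (e.g. high-dimensional integrals) where deterministic quadrature is worse.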

Research Areas I
- Techniques for discovering structure in data, with emphasis on large-dimensional datasets: exploratory or visual estimation, clustering, or classification; statistical learning
- Methods for analysis of extremely large datasets (large number of observations or large number of dimensions)
- Computationally intensive methods of analysis (Monte Carlo methods or resampling methods); simulation methods
- Methods for statistical modeling: classical statistical parametric models, semiparametric models or nonparametric models; frequentist or Bayesian approach; explicitly or implicitly described models, e.g. via differential equations, especially stochastic differential equations

Research Areas II
- Numerical methods for statistical analysis (statistical computing)
- Methods for statistical problems that have a major "computer science" aspect (record matching, for example)

Why does one need scientific computing?
- Large data sets: efficient, stable and robust algorithms (very often from linear algebra)
- Understanding of the influence of floating-point round-off errors
- Algorithms from state-of-the-art software libraries, often packaged in high-level programming systems (MATLAB, R)

Are floating-point errors a major concern? Answer, in general: NO! IEEE double precision arithmetic: 64-bit format with a 52-bit mantissa, which gives a relative precision of about 10^-16.
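The quoted relative precision of about 10^-16 can be checked directly; a small illustrative snippet (mine, not from the slides) in Python:

```python
import sys

# Machine epsilon: the spacing between 1.0 and the next representable
# double. With a 52-bit mantissa this is 2^-52 ~ 2.2e-16; the unit
# roundoff (the maximal relative rounding error) is half that, ~1.1e-16.
eps = sys.float_info.epsilon
print(eps)                      # 2.220446049250313e-16
print(eps == 2.0 ** -52)        # True
print(sys.float_info.mant_dig)  # 53: 52 stored bits + 1 implicit bit
```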

How fine is the IEEE floating-point number system? Here is one way to see the answer (due to Nick Trefethen, Oxford): According to molecular physics, there are approximately 3·10^8 molecules per meter in a gas at atmospheric conditions, essentially the cube root of Avogadro's number. The circumference of the earth is 4·10^7 meters, so in a circle around the earth there are around 10^16 molecules. In IEEE double precision arithmetic there are 2^52 numbers between 1 and 2, which is also about 10^16. So if we put a giant circle around the earth with a distance coordinate ranging from 1 to 2, the spacing of the floating-point numbers along the circle will be about the same as the spacing of the air molecules.

IEEE double precision is usually sufficient. BUT: if one performs computations in a bad way, one can lose all the precision. Example: computation of the variance using a one-pass method. More on floating-point computations later.
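The one-pass variance example can be made concrete. Below is a small Python sketch of my own (the data are made up; their exact sample variance is 30): shifting the data by 10^8 makes the naive one-pass formula E[x²] − E[x]² cancel catastrophically, while the two-pass formula is stable.

```python
def var_one_pass(xs):
    """Naive one-pass formula (sum of squares minus square of sum):
    prone to catastrophic cancellation when the mean dominates the spread."""
    n = len(xs)
    s, sq = 0.0, 0.0
    for x in xs:
        s += x
        sq += x * x
    return (sq - s * s / n) / (n - 1)

def var_two_pass(xs):
    """Two-pass formula: compute the mean first, then sum squared
    deviations from it. Numerically stable."""
    n = len(xs)
    m = sum(xs) / n
    return sum((x - m) ** 2 for x in xs) / (n - 1)

data = [1e8 + d for d in (4.0, 7.0, 13.0, 16.0)]  # exact variance: 30
print(var_two_pass(data))  # 30.0
print(var_one_pass(data))  # noticeably wrong: ~16 digits cancel in the subtraction
```

(Stable single-pass updating formulas do exist, e.g. Welford's algorithm; the point here is only that the textbook one-pass formula is dangerous.)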

Example: Regression. Let y be a vector of observed quantities and X a matrix of explaining variables. Regression model: y = Xβ + ε, where ε is a random variable. Least squares method: determine the regression coefficients β by solving min_β ‖y − Xβ‖₂.

Regression: numerical aspects. Solve min_β ‖y − Xβ‖₂. Algorithms, efficiency and stability:
1. Normal equations: solve XᵀXβ = Xᵀy
2. QR decomposition: X = QR, where Q is orthogonal and R is triangular
3. Singular value decomposition: X = UΣVᵀ, where U and V are orthogonal and Σ is diagonal

What to do if the explaining variables (columns of X) are (almost) collinear (linearly dependent)? How to solve large problems? How large is large? How to solve large structured (sparse, Toeplitz, etc.) problems?
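Of the three algorithms, the normal equations are easy to write out for the simplest case, a straight-line fit y ≈ β₀ + β₁x. The following Python sketch (my own illustration; fit_line is a hypothetical helper, not from the lecture) solves the 2×2 system XᵀXβ = Xᵀy by Cramer's rule:

```python
def fit_line(xs, ys):
    """Solve min_beta ||y - X beta||_2 for the model y = b0 + b1*x
    via the normal equations X^T X beta = X^T y. Fine for tiny,
    well-conditioned problems; not recommended in general."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    # X^T X = [[n, sx], [sx, sxx]], X^T y = [sy, sxy]; solve by Cramer's rule
    det = n * sxx - sx * sx
    b1 = (n * sxy - sx * sy) / det
    b0 = (sy - b1 * sx) / n
    return b0, b1

b0, b1 = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(b0, b1)  # the data lie exactly on y = 1 + 2x
```

For (almost) collinear columns of X this approach is the worst of the three, since forming XᵀX squares the condition number; that is precisely why the QR and SVD alternatives are listed.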

How large is large? Answer: depends on the hardware. Today (assume for simplicity that X is n × n):
- Small: n ≤ 1000. Takes less than 5 (say) seconds on any computer. Standard methods.
- Medium: 1000 < n < 10^4. Takes minutes. Standard methods.
- Large: n > 10^4. Takes minutes, hours, or days. Often only possible to solve if the problem is structured (sparse, Toeplitz, etc.). Iterative methods, sometimes non-standard.
- Ultralarge: n ~ 10^9 and larger. Only special structures and special methods.

How large is large? Example: SVD. Answer: depends on the hardware. Computation of the SVD (MATLAB on a desktop computer):

dimension   time (s), today   time (s), 2002
500         0.09              12
1000        0.47              129
5000        56                -

R is based on the same subroutine library: LAPACK.

Software: Example from 1999 (Sankhya). The digits outside the boxes are incorrect.

Programming environments: high-quality algorithms packaged with graphics and other tools. MATLAB (Statistics toolbox), R, SPSS, SAS, ...

The two commandments of computational statistics:
1. Always use a high-quality programming environment!
2. Never write code for standard tasks like solving a linear system, computing the singular value decomposition, etc.!
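The second commandment can be illustrated with the variance example from earlier in the lecture: a good environment already ships a carefully implemented routine, so there is no reason to hand-roll the formula. A small sketch of my own in Python:

```python
import statistics

# Data with exact sample variance 30, shifted by 1e8 so that the naive
# one-pass formula loses precision. The library routine handles it,
# because it is written with numerical accuracy in mind.
data = [1e8 + d for d in (4.0, 7.0, 13.0, 16.0)]
print(statistics.variance(data))  # 30.0
```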