The Curse of Dimensionality. Panagiotis Parchas Advanced Data Management Spring 2012 CSE HKUST


Multiple Dimensions: As we discussed in the lectures, it is often convenient to transform a signal (time series, picture) into a point in multidimensional space. This transformation is handy because we can then apply conventional database indexing techniques for queries such as nearest-neighbor (NN) or range search. The transform may lead us to very high dimensionality (hundreds of dimensions). In high dimensionality there are a number of problems (geometrical and index-performance related) that are usually referred to as the Curse of Dimensionality. In this presentation: some intuition about the Curse, and techniques that try to overcome it.

The Curse: Volume and area depend exponentially on the number of dimensions. Non-intuitive effects follow: geometric effects concerning the volume of hypercubes and hyperspheres; indexing effects; effects in the database environment (query selectivity).
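To make the "exponential" claim concrete, here is a small numeric sketch (my own illustration, not from the slides): the unit-radius ball inscribed in the cube [-1, 1]^d occupies a vanishing fraction of the cube's volume as d grows.

```python
import math

# Fraction of the cube [-1, 1]^d occupied by the inscribed unit ball.
# Ball volume: pi^(d/2) / Gamma(d/2 + 1); cube volume: 2^d.
for d in (2, 3, 5, 10, 20):
    ball = math.pi ** (d / 2) / math.gamma(d / 2 + 1)
    cube = 2.0 ** d
    print(f"d={d:2d}  ball/cube volume ratio = {ball / cube:.2e}")
```

Already at d = 20 the ratio is of the order of 10^-8, i.e. almost all of the cube's volume lies in its corners.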

a) Geometric Effects. Lemma: a sphere touching or intersecting all the (d-1)-dimensional faces of a cube will contain the cube's center. True for 2D and 3D (by visualization). It should be true for higher dimensions (hypercubes, hyperspheres). It is NOT!
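A numeric sketch of why the lemma breaks down in high d (an illustrative construction of mine, not from the slides): in the unit cube [0, 1]^d, take a sphere of radius 0.7 centered at (0.3, ..., 0.3). Its distance to every face is 0.3 or 0.7, so it touches or intersects all of them, yet its distance to the cube's center is 0.2·sqrt(d), which exceeds the radius once d >= 13.

```python
import math

radius = 0.7  # sphere centered at (0.3, ..., 0.3) inside the unit cube [0, 1]^d
for d in (2, 3, 10, 13, 50):
    # Distance to each face x_i = 0 is 0.3 and to each face x_i = 1 is 0.7,
    # both <= radius, so the sphere touches/intersects every face.
    dist_to_center = 0.2 * math.sqrt(d)   # ||(0.3, ..., 0.3) - (0.5, ..., 0.5)||
    print(f"d={d:2d}  distance to cube center = {dist_to_center:.3f}  "
          f"center inside sphere: {dist_to_center <= radius}")
```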

b) Indexing Effects

b) Indexing Effects [cont.]: The higher the dimensionality, the coarser the indexing (which eventually renders it useless). This affects all indexing techniques. [Christian Böhm, 2001]

c) Query Selectivity

When is NN meaningful? [Kevin Beyer et al., 1999]
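A rough empirical sketch of the effect Beyer et al. study (assuming i.i.d. uniform data; my own illustration): as d grows, the nearest and farthest points from a query become almost equidistant, so "nearest neighbor" loses its meaning.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
for d in (2, 10, 100, 1000):
    data = rng.random((n, d))     # n points drawn uniformly from [0, 1]^d
    query = rng.random(d)
    dist = np.linalg.norm(data - query, axis=1)
    print(f"d={d:4d}  farthest/nearest distance ratio = {dist.max() / dist.min():.2f}")
```

The ratio shrinks towards 1 as d increases, which is exactly the regime in which NN queries stop being meaningful.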

What is the spell for the curse? Various attempts at multidimensional indexing were shown not to make sense for a large class of data distributions [Christian Böhm, 2001]. There has been a lot of research on dimensionality reduction techniques, which basically apply ideas of compression to the data in order to reduce the dimensionality. In what follows we will focus mainly on time series.

Introduction: [Figure: Euro-HK$ exchange rate, 9/1/2011 to 2/1/2012; the 128 data points are treated as a single point in 128-D space]

[Figure: side-by-side approximations of the same time series by DFT, DWT, SVD, APCA, PAA and PLA; from the tutorial in IEEE ICDM 2004 by Dr. Keogh]

Discrete Fourier Transform (DFT): Every signal, no matter how complex, can be represented as a summation of sinusoids. Idea: find the hidden sinusoids that form the time series and store two numbers for each: (A, φ), magnitude and phase. Higher-frequency sinusoids generally correspond to details of the time series, so we can discard them and keep just the first (low-frequency) ones. Then we use the inverse DFT to get the approximation of the time series. DFT: $X_k = \sum_{t=0}^{n-1} x_t \, e^{-2\pi i k t / n}$. Inverse DFT: $x_t = \frac{1}{n} \sum_{k=0}^{n-1} X_k \, e^{2\pi i k t / n}$.
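A minimal NumPy sketch of this (an illustration, not the slide's computation): keep only the first few coefficients of the real FFT (equivalently, the first few (A, φ) pairs), zero out the rest, and apply the inverse DFT.

```python
import numpy as np

def dft_approximate(x, num_coeffs=8):
    """Keep the first `num_coeffs` low-frequency DFT coefficients of x."""
    coeffs = np.fft.rfft(x)                  # complex coefficients (magnitude, phase)
    coeffs[num_coeffs:] = 0                  # discard the high frequencies
    return np.fft.irfft(coeffs, n=len(x))    # inverse DFT -> smooth approximation

# Toy example: a noisy slow oscillation approximated by 8 coefficients.
t = np.linspace(0, 1, 128, endpoint=False)
series = np.sin(2 * np.pi * 3 * t) + 0.1 * np.random.default_rng(0).standard_normal(128)
approx = dft_approximate(series, num_coeffs=8)
print("reconstruction RMSE:", np.sqrt(np.mean((series - approx) ** 2)))
```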

DFT example: [Figure/table: the 128-point exchange-rate series and its DFT magnitudes A and phases φ; keeping the first 8 coefficients we store 8+8 = 16 values]

DFT example (cont.): [Figure: applying the inverse DFT to the 8 retained (A, φ) pairs reconstructs a smooth approximation of the original 128-point series]

DFT

DFT (pros & cons). Pros: O(n log n) complexity; hardware implementations exist; good ability to compress most signals; many applications. Cons: not a good approximation for bursty signals; not a good approximation if the signal contains both flat and busy segments; cannot support other distance metrics; contains information only about the frequency distribution (what about the time domain?).

Why is DFT not enough? It gives us information about the frequency components of a time series without telling where each frequency occurs in the time domain. [Figure: x(t) = sin(5t) + sin(10t) and z(t) = sin(5t) followed by sin(10t) produce nearly identical Fourier spectra]
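To make the limitation concrete, here is a small sketch along the lines of the slide's example (my own code, signal details assumed): the sum sin(5t) + sin(10t) and the concatenation "sin(5t), then sin(10t)" put their spectral energy at the same two frequencies, even though they look completely different in time.

```python
import numpy as np

t = np.linspace(0, 20 * np.pi, 4096, endpoint=False)
x = np.sin(5 * t) + np.sin(10 * t)             # both frequencies present all the time
z = np.concatenate([np.sin(5 * t[:2048]),      # 5 rad/s in the first half only,
                    np.sin(10 * t[2048:])])    # 10 rad/s in the second half only

for name, sig in (("x", x), ("z", z)):
    spectrum = np.abs(np.fft.rfft(sig))
    strongest = np.sort(np.argsort(spectrum)[-2:])   # two strongest frequency bins
    print(name, "strongest frequency bins:", strongest)
```

Both signals report the same strongest bins; the spectrum alone cannot tell us when each frequency was active.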

Discrete Wavelet Transform (DWT): This comes as a solution to the previous problem. The wavelet transform contains information about both the frequency domain AND the time domain. The basic idea is to express the time series as a linear combination of wavelet basis functions. The Haar wavelet is the one mostly used. [Figure: the Haar wavelet]

DWT: Graphical Intuition. The wavelet is stretched and shifted in time, and this is done for all possible stretches and shifts. Afterwards, each is multiplied with the time series, and we keep only the ones with a high product.

DWT: Numerical Intuition.
Resolution 4: values [9 7 3 5]
Resolution 2: averages [8 4], details [1 -1]
Resolution 1: averages [6], details [2]
[Figure: the corresponding step functions]
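A tiny plain-Python sketch of this numerical intuition (my own illustration): repeatedly replace the signal by pairwise averages while remembering the pairwise half-differences (the "details").

```python
def haar_decompose(values):
    """Full Haar decomposition: overall average plus detail coefficients per level."""
    averages, detail_levels = list(values), []
    while len(averages) > 1:
        pairs = zip(averages[0::2], averages[1::2])
        step = [((a + b) / 2, (a - b) / 2) for a, b in pairs]
        averages = [avg for avg, _ in step]
        detail_levels.append([det for _, det in step])
    return averages[0], detail_levels

overall, details = haar_decompose([9, 7, 3, 5])
print(overall, details)   # 6.0 [[1.0, -1.0], [2.0]] -- matches the table above
```

Discarding the smallest detail coefficients and inverting the process gives the wavelet approximation shown a couple of slides below.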

Example taken from Stollnitz, E. et al., 1995.

DWT: Wavelet Approximation. In our example we had 128 points; the approximation (red line) uses only 16 Haar coefficients. [Figure: the exchange-rate series and its 16-coefficient Haar approximation]

DWT (pros & cons). Pros: good ability to compress stationary signals; fast linear-time algorithms for the DWT exist; able to support some interesting non-Euclidean similarity measures. Cons: signals must have a length n = 2^some_integer (it works best when n is a power of two; otherwise the wavelets approximate the left side of the signal at the expense of the right side); cannot support weighted distance measures.

Singular Value Decomposition (SVD): All the previous methods try to transform each time series independently of the others. What if we take into account all the time series contained in the database? We can then achieve the desired dimensionality reduction for the specific dataset.

SVD: Basic Idea (1)

SVD: Basic Idea (2)

SVD: Basic Idea (3)

SVD [more]: The goal is to find the axes with the biggest variance. High-variance axes carry a lot of information (important axes); low-variance axes carry little information (noise) and can be truncated.

SVD [more]: With the previous intuition, we keep the coefficients of the projections onto the new axes. This can be done efficiently by the SVD, so we perform the dimensionality reduction in an aggregate way, taking into account the whole dataset. The idea was traditionally used in linear algebra for matrix compression: A = UΣV^T. The idea was to find the (nearly) linearly dependent columns of a matrix A and eliminate them. It can be proved that this compression is optimal.
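A minimal NumPy sketch of the truncated-SVD reduction over a whole dataset (an illustration under the usual "one row per time series" convention, not the authors' code):

```python
import numpy as np

rng = np.random.default_rng(0)
M, n, k = 500, 128, 8                       # M series of length n, reduced to k dims
t = np.linspace(0, 1, n)
A = np.sin(2 * np.pi * rng.integers(1, 5, M)[:, None] * t) \
    + 0.1 * rng.standard_normal((M, n))     # toy database of time series (M x n)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
reduced = A @ Vt[:k].T                      # k coefficients per series (index these)
approx = reduced @ Vt[:k]                   # reconstruction from the k coefficients

err = np.linalg.norm(A - approx) / np.linalg.norm(A)
print(f"relative reconstruction error with k={k}: {err:.3f}")
```

The rows of Vt (the right singular vectors) play the role of data-dependent basis signals; inserting new series requires recomputing them, which is exactly the drawback listed in the pros and cons below.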

SVD: compression. Projection onto the axis corresponding to the biggest singular value s1 gives MINIMUM information loss; good for compression.

SVD: clustering. Projection onto the axis corresponding to the smallest singular value s2 gives MAXIMUM information loss; good for clustering.

SVD (pros & cons). Pros: optimal linear dimensionality reduction technique; the eigenvalues tell us something about the underlying structure of the data. Cons: computationally very expensive (time O(Mn^2), space O(Mn)); an insertion into the database requires recomputing the SVD; cannot support weighted distance measures or non-Euclidean measures.

Piecewise Aggregate Approximation (PAA): Very simple and intuitive. Represent the time series as a sequence of equal-length boxes, each holding the mean of its segment. [Figure: PAA approximation of the exchange-rate series; we keep 13 boxes]
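A short PAA sketch (assuming the series length divides evenly into the number of boxes; my own illustration):

```python
import numpy as np

def paa(series, num_boxes):
    """Piecewise Aggregate Approximation: the mean of each equal-length segment."""
    series = np.asarray(series, dtype=float)
    return series.reshape(num_boxes, -1).mean(axis=1)

def paa_reconstruct(coeffs, length):
    """Expand the box means back to the original length (the 'staircase')."""
    return np.repeat(coeffs, length // len(coeffs))

x = np.sin(np.linspace(0, 4 * np.pi, 128))
coeffs = paa(x, 16)                          # 128 points -> 16 numbers
print("PAA coefficients:", np.round(coeffs, 3))
```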

PAA (pros & cons). Pros: fast, easy to implement, intuitive; the authors claim it is as efficient as the other approaches (empirically); supports queries of arbitrary lengths; supports non-Euclidean measures. Cons: it looks like a simplification of DWT that cannot be generalized to other types of signals.

Adaptive Piecewise Constant Approximation (APCA): What about signals with flat areas and peaks? IDEA: generalize PAA so it can automatically adapt itself to the correct box size (we now keep both the length and the height of each box). [Figure: raw electrocardiogram data; reconstruction error 2.61 for the adaptive representation (APCA) vs. 3.27 for Haar wavelet or PAA and 3.11 for DFT; example by E. Keogh, IEEE ICDM 2004]

APCA [more]: To implement it, the authors propose first a DWT transformation, followed by merging of similar adjacent wavelet segments. It is very efficient on some specific datasets. However, the indexing is more complicated than PAA, since we need two numbers for each box. That is the reason why it is not used very often.
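The Haar-then-merge construction is what the authors propose; as a rough stand-in, here is a simpler greedy bottom-up sketch (my own, not the paper's algorithm) that also produces variable-length constant boxes, two numbers each:

```python
import numpy as np

def apca_like(series, num_boxes):
    """Greedily merge adjacent constant segments until `num_boxes` remain.
    Each resulting box is stored as (right_end_exclusive, mean)."""
    series = np.asarray(series, dtype=float)
    segments = [[i, i + 1] for i in range(len(series))]   # one segment per point

    def merge_cost(left, right):
        chunk = series[left[0]:right[1]]
        return np.sum((chunk - chunk.mean()) ** 2)         # SSE if merged into one box

    while len(segments) > num_boxes:
        costs = [merge_cost(segments[i], segments[i + 1])
                 for i in range(len(segments) - 1)]
        i = int(np.argmin(costs))                          # cheapest adjacent pair
        segments[i] = [segments[i][0], segments[i + 1][1]]
        del segments[i + 1]

    return [(end, float(series[start:end].mean())) for start, end in segments]

x = np.concatenate([np.zeros(60), 5 * np.ones(8), np.zeros(60)])   # flat, peak, flat
print(apca_like(x, 4))   # the peak keeps its own short box; the flat parts get long ones
```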

Piecewise Linear Approximation (PLA): Linear segments for representation (not necessarily connected). Although efficient in some cases, the implementation is slow and it is not indexable. [Figure: example, for visualization only]
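A very small PLA-style sketch (fixed equal-length segments, each with its own least-squares line; real PLA implementations usually choose segment boundaries adaptively, so treat this as an illustration only):

```python
import numpy as np

def pla_fixed(series, num_segments):
    """Fit an independent least-squares line to each equal-length segment.
    Stores (slope, intercept) -- two numbers per segment."""
    series = np.asarray(series, dtype=float)
    lines, start = [], 0
    for seg in np.array_split(series, num_segments):
        t = np.arange(start, start + len(seg))
        slope, intercept = np.polyfit(t, seg, deg=1)   # least-squares line fit
        lines.append((slope, intercept))
        start += len(seg)
    return lines

x = np.sin(np.linspace(0, 2 * np.pi, 128))
print(pla_fixed(x, 4))   # 4 (slope, intercept) pairs approximating one sine period
```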

Non-Linear Techniques [Figure: overview from "Dimensionality Reduction: A Comparative Review", L.J.P. van der Maaten, 2008]

Non-Linear Techniques [2]: A lot of techniques have emerged in recent years. However, [van der Maaten et al., 2008] compared them with PCA (equivalent to SVD), and on most of the datasets all these complicated techniques turn out to be worse. The reasons, the authors claim, are data overfitting and the curse of dimensionality.

Conclusion: All the aforementioned techniques have their strong and weak points. Dr. Keogh tested them over 65 different datasets with different characteristics: on average, they are all about the same. In particular, on 80% of the datasets they are all within 10% of each other. So the choice of the best method depends on the characteristics of the dataset.