Data mining with sparse grids using simplicial basis functions

1 Data mining with sparse grids using simplicial basis functions
Jochen Garcke and Michael Griebel
Institut für Angewandte Mathematik, Universität Bonn
Part of the work was supported within the project 03GRM6BN by the German Bundesministerium für Bildung und Forschung (BMB+F). This work was carried out in cooperation with Prudential Systems Software GmbH, Chemnitz.
Data mining with sparse grids using simplicial basis functions p.1/35

2 Essentials
- New method for non-linear classification and regression
- Scales linearly with respect to the number of data points
- Suitable for massive data sets
- Number of attributes is limited (… in newest version)
Key concept
- Use sparse grids for the discretization of the feature space
- Solve the minimization problem stemming from the regularization network approach

3 Overview
- Regularization theory
- Sparse grids / combination technique
- Simplicial discretization
- Numerical examples
- Conclusions and outlook

4 The approximation problem
- We want to compute a function, the classifier or regressor, which approximates the given training data set but also gives good results on unseen data
- For that, a compromise has to be found between the correctness of the approximation, i.e. the size of the data error, and the generalization qualities of the classifier for new, i.e. previously unseen, data
- The dimension d of the feature space can be large; we will consider moderately high d
- The data set can consist of up to millions or billions of data points

7 Approximation with data centered ansatz functions
- The error is zero at the data points, but the approximation is overfitting
- Assume smoothness properties of the function to be computed

11 Regularization theory
- To get a well-posed, uniquely solvable problem we have to assume further knowledge of the function to be computed
- Regularization theory imposes smoothness constraints
- The regularization approach considers a variational problem with three ingredients:
  - the error of the classifier on the given data
  - the assumed smoothness properties
  - the regularization parameter
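The formula of this slide did not survive extraction. As a hedged reconstruction, the regularization network literature usually writes the variational problem in the form below. The symbols M (number of data points), C (cost function), Φ (smoothness functional) and λ (regularization parameter) are assumptions taken from that standard notation, not from the slide text.

```latex
\min_{f \in V} R(f), \qquad
R(f) = \underbrace{\frac{1}{M} \sum_{i=1}^{M} C\bigl(f(x_i), y_i\bigr)}_{\text{error on the given data}}
     \; + \; \underbrace{\lambda}_{\text{regularization parameter}}
     \, \underbrace{\Phi(f)}_{\text{assumed smoothness}}
```

A small λ favors a close fit to the training data, a large λ favors a smooth function.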

12 Exact solution with kernels
- With a basis of the function space, the function can be written as a series expansion in that basis
- If the regularization term weights the expansion coefficients with a decreasing positive sequence, the solution of the variational problem always has the form of a finite sum of kernel functions centered at the data points

13 Reproducing Kernel Hilbert Space
- The kernel is a symmetric function and can be interpreted as the kernel of a Reproducing Kernel Hilbert Space (RKHS)
- In other words: if certain functions centered at the locations of the data points are used in an approximation scheme, then the solution is a finite series with only as many terms as there are data points
- But in general a full linear system has to be solved
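The formulas of the two kernel slides were lost. The following is a sketch of the standard statement from the regularization network literature, not a verbatim recovery; the symbols c_n, φ_n, λ_n, α_i, K and M are assumptions.

```latex
% expansion in a basis and regularization term with a decreasing positive sequence \lambda_n
f(x) = \sum_{n} c_n \varphi_n(x), \qquad
\Phi(f) = \sum_{n} \frac{c_n^2}{\lambda_n}

% form of the minimizer: a finite series with M terms centered at the data points
f(x) = \sum_{i=1}^{M} \alpha_i K(x, x_i), \qquad
K(x, y) = \sum_{n} \lambda_n \varphi_n(x) \varphi_n(y)

% for the squared error cost the coefficients solve a full M x M system
(K + \lambda M I)\, \alpha = y, \qquad K_{ij} = K(x_i, x_j)
```

The last line is the reason such schemes scale non-linearly in the number of data points: the matrix is in general dense and has one row per data point.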

14 Approximation schemes in regularization network context
- For radially symmetric kernels we end up with radial basis function approximation schemes
- Many other approximation schemes, such as additive models, hyper-basis functions, ridge approximation models, and several types of neural networks, can be derived by a specific choice of the regularization operator
- The support vector machine (SVM) approach can also be expressed in the form of a regularization network
- All of these scale, in general, non-linearly in the number of data points

15 Discretization
- Different approach: we explicitly restrict the problem to a finite dimensional subspace
- The ansatz functions should span this subspace and preferably form a basis for it
- A cost function and a regularization operator are fixed explicitly
- The functional is then to be minimized in the finite dimensional subspace

16 Derivative of the functional
- Plugging the discrete ansatz into the functional and differentiating with respect to the coefficients gives a necessary condition for the minimum
- Or equivalently, a system of linear equations for the coefficients
- We use … in the following

17 Problem to solve
- We get a linear equation system for the coefficients of the ansatz functions
- Three matrices enter the system; their sizes and entries are not recoverable here
- The right-hand side uses the vector of the data classes, which has one entry per data point
- The vector of the unknowns has one entry per ansatz function
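Since the matrices of this slide are lost, here is a minimal one-dimensional sketch of how such a system can be set up. It assumes a squared error cost and a gradient-based regularization operator, which leads to (λ·M·C + B·Bᵀ)·α = B·y with B[j, i] = φ_j(x_i) and C[j, k] = ∫ φ_j'·φ_k'. The function names, the hat functions on [0, 1] and the test data are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def hat_basis_matrix(x, n_nodes):
    """B[j, i] = phi_j(x_i) for piecewise linear hat functions on a uniform grid of [0, 1]."""
    h = 1.0 / (n_nodes - 1)
    nodes = np.linspace(0.0, 1.0, n_nodes)
    return np.maximum(0.0, 1.0 - np.abs(x[None, :] - nodes[:, None]) / h)

def stiffness_matrix(n_nodes):
    """C[j, k] = integral over [0, 1] of phi_j'(x) * phi_k'(x)."""
    h = 1.0 / (n_nodes - 1)
    C = np.zeros((n_nodes, n_nodes))
    for e in range(n_nodes - 1):  # element-wise assembly, one grid cell at a time
        C[e:e + 2, e:e + 2] += np.array([[1.0, -1.0], [-1.0, 1.0]]) / h
    return C

def fit(x, y, n_nodes, lam):
    """Solve (lam * M * C + B B^T) alpha = B y for the grid coefficients alpha."""
    B = hat_basis_matrix(x, n_nodes)
    C = stiffness_matrix(n_nodes)
    M = x.size
    return np.linalg.solve(lam * M * C + B @ B.T, B @ y)

# Illustrative two-class problem: class -1 left of 0.5, class +1 right of it.
x = np.linspace(0.0, 1.0, 200)
y = np.where(x > 0.5, 1.0, -1.0)
alpha = fit(x, y, n_nodes=17, lam=1e-4)
pred = hat_basis_matrix(x, 17).T @ alpha
print("training correctness:", np.mean(np.sign(pred) == y))
```

The system matrix has one row per grid point, so its size does not depend on the number of data points; the data enter only through the products B·Bᵀ and B·y.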

18 Approximation with grid-based ansatz functions
- Use a grid to discretize the data space
- Basis functions on the grid points

19 Which function space to take?
- Again, widely used are methods with global data-centered basis functions, which scale with the number of data points
- We use a grid to discretize the data space and local basis functions on the grid points
- A naive full grid has O(h_n^{-d}) grid points, where h_n = 2^{-n} gives the mesh size; already for reasonable sizes one encounters the curse of dimensionality: n=6, d=6 results in … points
- To overcome this we use sparse grids, which have O(h_n^{-1} (log h_n^{-1})^{d-1}) grid points; here n=6, d=6 results in … points
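The point counts of this slide are missing. The sketch below shows one way to compute such counts under stated assumptions: the full grid has mesh size 2^-n in every direction with boundary points included, and the sparse grid count covers interior points only, summed over the hierarchical difference spaces with level indices l_i ≥ 1 and |l|_1 ≤ n + d - 1. The slides may have used a different convention (for example with boundary points), so the numbers need not match the original ones.

```python
from itertools import product
from math import prod

def full_grid_points(n, d):
    """Points of a full grid with mesh size 2**-n per direction, boundary included."""
    return (2**n + 1) ** d

def sparse_grid_interior_points(n, d):
    """Interior points of a regular sparse grid of level n in d dimensions.

    Adds up the sizes 2**(l_1 - 1) * ... * 2**(l_d - 1) of all hierarchical
    difference spaces with l_i >= 1 and l_1 + ... + l_d <= n + d - 1.
    """
    total = 0
    for level in product(range(1, n + 1), repeat=d):
        if sum(level) <= n + d - 1:
            total += prod(2 ** (li - 1) for li in level)
    return total

for n, d in [(3, 2), (6, 6)]:
    print(n, d, full_grid_points(n, d), sparse_grid_interior_points(n, d))
```

For n=6, d=6 the full grid has 65^6 = 75,418,890,625 points, which illustrates why a full grid is out of reach even for moderate dimensions.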

20 Interpolation with the hierarchical basis
- Figure panels: Interpolation, Hierarchical basis
- The 1D case is generalized by means of a tensor product approach
- Hierarchical values of the d-dimensional basis functions are bounded through the size of their supports

22 Supports of the hierarchical basis functions (figure)

23 Sparse grids of piecewise d-linear functions
- Space of piecewise d-linear functions, spanned by the basis functions of a grid
- Difference spaces of each level
- Sparse grid space
- A function can be split accordingly

24 Properties of sparse grids
- Table (entries not recoverable): full grid compared with sparse grid by number of points, approximation properties, and smoothness properties
- Figure: sparse grid in 2D and 3D with level …

25 Sparse grids
- Example in six dimensions with level …: full grid … points, sparse grid … points
- Now use sparse grids to solve the minimization problem
- Linear equation system with … points
- The matrix is more densely populated than the corresponding full grid matrices, which would add further terms to the complexity
- Explicit assembly of the matrix should be avoided
- It is difficult to implement only the action of the matrices
- The action of the data matrix would scale with the number of data points

26 Combination technique of level 4 in 2D
- Therefore use the combination technique variant of sparse grids
- Figure: the sparse grid of level 4 in 2D written as a combination of full grids

27 Sparse grids with the combination technique
- Solve the problem on a sequence of full grids
- Combine the solutions on these grids; the result lives on the sparse grid
- Table (entries not recoverable): number of grids in the sequence by dimension
- The resulting linear equation system is solved by a diagonally preconditioned conjugate gradient algorithm
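The combination formula itself is missing from the slide. As a sketch, the classical combination technique takes, for q = 0, …, d-1, all full grids with level indices l_i ≥ 1 and l_1 + … + l_d = n + (d-1) - q, each weighted by (-1)^q · binom(d-1, q). This indexing convention and the function name are assumptions; the code only enumerates the grids and their coefficients.

```python
from itertools import product
from math import comb

def combination_grids(n, d):
    """Return [(coefficient, level_vector)] of the classical combination technique.

    For q = 0..d-1, every full grid with l_i >= 1 and
    l_1 + ... + l_d = n + (d - 1) - q enters with the
    coefficient (-1)**q * binom(d - 1, q).
    """
    grids = []
    for q in range(d):
        target = n + (d - 1) - q
        coeff = (-1) ** q * comb(d - 1, q)
        for level in product(range(1, n + 1), repeat=d):
            if sum(level) == target:
                grids.append((coeff, level))
    return grids

for coeff, level in combination_grids(4, 2):
    print(f"{coeff:+d} * solution on grid with levels {level}")
print("grids for level 2 in 10D:", len(combination_grids(2, 10)))
```

Under this convention, level 2 in ten dimensions needs 11 grids, which agrees with the grid count quoted for the 10D data set in the parallel results. The partial problems are independent of each other, so they can be solved in any order.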

28 Complexities of the computation
- The linear equation system has to be solved on each grid in the sequence of grids
- Table (entries not recoverable): complexities for storage, assembly, and matrix-vector multiplication
- The complexities depend on the number of grid points and on the number of data points
- Scales linearly with the number of data points

29 Using simplicial basis functions
- On the grids of the combination technique, linear basis functions based on a simplicial discretization are also possible
- So-called Kuhn's triangulation for each rectangular block (figure: triangulated cube from (0,0,0) to (1,1,1))
- Theoretical properties of this variant of the sparse grid technique still have to be investigated in more detail
- Since the overlap of supports is greatly reduced by the simplicial discretization, the complexities scale significantly better
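To make the reduced overlap concrete, here is a minimal sketch of Kuhn's triangulation on the unit cube: sorting the coordinates of a point identifies the simplex that contains it, and the differences of the sorted coordinates are the barycentric weights, that is, the values of the d + 1 linear basis functions that are non-zero at that point. With d-linear basis functions, 2^d functions would be non-zero instead. The function name is hypothetical and the code is an illustration, not the authors' implementation.

```python
import numpy as np

def kuhn_simplex(x):
    """Locate x in Kuhn's triangulation of the unit cube [0, 1]^d.

    Returns (vertices, weights): the d + 1 cube corners of the simplex
    containing x and the barycentric weights of x with respect to them.
    """
    x = np.asarray(x, dtype=float)
    d = x.size
    order = np.argsort(-x, kind="stable")  # indices of coordinates in decreasing order
    s = x[order]
    vertices = np.zeros((d + 1, d))
    for k in range(d):
        vertices[k + 1] = vertices[k]
        vertices[k + 1, order[k]] = 1.0    # walk from (0,...,0) to (1,...,1)
    weights = np.empty(d + 1)
    weights[0] = 1.0 - s[0]
    weights[1:d] = s[:-1] - s[1:]
    weights[d] = s[-1]
    return vertices, weights

vertices, weights = kuhn_simplex([0.2, 0.7, 0.5])
print(vertices)
print(weights)
```

Each data point therefore touches only d + 1 basis functions per grid, which is where the reduced dependence on the dimension in assembly and matrix-vector multiplication comes from.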

30 Complexities for both discretization variants
- Table (entries not recoverable): storage, assembly, and matrix-vector multiplication for d-linear basis functions and for linear basis functions on simplices
- Reduced d-dependence in the complexities with linear basis functions on simplices
- N is the number of grid points
- Scales linearly with the number of data points

31 Numerical Examples
We test our method with
- Benchmark data sets from the UCI Repository
- Synthetically generated massive data sets
Evaluation and comparison with other methods through either
- Correctness rates on test data sets, which were not used during the computation,
- 10-fold cross validation, or
- Leave-one-out cross validation
The best regularization parameter is found in an outer loop over several candidate values

32 Ripley data set
- Figure: Ripley data set with level 4 (correctness rate of 91.4 %)
- Figure: Ripley data set with level 8 (correctness rate of 91.0 %)
- Compare with 90.9 % with level 5, d-linear basis functions
- Compare with 91.1 % with neural networks [Penny & Roberts, 1999]
- Best possible rate is 92.0 %, since 8 % error is introduced

33 Spiral data set with linear basis functions
- Figure: spiral data set with level 6, 84.02 % leave-one-out correctness
- Figure: spiral data set with level 7, ….66 % leave-one-out correctness
- Figure: spiral data set with level 8, 89.… % leave-one-out correctness
- Compare with … % with level 6, d-linear basis functions
- Compare with 77.2 % with neural networks [Singh, 1998]

34 BUPA Liver Disorders data set (6D)
- Table (values not recoverable): 10-fold train and 10-fold test correctness in % for levels 2, 3 and 4, linear compared with d-linear basis functions
- Only 345 data points
- 68.4 % with linear SVM [Mangasarian & Musicant, 2001]
- 72.8 % with quadratic SVM [Mangasarian & Musicant, 2001]

35 Synthetic massive 6D data set
- Table columns: # of data, training correctness, testing correctness, total time (sec), data matrix time (sec)
- Table rows: one per level and data set size (in millions of data points); the only fully surviving row label is linear basis functions, level 2, 5 million
- The table values are not recoverable here

36 Synthetic massive 10D data set
- Table columns: # of data, training correctness, testing correctness, total time (sec), data matrix time (sec)
- Table rows: one per level and data set size (in millions of data points)
- The table values are not recoverable here

37 Forest cover data set (10D)
- Table columns: overall, Ponderosa Pine, other class
- Table rows: 6 attributes with level 1 and level 2, … attributes with level 1; each on the test set and on the evaluation set
- The table values are not recoverable here
- Data set separated into three equally sized train, test and evaluation sets, each with about … data points
- 86.97 % with 6 attributes in [Hegland, Nielsen, and Shen, 2000]

38 Galaxy dim data set (14D)
- With a new variant of the combination technique, up to 14 attributes can now be handled

             10-fold train   10-fold test
  level 0    96.1 %          95.3 %
  level 1    96.7 %          95.6 %

- 4192 data points with 14 attributes
- 94.8 % with linear SVM in [Fung & Mangasarian, 2001]

39 Parallelization
Combination technique: parallel on a coarse grain level
- Classifiers in the sequence of grids can be computed independently of each other
- Just short setup and gather phases are necessary
- Simple but effective static load balancing strategy
Fine grain level: parallelization with threads on SMP machines
- To compute the data-dependent part, the array of the training set can be separated into as many parts as there are processors
- Some overhead is introduced to avoid memory conflicts
- In the iterative solver a vector can be split into parts, and each processor then computes the action of the matrix on its part of the vector

40 Synthetic massive 10D data set in parallel
Coarse grain level parallelization of the combination technique
- Speed-up of 9.72 with an efficiency of 0.88 on 11 nodes
- Since only 11 grids have to be calculated for the level 2 used here, no more than 11 nodes are needed
Threads for each partial problem in the sequence of grids
- We achieve acceptable speed-ups: 1.8 for 2 processors, 7.7 for 12 processors, and up to 12.3 for 24 processors
- As one would expect, the efficiency decreases with the number of processors
Both parallelization strategies used simultaneously
- 11 processes with 6 threads each
- On 66 processors: speed-up of 39.8 with an efficiency of 0.6
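The quoted efficiencies follow from the usual definition, efficiency = speed-up / number of processors. The short check below recomputes them from the speed-ups given on the slide; the function name is illustrative.

```python
def efficiency(speedup, processors):
    """Parallel efficiency: achieved speed-up divided by the number of processors."""
    return speedup / processors

# (speed-up, processors) pairs as quoted on the slide
reported = [(9.72, 11), (1.8, 2), (7.7, 12), (12.3, 24), (39.8, 66)]
for s, p in reported:
    print(f"{p:3d} processors: speed-up {s:5.2f}, efficiency {efficiency(s, p):.2f}")
```

The thread-level efficiency drops from 0.90 on 2 processors to about 0.51 on 24, which is the decrease the slide refers to.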

41 Conclusions and outlook
- Our method is well suited for huge data sets
- Moderately high number of dimensions
- Enough for a lot of practical applications after the reduction to the essential dimensions
- Dimension reduction (e.g. SVD) has to be applied
Future work
- Reduce memory requirements further
- Different refinement levels for the dimensions
- Can one use this for dimension reduction?
- Fast solvers for the partial problems in the sequence of grids
- Regression
The approach is implemented in the prudsys DISCOVERER 2000 of Prudential Systems Software GmbH
