Introduction to Jackknife Algorithm


1 Polytechnic School of the University of São Paulo, Department of Computing Engineering and Digital Systems, Laboratory of Agricultural Automation. Introduction to Jackknife Algorithm. Renato De Giovanni, Fabrício Rodrigues.

2 Overview. The Jackknife Algorithm; Motivation; The Jackknife Parallel Version; Results; Next Activities; References.

3 The Jackknife Algorithm. A resampling technique used for bias and variance estimation. Subsamples are constructed from the original sample.

4 The Jackknife Algorithm. Let $\theta$ be the parameter of interest to be estimated. An original sample $X = \{X_1, X_2, X_3, \ldots, X_n\}$ of size $n$ is selected. From it, $n$ Jackknife samples are generated by eliminating example $i$ in each new sample: $X_{(i)} = \{X_1, \ldots, X_{i-1}, X_{i+1}, \ldots, X_n\}$. The estimate $\hat{\theta}$ is calculated from the original sample, and $\hat{\theta}_{(i)}$ from each sample $X_{(i)}$.

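5 The Jackknife Algorithm. The Jackknife estimator of $\theta$, bias-corrected up to order $n^{-1}$, is

$$\hat{\theta}_{J_1} = n\hat{\theta} - (n-1)\hat{\theta}_{(\cdot)} = \hat{\theta} - (n-1)\left(\hat{\theta}_{(\cdot)} - \hat{\theta}\right).$$

The term $(n-1)\left(\hat{\theta}_{(\cdot)} - \hat{\theta}\right)$ is the jackknife bias estimator, and $\hat{\theta}_{(\cdot)} = \sum_{i=1}^{n} \hat{\theta}_{(i)} / n$.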
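The estimator on slides 4 and 5 can be illustrated with a minimal Python sketch. The function names `jackknife` and `plugin_variance` and the sample values are assumptions made for this example only; they do not come from the slides. The plug-in (divide-by-n) variance is used as the statistic because its bias is well understood: the jackknife correction turns it into the usual unbiased sample variance.

```python
from statistics import fmean, variance


def jackknife(sample, estimator):
    """Return (bias-corrected estimate, jackknife bias estimate).

    `sample` is a list of observations, `estimator` maps a list to a number.
    """
    n = len(sample)
    if n < 2:
        raise ValueError("the jackknife needs at least two observations")
    theta_hat = estimator(sample)  # estimate from the original sample X
    # One estimate per jackknife sample: X with observation i removed.
    leave_one_out = [estimator(sample[:i] + sample[i + 1:]) for i in range(n)]
    theta_dot = fmean(leave_one_out)  # average of the leave-one-out estimates
    bias = (n - 1) * (theta_dot - theta_hat)  # jackknife bias estimate
    corrected = n * theta_hat - (n - 1) * theta_dot  # same as theta_hat - bias
    return corrected, bias


def plugin_variance(xs):
    """Biased variance estimate that divides by n instead of n - 1."""
    m = fmean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


# Illustrative data, not taken from the presentation.
data = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]
corrected, bias = jackknife(data, plugin_variance)
print(corrected, variance(data))  # the two values should agree
```

Each leave-one-out estimate is independent of the others, which is what makes the algorithm a natural candidate for parallel execution.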

6 Motivation. Sequential version: is it possible to use the Jackknife to determine the importance of each environmental layer in the modeling process of species distribution? Parallel version: the sequential version has a high computational cost, which can make its use in the modeling process impracticable.

7 The Jackknife Parallel Version. Master-slave model. Characteristics: dynamic scheduling. [Figure: the master sends tasks to slaves 1 to n, the slaves return partial results, and the master produces the final result; the labels also mention the size of the task and the environmental layers file.]

8 The Jackknife Parallel Version. The parallel version of the Jackknife algorithm was developed using the MPI (Message Passing Interface) library: message exchange, inter-process communication, SPMD (Single Program, Multiple Data).
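The master-slave pattern with dynamic scheduling can be sketched without MPI. The example below is not the authors' implementation: it swaps MPI processes and messages for Python threads and queues from the standard library, purely to show the scheduling idea. The names `run_master_worker` and `evaluate_without_layer` are hypothetical, and treating each task as "one environmental layer left out" is an assumption about how the work is split.

```python
import queue
import threading


def run_master_worker(tasks, work, n_workers):
    """Schedule `tasks` dynamically over `n_workers` workers.

    Each idle worker pulls the next pending task, so faster workers
    naturally process more tasks. Returns {task: partial result}.
    """
    task_q = queue.Queue()
    result_q = queue.Queue()

    def worker():
        while True:
            task = task_q.get()
            if task is None:  # sentinel from the master: no more work
                break
            result_q.put((task, work(task)))  # send partial result back

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for task in tasks:  # master side: publish every task
        task_q.put(task)
    for _ in threads:  # one sentinel per worker
        task_q.put(None)
    for t in threads:
        t.join()

    results = {}
    while not result_q.empty():  # safe: all producers have finished
        task, value = result_q.get()
        results[task] = value
    return results


def evaluate_without_layer(layer_index):
    """Hypothetical stand-in for one model run with a layer left out."""
    return layer_index * layer_index


partial = run_master_worker(range(8), evaluate_without_layer, n_workers=3)
final = sum(partial.values())  # master combines the partial results
```

In an MPI version the queues would be replaced by send and receive calls between ranks, with one rank acting as master, which is why a run with only 2 processes leaves a single slave doing all the work.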

9 Results. Preliminary tests: initial validation of the Jackknife parallel version. Hardware: Intel Core 2 Duo at 1.66 GHz with 2 GB RAM, running Ubuntu Linux 7.04. Timing: the time command available in Linux. Modeling algorithm: GARP. Data: 100 occurrence points (50 presence points and 50 absence points) for Stryphnodendron obovatum; 67 environmental layers.

10 Results. Results of the preliminary tests: execution times were measured for the sequential version and for the parallel version with 2 and 3 processes. With 2 processes the execution becomes essentially sequential, since a single slave receives all the tasks, and message exchange adds overhead. With 3 processes the parallel version was approximately 38% faster than the sequential version. Adequate use of the available cores by the application processes can drastically reduce the execution time of the Jackknife algorithm in the openModeller tool.

11 Results. Tests in the openModeller cluster. Hardware: SGI Altix XE 1300 system composed of an input node (Altix XE 210 with two 2.00 GHz quad-core Xeon processors, 8 GB RAM, and a 500 GB hard disk), a 24-port InfiniBand switch, a 24-port Gigabit Ethernet switch, SGI ProPack 5, SUSE Linux 10, and 10 Altix XE 310 nodes, each with two 2.00 GHz quad-core Xeon processors, 8 GB RAM, and a 250 GB hard disk, totaling 80 cores. Timing: the time command available in Linux. Modeling algorithm: GARP. Data: 100 occurrence points (50 presence points and 50 absence points) for Stryphnodendron obovatum; 244 environmental layers.

12 Results. Results of the tests in the cluster: execution time was measured for the sequential version and for the parallel version with a varying number of processes, starting from 3. [Figure: time in seconds versus number of processes.]

13 Results. Results of the tests in the cluster: the best execution time was obtained with 68 processes (184.3 seconds), approximately 95% faster than the execution with 3 processes and approximately 98% faster than the sequential version. [Figure: time in seconds for the sequential version, 3 processes, and 68 processes.]

14 Results. Results of the tests in the cluster: the ideal is a linear speedup, that is, $S_p = p$, which indicates very good scalability. [Figure: speedup versus number of processes.]

15 Results. Results of the tests in the cluster: performance is better when each processor runs just one process. [Figure: efficiency versus number of processes.]
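Speedup and efficiency, the two quantities plotted in the cluster results, can be computed from measured run times as follows. The sketch assumes the usual definitions, speedup as sequential time divided by parallel time and efficiency as speedup divided by the number of processes; the slides give only the ideal case $S_p = p$ and do not spell these formulas out. The timings in the example are hypothetical placeholders, not the measurements from the presentation.

```python
def speedup(t_sequential, t_parallel):
    """Speedup S_p: how many times faster the parallel run is."""
    return t_sequential / t_parallel


def efficiency(t_sequential, t_parallel, n_processes):
    """Efficiency E_p = S_p / p; 1.0 corresponds to linear speedup."""
    return speedup(t_sequential, t_parallel) / n_processes


# Hypothetical timings in seconds (NOT the values from the slides).
t_seq = 1000.0
timings = {2: 520.0, 4: 280.0, 8: 160.0}

for p, t_par in sorted(timings.items()):
    s = speedup(t_seq, t_par)
    e = efficiency(t_seq, t_par, p)
    print(f"p={p:2d}  speedup={s:.2f}  efficiency={e:.2f}")
```

An efficiency that falls as processes are added is the expected pattern once communication overhead grows or several processes share one processor.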

16 Next Activities. Tests for hypothesis validation: is it possible to use the Jackknife to determine the importance of each environmental layer in the modeling process of species distribution? Make a parallel version available for cluster use.

17 References. M. H. Quenouille, Notes on Bias in Estimation. Biometrika, Vol. 43, No. 3/4. B. Efron, Bootstrap Methods: Another Look at the Jackknife. The Annals of Statistics, Vol. 7, No. 1 (Jan., 1979).
