Multi-Modal Metropolis Nested Sampling For Inspiralling Binaries
Ed Porter (AEI) & Jon Gair (IOA)
2W@AEI Workshop, AEI, September 2008
(We acknowledge useful conversations with F. Feroz and M. Hobson (Cavendish Laboratory, Cambridge) regarding the MultiNest algorithm (arXiv:0704.3704, arXiv:0810.0781).)
Outline
1 Motivation: SMBHBs, multiple sources, multiple-mode solutions...
2 Evolutionary Algorithms: concepts, variants...
3 The Algorithm: nested sampling, clustering, x-means, k-means...
4 Current Results: yes, we have results
5 Future Plans: EMRIs, spinning SMBHBs, smarter algorithms...
1 Motivation
Motivation & Terminology
If we have two events A and B, we define:
P(A) = prior or marginal probability of A
P(A|B) = conditional or posterior probability of A given B
Bayes' theorem: P(A|B) = P(B|A) P(A) / P(B), where P(B) acts as a normalising constant.
Can also write Bayes' theorem as Posterior = Likelihood x Prior Probability / Evidence.
The problem is that P(B), the evidence, is in general very difficult to calculate.
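As a minimal numerical illustration of the theorem (hypothetical numbers, purely for illustration), in Python:

    # Two competing hypotheses A1, A2 with priors and likelihoods for data B.
    prior = {"A1": 0.5, "A2": 0.5}
    likelihood = {"A1": 0.8, "A2": 0.2}          # P(B | A_i)

    # The evidence P(B) is the normalising constant: a sum over hypotheses.
    evidence = sum(prior[m] * likelihood[m] for m in prior)

    # Posterior = Likelihood x Prior / Evidence.
    posterior = {m: prior[m] * likelihood[m] / evidence for m in prior}
    print(posterior)  # {'A1': 0.8, 'A2': 0.2}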
SMBHBs
9-parameter set.
Can use the F-statistic to maximize over the extrinsic parameters.
Non-spinning, no extra harmonics.
We already have an MHMC algorithm which works very efficiently.
LISA Response To A Source
Finding the Modes of the Solution
Sky degeneracy: with previous Metropolis-Hastings algorithms we can find either the primary or the antipodal solution, but not both at the same time.
Cornish & Porter, CQG 24, 5729 (2007)
System Knowledge
The main acceleration can come from knowledge of the system in question, e.g. degeneracies, symmetries, etc.
Cornish & Porter, CQG 24, 5729 (2007)
2 Evolutionary Algorithms
What are they?
Origins in artificial intelligence.
Population-based optimization algorithms.
Inspired by biological evolution: reproduction, death, mutation, etc.
Candidate solutions play the role of organisms.
Fitness criteria determine the environment.
Metropolis-Hastings Ratio
The Hastings ratio factorizes into priors, likelihoods and transition probabilities:
H = [p(y) / p(x)] x [L(y) / L(x)] x [q(x|y) / q(y|x)]
where p is the prior, L the likelihood, and q the proposal (transition) distribution for a jump from the current point x to the proposed point y.
Should I stay or should I go?
First calculate H, then generate u ~ U(0,1) and move if u < min{1, H}, i.e. accept the jump with probability min{1, H}.
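A minimal sketch of this accept/reject step in Python, assuming a symmetric proposal so that H reduces to the posterior ratio; log_post and propose are placeholders, not part of the talk:

    import math
    import random

    def mh_step(x, log_post, propose):
        """One Metropolis step: calculate H, then stay or go."""
        y = propose(x)                        # candidate point
        log_H = log_post(y) - log_post(x)     # ln H for a symmetric proposal
        u = random.random()                   # u ~ U(0, 1)
        # Move if u < min{1, H}, i.e. accept with probability min{1, H}.
        return y if math.log(u) < min(0.0, log_H) else x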
Metropolis Sampling
Developed in 1953 for Boltzmann distributions.
Requires symmetric proposals, i.e. q(x|y) = q(y|x), so that only the likelihood ratio is important.
Can also preserve detailed balance, i.e. p(x|s) q(y|x) = p(y|s) q(x|y).
Metropolis-Hastings Sampling
Updated in 1970 to include non-symmetric proposal distributions.
Pros:
1) Faster mixing of chains.
2) Can use multiple proposal distributions which depend only on the current state.
3) Only requires that a function proportional to the density can be evaluated at a point in parameter space. This lets us generate samples without knowing the normalising constant, i.e. the evidence.
Cons:
1) Multiple proposal distributions require some knowledge of the system.
2) Can still get stuck on secondary solutions.
Simulated Annealing
Suppose we want to sample from a distribution of the form P(x) = exp(-L(x)/2). It is usually easier to sample instead from the annealed distribution P*(x) = exp(-L(x)/(kT)), where k = 2, and gradually cool from T >> 1 down to T = 1, recovering the target distribution.
Simulated Annealing
Pros:
1) Can prevent chains from getting stuck on secondaries.
2) Shortens and fattens the high-probability regions.
3) Can greatly speed up burn-in.
Cons:
1) No a priori information on the initial temperature (can get around this using thermostated annealing; see Cornish & Porter, Class. Quant. Grav. 24, 5729 (2007)).
2) The cool-down needs to be slow.
3) If the cool-down is too fast, the chain WILL get stuck.
4) Makes the chain non-Markovian (not really a con for practical purposes).
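In code, the tempering of the previous slide amounts to dividing ln L by T. A minimal sketch with a geometric cool-down, which is one common choice (the thermostated annealing mentioned above adapts T on the fly instead):

    def annealed_logL(logL_x, T):
        """Tempered log-likelihood ln L / T: for T > 1 the surface is
        flatter, so chains cross between modes more easily."""
        return logL_x / T

    def temperature(T0, i, n_cool):
        """Geometric cool-down from T0 >> 1 at step i = 0 to T = 1 at
        i = n_cool; cooling too fast risks freezing onto a secondary."""
        return T0 ** max(0.0, 1.0 - i / float(n_cool))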
Nested Sampling (Skilling 2004)
A method to evaluate the evidence.
Climbs the likelihood surface by passing through nested equi-likelihood contours.
Shrinks the prior volume by rejecting the lowest-likelihood point.
Always searches for a higher-likelihood point within the lowest equi-likelihood contour.
Can also return PDFs.
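A heavily simplified sketch of the scheme, assuming placeholder sample_prior and logL functions. It finds points above the contour by rejection sampling from the prior (real implementations such as MultiNest use smarter constrained sampling), accumulates Z in linear space for readability (real codes work in log space), and ignores the final live-point correction:

    import math

    def nested_sampling(sample_prior, logL, n_live=100, n_iter=1000):
        """Skilling-style sketch: evidence from shrinking prior volume."""
        live = [sample_prior() for _ in range(n_live)]  # live points from the prior
        Z, X_prev = 0.0, 1.0
        for i in range(1, n_iter + 1):
            worst = min(live, key=logL)           # lowest-likelihood live point
            X_i = math.exp(-i / n_live)           # expected prior-volume shrinkage
            Z += math.exp(logL(worst)) * (X_prev - X_i)  # evidence increment
            X_prev = X_i
            # Replace the rejected point by a prior draw above the contour.
            new = sample_prior()
            while logL(new) <= logL(worst):       # rejection sampling: simple, slow
                new = sample_prior()
            live[live.index(worst)] = new
        return math.log(Z)                        # log-evidence estimate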
BIC Scoring
A statistical criterion for model selection, also called the Schwarz criterion or SIC.
If n = sample size, k = number of free parameters and L = the maximized likelihood of the estimated model, then
BIC = -2 ln L + k ln(n)
Properties:
(a) independent of priors;
(b) can measure the efficiency of the parameterized model;
(c) penalizes complexity;
(d) good for clustering.
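The criterion translates directly into code; the comparison note at the end uses hypothetical numbers:

    import math

    def bic(log_max_likelihood, k, n):
        """Schwarz criterion: BIC = -2 ln L + k ln(n). Lower is better;
        the k ln(n) term penalizes model complexity."""
        return -2.0 * log_max_likelihood + k * math.log(n)

    # E.g. comparing a 2-cluster against a 3-cluster fit of n = 100 points:
    # the extra parameter must raise ln L by more than 0.5 ln(100) to win.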
k-means Clustering (Duda & Hart 1973, Bishop 1995)
k stands for the number of clusters.
Goal: given n points, separate them into k clusters.
Starting centroids are chosen at random.
Converges to a local minimum of a distortion measure.
Can be quite slow.
There are issues with choosing the value of k.
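A minimal Lloyd-style sketch, assuming the points are tuples of floats; it exhibits both caveats above, converging only to a local minimum of the distortion and requiring k as input:

    import random

    def k_means(points, k, n_iter=20):
        """Lloyd's algorithm: alternate assignment and centroid updates."""
        centroids = random.sample(points, k)        # random starting centroids
        clusters = [[] for _ in range(k)]
        for _ in range(n_iter):
            clusters = [[] for _ in range(k)]
            for p in points:                        # assign to nearest centroid
                j = min(range(k), key=lambda j: sum(
                    (a - b) ** 2 for a, b in zip(p, centroids[j])))
                clusters[j].append(p)
            for j, c in enumerate(clusters):        # move centroids to the means
                if c:
                    centroids[j] = tuple(sum(dim) / len(c) for dim in zip(*c))
        return centroids, clusters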
x-means Clustering (Pelleg & Moore 1999)
Define a k(min) and a k(max).
Step 1: Improve parameters (i.e. run k-means for a given number of clusters).
Step 2: Improve structure (decide if and where new centroids are needed).
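A simplified sketch of the two-step idea, reusing the k_means and bic sketches from the previous slides. The log_lik helper is our own crude spherical-Gaussian stand-in, the parameter count is simplified to k, and the real Pelleg & Moore algorithm splits centroids locally rather than re-running k-means globally:

    import math

    def log_lik(clusters):
        """Crude spherical-Gaussian log-likelihood of a clustering
        (illustrative stand-in only)."""
        total = 0.0
        for c in clusters:
            if len(c) < 2:
                continue
            mean = tuple(sum(dim) / len(c) for dim in zip(*c))
            var = sum(sum((a - b) ** 2 for a, b in zip(p, mean))
                      for p in c) / len(c)
            total += -0.5 * len(c) * math.log(var + 1e-12)
        return total

    def x_means(points, k_min, k_max):
        """Scan k in [k_min, k_max], scoring each clustering with BIC."""
        best = None
        for k in range(k_min, k_max + 1):
            centroids, clusters = k_means(points, k)    # Step 1: improve parameters
            score = bic(log_lik(clusters), k, len(points))  # Step 2: score structure
            if best is None or score < best[0]:         # lower BIC = better model
                best = (score, k, centroids, clusters)
        return best[1], best[2], best[3]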
3 The Algorithm
Initial Population Selection
Choose a fitness threshold, e.g. SNR = 15.
Generate organisms until N = 100.
Allow organisms to improve their fitness using an uphill climber, i.e. a deterministic search where higher-likelihood points are accepted without question (see the sketch after this list).
Generate k centroids according to x-means clustering and associate each of the organisms with a centroid.
Centroids with the highest BIC scores survive, while unfit centroids are killed off.
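The uphill climber in sketch form; logL and propose are placeholders:

    def uphill_climb(x, logL, propose, n_steps=1000):
        """Deterministic climber: a proposed point with higher ln L is
        accepted without question; downhill moves are never taken."""
        best_logL = logL(x)
        for _ in range(n_steps):
            y = propose(x)
            ly = logL(y)
            if ly > best_logL:
                x, best_logL = y, ly
        return x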
Population Evolution
The population evolves according to a swarm-intelligence model, i.e. organisms in a swarm evolve on their own, but we track the centre of mass of the swarm.
Every 20 iterations, recalculate the position and number of centroids, as swarms can break up.
Re-allocate the organisms to centroids, so that the centroids and the centroid number evolve along with the children.
The goal is that the organisms evolve towards the different modes of the solution.
And while the organisms do all the work, it is the centroids that we are most interested in, as they define the global fitness of a cluster.
Moving The Clusters
Nested Sampling: try to replace the lowest ln L by jumping within a 1-sigma range.
Uphill Climber: try the proposed point and accept it if ln L is better.
Metropolis: uniform proposal distribution.
Metropolis-Hastings: include non-uniform proposal distributions.
(The four acceptance rules are sketched below.)
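A sketch of the four acceptance rules, one per move type; the names and the function signature are ours, not from the talk:

    import math
    import random

    def accept(kind, logL_old, logL_new, logL_contour=None, log_q_ratio=0.0):
        """Decide whether a proposed move of the given kind is accepted."""
        if kind == "nested":      # stay above the current equi-likelihood contour
            return logL_new > logL_contour
        if kind == "uphill":      # deterministic: better ln L is always taken
            return logL_new > logL_old
        if kind == "metropolis":  # uniform (symmetric) proposal: likelihood ratio
            return math.log(random.random()) < logL_new - logL_old
        if kind == "mh":          # non-uniform proposal: add ln[q(x|y)/q(y|x)]
            return math.log(random.random()) < logL_new - logL_old + log_q_ratio
        raise ValueError(kind)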
4 Current Results
Binary Parameters
2 sources:
1) m1 = 1e7, m2 = 1e6, z = 1, tc = 0.9 yrs, SNR ~ 200
2) m1 = 4e6, m2 = 1e6, z = 1.5, tc = 1.02 yrs, SNR ~ 50
Assume an observation time of 1 year.
Low-frequency approximation used for the LISA response.
Single-Source SMBHB
Initial SA heat = 100; TA threshold set to SNR = 10.
Used prior information to cluster in sky position.
Worked very well: the clusters found both the real and the antipodal solutions.
Used the F-statistic.
Fitness threshold for the initial population set at an SNR of 5.
No time-of-coalescence maximization.
Only a small number of organisms needed, ~20-30.
The code took 3 hours to run on a dual-Xeon desktop.
Double-Source SMBHB
Same initial SA and TA conditions.
Initial fitness threshold set to an SNR of 15.
While sky clustering alone worked quite well, clustering in time of coalescence as well as sky position worked better.
More organisms needed (~100) to account for the multiple modes.
The code takes ~24 hours to run.
Set k(min) = 2 and k(max) = 10.
5 For the future...
Things To Try...
Birth and death of weak organisms/clusters.
Growing/pruning of clusters to keep a constant population.
Ant colony optimization.
More efficient clustering.
Using the cluster itself to approximate the covariance matrix, giving the size and direction of moves.
A Fisher-matrix-less algorithm?
Cross-cluster learning.
Sources To Try...
1) Spinning black-hole binaries / EMRIs.
2) Will be slower and will need more organisms.
3) Can handle degenerate parameters very well.
4) May be successful, as we are not relying on a single chain which can get stuck.
5) The method is designed to find not only the primary modes, but also the secondaries.
Conclusion
An entirely new algorithm for non-spinning SMBHBs, based on an evolutionary algorithm.
Not only finds the multiple-mode solutions but also maps the PDFs.
Can also return the evidence.
Works very well for multiple sources.
Run time is comparable to the MHMC algorithm.
Next: application to EMRIs and spinning black holes.