A New Pool Control Method for Boolean Compressed Sensing Based Adaptive Group Testing

Proceedings of APSIPA Annual Summit and Conference 2017, 12-15 December 2017, Malaysia

Yujia Lu and Kazunori Hayashi
Graduate School of Informatics, Kyoto University, Kyoto, Japan. E-mail: luyujia@sys.i.kyoto-u.ac.jp
Graduate School of Engineering, Osaka City University, Osaka, Japan. E-mail: kazunori@eng.osaka-cu.ac.jp

Abstract
In adaptive group testing, the pool (a set of items to be tested) used in the next test is determined based on past test results, and the performance depends heavily on the pool control method. This paper proposes a new pool control method for Boolean compressed sensing based adaptive group testing. The proposed method first selects the pool size of the next test by minimizing the expectation of the approximated number of tests required after the next test, based on the estimated number of remaining positive items. Then, when the selected pool size is one, the item with the highest probability of being positive is selected as the pool; otherwise, a pool of the selected size is constructed by randomly selecting items. In addition, a new method for estimating the cardinality of positive items, which can be run in parallel with the proposed pool control method, is proposed. Computer simulation results show that adaptive group testing with the proposed method outperforms the conventional methods both with and without knowledge of the cardinality of positive items.

I. INTRODUCTION

In many fault detection problems, a few faulty (positive) items must be identified in a large set of items. Group testing is known as an efficient method for solving such problems. In each test of group testing, some items are selected to form a subset of items called a pool, and a test is performed to tell whether the pool contains at least one positive item.
After some tests, the positive items can be identified from the test results using a reconstruction algorithm. It is known that group testing with an appropriate algorithm can identify all positive items in far fewer tests than testing all items one by one. A combinatorial group testing technique was first proposed by Dorfman in the 1940s [1]. Since then, group testing has been applied to a variety of research areas, including radio frequency identification (RFID) systems [2], network tomography [3], and others.

In general, group testing can be classified into two types, namely non-adaptive and adaptive. Non-adaptive group testing is a simple method that decides the pools of all tests at the beginning, whereas adaptive group testing controls the pool of each test based on previous test results. Therefore, non-adaptive group testing requires less testing time if all tests can be performed in parallel, while adaptive group testing requires fewer tests, which is beneficial if the tests have to be performed sequentially.

In recent years, group testing has attracted much interest from the active research area of compressed sensing. In particular, Boolean compressed sensing, which solves non-adaptive group testing via linear programming relaxation, has been applied to group testing [4]. Kawaguchi et al. applied it to adaptive group testing, where the pool size is controlled by the criterion of maximizing the information obtained in the next test and the pool is constructed by uniform random selection [5]. We have proposed Boolean compressed sensing based adaptive group testing with solution space reduction, where the size of the unknown vector is reduced according to the items already identified, and the pool size is determined by maximizing the expected number of identifiable items [6], [7]. In the Boolean compressed sensing based adaptive group testing methods, the pool of each test is commonly determined in two steps.
Namely, in the first step, the pool size is selected based on some objective function (pool size control), and in the second step, a pool of the selected size is generated by randomly selecting items (pool selection control). However, a better pool control method might exist, especially because the existing methods do not take advantage of the information of positive tests.

In this paper, we propose a novel pool control method for Boolean compressed sensing based adaptive group testing. The proposed method selects the pool size of the next test by minimizing the expectation of the approximated number of tests required after the next test, and uses the information of previous positive tests to select the item with the highest probability of being positive when the pool size is one. In addition, a revised method for estimating the cardinality of positive items, which can be run in parallel with the proposed pool control method, is proposed. Computer simulation results show that adaptive group testing with the proposed method requires fewer tests to achieve reliable reconstruction than the conventional methods, both with and without knowledge of the cardinality of positive items.

978-1-5386-1542-3 ©2017 APSIPA, APSIPA ASC 2017

II. PROBLEM STATEMENT

The model of adaptive group testing is shown in Fig. 1. Let $N$ be the number of all items, of which only $K$ are positive. We assume that $N$ is known, while $K$ is unknown. We define a Boolean vector of items $x = [x_1\ x_2\ \cdots\ x_N]^T \in \{0,1\}^N$, where $x_j = 1$ or $0$ indicates that the $j$-th item is positive or negative, respectively.

Fig. 1. The model of adaptive group testing.

In each test, some items are selected from the set of all items and are mixed to construct a pool. The pool for the $s$-th test is defined by a Boolean row vector $a_s = [a_{s,1}\ a_{s,2}\ \cdots\ a_{s,N}] \in \{0,1\}^N$, where $a_{s,j} = 1$ or $0$ indicates that the $j$-th item does or does not belong to the pool for the $s$-th test, respectively, and an $S \times N$ Boolean matrix $A_S = [a_1^T\ a_2^T\ \cdots\ a_S^T]^T$ defines the set of pools for $S$ tests. The result of the $s$-th test is a single Boolean value $y_s \in \{0,1\}$, where $y_s = 1$ indicates that at least one positive item is included in the pool for the $s$-th test, while $y_s = 0$ indicates that no positive item is in it. $y_s$ can be obtained by calculating the Boolean sum (Boolean OR) of $\{x_j \mid a_{s,j} = 1\}$, which is the set of $x_j$ belonging to the pool for the $s$-th test. Accordingly, the observation model can be represented as

$$y_s = \bigvee_{j=1}^{N} \left(a_{s,j} \wedge x_j\right), \tag{1}$$

where $\vee$ denotes the Boolean sum and $\wedge$ denotes the Boolean AND operation. For convenience, we define $y_S = [y_1\ y_2\ \cdots\ y_S]^T$ as the vector of test results obtained by $S$ tests, given by

$$y_S = A_S \vee x, \tag{2}$$

where $A_S \vee x$ indicates taking the Boolean product of the corresponding elements of each row of $A_S$ and $x$, and then taking their Boolean sum. The goal of group testing is to reconstruct the unknown sparse Boolean vector $x$ from the pool matrix $A_S$ and the test result vector $y_S$.

In adaptive group testing, the pool for the next test is determined based on previous test results. Specifically, the row vector $a_{i+1}$, which defines the pool for the $(i+1)$-th test, is determined by using the observed results $y_s$ $(s = 1, 2, \ldots, i)$ and the pool vectors $a_s$ $(s = 1, 2, \ldots, i)$. In the rest of the paper, $\hat{x}_i$ denotes the estimate of $x$ reconstructed from $A_i$ and $y_i$.

III. ADAPTIVE GROUP TESTING VIA BOOLEAN COMPRESSED SENSING

We have proposed a method for adaptive group testing via Boolean compressed sensing with solution space reduction in [7], which is introduced in this section.

A. Boolean Compressed Sensing

Malioutov and Malyutov proposed Boolean compressed sensing in [4], where group testing is turned into a linear programming problem by relaxing the elements of the vectors and the matrices to real numbers. The problem of Boolean compressed sensing for adaptive group testing after the $i$-th test is formulated as

$$\hat{x} = \arg\min_{x \in \mathbb{R}^N} \sum_{j} x_j \quad \text{subject to: } \mathbf{0} \preceq x \preceq \mathbf{1},\ A_i^P x \succeq \mathbf{1},\ A_i^N x = \mathbf{0}, \tag{3}$$

where $\mathbf{0}$ and $\mathbf{1}$ are vectors whose elements are all zero and all one, respectively. The symbols $\succeq$ and $\preceq$ mean that every element of the vector on the left-hand side is not less than and not greater than, respectively, the corresponding element on the right-hand side. Also, $A_i^P$ and $A_i^N$ are the matrices constructed from the rows of $A_i$ corresponding to positive and negative tests, respectively.

B. Solution Space Reduction

If the test results are not corrupted by noise, we can identify all items in negative pools as negative and each item in a positive pool of size one as positive. Using these properties, some elements of the unknown vector can be confirmed as positive or negative during the tests, and the size of the unknown vector composed of candidate items, namely the items to be tested further, can be reduced. Specifically, at the beginning, the solution space $T_1 = \{1, 2, \ldots, N\}$ containing the indexes of all items, the negative-confirmed index set $U_1 = \emptyset$, and the positive-confirmed index set $V_1 = \emptyset$ are defined. After the $i$-th test, we update the sets in light of the test result $y_i$ and the pool $a_i$ as follows:

if $y_i = 0$ then
    $T_{i+1} = T_i \setminus \{j \mid a_{i,j} = 1\}$, $U_{i+1} = U_i \cup \{j \mid a_{i,j} = 1\}$, $V_{i+1} = V_i$,
else if $y_i = 1$ and the pool size is one then
    $T_{i+1} = T_i \setminus \{j \mid a_{i,j} = 1\}$, $U_{i+1} = U_i$, $V_{i+1} = V_i \cup \{j \mid a_{i,j} = 1\}$,
otherwise
    $T_{i+1} = T_i$, $U_{i+1} = U_i$, $V_{i+1} = V_i$.
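The noise-free observation model (1) and the set updates above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `run_test` and `reduce_solution_space` are hypothetical, and pools are represented as Python sets of item indexes rather than as rows of the Boolean matrix.

```python
from typing import Set, Tuple


def run_test(pool: Set[int], positives: Set[int]) -> int:
    """Boolean OR observation of (1): 1 if the pool holds any positive item."""
    return int(any(j in positives for j in pool))


def reduce_solution_space(
    T: Set[int], U: Set[int], V: Set[int], pool: Set[int], y: int
) -> Tuple[Set[int], Set[int], Set[int]]:
    """Update (T, U, V) after one noise-free test with result y on the given pool."""
    if y == 0:
        # Negative test: every pooled item is confirmed negative.
        return T - pool, U | pool, V
    if len(pool) == 1:
        # Positive test on a single item: that item is confirmed positive.
        return T - pool, U, V | pool
    # Positive test on a larger pool: nothing can be confirmed yet.
    return T, U, V


if __name__ == "__main__":
    positives = {2, 5}                      # hidden ground truth (illustrative)
    T, U, V = set(range(1, 9)), set(), set()
    for pool in ({1, 3, 4}, {2}, {5, 6}):
        y = run_test(pool, positives)
        T, U, V = reduce_solution_space(T, U, V, pool, y)
        print(sorted(pool), y, sorted(T), sorted(U), sorted(V))
```

Using sets keeps the three cases of the update rule visible at a glance; a matrix-based implementation would select the same indexes via the nonzero entries of $a_i$.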

The reconstruction performance of Boolean compressed sensing can be expected to improve with this reduction, because the number of tests required to achieve reliable reconstruction increases monotonically with the number of items $N$ and the number of positive items $K$ in non-adaptive group testing, according to [8], [9], [10].

C. Pool Control

In [7], with emphasis on effective solution space reduction, the expected number of items identifiable by the solution space reduction in the next test is employed as the criterion for pool size control. Thus, after the $i$-th test, we set the pool size of the $(i+1)$-th test, $G_{i+1}$, to maximize the expected number of items that can be identified in the $(i+1)$-th test as

$$G_{i+1} = \arg\max_{G} J_{i+1}(G), \tag{4}$$

where

$$J_{i+1}(G) = \begin{cases} G \, \dfrac{\binom{N_{i+1} - \tilde{K}_i}{G}}{\binom{N_{i+1}}{G}}, & G > 1, \\ 1, & G = 1, \end{cases} \tag{5}$$

and $N_{i+1} = |T_{i+1}|$ is the number of items in the solution space for the $(i+1)$-th test. $\tilde{K}_i$ is the estimated number of remaining positive items in the solution space after the $i$-th test, which is given by

$$\tilde{K}_i = \hat{K}_i - |V_{i+1}|, \tag{6}$$

where $\hat{K}_i$ is the number of all positive items $K$ estimated after the $i$-th test and $|V_{i+1}|$ is the size of the positive-confirmed set for the $(i+1)$-th test. Once the pool size is determined, the pool for the $(i+1)$-th test is constructed by randomly selecting $G_{i+1}$ items (pool selection control).

D. Cardinality Estimation

A cardinality estimation scheme for $K$, which compares the actual number of negative tests with its statistical expectation, has been proposed in [7] as follows. Since the approximated probability $q_i$ that the $i$-th test is negative is given by

$$q_i = \frac{\binom{N_i - \tilde{K}_i}{G_i}}{\binom{N_i}{G_i}}, \tag{7}$$

the expected number of negative tests after the $i$-th test, $Q_i$, is given by

$$Q_i = \sum_{s=1}^{i} \frac{\binom{N_s - \tilde{K}_s}{G_s}}{\binom{N_s}{G_s}}. \tag{8}$$

Based on these discussions, and defining $\hat{Q}_i$ as the actual number of negative tests, the estimated number of all positive items after the $i$-th test is given by

$$\hat{K}_i = \arg\min_{\hat{K}} \left| \hat{Q}_i - \sum_{s=1}^{i} \frac{\binom{N_s - (\hat{K} - |V_{s+1}|)}{G_s}}{\binom{N_s}{G_s}} \right|. \tag{9}$$

E. Algorithm Scheme

The iterative algorithm of the conventional method is summarized in Algorithm 1.

Algorithm 1 Method in [7]
1: initialize $\hat{K}$
2: calculate $G$ by (4)
3: randomly construct a pool of size $G$
4: perform the test and recover $\hat{x}$ by (3)
5: if $T = \emptyset$ then stop the algorithm, else continue
6: solution space reduction
7: calculate $\hat{K}$ by (9)
8: return to 2

IV. PROPOSED METHOD

In this paper, we propose novel methods for both pool size control and pool selection control to improve the performance. Moreover, a new method for estimating the cardinality of positive items, which can be run in parallel with the proposed pool control method, is also proposed.

A. Pool Control

1) Pool size control: Our approach is to select the pool size of the next test that minimizes the expected number of tests required after the next test. To the best of our knowledge, however, an analytical expression for the number of tests required in general adaptive group testing has not yet been derived. Consequently, in this paper, we use the upper bound $K\left(1 + 2\log\frac{N}{K}\right)$ derived in [11] for adaptive group testing with the binary protocol as the approximation of the required number of tests. Owing to the properties of the solution space reduction, the cases where the pool size is greater than one and equal to one should be considered separately, as follows.

a) Pool size $G > 1$: All items in the pool can be identified as negative when the test result is negative, while none of them can be identified when the test result is positive. Therefore, the approximated number of tests required after the negative $i$-th test is given by

$$D_{n,i} = \tilde{K}_i \left(1 + 2\log\frac{N_{i+1} - G}{\tilde{K}_i}\right), \tag{10}$$

and after the positive $i$-th test is given by

$$D_{p,i} = \tilde{K}_i \left(1 + 2\log\frac{N_{i+1}}{\tilde{K}_i}\right). \tag{11}$$

The probability that the $i$-th test becomes negative is given by

$$m_{n,i} = \frac{\binom{N_{i+1} - \tilde{K}_i}{G}}{\binom{N_{i+1}}{G}}, \tag{12}$$

and the probability of a positive test is obtained as

$$m_{p,i} = 1 - m_{n,i}. \tag{13}$$

b) Pool size $G = 1$: The selected item of the pool can be confirmed after the test regardless of the test result if $G = 1$. Therefore, the approximated number of tests required after the negative $i$-th test is given by

$$D_{n,i} = \tilde{K}_i \left(1 + 2\log\frac{N_{i+1} - 1}{\tilde{K}_i}\right), \tag{14}$$

and after the positive $i$-th test is given by

$$D_{p,i} = (\tilde{K}_i - 1)\left(1 + 2\log\frac{N_{i+1} - 1}{\tilde{K}_i - 1}\right). \tag{15}$$

The probability that the $i$-th test becomes negative is given by

$$m_{n,i} = \frac{N_{i+1} - \tilde{K}_i}{N_{i+1}}, \tag{16}$$

and the probability of a positive test is obtained as

$$m_{p,i} = 1 - m_{n,i}. \tag{17}$$

In conclusion, the expectation of the approximated number of tests required after the $i$-th test can be summarized as

$$F_{i+1}(G) = \begin{cases} m_{n,i} D_{n,i} + m_{p,i} D_{p,i} \ \text{with (10)-(13)}, & G > 1, \\ m_{n,i} D_{n,i} + m_{p,i} D_{p,i} \ \text{with (14)-(17)}, & G = 1, \end{cases} \tag{18}$$

and thus, we control the pool size by

$$G_{i+1} = \arg\min_{G} F_{i+1}(G). \tag{19}$$

2) Pool selection control: We propose a pool selection control method that uses the information of previous positive tests, based on the idea that items included many times in small pools with positive test results have higher probabilities of being positive. To be more specific, we first recall the matrix $A_i^P$ of size $R_i \times N$ after the $i$-th test, defined in the previous section, where $R_i$ is the number of positive tests. Here, we define the $j$-th column vector of $A_i^P$ as $\tilde{a}_{i,j} = [a_{1,j}\ a_{2,j}\ \cdots\ a_{R_i,j}]^T \in \{0,1\}^{R_i}$, where $a_{r,j}$ indicates whether the $j$-th item is included in the pool of the $r$-th positive test or not. Moreover, we note that an item included in smaller pools has a higher probability of being positive. We therefore use the reciprocal of the pool size as the weight of each positive test, $w = \left[\frac{1}{G_1}\ \frac{1}{G_2}\ \cdots\ \frac{1}{G_{R_i}}\right]$, where $G_r$ is the pool size of the $r$-th positive test. Thus, we approximate the probability $p_{i,j}$ that the $j$-th item is positive after the $i$-th test as

$$p_{i,j} \propto w\tilde{a}_{i,j}. \tag{20}$$
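The pool size control of (10)-(19) and the weighted score of (20) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, the logarithm is taken to base 2 (the base is not recoverable from the text), the estimate of remaining positives is assumed to be an integer with $0 \le \tilde{K}_i \le N_{i+1}$, and the approximation is taken to be zero when no positive or no candidate remains, a corner case the text does not specify.

```python
from math import comb, log2
from typing import Dict, Iterable, List, Set


def approx_tests(n: int, k: int) -> float:
    """Approximate required tests k * (1 + 2 * log2(n / k)) for n items, k positives."""
    if k <= 0 or n <= 0:
        return 0.0  # assumption: nothing left to find costs nothing
    # assumption: clamp n to at least k so the logarithm is never negative
    return k * (1.0 + 2.0 * log2(max(n, k) / k))


def expected_tests(n: int, k: int, g: int) -> float:
    """F_{i+1}(G) of (18) for n = N_{i+1}, k = remaining positives, pool size g."""
    if g == 1:
        m_neg = (n - k) / n                    # (16)
        d_neg = approx_tests(n - 1, k)         # (14)
        d_pos = approx_tests(n - 1, k - 1)     # (15)
    else:
        m_neg = comb(n - k, g) / comb(n, g)    # (12)
        d_neg = approx_tests(n - g, k)         # (10)
        d_pos = approx_tests(n, k)             # (11)
    return m_neg * d_neg + (1.0 - m_neg) * d_pos


def select_pool_size(n: int, k: int) -> int:
    """G_{i+1} of (19): pool size in 1..n with the smallest expected cost."""
    return min(range(1, n + 1), key=lambda g: expected_tests(n, k, g))


def item_scores(positive_pools: List[Set[int]], candidates: Iterable[int]) -> Dict[int, float]:
    """Unnormalized p_{i,j} of (20): sum of 1/|pool| over positive pools containing j."""
    return {
        j: sum(1.0 / len(pool) for pool in positive_pools if j in pool)
        for j in candidates
    }


if __name__ == "__main__":
    print(select_pool_size(100, 5))
    print(item_scores([{1, 2, 3, 4}, {2, 5}], range(1, 6)))
```

Ties in `select_pool_size` are broken toward the smaller pool size, which is another choice made here for definiteness only.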
For the case with $G_{i+1} = 1$, we choose the item $j_{\max}$ in $T_i$ that maximizes this probability as

$$j_{\max} = \arg\max_{j \in T_i} w\tilde{a}_{i,j}. \tag{21}$$

On the other hand, for $G_{i+1} > 1$, uniform random selection of the items in the pool is employed, with emphasis on exploring a variety of items.

B. Cardinality Estimation

As stated above, since we deliberately select an item to construct the pool of size one, the probability $p_{i,j}$ that a remaining item is positive no longer obeys a uniform distribution, whereas the expected number of remaining positive items is unchanged:

$$\sum_{j \in T_i} p_{i,j} = \tilde{K}_i. \tag{22}$$

Taking advantage of this fact, the probability that the pool of size one is positive can be obtained as

$$p_{i,j_{\max}} = \tilde{K}_i \frac{w\tilde{a}_{i,j_{\max}}}{\sum_{j \in T_i} w\tilde{a}_{i,j}}. \tag{23}$$

As a result, the formula (7) for the probability $q_i$ that the $i$-th test is negative can be rewritten as

$$q_i = \begin{cases} \dfrac{\binom{N_i - (\hat{K}_i - |V_{i+1}|)}{G_i}}{\binom{N_i}{G_i}}, & G_i > 1, \\ 1 - (\hat{K}_i - |V_{i+1}|) \dfrac{w\tilde{a}_{i,j_{\max}}}{\sum_{j \in T_i} w\tilde{a}_{i,j}}, & G_i = 1, \end{cases} \tag{24}$$

and the cardinality estimation is revised as

$$\hat{K}_i = \arg\min_{\hat{K}} \left| \hat{Q}_i - \sum_{s \in \{s \mid G_s > 1\}} \frac{\binom{N_s - (\hat{K} - |V_{s+1}|)}{G_s}}{\binom{N_s}{G_s}} - \sum_{s \in \{s \mid G_s = 1\}} \left(1 - (\hat{K} - |V_{s+1}|) \frac{w\tilde{a}_{s,j_{\max}}}{\sum_{j \in T_s} w\tilde{a}_{s,j}}\right) \right|. \tag{25}$$

C. Algorithm

The iterative algorithm of the proposed method is summarized in Algorithm 2.
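The revised estimate of (24) and (25) amounts to a one-dimensional search over candidate values of $\hat{K}$. The sketch below illustrates that search; it is an assumption-laden illustration rather than the authors' code. The record layout `(N_s, G_s, |V_{s+1}|, ratio_s)` and the function names are hypothetical, `ratio_s` stands for $w\tilde{a}_{s,j_{\max}} / \sum_{j \in T_s} w\tilde{a}_{s,j}$ and must be supplied by the caller for size-one pools, and the search range is a choice made here so that every binomial coefficient is well defined.

```python
from math import comb
from typing import List, Tuple

# One record per past test s: (N_s, G_s, |V_{s+1}|, ratio_s).
# ratio_s is only used when G_s == 1 and is ignored otherwise.
Record = Tuple[int, int, int, float]


def expected_negatives(k_hat: int, history: List[Record]) -> float:
    """Sum of the per-test negative probabilities of (24) for a candidate k_hat."""
    total = 0.0
    for n, g, v_next, ratio in history:
        k_rem = k_hat - v_next  # remaining positives, as in (6)
        if g > 1:
            total += comb(n - k_rem, g) / comb(n, g)
        else:
            total += 1.0 - k_rem * ratio
    return total


def estimate_cardinality(history: List[Record], observed_negatives: int) -> int:
    """Candidate k_hat whose expected negative count is closest to the observed one, as in (25)."""
    # assumption: search only values that keep 0 <= k_hat - |V_{s+1}| <= N_s for all s
    lo = max(v for _, _, v, _ in history)
    hi = min(n + v for n, _, v, _ in history)
    return min(
        range(lo, hi + 1),
        key=lambda k: abs(observed_negatives - expected_negatives(k, history)),
    )


if __name__ == "__main__":
    past = [(10, 3, 0, 0.0), (7, 2, 0, 0.0)]
    print(estimate_cardinality(past, observed_negatives=1))
```

Note that the size-one term $1 - (\hat{K} - |V_{s+1}|)\,\text{ratio}_s$ is an approximation and is not clipped to $[0, 1]$ here, mirroring the formula as written.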

Algorithm 2 Proposed method
1: initialize $\hat{K}$
2: calculate $G$ by (19)
3: if $G = 1$ then construct the pool by (21), else randomly construct a pool of size $G$
4: perform the test and recover $\hat{x}$ by (3)
5: if $T = \emptyset$ then stop the algorithm, else continue
6: solution space reduction
7: calculate $\hat{K}$ by (25)
8: return to 2

Fig. 2. Exact recovery rate versus the number of tests (N = 2, K = 5).

Fig. 3. Exact recovery rate versus the number of tests (N = 2, K = ).

V. SIMULATION EXPERIMENTS

In this section, we evaluate the performance of the proposed adaptive group testing method through simulation experiments. The performance is evaluated in terms of the exact recovery rate of the vector $x$ versus the number of tests, and the exact recovery rate is obtained by averaging the results over repeated trials. The performance of the proposed method is compared with that of the conventional method in [7], for both known and unknown $K$. In the figures, "oracle" means known $K$, and the orange line represents the approximated number of tests required at the beginning, which is calculated by $K\left(1 + 2\log\frac{N}{K}\right)$.

Figs. 2, 3 and 4 respectively show the exact recovery rates of the methods for N = 2, K = 5, N = 2, K = and N = 2, K = 5. The difference in performance between the proposed method and the conventional method is not large when $K$ is small. However, as $K$ increases, the proposed method performs better than the conventional method for both known and unknown $K$. Figs. 5, 6 and 7 with N = 5, K = , N = 5, K = 2 and N = 5, K = 3 confirm that the proposed method outperforms even the conventional method with perfect knowledge of the number of positive items when the number of positive items is large.

Fig. 4. Exact recovery rate versus the number of tests (N = 2, K = 5).

VI. CONCLUSIONS

We have proposed a novel pool control method for Boolean compressed sensing based adaptive group testing, which selects the pool size of the next test by minimizing the expectation of the approximated number of tests required after the next test, and controls the selection of pools by taking advantage of the information of positive tests when the selected pool size is one. Moreover, a revised method for estimating the cardinality of positive items has been proposed in accordance with the new pool control method. Computer simulation results show that the proposed method requires fewer tests than the conventional methods, both with and without knowledge of the cardinality of positive items.

ACKNOWLEDGMENT

This work was supported in part by JSPS KAKENHI Grant Numbers 5K664 and 5H2252.

Fig. 5. Exact recovery rate versus the number of tests (N = 5, K = ).

Fig. 6. Exact recovery rate versus the number of tests (N = 5, K = 2).

Fig. 7. Exact recovery rate versus the number of tests (N = 5, K = 3).

REFERENCES

[1] R. Dorfman, "The detection of defective members of large populations," Annals of Mathematical Statistics, vol. 14, no. 6, pp. 436-440, Dec. 1943.
[2] B. Sheng, C. C. Tan, Q. Li, and W. Mao, "Finding popular categories for RFID tags," Proceedings of the 9th ACM International Symposium on Mobile Ad Hoc Networking and Computing, ACM, pp. 159-168, May 2008.
[3] M. Mukamoto, T. Matsuda, S. Hara, K. Takizawa, F. Ono, and R. Miura, "Adaptive Boolean network tomography for link failure detection," 2015 IFIP/IEEE International Symposium on Integrated Network Management (IM), IEEE, pp. 646-65, May 2015.
[4] D. Malioutov and M. Malyutov, "Boolean compressed sensing: LP relaxation for group testing," 2012 IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 3305-3308, March 2012.
[5] Y. Kawaguchi, T. Osa, S. Barnwal, H. Nagano, and M. Togami, "Information-based pool size control of Boolean compressive sensing for adaptive group testing," 2014 22nd European Signal Processing Conference, pp. 2280-2284, Sept. 2014.
[6] R. Kawasaki, K. Hayashi, and M. Kaneko, "Pool size control for adaptive group testing via Boolean compressed sensing with solution space reduction," Proceedings of APSIPA Annual Summit and Conference 2015, pp. 6-9, Dec. 2015.
[7] Y. Lu and K. Hayashi, "Cardinality estimation of positive items for Boolean compressed sensing based adaptive group testing," IEICE Technical Report, vol. 6, no. 45, pp. 5-56, July 2016.
[8] G. K. Atia and V. Saligrama, "Boolean compressed sensing and noisy group testing," IEEE Transactions on Information Theory, vol. 58, no. 3, pp. 1880-1901, February 2012.
[9] C. L. Chan, P. H. Che, S. Jaggi, and V. Saligrama, "Non-adaptive probabilistic group testing with noisy measurements: Near-optimal bounds with efficient algorithms," 2011 49th Annual Allerton Conference on Communication, Control, and Computing, pp. 1832-1839, Sept. 2011.
[10] D. Sejdinovic and O. T. Johnson, "Note on noisy group testing: Asymptotic bounds and belief propagation reconstruction," Proceedings of the 48th Annual Allerton Conference on Communication, Control and Computing, Monticello, Illinois, pp. 998-1003, 2010.
[11] K. Ball, Strange Curves, Counting Rabbits, and Other Mathematical Explorations. Princeton, New Jersey: Princeton University Press, 2006.