Mean Square Residue Biclustering with Missing Data and Row Inversions

Stefan Gremalschi (a), Gulsah Altun (b), Irina Astrovskaya (a), and Alexander Zelikovsky (a)

(a) Department of Computer Science, Georgia State University, Atlanta, GA 30303, {stefan,iraa,alexz}@cs.gsu.edu
(b) Department of Reproductive Medicine, University of California, San Diego, CA 92093, galtun@ucsd.edu

Abstract. Cheng and Church proposed a greedy deletion-addition algorithm to find a given number k of biclusters whose mean squared residues (MSRs) are below certain thresholds; in their method, missing values in the matrix are replaced with random numbers. In our previous paper we introduced the dual biclustering method with quadratic optimization. In this paper, we modify the dual biclustering method with quadratic optimization to handle missing data and row inversions, and add three new features. First, we introduce a status for each row in a bicluster, and we add and delete rows from biclusters based on their status in order to find the minimum MSR. We compare our results with Cheng and Church's approach, which inverts rows only while adding them to the biclusters. We select the row or the negated row not only at addition but also at deletion, and show an improvement. Second, we give a proof of the theorem introduced by Cheng and Church in [4]. Third, missing data often occur in the data matrices given for biclustering and are usually filled with random numbers; we show that ignoring the missing data is a better approach that avoids the additional noise caused by randomness. Since an ideal bicluster is a bicluster with an H value of zero, our results show a significant decrease in the H value of the biclusters, with less noise, compared to the original dual biclustering and the Cheng and Church method.

Keywords: biclustering, Mean Square Residue

1 Introduction

Gene expression data are given in matrices. In these matrices, rows represent genes and columns represent experimental conditions.
Each cell in the matrix represents the expression level of a gene under a specific experimental condition. It is well known that genes can be relevant for only a subset of conditions. On the other hand, groups of conditions can be clustered by using different groups of genes. In this case, it is important to cluster in these two dimensions simultaneously. This led to the discovery by Cheng and Church [4] of biclusters corresponding to a subset of genes and a subset of conditions with a high similarity score. Biclustering algorithms perform simultaneous row-column clustering. The goal of these algorithms is to find homogeneous submatrices. Biclustering has been widely used to find appropriate subsets of experimental conditions in microarray data [1, 5, 15, 7, 9, 11-13, 18, 19].

(Partially supported by GSU Molecular Basis of Disease Fellowship.)

Cheng and Church's algorithm is based on a natural uniformity model, the mean squared residue. They proposed a greedy deletion-addition algorithm to find a given number k of biclusters whose mean squared residues (MSRs) are below certain thresholds. However, in their method, missing values in the matrix are replaced with random numbers. These random numbers can interfere with the discovery of future biclusters, especially those that overlap with the biclusters already discovered. Yang et al. [16, 15] referred to this as random interference. They generalized the bicluster model to incorporate missing values and proposed a probabilistic move-based algorithm, FLOC (FLexible Overlapped biClustering), which generalizes the concept of mean squared residue and is based on the concepts of action and gain. However, the FLOC model is still not suitable for non-disjoint clusters, and it requires more user parameters, including the number of biclusters. These additional parameters can have a negative impact on the clustering process.

In this paper, we propose a similar method to handle the missing data. We first mathematically characterize general ideal biclusters, i.e., biclusters with zero mean squared residue. We show that the new way of handling missing data is significantly more tolerant to noise. We also introduce a status for each row: status -1 means that the corresponding row is inverted (negated), and status +1 means that the original row is not inverted.
We consider the problem of finding the minimum MSR over all possible row inversions. A limited use of row inversion (without introducing row status) was applied in [4] when rows are added to biclusters. Based on our findings in [14], we developed a new dual biclustering algorithm and a quadratic program that treat missing data accordingly and use the best status assignment. Matrix entries with missing data are not taken into account when computing averages. When comparing our method with Cheng and Church [4], we show that it is better to ignore missing data when adjusting the mean squared residue (MSR) value for finding optimal biclusters. We use a set of methods that includes a dual biclustering algorithm, a quadratic program (QP), and a combination of dual biclustering with QP, which finds a (k × l)-bicluster with small MSR using the greedy approach proposed in [14]. Finally, we apply the best row status assignments and obtain an even better average and median MSR over the set of all biclusters.

The remainder of this paper is organized as follows. Section 2 gives the formal definition of mean squared residue. In Section 3, we give a new definition for adjusting MSR and prove a necessary and sufficient criterion for a matrix to have perfect correlation. Section 4 defines the inversion-based MSR and shows how to compute it. In Section 5, we introduce the dual problem formulation described in [14] and illustrate the comparison between the new adjusted MSR and Cheng and Church's method. The search for biclusters using the new MSR is given in Section 6. The analysis and validation of the experimental study are given in Section 7. Finally, we draw conclusions in Section 8.

2 Mean Squared Residue

The mean squared residue problem has been defined before by Cheng and Church [4] and Zhou and Khokhar [13]. In this paper, we use the same terminology as in [13]. In this section, we give a brief introduction to the terminology as given in [14].

Our input is an (N × M) data matrix A with row set R and column set C, where a cell $a_{ij}$ is a real value that represents the expression level of gene i (row i) under condition j (column j). Matrix A is defined by its set of rows, $R = \{r_1, r_2, \ldots, r_N\}$, and its set of columns, $C = \{c_1, c_2, \ldots, c_M\}$. Given a matrix, biclustering finds submatrices, that is, subgroups of rows (genes) and subgroups of columns, where the genes exhibit highly correlated behavior for every condition. Given a data matrix A, the goal is to find a set of biclusters such that each bicluster exhibits some similar characteristic.

Let $A_{IJ} = (I, J)$ represent a submatrix of A ($I \subseteq R$ and $J \subseteq C$). $A_{IJ}$ contains only the elements $a_{ij}$ belonging to the submatrix with set of rows I and set of columns J. A bicluster $A_{IJ} = (I, J)$ can be defined as a k by l submatrix of the data matrix, where k and l are the number of rows and the number of columns in the submatrix $A_{IJ}$. The concept of bicluster was introduced by [4] to find correlated subsets of genes and subsets of conditions. Let $a_{iJ}$ denote the mean of the i-th row of the bicluster (I, J), $a_{Ij}$ the mean of the j-th column of (I, J), and $a_{IJ}$ the mean of all the elements in the bicluster. As given in [4], more formally,

$$a_{iJ} = \frac{1}{|J|} \sum_{j \in J} a_{ij}, \quad i \in I, \tag{1}$$

$$a_{Ij} = \frac{1}{|I|} \sum_{i \in I} a_{ij}, \quad j \in J, \tag{2}$$

$$a_{IJ} = \frac{1}{|I||J|} \sum_{i \in I, j \in J} a_{ij}. \tag{3}$$

According to [4], the residue of an element $a_{ij}$ in a submatrix $A_{IJ}$ equals

$$r_{ij} = a_{ij} - a_{iJ} - a_{Ij} + a_{IJ}. \tag{4}$$

The residue of an element is the difference between the actual value of $a_{ij}$ and its expected value predicted from its row mean, column mean, and bicluster mean. It also reveals the element's degree of coherence with the other entries of the bicluster it belongs to. The quality of a bicluster can be evaluated by computing the mean squared residue H, i.e., the average of the squared residues of its elements [4]:

$$H(I,J) = \frac{1}{|I||J|} \sum_{i \in I, j \in J} (a_{ij} - a_{iJ} - a_{Ij} + a_{IJ})^2. \tag{5}$$
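As a minimal sketch of equations (1)-(5), the score H(I, J) can be computed as below. The sketch assumes the data matrix is held in a NumPy array and the bicluster is given by lists of row and column indices; the function name `mean_squared_residue` and its signature are illustrative and are not taken from the original method description.

```python
import numpy as np

def mean_squared_residue(a, rows, cols):
    """H(I, J) of the bicluster with row indices `rows` and column indices `cols`."""
    sub = np.asarray(a, dtype=float)[np.ix_(rows, cols)]
    row_mean = sub.mean(axis=1, keepdims=True)        # a_iJ, equation (1)
    col_mean = sub.mean(axis=0, keepdims=True)        # a_Ij, equation (2)
    all_mean = sub.mean()                             # a_IJ, equation (3)
    residue = sub - row_mean - col_mean + all_mean    # r_ij, equation (4)
    return float((residue ** 2).mean())               # H(I, J), equation (5)

# Toy data (made up for illustration only).
a = np.array([[1.0, 2.0, 9.0],
              [3.0, 5.0, 0.0],
              [4.0, 5.0, 7.0]])

# Rows 0 and 1 on columns 0 and 1 are not perfectly coherent.
h_small = mean_squared_residue(a, [0, 1], [0, 1])
# Rows 0 and 2 on columns 0 and 1 differ only by a constant shift, so H is zero.
h_additive = mean_squared_residue(a, [0, 2], [0, 1])
```

In this toy example the first submatrix has residues of ±0.25 and therefore H = 0.0625, while the second is a pure row shift and scores zero.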
A submatrix $A_{IJ}$ is called a δ-bicluster if $H(I,J) \le \delta$ for some given threshold $\delta \ge 0$. In general, the biclustering problem can be formulated with two objectives: maximize the size (area) of the biclusters and minimize the MSR. These two objectives contradict each other, because smaller biclusters have smaller MSR and vice versa. Therefore, there are two optimization problem formulations. Cheng and Church considered the following formulation: maximize the bicluster size (area) subject to an upper bound on MSR.

3 Adjusting MSR for missing data

Missing data often occur in biological data. The common practice for dealing with them is to fill the gaps with random numbers. However, this adds noise and may result in biclusters of lower quality. An alternative approach is to ignore missing data, keeping only the originally available information.

Let A be a bicluster (I, J). We denote by $J_i \subseteq J$ the columns of the bicluster without missing data in the i-th row, and by $I_j \subseteq I$ the rows without missing data in the j-th column. Then the mean of the i-th row of the bicluster, the mean of the j-th column, and the mean of all the elements in the bicluster are reformulated as in equations (6), (7), and (8):

$$a_{iJ} = \frac{1}{|J_i|} \sum_{j \in J_i} a_{ij}, \quad i \in I, \tag{6}$$

$$a_{Ij} = \frac{1}{|I_j|} \sum_{i \in I_j} a_{ij}, \quad j \in J, \tag{7}$$

$$a_{IJ} = \frac{1}{\sum_{j \in J} |I_j|} \sum_{i \in I_j, j \in J} a_{ij}. \tag{8}$$

In order to compare this approach with Cheng and Church's approach for handling missing data, a bicluster with zero H-value was used. A bicluster with H = 0 is called an ideal bicluster.

Theorem. Let the n × m matrix A be a bicluster (I, J). Then A has a zero H-value if and only if A can be represented as a sum of an n-vector X and an m-vector Y in the following way: $a_{ij} = x_i + y_j$, $i \in I$, $j \in J$.

Proof. First, we assume that A is an n × m bicluster (I, J) with zero H-value and prove that A can be represented as the above-mentioned sum. A zero H-value means zero residues $r_{ij}$, $i \in I$, $j \in J$. Then each element of A can be calculated as $a_{ij} = a_{iJ} + a_{Ij} - a_{IJ}$. Denoting $X = \{x_i = a_{iJ} - \frac{a_{IJ}}{2}\}_{i \in I}$ and $Y = \{y_j = a_{Ij} - \frac{a_{IJ}}{2}\}_{j \in J}$ results in A = X + Y, where the vector addition is defined as $a_{ij} = x_i + y_j$.

In the other direction, we assume that bicluster A can be represented as a sum of an n-vector X and an m-vector Y and show that A has zero H-value. Since $a_{ij} = x_i + y_j$, $i \in I$, $j \in J$, the mean of the i-th row is $a_{iJ} = \frac{m x_i + \sum_{j \in J} y_j}{m}$, the mean of the j-th column is $a_{Ij} = \frac{\sum_{i \in I} x_i + n y_j}{n}$, and the mean of all the elements in the bicluster is $a_{IJ} = \frac{m \sum_{i \in I} x_i + n \sum_{j \in J} y_j}{nm}$. The residues are therefore equal to zero. Indeed,

$$r_{ij} = x_i + y_j - x_i - \frac{\sum_{j \in J} y_j}{m} - \frac{\sum_{i \in I} x_i}{n} - y_j + \frac{\sum_{i \in I} x_i}{n} + \frac{\sum_{j \in J} y_j}{m} = 0.$$

Thus, the bicluster A has zero H-value. Q.E.D.

Note. The theorem also covers biclusters that are the product of two vectors (with positive entries). Indeed, applying the logarithm to them produces biclusters that are represented as a sum.

4 MSR with Row Inversions

In the original definition of biclusters, it is possible to invert (negate) certain rows. Row inversion corresponds to a negative correlation, rather than the usual positive correlation, of the inverted rows with the other rows in the bicluster. Row inversion may result in a significant reduction of the bicluster MSR. In contrast to handling inversions algorithmically when adding rows (see [4]), we suggest embedding row inversion in the MSR definition as follows. We associate with each row its status, which is equal to -1 if the row is inverted and +1 otherwise.

Definition. The Mean Square Residue with row inversions is the minimum MSR over all possible row statuses.

Finding the optimal row status assignment is not a trivial problem. Since the MSR of a matrix does not change when a positive linear transformation is applied, we can show that there is a single global minimum of MSR among all possible status assignments. A greedy iterative method that changes the status of a row whenever the resulting MSR of the entire matrix decreases will find this minimum. Unfortunately, this greedy method is too slow to apply even once, while it would be better to apply it after each node deletion. Therefore, we suggest the following simple heuristic: iterating over the rows, determine for each row which total row squared residue is lower, that of the original row or that of the row with all values inverted (negated). The better choice is used as the row status. In our experiments, this heuristic always finds the optimal inversion status assignment.

5 Dual Biclustering

In this section, we give a brief overview of the dual biclustering problem and our algorithm that we described in [14].
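Before turning to the algorithm itself, the row-status heuristic of Section 4 can be illustrated on an ideal bicluster from the theorem of Section 3. The sketch below is only one possible reading of the description: it assumes complete data, recomputes the bicluster means after tentatively negating a row, and compares that row's total squared residue before and after the negation. The helper names `residues`, `msr`, and `assign_row_status` are hypothetical.

```python
import numpy as np

def residues(a):
    """Residue matrix r_ij = a_ij - a_iJ - a_Ij + a_IJ (complete data assumed)."""
    return a - a.mean(axis=1, keepdims=True) - a.mean(axis=0, keepdims=True) + a.mean()

def msr(a):
    """Mean squared residue H of the whole matrix."""
    return float((residues(a) ** 2).mean())

def assign_row_status(a):
    """One pass over the rows; keep a negation only if it lowers that row's squared residue."""
    work = np.array(a, dtype=float)              # working copy, rows may be negated
    status = np.ones(work.shape[0], dtype=int)   # +1 = original, -1 = inverted
    for i in range(work.shape[0]):
        keep = float((residues(work)[i] ** 2).sum())
        work[i] *= -1                            # tentatively invert row i
        flip = float((residues(work)[i] ** 2).sum())
        if flip < keep:
            status[i] = -1                       # inversion helps, keep it
        else:
            work[i] *= -1                        # undo the inversion
    return status, work

# Ideal bicluster a_ij = x_i + y_j with one row negated (toy values).
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([0.0, 5.0, 10.0])
ideal = x[:, None] + y[None, :]
mixed = ideal.copy()
mixed[2] *= -1

status, restored = assign_row_status(mixed)
```

On this toy input the heuristic marks only the negated row with status -1 and brings the MSR back to zero. Note that negating every row leaves the MSR unchanged, so in general the status vector is determined only up to a global sign.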
We formulate the dual biclustering problem as follows: given an expression matrix A, find a k × l bicluster with the smallest mean squared residue H. For a set of biclusters, we have:

Given: matrix $A_{n \times m}$, set of bicluster sizes S, total overlapping V.
Find: |S| biclusters with total overlapping at most V and minimum total sum of scores H.

This algorithm implements the new computation of MSR, which ignores missing data. The algorithm uses only the data that are actually present. The greedy algorithm for finding a bicluster may start with the entire matrix and at each step try all single row (column) additions (deletions), applying the best operation if it improves the score and terminating when it reaches the bicluster size k × l. The output bicluster will have the smallest MSR found by the greedy search for the given size. As in [4], the algorithm uses the structure of the mean residue score to enable faster greedy steps: for a given threshold α, at each deletion iteration all rows (columns) for which $d(i) > \alpha H(I,J)$ are removed. The algorithm also implements the addition of inverse rows to the matrix, allowing the identification of biclusters that contain co-regulation and inverse co-regulation. The single node deletion and addition algorithms are shown in Figure 1 and Figure 2, respectively.

Input: Expression matrix A on n genes and m conditions, and bicluster size (k, l).
Output: Bicluster $A_{I,J}$ with the smallest adjusted MSR.
Initialize: |I| = n, |J| = m, w(i, j) = 0 for all i ∈ n, j ∈ m.
Iteration:
1. Calculate $a_{iJ}$, $a_{Ij}$ and H(I, J) based on the adjusted MSR. If |I| = k and |J| = l, output I, J.
2. For each row calculate $d(i) = \frac{1}{|J_i|} \sum_{j \in J_i} RS_{IJ}(i, j)$.
3. For each column calculate $e(j) = \frac{1}{|I_j|} \sum_{i \in I_j} RS_{IJ}(i, j)$.
4. Take the best row or column and remove it from I or J.
Fig. 1. Single node deletion algorithm.

Input: Expression matrix A and bicluster size (k, l).
Output: Bicluster $A_{I',J'}$ with $I \subseteq I'$ and $J \subseteq J'$.
Iteration:
1. Calculate $a_{iJ}$, $a_{Ij}$ and H(I, J) based on the adjusted MSR.
2. Add the columns with $\frac{1}{|I_j|} \sum_{i \in I_j} RS_{IJ}(i, j) \le H(I, J)$.
3. Calculate $a_{iJ}$, $a_{Ij}$ and H(I, J) based on the adjusted MSR.
4. Add the rows with $\frac{1}{|J_i|} \sum_{j \in J_i} RS_{IJ}(i, j) \le H(I, J)$.
5. If nothing was added or |I| = k, |J| = l, halt.
Fig. 2. Single node addition algorithm.

This algorithm is used as a subroutine and is repeatedly applied to the matrix. We use bicluster overlapping control (BOC) to avoid finding the same bicluster over and over again. A penalty is applied for using cells that are present in previously found biclusters. With BOC, the original data keep the information they carry, because we do not mask biclusters with random numbers.
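The single node deletion loop of Figure 1 can be sketched as follows. This is a simplified illustration under several assumptions that the text does not state explicitly: missing entries are encoded as NaN, $RS_{IJ}(i, j)$ is taken to be the squared adjusted residue, the "best" row or column is the one with the largest score, the adjusted H is the mean over present cells, and the weights w(i, j) used for overlapping control are omitted. The function names are hypothetical, and every row and column is assumed to keep at least one present value.

```python
import numpy as np

def adjusted_residues(a):
    """Residues using means over present (non-NaN) cells, as in equations (6)-(8)."""
    row_mean = np.nanmean(a, axis=1, keepdims=True)   # a_iJ over J_i
    col_mean = np.nanmean(a, axis=0, keepdims=True)   # a_Ij over I_j
    return a - row_mean - col_mean + np.nanmean(a)    # stays NaN where data are missing

def adjusted_msr(a):
    """Assumed adjusted H: mean squared residue over present cells only."""
    return float(np.nanmean(adjusted_residues(a) ** 2))

def single_node_deletion(a, k, l):
    """Remove one row or column at a time until the bicluster has size k x l."""
    rows = list(range(a.shape[0]))
    cols = list(range(a.shape[1]))
    while len(rows) > k or len(cols) > l:
        sq = adjusted_residues(a[np.ix_(rows, cols)]) ** 2
        d = np.nanmean(sq, axis=1)   # d(i): row score over present columns
        e = np.nanmean(sq, axis=0)   # e(j): column score over present rows
        can_row, can_col = len(rows) > k, len(cols) > l
        if can_row and (not can_col or d.max() >= e.max()):
            rows.pop(int(d.argmax()))    # drop the worst-fitting row
        else:
            cols.pop(int(e.argmax()))    # drop the worst-fitting column
    return rows, cols

# Toy matrix: four coherent rows (one with a missing value) and one outlier row.
data = np.array([[0.0, 10.0, 20.0],
                 [1.0, 11.0, np.nan],
                 [2.0, 12.0, 22.0],
                 [3.0, 13.0, 23.0],
                 [100.0, -100.0, 100.0]])
rows, cols = single_node_deletion(data, 4, 3)
```

In this toy run the outlier row has by far the largest score d(i) and is the one removed, while the row containing the missing value is kept without any random fill.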
The general biclustering scheme is outlined in Figure 3, where $w_{ij}$ is an element of the weights matrix W, A' is the data matrix resulting from node deletion on the original matrix A, and A'' is the matrix resulting from node addition on A'. We used the measure of bicluster overlapping, V, introduced in [14], which is the complement of the ratio of the number of distinct cells used in all found biclusters to the total area of all biclusters.
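Read literally, the overlapping measure described above is one minus the number of distinct cells divided by the summed bicluster areas. The sketch below follows that reading; it is an interpretation of the sentence above rather than the exact definition from [14], and the function name `overlap_measure` is hypothetical.

```python
def overlap_measure(biclusters):
    """V = 1 - (distinct cells covered) / (sum of bicluster areas).

    `biclusters` is a list of (row_indices, column_indices) pairs.
    """
    cells = set()
    total_area = 0
    for rows, cols in biclusters:
        total_area += len(rows) * len(cols)
        cells.update((i, j) for i in rows for j in cols)
    return 1.0 - len(cells) / total_area

# Two 2 x 2 biclusters sharing the single cell (1, 1): 7 distinct cells, total area 8.
v = overlap_measure([([0, 1], [0, 1]), ([1, 2], [1, 2])])
```

Under this reading, V is 0 when no cell is reused and grows toward 1 as the biclusters cover the same cells repeatedly.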
Input: Expression matrix A, parameter α and a set S of bicluster sizes.
Output: |S| biclusters in matrix A.
Iteration:
1. w(i, j) = 0 for all i ∈ n, j ∈ m.
2. while S not empty do
3. (k, l) = get first element from S
4. S = S − {(k, l)}
5. Apply multiple node deletion on A giving A'.
6. Apply node addition on A' giving A'' of size (k, l).
7. Store A'' and update W.
8. end.
Fig. 3. Dual biclustering algorithm.

6 MSR Minimization via Quadratic Program

We have defined dual biclustering as an optimization problem [6], [3] in [14]. We have also defined a quadratic program for biclustering in [14]. In this paper, we modify the QP of [14] by reformulating the objective and constraints in order to handle missing data. We define the dual biclustering formulation as an optimization problem [14]: for a given matrix $A_{n \times m}$, find the bicluster of bounded size (area) k × l with minimal mean squared residue. It can easily be seen that if the MSR is used directly as the QP objective, it will be of cubic form. Since a QP objective can contain only terms of degree at most two in the variables, the objective has to be defined in such a way that only quadratic terms are present. To meet this requirement, we simulated variable multiplication by addition, as described in [14].

6.1 Integer Quadratic Program

For a given normalized matrix $A_{n \times m}$ and bicluster size k × l, the Integer Quadratic Program is defined as follows:

Objective
$$\text{Minimize: } \frac{1}{|I||J|} \sum_{i \in n, j \in m} (residue_{ij})^2$$

Subject to
$$|I| = k, \quad |J| = l$$
$$residue_{ij} = a_{ij} x_{ij} - a_{iJ} x_{ij} - a_{Ij} x_{ij} + a_{IJ} x_{ij}$$
$$a_{iJ} = \frac{1}{|J|} \sum_{j \in m} a_{ij}, \quad a_{Ij} = \frac{1}{|I|} \sum_{i \in n} a_{ij} \quad \text{and} \quad a_{IJ} = \frac{1}{|I||J|} \sum_{i \in n, j \in m} a_{ij}$$
$$x_{ij} \ge row_i + column_j - 1$$
$$x_{ij} \le row_i$$
$$x_{ij} \le column_j$$
$$\sum_{i \in n} row_i = k$$
$$\sum_{j \in m} column_j = l$$
$$x_{ij}, row_i, column_j \in \{0, 1\}$$
End

The QP is used as a subroutine and is repeatedly applied to the matrix. For each bicluster size, we generate a separate QP. In order to avoid finding the same bicluster over and over again, the discovered bicluster is masked by replacing the values of its submatrix with random values. Row inversion is simulated by adding to the input matrix A its inverted rows. The resulting matrix has twice as many rows. Missing data are handled in the following way: if an element of the matrix contains a missing value, then it does not participate in the computation of the mean squared residue H. In this case, the row mean $a_{iJ}$ is equal to the sum of all cells in row i that are not marked as missing, divided by their number. The column mean $a_{Ij}$ and the bicluster average $a_{IJ}$ are computed similarly. Since the integer QP is too slow and not scalable enough, we used the greedy rounding and random interval rounding methods proposed in [14].

6.2 Combining Dual Biclustering with Rounded QP

Input: Expression matrix A, parameters α, ratio_k, ratio_l and a set of bicluster sizes S.
Output: |S| biclusters in matrix A.
1. while S not empty do
2. (k, l) = get first element from S
3. S = S − {(k, l)}
4. k' = k · ratio_k
5. l' = l · ratio_l
6. Apply multiple node deletion on A giving A'.
7. Apply node addition on A' giving A'' of size (k', l').
8. Update W.
9. Run QP on A'' giving a bicluster of size (k, l).
10. Round the fractional relaxation and store the resulting bicluster.
11. end.
Fig. 4. Combined Adjusted Dual Biclustering with Rounded QP algorithm.

In this section, we combine the adjusted dual biclustering with the modified rounded QP algorithm. Our goal here is to reduce the instance size in order to speed up the QP. First, we apply the adjusted dual biclustering algorithm to the input matrix A to reduce the instance size, where the new size is specified by two parameters: ratio_k and ratio_l. Then, we run the rounded QP on the output obtained from the dual biclustering algorithm.
This combination improves the running time of the QP and increases the quality of the final bicluster, since an optimization method is applied. The general algorithm scheme is outlined in Figure 4, where W is the weights matrix, A' is the matrix resulting from node deletion and A'' is the matrix resulting from node addition.

7 Experimental Results

In this section, we analyze the results obtained from dual biclustering with the MSR adjusted for missing data. We describe the comparison criteria, define the swap rule model and analyze the p-value of the biclusters. We tested our biclustering algorithms on data from [10] and compared our results with Cheng and Church [4]. For a fair comparison, we used the bicluster sizes published in [4]. A systematic comparison and evaluation of biclustering methods for gene expression data is given in [17]. However, their model uses biologically relevant information, whereas our model is more generic and based on a statistical approach. Therefore, we have not used their comparison results in this paper.

7.1 Evaluation of the adjusted MSR

To measure the robustness of the proposed MSR to noise and to evaluate the quality of the obtained biclusters, the experiments were run on imputed data. Let A be an (I, J) bicluster with zero H-value, and let $\sigma^2$ be the variation of the real data. The corresponding imputed bicluster $A^p$ is defined by the following equation:

$$a^p_{ij} = a_{ij} + \varepsilon_{ij}, \tag{9}$$

where p is the percentage of added noise and $\{\varepsilon_{ij}\}_{i \in I, j \in J} \sim N(0, \frac{p}{100} \sigma^2)$.

7.2 The goal of our experiments

The goal of our experiments is to find the percentage of noise at which the algorithm is still able to distinguish a bicluster of size k from non-biclusters in the imputed data. Although one could determine such a percentage with respect to submatrices of the bicluster, the probability of having a distinguishable submatrix when the bicluster itself can no longer be distinguished from a non-bicluster tends to zero, because the error is imputed uniformly over the bicluster.
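The noise model of equation (9) can be sketched as below. The sketch makes several assumptions for illustration: the ideal bicluster is generated synthetically as $a_{ij} = x_i + y_j$, the value `sigma2 = 4.0` is an arbitrary stand-in for the variation of the real data, no data are missing, and the names `msr` and `impute_noise` are hypothetical. It is not a reproduction of the experiments reported in this section.

```python
import numpy as np

def msr(a):
    """Mean squared residue H of a complete matrix."""
    res = a - a.mean(axis=1, keepdims=True) - a.mean(axis=0, keepdims=True) + a.mean()
    return float((res ** 2).mean())

def impute_noise(ideal, p, sigma2, rng):
    """Equation (9): add independent N(0, p/100 * sigma2) noise to every cell."""
    scale = np.sqrt(p / 100.0 * sigma2)   # standard deviation of the noise
    return ideal + rng.normal(0.0, scale, size=ideal.shape)

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 10.0, size=50)
y = rng.uniform(0.0, 10.0, size=40)
ideal = x[:, None] + y[None, :]    # ideal bicluster, H = 0 up to rounding
sigma2 = 4.0                       # assumed variance of the real data

# H of the imputed bicluster for a few noise percentages p.
h_by_noise = {p: msr(impute_noise(ideal, p, sigma2, rng)) for p in (0, 10, 30)}
```

For independent noise of variance v on an n × m ideal bicluster, the expected H is v(n − 1)(m − 1)/(nm), so in this sketch H grows roughly linearly with the noise percentage p.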
7.3 Experimental results

Figure 5 compares Cheng and Church, dual biclustering, dual biclustering coupled with QP, adjusted dual biclustering, adjusted dual biclustering coupled with QP, and adjusted dual biclustering with row inversion. For adjusted dual biclustering with QP, the average MSR is 68 percent and the median MSR is 48 percent of the corresponding values published in [4]. These results show that ignoring missing data in the dual algorithm gives a much smaller MSR. The effect of noise on the MSR computation using synthesized data can be seen in Figure 6.
Algorithm                          OC parameter   Covering   Average MSR   (%)      Median MSR   (%)
Cheng and Church*                  n/a            39945      204.29        100      196.3095     100
Cheng and Church**                 n/a            39945      228.56        112      204.96       105
Dual                               1.8            40548      205.77        100.72   123.27       62.79
Dual and QP                        1.7            41037      171.5         75.02    104.47       47.91
Adjusted Dual                      1.8            40548      161.23        70.54    104.66       51.1
Adjusted Dual and QP               1.7            41087      154.66        68       95.46        47
Adjusted Dual with inverted rows   1.6            43028      195.9         95       77.96        39.71
Fig. 5. Comparison of biclustering methods.

[Figure 6: plot "Noise vs. MSR"; x-axis: Noise (%), from 0% to 30%; y-axis: MSR; series: 0%, 5%, 10% and 15% missing data.]
Fig. 6. MSR computation for synthesized data.

Figure 7 shows the effect of noise on the adjusted MSR computation compared with randomly filled missing data. It is easy to see that the adjusted MSR is less affected by noise than the MSR with randomly filled missing data. Figure 8 shows how noise affects the adjusted MSR and the MSR with randomly filled missing data for different levels of noise.

[Figure 7: plot "Missing Data vs. Random Missing Data"; x-axis: Noise (%), from 0% to 70%; y-axis: MSR; series: 10% missing data, 10% random missing data.]
Fig. 7. Adjusted MSR vs. random filled missing data.

[Figure 8: plot "Missing Data vs. Random Missing Data"; x-axis: Noise (%), from 0% to 70%; y-axis: MSR; series: 0% missing data, 10% missing data, 10% random data.]
Fig. 8. MSR random filled missing data for different levels of noise.

We measure the statistical significance of the biclusters obtained by our algorithms using the p-value. The p-value is computed by running the Dual Problem algorithm on 100 randomly generated input data sets. The random data are obtained from matrix A by randomly selecting two cells in the matrix ($a_{ij}$, $d_{kl}$) and taking their diagonal elements ($b_{kj}$, $c_{il}$). If $a_{ij} > b_{kj}$ and $c_{il} < d_{kl}$, the algorithm swaps $a_{ij}$ with $c_{il}$ and $b_{kj}$ with $d_{kl}$; this is called a hit. If not, two elements $a_{ij}$ and $d_{kl}$ are randomly chosen again. The matrix is considered randomized if there are $\frac{nm}{2}$ hits. In our case, the p-value is smaller than 0.001, which indicates that the results are not random and are statistically significant.

8 Conclusions

Random numbers can interfere with the discovery of future biclusters, especially those that overlap with the biclusters already discovered. In this paper, we introduce a new approach to handling missing data that does not take into account entries with missing data. We have characterized ideal biclusters, i.e., biclusters with zero mean squared residue, and shown that this approach is significantly more stable with respect to increasing noise. Several biclustering methods have been modified accordingly. Our experimental results show a significant decrease in the H value of the biclusters, with reduced noise, when compared with their counterparts (e.g., the original Cheng and Church [4] method). For adjusted dual biclustering with QP, the average MSR is 68 percent and the median MSR is 48 percent of the corresponding values published in [4]. These results show that ignoring missing data in the dual algorithm gives a much smaller MSR. We also define the MSR based on the best row inversion status and give an efficient heuristic for finding such an assignment. This new definition allows a further reduction of the MSR for a found set of biclusters.
References

1. Angiulli F., Pizzuti C., Gene Expression Biclustering using Random Walk Strategies. In Proceedings of the 7th International Conference on Data Warehousing and Knowledge Discovery (DaWaK 2005), Copenhagen, Denmark, 2005.
2. Baldi P. and Hatfield G.W., DNA Microarrays and Gene Expression. From Experiments to Data Analysis and Modelling, Cambridge Univ. Press, 2002.
3. Bertsimas D., Tsitsiklis J., Introduction to Linear Optimization, Athena Scientific.
4. Cheng Y., Church G.M., Biclustering of Expression Data. In Proceedings of the Eighth International Conference on Intelligent Systems for Molecular Biology (ISMB), AAAI Press, 93-103, 2000.
5. Madeira S.C., Oliveira A.L., Biclustering Algorithms for Biological Data Analysis: A Survey, IEEE Transactions on Computational Biology and Bioinformatics, 1(1):24-45, 2004.
6. Papadimitriou C.H., Steiglitz K., Combinatorial Optimization: Algorithms and Complexity, Prentice-Hall, Inc., Upper Saddle River, NJ, 1982.
7. Prelic A., Bleuler S., Zimmermann P., Wille A., Bühlmann P., Gruissem W., Hennig L., Thiele L., Zitzler E., A Systematic Comparison and Evaluation of Biclustering Methods for Gene Expression Data. Bioinformatics, 22(9):1122-1129, 2006.
8. Shamir R., Lecture notes, http://www.cs.tau.ac.il/~rshamir/ge/05/scribes/lec04.pdf.
9. Tanay A., Sharan R. and Shamir R., Discovering Statistically Significant Biclusters in Gene Expression Data, Bioinformatics, 18:136-144, 2002.
10. Tavazoie S., Hughes J.D., Campbell M.J., Cho R.J., and Church G.M., Systematic determination of genetic network architecture. Nature Genetics, 22:281-285, 1999.
11. Yang J., Wang H., Wang W., and Yu P., Enhanced biclustering on gene expression data, Proceedings of the 3rd IEEE Conference on Bioinformatics and Bioengineering (BIBE), pp. 321-327, 2003.
12. Zhang Y., Zha H., Chu C.H., A time-series biclustering algorithm for revealing co-regulated genes. In Proc. Int. Symp. Information and Technology: Coding and Computing (ITCC 2005), pp. 32-37, Las Vegas, USA, 2005.
13. Zhou J., Khokhar A.A., ParRescue: Scalable Parallel Algorithm and Implementation for Biclustering over Large Distributed Datasets, 26th IEEE International Conference on Distributed Computing Systems (ICDCS 2006), 2006.
14. Gremalschi S. and Altun G., Mean Squared Residue Based Biclustering Algorithms, Proceedings of International Symposium on Bioinformatics Research and Applications (ISBRA 08), Springer LNBI (Lecture Notes in Computer Science) 4983:232-243, 2008.
15. Divina F., Aguilar-Ruiz J., Biclustering of Expression Data with Evolutionary Computation. IEEE Transactions on Knowledge and Data Engineering, 18(5):590-602, May 2006.
16. Yang J., Wang W., Wang H. and Yu P.S., Enhanced biclustering on expression data. In Proceedings of the 3rd IEEE Conference on Bioinformatics and Bioengineering (BIBE 2003), 321-327, 2003.
17. Prelic A., Bleuler S., Zimmermann P., Wille A., Bühlmann P., Gruissem W., Hennig L., Thiele L., and Zitzler E., A Systematic Comparison and Evaluation of Biclustering Methods for Gene Expression Data. Bioinformatics, 22(9):1122-1129, 2006.
18. Xiao J., Wang L., Liu X., Jiang T., An Efficient Voting Algorithm for Finding Additive Biclusters with Random Background. Journal of Computational Biology, 15(10):1275-1293, 2008.
19. Liu X., Wang L., Computing the maximum similarity bi-clusters of gene expression data. Bioinformatics, 23(1):50-56, 2007.