Mean Square Residue Biclustering with Missing Data and Row Inversions
Stefan Gremalschi (a), Gulsah Altun (b), Irina Astrovskaya (a), and Alexander Zelikovsky (a)

(a) Department of Computer Science, Georgia State University, Atlanta, GA, {stefan,iraa,alexz}@cs.gsu.edu
(b) Department of Reproductive Medicine, University of California, San Diego, CA 92093, galtun@ucsd.edu

Abstract. Cheng and Church proposed a greedy deletion-addition algorithm that finds a given number k of biclusters whose mean squared residues (MSRs) are below certain thresholds, with missing values in the matrix replaced by random numbers. In our previous paper we introduced the dual biclustering method with quadratic optimization; in this paper, we extend that method to missing data and row inversions, adding three new features. First, we introduce a row status for each row of a bicluster and both add and delete rows based on their status in order to minimize the MSR. We compare our results with Cheng and Church's approach, which inverts rows only when adding them to biclusters; we select the row or its negation not only at addition but also at deletion, and show an improvement. Second, we give a proof of the theorem stated by Cheng and Church in [4]. Third, since missing data often occur in the matrices given for biclustering and are usually filled with random numbers, we show that ignoring the missing data is a better approach, which avoids the additional noise caused by randomness. Since an ideal bicluster is a bicluster with an H value of zero, our results, which show a significant decrease in the H value of the found biclusters, indicate less noise compared with the original dual biclustering and with the Cheng and Church method.

Keywords: biclustering, mean square residue

1 Introduction

Gene expression data are given as matrices in which rows represent genes and columns represent experimental conditions.
Each cell in the matrix represents the expression level of a gene under a specific experimental condition. It is well known that genes can be relevant only for a subset of conditions; conversely, groups of conditions can be clustered using different groups of genes. It is therefore important to cluster in these two dimensions simultaneously. (This work was partially supported by the GSU Molecular Basis of Disease Fellowship.) This led to the discovery of
biclusters, i.e., subsets of genes and subsets of conditions with a high similarity score, by Cheng and Church [4]. Biclustering algorithms perform simultaneous row-column clustering; their goal is to find homogeneous submatrices. Biclustering has been widely used to find appropriate subsets of experimental conditions in microarray data [1, 5, 15, 7, 9, 11-13, 18, 19]. Cheng and Church's algorithm is based on a natural uniformity model, the mean squared residue. They proposed a greedy deletion-addition algorithm that finds a given number k of biclusters whose mean squared residues (MSRs) are below certain thresholds. In their method, however, missing values in the matrix are replaced with random numbers. These random numbers can interfere with the discovery of future biclusters, especially ones that overlap the already discovered biclusters. Yang et al. [16, 15] refer to this as random interference. They generalize the bicluster model to incorporate missing values and propose a probabilistic move-based algorithm, FLOC (FLexible Overlapped biClustering), which generalizes the concept of mean squared residue and is based on the notions of action and gain. However, the FLOC model is still not suitable for non-disjoint clusters, and it requires more user parameters, including the number of biclusters; these additional parameters can negatively affect the clustering process. In this paper, we propose a similar method to handle missing data. We first mathematically characterize general ideal biclusters, i.e., biclusters with zero mean squared residue. We show that the new way of handling missing data is significantly more tolerant to noise. We also introduce a status for each row: status -1 means that the corresponding row is inverted (negated), and status +1 means that the original row is not inverted.
We consider the problem of finding the minimum MSR over all possible row inversions. A limited use of row inversion (without an explicit row status) was applied in [4] when rows are added to biclusters. Based on our findings in [14], we developed a new dual biclustering algorithm and quadratic program that treat missing data accordingly and use the best status assignment. Matrix entries with missing data are not taken into account when computing averages. Comparing our method with Cheng and Church [4], we show that it is better to ignore missing data when adjusting the mean squared residue (MSR) value for finding optimal biclusters. We use a set of methods that includes the dual biclustering algorithm, a quadratic program (QP), and a combination of dual biclustering with the QP, which finds a (k x l)-bicluster with minimum MSR using the greedy approach proposed in [14]. Finally, we apply the best row status assignments and obtain an even better average and median MSR over the set of all found biclusters. The remainder of this paper is organized as follows. Section 2 gives the formal definition of mean squared residue. In Section 3, we give a new definition of the adjusted MSR and prove a necessary and sufficient criterion for a matrix to have perfect correlation. Section 4 defines the inversion-based MSR and shows how to compute it. In Section 5, we introduce the dual problem formulation described in [14] and compare the new adjusted MSR with Cheng and Church's method. The search
of biclusters using the new MSR is given in Section 6. The analysis and validation of the experimental study is given in Section 7. Finally, we draw conclusions in Section 8.

2 Mean Squared Residue

The mean squared residue problem was defined by Cheng and Church [4] and Zhou and Khokhar [13]. In this paper, we use the same terminology as in [13]; this section gives a brief introduction to the terminology as given in [14]. Our input is an (N x M) data matrix A, where a cell a_{ij} is a real value that represents the expression level of gene i (row i) under condition j (column j). Matrix A is defined by its set of rows R = {r_1, r_2, ..., r_N} and its set of columns C = {c_1, c_2, ..., c_M}. Given such a matrix, biclustering finds submatrices, i.e., subgroups of rows (genes) and subgroups of columns (conditions), in which the genes exhibit highly correlated behavior over every condition. Given a data matrix A, the goal is to find a set of biclusters such that each bicluster exhibits some similar characteristic. Let A_{IJ} = (I, J) denote the submatrix of A (I ⊆ R and J ⊆ C) containing only the elements a_{ij} with rows in I and columns in J. A bicluster A_{IJ} = (I, J) is thus a k-by-l submatrix of the data matrix, where k and l are the numbers of rows and columns of A_{IJ}. The concept of a bicluster was introduced in [4] to find correlated subsets of genes and subsets of conditions. Let a_{iJ} denote the mean of the i-th row of the bicluster (I, J), a_{Ij} the mean of the j-th column of (I, J), and a_{IJ} the mean of all the elements in the bicluster. As given in [4], more formally,

a_{iJ} = \frac{1}{|J|} \sum_{j \in J} a_{ij}, \quad i \in I,   (1)

a_{Ij} = \frac{1}{|I|} \sum_{i \in I} a_{ij}, \quad j \in J,   (2)

a_{IJ} = \frac{1}{|I||J|} \sum_{i \in I, j \in J} a_{ij}.   (3)

According to [4], the residue of an element a_{ij} in a submatrix A_{IJ} equals

r_{ij} = a_{ij} - a_{iJ} - a_{Ij} + a_{IJ}.   (4)

The residue of an element is the difference between its actual value and its expected value predicted from its row, column, and bicluster means; it reveals the element's degree of coherence with the other entries of the bicluster it belongs to. The quality of a bicluster can be evaluated by computing its mean squared residue H, i.e., the mean of all the squared residues of its elements [4]:

H(I, J) = \frac{1}{|I||J|} \sum_{i \in I, j \in J} (a_{ij} - a_{iJ} - a_{Ij} + a_{IJ})^2.   (5)
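As a concrete illustration of equations (1)-(5), the residue matrix and the score H can be computed directly. The following is a minimal sketch (the function name `msr` and the sample data are ours, not from the paper):

```python
import numpy as np

def msr(A):
    """Mean squared residue H(I, J) of a bicluster given as a 2-D array, eq. (5)."""
    row_means = A.mean(axis=1, keepdims=True)    # a_iJ, eq. (1)
    col_means = A.mean(axis=0, keepdims=True)    # a_Ij, eq. (2)
    total_mean = A.mean()                        # a_IJ, eq. (3)
    residues = A - row_means - col_means + total_mean   # r_ij, eq. (4)
    return float((residues ** 2).mean())

# An additive bicluster a_ij = x_i + y_j is ideal (H = 0), while noise raises H.
x = np.array([[0.0], [1.0], [3.0]])
y = np.array([[2.0, 5.0, 1.0, 4.0]])
ideal = x + y
noisy = ideal + np.random.default_rng(0).normal(0.0, 0.1, ideal.shape)
print(msr(ideal))   # ~0, up to floating-point rounding
print(msr(noisy))   # strictly positive
```

The additive matrix scoring H = 0 here is exactly the characterization of ideal biclusters proved in Section 3.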
A submatrix A_{IJ} is called a δ-bicluster if H(I, J) ≤ δ for some given threshold δ ≥ 0. In general, the biclustering problem can be formulated bilaterally: maximize the size (area) of the biclusters and minimize the MSR. These two objectives contradict each other, because smaller biclusters tend to have smaller MSR and vice versa; therefore, there are two optimization problem formulations. Cheng and Church considered the following one: maximize the bicluster size (area) subject to an upper bound on the MSR.

3 Adjusting MSR for Missing Data

Missing data often occur in biological data. Common practice is to fill the gaps with random numbers; however, this adds noise and may result in biclusters of lower quality. The alternative approach is to ignore missing data, keeping only the originally available information. Let A be a bicluster (I, J). Denote by J_i ⊆ J the bicluster's columns without missing data in the i-th row, and by I_j ⊆ I the rows without missing data in the j-th column. Then the mean of the i-th row of the bicluster, the mean of the j-th column, and the mean of all the elements in the bicluster are reformulated as follows in equations (6), (7) and (8):

a_{iJ} = \frac{1}{|J_i|} \sum_{j \in J_i} a_{ij}, \quad i \in I,   (6)

a_{Ij} = \frac{1}{|I_j|} \sum_{i \in I_j} a_{ij}, \quad j \in J,   (7)

a_{IJ} = \frac{1}{\sum_{j \in J} |I_j|} \sum_{j \in J, \, i \in I_j} a_{ij}.   (8)

To compare this approach with Cheng and Church's way of handling missing data, biclusters with zero H value were used. A bicluster with H = 0 is called an ideal bicluster.

Theorem. Let an n x m matrix A be a bicluster (I, J). Then A has zero H value if and only if A can be represented as a sum of an n-vector X and an m-vector Y, i.e., a_{ij} = x_i + y_j for all i ∈ I, j ∈ J.

Proof. First, assume that A is an n x m bicluster (I, J) with zero H value; we show that A can be represented as the sum above. Zero H value means zero residues r_{ij} for all i ∈ I, j ∈ J, so each element of A can be written as a_{ij} = a_{iJ} + a_{Ij} - a_{IJ}. Denoting X = \{x_i = a_{iJ} - a_{IJ}/2\}_{i \in I} and Y = \{y_j = a_{Ij} - a_{IJ}/2\}_{j \in J}, we get x_i + y_j = a_{iJ} + a_{Ij} - a_{IJ} = a_{ij}, i.e., A = X + Y with the vector addition defined as a_{ij} = x_i + y_j.

In the other direction, assume that the bicluster A can be represented as a sum of an n-vector X and an m-vector Y; we show that A has zero H value. Since a_{ij} = x_i + y_j for i ∈ I, j ∈ J, the mean of the i-th row is a_{iJ} = x_i + \frac{1}{m}\sum_{j \in J} y_j, the mean of the j-th column is a_{Ij} = \frac{1}{n}\sum_{i \in I} x_i + y_j, and the mean of all the elements in the bicluster is a_{IJ} = \frac{1}{n}\sum_{i \in I} x_i + \frac{1}{m}\sum_{j \in J} y_j. The residues are then equal to zero. Indeed,

r_{ij} = (x_i + y_j) - \left(x_i + \frac{1}{m}\sum_{j \in J} y_j\right) - \left(\frac{1}{n}\sum_{i \in I} x_i + y_j\right) + \left(\frac{1}{n}\sum_{i \in I} x_i + \frac{1}{m}\sum_{j \in J} y_j\right) = 0.

Thus, the bicluster A has zero H value. Q.E.D.

Note. The theorem also covers biclusters that are products of two vectors: applying a logarithm to such a bicluster produces one that is represented as a sum.

4 MSR with Row Inversions

In the original definition of biclusters, it is possible to invert (negate) certain rows. Row inversion corresponds to negative rather than the usual positive correlation of the inverted rows with the other rows in the bicluster, and it may result in a significant reduction of the bicluster's MSR. In contrast to handling inversions algorithmically when adding rows (see [4]), we suggest embedding row inversion in the MSR definition as follows. We associate with each row a status, equal to -1 if the row is inverted and +1 otherwise.

Definition. The mean squared residue with row inversions is the minimum MSR over all possible row statuses.

Finding the optimal row status assignment is not a trivial problem. Since the MSR of a matrix does not change when a positive linear transformation is applied, we can show that there is a single global minimum of MSR among all possible status assignments. A greedy iterative method that changes the status of a row whenever doing so decreases the MSR of the entire matrix will find such a minimum. Unfortunately, this greedy method is too slow to apply even once, whereas ideally it would be applied after each node deletion. Therefore, we suggest the following simple heuristic: iterating over the rows, determine for each row whether its total squared residue is lower in its original or in its inverted (negated) form, and use the better choice as the row's status. In our experiments, this heuristic always found the optimal inversion status assignment.

5 Dual Biclustering

In this section, we give a brief overview of the dual biclustering problem and the algorithm that we described in [14].
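The row-status heuristic of Section 4 can be sketched as follows; this is an illustrative implementation under our own naming (`best_row_statuses` is not from the paper), which flips a row's status whenever its negation lowers the matrix MSR:

```python
import numpy as np

def msr(A):
    # Mean squared residue H(I, J), eq. (5)
    r = A - A.mean(axis=1, keepdims=True) - A.mean(axis=0, keepdims=True) + A.mean()
    return float((r ** 2).mean())

def best_row_statuses(A, max_iters=10):
    """Greedy heuristic (Section 4): set a row's status to -1 (inverted)
    whenever negating the row strictly lowers the overall MSR; repeat until stable."""
    A = A.copy()
    status = np.ones(A.shape[0], dtype=int)
    for _ in range(max_iters):
        changed = False
        for i in range(A.shape[0]):
            flipped = A.copy()
            flipped[i] = -flipped[i]
            # Small tolerance so that MSR-neutral flips are not applied.
            if msr(flipped) < msr(A) - 1e-12:
                A, status[i], changed = flipped, -status[i], True
        if not changed:
            break
    return status, msr(A)

# Two positively correlated rows plus a negated copy of the first row:
base = np.array([[1.0, 2.0, 4.0], [2.0, 3.0, 5.0]])
mat = np.vstack([base, -base[0]])
status, h = best_row_statuses(mat)
print(status, h)   # the third row gets status -1, yielding an ideal bicluster
```

In this small example a single pass already recovers the optimal assignment, consistent with the observation in Section 4 that the heuristic found the optimum in all our experiments.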
We formulate the dual biclustering problem as follows: given an expression matrix A, find the k x l bicluster with the smallest mean squared residue H. For a set of biclusters, we have:

Given: matrix A_{n x m}, set of bicluster sizes S, total overlapping V.
Find: |S| biclusters with total overlapping at most V and minimum total sum of scores H.

This algorithm implements the new computation of the MSR, which ignores missing data and uses only the data actually present. The greedy algorithm for finding a bicluster may start with the entire matrix and at each step try all single row (column) additions (deletions), applying the best operation if it improves the score and terminating when the bicluster reaches size k x l. The output bicluster will have the smallest MSR found for the given size. As in [4], the algorithm uses the structure of the mean squared residue score to enable faster greedy steps: for a given threshold α, at each deletion iteration all rows (columns) for which d(i) > αH(I, J) are removed. The algorithm also implements the addition of inverted rows to the matrix, allowing the identification of biclusters that contain both co-regulation and inverse co-regulation. The single node deletion and addition algorithms are shown in Figure 1 and Figure 2, respectively.

Input: expression matrix A on n genes and m conditions, and bicluster size (k, l).
Output: bicluster A_{IJ} with the smallest adjusted MSR.
Initialize: I = {1, ..., n}, J = {1, ..., m}, w(i, j) = 0 for all i ≤ n, j ≤ m.
Iteration:
1. Calculate a_{iJ}, a_{Ij} and H(I, J) based on the adjusted MSR. If |I| = k and |J| = l, output (I, J).
2. For each row, calculate d(i) = (1/|J_i|) Σ_{j ∈ J_i} RS_{IJ}(i, j).
3. For each column, calculate e(j) = (1/|I_j|) Σ_{i ∈ I_j} RS_{IJ}(i, j).
4. Take the best row or column and remove it from I or J.

Fig. 1. Single node deletion algorithm.

Input: expression matrix A and bicluster size (k, l).
Output: bicluster A_{I'J'} with I ⊆ I' and J ⊆ J'.
Iteration:
1. Calculate a_{iJ}, a_{Ij} and H(I, J) based on the adjusted MSR.
2. Add the columns j with (1/|I_j|) Σ_{i ∈ I_j} RS_{IJ}(i, j) ≤ H(I, J).
3. Recalculate a_{iJ}, a_{Ij} and H(I, J) based on the adjusted MSR.
4. Add the rows i with (1/|J_i|) Σ_{j ∈ J_i} RS_{IJ}(i, j) ≤ H(I, J).
5. If nothing was added or |I| = k and |J| = l, halt.

Fig. 2. Single node addition algorithm.

This algorithm is used as a subroutine and repeatedly applied to the matrix. We use bicluster overlapping control (BOC) to avoid finding the same bicluster over and over again: a penalty is applied for using cells present in previously found biclusters. With BOC, the original data do not lose the information they carry, because we do not mask found biclusters with random numbers.
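A compact sketch of the adjusted score and the deletion step, with missing values encoded as NaN (the names `adjusted_msr` and `greedy_delete` are ours; this illustrates Fig. 1 and Section 3, and is not the authors' code):

```python
import numpy as np

def adjusted_msr(A):
    """Adjusted MSR (Section 3): the means of eqs. (6)-(8) skip missing entries (NaN)."""
    row_means = np.nanmean(A, axis=1, keepdims=True)
    col_means = np.nanmean(A, axis=0, keepdims=True)
    total_mean = np.nanmean(A)
    residues = A - row_means - col_means + total_mean
    return float(np.nanmean(residues ** 2))

def greedy_delete(A, k, l):
    """Greedy single node deletion (Fig. 1): repeatedly drop the row or
    column with the largest mean squared residue until the bicluster is k x l."""
    I, J = list(range(A.shape[0])), list(range(A.shape[1]))
    while len(I) > k or len(J) > l:
        sub = A[np.ix_(I, J)]
        res = (sub - np.nanmean(sub, axis=1, keepdims=True)
                   - np.nanmean(sub, axis=0, keepdims=True) + np.nanmean(sub)) ** 2
        d = np.nanmean(res, axis=1)      # per-row score d(i)
        e = np.nanmean(res, axis=0)      # per-column score e(j)
        i, j = int(np.nanargmax(d)), int(np.nanargmax(e))
        if len(I) > k and (len(J) <= l or d[i] >= e[j]):
            I.pop(i)
        else:
            J.pop(j)
    return I, J

# Additive 4 x 5 bicluster with one missing entry, plus one incoherent row:
x = np.array([2.0, 1.0, 2.0, 3.0])
y = np.array([3.0, 1.0, 2.0, 3.0, 6.0])
base = x[:, None] + y[None, :]
base[0, 0] = np.nan
A = np.vstack([base, [[30.0, -20.0, 10.0, 0.0, 25.0]]])
rows, cols = greedy_delete(A, 4, 5)   # deletes the incoherent last row
```

Note that the missing entry simply drops out of every mean, so the remaining additive submatrix still scores an adjusted MSR of zero.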
The general biclustering scheme is outlined in Figure 3, where w_{ij} is an element of the weight matrix W, A' is the matrix resulting from node deletion on the original matrix A, and A'' is the matrix resulting from node addition on A'. We use the measure of bicluster overlapping V introduced in [14], which is the complement of the ratio of the number of distinct cells used in all found biclusters to the total area of all the biclusters.
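The overlapping measure V just described can be computed directly from the list of found biclusters; a minimal sketch (the function name `overlapping` is ours, not from the paper):

```python
def overlapping(biclusters):
    """V = 1 - (number of distinct covered cells) / (sum of bicluster areas).
    Each bicluster is a (rows, cols) pair of index collections."""
    distinct, total_area = set(), 0
    for rows, cols in biclusters:
        total_area += len(rows) * len(cols)
        distinct.update((i, j) for i in rows for j in cols)
    return 1.0 - len(distinct) / total_area

# Two 2 x 2 biclusters sharing exactly one cell: 8 cell slots, 7 distinct cells.
v = overlapping([((0, 1), (0, 1)), ((1, 2), (1, 2))])
print(v)  # 0.125
```

Disjoint biclusters give V = 0, and V grows toward 1 as the biclusters increasingly reuse the same cells.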
Input: expression matrix A, parameter α and a set S of bicluster sizes.
Output: |S| biclusters in matrix A.
Iteration:
1. w(i, j) = 0 for all i ≤ n, j ≤ m.
2. while S not empty do
3.   (k, l) = get first element from S
4.   S = S \ {(k, l)}
5.   Apply multiple node deletion on A giving (k, l).
6.   Apply node addition on A' giving (k, l).
7.   Store A'' and update W.
8. end.

Fig. 3. Dual biclustering algorithm.

6 MSR Minimization via Quadratic Program

We defined dual biclustering as an optimization problem [6], [3] in [14], where we also defined a quadratic program (QP) for biclustering. In this paper, we modify the QP of [14], reformulating the objective and constraints in order to handle missing data. The dual biclustering formulation as an optimization problem [14] is: for a given matrix A_{n x m}, find the bicluster of bounded size (area) k x l with minimal mean squared residue. It is easy to see that if the MSR were defined directly as the QP objective, it would be of cubic form. Since a QP objective can contain only squared variables, the objective must be defined so that only quadratic terms are present. To meet this requirement, we simulate variable multiplication by addition, as described in [14].

6.1 Integer Quadratic Program

For a given normalized matrix A_{n x m} and bicluster size k x l, the integer quadratic program is defined as follows:

Minimize: \frac{1}{|I||J|} \sum_{i \le n, j \le m} (residue_{ij})^2

Subject to:
  |I| = k, |J| = l,
  residue_{ij} = a_{ij} x_{ij} - a_{iJ} x_{ij} - a_{Ij} x_{ij} + a_{IJ} x_{ij},
  a_{iJ} = \frac{1}{|J|} \sum_{j \le m} a_{ij}, \quad a_{Ij} = \frac{1}{|I|} \sum_{i \le n} a_{ij}, \quad a_{IJ} = \frac{1}{|I||J|} \sum_{i \le n, j \le m} a_{ij},
  x_{ij} \ge row_i + column_j - 1,
  x_{ij} \le row_i,
  x_{ij} \le column_j,
  \sum_{i \le n} row_i = k,
  \sum_{j \le m} column_j = l,
  x_{ij}, row_i, column_j \in \{0, 1\}.
End

The QP is used as a subroutine and repeatedly applied to the matrix; for each bicluster size, we generate a separate QP. In order to avoid finding the same bicluster over and over again, a discovered bicluster is masked by replacing the values of its submatrix with random values. Row inversion is simulated by appending to the input matrix A its inverted rows, so that the resulting matrix has twice as many rows. Missing data are handled in the following way: if an element of the matrix contains a missing value, it does not participate in the computation of the mean squared residue H. In this case, the row mean a_{iJ} equals the sum of all cells in row i that are not marked as missing, divided by their number; similarly for the column mean a_{Ij} and the bicluster average a_{IJ}. Since the integer QP is too slow and not scalable enough, we used the greedy rounding and random interval rounding methods proposed in [14].

6.2 Combining Dual Biclustering with Rounded QP

Input: expression matrix A, parameters α, ratio_k, ratio_l and a set S of bicluster sizes.
Output: |S| biclusters in matrix A.
1. while S not empty do
2.   (k, l) = get first element from S
3.   S = S \ {(k, l)}
4.   k' = k × ratio_k
5.   l' = l × ratio_l
6.   Apply multiple node deletion on A giving (k', l').
7.   Apply node addition on A' giving (k', l').
8.   Update W.
9.   Run the QP on A'' giving (k, l).
10.  Round the fractional relaxation and store the result.
11. end.

Fig. 4. Combined adjusted dual biclustering with rounded QP algorithm.

In this section, we combine the adjusted dual biclustering with the modified rounded QP algorithm. The goal is to reduce the instance size in order to speed up the QP. First, we apply the adjusted dual biclustering algorithm to the input matrix A to reduce the instance size; the new size is specified by the two parameters ratio_k and ratio_l.
Then, we run the rounded QP on the output of the dual biclustering algorithm. This combination improves the running time of the QP and increases the quality of the final bicluster, since an optimization method is applied. The general algorithm scheme is outlined in Figure 4, where W is the weight matrix, A' is the matrix resulting from node deletion and A'' is the matrix resulting from node addition.

7 Experimental Results

In this section, we analyze the results obtained from dual biclustering with the MSR adjusted for missing data. We describe the comparison criteria, define the swap-rule randomization model and analyze the p-values of the biclusters. We tested our biclustering algorithms on the data from [10] and compared our results with Cheng and Church [4]; for a fair comparison, we used the bicluster sizes published in [4]. A systematic comparison and evaluation of biclustering methods for gene expression data is given in [17]. However, their model uses biologically relevant information, whereas our model is more generic and based on a statistical approach; therefore, we have not used their comparison results in this paper.

7.1 Evaluation of the adjusted MSR

To measure the robustness of the proposed MSR to noise and to evaluate the quality of the obtained biclusters, the experiments were run on imputed data. Let A be an (I, J) bicluster with zero H value and real-data variance σ². The corresponding imputed bicluster A^p is defined as

a^p_{ij} = a_{ij} + ε_{ij},   (9)

where p is the percentage of added noise and \{ε_{ij}\}_{i \in I, j \in J} \sim N(0, \frac{p}{100}σ²).

7.2 The goal of our experiments

The goal of our experiments is to find the percentage of noise at which the algorithm is still able to distinguish a bicluster of size k from non-biclusters in the imputed data. Although one could determine such a percentage with respect to submatrices of the bicluster, the probability of having a distinguishable submatrix when the bicluster itself can no longer be distinguished from a non-bicluster tends to zero, due to the uniformly distributed imputation of error.
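The imputation of equation (9) can be sketched as follows (the helper name `add_noise` is ours; H is computed as in equation (5)):

```python
import numpy as np

def add_noise(A, p, rng):
    """Impute p percent noise (eq. 9): add N(0, (p/100) * sigma^2) to every
    entry, where sigma^2 is the variance of the real data."""
    sigma2 = np.var(A)
    return A + rng.normal(0.0, np.sqrt(p / 100.0 * sigma2), A.shape)

def msr(A):   # H(I, J), eq. (5)
    r = A - A.mean(axis=1, keepdims=True) - A.mean(axis=0, keepdims=True) + A.mean()
    return float((r ** 2).mean())

rng = np.random.default_rng(42)
ideal = np.add.outer([0.0, 1.0, 3.0, 4.0], [2.0, 5.0, 1.0, 4.0, 0.0])  # H = 0
for p in (3, 5, 10, 20, 30):
    # H of the imputed bicluster grows with the noise percentage p.
    print(p, msr(add_noise(ideal, p, rng)))
```

Starting from an ideal bicluster (H = 0) makes any increase in H directly attributable to the injected noise, which is the setting used in Figures 6-8.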
7.3 Experimental results

Figure 5 compares Cheng and Church, dual biclustering, dual biclustering coupled with the QP, adjusted dual biclustering, adjusted dual biclustering coupled with the QP, and adjusted dual biclustering with row inversion. The average MSR for the adjusted dual algorithm with QP is 68 percent (average) and 48 percent (median) of the values published in [4]. These results show that ignoring missing data gives the dual algorithm a much smaller MSR. The effect of noise on the MSR computation, using synthesized data, can be seen in Figure 6.
[Fig. 5. Comparison of biclustering methods: OC parameter, covering, average MSR (%) and median MSR (%) for Cheng and Church*, Cheng and Church**, Dual, Dual and QP, Adjusted Dual, Adjusted Dual and QP, and Adjusted Dual with inverted rows.]

[Fig. 6. MSR computation for synthesized data: MSR vs. noise (3% to 30%) for 0%, 5%, 10% and 15% missing data.]

Figure 7 shows the effect of noise on the adjusted MSR computation versus randomly filled missing data. It is easy to see that the adjusted MSR is less affected by noise than randomly filled missing data. Figure 8 shows how noise affects the adjusted MSR and randomly filled missing data for different levels of noise.

[Fig. 7. Adjusted MSR vs. randomly filled missing data: MSR vs. noise (3% to 30%) for 10% missing data and 10% randomly filled missing data.]

[Fig. 8. MSR for randomly filled missing data at different noise levels: MSR vs. noise (3% to 30%) for 0% missing data, 10% missing data and 10% random data.]

We measure the statistical significance of the biclusters obtained by our algorithms using p-values. The p-value is computed by running the dual problem algorithm on 100 randomly generated input data sets. The random data are obtained from matrix A by randomly selecting two cells (a_{ij}, d_{kl}) and taking the diagonally opposite elements (b_{kj}, c_{il}). If a_{ij} > b_{kj} and c_{il} < d_{kl}, the algorithm swaps a_{ij} with c_{il} and b_{kj} with d_{kl}; this is called a hit. If not, two elements a_{ij} and d_{kl} are randomly chosen again. The matrix is considered randomized after nm/2 hits. In our case, the p-value is smaller than 0.001, which indicates that the results are not random and are statistically significant.

8 Conclusions

Random numbers can interfere with the discovery of future biclusters, especially those that overlap the already discovered ones. In this paper, we introduce a new approach to handling missing data that does not take entries with missing data into account. We have characterized ideal biclusters, i.e., biclusters with zero mean squared residue, and shown that this approach is significantly more stable with respect to increasing noise. Several biclustering methods have been modified accordingly. Our experimental results show a significant decrease in the H value of the found biclusters compared with counterparts lacking this noise reduction (e.g., the original Cheng and Church method [4]). The average MSR for the adjusted dual algorithm with QP is 68 percent (average) and 48 percent (median) of the values published in [4]; these results show that ignoring missing data gives the dual algorithm a much smaller MSR. We also define the MSR based on the best row inversion status and give an efficient heuristic for finding such an assignment. This new definition allows the MSR of a found set of biclusters to be reduced further.
References

1. Angiulli F., Pizzuti C.: Gene Expression Biclustering using Random Walk Strategies. In: Proc. 7th International Conference on Data Warehousing and Knowledge Discovery (DaWaK 2005), Copenhagen, Denmark.
2. Baldi P., Hatfield G.W.: DNA Microarrays and Gene Expression: From Experiments to Data Analysis and Modelling. Cambridge University Press.
3. Bertsimas D., Tsitsiklis J.: Introduction to Linear Optimization. Athena Scientific.
4. Cheng Y., Church G.M.: Biclustering of Expression Data. In: Proc. Eighth International Conference on Intelligent Systems for Molecular Biology (ISMB), AAAI Press.
5. Madeira S.C., Oliveira A.L.: Biclustering Algorithms for Biological Data Analysis: A Survey. IEEE Transactions on Computational Biology and Bioinformatics, 1(1):24-45.
6. Papadimitriou C.H., Steiglitz K.: Combinatorial Optimization: Algorithms and Complexity. Prentice-Hall, Inc., Upper Saddle River, NJ.
7. Prelic A., Bleuler S., Zimmermann P., Wille A., Bühlmann P., Gruissem W., Hennig L., Thiele L., Zitzler E.: A systematic comparison and evaluation of biclustering methods for gene expression data. Bioinformatics, 22(9).
8. Shamir R.: Lecture notes, rshamir/ge/05/scribes/lec04.pdf.
9. Tanay A., Sharan R., Shamir R.: Discovering Statistically Significant Biclusters in Gene Expression Data. Bioinformatics, 18.
10. Tavazoie S., Hughes J.D., Campbell M.J., Cho R.J., Church G.M.: Systematic determination of genetic network architecture. Nature Genetics, 22.
11. Yang J., Wang H., Wang W., Yu P.: Enhanced biclustering on gene expression data. In: Proc. 3rd IEEE Conference on Bioinformatics and Bioengineering (BIBE).
12. Zhang Y., Zha H., Chu C.H.: A time-series biclustering algorithm for revealing co-regulated genes. In: Proc. Int. Symp. Information and Technology: Coding and Computing (ITCC 2005), Las Vegas, USA.
13. Zhou J., Khokhar A.A.: ParRescue: Scalable Parallel Algorithm and Implementation for Biclustering over Large Distributed Datasets. In: 26th IEEE International Conference on Distributed Computing Systems (ICDCS 2006).
14. Gremalschi S., Altun G.: Mean Squared Residue Based Biclustering Algorithms. In: Proc. International Symposium on Bioinformatics Research and Applications (ISBRA 08), Springer LNBI 4983.
15. Divina F., Aguilar-Ruiz J.: Biclustering of Expression Data with Evolutionary Computation. IEEE Transactions on Knowledge and Data Engineering, Vol. 18, No. 5, May.
16. Yang J., Wang W., Wang H., Yu P.S.: Enhanced biclustering on expression data. In: Proc. 3rd IEEE Conference on Bioinformatics and Bioengineering (BIBE 2003).
17. Prelic A., Bleuler S., Zimmermann P., Wille A., Bühlmann P., Gruissem W., Hennig L., Thiele L., Zitzler E.: A Systematic Comparison and Evaluation of Biclustering Methods for Gene Expression Data. Bioinformatics, 22(9).
18. Xiao J., Wang L., Liu X., Jiang T.: An Efficient Voting Algorithm for Finding Additive Biclusters with Random Background. Journal of Computational Biology, 15(10), 2008.
19. Liu X., Wang L.: Computing the maximum similarity bi-clusters of gene expression data. Bioinformatics, 23(1), 2007.
More informationMathematical and Algorithmic Foundations Linear Programming and Matchings
Adavnced Algorithms Lectures Mathematical and Algorithmic Foundations Linear Programming and Matchings Paul G. Spirakis Department of Computer Science University of Patras and Liverpool Paul G. Spirakis
More informationDNA chips and other techniques measure the expression
24 IEEE TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, VOL. 1, NO. 1, JANUARY-MARCH 2004 Biclustering Algorithms for Biological Data Analysis: A Survey Sara C. Madeira and Arlindo L. Oliveira
More information3 No-Wait Job Shops with Variable Processing Times
3 No-Wait Job Shops with Variable Processing Times In this chapter we assume that, on top of the classical no-wait job shop setting, we are given a set of processing times for each operation. We may select
More informationThe Dynamic Hungarian Algorithm for the Assignment Problem with Changing Costs
The Dynamic Hungarian Algorithm for the Assignment Problem with Changing Costs G. Ayorkor Mills-Tettey Anthony Stentz M. Bernardine Dias CMU-RI-TR-07-7 July 007 Robotics Institute Carnegie Mellon University
More informationStatistical Methods and Optimization in Data Mining
Statistical Methods and Optimization in Data Mining Eloísa Macedo 1, Adelaide Freitas 2 1 University of Aveiro, Aveiro, Portugal; macedo@ua.pt 2 University of Aveiro, Aveiro, Portugal; adelaide@ua.pt The
More informationONE TIME ENUMERATION OF MAXIMAL BICLIQUE PATTERNS FROM 3D SYMMETRIC MATRIX
ONE TIME ENUMERATION OF MAXIMAL BICLIQUE PATTERNS FROM 3D SYMMETRIC MATRIX 1 M DOMINIC SAVIO, 2 A SANKAR, 3 R V NATARAJ 1 Department of Applied Mathematics and Computational Sciences, 2 Department of Computer
More informationOn the Approximability of Modularity Clustering
On the Approximability of Modularity Clustering Newman s Community Finding Approach for Social Nets Bhaskar DasGupta Department of Computer Science University of Illinois at Chicago Chicago, IL 60607,
More informationRandomized rounding of semidefinite programs and primal-dual method for integer linear programming. Reza Moosavi Dr. Saeedeh Parsaeefard Dec.
Randomized rounding of semidefinite programs and primal-dual method for integer linear programming Dr. Saeedeh Parsaeefard 1 2 3 4 Semidefinite Programming () 1 Integer Programming integer programming
More informationComputational Genomics and Molecular Biology, Fall
Computational Genomics and Molecular Biology, Fall 2015 1 Sequence Alignment Dannie Durand Pairwise Sequence Alignment The goal of pairwise sequence alignment is to establish a correspondence between the
More informationGene Clustering & Classification
BINF, Introduction to Computational Biology Gene Clustering & Classification Young-Rae Cho Associate Professor Department of Computer Science Baylor University Overview Introduction to Gene Clustering
More informationOPEN MP-BASED PARALLEL AND SCALABLE GENETIC SEQUENCE ALIGNMENT
OPEN MP-BASED PARALLEL AND SCALABLE GENETIC SEQUENCE ALIGNMENT Asif Ali Khan*, Laiq Hassan*, Salim Ullah* ABSTRACT: In bioinformatics, sequence alignment is a common and insistent task. Biologists align
More informationIterative Signature Algorithm for the Analysis of Large-Scale Gene Expression Data. By S. Bergmann, J. Ihmels, N. Barkai
Iterative Signature Algorithm for the Analysis of Large-Scale Gene Expression Data By S. Bergmann, J. Ihmels, N. Barkai Reasoning Both clustering and Singular Value Decomposition(SVD) are useful tools
More information6 Randomized rounding of semidefinite programs
6 Randomized rounding of semidefinite programs We now turn to a new tool which gives substantially improved performance guarantees for some problems We now show how nonlinear programming relaxations can
More information1. Lecture notes on bipartite matching
Massachusetts Institute of Technology 18.453: Combinatorial Optimization Michel X. Goemans February 5, 2017 1. Lecture notes on bipartite matching Matching problems are among the fundamental problems in
More informationPlaid models, biclustering, clustering on subsets of attributes, feature selection in clustering, et al.
Plaid models, biclustering, clustering on subsets of attributes, feature selection in clustering, et al. Ramón Díaz-Uriarte rdiaz@cnio.es http://bioinfo.cnio.es/ rdiaz Unidad de Bioinformática Centro Nacional
More informationUse of biclustering for missing value imputation in gene expression data
ORIGINAL RESEARCH Use of biclustering for missing value imputation in gene expression data K.O. Cheng, N.F. Law, W.C. Siu Department of Electronic and Information Engineering, The Hong Kong Polytechnic
More informationFast and Simple Algorithms for Weighted Perfect Matching
Fast and Simple Algorithms for Weighted Perfect Matching Mirjam Wattenhofer, Roger Wattenhofer {mirjam.wattenhofer,wattenhofer}@inf.ethz.ch, Department of Computer Science, ETH Zurich, Switzerland Abstract
More informationSet Cover with Almost Consecutive Ones Property
Set Cover with Almost Consecutive Ones Property 2004; Mecke, Wagner Entry author: Michael Dom INDEX TERMS: Covering Set problem, data reduction rules, enumerative algorithm. SYNONYMS: Hitting Set PROBLEM
More informationGene expression & Clustering (Chapter 10)
Gene expression & Clustering (Chapter 10) Determining gene function Sequence comparison tells us if a gene is similar to another gene, e.g., in a new species Dynamic programming Approximate pattern matching
More informationHARNESSING CERTAINTY TO SPEED TASK-ALLOCATION ALGORITHMS FOR MULTI-ROBOT SYSTEMS
HARNESSING CERTAINTY TO SPEED TASK-ALLOCATION ALGORITHMS FOR MULTI-ROBOT SYSTEMS An Undergraduate Research Scholars Thesis by DENISE IRVIN Submitted to the Undergraduate Research Scholars program at Texas
More informationI How does the formulation (5) serve the purpose of the composite parameterization
Supplemental Material to Identifying Alzheimer s Disease-Related Brain Regions from Multi-Modality Neuroimaging Data using Sparse Composite Linear Discrimination Analysis I How does the formulation (5)
More informationSmall Survey on Perfect Graphs
Small Survey on Perfect Graphs Michele Alberti ENS Lyon December 8, 2010 Abstract This is a small survey on the exciting world of Perfect Graphs. We will see when a graph is perfect and which are families
More informationMicroarray data analysis
Microarray data analysis Computational Biology IST Technical University of Lisbon Ana Teresa Freitas 016/017 Microarrays Rows represent genes Columns represent samples Many problems may be solved using
More informationα Coverage to Extend Network Lifetime on Wireless Sensor Networks
Noname manuscript No. (will be inserted by the editor) α Coverage to Extend Network Lifetime on Wireless Sensor Networks Monica Gentili Andrea Raiconi Received: date / Accepted: date Abstract An important
More informationThe Generalized Topological Overlap Matrix in Biological Network Analysis
The Generalized Topological Overlap Matrix in Biological Network Analysis Andy Yip, Steve Horvath Email: shorvath@mednet.ucla.edu Depts Human Genetics and Biostatistics, University of California, Los Angeles
More informationOn Mining Micro-array data by Order-Preserving Submatrix
On Mining Micro-array data by Order-Preserving Submatrix Lin Cheung Kevin Y. Yip David W. Cheung Ben Kao Michael K. Ng Department of Computer Science, The University of Hong Kong, Hong Kong. {lcheung,
More informationAC-Close: Efficiently Mining Approximate Closed Itemsets by Core Pattern Recovery
: Efficiently Mining Approximate Closed Itemsets by Core Pattern Recovery Hong Cheng Philip S. Yu Jiawei Han University of Illinois at Urbana-Champaign IBM T. J. Watson Research Center {hcheng3, hanj}@cs.uiuc.edu,
More informationCS 5540 Spring 2013 Assignment 3, v1.0 Due: Apr. 24th 11:59PM
1 Introduction In this programming project, we are going to do a simple image segmentation task. Given a grayscale image with a bright object against a dark background and we are going to do a binary decision
More informationCSC Linear Programming and Combinatorial Optimization Lecture 12: Semidefinite Programming(SDP) Relaxation
CSC411 - Linear Programming and Combinatorial Optimization Lecture 1: Semidefinite Programming(SDP) Relaxation Notes taken by Xinwei Gui May 1, 007 Summary: This lecture introduces the semidefinite programming(sdp)
More informationEstimating Error-Dimensionality Relationship for Gene Expression Based Cancer Classification
1 Estimating Error-Dimensionality Relationship for Gene Expression Based Cancer Classification Feng Chu and Lipo Wang School of Electrical and Electronic Engineering Nanyang Technological niversity Singapore
More informationCorrespondence Clustering: An Approach to Cluster Multiple Related Spatial Datasets
Correspondence Clustering: An Approach to Cluster Multiple Related Spatial Datasets Vadeerat Rinsurongkawong, and Christoph F. Eick Department of Computer Science, University of Houston, Houston, TX 77204-3010
More informationLECTURES 3 and 4: Flows and Matchings
LECTURES 3 and 4: Flows and Matchings 1 Max Flow MAX FLOW (SP). Instance: Directed graph N = (V,A), two nodes s,t V, and capacities on the arcs c : A R +. A flow is a set of numbers on the arcs such that
More informationParallel Evaluation of Hopfield Neural Networks
Parallel Evaluation of Hopfield Neural Networks Antoine Eiche, Daniel Chillet, Sebastien Pillement and Olivier Sentieys University of Rennes I / IRISA / INRIA 6 rue de Kerampont, BP 818 2232 LANNION,FRANCE
More informationReflexive Regular Equivalence for Bipartite Data
Reflexive Regular Equivalence for Bipartite Data Aaron Gerow 1, Mingyang Zhou 2, Stan Matwin 1, and Feng Shi 3 1 Faculty of Computer Science, Dalhousie University, Halifax, NS, Canada 2 Department of Computer
More informationClustering Techniques
Clustering Techniques Bioinformatics: Issues and Algorithms CSE 308-408 Fall 2007 Lecture 16 Lopresti Fall 2007 Lecture 16-1 - Administrative notes Your final project / paper proposal is due on Friday,
More informationTopic: Local Search: Max-Cut, Facility Location Date: 2/13/2007
CS880: Approximations Algorithms Scribe: Chi Man Liu Lecturer: Shuchi Chawla Topic: Local Search: Max-Cut, Facility Location Date: 2/3/2007 In previous lectures we saw how dynamic programming could be
More informationNotes for Lecture 24
U.C. Berkeley CS170: Intro to CS Theory Handout N24 Professor Luca Trevisan December 4, 2001 Notes for Lecture 24 1 Some NP-complete Numerical Problems 1.1 Subset Sum The Subset Sum problem is defined
More informationApproximation Algorithms for Wavelength Assignment
Approximation Algorithms for Wavelength Assignment Vijay Kumar Atri Rudra Abstract Winkler and Zhang introduced the FIBER MINIMIZATION problem in [3]. They showed that the problem is NP-complete but left
More informationEnhancing Clustering Results In Hierarchical Approach By Mvs Measures
International Journal of Engineering Research and Development e-issn: 2278-067X, p-issn: 2278-800X, www.ijerd.com Volume 10, Issue 6 (June 2014), PP.25-30 Enhancing Clustering Results In Hierarchical Approach
More informationA new predictive image compression scheme using histogram analysis and pattern matching
University of Wollongong Research Online University of Wollongong in Dubai - Papers University of Wollongong in Dubai 00 A new predictive image compression scheme using histogram analysis and pattern matching
More informationNeural Network Weight Selection Using Genetic Algorithms
Neural Network Weight Selection Using Genetic Algorithms David Montana presented by: Carl Fink, Hongyi Chen, Jack Cheng, Xinglong Li, Bruce Lin, Chongjie Zhang April 12, 2005 1 Neural Networks Neural networks
More information2. Department of Electronic Engineering and Computer Science, Case Western Reserve University
Chapter MINING HIGH-DIMENSIONAL DATA Wei Wang 1 and Jiong Yang 2 1. Department of Computer Science, University of North Carolina at Chapel Hill 2. Department of Electronic Engineering and Computer Science,
More informationThe Ordered Covering Problem
The Ordered Covering Problem Uriel Feige Yael Hitron November 8, 2016 Abstract We introduce the Ordered Covering (OC) problem. The input is a finite set of n elements X, a color function c : X {0, 1} and
More informationAlgorithms for Bioinformatics
Adapted from slides by Leena Salmena and Veli Mäkinen, which are partly from http: //bix.ucsd.edu/bioalgorithms/slides.php. 582670 Algorithms for Bioinformatics Lecture 6: Distance based clustering and
More informationFoundations of Computing
Foundations of Computing Darmstadt University of Technology Dept. Computer Science Winter Term 2005 / 2006 Copyright c 2004 by Matthias Müller-Hannemann and Karsten Weihe All rights reserved http://www.algo.informatik.tu-darmstadt.de/
More informationSPARSE COMPONENT ANALYSIS FOR BLIND SOURCE SEPARATION WITH LESS SENSORS THAN SOURCES. Yuanqing Li, Andrzej Cichocki and Shun-ichi Amari
SPARSE COMPONENT ANALYSIS FOR BLIND SOURCE SEPARATION WITH LESS SENSORS THAN SOURCES Yuanqing Li, Andrzej Cichocki and Shun-ichi Amari Laboratory for Advanced Brain Signal Processing Laboratory for Mathematical
More informationModels of distributed computing: port numbering and local algorithms
Models of distributed computing: port numbering and local algorithms Jukka Suomela Adaptive Computing Group Helsinki Institute for Information Technology HIIT University of Helsinki FMT seminar, 26 February
More informationClustering: Classic Methods and Modern Views
Clustering: Classic Methods and Modern Views Marina Meilă University of Washington mmp@stat.washington.edu June 22, 2015 Lorentz Center Workshop on Clusters, Games and Axioms Outline Paradigms for clustering
More informationOptimizing multiple spaced seeds for homology search
Optimizing multiple spaced seeds for homology search Jinbo Xu Daniel Brown Ming Li Bin Ma Abstract Optimized spaced seeds improve sensitivity and specificity in local homology search. Several authors have
More informationLecture 11: Maximum flow and minimum cut
Optimisation Part IB - Easter 2018 Lecture 11: Maximum flow and minimum cut Lecturer: Quentin Berthet 4.4. The maximum flow problem. We consider in this lecture a particular kind of flow problem, with
More informationOn the Min-Max 2-Cluster Editing Problem
JOURNAL OF INFORMATION SCIENCE AND ENGINEERING 29, 1109-1120 (2013) On the Min-Max 2-Cluster Editing Problem LI-HSUAN CHEN 1, MAW-SHANG CHANG 2, CHUN-CHIEH WANG 1 AND BANG YE WU 1,* 1 Department of Computer
More informationLecture 9: Pipage Rounding Method
Recent Advances in Approximation Algorithms Spring 2015 Lecture 9: Pipage Rounding Method Lecturer: Shayan Oveis Gharan April 27th Disclaimer: These notes have not been subjected to the usual scrutiny
More informationFeature Selection Using Modified-MCA Based Scoring Metric for Classification
2011 International Conference on Information Communication and Management IPCSIT vol.16 (2011) (2011) IACSIT Press, Singapore Feature Selection Using Modified-MCA Based Scoring Metric for Classification
More informationStorage Coding for Wear Leveling in Flash Memories
Storage Coding for Wear Leveling in Flash Memories Anxiao (Andrew) Jiang Robert Mateescu Eitan Yaakobi Jehoshua Bruck Paul H Siegel Alexander Vardy Jack K Wolf Department of Computer Science California
More information[Ch 6] Set Theory. 1. Basic Concepts and Definitions. 400 lecture note #4. 1) Basics
400 lecture note #4 [Ch 6] Set Theory 1. Basic Concepts and Definitions 1) Basics Element: ; A is a set consisting of elements x which is in a/another set S such that P(x) is true. Empty set: notated {
More informationSurrogate Gradient Algorithm for Lagrangian Relaxation 1,2
Surrogate Gradient Algorithm for Lagrangian Relaxation 1,2 X. Zhao 3, P. B. Luh 4, and J. Wang 5 Communicated by W.B. Gong and D. D. Yao 1 This paper is dedicated to Professor Yu-Chi Ho for his 65th birthday.
More informationA Parallel Evolutionary Algorithm for Discovery of Decision Rules
A Parallel Evolutionary Algorithm for Discovery of Decision Rules Wojciech Kwedlo Faculty of Computer Science Technical University of Bia lystok Wiejska 45a, 15-351 Bia lystok, Poland wkwedlo@ii.pb.bialystok.pl
More information2 ATTILA FAZEKAS The tracking model of the robot car The schematic picture of the robot car can be seen on Fig.1. Figure 1. The main controlling task
NEW OPTICAL TRACKING METHODS FOR ROBOT CARS Attila Fazekas Debrecen Abstract. In this paper new methods are proposed for intelligent optical tracking of robot cars the important tools of CIM (Computer
More informationGraph Adjacency Matrix Automata Joshua Abbott, Phyllis Z. Chinn, Tyler Evans, Allen J. Stewart Humboldt State University, Arcata, California
Graph Adjacency Matrix Automata Joshua Abbott, Phyllis Z. Chinn, Tyler Evans, Allen J. Stewart Humboldt State University, Arcata, California Abstract We define a graph adjacency matrix automaton (GAMA)
More information10701 Machine Learning. Clustering
171 Machine Learning Clustering What is Clustering? Organizing data into clusters such that there is high intra-cluster similarity low inter-cluster similarity Informally, finding natural groupings among
More informationEECS730: Introduction to Bioinformatics
EECS730: Introduction to Bioinformatics Lecture 04: Variations of sequence alignments http://www.pitt.edu/~mcs2/teaching/biocomp/tutorials/global.html Slides adapted from Dr. Shaojie Zhang (University
More informationVisual Representations for Machine Learning
Visual Representations for Machine Learning Spectral Clustering and Channel Representations Lecture 1 Spectral Clustering: introduction and confusion Michael Felsberg Klas Nordberg The Spectral Clustering
More informationToward the joint design of electronic and optical layer protection
Toward the joint design of electronic and optical layer protection Massachusetts Institute of Technology Slide 1 Slide 2 CHALLENGES: - SEAMLESS CONNECTIVITY - MULTI-MEDIA (FIBER,SATCOM,WIRELESS) - HETEROGENEOUS
More informationGEMINI GEneric Multimedia INdexIng
GEMINI GEneric Multimedia INdexIng GEneric Multimedia INdexIng distance measure Sub-pattern Match quick and dirty test Lower bounding lemma 1-D Time Sequences Color histograms Color auto-correlogram Shapes
More informationSubset sum problem and dynamic programming
Lecture Notes: Dynamic programming We will discuss the subset sum problem (introduced last time), and introduce the main idea of dynamic programming. We illustrate it further using a variant of the so-called
More informationData Mining Technologies for Bioinformatics Sequences
Data Mining Technologies for Bioinformatics Sequences Deepak Garg Computer Science and Engineering Department Thapar Institute of Engineering & Tecnology, Patiala Abstract Main tool used for sequence alignment
More informationEffective probabilistic stopping rules for randomized metaheuristics: GRASP implementations
Effective probabilistic stopping rules for randomized metaheuristics: GRASP implementations Celso C. Ribeiro Isabel Rosseti Reinaldo C. Souza Universidade Federal Fluminense, Brazil July 2012 1/45 Contents
More informationThe Probabilistic Method
The Probabilistic Method Po-Shen Loh June 2010 1 Warm-up 1. (Russia 1996/4 In the Duma there are 1600 delegates, who have formed 16000 committees of 80 persons each. Prove that one can find two committees
More informationApproximation Algorithms
Approximation Algorithms Prof. Tapio Elomaa tapio.elomaa@tut.fi Course Basics A new 4 credit unit course Part of Theoretical Computer Science courses at the Department of Mathematics There will be 4 hours
More informationDistance-based Methods: Drawbacks
Distance-based Methods: Drawbacks Hard to find clusters with irregular shapes Hard to specify the number of clusters Heuristic: a cluster must be dense Jian Pei: CMPT 459/741 Clustering (3) 1 How to Find
More informationOn Demand Phenotype Ranking through Subspace Clustering
On Demand Phenotype Ranking through Subspace Clustering Xiang Zhang, Wei Wang Department of Computer Science University of North Carolina at Chapel Hill Chapel Hill, NC 27599, USA {xiang, weiwang}@cs.unc.edu
More informationMICROARRAY IMAGE SEGMENTATION USING CLUSTERING METHODS
Mathematical and Computational Applications, Vol. 5, No. 2, pp. 240-247, 200. Association for Scientific Research MICROARRAY IMAGE SEGMENTATION USING CLUSTERING METHODS Volkan Uslan and Đhsan Ömür Bucak
More informationModule 7. Independent sets, coverings. and matchings. Contents
Module 7 Independent sets, coverings Contents and matchings 7.1 Introduction.......................... 152 7.2 Independent sets and coverings: basic equations..... 152 7.3 Matchings in bipartite graphs................
More informationThe Encoding Complexity of Network Coding
The Encoding Complexity of Network Coding Michael Langberg Alexander Sprintson Jehoshua Bruck California Institute of Technology Email: mikel,spalex,bruck @caltech.edu Abstract In the multicast network
More informationApproximation Algorithms: The Primal-Dual Method. My T. Thai
Approximation Algorithms: The Primal-Dual Method My T. Thai 1 Overview of the Primal-Dual Method Consider the following primal program, called P: min st n c j x j j=1 n a ij x j b i j=1 x j 0 Then the
More informationApproximability Results for the p-center Problem
Approximability Results for the p-center Problem Stefan Buettcher Course Project Algorithm Design and Analysis Prof. Timothy Chan University of Waterloo, Spring 2004 The p-center
More informationClustering Jacques van Helden
Statistical Analysis of Microarray Data Clustering Jacques van Helden Jacques.van.Helden@ulb.ac.be Contents Data sets Distance and similarity metrics K-means clustering Hierarchical clustering Evaluation
More informationOptimal Detector Locations for OD Matrix Estimation
Optimal Detector Locations for OD Matrix Estimation Ying Liu 1, Xiaorong Lai, Gang-len Chang 3 Abstract This paper has investigated critical issues associated with Optimal Detector Locations for OD matrix
More informationClustering. Informal goal. General types of clustering. Applications: Clustering in information search and analysis. Example applications in search
Informal goal Clustering Given set of objects and measure of similarity between them, group similar objects together What mean by similar? What is good grouping? Computation time / quality tradeoff 1 2
More informationCHAPTER 6 MODIFIED FUZZY TECHNIQUES BASED IMAGE SEGMENTATION
CHAPTER 6 MODIFIED FUZZY TECHNIQUES BASED IMAGE SEGMENTATION 6.1 INTRODUCTION Fuzzy logic based computational techniques are becoming increasingly important in the medical image analysis arena. The significant
More informationAn Efficient Algorithm for Computing Non-overlapping Inversion and Transposition Distance
An Efficient Algorithm for Computing Non-overlapping Inversion and Transposition Distance Toan Thang Ta, Cheng-Yao Lin and Chin Lung Lu Department of Computer Science National Tsing Hua University, Hsinchu
More information